Juno MP3 Scraper
Posted: April 29th, 2003 | 14 Comments »
Wrong coast, and dammit. While everyone else is having fun, I’ve been in Florida, having spent Passover with family in a big golf resort in Palm Beach. I shouldn’t be complaining as it was really lovely and relaxing and the food was great and I got to play wise science-explaining uncle to my eight-year-old cousin and I gained a kilo or nine. But, as I am currently explaining to Gilbert in an AIM window, I would much rather have been camping in a tent in a backyard in San Jose. (I’m trying to keep up with the notes, but as usual, Webb is typing faster than I can read.)
If you’re wondering what it’s like taking part in a Jewish festival, let me give you a sample of the one before the one just gone: Purim. More specifically, a sample of what happens if you spend the Purim meal at my friends the Goldbergs, receiving wave after wave of spielers. These are bunches of kids who run around the Jewish neighbourhoods on Purim night, performing songs and sketches and collecting for charities. The Goldbergs live right in the heart of Golders Green, which is the most Jewish place in Britain, so the spielers were, at points, queueing three acts deep at the door.
Anyway, for the three-minute, digested version, see Purim 2003 in Golders Green: What You Missed. (That file is being shared using Open Content Network to help ease bandwidth use, but mainly just to prove to myself that it’s easy. If you have problems or just can’t be bothered with all the faffing and needing Java Web Start, try this link instead) Also, the photos are here.
(This post would have been made on Friday while I was still in Miami, but MT wasn’t playing dice – turns out to be some weird Trackback autodiscovery bug that made MT hang when certain links were included. Arse.)
I first saw the name “D.A. Barham” in 1994: she was one of the first customers at Delphi UK, the tiny offshoot of the then-huge US online service, and I was doing phone support in my spare time from university. She was funny, smart and wild, making a big difference to the otherwise-dull forums. After that, I started seeing her name in lots of other places: not just online, but in the credits of almost every topical comedy show on TV. (I’m sure that nearly every person in Britain has laughed at one of her gags.) In 1997, I was being shown around the offices of my new job when she recognised me and pulled me out into the hallway for a chat – we ended up working on the same game. The last time I saw her in person was when we went for a drink in Soho two years ago. The last time I heard from her was when she commented on one of my blog entries a mere three months ago. (I meant to mail her. Really, I did. Idiot.)
I had no idea she was dead until this morning, when I found Bruce Hyman’s send-off in the Guardian. It will be a few days until the stunned feeling wears off. She was too talented and way too fucking young. I have various memories of her illness, but I’d rather not dwell on them. (I know it’d piss her off.) On the illness itself: the most shocking Google query I’ve ever run.
UPDATE (30/04): Cover story of today’s Guardian G2 section.
Juno publish pages with short clips of all their new records. Stef asked for a way to grab them all at once, with their ID3 tags set, so he could wander off with them on his iPod, then come back and buy all the ones he likes. Okay, Stef: here it is. You need the LWP and MP3::Tag modules installed. Edit the script to include your favourite genres from Juno’s range, then run it – it’ll download all the files to the directory the script is in. The main problem is that extensive use will completely hammer Juno’s server, so the next step for this is for someone to stick it on a cronjob and make the zipped-up collections available over BitTorrent or OCN.
A quick note on obtaining and installing modules for Perl: The easiest way of doing it on Unix or Mac OS X systems is the cpan command, which locates, downloads and installs modules automatically, given the name of the module you want. (It doesn’t get much simpler than that.) The equivalent for ActivePerl (which most Windows users run) is the PPM command. Both commands are briefly explained here. To download LWP, you should look for Bundle::LWP. However, if you’re running on Windows, an extra caveat: the PPM repositories don’t seem to have MP3::Tag. Fortunately, installing it manually is simple: Download this tarball, locate your Perl\site\lib folder, make an MP3 folder within it, then drag the Tag.pm file and the Tag subfolder from the tarball into the new MP3 folder. Done!
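Before running the scraper it can be worth checking that both modules actually load. This is only a rough sketch, not part of the scraper itself: module_available is a made-up helper name, and it uses nothing beyond core Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the scraper): returns 1 if the named
# module can be loaded by this perl, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # MP3::Tag -> MP3/Tag.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# The two modules the post says the scraper needs.
for my $module (qw(LWP MP3::Tag)) {
    printf "%-10s %s\n", $module,
        module_available($module) ? "installed" : "MISSING";
}
```

If either one comes back MISSING, that is the module to install via cpan or PPM (or by hand, for MP3::Tag on Windows) before trying the scraper.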
A word of warning from someone who never quite “got” modules… last time I tried running cpan on my Mac OS X (a couple of weeks back) it spent an hour or two upgrading Perl. I was lucky. Apparently this can completely break your ability to start the machine…
Yow. This used to be a regular pain with older versions of perl, but I haven’t had it since 5.6. I thought Jagwire was 5.8, anyway?
I’m with phil on this one; for installs older than 5.6.x, if you try to get any recent modules or bundles without explicitly specifying the version number, it’ll get the highest version. This is fine and dandy, but if the bundle or module marks a newer version of perl as a prerequisite, then it’ll download the *entire* perl5.whatever, and attempt to install it. Yick.
On my OS X machine:
$ /usr/bin/perl --version
This is perl, v5.6.0 built for darwin
The cpan program (and the CPAN module that powers it, as opposed to CPAN the archive) is something of a relic, and the version that ships with perl 5.6.0 (still the default on Mac OS X!) is even more crufty.
Handy hints for the first-time Mac OS X cpan user, then:
* configuration is time-consuming. Stick at it. If you see the question about following prerequisites, answer ‘ask’ not ‘follow’; this should prevent it trying to upgrade the whole of perl without you knowing.
* try and have wget or ncftpget available. They’re better at downloading things than plain ftp.
* the first thing you should install is ‘CPAN’, to upgrade it. Do *not* install Bundle::CPAN; this has more prerequisites, some of which trigger upgrading perl. You’ll probably have to reconfigure it, sadly. You could download the distribution manually to avoid the duplication, though: http://search.cpan.org/author/ANDK/CPAN-1.70/
* If you can get LWP to install, do so. It’s a somewhat hairy module, but it’ll help cpan no end.
* Consider CPANPLUS (whose executable is, confusingly, cpanp) instead. It’s a reengineered version of CPAN.pm, which aims to provide a more robust, extensible architecture for fetching modules. http://search.cpan.org/author/KANE/CPANPLUS-0.042/
Hope this helps.
wow. two bug fixes/essential enhancements:
1. stick “from juno.co.uk” in the album TAG so that you can group them all together.
2. prefix each track name/artist with ‘z ‘ – there might be a less clumsy way to do this (what sorts post z, in ASCII?), but you definitely want these kept away from your normal track/artist listings.
as I’ve just discovered to my cost
re: jaguar and 5.8, I heard from Apple that they royally screwed up there, and released with 5.6 by mistake. The guy in charge apparently got shouted at a bit. Nevertheless, something is rather weird with os x perl – try installing some XML modules and see what I mean. (oh, and any XML modules you do get running will need to be reinstalled after CPAN does its 5.8 upgradey badness.)
and whatever you do, when you install Bundle::LWP, DO NOT allow get to be aliased. OSX is not case sensitive, so lwp get overwrites GET, (or the other way round, I forget)
AND, upgrading to 5.8 when you have fink installed is just a world of trouble
I’ve made fix #1 that Stef suggested: each file is now album-tagged as “juno.co.uk”. Haven’t done a fix for the second one: it’s easier if you just give the junoscraper its own folder. Plus, I’ve now added a log feature – every time the scraper runs, it writes each downloaded filename to junolog.txt. It also checks this file for any filename it’s about to download, so it can skip files it’s done in the past. (So it should be much quicker to run every few days)
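For anyone curious how that kind of skip-log can work, here is a minimal sketch. It is not the scraper's actual code: only the junolog.txt name comes from the description above, and load_log, record_download and files_to_fetch are hypothetical names.

```perl
use strict;
use warnings;

# Log file name as described in the post; everything else here is assumed.
my $LOG = "junolog.txt";

# Read the log into a hash of filenames that were fetched on earlier runs.
# A missing log file simply means nothing has been fetched yet.
sub load_log {
    my ($path) = @_;
    my %seen;
    if (open my $fh, '<', $path) {
        while (my $line = <$fh>) {
            chomp $line;
            $seen{$line} = 1 if length $line;
        }
        close $fh;
    }
    return \%seen;
}

# Append one filename to the log after a successful download.
sub record_download {
    my ($path, $filename) = @_;
    open my $fh, '>>', $path or die "Cannot append to $path: $!";
    print {$fh} "$filename\n";
    close $fh;
}

# Return only the candidate filenames that are not in the log yet,
# preserving their original order.
sub files_to_fetch {
    my ($seen, @candidates) = @_;
    return grep { !$seen->{$_} } @candidates;
}
```

The idea would be to call load_log($LOG) once at startup, pass the clip filenames found on the page through files_to_fetch, and call record_download($LOG, $name) after each file is saved.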
you didn’t do number 2, but as http://www.asciitable.com/ so helpfully points out, ~ is ASCII 126, and so sorts nicely to the end of all the playlists.
seriously, this is important 🙂
except it doesn’t. damnit apple and their clevercleverness.
use z inztead.
I really don’t understand what you want here. I mean, if I take your request literally, all you need is this line just before the tagging bit:
$f->{artist} = "z " . $f->{artist};
… but I find it hard to believe that there is no better way of getting iTunes to separate one chunk of MP3s from another chunk of MP3s.
ok. you’d better come and take a look at my ipod then.
you can browse lists by album, OR song title, or Artist name. the ‘juno.co.uk’ works fine for gathering together all the juno tracks, when I’m in sample-listening mode, but when I’m listen-to-the-normal-music mode, my artist list is about trebled in size, most of which are 45 second clips. My artist browsing experience is sub-optimal, all of a sudden
If every juno track had z prepended to the song and artist name, the tracks would be at the end of both lists, and my ‘pod would still be usable. geddit?
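A rough sketch of what that request amounts to, extending the one-liner quoted earlier to cover the song name as well. The assumptions: $f is a plain hash ref, the song name lives under a 'title' key (only the 'artist' key appears in the thread), and push_to_end is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: prefix "z " to the artist and title fields so the
# clips sort towards the end of the iPod's artist and song lists.
# The 'title' key is an assumption; only 'artist' appears in the thread.
sub push_to_end {
    my ($f) = @_;
    for my $field (qw(artist title)) {
        next unless defined $f->{$field};    # leave absent fields alone
        $f->{$field} = "z " . $f->{$field};
    }
    return $f;
}

# Example with placeholder values.
my $f = { artist => "Some Artist", title => "Some Track" };
push_to_end($f);
print "$f->{artist} / $f->{title}\n";    # prints "z Some Artist / z Some Track"
```

This would need to run just before the tagging step, and only once per file, since running it twice stacks the prefix.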
was also thinking it might be good to stick the filename in the comments tag
Oh I seeeeee. Okay. Hmmm. Is it working okay with that extra line?
which line? this would be so much easier by email.
if you mean the juno.co.uk one, then it’s fine