Don Quixote (Don Quijote), probably the most famous Spanish text of all time, is available for free as an audiobook (in Spanish), thanks to a cultural project from the Government of Aragón (Spain). Unfortunately, someone decided that the best distribution method was to give users 126 links (one MP3 file for each chapter). Wow!
Some time ago I tried to learn some Ruby, so I wrote a small script to download the files using wget (a popular command-line download manager). My solution was not very smart: wget or a similar tool such as curl would probably have been enough on its own to handle the downloads (using ranges, for example), but my goal was to practice with Ruby.
After downloading the files, my MP3 player didn’t like the accents and whitespace in the file names, so I wrote another small script that uses regular expressions to rename them to something more reasonable.
I merged both scripts into one (the quijote.rb script). Please note that the only goal of this implementation is to practice with different aspects of Ruby (you could more easily combine wget with shell brace expansion such as {01..52}, for example). You obviously need Ruby and wget to run it:
# Download all files from the website (52 chapters in part 1, 74 in part 2)
("01".."52").each { |i| `wget http://www.aularagon.org/files/espa/elquijote/p1/Parte%201%20Cap%C3%ADtulo-#{i}.mp3` }
("01".."74").each { |i| `wget http://www.aularagon.org/files/espa/elquijote/p2/Parte%202%20Cap%C3%ADtulo-#{i}.mp3` }

# Rename files to remove accents and whitespace
Dir.foreach(".") do |f|
  # "Cap.+tulo" matches the accented "í" without having to type it
  if (m = f.match(/Parte\s(.*)\sCap.+tulo-(.*)\.mp3/))
    # m[1] is the part number, m[2] the chapter number
    File.rename(f, "P#{m[1]}-Capitulo-#{m[2]}.mp3")
  end
end
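If you want to check the renaming before touching any files, the same logic can be pulled into small functions. This is only a sketch: the names chapter_url, clean_name and rename_chapters are made up for this example, and it assumes the downloaded files are named exactly like "Parte 1 Capítulo-01.mp3".

```ruby
# Base URL taken from the script above.
BASE_URL = "http://www.aularagon.org/files/espa/elquijote"

# Hypothetical helper: build the download URL for one chapter.
# part is 1 or 2; chapter is an integer (1..52 or 1..74).
def chapter_url(part, chapter)
  number = format("%02d", chapter) # zero-pad: 1 -> "01"
  "#{BASE_URL}/p#{part}/Parte%20#{part}%20Cap%C3%ADtulo-#{number}.mp3"
end

# Hypothetical helper: return the cleaned-up file name,
# or nil when the name does not look like a chapter file.
def clean_name(filename)
  m = filename.match(/\AParte\s(\d+)\sCap.+tulo-(\d+)\.mp3\z/)
  m && "P#{m[1]}-Capitulo-#{m[2]}.mp3"
end

# Hypothetical helper: print every planned rename and only
# perform it when dry_run is false.
def rename_chapters(dir = ".", dry_run: true)
  Dir.children(dir).each do |name|
    next unless (new_name = clean_name(name))
    puts "#{name} -> #{new_name}"
    File.rename(File.join(dir, name), File.join(dir, new_name)) unless dry_run
  end
end
```

Calling rename_chapters with no arguments only prints the planned renames for the current directory; rename_chapters(".", dry_run: false) actually performs them.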
However, notice that there’s still another stupid problem with the files once you’ve downloaded them. If you check the ID3 tags, you’ll find that the track name and album have been swapped! All chapters (tracks) are named “Don Quijote de la Mancha”, and each one belongs to a different album (the chapter number)!
Track name: Don Quijote de la Mancha
Album: Capítulo 01
This prevents music managers and MP3 players from cataloging the files correctly, so I still need to complete the script to fix the ID3 tags of the files. Please be patient, it’s coming… (UPDATE: You can find it here).
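In the meantime, here is a rough sketch of what such a fix could look like. It is not the script mentioned in the update, and it rests on one big assumption: that the files carry an ID3v1 tag, which is a fixed 128-byte block at the end of the file ("TAG", then 30 bytes each for title, artist and album, then year, comment and genre). If the files only have ID3v2 tags, this does nothing. The function names are made up for this example.

```ruby
# Swap the title and album fields of a 128-byte ID3v1 tag.
# ID3v1 layout: "TAG"(3) title(30) artist(30) album(30)
#               year(4) comment(30) genre(1)
def swap_title_and_album(tag)
  unless tag.bytesize == 128 && tag.byteslice(0, 3) == "TAG"
    raise ArgumentError, "not an ID3v1 tag"
  end
  title  = tag.byteslice(3, 30)
  artist = tag.byteslice(33, 30)
  album  = tag.byteslice(63, 30)
  rest   = tag.byteslice(93, 35) # year + comment + genre
  tag.byteslice(0, 3) + album + artist + title + rest
end

# Rewrite the ID3v1 tag of one file in place.
# Returns false (and changes nothing) when no ID3v1 tag is found.
def fix_id3v1_tags(path)
  File.open(path, "r+b") do |file|
    return false if file.size < 128
    file.seek(-128, IO::SEEK_END)
    tag = file.read(128)
    return false unless tag.byteslice(0, 3) == "TAG"
    file.seek(-128, IO::SEEK_END)
    file.write(swap_title_and_album(tag))
    true
  end
end
```

Something like Dir.glob("*.mp3").each { |f| fix_id3v1_tags(f) } would then apply it to every file in the current directory; try it on a copy first, since the files are modified in place.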
Note: Some time after writing the script I discovered this post. Its comments suggest some alternative ways to download the files, mainly cURL or the wonderful DownThemAll! Firefox extension.