After a long period of silence, I present the following Bash script for downloading books from http://springerlink.com. It does not circumvent SpringerLink's login mechanisms; you still need proper access rights to download books. Many students in Germany, however, get free access to these ebooks through their universities. I, for example, study at the FU Berlin, so I put the script in my Zedat home folder and start the download via SSH from home. Afterwards I download the tarball to my home system.
Read on for the script. Download it (attached below), push it to your Zedat account, make it executable, and run it. You have to give it a link to a book-detail page, like this one for example. Also take a look at the example call at the top of the script.
Requires bash, wget, iconv, and egrep, plus the standard tools grep, cut, expr, dirname, and tar that the script also calls.
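Before running the script on a remote account, it can help to check that the required tools are actually installed there. The following is a minimal sketch of such a check, not part of the original script; the function name `missing_tools` is made up for illustration.

```shell
# Print every command from the argument list that is not found on PATH.
# (missing_tools is a hypothetical helper, not part of the download script.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The tools named in the requirements line above.
missing=$(missing_tools bash wget iconv egrep)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names end up on one line.
    echo "missing tools:" $missing
else
    echo "all required tools found"
fi
```

The check only reports what is missing and does not abort, so it can be pasted into an interactive SSH session without closing it.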
Note: Take a look at the comments. Faro has come up with an updated Bash script that properly handles ebooks spanning multiple pages on SpringerLink and merges the PDF files with pdftk. Thanks, Faro!
Note: If you prefer a Python version over the Bash version, take a look at my second attempt at a download script. The Bash version is abandoned. Long live the Python version!
    #!/bin/bash

    if [[ "$1" == "" ]]; then
        echo "Usage: $0 \"http://springerlink.com/content/.../?p=...\""
        exit 1
    fi

    target=$1

    # get whole page
    echo -n "Please wait, link source is being downloaded..."
    page=$(wget -q -O - "$target")
    echo "ok - done"

    echo -n "Validating link source..."
    # get title of page
    title_line=$(echo "$page" 2>/dev/null | grep -n -m1 '<h2 class="MPReader_Profiles_SpringerLink_Content_PrimitiveHeadingControlName">' | egrep -o "^[[:digit:]]+")
    if [[ "$title_line" == "" ]]; then
        echo "invalid URL"
        exit 1
    fi

    l=0
    title=""
    while read line; do
        if [[ "$l" == "$title_line" ]]; then
            title=$(echo "$line" | egrep -o "[[:alnum:]].+[[:alnum:]]" | iconv -f "UTF-8" -t "ASCII//TRANSLIT")
            break
        fi
        l=$(expr $l + 1)
    done < <(echo "$page")

    if [[ "$title" == "" ]]; then
        echo "invalid URL"
        exit 1
    fi
    echo "ok - done"

    # check type
    type=$(echo "$page" | grep -o '<span id="ctl00_PageHeadingLabel".*</span>' | grep -o '>.*<' | egrep -o '[^<>]+')
    if [[ "$type" == "Book Chapter" ]]; then
        echo "will download book chapter '$title'"
        echo
        wget -O "$title.pdf" "$(dirname $target)/fulltext.pdf"
    elif [[ $type == "Book" ]]; then
        echo "will download book '$title'"
        echo
        mkdir "$title" 2>/dev/null
        cd "$title" || exit 1

        # get links
        declare -a links
        key=0
        while read link; do
            links[${key}]=$link
            key=$(expr $key + 1)
        done < <(echo "$page" | grep '/fulltext.pdf"><img' | egrep -o 'href="[^"]+' | cut -c7-)

        # get front + back matter
        wget -O "0-front-matter.pdf" "$(dirname $target)/front-matter.pdf"
        wget -O "$((${#links[@]}+1))-back-matter.pdf" "$(dirname $target)/back-matter.pdf"

        # get chapters
        key=0
        while read chapter; do
            echo "$(($key+1)) - $chapter :: ${links[${key}]}"
            chapter=$(echo $chapter | iconv -f "UTF-8" -t "ASCII//TRANSLIT")
            wget -O "$(($key+1))-$chapter"".pdf" "http://springerlink.com/${links[${key}]}"
            key=$(expr $key + 1)
        done < <(echo "$page" | egrep -o '^[[:blank:]]*<a href="/content/[^>]+&pi=[[:digit:]]+">[^>]+</a>' | \
            egrep -o '>[^<]+' | cut -c2-)

        cd ..
        tar -cvjf "$title.tar.bz2" "$title"
        rm "$title"/*.pdf
        rmdir "$title"
    else
        echo "unknown link type '$type'"
    fi
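To see what the link-extraction pipeline in the script does, it can be run on a few lines of sample markup. The sketch below is only an illustration: the HTML in `sample` is invented to match the script's grep patterns (it is not real SpringerLink output), the function name `extract_pdf_links` is hypothetical, and `grep -E` is used in place of `egrep`, which behaves the same way.

```shell
# Extract the href target of every chapter PDF link from HTML on stdin.
# Same pipeline as in the script: keep lines with a fulltext.pdf icon link,
# cut out 'href="...' and drop the six leading characters 'href="'.
extract_pdf_links() {
    grep '/fulltext.pdf"><img' | grep -E -o 'href="[^"]+' | cut -c7-
}

# Invented sample markup, shaped like what the patterns expect (assumption).
sample='<a href="/content/abc123/fulltext.pdf"><img src="pdf.gif"></a>
<a href="/content/def456/fulltext.pdf"><img src="pdf.gif"></a>
<a href="/content/abc123/?p=1&pi=0">Introduction</a>'

printf '%s\n' "$sample" | extract_pdf_links
# prints:
# /content/abc123/fulltext.pdf
# /content/def456/fulltext.pdf

# The script derives the base URL for front and back matter with dirname,
# which strips the trailing "?p=..." component of the book link.
dirname 'http://springerlink.com/content/abc123/?p=xyz'
# prints: http://springerlink.com/content/abc123
```

Because the extraction depends on the exact markup of the page, any change to SpringerLink's HTML would break these patterns, which is one reason a parser-based rewrite is more robust.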
Update 01/09/09:
- The script now includes chapter numbers in the file names
- The script can now handle links to single book chapters
- minor other cleanup
Update 02/20/09:
- fixed types
Update 02/24/09:
- rewrite script in Python
Attachment | Size |
---|---|
springer_download.sh | 2.33 KB |