
Download script for springerlink.com Ebooks

After a long period of silence, here is a Bash script for downloading books from http://springerlink.com. It is not a way to circumvent their login mechanism: you need proper access rights to download the books. However, many students in Germany get free access to these ebooks through their universities. I, for example, study at FU Berlin: I put the script in my Zedat home folder, start the download over SSH from home and afterwards copy the tarball to my own machine.

Read on for the script. Download it (attached below), copy it to your Zedat account, make it executable and run it. You have to pass it the link to a book-detail page as its only argument; the usage message at the top of the script shows the expected form of that link.
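The script derives the PDF locations by stripping the last path component from that link with `dirname`, so the link has to end in `/?p=...` as in the usage message. A link without it, such as `.../content/abc123/`, would make `dirname` return `.../content` and the PDF requests would go to the wrong place. A minimal sketch of a hypothetical helper `pdf_base_url` that checks the argument up front; the regular expression is an assumption based only on the usage message, not on any documented SpringerLink URL scheme, and `abc123` is a made-up ID:

```bash
# Hypothetical helper (not part of the original script): validate the
# book-detail link and print the directory the PDFs are fetched from.
pdf_base_url() {
  local target=$1
  # Assumed shape, taken from the usage message: .../content/<id>/?p=<...>
  local re='^https?://springerlink\.com/content/[^/]+/\?p='
  if [[ ! $target =~ $re ]]; then
    echo "unexpected URL form: $target" >&2
    return 1
  fi
  # Same result as $(dirname "$target") for links of this shape:
  # drop the trailing "/?p=..." component.
  printf '%s\n' "${target%/*}"
}
```

In the script this could replace the bare `dirname` call, e.g. `base=$(pdf_base_url "$1") || exit 1`, followed by `wget -O "$title.pdf" "$base/fulltext.pdf"`.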

Requires Bash, wget, iconv, grep/egrep and tar with bzip2 support.
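Since the script runs on a remote shell account where you may not control what is installed, it can help to check for these tools before starting a long download. A minimal sketch using only Bash's built-in `command -v`; the function name `check_deps` is hypothetical:

```bash
# Hypothetical preflight check (not part of the original script): report
# every command in "$@" that cannot be found and fail if any is missing.
check_deps() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Placed near the top of the script, a call such as `check_deps wget iconv egrep tar bzip2 || exit 1` lists all missing tools at once instead of failing halfway through a book.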

Note: Take a look at the comments: Faro has come up with an updated Bash script that properly handles ebooks spanning multiple pages on SpringerLink and merges the PDF files with pdftk. Thanks, Faro!
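Faro's script itself is in the comments and not reproduced here. As an independent sketch of the merge step only, assuming pdftk is installed: the script names its files `0-front-matter.pdf`, `1-<chapter>.pdf`, ..., with the back matter last, so a numeric sort on the leading number gives the reading order, and pdftk's `cat` operation joins the files. The helper names `ordered_pdfs` and `merge_book` are hypothetical:

```bash
# List the numbered PDFs in directory $1 in reading order.
# sort -n compares the leading number, so "10-..." comes after "2-...".
ordered_pdfs() {
  (
    cd "$1" || exit 1
    shopt -s nullglob          # no matches -> no output instead of "*.pdf"
    for f in *.pdf; do
      printf '%s\n' "$f"
    done
  ) | sort -n
}

# Merge all PDFs of book directory $1 into the single file $2 with pdftk:
#   pdftk in1.pdf in2.pdf ... cat output out.pdf
merge_book() {
  local dir=$1 out=$2 f
  local -a files=()
  while IFS= read -r f; do
    files+=("$dir/$f")
  done < <(ordered_pdfs "$dir")
  if (( ${#files[@]} == 0 )); then
    echo "no PDFs found in $dir" >&2
    return 1
  fi
  pdftk "${files[@]}" cat output "$out"
}
```

This would run on the book directory before the script packs and deletes it, or on the unpacked tarball, e.g. `merge_book "Some Book" "Some Book.pdf"`. The output file is best written outside the directory so a second run does not pick it up as an input.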

Note: For those who prefer a Python version over the Bash version, take a look at my second attempt at a download script. The Bash version is abandoned. Long live the Python version!

```bash
#!/bin/bash

if [[ -z "$1" ]]; then
  echo "Usage: $0 \"http://springerlink.com/content/.../?p=...\""
  exit 1
fi

target=$1

# get whole page
echo -n "Please wait, link source is being downloaded..."
page=$(wget -q -O - "$target")
echo "ok - done"

echo -n "Validating link source..."

# get line number of the title heading; the title itself is on the next line
title_line=$(echo "$page" | grep -n -m1 '<h2 class="MPReader_Profiles_SpringerLink_Content_PrimitiveHeadingControlName">' | grep -Eo "^[[:digit:]]+")
if [[ -z "$title_line" ]]; then
  echo "invalid URL"
  exit 1
fi
l=0
title=""
while IFS= read -r line; do
  # l counts from 0, grep -n from 1, so this picks the line after the heading
  if [[ "$l" == "$title_line" ]]; then
    title=$(echo "$line" | grep -Eo "[[:alnum:]].+[[:alnum:]]" | iconv -f "UTF-8" -t "ASCII//TRANSLIT")
    break
  fi
  l=$((l + 1))
done < <(echo "$page")
title=${title//\//-}   # a "/" in the title would otherwise create a path
if [[ -z "$title" ]]; then
  echo "invalid URL"
  exit 1
fi
echo "ok - done"

# check type ("Book Chapter" or "Book")
type=$(echo "$page" | grep -o '<span id="ctl00_PageHeadingLabel".*</span>' | grep -o '>.*<' | grep -Eo '[^<>]+')
base=$(dirname "$target")

if [[ "$type" == "Book Chapter" ]]; then
  echo "will download book chapter '$title'"
  echo

  wget -O "$title.pdf" "$base/fulltext.pdf"
elif [[ "$type" == "Book" ]]; then
  echo "will download book '$title'"
  echo

  mkdir -p "$title"
  cd "$title" || exit 1

  # get chapter links
  declare -a links
  key=0
  while IFS= read -r link; do
    links[key]=$link
    key=$((key + 1))
  done < <(echo "$page" | grep '/fulltext.pdf"><img' | grep -Eo 'href="[^"]+' | cut -c7-)

  # get front + back matter
  wget -O "0-front-matter.pdf" "$base/front-matter.pdf"
  wget -O "$((${#links[@]} + 1))-back-matter.pdf" "$base/back-matter.pdf"

  # get chapters; names are matched to links by position
  key=0
  while read -r chapter; do
    echo "$((key + 1)) - $chapter :: ${links[key]}"
    chapter=$(echo "$chapter" | iconv -f "UTF-8" -t "ASCII//TRANSLIT")
    wget -O "$((key + 1))-${chapter//\//-}.pdf" "http://springerlink.com/${links[key]}"
    key=$((key + 1))
  done < <(echo "$page" | grep -Eo '^[[:blank:]]*<a href="/content/[^>]+&amp;pi=[[:digit:]]+">[^>]+</a>' | \
    grep -Eo '>[^<]+' | cut -c2-)

  # pack everything into one tarball and remove the loose PDFs
  cd ..
  tar -cvjf "$title.tar.bz2" "$title"
  rm "$title"/*.pdf
  rmdir "$title"
else
  echo "unknown link type '$type'"
fi
```
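The parsing rests on two assumptions about the SpringerLink markup the script targets: the book title sits on the line after the `<h2 class="MPReader_...">` heading, and each chapter PDF is linked from an anchor ending in `/fulltext.pdf"><img`. A sketch that isolates both steps so they can be tried against a saved page; the function names and the sample HTML are made up for illustration, and `sed -n` replaces the script's line-counting loop. Like the script, these helpers only work as long as the site serves that markup.

```bash
# Hypothetical helper: print the line after the first line of $1 containing
# the fixed string $2, trimmed to its first..last alphanumeric character.
line_after_marker() {
  local text=$1 marker=$2 n
  n=$(printf '%s\n' "$text" | grep -n -m1 -F -- "$marker" | cut -d: -f1)
  [[ -n $n ]] || return 1
  printf '%s\n' "$text" | sed -n "$((n + 1))p" | grep -Eo '[[:alnum:]].+[[:alnum:]]'
}

# Hypothetical helper: print the href of every anchor pointing at a
# fulltext.pdf, using the same grep/cut pipeline as the script.
pdf_links() {
  printf '%s\n' "$1" | grep -F '/fulltext.pdf"><img' | grep -Eo 'href="[^"]+' | cut -c7-
}
```

Feeding them a page saved with `wget -O page.html "<link>"` (e.g. `pdf_links "$(cat page.html)"`) shows quickly whether the markup still matches before a whole book is requested.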

Update 01/09/09:
- The script now includes chapter numbers in the file names.
- The script can now handle links to single book chapters.
- Other minor cleanup.

Update 02/20/09:
- Fixed types.

Update 02/24/09:
- Rewrote the script in Python.
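The chapter-numbered file names from the 01/09/09 update follow a simple scheme: front matter is 0, chapters count from 1, and the back matter gets the number after the last chapter, so the files list in reading order. A sketch of that scheme as two hypothetical helpers, `chapter_file` and `book_files`, assuming one PDF link per chapter title as the script does; it also replaces `/` in titles, since a title like "Input/Output" would otherwise be treated as a directory by `wget -O`:

```bash
# Hypothetical helper: file name for entry number $1 with title $2,
# with every "/" in the title replaced by "-".
chapter_file() {
  local n=$1 name=$2
  printf '%s-%s.pdf\n' "$n" "${name//\//-}"
}

# Hypothetical helper: all file names for a book whose chapter titles
# are given as arguments, in the order the script creates them.
book_files() {
  local i=0 name
  chapter_file 0 "front-matter"
  for name in "$@"; do
    i=$((i + 1))
    chapter_file "$i" "$name"
  done
  chapter_file "$((i + 1))" "back-matter"
}
```

For example, `book_files Intro Basics` prints `0-front-matter.pdf`, `1-Intro.pdf`, `2-Basics.pdf` and `3-back-matter.pdf`, one per line.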

Attachment: springer_download.sh (2.33 KB)
