
December 29, 2008

Use wget or curl to download from RapidShare Premium


Over the last few days I needed to download a bunch of medical videos which had been uploaded to RapidShare by many other people. Although RapidShare (and all the other 1-click file-hosting services) is very convenient, it imposes some strict rules on free accounts, for example a guest has to wait for 120 seconds per 1 MB of downloaded data and, to make things worse, no download managers are allowed. Since waiting is not a game I like and since I intended to use either wget or curl to download the files, I decided to sign up for a RapidShare Premium account and then figure out how to use the aforementioned tools. Fortunately, registered users are permitted to use download managers and, as you will read in the following article, the Linux command-line downloaders work flawlessly with a Premium account.

Theory

RapidShare uses cookie-based authentication. This means that every time you log into the service, a cookie containing information that identifies you as a registered user is stored in your browser's cookie cache. Both wget and curl support saving and loading cookies, so before using them to download any files, you should save such a cookie. Having done this, the only action required in order to download from RapidShare is to load the cookie, so that wget or curl can use it to authenticate you on the RapidShare server. This is pretty much what you would do with a graphical download manager; the difference is that you do it on the command line.

Below you will find examples of how to perform these actions using both wget and curl.

IMPORTANT: Please note that in order to use these command-line utilities or any other download managers with RapidShare, you will have to check the Direct Downloads option in your account’s options page.

Save your RapidShare Premium Account Cookie

Saving your RapidShare cookie is a procedure that needs to be done once.

The login page is located at:

https://ssl.rapidshare.com/cgi-bin/premiumzone.cgi

The login form requires two fields: login and password. These are pretty self-explanatory.

In the following examples, the RapidShare username is shown as USERNAME and the password as PASSWORD.
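Note that the examples below store the cookie in a file under ~/.cookies. If that directory does not exist yet, create it first (any private directory will do, as long as you adjust the paths accordingly):

mkdir -p ~/.cookies
chmod 700 ~/.cookies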

Using wget

In order to save your cookie using wget, run the following:

wget \
    --save-cookies ~/.cookies/rapidshare \
    --post-data "login=USERNAME&password=PASSWORD" \
    -O - \
    https://ssl.rapidshare.com/cgi-bin/premiumzone.cgi \
    > /dev/null

--save-cookies : Saves the cookie to a file called rapidshare under the ~/.cookies directory (let's assume that you store your cookies there).
--post-data : The POST payload of the request. In other words, it contains the data you would enter in the login form.
-O - : Writes the downloaded HTML data to standard output. Since the above command is run only in order to obtain the cookie, this option prints the HTML data to stdout (standard output), which is then discarded by redirecting stdout to /dev/null. If you don't do this, wget will save the HTML data in a file called premiumzone.cgi in the current directory. This is just the RapidShare HTML page, which is not needed at all.
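If you want to verify that the login actually produced a cookie, you can inspect the saved file. wget stores cookies in the Netscape text format, where header lines start with "#", so the following should print at least one cookie entry:

grep -v '^#' ~/.cookies/rapidshare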

Using curl

In order to save your cookie using curl, run the following:

curl \
    --cookie-jar ~/.cookies/rapidshare \
    --data "login=USERNAME&password=PASSWORD" \
    https://ssl.rapidshare.com/cgi-bin/premiumzone.cgi \
    > /dev/null

--cookie-jar : Saves the cookie to a file called rapidshare under the ~/.cookies directory (it has been assumed previously that cookies are stored there).
--data : Contains the data you would enter in the login form.
Curl prints the downloaded page data to stdout by default. This is discarded by redirecting it to /dev/null.

Download files using your RapidShare Premium Account Cookie

Having saved your cookie, downloading files from RapidShare is as easy as telling wget/curl to load the cookie every time you use them to download a file.

Downloading with wget

In order to download a file with wget, run the following:

wget -c --load-cookies ~/.cookies/rapidshare <URL>

-c : Resumes the download if an incomplete copy of the file already exists in the current directory.
--load-cookies : Loads your cookie.
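If you have several links to fetch, wget can also read them from a plain text file, one URL per line, with its -i (--input-file) option; the file name url_list.txt below is just an example:

wget -c --load-cookies ~/.cookies/rapidshare -i url_list.txt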

Downloading with curl

In the same manner, in order to download a file with curl, run the following:

curl -L -O --cookie ~/.cookies/rapidshare <URL>

-L : Follows all redirections until the final destination page is found. This switch is almost always required, as curl won't follow redirects by default (read about how to check the server HTTP headers with curl).
-O : Instructs curl to save the downloaded data to a file in the current directory, using the filename of the remote file. This switch is also required; otherwise curl will print the data to stdout, which is probably not what you want.
--cookie : Loads your RapidShare account's cookie.
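Similarly to wget's -c, curl can resume an interrupted download with -C - (the "-" tells curl to work out the resume offset by itself):

curl -L -O -C - --cookie ~/.cookies/rapidshare <URL>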

Setting up a Download Server

Although most users would be satisfied with the above, I wouldn't be surprised if you wanted to go a bit further and try to set up a little service for your downloading pleasure. Here is a very primitive implementation of such a service. All you need are standard command-line tools.

This primitive server consists of the following:

  1. A named pipe, called "dlbasket". You will feed the server with URLs through this pipe. Another approach would be to use a listening TCP socket with NetCat.
  2. A script which, among other things, contains the main server loop. This loop reads one URL at a time from dlbasket and starts a wget/curl process in order to download the file. If dlbasket is empty, the server just waits for new URLs.

So, in short, the service would be the following:

cat <> dlbasket | ( while ... done )

All credit for the "cat <> dlbasket |" magic goes to Zart, who kindly helped me out at the #fedora IRC channel. Opening the pipe in read-write mode with <> means the pipe always has at least one writer, so cat never sees end-of-file and the loop keeps waiting even when the basket is empty.

So, let's create that service. The following assumes that a user named "downloader" exists on the system and that its home directory is /var/lib/downloader/. Of course, you can set this up as you like, but make sure you adjust the following commands and the script's configuration options accordingly.
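If the "downloader" user does not exist yet, it can be created with something like the following (run as root; the exact options may differ slightly between distributions):

useradd -m -d /var/lib/downloader -s /bin/bash downloader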

First, create the named pipe:

mkfifo -m 0700 /var/lib/downloader/dlbasket

If it does not exist, create a bin directory in the user’s home:

mkdir -p /var/lib/downloader/bin

Also, create a directory where the downloaded files will be saved:

mkdir -p /var/lib/downloader/downloads

The following is a quick and dirty script I wrote which actually implements the service. Save it as rsgetd.sh inside the user's bin directory:

#!/usr/bin/env bash

#  rsgetd.sh - Download Service
#  Version 0.2
#
#  Copyright (C) 2007 George Notaras (http://www.g-loaded.eu/)
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License version 2 as
#  published by the Free Software Foundation.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.

# Special thanks to 'Zart' from the #fedora channel on FreeNode

# CONFIG START
HOMEDIR="/var/lib/downloader"
DLBASKET="$HOMEDIR/dlbasket"
DLDIR="$HOMEDIR/downloads/"
LOGFILE="$HOMEDIR/.downloads_log"
CACHEFILE="$HOMEDIR/.downloads_cache"
LIMIT="25k"
WGETBIN="/usr/bin/wget"
# RapidShare login cookie (saved earlier with --save-cookies / --cookie-jar)
RSCOOKIE="$HOMEDIR/.cookies/rapidshare"
# CONFIG END

# Current timestamp; called for every log entry so the logged times are accurate.
timestamp() {
        date '+%Y-%m-%d %H:%M:%S'
}

# Opening the named pipe read-write (<>) prevents cat from ever seeing EOF,
# so the loop keeps waiting for new URLs when the basket is empty.
cat <> "$DLBASKET" | (
        while read url ; do
                # First, check the cache to see if the file has already been downloaded
                if [ -f "$CACHEFILE" ] && grep -qi "$(basename "$url")" "$CACHEFILE" ; then
                        echo "$(timestamp) File exists in cache. Already downloaded - Skipping: $url" >> "$LOGFILE"
                else
                        echo "$(timestamp) Starting with rate $LIMIT/s: $url" >> "$LOGFILE"
                        case "$url" in
                        *rapidshare.com*)
                                # It is a RapidShare link, so load the RS cookie
                                echo "RAPIDSHARE LINK"
                                $WGETBIN -c --limit-rate="$LIMIT" --directory-prefix="$DLDIR" --load-cookies "$RSCOOKIE" "$url"
                                ;;
                        *)
                                $WGETBIN -c --limit-rate="$LIMIT" --directory-prefix="$DLDIR" "$url"
                                ;;
                        esac
                        echo "$(timestamp) Finished: $url" >> "$LOGFILE"
                        echo "$url" >> "$CACHEFILE"
                fi
        done )

exit 0

As you might have already noticed, two extra files are created inside the home directory: .downloads_cache and .downloads_log. The first contains a list of all the URLs that have been downloaded. Each new download request is checked against this list, so that a URL is not processed again if the file has already been downloaded. The latter file is a usual logfile stating the start and end times of each download. Feel free to adjust the script to your needs.
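Before starting the service for the first time, make sure the script is executable:

chmod +x /var/lib/downloader/bin/rsgetd.sh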

Here is some info about how you should start the service:

-1- You can simply start the script as a background process and then feed URLs to it. For example:

rsgetd.sh &
echo "<URL>" > /var/lib/downloader/dlbasket

-2- Use screen in order to run the script in the background but still be able to see its output by connecting to a screen session. Although this is not a screen howto, here is an example:

Create a new screen session and attach to it:

screen -S rs_downloads

While inside the session, run rsgetd.sh:

rsgetd.sh

From another terminal, feed the download basket (dlbasket) with URLs:

echo "<URL>" > /var/lib/downloader/dlbasket
cat url_list.txt > /var/lib/downloader/dlbasket

Watch the files in the screen window as they are being downloaded.

Detach from the screen session by hitting the following:

Ctrl-a   d

Re-attach to the session by running:

screen -r

Note that you do not need to be attached to the screen session in order to add URLs.

Feeding the basket with URLs remotely

Assuming that a SSH server is running on the machine that runs rsgetd.sh, you can feed URLs to it by running the following from a remote machine:

ssh downloader@server.example.org cat \> /var/lib/downloader/dlbasket

Note that the > needs to be escaped so that it is treated as part of the command executed on the remote server, rather than as a redirection on your local machine.

Now, feel free to add as many URLs as you like. After you hit the [Enter] key, the URL is added to the download queue. When you are finished, just press Ctrl-D to end the URL submission.
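If you already have the URLs collected in a local file (url_list.txt is just an example name), you can also push them to the basket in one go:

ssh downloader@server.example.org 'cat > /var/lib/downloader/dlbasket' < url_list.txt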

Conclusion

This article provides all the information you need in order to use wget or curl to download files from your RapidShare Premium account. It also covers how to set up a small service that lets you start downloads on your home server from a remote location.

The same information applies in all cases that wget and curl need to be used with websites that use cookie-based authentication.

The article "Use wget or curl to download from RapidShare Premium" by George Notaras, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License. Terms and conditions beyond the scope of this license may be available at www.g-loaded.eu.
