Download Files with Python

Downloading files from online resources is one of the most common programming tasks on the web, and a huge number of successful applications depend on it. Here are just a few web application functions that require downloading files:

  • File sharing
  • Data mining
  • Retrieving website code (CSS, JS, etc.)
  • Social media

These are just a few of the applications that come to mind, but I'm sure you can think of many more. In this article we will take a look at some of the most popular ways you can download files with Python.

Using the urllib.request Module

The urllib.request module is used to open or download a file over HTTP. Specifically, the urlretrieve method of this module is what we'll use for actually retrieving the file.

To use it this way, pass two arguments to urlretrieve: the first is the URL of the resource that you want to retrieve, and the second is the local file path where you want to store the downloaded file.

Let's take a look at the following example:

import urllib.request

print('Beginning file download with urllib.request...')

url = 'http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg'  
urllib.request.urlretrieve(url, '/Users/scott/Downloads/cat.jpg')  

In the above code, we first import the urllib.request module. Next we create a variable url that contains the URL of the file to be downloaded. Finally, we call urlretrieve, passing the url variable as the first argument and "/Users/scott/Downloads/cat.jpg" as the second argument for the file's destination. Keep in mind that you can pass any path as the second argument; that is the location and name your file will have, assuming you have the correct permissions.

Run the above script and go to your "Downloads" directory. You should see your downloaded file named "cat.jpg".
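If you'd rather not hard-code the filename, one option is to derive it from the URL itself. The following is only a minimal sketch: the helper names filename_from_url and destination_for, and the fallback name download.bin, are made up for illustration and are not part of urllib.

```python
import os
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download.bin"):
    """Return the last path segment of a URL, or a fallback name if it is empty."""
    # urlsplit separates the path from the query string and fragment,
    # so "?x=1" or "#top" never end up in the filename.
    path = urlsplit(url).path
    # URL paths always use "/", so posixpath is used regardless of platform.
    name = unquote(posixpath.basename(path))
    return name or default


def destination_for(url, directory):
    """Build a local path inside `directory` using the filename from the URL."""
    return os.path.join(directory, filename_from_url(url))
```

The result of destination_for(url, '/Users/scott/Downloads') could then be passed as the second argument to urlretrieve.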

Note: urllib.request.urlretrieve is considered a "legacy interface" in Python 3, and it may be deprecated at some point in the future. Because of this, I'd recommend using one of the methods below instead. We've included it here due to its popularity in Python 2.
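If you want to stay within urllib.request on Python 3 but avoid the legacy urlretrieve interface, one possible approach is to combine urlopen with shutil.copyfileobj. This is a sketch rather than a drop-in replacement; the function name download_file and the 30-second timeout are my own choices for illustration.

```python
import shutil
import urllib.request


def download_file(url, destination, timeout=30):
    """Stream the resource at `url` into the file at `destination`."""
    # urlopen returns a file-like response object. copyfileobj copies it
    # to disk in chunks instead of loading the whole body into memory.
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(destination, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return destination
```

Calling download_file(url, '/Users/scott/Downloads/cat.jpg') should then produce the same file as the urlretrieve example.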

Using the urllib2 Module

Another way to download files in Python is via the urllib2 module. The urlopen method of the urllib2 module returns a file-like object that contains the file data. To read the contents of this object, call its read method.

Note that in Python 3, urllib2 was split into urllib.request and urllib.error. Therefore, this script works only in Python 2.

import urllib2

filedata = urllib2.urlopen('http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg')  
datatowrite = filedata.read()

with open('/Users/scott/Downloads/cat2.jpg', 'wb') as f:  
    f.write(datatowrite)

The open function accepts two parameters here: the path to the local file and the mode in which it is opened. The mode "wb" opens the file for writing in binary mode, which is what we need for image data.

Execute the above script and go to your "Downloads" directory. You should see the downloaded image as "cat2.jpg".
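Since the script above runs only on Python 2, here is a rough sketch of how the same read-then-write approach might look on Python 3, using the urllib.request and urllib.error modules mentioned earlier. The function name fetch_to_file, the timeout value, and the choice to return True or False are assumptions made for this example.

```python
import urllib.error
import urllib.request


def fetch_to_file(url, destination, timeout=30):
    """Download `url` to `destination`; return True on success, False on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()  # whole body as bytes, as in the urllib2 script
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        # HTTPError is a subclass of URLError, so it must be caught first.
        print(f"Server returned HTTP {err.code} for {url}")
        return False
    except urllib.error.URLError as err:
        # No valid response was received (DNS failure, refused connection, ...).
        print(f"Could not reach {url}: {err.reason}")
        return False

    # Only create the local file once the download has succeeded.
    with open(destination, "wb") as f:
        f.write(data)
    return True
```

Catching the errors before opening the destination file means a failed request doesn't leave an empty "cat2.jpg" behind.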

Using the requests Module

You can also download files using the requests module. Its get method sends an HTTP GET request and returns a response object whose content attribute holds the file contents as bytes. You can then use the open function to write those bytes to a file on your system, just like we did with the previous method, urllib2.urlopen.

Take a look at the following script:

import requests

print('Beginning file download with requests')

url = 'http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg'  
r = requests.get(url)

with open('/Users/scott/Downloads/cat3.jpg', 'wb') as f:  
    f.write(r.content)

# Retrieve HTTP meta-data
print(r.status_code)  
print(r.headers['content-type'])  
print(r.encoding)  

In the above script, the open function is used once again to write binary data to a local file. If you execute the above script and go to your "Downloads" directory, you should see your newly downloaded JPG file named "cat3.jpg".

With the requests module, you can also easily retrieve relevant meta-data about your request, including the status code, headers and much more. In the above script, you can see how we access some of this meta-data.

The same goes for extra parameters that are required on the HTTP GET request. If you need to add custom headers, for example, all you need to do is create a dict with your headers and pass it to your get request:

headers = {'user-agent': 'test-app/0.0.1'}  
r = requests.get(url, headers=headers)  

There are a ton more options and features to this library, so check out their great user guide for more info on how to use it.
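One of those features that's handy for larger downloads is streaming, where the body is written to disk in chunks rather than held in memory as r.content. Below is a minimal sketch of that idea. The function name stream_to_file and the chunk size are my own choices, and the session argument is assumed to be a requests.Session (taking it as a parameter also makes the function easy to try out without network access).

```python
def stream_to_file(session, url, destination, chunk_size=8192):
    """Download `url` through a requests session, writing it to disk in chunks."""
    # stream=True defers fetching the body until iter_content is consumed.
    with session.get(url, stream=True, timeout=30) as response:
        # Raise an exception for 4xx/5xx responses instead of saving an error page.
        response.raise_for_status()
        with open(destination, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return destination


# Usage (requires the third-party requests package and network access):
#   import requests
#   with requests.Session() as session:
#       stream_to_file(session, url, '/Users/scott/Downloads/cat3.jpg')
```

For a small image like the one in this article, r.content is perfectly fine; streaming mostly pays off once files get large.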

Using the wget Module

One of the simplest ways to download files in Python is via the third-party wget module, which doesn't require you to open the destination file yourself. The download method of the wget module downloads a file in just one line. Here we pass it two parameters: the URL of the file to download and the local path where the file is to be stored.

import wget

print('Beginning file download with wget module')

url = 'http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg'  
wget.download(url, '/Users/scott/Downloads/cat4.jpg')  

Execute the above script and go to your "Downloads" directory. Here you should see your newly downloaded "cat4.jpg" file.

Conclusion

In this article we presented four of the most commonly used methods to download files in Python. Personally, I prefer to use the requests module for downloading files due to its combination of simplicity and power. However, your project may have constraints preventing you from using 3rd party libraries, in which case I'd use the urllib2 module (for Python 2) or the urllib.request module (for Python 3).

Which library do you prefer and why? Let us know in the comments!