Downloading files from different online resources is one of the most important and common programming tasks to perform on the web. The importance of file downloading can be highlighted by the fact that a huge number of successful applications allow users to download files. Here are just a few web application functions that require downloading files:
- File sharing
- Data mining
- Retrieving website code (CSS, JS, etc)
- Social media
These are only some of the applications that come to mind, and you can likely think of many more. In this article, we'll look at some of the most popular ways to download files with Python.
Using the urllib.request Module
The urllib.request module is used to open or download a file over HTTP. Specifically, the urlretrieve function of this module is what we'll use to actually retrieve the file.
To use it, pass two arguments to urlretrieve: the first is the URL of the resource that you want to retrieve, and the second is the local file path where you want to store the downloaded file.
Let's take a look at the following example:
import urllib.request
print('Beginning file download with urllib.request...')
url = 'http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg'
urllib.request.urlretrieve(url, '/Users/scott/Downloads/cat.jpg')
In the above code, we first import the urllib.request module. Next, we create a variable url that contains the URL of the file to be downloaded. Finally, we call the urlretrieve function, passing the url variable as the first argument and "/Users/scott/Downloads/cat.jpg" as the second argument, which is the file's destination. Keep in mind that you can pass any file path as the second argument; that is the location and name your file will have, assuming you have the correct permissions.
Run the above script and go to your "Downloads" directory. You should see your downloaded file named "cat.jpg".
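The destination path in the example above is hard-coded for one machine. As a rough sketch, a small helper could build the destination from the URL instead. Note that the function name destination_for, the "~/Downloads" default, and the "download" fallback name are all assumptions made for illustration; none of them are part of urllib.

```python
import posixpath
from pathlib import Path
from urllib.parse import unquote, urlsplit


def destination_for(url, directory=None):
    """Build a local path for `url` inside `directory` (hypothetical helper)."""
    if directory is None:
        # Assumed default location; the directory may not exist on every system.
        directory = Path.home() / 'Downloads'
    # Use the last segment of the URL path and ignore any query string.
    filename = posixpath.basename(unquote(urlsplit(url).path))
    if filename in ('', '.', '..'):
        filename = 'download'  # fallback when the URL has no usable file name
    return Path(directory) / filename
```

You could then pass str(destination_for(url)) as the second argument to urlretrieve in place of the hard-coded path.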
Note: urllib.request.urlretrieve is considered a "legacy interface" in Python 3, and it may be deprecated at some point in the future. Because of this, I'd recommend using one of the other methods instead. We've included it here due to its popularity in Python 2.
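If you'd like to stay within the standard library on Python 3 without relying on the legacy interface, one possible approach is to combine urllib.request.urlopen with shutil.copyfileobj. This is only a minimal sketch: the download function name is our own, and it does no error handling.

```python
import shutil
import urllib.request


def download(url, dest_path):
    """Stream the resource at `url` into the local file `dest_path`."""
    # urlopen returns a file-like response object; copyfileobj copies it in
    # chunks, so the whole file is never held in memory at once.
    with urllib.request.urlopen(url) as response, open(dest_path, 'wb') as out_file:
        shutil.copyfileobj(response, out_file)
    return dest_path


# Example usage (requires network access):
# download('http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg',
#          '/Users/scott/Downloads/cat.jpg')
```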
Using the urllib2 Module
Another way to download files in Python is via the urllib2 module. The urlopen function of the urllib2 module returns an object that contains the file data. To read the contents of that object, call its read method.
Note that in Python 3, urllib2 was merged into urllib as urllib.request and urllib.error. Therefore, this script works only in Python 2.
import urllib2
filedata = urllib2.urlopen('http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg')
datatowrite = filedata.read()
with open('/Users/scott/Downloads/cat2.jpg', 'wb') as f:
    f.write(datatowrite)
The open function accepts two parameters: the path to the local file and the mode in which the file is opened. Here, "wb" opens the file for writing ("w") in binary mode ("b"), which is what we need for image data.
Execute the above script and go to your "Downloads" directory. You should see the downloaded image as "cat2.jpg".
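Since the script above only runs on Python 2, here is a rough Python 3 counterpart that uses the two modules urllib2 was split into, urllib.request and urllib.error. The fetch_bytes name, the 10-second timeout, and the choice to return None on failure are assumptions for this sketch rather than anything the library prescribes.

```python
import urllib.error
import urllib.request


def fetch_bytes(url, timeout=10):
    """Return the body at `url` as bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        # HTTPError is a subclass of URLError, so it must be caught first.
        print('HTTP error:', err.code, err.reason)
    except urllib.error.URLError as err:
        # No valid response was received (bad host, no network, ...).
        print('Failed to reach the server:', err.reason)
    return None


# Example usage (requires network access):
# data = fetch_bytes('http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg')
# if data is not None:
#     with open('/Users/scott/Downloads/cat2.jpg', 'wb') as f:
#         f.write(data)
```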
Using the requests Module
You can also download files using the requests module, a third-party library. The get function of the requests module fetches the file, and the content attribute of the response holds its contents as bytes. You can then use open to create a file on your system, just like we did with the previous approach, urllib2.urlopen.
Take a look at the following script:
import requests
print('Beginning file download with requests')
url = 'http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg'
r = requests.get(url)
with open('/Users/scott/Downloads/cat3.jpg', 'wb') as f:
    f.write(r.content)
# Retrieve HTTP meta-data
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
In the above script, open is used once again to write binary data to a local file. If you execute the above script and go to your "Downloads" directory, you should see your newly downloaded JPG file named "cat3.jpg".
With the requests module, you can also easily retrieve relevant meta-data about your request, including the status code, headers, and much more. In the above script, you can see how we access some of this meta-data.
The same goes for extra parameters that are required on the HTTP GET request. If you need to add custom headers, for example, all you need to do is create a dict with your headers and pass it to your get request:
headers = {'user-agent': 'test-app/0.0.1'}
r = requests.get(url, headers=headers)
There are a ton more options and features to this library, so check out their great user guide for more info on how to use it.
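One of those options is worth a quick look for larger files: r.content loads the whole response into memory, whereas passing stream=True lets you write the body to disk in chunks. The sketch below also calls raise_for_status so that an error page isn't saved as if it were the file. The download_streamed name, the 8192-byte chunk size, and the 10-second timeout are arbitrary choices for illustration.

```python
import requests


def download_streamed(url, dest_path, chunk_size=8192):
    """Download `url` to `dest_path` in chunks; raise on an HTTP error status."""
    # stream=True defers fetching the body until we iterate over it.
    with requests.get(url, stream=True, timeout=10) as r:
        r.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
        with open(dest_path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return dest_path


# Example usage (requires network access):
# download_streamed('http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg',
#                   '/Users/scott/Downloads/cat3.jpg')
```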
Using the wget Module
One of the simplest ways to download files in Python is via the wget module, which doesn't require you to open the destination file. The download function of the wget module downloads a file in just one line. It accepts two parameters: the URL of the file to download and the local path where the file is to be stored.
import wget
print('Beginning file download with wget module')
url = 'http://i3.ytimg.com/vi/J---aiyznGQ/mqdefault.jpg'
wget.download(url, '/Users/scott/Downloads/cat4.jpg')
Execute the above script and go to your "Downloads" directory. Here you should see your newly downloaded "cat4.jpg" file.
Conclusion
In this article, we presented four of the most commonly used methods to download files in Python. Personally, I prefer the requests module for downloading files due to its combination of simplicity and power. However, your project may have constraints preventing you from using third-party libraries, in which case I'd use the urllib2 module (for Python 2) or the urllib.request module (for Python 3).
Which library do you prefer and why? Let us know in the comments!