Using cURL in Python with PycURL

Introduction

In this tutorial, we are going to learn how to use PycURL, a Python interface to the cURL library. cURL is a tool for transferring data to and from a server and for making various types of requests. PycURL is well suited to testing REST APIs, downloading files, and similar tasks. Some developers prefer Postman for testing APIs, but PycURL is another suitable option, and it supports multiple protocols such as FILE, FTPS, HTTPS, IMAP, POP3, SMTP, SCP, and SMB. Moreover, PycURL comes in handy when many concurrent, fast, and reliable connections are required.

As mentioned above, PycURL is a Python interface to the libcURL library, so it inherits all of libcURL's capabilities. PycURL is fast (it is generally reported to be considerably faster than Requests, a popular Python library for HTTP requests), supports multiple protocols, and provides socket support for network operations.

Prerequisites

Before you go ahead with this tutorial, please note that there are a few prerequisites. You should have a basic understanding of Python's syntax, and/or have at least beginner-level programming experience in some other language. Furthermore, you should have a good understanding of common networking concepts like protocols and their types, and the client-server mode of communication. Familiarity with these concepts is essential to understand the PycURL library.

Installation

The installation process for PycURL is fairly simple and straightforward for all operating systems. You just need to have libcURL installed on your system in order to use PycURL.

Mac/Linux OS

On macOS and Linux, PycURL installation is usually the simplest because libcURL is typically already present on the system. Run one of the following commands in your terminal to complete the installation. If pip has to build PycURL from source, the libcURL development headers may need to be installed first.

Installation via pip
$ pip install pycurl 
Installation via easy_install (deprecated in recent setuptools releases; prefer pip where possible)
$ easy_install pycurl

Windows OS

For Windows, however, there are a few dependencies that need to be installed before PycURL can be used in your programs. If you are using an official distribution of Python (i.e. you've downloaded Python from the official website https://www.python.org) along with pip, you simply need to run the following command in your command line and the installation will be done:

$ pip install pycurl

If you are not using pip, EXE and MSI installers are available at PycURL Windows. You can download and install them directly from there, like any other application.
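Once the installation has finished, a quick check from Python can confirm that the package is importable. The following is a minimal sketch rather than an official procedure; the helper name pycurl_version is hypothetical. It relies on pycurl.version, a string that reports the PycURL and libcurl versions in use.

```python
def pycurl_version():
    """Return PycURL's version string, or None if PycURL cannot be imported."""
    try:
        import pycurl
    except ImportError:
        return None
    return pycurl.version


version = pycurl_version()
if version is None:
    print('PycURL is not installed (or could not be imported)')
else:
    print('PycURL is installed:', version)
```

If the import fails on a system where the package was installed, the libcURL library itself is a likely place to look first.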

Basic Code Examples

In this section, we are going to cover some PycURL coding examples demonstrating the different functionalities of the interface.

As mentioned in the introduction section, PycURL supports many protocols and has a lot of sophisticated features. However, in our examples, we will be working with the HTTP protocol to test REST APIs using HTTP's most commonly used methods: GET, POST, PUT and DELETE, along with a few other examples. We will write the syntax for declaring them in Python 3, as well as explain what they do.

So let's start!

Example 1: Sending an HTTP GET Request

A simple network operation of PycURL is to retrieve information from a given server using its URL. This is called a GET request as it is used to get a network resource.

A simple GET request can be performed using PycURL by importing the BytesIO class from the io module and creating an instance of it to hold the response. A Curl object is then created to transfer data and files over URLs.

The desired URL is set using the setopt() function, which is used as setopt(option, value). The option parameter specifies which option to set, e.g. URL, WRITEDATA, etc., and the value parameter specifies the value given to that particular option.

The data retrieved from the set URL is then written in the form of bytes to the BytesIO object. The bytes are then read from the BytesIO object using the getvalue() function and are subsequently decoded to print the HTML to the console.

Here is an example of how to do this:

import pycurl
from io import BytesIO 

b_obj = BytesIO() 
crl = pycurl.Curl() 

# Set URL value
crl.setopt(crl.URL, 'https://wiki.python.org/moin/BeginnersGuide')

# Write the response body (raw bytes) to the BytesIO object
crl.setopt(crl.WRITEDATA, b_obj)

# Perform a file transfer 
crl.perform() 

# End curl session
crl.close()

# Get the content stored in the BytesIO object (in byte characters) 
get_body = b_obj.getvalue()

# Decode the bytes stored in get_body to HTML and print the result 
print('Output of GET request:\n%s' % get_body.decode('utf8')) 

Output:

Output of GET request:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta http-equiv = "Content-Type" content = "text/html; charset = utf-8">
<meta name="robots" content="index,nofollow">

<title>BeginnersGuide - Python Wiki</title>
<script type="text/javascript" src = "/wiki/common/js/common.js" ></script>

<script type = "text/javascript" >
<!--
var search_hint = "Search";
//-->
</script>
.
.
.
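To see what happens to the response body independently of the network, the sketch below simulates the writes that fill the BytesIO object. It assumes only that the object passed to WRITEDATA receives the body through its write() method; the chunks are made-up stand-ins for pieces of a real response.

```python
from io import BytesIO

b_obj = BytesIO()

# Made-up chunks standing in for pieces of a response body
chunks = [b'<html><body>', b'caf\xc3\xa9', b'</body></html>']
for chunk in chunks:
    b_obj.write(chunk)  # each write appends to the in-memory buffer

# getvalue() returns everything written so far as a single bytes object
get_body = b_obj.getvalue()

# Decoding turns the UTF-8 bytes into a Python str
text = get_body.decode('utf8')

print(type(get_body).__name__, len(get_body))
print(type(text).__name__, len(text))
print(text)
```

The byte count and the character count differ here because the accented character occupies two bytes in UTF-8, which is why the body has to be decoded before it is treated as text.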

Example 2: Examining GET Response Headers

You can also retrieve the response headers of a website with the help of PycURL. Response headers can be examined for several reasons, for example, to find out which encoding the server declares for the response so that the body can be decoded correctly.

In our example, we'll be examining the response headers simply to find out various attribute names and their corresponding values.

In order to examine the response headers, we first need to extract them, and we do so using the HEADERFUNCTION option and display them using our self-defined function (display_header() in this case).

We provide the URL of the site whose response headers we wish to examine; HEADERFUNCTION passes each response header line to the display_header() function, where it is processed. Each line is decoded from bytes and split into a header name and a value. Surrounding whitespace is stripped from both, and the name is converted to lowercase before the pair is stored in the headers dictionary.

The response body is written to the BytesIO object, and once the transfer completes, the collected headers are printed.

from io import BytesIO
import pycurl

headers = {}

def display_header(header_line):
    # Decode the raw header bytes; iso-8859-1 is the traditional HTTP header encoding
    header_line = header_line.decode('iso-8859-1')

    # Ignore all lines without a colon
    if ':' not in header_line:
        return

    # Break the header line into header name and value
    h_name, h_value = header_line.split(':', 1)

    # Remove whitespace that may be present
    h_name = h_name.strip()
    h_value = h_value.strip()
    h_name = h_name.lower() # Convert header names to lowercase
    headers[h_name] = h_value # Header name and value.

def main():
    print('**Using PycURL to get Twitter Headers**')
    b_obj = BytesIO()
    crl = pycurl.Curl()
    crl.setopt(crl.URL, 'https://twitter.com')
    crl.setopt(crl.HEADERFUNCTION, display_header)
    crl.setopt(crl.WRITEDATA, b_obj)
    crl.perform()
    crl.close()
    print('Header values:-')
    print(headers)
    print('-' * 20)
    
main()

Output:

**Using PycURL to get Twitter Headers**
Header values:-
{'cache-control': 'no-cache, no-store, must-revalidate, pre-check=0, post-check=0', 'content-length': '303055', 'content-type': 'text/html;charset=utf-8', 'date': 'Wed, 23 Oct 2019 13:54:11 GMT', 'expires': 'Tue, 31 Mar 1981 05:00:00 GMT', 'last-modified': 'Wed, 23 Oct 2019 13:54:11 GMT', 'pragma': 'no-cache', 'server': 'tsa_a', 'set-cookie': 'ct0=ec07cd52736f70d5f481369c1d762d56; Max-Age=21600; Expires=Wed, 23 Oct 2019 19:54:11 GMT; Path=/; Domain=.twitter.com; Secure', 'status': '200 OK', 'strict-transport-security': 'max-age=631138519', 'x-connection-hash': 'ae7a9e8961269f00e5bde67a209e515f', 'x-content-type-options': 'nosniff', 'x-frame-options': 'DENY', 'x-response-time': '26', 'x-transaction': '00fc9f4a008dc512', 'x-twitter-response-tags': 'BouncerCompliant', 'x-ua-compatible': 'IE=edge,chrome=1', 'x-xss-protection': '0'}
--------------------
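One practical use of the collected headers is picking the encoding for the response body. The sketch below is an illustration under stated assumptions: the helper name charset_from_headers is hypothetical, the headers dictionary has lowercased names as produced by display_header(), and iso-8859-1 is used as the fallback when no charset is declared. The sample header and body are made up.

```python
import re


def charset_from_headers(headers, default='iso-8859-1'):
    """Return the charset declared in the Content-Type header, or a default."""
    content_type = headers.get('content-type', '')
    # Accept both charset=utf-8 and charset="utf-8"
    match = re.search(r'charset="?([\w-]+)', content_type, re.IGNORECASE)
    if match:
        return match.group(1).lower()
    return default


# Made-up sample in the same shape as the headers dictionary above
sample_headers = {'content-type': 'text/html;charset=utf-8'}
encoding = charset_from_headers(sample_headers)

# Stand-in for the bytes that would come from the BytesIO object
body = 'café'.encode('utf-8')
print(encoding)
print(body.decode(encoding))
```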

If a response contains several headers with the same name, only the last value is stored. To keep all values of multi-valued headers, we can replace the final assignment in display_header() with the following piece of code:

if h_name in headers:
    if isinstance(headers[h_name], list):
        headers[h_name].append(h_value)
    else:
        headers[h_name] = [headers[h_name], h_value]
else:
    headers[h_name] = h_value
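Put together, the multi-value logic can be tried without any network access by feeding a callback a few header lines by hand. This is a sketch only: the function name store_header and the sample lines are made up for illustration, and the lines are given as bytes ending in CRLF, which is assumed to be the form in which the callback receives them.

```python
headers = {}


def store_header(header_line):
    """Header callback that keeps every value of a repeated header."""
    header_line = header_line.decode('iso-8859-1')

    # The status line and the blank terminator line contain no colon
    if ':' not in header_line:
        return

    h_name, h_value = header_line.split(':', 1)
    h_name = h_name.strip().lower()
    h_value = h_value.strip()

    if h_name in headers:
        if isinstance(headers[h_name], list):
            # Third and later occurrences: extend the existing list
            headers[h_name].append(h_value)
        else:
            # Second occurrence: turn the single value into a list
            headers[h_name] = [headers[h_name], h_value]
    else:
        headers[h_name] = h_value


# Made-up header lines for illustration
sample_lines = [
    b'HTTP/1.1 200 OK\r\n',
    b'Content-Type: text/html\r\n',
    b'Set-Cookie: a=1\r\n',
    b'Set-Cookie: b=2\r\n',
    b'Set-Cookie: c=3\r\n',
    b'\r\n',
]
for line in sample_lines:
    store_header(line)

print(headers)
```

A side effect of this design is that a header's value may be either a string or a list, so code reading the dictionary has to handle both cases.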

Example 3: Sending Form Data via HTTP POST

A POST request sends data to a web server by enclosing it in the body of the HTTP request. When you upload a file or submit a form, you are typically sending a POST request to the designated server.

To perform a POST request with PycURL, first set the URL to send the form data to using the setopt function. The data to be submitted is stored in a dictionary (as key-value pairs) and is then URL-encoded using the urlencode function from the urllib.parse module.

We use the POSTFIELDS option to send form data because it automatically sets the HTTP request method to POST and sends the URL-encoded string (pf in the code below) as the request body.

from urllib.parse import urlencode
import pycurl

crl = pycurl.Curl()
crl.setopt(crl.URL, 'https://www.code-learner.com/post/')
data = {'field': 'value'}
pf = urlencode(data)

# Sets request method to POST,
# Content-Type header to application/x-www-form-urlencoded
# and data to send in request body.
crl.setopt(crl.POSTFIELDS, pf)
crl.perform()
crl.close()

Note: If you wish to specify another request method, you can use the CUSTOMREQUEST option to do so. Pass the name of the request method of your choice as the string value, in place of the empty quotes following crl.CUSTOMREQUEST.

crl.setopt(crl.CUSTOMREQUEST, '')
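The URL-encoding step used for the form data can be inspected on its own, which helps when a server rejects a request and you want to see exactly what was sent. The field names and values below are made up for illustration; parse_qs is used only to show that the encoding is reversible.

```python
from urllib.parse import urlencode, parse_qs

# Made-up form fields containing a space, an @ sign, and plus signs
data = {'name': 'John Doe', 'email': 'john@example.com', 'lang': 'C++'}

# urlencode joins the pairs with & and escapes special characters
pf = urlencode(data)
print(pf)

# Decoding the string again recovers the original values (as lists)
decoded = parse_qs(pf)
print(decoded)
```

Spaces become plus signs and reserved characters are percent-escaped, so the string is safe to hand to POSTFIELDS as the request body.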

Example 4: Uploading Files with Multipart POST

There are several ways in which you can replicate how a file is uploaded in an HTML form using PycURL:

  1. If the data to be sent via POST request is in a file on your system, you first need to set the URL where you wish to send the data. Then you set the HTTPPOST option, which issues a multipart form POST, and use FORM_FILE to upload the contents of the desired file. Here 'fileupload' is the name of the form field.

import pycurl

crl = pycurl.Curl()
crl.setopt(crl.URL, 'https://www.code-learner.com/post/')

crl.setopt(crl.HTTPPOST, [
    ('fileupload', (
        # Upload the contents of the file
        crl.FORM_FILE, './my-resume.doc',
    )),
])
crl.perform()
crl.close()

Note: If you wish to change the name and/or the content type of the file, you can do so by making slight modifications to the above code:

crl.setopt(crl.HTTPPOST, [
    ('fileupload', (
        # Upload the contents of this file
        crl.FORM_FILE, './my-resume.doc',
        # Specify a file name of your choice
        crl.FORM_FILENAME, 'updated-resume.doc',
        # Specify a different content type of upload
        crl.FORM_CONTENTTYPE, 'application/msword',
    )),
])

  2. For file data that you have in memory, the only difference in the implementation of the POST request is that FORM_BUFFER and FORM_BUFFERPTR are used in place of FORM_FILE. FORM_BUFFER gives the file name to report to the server, and FORM_BUFFERPTR supplies the data to be posted directly from memory.

import pycurl

crl = pycurl.Curl()
crl.setopt(crl.URL, 'https://www.code-learner.com/post/')

crl.setopt(crl.HTTPPOST, [
    ('fileupload', (
        crl.FORM_BUFFER, 'contact-info.txt',
        crl.FORM_BUFFERPTR, 'You can reach me at [email protected]',
    )),
])

crl.perform()
crl.close()
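When choosing a value for FORM_CONTENTTYPE in the first approach, the content type does not have to be typed by hand. As a hedged sketch, Python's standard mimetypes module can guess it from the file name; the helper name guess_content_type and the generic fallback type are assumptions made for this illustration, and the guess depends on the MIME tables available on your system.

```python
import mimetypes


def guess_content_type(filename, default='application/octet-stream'):
    """Guess a MIME type from the file name, falling back to a generic type."""
    content_type, _encoding = mimetypes.guess_type(filename)
    return content_type or default


for name in ('updated-resume.doc', 'contact-info.txt', 'file.unknown-ext'):
    print(name, '->', guess_content_type(name))
```

The returned string could then be passed as the value that follows crl.FORM_CONTENTTYPE in the HTTPPOST list.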

Example 5: Uploading a File with HTTP PUT

A PUT request is similar in nature to a POST request, but it is typically used to upload a file as the body of the request. You use a PUT request when you know the URL of the object you want to create or overwrite. Basically, PUT replaces whatever currently exists at the target URL with the data you send.

If the data to be uploaded is located in a physical file, you first set the target URL, then open the file and enable uploading with the UPLOAD option. The open file object is passed to READDATA, which PycURL reads the request body from. It's important for the file to be kept open while the cURL object is using it.

Finally, the upload is performed using the perform function and the cURL session is ended. Lastly, the file that was opened for the cURL object is closed.

import pycurl

crl = pycurl.Curl()
crl.setopt(crl.URL, 'https://www.code-learner.com/post/')

# Open in binary mode so the bytes are sent unchanged
dat_file = open('data.txt', 'rb')

crl.setopt(crl.UPLOAD, 1)
crl.setopt(crl.READDATA, dat_file)

crl.perform()
crl.close()
dat_file.close()

If the data is held in memory rather than in a file, the PycURL implementation is much the same, with slight modifications. READDATA requires an IO-like object, and in Python 3 the data must be encoded to bytes, so the string is encoded using the chosen standard (UTF-8 here) and wrapped in a BytesIO object. PycURL then reads from that buffer, the upload is carried out, and upon completion the cURL session is ended.

from io import BytesIO
import pycurl

crl = pycurl.Curl()
crl.setopt(crl.URL, 'https://www.code-learner.com/post/')

data = '{"person":{"name":"billy","email":"[email protected]"}}'
buffer = BytesIO(data.encode('utf-8'))

crl.setopt(crl.UPLOAD, 1)
crl.setopt(crl.READDATA, buffer)

crl.perform()
crl.close()
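The buffer-preparation step can be tried on its own, without sending anything. The sketch below builds the JSON text with json.dumps instead of writing it by hand, shows that BytesIO rejects an unencoded str, and then reads the buffer in small pieces to mimic a consumer of an IO-like object. The payload, including the placeholder e-mail address, and the chunk size of 16 are made up for illustration.

```python
import json
from io import BytesIO

# Hypothetical payload; the e-mail address is a placeholder
payload = {'person': {'name': 'billy', 'email': 'billy@example.com'}}
data = json.dumps(payload)

# BytesIO only accepts bytes, so passing the str directly fails
str_error = None
try:
    BytesIO(data)
except TypeError as exc:
    str_error = exc
print('str rejected:', str_error)

# Encoding first gives BytesIO the bytes it needs
buffer = BytesIO(data.encode('utf-8'))

# Read the buffer piece by piece until it is exhausted
chunks = []
while True:
    chunk = buffer.read(16)
    if not chunk:
        break
    chunks.append(chunk)

sent = b''.join(chunks)
print(sent == data.encode('utf-8'))
```

Once a buffer has been read to the end, further reads return nothing, so a buffer that is to be uploaded again needs buffer.seek(0) first.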

Example 6: Sending an HTTP DELETE Request

Another important and widely used HTTP method is DELETE. The DELETE method requests that the server delete the resource identified by the target URL. It can be implemented using the CUSTOMREQUEST option, as can be seen in the code sample below:

import pycurl

crl = pycurl.Curl()
crl.setopt(crl.URL, "http://api.example.com/user/148951")
crl.setopt(crl.CUSTOMREQUEST, "DELETE")
crl.perform()
crl.close()

Example 7: Writing to a File

PycURL can also be used to save a response to a file. We use the open function to open the file, which returns a file object, and pass that object to WRITEDATA so that the response is written to it. The open function has the form open(file, mode): the file parameter is the path and name of the file to be opened, and mode is the mode in which to open it. In our example, it is important to open the file in binary mode (i.e. wb) so that the response bytes are written as received, without any encoding or decoding.

import pycurl

file = open('pycurl.md','wb')

crl = pycurl.Curl()
crl.setopt(crl.URL, 'https://wiki.python.org/moin/BeginnersGuide')
crl.setopt(crl.WRITEDATA, file)
crl.perform()
crl.close()

# Close the file so that the data is flushed to disk
file.close()
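The reason for binary mode can be demonstrated without a network transfer. In the sketch below, a made-up bytes value stands in for a response body and a temporary directory is used so that no existing file is touched; the file name response.bin is arbitrary. A with block is used so the file is closed automatically.

```python
import os
import tempfile

# Stand-in for bytes received from a server
body = 'café'.encode('utf-8')

path = os.path.join(tempfile.mkdtemp(), 'response.bin')

# Binary mode stores the bytes exactly as they arrive
with open(path, 'wb') as f:
    f.write(body)

with open(path, 'rb') as f:
    saved = f.read()
print(saved == body)

# A text-mode file refuses raw bytes and raises TypeError
text_mode_error = None
try:
    with open(path, 'w') as f:
        f.write(body)
except TypeError as exc:
    text_mode_error = exc
print('text mode:', text_mode_error)
```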

Conclusion

In this tutorial, we learnt about the PycURL interface in Python. We started off by talking about some of the general capabilities of PycURL and its relationship to the libcURL library. We then looked at PycURL's installation process for different operating systems.

Lastly, we went through a set of examples demonstrating the functionality offered by PycURL, including the HTTP GET, POST, PUT, and DELETE methods. After following this tutorial, you should be able to fetch objects identified by a URL from within a Python program with ease.