Accessing the Twitter API with Python

Introduction

One thing that Python developers surely enjoy is the huge number of resources developed by its big community. Python-built application programming interfaces (APIs) are a common thing for web sites: it's hard to imagine that any popular web service won't have a Python library facilitating access to its services. A few ideas of such APIs for some of the most popular web services can be found here. In fact, "Python wrapper" is a more correct term than "Python API", because a web API usually provides a general application programming interface, while a language-specific library provides code to "wrap" it into easy-to-use functions. Anyway, we'll use both terms interchangeably throughout this article.

In this blog post we concentrate on the Twitter API, show how to set up your credentials with Twitter, and compare a few Python wrappers based on community engagement. Then we show a few examples of using the Twitter API for searching tweets and for creating a stream of real-time tweets on a particular subject. Finally, we'll explore the saved data.

An Overview of the Twitter API

There are many APIs on the Twitter platform that software developers can engage with, up to and including fully automated systems that interact with Twitter. While this could benefit companies by drawing insights from Twitter data, it's also suitable for smaller-scale projects, research, and fun. Here are a few of the most notable APIs provided by Twitter:

  • Tweets: searching, posting, filtering, engagement, streaming etc.
  • Ads: campaign and audience management, analytics.
  • Direct messages (still in Beta): sending and receiving, direct replies, welcome messages etc.
  • Accounts and users (Beta): account management, user interactions.
  • Media: uploading and accessing photos, videos and animated GIFs.
  • Trends: trending topics in a given location.
  • Geo: information about known places or places near a location.

There are many more possibilities with the Twitter APIs, which are not included in this list. Twitter is also constantly expanding its range of services by adding new APIs from time to time, and updating existing ones.

Getting Credentials

Before using the Twitter API, you first need a Twitter account and a set of credentials. The process of getting credentials may change over time, but currently it is as follows:

  • Visit the Application Management page at https://apps.twitter.com/, and sign in with your Twitter account
  • Click on the "Create New App" button, fill in the details and agree to the Terms of Service
  • Navigate to the "Keys and Access Tokens" section and take note of your Consumer Key and Secret
  • In the same section click on "Create my access token" button
  • Take note of your Access Token and Access Token Secret

And that's all. The consumer key/secret is used to authenticate the app that is using the Twitter API, while the access token/secret authenticates the user. All of these parameters should be treated as passwords, and should not be included in your code in plain text. One suitable way is to store them in a JSON file "twitter_credentials.json" and load these values from your code when needed.

import json

# Enter your keys/secrets as strings in the following fields
credentials = {}  
credentials['CONSUMER_KEY'] = ...  
credentials['CONSUMER_SECRET'] = ...  
credentials['ACCESS_TOKEN'] = ...  
credentials['ACCESS_SECRET'] = ...

# Save the credentials object to file
with open("twitter_credentials.json", "w") as file:  
    json.dump(credentials, file)

Python Wrappers

Python is one of the programming languages with the biggest number of developed wrappers for the Twitter API, which makes them hard to compare unless you've used each of them for some time. A good way to choose the right tool is to dig into their documentation and look at the features they offer, and how they fit the specifics of your app. In this part, we'll compare the various API wrappers using the engagement of the Python community in their GitHub projects. A few suitable metrics for comparison are: number of contributors, number of received stars, number of watchers, and the library's maturity measured as the time since its first release.

Table 1: Python libraries for Twitter API ordered by number of received stars.

| Library              | # contributors | # stars | # watchers | Maturity    |
|----------------------|----------------|---------|------------|-------------|
| tweepy               | 135            | 4732    | 249        | ~ 8.5 years |
| Python Twitter Tools | 60             | 2057    | 158        | ~ 7 years   |
| python-twitter       | 109            | 2009    | 148        | ~ 5 years   |
| twython              | 73             | 1461    | 100        | NA          |
| TwitterAPI           | 15             | 424     | 49         | ~ 4.5 years |
| TwitterSearch        | 8              | 241     | 29         | ~ 4.5 years |

The table above lists some of the most popular Python libraries for the Twitter API. Now let's use one of them to search through tweets, get some data, and explore.

Twython Examples

We've selected the twython library because of its diverse features aligned with the different Twitter APIs, its maturity (although there's no information on when its first release was published, version 2.6.0 appeared around 5 years ago), and its support for streaming tweets. In our first example we'll use the Search API to search for tweets containing the string "learn python", and later on we'll show a more realistic example using Twitter's Streaming API.

Search API

In this example we'll create a query for the Search API with a search keyword "learn python", which would return the most popular public tweets in the past 7 days. Note that since our keyword is composed of two words, "learn" and "python", they both need to appear in the text of the tweet, and not necessarily as a continuous phrase. First, let's install the library. The easiest way is using pip, but other options are also listed in the installation docs.

$ pip install twython

In the next step, we'll import the Twython class, instantiate an object of it, and create our search query. We'll use only four arguments in the query: q, result_type, count and lang, respectively the search keyword, type, count, and language of the results. Twitter also defines other arguments to fine-tune the search query, which can be found here.

# Import the Twython class
from twython import Twython  
import json

# Load credentials from json file
with open("twitter_credentials.json", "r") as file:  
    creds = json.load(file)

# Instantiate an object
python_tweets = Twython(creds['CONSUMER_KEY'], creds['CONSUMER_SECRET'])

# Create our query
query = {'q': 'learn python',
         'result_type': 'popular',
         'count': 10,
         'lang': 'en',
         }

Finally we can use our Twython object to call the search method, which returns a dictionary with the keys search_metadata and statuses - the latter holding the queried results. We'll only look at the statuses part, and save a portion of the information in a pandas DataFrame, to present it in a table.

import pandas as pd

# Search tweets
dict_ = {'user': [], 'date': [], 'text': [], 'favorite_count': []}  
for status in python_tweets.search(**query)['statuses']:  
    dict_['user'].append(status['user']['screen_name'])
    dict_['date'].append(status['created_at'])
    dict_['text'].append(status['text'])
    dict_['favorite_count'].append(status['favorite_count'])

# Structure data in a pandas DataFrame for easier manipulation
df = pd.DataFrame(dict_)  
df.sort_values(by='favorite_count', inplace=True, ascending=False)  
df.head(5)  
|   | date                           | favorite_count | text                                              | user            |
|---|--------------------------------|----------------|---------------------------------------------------|-----------------|
| 1 | Fri Jan 12 21:50:03 +0000 2018 | 137            | 2017 was the Year of Python. We set out to lea... | Codecademy      |
| 3 | Mon Jan 08 23:01:40 +0000 2018 | 137            | Step-by-Step Guide to Learn #Python for #DataS... | KirkDBorne      |
| 4 | Mon Jan 08 11:13:02 +0000 2018 | 109            | Resetter is a new tool written in Python and p... | linuxfoundation |
| 8 | Sat Jan 06 16:30:06 +0000 2018 | 96             | We're proud to announce that this week we have... | DataCamp        |
| 2 | Sun Jan 07 19:00:36 +0000 2018 | 94             | Learn programming in Python with the Python by... | humble          |

So we got some interesting tweets. Note that these are the most popular tweets containing the words "learn" and "python" in the past 7 days. To explore data back in history, you'll need to purchase the Premium or Enterprise plan of the Search API.
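As a side note, the search_metadata part of the response that we skipped contains a next_results field - a ready-made query string for fetching the next page of results. A possible way to paginate is sketched below; the helper function and usage pattern are our own, not part of twython:

```python
from urllib.parse import parse_qsl

def next_page_params(search_metadata):
    """Turn the 'next_results' query string of a Search API response
    into keyword arguments for the next search call.
    Returns None when there are no further pages."""
    next_results = search_metadata.get('next_results')
    if next_results is None:
        return None
    # 'next_results' looks like '?max_id=123456789&q=learn+python&...'
    return dict(parse_qsl(next_results.lstrip('?')))

# Hypothetical usage with the python_tweets object created above:
# result = python_tweets.search(**query)
# params = next_page_params(result['search_metadata'])
# if params is not None:
#     next_page = python_tweets.search(**params)
```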

Streaming API

While the previous example showed a one-off search, a more interesting case would be to collect a stream of tweets. This is done using the Twitter Streaming API, and Twython has an easy way to do it through the TwythonStreamer class. We'll need to define a class MyStreamer that inherits from TwythonStreamer and then override the on_success and on_error methods, as follows.

The on_success method is called automatically whenever Twitter sends us data, while on_error is called whenever a problem occurs with the API (most commonly due to constraints of the Twitter APIs). The added method save_to_csv is a useful way to store tweets to file.

Similar to the previous example, we won't save all the data in a tweet, but only the fields we are interested in, such as: hashtags used, user name, user's location, and the text of the tweet itself. There's a lot of interesting information in a tweet, so feel free to experiment with it. Note that we'll store the tweet location as it appears on the user's profile, which might not correspond to the current or real location of the user sending the tweet. This is because only a small portion of Twitter users provide their current location - usually in the coordinates key of the tweet data.

from twython import TwythonStreamer  
import csv

# Filter out unwanted data
def process_tweet(tweet):  
    d = {}
    d['hashtags'] = [hashtag['text'] for hashtag in tweet['entities']['hashtags']]
    d['text'] = tweet['text']
    d['user'] = tweet['user']['screen_name']
    d['user_loc'] = tweet['user']['location']
    return d


# Create a class that inherits TwythonStreamer
class MyStreamer(TwythonStreamer):     

    # Received data
    def on_success(self, data):

        # Only collect tweets in English
        if data['lang'] == 'en':
            tweet_data = process_tweet(data)
            self.save_to_csv(tweet_data)

    # Problem with the API
    def on_error(self, status_code, data):
        print(status_code, data)
        self.disconnect()

    # Save each tweet to csv file
    def save_to_csv(self, tweet):
        # newline='' prevents csv from inserting blank rows on Windows
        with open('saved_tweets.csv', 'a', newline='') as file:
            writer = csv.writer(file)
            writer.writerow(list(tweet.values()))

The next thing to do is instantiate an object of the MyStreamer class with our credentials passed as arguments, and we'll use the filter method to only collect tweets we're interested in. We'll create our filter with the track argument which provides the filter keywords, in our case "python". Besides the track argument, there are more possibilities to fine-tune your filter, listed in the basic streaming parameters, such as: collecting tweets from selected users, languages, locations etc. The paid versions of the Streaming API would provide much more filtering options.

# Instantiate from our streaming class
stream = MyStreamer(creds['CONSUMER_KEY'], creds['CONSUMER_SECRET'],  
                    creds['ACCESS_TOKEN'], creds['ACCESS_SECRET'])
# Start the stream
stream.statuses.filter(track='python')  

With the code above, we collected data for around 10,000 tweets containing the keyword "python". In the next part, we'll do a brief analysis of the included hashtags and user locations.
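Note that the stream above runs until it's interrupted, so collecting "around 10,000 tweets" means stopping it by hand. One way to stop automatically is to count accepted tweets in on_success and call disconnect() once a limit is reached. Below is a sketch of that counting logic on a stand-in base class, so it runs without network access; in practice you would add the same attributes and check to the MyStreamer class above, keeping the disconnect() that TwythonStreamer provides:

```python
class StreamerBase:
    """Stand-in for TwythonStreamer, just enough to demo the logic."""
    def disconnect(self):
        self.connected = False

class LimitedStreamer(StreamerBase):
    def __init__(self, limit=10000):
        self.limit = limit
        self.count = 0
        self.connected = True

    def on_success(self, data):
        # Only collect tweets in English, as before
        if data.get('lang') != 'en':
            return
        self.count += 1
        # process_tweet / save_to_csv would go here
        if self.count >= self.limit:
            self.disconnect()

# Simulated stream: five incoming tweets, but we stop after three
stream = LimitedStreamer(limit=3)
for tweet in [{'lang': 'en'}] * 5:
    if stream.connected:
        stream.on_success(tweet)
```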

Brief Data Analysis

The Twitter API is a powerful thing, very suitable for researching public opinion, market analysis, quick access to news, and other use-cases your creativity can support. A common thing to do after you've carefully collected your tweets is to analyse the data, where sentiment analysis plays a crucial role in systematically extracting subjective information from text. Anyway, sentiment analysis is a huge field to be addressed in a small portion of a blog post, so in this part we'll only do some basic data analysis regarding the locations and hashtags used by people tweeting "python".

Please note that the point of these examples is just to show what the Twitter API data could be used for - our small sample of tweets should not be used to infer conclusions, because it's not a good representative of the whole population of tweets, nor were its collection times independent and uniform.

First let's import our data from the "saved_tweets.csv" file and print out a few rows.

import pandas as pd  
tweets = pd.read_csv("saved_tweets.csv")  
tweets.head()  
|   | hashtags                            | text                                              | user           | location       |
|---|-------------------------------------|---------------------------------------------------|----------------|----------------|
| 0 | ['IBM']                             | RT @freschesolution: Join us TOMORROW with @OC... | rbrownpa       | NaN            |
| 1 | []                                  | pylocus 1.0.1: Localization Package https://t.... | pypi_updates2  | NaN            |
| 2 | []                                  | humilis-push-processor 0.0.10: Humilis push ev... | pypi_updates2  | NaN            |
| 3 | ['Python', 'python', 'postgresql']  | #Python Digest is out! https://t.co/LEmyR3yDMh... | horstwilmes    | Zürich         |
| 4 | ['NeuralNetworks', 'Python', 'KDN'] | RT @kdnuggets: A Beginners Guide to #NeuralNet... | giodegas       | L'Aquila, ITALY|

What are the most common hashtags that go with our keyword "python"? Since all the data in our DataFrame are represented as strings (including the brackets in the hashtags column), to get a list of hashtags we'll need to go from a list of strings, to a list of lists, to a flat list of hashtags. Then we'll use the Counter class to count the hashtag entries in our list, and print a sorted list of the 20 most common hashtags.

from collections import Counter  
import ast

tweets = pd.read_csv("saved_tweets.csv")

# Extract hashtags and put them in a list
list_hashtag_strings = [entry for entry in tweets.hashtags]
# Joining the strings yields one big literal, e.g. "['a'],['b']",
# which literal_eval parses into a tuple of lists
list_hashtag_lists = ast.literal_eval(','.join(list_hashtag_strings))
hashtag_list = [ht.lower() for list_ in list_hashtag_lists for ht in list_]

# Count most common hashtags
counter_hashtags = Counter(hashtag_list)  
counter_hashtags.most_common(20)  
[('python', 1337),
 ('datascience', 218),
 ('bigdata', 140),
 ('machinelearning', 128),
 ('deeplearning', 107),
 ('django', 93),
 ('java', 76),
 ('ai', 76),
 ('coding', 68),
 ('100daysofcode', 65),
 ('javascript', 64),
 ('iot', 58),
 ('rstats', 52),
 ('business', 52),
 ('tech', 48),
 ('ruby', 45),
 ('programming', 43),
 ('cybersecurity', 43),
 ('angularjs', 41),
 ('pythonbot_', 41)]

Next, we can use the user location to answer the question: which areas of the world tweet most about "python"? For this step, we'll use the geocode method of the geopy library, which returns the coordinates of a given input location. To visualise a world heatmap of tweets, we'll use the gmplot library. A reminder: our small sample is not a real representative of the world.

from geopy.geocoders import Nominatim  
import gmplot

# Newer versions of geopy require a user_agent string
geolocator = Nominatim(user_agent="tweet-heatmap-example")

# Go through all tweets and add locations to 'coordinates' dictionary
coordinates = {'latitude': [], 'longitude': []}  
for count, user_loc in enumerate(tweets.location):  
    try:
        location = geolocator.geocode(user_loc)

        # If coordinates are found for location
        if location:
            coordinates['latitude'].append(location.latitude)
            coordinates['longitude'].append(location.longitude)

    # If the geocoding service errors out (e.g. too many requests)
    except Exception:
        pass

# Instantiate and center a GoogleMapPlotter object to show our map
gmap = gmplot.GoogleMapPlotter(30, 0, 3)

# Insert points on the map passing a list of latitudes and longitudes
gmap.heatmap(coordinates['latitude'], coordinates['longitude'], radius=20)

# Save the map to html file
gmap.draw("python_heatmap.html")  

The above code produced the heatmap in the following figure, showing higher activity in "python" tweets in the US, UK, Nigeria and India. One downside of the described approach is that we didn't do any data cleaning; there turned out to be many machine-generated tweets coming from a single location, or multiple locations producing one and the same tweet. Of course these samples should be discarded, to get a more realistic picture of the geographical distribution of humans tweeting "python". A second improvement would simply be to collect more data over longer and uninterrupted periods.

Tweet heatmap
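The cleaning step suggested above can be sketched with pandas: drop duplicated tweet texts, then discard locations that contribute a suspiciously large share of all tweets. The helper below is our own illustration, and the share threshold is an arbitrary choice:

```python
import pandas as pd

def clean_tweets(tweets, max_loc_share=0.05):
    """Rough filter for machine-generated tweets: drop duplicated
    texts, then drop over-represented locations."""
    # Remove identical texts posted from multiple accounts/locations
    deduped = tweets.drop_duplicates(subset='text')
    # Drop locations producing more than max_loc_share of all tweets
    shares = deduped['location'].value_counts(normalize=True)
    suspicious = shares[shares > max_loc_share].index
    return deduped[~deduped['location'].isin(suspicious)]

# Example on a tiny synthetic frame: 'a' is duplicated, and location
# 'X' dominates the sample, so only the 'd'/'Z' row survives
demo = pd.DataFrame({
    'text': ['a', 'a', 'b', 'c', 'd'],
    'location': ['X', 'Y', 'X', 'X', 'Z'],
})
cleaned = clean_tweets(demo, max_loc_share=0.6)
```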

Conclusions

In this blog post we presented a pretty modest part of the Twitter API. Overall, Twitter is a very powerful tool for understanding public opinion, doing research and market analysis, and therefore its APIs are a great way for businesses to create automated tools that draw insights related to their scope of work. Not only businesses - individuals could also use the APIs to build creative apps.

We also listed a few of the most popular Python wrappers, but it's important to note that different wrappers implement different possibilities of the Twitter APIs. Therefore, one should choose a Python wrapper according to its purpose. The two examples we showed with the Search and Streaming APIs briefly described the process of collecting tweets, and some of the possible insights one could draw from them. Feel free to create some yourself!
