Seaborn Scatter Plot - Tutorial and Examples

Introduction

Seaborn is one of the most widely used data visualization libraries in Python, and is built on top of Matplotlib. It offers a simple, intuitive, yet highly customizable API for data visualization.

In this tutorial, we'll take a look at how to plot a scatter plot in Seaborn. We'll cover simple scatter plots, multiple scatter plots with FacetGrid as well as 3D scatter plots.

Import Data

We'll use the World Happiness dataset, and compare the Happiness Score against varying features to see what influences perceived happiness in the world:

import pandas as pd

df = pd.read_csv('worldHappiness2016.csv')
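Before plotting, it can help to confirm that the columns you intend to use exist in the loaded DataFrame. The following is a minimal sketch, assuming a tiny made-up CSV sample in place of the real file: the rows are placeholders, and only the column names mirror the ones used in this tutorial.

```python
import io

import pandas as pd

# Made-up placeholder rows (NOT values from the real dataset);
# only the column names mirror the ones used in this tutorial.
sample_csv = io.StringIO(
    "Country,Region,Happiness Score,Economy (GDP per Capita)\n"
    "A,North,7.1,1.4\n"
    "B,South,4.2,0.6\n"
)
df = pd.read_csv(sample_csv)

# Columns we plan to pass to Seaborn as x and y
required = ["Economy (GDP per Capita)", "Happiness Score"]
missing = [col for col in required if col not in df.columns]

print(df.head())
print("Missing columns:", missing)
```

With the real file, you would pass its path to pd.read_csv() instead of the in-memory sample.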

Plot a Scatter Plot in Seaborn

Now, with the dataset loaded, let's import PyPlot, which we'll use to show the graph, as well as Seaborn. We'll plot the Happiness Score against the country's Economy (GDP per Capita):

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = pd.read_csv('worldHappiness2016.csv')

sns.scatterplot(data = df, x = "Economy (GDP per Capita)", y = "Happiness Score")

plt.show()

Seaborn makes it really easy to plot basic graphs like scatter plots. We don't need to fiddle with the Figure object or Axes instances, or set anything up, although we can if we want to. Here, we've supplied df as the data argument, and provided the features we want to visualize as the x and y arguments.

These have to match the column names present in the dataset, and the default axis labels will be those names. We'll customize this in a later section.
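If the default labels are too verbose, one option is to override them on the Axes object that scatterplot() returns. This is a minimal sketch on made-up placeholder values, and the label and title strings are arbitrary choices for illustration:

```python
import pandas as pd
import seaborn as sns

# Made-up placeholder values (NOT taken from the real dataset)
df = pd.DataFrame({
    "Economy (GDP per Capita)": [0.4, 0.8, 1.1, 1.5],
    "Happiness Score": [3.9, 5.0, 6.1, 7.3],
})

# scatterplot() returns the Matplotlib Axes it drew on
ax = sns.scatterplot(data=df, x="Economy (GDP per Capita)", y="Happiness Score")

# Replace the default labels (the column names) and add a title
ax.set(xlabel="GDP per capita", ylabel="Happiness", title="Happiness vs. economy")
```

To display the figure, import matplotlib.pyplot as plt and call plt.show() afterwards, as in the listing above.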

Now, if we run this code, we're greeted with:

[Figure: simple Seaborn scatter plot]

Here, there's a strong positive correlation between the economy (GDP per capita) and the perceived happiness of the inhabitants of a country/region.
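A visual impression like this can be backed up with a number. As a rough sketch, Pandas can compute the Pearson correlation coefficient between two columns. The values below are made-up placeholders, so the printed result says nothing about the real dataset:

```python
import pandas as pd

# Made-up placeholder values (NOT taken from the real dataset)
df = pd.DataFrame({
    "Economy (GDP per Capita)": [0.4, 0.8, 1.1, 1.5, 1.2],
    "Happiness Score": [3.9, 5.0, 6.1, 7.3, 5.8],
})

# Pearson correlation coefficient: +1 is a perfect positive linear
# relationship, 0 means no linear relationship, -1 is a perfect negative one.
r = df["Economy (GDP per Capita)"].corr(df["Happiness Score"])
print(f"Pearson r = {r:.2f}")
```

A value close to +1 on the real data would support the reading of the plot.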

Plotting Multiple Scatter Plots in Seaborn with FacetGrid

If you'd like to compare more than one variable against another, such as the average life expectancy as well as the happiness score against the economy, or any variation of this, there's no need to create a 3D plot.

While 2D plots that visualize correlations between more than two variables exist, some of them aren't fully beginner friendly.

Seaborn allows us to construct a FacetGrid object, which we can use to facet the data and construct multiple, related plots, one next to the other.

Let's take a look at how to do that:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv('worldHappiness2016.csv')

grid = sns.FacetGrid(df, col = "Region", hue = "Region", col_wrap=5)
grid.map(sns.scatterplot, "Economy (GDP per Capita)", "Health (Life Expectancy)")

grid.add_legend()

plt.show()

[Figure: multiple scatter plots in a Seaborn FacetGrid, one per region]

Here, we've created a FacetGrid, passing our data (df) to it. By specifying the col argument as "Region", we've told Seaborn that we'd like to facet the data into regions and plot a scatter plot for each region in the dataset.

We've also assigned the hue to depend on the region, so each region has a different color. Finally, we've set the col_wrap argument to 5 so that the entire figure isn't too wide - it breaks on every 5 columns into a new row.

To this grid object, we map() our arguments. Specifically, we specified sns.scatterplot as the type of plot we'd like, as well as the x and y variables we want to plot in these scatter plots.

This results in 10 different scatter plots, each with the related x and y data, separated by region.

We've also added a legend at the end, to help identify the colors.
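As an alternative to building the grid by hand, Seaborn's figure-level relplot() function creates a FacetGrid and draws the scatter plots in a single call. The sketch below uses made-up placeholder values with only three regions, so col_wrap is set to 2 rather than 5:

```python
import pandas as pd
import seaborn as sns

# Made-up placeholder values (NOT taken from the real dataset)
df = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "East", "East"],
    "Economy (GDP per Capita)": [1.4, 1.2, 0.5, 0.7, 0.9, 1.0],
    "Health (Life Expectancy)": [0.9, 0.8, 0.4, 0.5, 0.6, 0.7],
})

# relplot() is figure-level: it builds the FacetGrid and maps
# scatterplot onto each facet, returning the FacetGrid object.
grid = sns.relplot(
    data=df,
    x="Economy (GDP per Capita)",
    y="Health (Life Expectancy)",
    col="Region",
    hue="Region",
    col_wrap=2,
    kind="scatter",
)
```

The returned object is a FacetGrid, so it can be adjusted further in the same way as the manually constructed grid.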

Plotting a 3D Scatter Plot in Seaborn

Unfortunately, Seaborn doesn't come with any built-in 3D functionality. It's built on top of Matplotlib and relies on it for the heavy lifting in 3D. We can, however, style a 3D Matplotlib plot using Seaborn.

Let's set the style using Seaborn, and visualize a 3D scatter plot between happiness, economy and health:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D

df = pd.read_csv('worldHappiness2016.csv')
sns.set(style = "darkgrid")

fig = plt.figure()
ax = fig.add_subplot(111, projection = '3d')

x = df['Happiness Score']
y = df['Economy (GDP per Capita)']
z = df['Health (Life Expectancy)']

ax.set_xlabel("Happiness")
ax.set_ylabel("Economy")
ax.set_zlabel("Health")

ax.scatter(x, y, z)

plt.show()

Running this code results in an interactive 3D visualization that we can pan and inspect in three-dimensional space, styled as a Seaborn plot:

[Figure: 3D scatter plot styled with Seaborn]
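Since the 3D plot is drawn by Matplotlib, Seaborn's hue argument isn't available here. One possible workaround is to borrow colors from a Seaborn palette and pass them to Matplotlib directly. This is a sketch on made-up placeholder values, and the choice of the "deep" palette is an assumption made for illustration:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib

# Made-up placeholder values (NOT taken from the real dataset)
df = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "East", "East"],
    "Happiness Score": [7.1, 6.8, 4.2, 4.6, 5.5, 5.9],
    "Economy (GDP per Capita)": [1.4, 1.2, 0.5, 0.7, 0.9, 1.0],
    "Health (Life Expectancy)": [0.9, 0.8, 0.4, 0.5, 0.6, 0.7],
})

sns.set(style="darkgrid")

# Assign one palette color per region, then look up a color for every row
regions = df["Region"].unique()
palette = dict(zip(regions, sns.color_palette("deep", len(regions))))
colors = df["Region"].map(palette).tolist()

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.scatter(
    df["Happiness Score"],
    df["Economy (GDP per Capita)"],
    df["Health (Life Expectancy)"],
    c=colors,
)
```

Call plt.show() afterwards to open the interactive view.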

Customizing Scatter Plots in Seaborn

Using Seaborn, it's easy to customize various elements of the plots you make. For example, you can set the hue and size of each marker on a scatter plot.

Let's change some of the options and see what the plot looks like when altered:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = pd.read_csv('worldHappiness2016.csv')

sns.scatterplot(data = df, x = "Economy (GDP per Capita)", y = "Happiness Score", hue = "Region", size = "Freedom")

plt.show()

Here, we've set the hue to Region which means that data from different regions will have different colors. Also, we've set the size to be proportional to the Freedom feature. The higher the freedom factor is, the larger the dots are:

[Figure: scatter plot with hue set by region and marker size set by freedom]
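When size is mapped to a column, the sizes argument of scatterplot() controls the range of marker sizes, and sns.move_legend() (available in recent Seaborn versions) can reposition a crowded legend. The sketch below uses made-up placeholder values, and the (20, 200) range and the legend position are arbitrary choices:

```python
import pandas as pd
import seaborn as sns

# Made-up placeholder values (NOT taken from the real dataset)
df = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "East", "East"],
    "Economy (GDP per Capita)": [1.4, 1.2, 0.5, 0.7, 0.9, 1.0],
    "Happiness Score": [7.1, 6.8, 4.2, 4.6, 5.5, 5.9],
    "Freedom": [0.6, 0.5, 0.2, 0.3, 0.4, 0.45],
})

ax = sns.scatterplot(
    data=df,
    x="Economy (GDP per Capita)",
    y="Happiness Score",
    hue="Region",
    size="Freedom",
    sizes=(20, 200),  # smallest and largest marker area
)

# Move the legend outside the plotting area so it doesn't cover the points
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))
```

Call plt.show() (after importing matplotlib.pyplot as plt) to display the result.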

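Or you can set a fixed size and color for all markers. The hue and size arguments expect column names (or data vectors), so for fixed values we pass the Matplotlib keyword arguments color and s instead:

sns.scatterplot(data = df, x = "Economy (GDP per Capita)", y = "Happiness Score", color = "red", s = 5)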

Conclusion

In this tutorial, we've gone over several ways to plot a scatter plot using Seaborn and Python.

If you're interested in Data Visualization and don't know where to start, make sure to check out our book on Data Visualization in Python.

Data Visualization in Python, a book for beginner to intermediate Python developers, will guide you through simple data manipulation with Pandas, cover core plotting libraries like Matplotlib and Seaborn, and show you how to take advantage of declarative and experimental libraries like Altair.