Seaborn Line Plot - Tutorial and Examples

Seaborn Line Plot - Tutorial and Examples

Introduction

Seaborn is one of the most widely used data visualization libraries in Python, as an extension to Matplotlib. It offers a simple, intuitive, yet highly customizable API for data visualization.

In this tutorial, we'll take a look at how to plot a Line Plot in Seaborn - one of the most basic types of plots.

Line Plots display numerical values on one axis, and categorical values on the other.

They can typically be used in much the same way Bar Plots can be used, though, they're more commonly used to keep track of changes over time.

Plot a Line Plot with Seaborn

Let's start out with the most basic form of populating data for a Line Plot, by providing a couple of lists for the X-axis and Y-axis to the lineplot() function:

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="darkgrid")

x = [1, 2, 3, 4, 5]
y = [1, 5, 4, 7, 4]

sns.lineplot(x, y)
plt.show()

Here, we have two lists of values, x and y. The x list acts as our categorical variable list, while the y list acts as the numerical variable list.

This code results in:

To that end, we can use other data types, such as strings for the categorical axis:

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="darkgrid")

x = ['day 1', 'day 2', 'day 3']
y = [1, 5, 4]

sns.lineplot(x, y)
plt.show()

And this would result in:

Note: If you're using integers as your categorical list, such as [1, 2, 3, 4, 5], but then proceed to go to 100, all values between 5..100 will be null:

import seaborn as sns

sns.set_theme(style="darkgrid")

x = [1, 2, 3, 4, 5, 10, 100]
y = [1, 5, 4, 7, 4, 5, 6]

sns.lineplot(x, y)
plt.show()

This is because a dataset might simply be missing numerical values on the X-axis. In that case, Seaborn simply lets us assume that those values are missing and plots away. However, when you work with strings, this won't be the case:

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="darkgrid")

x = ['day 1', 'day 2', 'day 3', 'day 100']
y = [1, 5, 4, 5]

sns.lineplot(x, y)
plt.show()

However, more typically, we don't work with simple, hand-made lists like this. We work with data imported from larger datasets or pulled directly from databases. Let's import a dataset and work with it instead.

Import Data

Let's use the Hotel Bookings dataset and use the data from there:

import pandas as pd
df = pd.read_csv('hotel_bookings.csv')
print(df.head())

Let's take a look at the columns of this dataset:

          hotel  is_canceled reservation_status  ... arrival_date_month  stays_in_week_nights
0  Resort Hotel            0          Check-Out  ...               July                     0
1  Resort Hotel            0          Check-Out  ...               July                     0
2  Resort Hotel            0          Check-Out  ...               July                     1
3  Resort Hotel            0          Check-Out  ...               July                     1
4  Resort Hotel            0          Check-Out  ...               July                     2

This is a truncated view, since there are a lot of columns in this dataset. For example, let's explore this dataset, by using the arrival_date_month as our categorical X-axis, while we use the stays_in_week_nights as our numerical Y-axis:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

sns.set_theme(style="darkgrid")

df = pd.read_csv('hotel_bookings.csv')

sns.lineplot(x = "arrival_date_month", y = "stays_in_week_nights", data = df)
plt.show()

We've used Pandas to read in the CSV data and pack it into a DataFrame. Then, we can assign the x and y arguments of the lineplot() function as the names of the columns in that dataframe. Of course, we'll have to specify which dataset we're working with by assigning the dataframe to the data argument.

Now, this results in:

We can clearly see that weeknight stays tend to be longer during the months of June, July and August (summer vacation), while they're the lowest in January and February, right after the chain of holidays leading up to New Year.

Additionally, you can see the confidence interval as the area around the line itself, which is the estimated central tendency of our data. Since we have multiple y values for each x value (many people stayed in each month), Seaborn calculates the central tendency of these records and plots that line, as well as a confidence interval for that tendency.

Free eBook: Git Essentials

Check out our hands-on, practical guide to learning Git, with best-practices, industry-accepted standards, and included cheat sheet. Stop Googling Git commands and actually learn it!

In general, people stay ~2.8 days on weeknights, in July, but the confidence interval spans from 2.78-2.84.

Plotting Wide-Form Data

Now, let's take a look at how we can plot wide-form data, rather than tidy-form as we've been doing so far. We'll want to visualize the stays_in_week_nights variable over the months, but we'll also want to take the year of that arrival into consideration. This will result in a Line Plot for each year, over the months, on a single figure.

Since the dataset isn't well-suited for this by default, we'll have to do some data preprocessing on it.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = pd.read_csv('hotel_bookings.csv')

# Truncate
df = df[['arrival_date_year', 'arrival_date_month', 'stays_in_week_nights']]
# Save the order of the arrival months
order = df['arrival_date_month']
# Pivot the table to turn it into wide-form
df_wide = df.pivot_table(index='arrival_date_month', columns='arrival_date_year', values='stays_in_week_nights')
# Reindex the DataFrame with the `order` variable to keep the same order of months as before
df_wide = df_wide.reindex(order, axis=0)

print(df_wide)

Here, we've firstly truncated the dataset to a few relevant columns. Then, we've saved the order of arrival date months so we can preserve it for later. You can put in any order here, though.

Then, to turn the narrow-form data into a wide-form, we've pivoted the table around the arrival_date_month feature, turning arrival_date_year into columns, and stays_in_week_nights into values. Finally, we've used reindex() to enforce the same order of arrival months as we had before.

Let's take a look at how our dataset looks like now:

arrival_date_year       2015      2016      2017
arrival_date_month
July                2.789625  2.836177  2.787502
July                2.789625  2.836177  2.787502
July                2.789625  2.836177  2.787502
July                2.789625  2.836177  2.787502
July                2.789625  2.836177  2.787502
...                      ...       ...       ...
August              2.654153  2.859964  2.956142
August              2.654153  2.859964  2.956142
August              2.654153  2.859964  2.956142
August              2.654153  2.859964  2.956142
August              2.654153  2.859964  2.956142

Great! Our dataset is now correctly formatted for wide-form visualization, with the central tendency of the stays_in_week_nights calculated. Now that we're working with a wide-form dataset, all we have to do to plot it is:

sns.lineplot(data=df_wide)
plt.show()

The lineplot() function can natively recognize wide-form datasets and plots them accordingly. This results in:

Customizing Line Plots with Seaborn

Now that we've explored how to plot manually inserted data, how to plot simple dataset features, as well as manipulate a dataset to conform to a different type of visualization - let's take a look at how we can customize our line plots to provide more easy-to-digest information.

Plotting Line Plot with Hues

Hues can be used to segregate a dataset into multiple individual line plots, based on a feature you'd like them to be grouped (hued) by. For example, we can visualize the central tendency of the stays_in_week_nights feature, over the months, but take the arrival_date_year into consideration as well and group individual line plots based on that feature.

This is exactly what we've done in the previous example - manually. We've converted the dataset into a wide-form dataframe and plotted it. However, we could've grouped the years into hues as well, which would net us the exact same result:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = pd.read_csv('hotel_bookings.csv')

sns.lineplot(x = "arrival_date_month", y = "stays_in_week_nights", hue='arrival_date_year', data = df)
plt.show()

By setting the arrival_date_year feature as the hue argument, we've told Seaborn to segregate each X-Y mapping by the arrival_date_year feature, so we'll end up with three different line plots:

This time around, we've also got confidence intervals marked around our central tendencies.

Customize Line Plot Confidence Interval with Seaborn

You can fiddle around, enable/disable and change the type of confidence intervals easily using a couple of arguments. The ci argument can be used to specify the size of the interval, and can be set to an integer, 'sd' (standard deviation) or None if you want to turn it off.

The err_style can be used to specify the style of the confidence intervals - band or bars. We've seen how bands work so far, so let's try out a confidence interval that uses bars instead:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = pd.read_csv('hotel_bookings.csv')

sns.lineplot(x = "arrival_date_month", y = "stays_in_week_nights", err_style='bars', data = df)
plt.show()

This results in:

And let's change the confidence interval, which is by default set to 95, to display standard deviation instead:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = pd.read_csv('hotel_bookings.csv')

sns.lineplot(x = "arrival_date_month", y = "stays_in_week_nights", err_style='bars', ci='sd', data = df)
plt.show()

Conclusion

In this tutorial, we've gone over several ways to plot a Line Plot in Seaborn. We've taken a look at how to plot simple plots, with numerical and categorical X-axes, after which we've imported a dataset and visualized it.

We've explored how to manipulate datasets and change their form to visualize multiple features, as well as how to customize Line Plots.

If you're interested in Data Visualization and don't know where to start, make sure to check out our bundle of books on Data Visualization in Python:

Data Visualization in Python with Matplotlib and Pandas is a book designed to take absolute beginners to Pandas and Matplotlib, with basic Python knowledge, and allow them to build a strong foundation for advanced work with these libraries - from simple plots to animated 3D plots with interactive buttons.

It serves as an in-depth guide that'll teach you everything you need to know about Pandas and Matplotlib, including how to construct plot types that aren't built into the library itself.

Data Visualization in Python, a book for beginner to intermediate Python developers, guides you through simple data manipulation with Pandas, covers core plotting libraries like Matplotlib and Seaborn, and shows you how to take advantage of declarative and experimental libraries like Altair. More specifically, over the span of 11 chapters this book covers 9 Python libraries: Pandas, Matplotlib, Seaborn, Bokeh, Altair, Plotly, GGPlot, GeoPandas, and VisPy.

It serves as a unique, practical guide to Data Visualization, in a plethora of tools you might use in your career.

Was this article helpful?

Improve your dev skills!

Get tutorials, guides, and dev jobs in your inbox.

No spam ever. Unsubscribe at any time. Read our Privacy Policy.

David LandupAuthor

Entrepreneur, Software and Machine Learning Engineer, with a deep fascination towards the application of Computation and Deep Learning in Life Sciences (Bioinformatics, Drug Discovery, Genomics), Neuroscience (Computational Neuroscience), robotics and BCIs.

Great passion for accessible education and promotion of reason, science, humanism, and progress.

Project

Data Visualization in Python: Visualizing EEG Brainwave Data

# python# matplotlib# seaborn# data visualization

Electroencephalography (EEG) is the process of recording an individual's brain activity - from a macroscopic scale. It's a non-invasive (external) procedure and collects aggregate, not...

David Landup
Jovana Ninkovic
Details
Project

Data Visualization in Python: The Collatz Conjecture

# python# matplotlib# data visualization

The Collatz Conjecture is a notorious conjecture in mathematics. A conjecture is a conclusion based on existing evidence - however, a conjecture cannot be proven....

David Landup
Jovana Ninkovic
Details

© 2013-2024 Stack Abuse. All rights reserved.

AboutDisclosurePrivacyTerms