Manipulating and Visualizing Data with Pandas

David Landup

Pandas is one of the most commonly used data science and analysis libraries in Python. Its popularity comes from how easily it lets you create and edit data structures, which makes both data manipulation and visualization straightforward.

Pandas lets you build DataFrames from an entire dataset or from subsets of it, and then cut, filter, merge, and otherwise edit those DataFrames.
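As a rough illustration of that workflow, the sketch below builds one DataFrame from a whole (made-up) dataset and another from a subset of its columns, then cuts, filters, and merges. The data, the column names, and the second `countries` table are all hypothetical, and "cut" is shown here as positional slicing with `iloc`.

```python
import pandas as pd

# A small, made-up dataset used only for illustration
data = {
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "city": ["Paris", "Berlin", "Paris", "Rome"],
    "age": [34, 28, 45, 51],
}

# A DataFrame built from the entire dataset
df = pd.DataFrame(data)

# A DataFrame built from a subset of the dataset (two of its three columns)
names_ages = pd.DataFrame(data, columns=["name", "age"])

# Cut: slice out rows by position (rows 1 and 2; the end position is excluded)
middle_rows = df.iloc[1:3]

# Filter: keep only the rows where age is above 30
over_30 = df[df["age"] > 30]

# Merge: attach country information from a second, hypothetical table
countries = pd.DataFrame({
    "city": ["Paris", "Berlin", "Rome"],
    "country": ["France", "Germany", "Italy"],
})
merged = df.merge(countries, on="city", how="left")
print(merged)
```

Each of these operations returns a new DataFrame, so the original `df` is left unchanged.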

A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous table of data.

More on that later.
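As a quick preview of what each part of that definition means in practice, here is a minimal sketch using a small, hypothetical product table: the shape has two dimensions, each column keeps its own dtype, and rows and columns can be added after creation.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["pen", "notebook", "bag"],  # text values
    "price": [1.5, 3.25, 20.0],             # floating-point values
    "in_stock": [True, False, True],        # boolean values
})

# Two-dimensional: the shape is (rows, columns)
print(df.shape)  # (3, 3)

# Potentially heterogeneous: every column has its own dtype
print(df.dtypes)

# Size-mutable: add a new column, then a new row (setting with enlargement)
df["quantity"] = [10, 0, 4]
df.loc[len(df)] = ["pencil", 0.75, True, 25]
print(df.shape)  # (4, 4)
```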

Note that Pandas is primarily a data manipulation library rather than a visualization library. While it can create some plots through its own methods, such as DataFrame.plot(), it relies on Matplotlib to draw them by default.
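One way to see this dependency is through the pandas option that selects the plotting backend. The sketch below only reads that option, so it runs without Matplotlib installed; the commented-out `df.plot(...)` call is the part that would actually need Matplotlib. The sales figures are made up.

```python
import pandas as pd

# Pandas hands plotting off to a configurable backend; the default is Matplotlib
backend = pd.get_option("plotting.backend")
print(backend)  # matplotlib

df = pd.DataFrame({"year": [2020, 2021, 2022], "sales": [100, 150, 130]})

# Calling df.plot(...) requires Matplotlib to be installed. It returns a
# Matplotlib Axes object, which you can then customize with Matplotlib itself.
# ax = df.plot(x="year", y="sales", kind="line")
```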

We will cover Matplotlib in the following lesson. Preparing data for visualization is a critical first step, and Pandas makes that step easy.
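A typical preparation step might look like the minimal sketch below, which assumes hypothetical per-transaction sales records: drop incomplete rows, aggregate per category, and sort. The result is a small Series that is ready to hand to a plotting call.

```python
import pandas as pd

# Hypothetical raw sales records, one row per transaction
raw = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", "North"],
    "amount": [120.0, 80.0, None, 200.0, 95.0, 60.0],
})

# 1. Drop rows with a missing amount
clean = raw.dropna(subset=["amount"])

# 2. Aggregate: total amount per region
totals = clean.groupby("region")["amount"].sum()

# 3. Sort so the largest region comes first
totals = totals.sort_values(ascending=False)
print(totals)

# totals.plot(kind="bar") would pass this Series to Matplotlib as a bar chart
```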

Pandas gives you fine-grained control over a dataset: you can select the entire dataset, a single element, or anything in between.
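The range of selections described above can be sketched with a small, made-up table of exam scores, going from the whole DataFrame down to a single value. The `loc`, `at`, and boolean-mask forms shown are standard pandas indexing; the data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"math": [90, 72, 85], "physics": [88, 64, 91]},
    index=["alice", "bob", "carol"],
)

everything = df                          # the entire dataset
one_value = df.at["bob", "physics"]      # a single element
one_column = df["math"]                  # one column, as a Series
one_row = df.loc["carol"]                # one row, as a Series
block = df.loc["alice":"bob", ["math"]]  # a sub-table; label slices include the end label
high_math = df[df["math"] > 80]          # rows chosen by a condition
print(block)
```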

It supports many dataset manipulation techniques, such as merging, concatenating, joining, and pivoting tables. You can also iterate through the rows of a table and apply transformations to them, such as stripping out unnecessary characters, and you can drop duplicate rows. All of these features make Pandas an excellent tool for data preprocessing.
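The sketch below strings several of these operations together on two hypothetical monthly sales tables. The store names, cities, and managers are all invented for illustration. Whitespace is stripped with the vectorized .str.strip() rather than an explicit row loop, and the duplicates are removed before pivoting because pivot() raises an error when an index/column pair appears more than once.

```python
import pandas as pd

# Two hypothetical tables with the same columns (note the stray spaces)
jan = pd.DataFrame({"store": [" A ", "B", "B"], "month": ["Jan"] * 3, "sales": [10, 20, 20]})
feb = pd.DataFrame({"store": ["A", " B"], "month": ["Feb"] * 2, "sales": [15, 25]})

# Concatenate: stack the tables vertically into one
sales = pd.concat([jan, feb], ignore_index=True)

# Transform: strip stray whitespace from every store name
sales["store"] = sales["store"].str.strip()

# Drop exact duplicate rows (the repeated "B, Jan, 20" entry)
sales = sales.drop_duplicates(ignore_index=True)

# Merge: attach store details from another table on a shared key column
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Oslo", "Bergen"]})
merged = sales.merge(stores, on="store")

# Join: match the "store" column against another table's index
managers = pd.DataFrame({"manager": ["Kari", "Ola"]}, index=["A", "B"])
joined = sales.join(managers, on="store")

# Pivot: one row per store, one column per month
wide = sales.pivot(index="store", columns="month", values="sales")
print(wide)
```

DataFrame.iterrows() is available when you genuinely need an explicit loop over rows, but vectorized methods like .str.strip() are usually simpler and faster.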


© 2013-2024 Stack Abuse. All rights reserved.