Getting Started with Pandas
Pandas is an open-source Python package that provides numerous tools for data analysis. The package comes with several data structures that can be used for many different data manipulation tasks. It also has a variety of methods that can be invoked for data analysis, which come in handy when working on Data Science and Machine Learning problems.
It presents data in an intuitive form well suited to analysis through its Series and DataFrame data structures. The DataFrame is the fundamental data structure in the library, and you'll spend a lot of time working with it.
Additionally, Pandas handles many kinds of I/O operations seamlessly. It can read data from a variety of formats, such as CSV, XLSX, and JSON.
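As a minimal sketch of the I/O support described above, the snippet below reads CSV and JSON data with pd.read_csv and pd.read_json. The column names and values are made up for illustration, and an in-memory string stands in for a file on disk so the example runs without any external files.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would usually be a file path.
csv_text = "name,score\nAda,91\nLinus,84\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Write the same data out as JSON, then read it back in.
json_text = df.to_json(orient="records")
df_from_json = pd.read_json(io.StringIO(json_text), orient="records")

print(df)
print(df_from_json)
```

Excel files can be read in a similar way with pd.read_excel, which relies on an optional engine such as openpyxl being installed.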
Pandas Data Structures
Pandas has two main data structures for data storage:
- Series
- DataFrame
Let's go over those two first.
Series
A Series is similar to a one-dimensional array. It can store data of any type. The values of a Pandas Series are mutable, but the size of a Series is immutable.
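To illustrate the point about mutable values, here is a small sketch that overwrites one element of a Series in place. The variable name and the numbers are arbitrary choices for the example.

```python
import pandas as pd

# A Series holding three integers.
prices = pd.Series([10, 20, 30])

# Overwrite the value at position 1; the Series object itself is modified.
prices.iloc[1] = 25

# The number of elements is unchanged by the assignment.
print(prices.tolist())
print(len(prices))
```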
By default, the first element in the Series is assigned the index 0, while the last element is at index N-1, where N is the total number of elements in the Series.
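The default indexing can be checked with a short sketch like the one below. It assumes no custom index is passed when the Series is created; the temperature values are invented for the example.

```python
import pandas as pd

# No index is supplied, so Pandas assigns the labels 0 .. N-1.
temps = pd.Series([21.5, 19.0, 23.2, 20.1])
n = len(temps)

# The index labels run from 0 up to N-1.
print(list(temps.index))

# Look up the first and last elements by their default labels.
first = temps.loc[0]
last = temps.loc[n - 1]
print(first, last)
```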