There are many ways of reading and writing CSV files in Python. There are a few different methods, for example, you can use Python's built in
open() function to read the CSV (Comma Separated Values) files or you can use Python's dedicated csv module to read and write CSV files. Depending on your use-case, you can also use Python's Pandas library to read and write CSV files.
In this article, you will see how to use Python's Pandas library to read and write CSV files. However, before that let's briefly see what a CSV file is.
What is a CSV File?
A CSV file is nothing more than a simple text file. However, it is the most common, simple, and easiest method to store tabular data. This particular format arranges tables by following a specific structure divided into rows and columns. It is these rows and columns that contain your data. A new line terminates each row to start the next row. Similarly, a comma, also known as the delimiter, separates columns within each row.
Take the following table as an example:
| City | State | Capital | Population | | ------------ | ------------ | ------- | ------------- | | Philadelphia | Pennsylvania | No | 1.581 Million | | Sacramento | California | Yes | 0.5 Million | | New York | New York | No | 8.623 Million | | Austin | Texas | Yes | 0.95 Million | | Miami | Florida | No | 0.463 Million |
Now, the above table will look as follows if we represent it in CSV format:
City, State, Capital, Population Philadelphia, Pennsylvania, No, 1.581 Million Sacramento, California, Yes, 0.5 Million New York, New York, No, 8.623 Million Austin, Texas, Yes, 0.95 Million Miami, Florida, No, 0.463 Million
As you can see, a comma separates all the values in columns within each row. However, you can use other symbols such as a semicolon (;) as a separator as well. Every row of the table becomes a new line of the CSV file. The core purpose of the CSV format is to help you present the tabular data compactly and concisely.
Now that you have understood what a CSV file is, it is time to study how Pandas'
read_csv() method is used to read CSV files in Python.
Reading and Writing CSV Files using Pandas
Pandas is a very powerful and popular framework for data analysis and manipulation. One of the most striking features of Pandas is its ability to read and write various types of files including CSV and Excel. You can effectively and easily manipulate CSV files in Pandas using functions like
We have to install Panda before using the framework. One of the easiest methods to install Pandas is to install Anaconda. It is a cross-platform Python Distribution for tasks like Python computing and data analysis. Once you install Anaconda, you will have access to Pandas and other libraries such as SciPy and NumPy without doing anything else.
Besides, you can also install Pandas via pip installer, as shown here:
$ pip install pandas
Reading CSV Files with read_csv()
Once you go through the installation, you can use the
read_csv() function to read a CSV file. We will try to read the "titanic.csv" file, which can be downloaded from this link.
First we have to import the Pandas library:
import pandas as pd
Now, we use the following line of code to actually read and parse the file:
titanic_data = pd.read_csv('titanic.csv')
When we execute this code, it will read the CSV file "titanic.csv" from the current directory. You can see from the script above that to read a CSV file, you have to pass the file path to the
read_csv() method of the Pandas library. The
read_csv() method then returns a Pandas DataFrame that contains the data of the CSV file.
You can display the first five rows of the CSV file via the
head() method of the Pandas DataFrame, as shown below:
You can also read a CSV file from an online resource by simply passing the URL of the resource to the
read_csv() method. Let's read the "titanic.csv" file located on a remote GitHub repository.
import pandas as pd titanic_data = pd.read_csv(r'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')
By default, the
read_csv() method treats the values in the first row of a CSV file as column headers. However, you can pass your custom header names while reading a file via the
import pandas as pd col_names = ['Id', 'Survived', 'Passenger Class', 'Full Name', 'Gender', 'Age', 'SibSp', 'Parch', 'Ticket Number', 'Price', 'Cabin', 'Station'] titanic_data = pd.read_csv(r'E:\Datasets\titanic.csv', names=col_names, header=None)
To read a CSV file with custom headers, you need to pass a list of custom column names to the
names attribute of the
read_csv() method. If you want to overwrite the default header names, you can pass the
header=None keyword argument.
Let's now see the header names of the "titanic.csv" file that we read in the last example.
Here you can see your custom headers in the output from the DataFrame.
Writing CSV Files with to_csv()
The process of creating or writing a CSV file through Pandas can be a little more complicated than reading CSV, but it's still relatively simple. We use the
to_csv() function to perform this task. However, you have to create a Pandas DataFrame first, followed by writing that DataFrame to the CSV file.
Column names can also be specified via the keyword argument
columns, as well as a different delimiter via the
sep argument. Again, the default delimiter is a comma, ','.
Here is a simple example showing how to export a DataFrame to a CSV file via
import pandas as pd city = pd.DataFrame([['Sacramento', 'California'], ['Miami', 'Florida']], columns=['City', 'State']) city.to_csv('city.csv')
In the above example, we have created a DataFrame named
city. Subsequently, we have written that DataFrame to a file named "city.csv" using the
to_csv() function. The resulting CSV file should have the following contents:
City,State Sacramento,California Miami,Florida
The CSV file contain our custom headers, followed by the 2 rows of data contained in the DataFrame we created.
The article shows how to read and write CSV files using Python's Pandas library. To read a CSV file, the
read_csv() method of the Pandas library is used. You can also pass custom header names while reading CSV files via the
names attribute of the
read_csv() method. Finally, to write a CSV file using Pandas, you first have to create a Pandas DataFrame object and then call
to_csv method on the DataFrame.