Importing Multiple CSV Files into a Single DataFrame using Pandas in Python

Introduction

In this Byte we're going to talk about how to import multiple CSV files into Pandas and concatenate them into a single DataFrame. This is a common scenario in data analysis where you need to combine data from different sources into a single data structure for analysis.

Pandas and CSVs

Pandas is a very popular data manipulation library in Python. One of its most appreciated features is its ability to read and write various formats of data, including CSV files. CSV is a simple file format used to store tabular data, like a spreadsheet or database.

Pandas provides the read_csv() function to read CSV files and convert them into a DataFrame. A DataFrame is similar to a spreadsheet or SQL table, or a dict of Series objects. We'll see examples of how to use this later in the Byte.
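As a minimal sketch of the idea above, the snippet below parses a small CSV held in memory and pulls one column out as a Series. The CSV content and the variable names are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content, held in memory so the sketch runs without any files.
csv_text = "Product,SalesAmount\nApple,100\nBanana,50\n"

# read_csv() accepts a file path or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

# Each column of the DataFrame is a Series sharing the same row index.
amounts = df["SalesAmount"]
print(type(amounts).__name__)  # Series
print(amounts.sum())           # 150
```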

Why Concatenate Multiple CSV Files

It's possible that your data is distributed across multiple CSV files, especially for a very large dataset. For example, you might have monthly sales data stored in separate CSV files for each month. In these cases, you'll need to concatenate these files into a single DataFrame to perform analysis on the entire dataset.

Concatenating multiple CSV files allows you to perform operations on the entire dataset at once, rather than applying the same operation to each file individually. This not only saves time but also makes your code cleaner, easier to understand, and easier to write.

Reading a Single CSV File into a DataFrame

Before we get into reading multiple CSV files, it might help to first understand how to read a single CSV file into a DataFrame using Pandas.

The read_csv() function reads a CSV file into a DataFrame. You just need to pass the file name as an argument.

Here's an example:

import pandas as pd

df = pd.read_csv('sales_january.csv')
print(df.head())

In this example, we're reading the sales_january.csv file into a DataFrame. The head() method returns the first n rows of the DataFrame; by default, n is 5. The output might look something like this:

   Product  SalesAmount        Date  Salesperson
0    Apple          100  2023-01-01          Bob
1   Banana           50  2023-01-02        Alice
2   Cherry           30  2023-01-03        Carol
3    Apple           80  2023-01-03          Dan
4   Orange           60  2023-01-04        Emily

Note: If your CSV file is not in the directory you run your Python script from, you need to pass a path to the file, either absolute or relative to that directory, to the read_csv() function.
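One possible way to handle file locations is to build the path with pathlib and pass it to read_csv(). In this sketch a temporary folder stands in for your real data directory, and the file content is invented so the example can run on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical data directory; a temporary folder stands in for it here.
data_dir = Path(tempfile.mkdtemp())
csv_path = data_dir / "sales_january.csv"
csv_path.write_text("Product,SalesAmount\nApple,100\nBanana,50\n")

# read_csv() accepts a str or a pathlib.Path, absolute or relative.
df = pd.read_csv(csv_path)
print(df.head())
```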

Reading Multiple CSV Files into a Single DataFrame

Now that we've seen how to read a single CSV file into a DataFrame, let's see how we can read multiple CSV files into a single DataFrame using a loop.

Here's how you can read multiple CSV files into a single DataFrame:

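import glob

import pandas as pd

# Collect every CSV file in the folder; sort because glob returns matches in arbitrary order
files = sorted(glob.glob('path/to/your/csv/files/*.csv'))

# Initialize an empty DataFrame to hold the combined data
combined_df = pd.DataFrame()

for filename in files:
    # Read the current file and append its rows to the combined DataFrame
    df = pd.read_csv(filename)
    combined_df = pd.concat([combined_df, df], ignore_index=True)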

In this code, we initialize an empty DataFrame named combined_df. For each file that we read into a DataFrame (df), we concatenate it to combined_df using the pd.concat function. The ignore_index=True parameter reindexes the DataFrame after concatenation, ensuring that the index remains continuous and unique.

Note: The glob module is part of the Python standard library. It finds all pathnames matching a specified pattern according to Unix shell rules, and it returns the matches in arbitrary order.

This approach compiles multiple CSV files into a single DataFrame.
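A common variation, shown here only as a sketch, is to read every file into a list first and call pd.concat() once at the end, which avoids copying the growing DataFrame on each loop iteration. The temporary folder, file names, and sample rows are assumptions made so the example can run by itself.

```python
import glob
import os
import tempfile

import pandas as pd

# Build two small hypothetical CSV files in a temporary folder so the sketch is runnable.
folder = tempfile.mkdtemp()
samples = {
    "sales_january.csv": "Product,SalesAmount\nApple,100\nBanana,50\n",
    "sales_february.csv": "Product,SalesAmount\nCherry,30\n",
}
for name, text in samples.items():
    with open(os.path.join(folder, name), "w") as f:
        f.write(text)

# glob.glob() returns matches in arbitrary order, so sort for a reproducible result.
files = sorted(glob.glob(os.path.join(folder, "*.csv")))

# Read every file first, then concatenate once.
frames = [pd.read_csv(filename) for filename in files]
combined_df = pd.concat(frames, ignore_index=True)

print(combined_df)
```

Keep in mind that pd.concat() raises a ValueError when the list is empty, so it may be worth checking that the pattern matched at least one file before concatenating.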

Use Cases of Combined DataFrames

Concatenating multiple DataFrames can be very useful in a variety of situations. For example, suppose you're a data scientist working with sales data. Your data might be spread across multiple CSV files, each representing a different quarter of the year. By concatenating these files into a single DataFrame, you can analyze the entire year's data at once.

Or perhaps you're working with sensor data that's been logged every day to a new CSV file. Concatenating these files would allow you to analyze trends over time, identify anomalies, and more.

In short, whenever you have related data spread across multiple CSV files, concatenating them into a single DataFrame can make your analysis much easier.
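When files represent different periods, it can help to record which file each row came from before combining them. The sketch below assumes two hypothetical quarterly files and a made-up column name, source_file, then totals sales per file on the combined data.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical quarterly files written to a temporary folder so the sketch is runnable.
folder = tempfile.mkdtemp()
samples = {
    "sales_q1.csv": "Product,SalesAmount\nApple,100\nBanana,50\n",
    "sales_q2.csv": "Product,SalesAmount\nApple,80\n",
}
for name, text in samples.items():
    with open(os.path.join(folder, name), "w") as f:
        f.write(text)

frames = []
for filename in sorted(glob.glob(os.path.join(folder, "*.csv"))):
    df = pd.read_csv(filename)
    # Record the originating file name (hypothetical column name) on every row.
    df["source_file"] = os.path.basename(filename)
    frames.append(df)

combined_df = pd.concat(frames, ignore_index=True)

# Total sales per source file, computed on the combined data in one step.
totals = combined_df.groupby("source_file")["SalesAmount"].sum()
print(totals)
```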

Conclusion

In this Byte, we've learned how to read multiple CSV files into separate Pandas DataFrames and then concatenate them into a single DataFrame. This is a useful way to work with large, spread-out datasets. Whether you're a data scientist analyzing sales data, a researcher working with sensor logs, or just someone trying to make sense of a large dataset, Pandas' handling of CSV files and DataFrame concatenation can be a big help.

Last Updated: September 8th, 2023