# Calculate Mean Across Multiple DataFrames in Pandas

## Introduction

The Pandas library offers a plethora of functions that make data manipulation and analysis super simple (or at least simpl**er**). One of these is the `mean()` method, which calculates the average of the values in a DataFrame. But what if you're working with multiple DataFrames? In this Byte, we'll explore how to calculate the mean across multiple DataFrames.

## Why Calculate Mean Across Multiple DataFrames?

There are numerous scenarios where you might have multiple DataFrames and need to calculate the mean across all of them. For example, you might have data spread across multiple DataFrames due to the size of the data, different data sources, or maybe the data is simply segmented for easier manipulation or storage in files. In these cases, calculating the mean across all these DataFrames can provide a holistic view of the data and can be useful for certain statistical analyses.

## Calculating Mean in a Single DataFrame

Before we get into calculating the mean across multiple DataFrames, let's first look at how to calculate the mean in a single DataFrame. Here's how we'd do it:

```
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [2, 3, 4, 5, 6],
    'C': [3, 4, 5, 6, 7]
})

# Calculate mean
mean = df.mean()
print(mean)
```

When you run this code, you'll get the following output:

```
A    3.0
B    4.0
C    5.0
dtype: float64
```

In this simple example, `mean()` calculates the mean of each column in the DataFrame.
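The per-column behavior comes from the default `axis=0`. As a small sketch using the same data, passing `axis=1` averages across the columns of each row instead; the variable names `col_means` and `row_means` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [2, 3, 4, 5, 6],
    'C': [3, 4, 5, 6, 7]
})

# Default (axis=0): one mean per column
col_means = df.mean()

# axis=1: one mean per row, averaging across columns A, B and C
row_means = df.mean(axis=1)
print(row_means)
```

Here the first row averages 1, 2 and 3, so its row mean is 2.0.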

## Extending to Multiple DataFrames

Now that we know how to calculate the mean in a single DataFrame, let's extend this to multiple DataFrames. The easiest way is to concatenate the DataFrames and then calculate the mean of the result. This can be done using the `pd.concat()` function.

```
# Create two more DataFrames
df1 = pd.DataFrame({
    'A': [6, 7, 8, 9, 10],
    'B': [7, 8, 9, 10, 11],
    'C': [8, 9, 10, 11, 12]
})
df2 = pd.DataFrame({
    'A': [11, 12, 13, 14, 15],
    'B': [12, 13, 14, 15, 16],
    'C': [13, 14, 15, 16, 17]
})

# Concatenate DataFrames
df_concat = pd.concat([df, df1, df2])

# Calculate mean
mean_concat = df_concat.mean()
print(mean_concat)
```

The output will be:

```
A     8.0
B     9.0
C    10.0
dtype: float64
```

First we concatenate the three DataFrames using `pd.concat()`. We then calculate the mean of the new concatenated DataFrame using `mean()`.
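Concatenating and calling `mean()` gives one pooled value per column. If what you need instead is an element-wise mean (the average of each cell position across the DataFrames), one possible approach is to group the concatenated rows by their index label. This is a sketch that assumes all DataFrames share the same index and columns; the smaller DataFrames here are made up to keep the example short.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 3, 4]})
df1 = pd.DataFrame({'A': [6, 7, 8], 'B': [7, 8, 9]})
df2 = pd.DataFrame({'A': [11, 12, 13], 'B': [12, 13, 14]})

# Stack vertically; the row labels 0, 1, 2 now appear once per DataFrame
stacked = pd.concat([df, df1, df2])

# Group rows that share an index label and average each group
elementwise_mean = stacked.groupby(level=0).mean()
print(elementwise_mean)
```

The result has the same shape as each input. For instance, row 0 of column `A` is the mean of 1, 6 and 11.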

**Note:** The `pd.concat()` function concatenates along the vertical axis (`axis=0`) by default, stacking the rows of each DataFrame on top of one another. If your DataFrames have the same columns, this is typically what you want.

However, if your DataFrames have different columns, you might want to concatenate along the *horizontal* axis instead. You can do this by setting the `axis` parameter to 1: `pd.concat([df1, df2], axis=1)`. The columns are then placed side by side, with rows aligned on their index, so you can run an analysis such as `mean()` on a single combined DataFrame.
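A minimal sketch of the horizontal case, assuming two hypothetical DataFrames, `left` and `right`, that have different columns but the same default index:

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2, 3]})
right = pd.DataFrame({'D': [10, 20, 30]})

# Place the columns side by side; rows are aligned on the index
wide = pd.concat([left, right], axis=1)

# One mean per column of the combined DataFrame
wide_means = wide.mean()
print(wide_means)
```

If the indexes do not match, `pd.concat()` fills the missing cells with `NaN`, which `mean()` skips by default.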

## Use Cases

Calculating the mean across multiple DataFrames in Pandas can help in a variety of scenarios. Let's see a few possible use-cases.

One of the most common scenarios is when you're dealing with a large dataset that's been split into multiple DataFrames for easier handling. In such cases, calculating the mean across these DataFrames can give you a more holistic understanding of your data.

Consider the case of a data analyst working with sales data from a multinational company. The data is split by region, each represented by a separate DataFrame. To get a global perspective on average sales, the analyst would need to calculate the mean across all these DataFrames.

```
import pandas as pd
# Assume we have three DataFrames for sales data in three different regions
df1 = pd.DataFrame({'sales': [100, 200, 300]})
df2 = pd.DataFrame({'sales': [400, 500, 600]})
df3 = pd.DataFrame({'sales': [700, 800, 900]})
# Calculate the mean across all DataFrames
mean_sales = pd.concat([df1, df2, df3]).mean()
print(mean_sales)
```

Output:

```
sales    500.0
dtype: float64
```
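In the example above every region has the same number of rows, so the pooled mean equals the average of the regional means. That is not true in general. The sketch below uses two made-up regions of different sizes (`north` and `south` are hypothetical names) and the `keys` argument of `pd.concat()` to keep track of which rows came from which DataFrame.

```python
import pandas as pd

north = pd.DataFrame({'sales': [100, 200, 300]})
south = pd.DataFrame({'sales': [400, 600]})

# keys adds an outer index level naming the source of each row
combined = pd.concat([north, south], keys=['north', 'south'])

# Mean per region, grouping on the outer index level
per_region = combined.groupby(level=0)['sales'].mean()

# Pooled mean: every row counts equally
pooled_mean = combined['sales'].mean()

# Mean of means: every region counts equally
mean_of_means = per_region.mean()

print(per_region)
print(pooled_mean, mean_of_means)
```

Which of the two figures is appropriate depends on whether each sale or each region should carry equal weight.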

Another use-case could be time-series analysis, where you might have data split across multiple DataFrames, each representing a different time period. Calculating the mean across these DataFrames can provide better insights into trends and patterns over time.
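As a rough sketch of the time-series case, assume two hypothetical DataFrames, `jan` and `feb`, each indexed by date and holding a single made-up `value` column. After concatenating them, you can take an overall mean or a mean per month.

```python
import pandas as pd

jan = pd.DataFrame(
    {'value': [10.0, 20.0]},
    index=pd.to_datetime(['2023-01-10', '2023-01-20'])
)
feb = pd.DataFrame(
    {'value': [30.0, 50.0]},
    index=pd.to_datetime(['2023-02-10', '2023-02-20'])
)

# Combine the periods into one chronologically ordered DataFrame
combined = pd.concat([jan, feb]).sort_index()

# Mean over the whole time span
overall_mean = combined['value'].mean()

# Mean per calendar month, grouping on the month of each timestamp
monthly_mean = combined.groupby(combined.index.to_period('M'))['value'].mean()

print(overall_mean)
print(monthly_mean)
```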

## Conclusion

In this Byte, we calculated the mean across multiple DataFrames in Pandas. We started by understanding the calculation of mean in a single DataFrame, then extended this concept to multiple DataFrames. We also pointed out some use-cases where this technique would be particularly useful, like when dealing with split datasets or conducting time-series analysis.