Fixing "NameError: name 'df'/'pd' is not defined" in Python

Introduction

When working with Pandas, a Python library for data manipulation and analysis, you may run into errors like "NameError: name 'df' is not defined" or "NameError: name 'pd' is not defined". In this Byte, we'll show why these errors occur and how you can avoid them.

Understanding this 'df' NameError

The df NameError occurs when you try to use a variable named df before anything has been assigned to it. By convention, df holds a Pandas DataFrame, which stores data in a two-dimensional, size-mutable, potentially heterogeneous table. The error itself is ordinary Python behavior, though: any name that is looked up before it is assigned raises a NameError.

print(df)
NameError: name 'df' is not defined

The error is raised because df is accessed before anything has been assigned to it.
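Since NameError is a regular exception raised at the moment the name is looked up, you can catch it and inspect its message. The snippet below is only a small sketch for illustration; the variable name message is our own choice, and in real code you would fix the missing assignment rather than catch the error.

```python
try:
    print(df)  # df has not been assigned anywhere above this line
except NameError as err:
    # The exception message names the identifier Python could not find
    message = str(err)
    print(message)
```

The printed message contains the exact name that could not be resolved, which is usually the quickest hint about what is missing.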

Declaring Variables Before Accessing

To avoid the NameError, make sure that df is assigned before it's accessed. One way to do this is to create the DataFrame with the pd.DataFrame() constructor.

import pandas as pd

data = {
    'apples': [3, 2, 0, 1], 
    'oranges': [0, 3, 7, 2]
}

df = pd.DataFrame(data)

print(df)
   apples  oranges
0       3        0
1       2        3
2       0        7
3       1        2

This code runs without error because df is assigned before it's accessed.

Common Reasons for the 'df' NameError

Several common situations lead to these NameErrors. As we just saw, one is attempting to use df before it has been assigned. Another is assuming that a library or module has been imported when it hasn't. In that case the error names pd rather than df.

df = pd.DataFrame(data)

print(df)
NameError: name 'pd' is not defined

In the code above, the pandas module has not been imported, so the NameError is raised on the first line, as soon as Python looks up pd. The print(df) call is never reached.

Scope-Related Issues with Variables

Another common trigger for the error is scope-related issues. If a DataFrame df is defined within a function, it will not be recognized outside that function. This is because df is local to the function and is not a global variable.

def create_df():
    df = pd.DataFrame(data)
    return df

print(df)
NameError: name 'df' is not defined

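In this code, df is defined within the create_df() function and can't be accessed outside of it. Note also that create_df() is never called. To use the DataFrame at the top level, call the function and assign its return value to a name.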
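One way to make the DataFrame available outside the function is to assign the function's return value to a name in the calling scope. The sketch below reuses the apples and oranges data from the earlier example and assumes Pandas is installed.

```python
import pandas as pd

data = {
    'apples': [3, 2, 0, 1],
    'oranges': [0, 3, 7, 2]
}

def create_df():
    # This df is local to the function
    df = pd.DataFrame(data)
    return df

# Assigning the return value creates a separate, global name df
df = create_df()

print(df.shape)
```

The global df and the local df inside create_df() are two different names that happen to refer to the same object once the function returns.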

Avoiding Nested Scope Import of Pandas

In Python, the scope of a name is the region of code in which it's "visible". The two most common kinds are global scope (names defined at the top level of a module, visible throughout that module) and local scope (names defined inside a function or method, visible only there). An import statement binds a name just like an assignment does, so when you import pandas as pd within a function (local scope) and then try to use pd outside that function (global scope), you'll encounter the NameError.

Here's an example:

def my_function():
    import pandas as pd
    # some code here

my_function()
print(pd)

Running this code will give you a NameError: name 'pd' is not defined because the pandas module was imported in the local scope of the function and isn't accessible in the global scope.

To avoid this, always import pandas at the beginning of your script, outside any functions or methods, so it's available throughout your code.
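As a minimal sketch of that advice, assuming Pandas is installed, a module-level import makes pd visible both inside functions and at the top level. The function name build_frame is hypothetical and used only for illustration.

```python
import pandas as pd  # module-level import: visible everywhere in this file

def build_frame():
    # pd is not local here, so Python finds it in the global scope
    return pd.DataFrame({'apples': [3, 2]})

frame = build_frame()

# pd is also usable here, outside the function
print(type(frame) is pd.DataFrame)
```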

Don't Import Pandas in try/except Blocks

We often see Python developers importing modules within try/except blocks to handle potential import errors. However, this can lead to unexpected NameErrors if not done correctly.

Consider the following code:

try:
    import pandas as pd
except ImportError:
    print("pandas module not installed")

print(pd)

If Pandas isn't installed, the import fails, the except block prints a message, and execution continues. The final print statement then raises NameError: name 'pd' is not defined, because the failed import never bound the name pd. To avoid this, either reference the module only inside the try block, or make sure Pandas is installed before running the script. In this example, the except block should have exited the script or provided a fallback.
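One possible fallback, shown here as a sketch rather than a recommended pattern, is to bind pd to None when the import fails so that the name always exists. The helper make_frame and its error message are our own illustration, not part of Pandas.

```python
try:
    import pandas as pd
except ImportError:
    # Bind the name anyway so later checks cannot raise a NameError
    pd = None

def make_frame(data):
    """Build a DataFrame, or fail with a clear message if pandas is missing."""
    if pd is None:
        raise RuntimeError("pandas is required; install it with: pip install pandas")
    return pd.DataFrame(data)
```

If the script can't do anything useful without Pandas, calling sys.exit() with a message inside the except block is a simpler alternative.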

The 'pd' NameError

The NameError: name 'pd' is not defined happens when you use the alias pd to call Pandas functions without first running import pandas as pd. Nothing has been bound to the name pd, so Python can't resolve it and raises a NameError.

Here's an example:

df = pd.DataFrame()

Running this code without importing pandas as pd will result in a NameError: name 'pd' is not defined.

Importing Pandas Before Usage

To resolve the NameError: name 'pd' is not defined, you need to import Pandas before using it. The standard convention is to import pandas at the beginning of your script and alias it as pd for easier use.

Here's how to do it:

import pandas as pd

df = pd.DataFrame()

This code will run without raising a NameError because pandas is imported before it's used.

Misspelling Issues with Pandas Module

Because Python is case-sensitive, typos or incorrect capitalization can lead to a NameError. For instance, if you import Pandas as pd but later refer to it as PD or Pd, Python will raise a NameError: name 'PD' is not defined or NameError: name 'Pd' is not defined.

import pandas as pd

df = PD.DataFrame()  # This will raise a NameError

To avoid this, always ensure that you're consistent with the case when referring to pandas or any other Python modules.
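To see that pd and PD really are different names, the sketch below catches the error from the misspelled alias and then uses the correct one. It assumes Pandas is installed; the variable name message is ours.

```python
import pandas as pd

try:
    df = PD.DataFrame()  # wrong case: PD is a different name from pd
except NameError as err:
    message = str(err)
    print(message)

df = pd.DataFrame()  # correct alias, matches the import statement
print(df.empty)
```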

Avoid Nested Scope Import of Pandas

Python developers sometimes import modules inside a function or a class, which is known as a nested scope import. With any module, Pandas included, this means that neither the module alias nor anything created with it inside the function is available in the global scope. Let's take a look at an example:

def some_function():
    import pandas as pd
    df = pd.DataFrame()

some_function()
print(df)

This code will throw a NameError because df is not defined in the global scope. Both the alias pd and the DataFrame df are local to some_function, and the function doesn't return df, so nothing it creates survives after the call.

Note: To avoid such issues, always import your modules at the top of your script, making them available throughout the entire scope of your program.

Using Correct Pandas Import Statement

Pandas is a popular Python library for data manipulation and analysis. It's conventionally imported with the alias pd. If you're seeing a NameError for pd, it's likely that you've either forgotten to import Pandas, or have imported it incorrectly. Here's how you should do it:

import pandas as pd

Once Pandas is imported with the alias pd, you can use it to create a DataFrame, like so:

df = pd.DataFrame()

Note: Always ensure that Pandas is imported correctly at the beginning of your script. If Pandas is not installed, you can install it by running $ pip install pandas in your console.
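If you're unsure whether Pandas is installed in the current environment, one way to check from Python is importlib.util.find_spec() from the standard library, which returns None when a top-level package can't be found. The helper name pandas_available is hypothetical.

```python
import importlib.util

def pandas_available():
    """Return True if the import system can locate the pandas package."""
    return importlib.util.find_spec("pandas") is not None

if pandas_available():
    print("pandas is installed")
else:
    print("pandas is missing; run: pip install pandas")
```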

Conclusion

In Python, a NameError typically indicates that a variable or module has been used before it has been defined. This can occur with Pandas (commonly aliased as pd) and with DataFrames (often named df). To avoid these errors, always ensure that your modules are imported at the top of your script, using the correct syntax. Also, make sure that variables are declared before they're accessed.

Last Updated: August 22nd, 2023

© 2013-2024 Stack Abuse. All rights reserved.
