Fix Error "xlrd.biffh.XLRDError: Excel xlsx file; not supported"
Introduction
Python has a rich ecosystem of libraries that make it an ideal language for data analysis. One of those libraries is pandas
, which simplifies the process of reading and writing data between in-memory data structures and different file formats.
However, while working with Excel files using pandas.read_excel
, you might run into an error that looks like this:
xlrd.biffh.XLRDError: Excel xlsx file; not supported
In this Byte, we'll dissect this error message, understand why it occurs, and learn how to fix it.
What is the Error "xlrd.biffh.XLRDError"
The xlrd.biffh.XLRDError
is a specific error message that you might encounter while working with the pandas
library in Python. This error is thrown when you try to read an Excel file with the .xlsx
extension using pandas.read_excel
method.
Here's an example of the error:
import pandas as pd
df = pd.read_excel('file.xlsx')
Output:
xlrd.biffh.XLRDError: Excel xlsx file; not supported
Cause of the Error
The xlrd.biffh.XLRDError
error is caused by a recent change in the xlrd
library that pandas
uses to read Excel files. The xlrd
library now only supports the older .xls
file format and no longer supports the newer .xlsx
file format.
This change can be a bit of a surprise if you've been using pandas.read_excel
with xlrd
. By default, pandas.read_excel
uses the xlrd
library to read Excel files, but as of xlrd
version 2.0.0, this library no longer supports .xlsx
files.
As developers, we've all been there...
How to Fix the Error
The solution to this error is simple. You just need to install openpyxl
and specify the engine
argument in the pandas.read_excel
method to use the openpyxl
library instead of xlrd
. The openpyxl
library supports both .xls
and .xlsx
file formats.
Here's how to do it:
First, you need to install the openpyxl
library. You can do this using pip:
$ pip install openpyxl
Then, you can specify the engine
argument in the pandas.read_excel
method like this:
import pandas as pd
df = pd.read_excel('file.xlsx', engine='openpyxl')
This code will read the Excel file using the openpyxl
library, and you will no longer encounter the xlrd.biffh.XLRDError
error.
Conclusion
In this Byte, we've learned about the xlrd.biffh.XLRDError
error that happens when using pandas.read_excel
to read .xlsx
files. We've learned why this error occurs and how to fix it by using the openpyxl
library.