Introduction
Pandas is one of the most commonly used Python libraries for data handling and visualization. The Pandas library provides classes and functionalities that can be used to efficiently read, manipulate and visualize data, stored in a variety of file formats.
In this article, we'll be reading and writing JSON files using Python and Pandas.
What is a JSON File?
JavaScript Object Notation (JSON) is a data format that stores data in a human-readable form. While it can technically be used for storage, JSON files are primarily used for serialization and information exchange between a client and server.
Although it was derived from JavaScript, it's platform-agnostic and is a widely-spread and used format - most prevalently in REST APIs.
Creating a JSON File
To create JSON files via Python, data has to be stored in a certain way. There are multiple ways of storing this data using Python. Some of the methods have been discussed in this article.
We'll first create a file using core Python and then read and write to it via Pandas.
Creating JSON Data via a Nested Dictionaries
In Python, to create JSON data, you can use nested dictionaries. Each item inside the outer dictionary corresponds to a column in the JSON file.
The key of each item is the column header and the value is another dictionary consisting of rows in that particular column. Let's create a dictionary that can be used to create a JSON file that stores a record of fictional patients:
patients = {
"Name":{"0":"John","1":"Nick","2":"Ali","3":"Joseph"},
"Gender":{"0":"Male","1":"Male","2":"Female","3":"Male"},
"Nationality":{"0":"UK","1":"French","2":"USA","3":"Brazil"},
"Age" :{"0":10,"1":25,"2":35,"3":29}
}
In the script above, the first item corresponds to the Name
column. The item value consists of a dictionary where dictionary items represent rows. The keys of the inner dictionary items correspond to the index numbers of rows, where values represent row values.
Since that might be a bit hard to visualize just like that, here's a visual representation:
In the Name
column, the first record is stored at the 0th index where the value of the record is John
, similarly, the value stored at the second row of the Name
column is Nick
and so on.
Creating JSON Data via Lists of Dictionaries
Another way to create JSON data is via a list of dictionaries. Each item in the list consists of a dictionary and each dictionary represents a row. This approach is a lot more readable than using nested dictionaries.
Let's create a list that can be used to create a JSON file that stores information about different cars:
cars = [
{"Name":"Honda", "Price": 10000, "Model":2005, "Power": 1300},
{"Name":"Toyota", "Price": 12000, "Model":2010, "Power": 1600},
{"Name":"Audi", "Price": 25000, "Model":2017, "Power": 1800},
{"Name":"Ford", "Price": 28000, "Model":2009, "Power": 1200},
]
Each dictionary item corresponds to a row in a JSON file. For example the first item in the first dictionary stores the value Honda
in the Name
column. Similarly, the value of the Price
column in the first row will be 10000
and so on.
Writing Data to a JSON File via Python
With our nested dictionary and a list of dictionaries, we can store this data in a JSON file. To achieve this, we'll use the json
module and the dump()
method:
import json
with open('E:/datasets/patients.json', 'w') as f:
json.dump(patients, f)
with open('E:/datasets/cars.json', 'w') as f:
json.dump(cars, f)
Now, we have two JSON files - patients.json
and cars.json
. The next step is to read these files via the Pandas library.
If you'd like to read more about Reading and Writing JSON to a File in Core Python, we've got you covered!
Reading JSON Files with Pandas
To read a JSON file via Pandas, we'll utilize the read_json()
method and pass it the path to the file we'd like to read. The method returns a Pandas DataFrame
that stores data in the form of columns and rows.
Though, first, we'll have to install Pandas:
$ pip install pandas
Reading JSON from Local Files
The following script reads the patients.json
file from a local system directory and stores the result in the patients_df
dataframe. The header of the dataframe is then printed via the head()
method:
import pandas as pd
patients_df = pd.read_json('E:/datasets/patients.json')
patients_df.head()
Running this code should yield:
Check out our hands-on, practical guide to learning Git, with best-practices, industry-accepted standards, and included cheat sheet. Stop Googling Git commands and actually learn it!
Similarly, the following script reads the cars.json
file from the local system and then calls the head()
method on the cars_df
to print the header:
cars_df = pd.read_json('E:/datasets/cars.json')
cars_df.head()
Running this code should yield:
Reading JSON from Remote Files
The read_json()
method isn't limited to only reading local files. You can also read JSON files located on remote servers. You just have to pass the path of the remote JSON file to the function call.
Let's read and print out the head of the Iris Dataset - a really popular dataset containing information about various Iris flowers:
import pandas as pd
iris_data = pd.read_json("https://raw.githubusercontent.com/domoritz/maps/master/data/iris.json")
iris_data.head()
Running this code should yield us:
Writing JSON Data Files via Pandas
To convert a Pandas dataframe to a JSON file, we use the to_json()
function on the dataframe, and pass the path to the soon-to-be file as a parameter.
Let's create a JSON file from the tips
dataset, which is included in the Seaborn library for data visualization.
First off, let's install Seaborn:
$ pip install seaborn
Then, let's import it and load the tips it into a dataset:
import seaborn as sns
dataset = sns.load_dataset('tips')
dataset.head()
This is how the dataset looks like:
Seaborn's load_dataset()
function returns a Pandas DataFrame
, so loading the dataset like this allows us to simply call the to_json()
function to convert it.
Once we've gotten a hold of the dataset, let's save its content in a JSON file. We've set up a datasets
directory for this:
dataset.to_json('E:/datasets/tips.json')
Navigating to the E:/datasets
directory, you should see tips.json
. Opening the file, we can see JSON that correspond to records in the Pandas dataframe containing the tips
dataset:
{
"total_bill":{
"0":16.99,
"1":10.34,
"2":21.01,
"3":23.68,
"4":24.59,
"5":25.29,
...
}
"tip":{
"0":1.01,
"1":1.66,
"2":3.5,
"3":3.31,
"4":3.61,
"5":4.71,
...
}
"sex":{
"0":"Female",
"1":"Male",
"2":"Male",
"3":"Male",
"4":"Female",
"5":"Male",
...
}
"smoker":{
"0":"No",
"1":"No",
"2":"No",
"3":"No",
"4":"No",
"5":"No",
...
}
...
Conclusion
JSON is a widely-used format for data storage and exchange between a client and a server. Developers often use this format over formats like XML due to how lightweight and readable it is.
In this article, we've covered how to read and write JSON files using Python's popular Pandas library - from local to remote files.