Reading and Writing JSON Files in Python with Pandas

Introduction

Pandas is one of the most commonly used Python libraries for data handling and visualization. The Pandas library provides classes and functionalities that can be used to efficiently read, manipulate and visualize data, stored in a variety of file formats.

In this article, we'll be reading and writing JSON files using Python and Pandas.

What is a JSON File?

JavaScript Object Notation (JSON) is a data format that stores data in a human-readable form. While it can be technically be used for storage, JSON files are primarily used for serialization and information exchange between a client and server.

Although it was derived from JavaScript, it's platform-agnostic and is a widely-spread and used format - most prevalently in REST APIs.

Creating a JSON File

To create JSON files via Python, data has to be stored in a certain way. There are multiple ways of storing this data using Python. Some of the methods have been discussed in this article.

We'll first create a file using core Python and then read and write to it via Pandas.

Creating JSON Data via a Nested Dictionaries

In Python, to create JSON data, you can use nested dictionaries. Each item inside the outer dictionary corresponds to a column in the JSON file.

The key of each item is the column header and the value is another dictionary consisting of rows in that particular column. Let's create a dictionary that can be used to create a JSON file that stores a record of fictional patients:

patients = {
         "Name":{"0":"John","1":"Nick","2":"Ali","3":"Joseph"},
         "Gender":{"0":"Male","1":"Male","2":"Female","3":"Male"},
         "Nationality":{"0":"UK","1":"French","2":"USA","3":"Brazil"},
         "Age" :{"0":10,"1":25,"2":35,"3":29}
}

In the script above, the first item corresponds to the Name column. The item value consists of a dictionary where dictionary items represent rows. The keys of the inner dictionary items corresponds to the index numbers of rows, where values represent row values.

Since that might be a bit hard to visualize just like that, here's a visual representation:

patients visualized

In the Name column, the first record is stored at the 0th index where the value of the record is John, similarly, the value stored at the second row of the Name column is Nick and so on.

Creating JSON Data via Lists of Dictionaries

Another way to create JSON data is via a list of dictionaries. Each item in the list consists of a dictionary and each dictionary represents a row. This approach is a lot more readable than using nested dictionaries.

Let's create a list that can be used to create a JSON file that stores information about different cars:

cars = [
    {"Name":"Honda", "Price": 10000, "Model":2005, "Power": 1300},
    {"Name":"Toyota", "Price": 12000, "Model":2010, "Power": 1600},
    {"Name":"Audi", "Price": 25000, "Model":2017, "Power": 1800},
    {"Name":"Ford", "Price": 28000, "Model":2009, "Power": 1200},
         
]

Each dictionary items corresponds to a row in a JSON file. For example the first item in the first dictionary stores the value Honda in the Name column. Similarly, the value of the Price column in the first row will be 10000 and so on.

Writing Data to a JSON File via Python

With our nested dictionary and a list of dictionaries, we can store this data in a JSON file. To achieve this, we'll use the json module and the dump() method:

import json
with open('E:/datasets/patients.json', 'w') as f:
    json.dump(patients, f)
    
with open('E:/datasets/cars.json', 'w') as f:
    json.dump(cars, f)  

Now, we have two JSON files - patients.json and cars.json. The next step is to read these files via the Pandas library.

If you'd like to read more about Reading and Writing JSON to a File in Core Python, we've got you covered!

Reading JSON Files with Pandas

To read a JSON file via Pandas, we'll utilize the read_json() method and pass it the path to the file we'd like to read. The method returns a Pandas DataFrame that stores data in the form of columns and rows.

Though, first, we'll have to install Pandas:

$ pip install pandas

Reading JSON from Local Files

The following script reads the patients.json file from a local system directory and stores the result in the patients_df dataframe. The header of the dataframe is then printed via the head() method:

import pandas as pd
patients_df = pd.read_json('E:/datasets/patients.json')
patients_df.head()

Running this code should yield:
patients dataframe

Similarly, the following script reads the cars.json file from the local system and then calls the head() method on the cars_df to print the header:

cars_df = pd.read_json('E:/datasets/cars.json')
cars_df.head()

Running this code should yield:
cars dataframe

Reading JSON from Remote Files

The read_json() method isn't limited to only reading local files. You can also read JSON files located on remote servers. You just have to pass the path of the remote JSON file to the function call.

Let's read and print out the head of the Iris Dataset - a really popular dataset containing information about various Iris flowers:

import pandas as pd
iris_data = pd.read_json("https://raw.githubusercontent.com/domoritz/maps/master/data/iris.json")
iris_data.head()

Running this code should yield us:
iris plant dataframe

Writing JSON Data Files via Pandas

To convert a Pandas dataframe to a JSON file, we use the to_json() function on the dataframe, and pass the path to the soon-to-be file as a parameter.

Let's create a JSON file from the tips dataset, which is included in the Seaborn library for data visualization.

First off, let's install Seaborn:

$ pip install seaborn

Then, let's import it and load the tips it into a dataset:

import seaborn as sns

dataset = sns.load_dataset('tips')
dataset.head()

This is how the dataset looks like:
tips dataset header

Seaborn's load_dataset() function returns a Pandas DataFrame, so loading the dataset like this allows us to simply call the to_json() function to convert it.

Once we've gotten ahold of the dataset, let's save its content in a JSON file. We've set up a datasets directory for this:

dataset.to_json('E:/datasets/tips.json')

Navigating to the E:/datasets directory, you should see tips.json. Opening the file, we can see JSON that correspond to records in the Pandas dataframe containing the tips dataset:

{
   "total_bill":{
       "0":16.99,
       "1":10.34,
       "2":21.01,
       "3":23.68,
       "4":24.59,
       "5":25.29,
       ...
   }
   "tip":{
       "0":1.01,
       "1":1.66,
       "2":3.5,
       "3":3.31,
       "4":3.61,
       "5":4.71,
       ...
   }
   "sex":{
        "0":"Female",
        "1":"Male",
        "2":"Male",
        "3":"Male",
        "4":"Female",
        "5":"Male",
        ...
   }
    "smoker":{
        "0":"No",
        "1":"No",
        "2":"No",
        "3":"No",
        "4":"No",
        "5":"No",
        ...
    }
    ...

Conclusion

JSON is a widely-used format for data storage and exchange between a client and a server. Developers often use this format over formats like XML due to how lightweight and readable it is.

In this article, we've covered how to read and write JSON files using Python's popular Pandas library - from local to remote files.

Author image
About Usman Malik
Paris (France) Twitter
Programmer | Blogger | Data Science Enthusiast | PhD To Be | Arsenal FC for Life