Introduction
In this guide, we'll take a look at how to read and write JSON data from and to a file in Python, using the json module.
JSON (JavaScript Object Notation) is an extremely popular format for data serialization, given how generally applicable and lightweight it is - while also being fairly human-friendly. Most notably, it's extensively used in the web development world, where you'll likely encounter JSON-serialized objects in REST API responses, application configuration, or even simple data storage.
Advice: If you'd like to read more about creating REST APIs with Python, read our "Creating a REST API in Python with Django" and "Creating a REST API with Django REST Framework"!
Given its prevalence, reading and parsing JSON files (or strings) is pretty common, and writing JSON to be sent off is equally common. In this guide, we'll take a look at how to leverage the json module to read and write JSON in Python.
Writing JSON to a File with Python with json.dump() and json.dumps()
To write JSON contents to a file in Python, we can use json.dump() and json.dumps(). These are separate functions and achieve different results:
json.dumps() - Serializes an object into a JSON-formatted string
json.dump() - Serializes an object into a JSON stream for saving into files or sockets
Note: The "s" in "dumps" is actually short for "dump string".
JSON's natural format is similar to a map in computer science - a collection of key-value pairs. In Python, a dictionary is a map implementation, so we'll naturally be able to represent JSON faithfully through a dict. A dictionary can contain other nested dictionaries, lists, booleans, or other primitive types like integers and strings.
Note: The built-in json package offers several convenience functions that allow us to convert between JSON and dictionaries.
That being said, let's import the json module, define a dictionary with some data and then convert it into JSON before saving to a file:
import json
data = {
    'employees': [
        {
            'name': 'John Doe',
            'department': 'Marketing',
            'place': 'Remote'
        },
        {
            'name': 'Jane Doe',
            'department': 'Software Engineering',
            'place': 'Remote'
        },
        {
            'name': 'Don Joe',
            'department': 'Software Engineering',
            'place': 'Office'
        }
    ]
}
# .dumps() as a string
json_string = json.dumps(data)
print(json_string)
This results in:
{"employees": [{"name": "John Doe", "department": "Marketing", "place": "Remote"}, {"name": "Jane Doe", "department": "Software Engineering", "place": "Remote"}, {"name": "Don Joe", "department": "Software Engineering", "place": "Office"}]}
Here, we have a simple dictionary with a few employees, each of which has a name, department and place. The dumps() function of the json module serializes the dictionary and returns the result as a JSON string.
Once serialized, you may decide to send it off to another service that'll deserialize it, or, say, store it. To store this JSON string into a file, we'll simply open a file in write mode, and write it down. If you don't want to extract the data into an independent variable for later use and would just like to dump it into a file, you can skip the dumps() function and use dump() instead:
# Using a JSON string
with open('json_data.json', 'w') as outfile:
    outfile.write(json_string)
# Directly from dictionary (pass the dictionary itself, not the JSON string)
with open('json_data.json', 'w') as outfile:
    json.dump(data, outfile)
Any file-like object with a write() method can be passed as the second argument of the dump() function, even if it isn't an actual file. A good example of this would be the file object returned by a socket's makefile() method, which can be closed and written to much like a file.
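To illustrate the point about file-like objects, here is a minimal sketch that dumps into an in-memory io.StringIO buffer rather than a file on disk. The sample dictionary is made up for the example:

```python
import io
import json

# Made-up sample data for illustration
data = {'name': 'John Doe', 'place': 'Remote'}

# io.StringIO is an in-memory text buffer that exposes write(),
# which is what json.dump() needs from its second argument
buffer = io.StringIO()
json.dump(data, buffer)

# The buffer now holds the same text that json.dumps() would return
written = buffer.getvalue()
print(written)
print(written == json.dumps(data))
```

The same pattern can be handy in tests, where you may want to check serialized output without touching the file system.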
Reading JSON from a File with Python with json.load() and json.loads()
The mapping between dictionary contents and a JSON string is straightforward, so it's easy to convert between the two. The same logic as with dump() and dumps() is applied to load() and loads(). Mirroring json.dumps(), the json.loads() function accepts a JSON string and converts it into a dictionary, while json.load() reads JSON from an open file:
import json
with open('json_data.json') as json_file:
    data = json.load(json_file)
print(data)
This results in:
{'employees': [{'name': 'John Doe', 'department': 'Marketing', 'place': 'Remote'}, {'name': 'Jane Doe', 'department': 'Software Engineering', 'place': 'Remote'}, {'name': 'Don Joe', 'department': 'Software Engineering', 'place': 'Office'}]}
Alternatively, let's read a JSON string into a dictionary:
import json
python_dictionary = json.loads(json_string)
print(python_dictionary)
Which also results in:
{'employees': [{'name': 'John Doe', 'department': 'Marketing', 'place': 'Remote'}, {'name': 'Jane Doe', 'department': 'Software Engineering', 'place': 'Remote'}, {'name': 'Don Joe', 'department': 'Software Engineering', 'place': 'Office'}]}
This one is especially useful for parsing REST API responses that send JSON. This data comes to you as a string, which you can then pass to json.loads() directly, and you have a much more manageable dictionary to work with!
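As a rough sketch of that workflow, the snippet below parses a response body and guards against malformed input. The response_body string is a hypothetical payload hard-coded in place of a real HTTP call:

```python
import json

# Hypothetical response body, hard-coded here instead of a real HTTP request
response_body = '{"status": "ok", "employees": [{"name": "Jane Doe", "place": "Remote"}]}'

try:
    payload = json.loads(response_body)
except json.JSONDecodeError as err:
    # Raised when the text is not valid JSON
    payload = None
    print(f'Could not parse response: {err}')

if payload is not None:
    # The parsed result is a regular dictionary with nested lists and dicts
    first_employee = payload['employees'][0]
    print(first_employee['name'])
```

Catching json.JSONDecodeError is optional, but it can be worth doing whenever the string comes from a source you don't control.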
Sorting, Pretty-Printing, Separators and Encoding
When serializing your data to JSON with Python, the default output is written on a single line with no indentation, which keeps it small but not very readable. While this is the ideal behavior for data transfer (computers don't care for readability, but do care about size) - sometimes you may need to make small changes, like adding whitespace to make it human readable.
Note: json.dump() and json.dumps() provide a few options for formatting their output, and json.load() and json.loads() provide options for customizing how JSON is parsed.
Pretty-Printing JSON in Python
Making JSON human readable (aka "pretty-printing") is as easy as passing an integer value for the indent parameter:
import json
data = {'people':[{'name': 'Scott', 'website': 'stackabuse.com', 'from': 'Nebraska'}]}
print(json.dumps(data, indent=4))
This creates a 4-space indentation on each new logical block:
{
    "people": [
        {
            "name": "Scott",
            "website": "stackabuse.com",
            "from": "Nebraska"
        }
    ]
}
Another option is to use the command line tool - json.tool. With it, you can pretty-print the JSON in the command line without affecting the transmitted string, and just impacting how it's displayed on the standard output pipe:
$ echo '{"people":[{"name":"Scott", "website":"stackabuse.com", "from":"Nebraska"}]}' | python -m json.tool
{
    "people": [
        {
            "name": "Scott",
            "website": "stackabuse.com",
            "from": "Nebraska"
        }
    ]
}
Sorting JSON Objects by Keys
In JSON terms:
"An object is an unordered set of name/value pairs."
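JSON itself doesn't guarantee key order, although json.dumps() writes keys in the dictionary's insertion order, which Python dictionaries preserve as of version 3.7. If you need to enforce a consistent key order, you can pass True to the sort_keys option when using json.dump() or json.dumps():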
import json
data = {'people':[{'name': 'Scott', 'website': 'stackabuse.com', 'from': 'Nebraska'}]}
print(json.dumps(data, sort_keys=True, indent=4))
This results in:
{
    "people": [
        {
            "from": "Nebraska",
            "name": "Scott",
            "website": "stackabuse.com"
        }
    ]
}
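One practical use of sorted keys is producing the same string for dictionaries that hold the same content but were built in a different order. The sketch below assumes Python 3.7 or newer, where dictionaries keep insertion order, and uses two made-up dictionaries:

```python
import json

# Same content, different insertion order (made-up sample data)
first = {'name': 'Scott', 'from': 'Nebraska'}
second = {'from': 'Nebraska', 'name': 'Scott'}

# Without sorting, each string follows its dictionary's insertion order
unsorted_equal = json.dumps(first) == json.dumps(second)

# With sort_keys=True, both dictionaries serialize to the same string
sorted_equal = json.dumps(first, sort_keys=True) == json.dumps(second, sort_keys=True)

print(unsorted_equal)
print(sorted_equal)
```

This can make serialized output easier to diff, cache or hash, since the result no longer depends on how the dictionary was assembled.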
ASCII Text and Encoding
By default, json.dump() and json.dumps() will ensure that text in the given Python dictionary is ASCII-encoded. If non-ASCII characters are present, then they're automatically escaped, as shown in the following example:
import json
data = {'item': 'Beer', 'cost':'£4.00'}
jstr = json.dumps(data, indent=4)
print(jstr)
{
    "item": "Beer",
    "cost": "\u00a34.00"
}
This isn't always acceptable, and in many cases you may want to keep your Unicode characters unchanged. To do this, set the ensure_ascii option to False:
jstr = json.dumps(data, ensure_ascii=False, indent=4)
print(jstr)
{
    "item": "Beer",
    "cost": "£4.00"
}
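When unescaped characters are written to a file, the file's encoding starts to matter. Here is a minimal sketch, assuming Python 3 and using a hypothetical file name inside a temporary directory, that opens the file with an explicit UTF-8 encoding:

```python
import json
import os
import tempfile

data = {'item': 'Beer', 'cost': '£4.00'}

# Hypothetical file location inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'price.json')

# An explicit encoding stores the pound sign as UTF-8
# regardless of the platform's default encoding
with open(path, 'w', encoding='utf-8') as outfile:
    json.dump(data, outfile, ensure_ascii=False)

# Read the raw text back with the same encoding
with open(path, encoding='utf-8') as infile:
    raw_text = infile.read()

# The character is stored as-is rather than as a \u escape
print('£' in raw_text)
print(json.loads(raw_text) == data)
```

Relying on the platform's default encoding may work on one machine and fail on another, so spelling it out is usually the safer choice.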
Skip Custom Key Data Types
If a key in your dictionary isn't of a basic type (str, int, float, bool or None), a TypeError is raised when you try dumping JSON contents into a file. You can skip these keys via the skipkeys argument:
jstr = json.dumps(data, skipkeys=True)
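To see the difference in practice, here is a small sketch with a made-up dictionary that uses a tuple as one of its keys:

```python
import json

# A tuple is a valid dictionary key in Python, but not a valid JSON key
# (made-up sample data)
data = {'name': 'Scott', (1, 2): 'coordinates'}

try:
    json.dumps(data)
except TypeError as err:
    print(f'Default behavior: {err}')

# With skipkeys=True, the offending key-value pair is silently dropped
jstr = json.dumps(data, skipkeys=True)
print(jstr)
```

Keep in mind that the skipped entries are lost without any warning, so this option fits best when dropping them is acceptable.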
Enabling and Disabling Circular Check
If a property of a JSON object references itself, or another object that references back to the parent object, an infinitely recursive structure is created that can't be represented in JSON. Left unchecked, infinite recursion typically results in memory being allocated rapidly until the interpreter hits its recursion limit or the device runs out of memory.
This is regulated by the check_circular flag, which is True by default. With the check enabled, json.dumps() detects the cycle up front and raises a ValueError. If you turn the check off by setting it to False, a circular reference results in a RecursionError (or worse) instead:
jstr = json.dumps(data, check_circular=False)
Do note, however, that turning the check off is strongly discouraged.
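As a quick illustration of the default behavior, the sketch below builds a made-up dictionary that contains itself and leaves check_circular at its default value of True, in which case the standard library reports the cycle as a ValueError:

```python
import json

# A dictionary that contains a reference to itself (made-up sample data)
data = {'name': 'node'}
data['self'] = data

try:
    json.dumps(data)  # check_circular=True is the default
    error_name = None
except ValueError as err:
    # The cycle is detected before any deep recursion takes place
    error_name = type(err).__name__
    print(f'{error_name}: {err}')
```

Catching the error early like this is generally preferable to letting the encoder recurse until the interpreter gives up.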
Enabling and Disabling NaNs
Special floating-point values, such as -inf, inf and nan, may creep into objects that you want to serialize or deserialize. The JSON standard doesn't allow these values, but they still carry logical meaning that you might want to transmit in a message. On the other hand, you may want to enforce that they aren't transmitted, and raise an exception instead. The allow_nan flag is set to True by default, and allows you to serialize these values, replacing them with the JavaScript equivalents (Infinity, -Infinity and NaN).
If you set the flag to False instead, you'll switch to a strictly JSON-standardized format, which raises a ValueError if your objects contain attributes with these values:
jstr = json.dumps(data, allow_nan=False)
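The two modes can be compared side by side. This is a minimal sketch with a made-up dictionary holding a nan and an inf value:

```python
import json

# Made-up sample data containing special floating-point values
data = {'ratio': float('nan'), 'limit': float('inf')}

# Default (allow_nan=True): the JavaScript-style tokens are written
lenient = json.dumps(data)
print(lenient)

try:
    json.dumps(data, allow_nan=False)
    strict_error = None
except ValueError as err:
    # Strict mode refuses values that the JSON standard doesn't define
    strict_error = str(err)
    print(f'Strict mode refused: {strict_error}')
```

Since NaN and Infinity aren't valid JSON tokens, parsers in other languages may reject the lenient output, which is one reason to consider the strict mode for data that leaves your application.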
Changing Separators
In JSON, the keys are separated from values with colons (:) and the items are separated from each other with commas (,):
key1:value1,
key2:value2
The default separators for writing JSON in Python are (', ', ': '), with whitespace after the commas and colons (when indent is set, the item separator defaults to ',' instead, since each item ends its line). You can alter these to skip the whitespace and thus make the JSON a bit more compact, or fully change the separators with other special characters for a different representation:
# Updated to not contain whitespaces after separators
jstr = json.dumps(data, separators=(',', ':'))
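To get a feel for the effect, this small sketch serializes a made-up dictionary with the default and the compact separators and compares the results:

```python
import json

# Made-up sample data
data = {'people': [{'name': 'Scott', 'from': 'Nebraska'}]}

default_form = json.dumps(data)
compact_form = json.dumps(data, separators=(',', ':'))

print(default_form)
print(compact_form)

# Number of characters saved by dropping the whitespace
print(len(default_form) - len(compact_form))
```

Both strings parse back into the same dictionary, so the compact form only changes how the data is written, not what it contains.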
Compatibility Issues with Python 2
If you're using an older version of Python (2.x) - you may run into a TypeError while trying to dump JSON contents into a file. Namely, if the contents contain a non-ASCII character, a TypeError is raised, even if you pass the encoding argument, when using the json.dump() function:
# Python 2.x
import io
# The built-in open() in Python 2 has no encoding argument, so io.open() is used
with io.open('json_data.json', 'w', encoding='utf-8') as outfile:
    json.dump(data, outfile, ensure_ascii=False)
If you encounter this edge-case, which has since been fixed in subsequent Python versions - try using json.dumps() instead, and write the string contents into a file instead of streaming the contents directly into a file.
Conclusion
In this guide, we introduced you to the json.dump(), json.dumps(), json.load(), and json.loads() functions, which help in serializing and deserializing JSON.
We've then taken a look at how you can sort JSON objects, pretty-print them, change the encoding, skip custom key data types, enable or disable circular checks and whether NaNs are allowed, as well as how to change the separators used for serialization.
With JSON being one of the most popular ways to serialize structured data, you'll likely have to interact with it pretty frequently, especially when working on web applications. Python's json module is a great way to get started, although you'll probably find that simplejson is another great alternative that is much less strict on JSON syntax.