How to Convert JSON to a Python Object
Python's json library has many utilities for encoding and decoding data in JSON format. In particular, the json.load() function decodes JSON read from a file object, and the json.loads() function decodes JSON read from a string. By default, JSON objects are decoded into Python dictionaries, but they can be converted to a custom object by using the object_hook parameter.
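As a minimal sketch of the difference between the two functions, the snippet below decodes the same JSON text once from a string and once from a file-like object. The io.StringIO buffer is only a stand-in (an assumption for illustration) for a real file opened with open().

```python
import io
import json

text = '{"name": "Felipe", "age": 29}'

# json.loads() takes the JSON document as a string
from_string = json.loads(text)

# json.load() takes an object with a .read() method, such as an open file;
# io.StringIO plays that role here so the example needs no file on disk
fake_file = io.StringIO(text)
from_file = json.load(fake_file)

print(from_string == from_file)  # True
print(type(from_string).__name__)  # dict
```

Both calls accept the same object_hook parameter, so everything that follows applies equally to files and strings.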
For instance, suppose you have the following JSON object:
json_obj = """{
    "name" : "Felipe",
    "email" : "[email protected]",
    "age" : 29
}"""
and the following class:
class User():
    name: str
    email: str
    age: int

    def __init__(self, input):
        self.name = input.get("name")
        self.email = input.get("email")
        self.age = input.get("age")
If we call json.loads() with User as the object_hook parameter, the User.__init__() method will be called with the JSON's corresponding dict as input.
import json
user = json.loads(json_obj, object_hook = User)
print(f"User {user.name}, age {user.age}, email {user.email}")
User Felipe, age 29, email [email protected]
But what if you have a nested JSON?
json.loads() actually calls the object_hook function every time it finishes decoding a JSON object from the string, including nested ones. Consider the following JSON, returned from the Random User Generator API:
json_obj = """{
    "gender": "male",
    "name": {
        "title": "Mr",
        "first": "Ian",
        "last": "Walters"
    },
    "location": {
        "street": {
            "number": 3161,
            "name": "Saddle Dr"
        },
        "city": "Bendigo",
        "state": "Western Australia",
        "country": "Australia",
        "postcode": 4285,
        "coordinates": {
            "latitude": "-84.7903",
            "longitude": "-29.1020"
        },
        "timezone": {
            "offset": "+9:00",
            "description": "Tokyo, Seoul, Osaka, Sapporo, Yakutsk"
        }
    },
    "email": "[email protected]",
    "login": {
        "uuid": "6ee5b2e8-01c3-4314-8f7f-80059f5dd9ec",
        "username": "lazyzebra585",
        "password": "walter",
        "salt": "afXmogsa",
        "md5": "a40e87023b57a4a60c7cb398584cbac3",
        "sha1": "74caf43400be38cce60a8da2e6d1c367246505c2",
        "sha256": "1becdf34bcc6704726c7e9b38821a5792f9dd0689d30789fb5e099a6e51e860a"
    },
    "dob": {
        "date": "1947-06-06T02:45:41.895Z",
        "age": 75
    },
    "registered": {
        "date": "2003-03-25T00:15:32.791Z",
        "age": 19
    },
    "phone": "06-9388-6976",
    "cell": "0469-101-424",
    "id": {
        "name": "TFN",
        "value": "561493929"
    },
    "picture": {
        "large": "https://randomuser.me/api/portraits/men/32.jpg",
        "medium": "https://randomuser.me/api/portraits/med/men/32.jpg",
        "thumbnail": "https://randomuser.me/api/portraits/thumb/men/32.jpg"
    },
    "nat": "AU"
}"""
Let's print the decoded JSON at each step to see what happens:
json.loads(json_obj, object_hook = print)
{'title': 'Mr', 'first': 'Ian', 'last': 'Walters'}
{'number': 3161, 'name': 'Saddle Dr'}
{'latitude': '-84.7903', 'longitude': '-29.1020'}
{'offset': '+9:00', 'description': 'Tokyo, Seoul, Osaka, Sapporo, Yakutsk'}
{'street': None, 'city': 'Bendigo', 'state': 'Western Australia', 'country': 'Australia', 'postcode': 4285, 'coordinates': None, 'timezone': None}
{'uuid': '6ee5b2e8-01c3-4314-8f7f-80059f5dd9ec', 'username': 'lazyzebra585', 'password': 'walter', 'salt': 'afXmogsa', 'md5': 'a40e87023b57a4a60c7cb398584cbac3', 'sha1': '74caf43400be38cce60a8da2e6d1c367246505c2', 'sha256': '1becdf34bcc6704726c7e9b38821a5792f9dd0689d30789fb5e099a6e51e860a'}
{'date': '1947-06-06T02:45:41.895Z', 'age': 75}
{'date': '2003-03-25T00:15:32.791Z', 'age': 19}
{'name': 'TFN', 'value': '561493929'}
{'large': 'https://randomuser.me/api/portraits/men/32.jpg', 'medium': 'https://randomuser.me/api/portraits/med/men/32.jpg', 'thumbnail': 'https://randomuser.me/api/portraits/thumb/men/32.jpg'}
{'gender': 'male', 'name': None, 'location': None, 'email': '[email protected]', 'login': None, 'dob': None, 'registered': None, 'phone': '06-9388-6976', 'cell': '0469-101-424', 'id': None, 'picture': None, 'nat': 'AU'}
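So json.loads() calls the object_hook function every time it finishes decoding a JSON object, that is, every time it reaches a closing brace }, starting with the innermost objects. It then builds each enclosing object from the hook's return value rather than from the decoded dict: note the None (the return value of print) in place of every nested object in the last printed line. With the first version of User, this means nested objects such as name are themselves turned into User instances, and their own keys (first, last) are lost.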
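The consequence for a class with fixed fields can be sketched as follows. The class repeats the first User definition (with the parameter renamed to data), and the shortened JSON document, including the ian@example.com address, is a made-up placeholder rather than real API output.

```python
import json

class User:
    # Same fixed-field constructor as the first User class
    def __init__(self, data):
        self.name = data.get("name")
        self.email = data.get("email")
        self.age = data.get("age")

nested = """{
    "name": {"first": "Ian", "last": "Walters"},
    "email": "ian@example.com",
    "dob": {"age": 75}
}"""

user = json.loads(nested, object_hook=User)

# The inner "name" object was passed to User first, so user.name is
# itself a User, and its "first"/"last" keys were never stored.
print(type(user.name).__name__)     # User
print(hasattr(user.name, "first"))  # False
# "age" lives under "dob", which the constructor never looks at.
print(user.age)                     # None
```

The top-level fields that match (email here) are still populated, but anything nested is silently dropped, which is why a more flexible constructor is needed.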
We will show two workarounds for this issue. The first is to modify our User.__init__() method to be more flexible with respect to the input, using the __dict__ attribute. Instances of ordinary Python classes have a __dict__ attribute that holds each instance attribute's name and value. Our modified __init__() method will update this dictionary:
class User():
    def __init__(self, input):
        self.__dict__.update(input)

user = json.loads(json_obj, object_hook = User)
print(f"User {user.name.first} {user.name.last}, age {user.dob.age}, email {user.email}")
User Ian Walters, age 75, email [email protected]
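The same __dict__-based constructor also copes with JSON arrays, since the hook is only applied to objects and lists are left as Python lists. The sketch below uses a hypothetical Record class and a made-up two-element document to illustrate this.

```python
import json

class Record:
    # Hypothetical class using the same __dict__ technique as User
    def __init__(self, data):
        self.__dict__.update(data)

doc = """{
    "results": [
        {"name": {"first": "Ian"}},
        {"name": {"first": "Ana"}}
    ]
}"""

parsed = json.loads(doc, object_hook=Record)

# The array stays a list; each object inside it became a Record
print(type(parsed.results).__name__)            # list
print([r.name.first for r in parsed.results])   # ['Ian', 'Ana']
```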
Another possible workaround is to use the collections.namedtuple factory function:
from collections import namedtuple

def create_user(input):
    User = namedtuple('User', input.keys())
    return User(**input)

user = json.loads(json_obj, object_hook=create_user)
print(f"User {user.name.first} {user.name.last}, age {user.dob.age}, email {user.email}")
User Ian Walters, age 75, email [email protected]
Here, namedtuple('User', input.keys()) creates a tuple subclass called User with the input's keys as attribute names, and User(**input) assigns the corresponding values to those attributes.
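One caveat of the namedtuple approach is that field names must be valid Python identifiers, so a key such as "first-name" or "class" makes namedtuple() raise a ValueError. A possible way around this, sketched below with a hypothetical create_record function and a made-up document, is to pass rename=True and fill the fields positionally; the invalid names are then replaced by positional names such as _0.

```python
import json
from collections import namedtuple

def create_record(data):
    # rename=True replaces invalid field names with _<index>
    Record = namedtuple("Record", data.keys(), rename=True)
    # Fill by position, because the renamed fields no longer match the keys
    return Record(*data.values())

doc = '{"first-name": "Ian", "class": "admin", "age": 75}'
record = json.loads(doc, object_hook=create_record)

print(record._fields)  # ('_0', '_1', 'age')
print(record.age)      # 75
print(record[0])       # Ian
```

The renamed fields are less readable than the original keys, so for documents with many such keys the __dict__-based class may be the more practical choice.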