Reading and Writing YAML to a File in Python

Introduction

In this tutorial, we're going to learn how to use the YAML library in Python 3. YAML originally stood for Yet Another Markup Language; the name is now the recursive acronym YAML Ain't Markup Language.

In recent years it has become very popular for storing serialized data, particularly in configuration files. Since YAML is essentially a data format, the part of the YAML library we need is small: parsing YAML formatted files into Python objects and writing Python objects back out as YAML.

In this article we will start by seeing how data is stored in a YAML file, and then load that data into a Python object. Lastly, we will learn how to store a Python object in a YAML file. So, let's begin.

Before we move further, there are a few prerequisites for this tutorial. You should have a basic understanding of Python's syntax, or at least beginner-level programming experience in some other language. Other than that, the tutorial is simple and easy for beginners to follow.

Installation

The installation process for YAML is fairly straightforward. There are two ways to do it; we'll start with the easy one:

Method 1: Via Pip

The easiest way to install the YAML library in Python is via the pip package manager. If you have pip installed on your system, run the following command to download and install the library, which is published under the package name pyyaml:

$ pip install pyyaml

Method 2: Via Source

In case you do not have pip installed, or are facing some problem with the method above, you can go to the library's source page. Download the repository as a zip file, open the terminal or command prompt, and navigate to the directory where the file is downloaded. Once you are there, run the following command:

$ python setup.py install
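
Whichever method you use, a quick way to check that the installation worked is to import the module and print its version. This is only a minimal sketch; it assumes the installed release exposes its version string as yaml.__version__.

```python
import yaml

# If this import succeeds, PyYAML is installed for this interpreter.
installed_version = yaml.__version__
print(installed_version)
```

If the import fails with ModuleNotFoundError, the package was most likely installed for a different Python interpreter than the one running the script.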

YAML Code Examples

In this section, we will learn how to work with YAML files, starting with how to read them, i.e. how to load them into a Python script so that we can use their contents.

Reading YAML Files in Python

Let's start by making two YAML formatted files.

The contents of the first file are as follows:

# fruits.yaml file

apples: 20
mangoes: 2
bananas: 3
grapes: 100
pineapples: 1

The contents of the second file are as follows:

# categories.yaml file

sports:
  - soccer
  - football
  - basketball
  - cricket
  - hockey
  - table tennis

countries:
  - Pakistan
  - USA
  - India
  - China
  - Germany
  - France
  - Spain

You can see that the fruits.yaml and categories.yaml files are structured differently. The former is a flat mapping of fruit names to integer counts, while the latter maps two keys, sports and countries, to a list of values each.

Let's now read the data from the two files that we created using a Python script. The load() function from the yaml module can be used to read YAML files. Look at the following script:

# process_yaml.py file

import yaml

with open(r'E:\data\fruits.yaml') as file:
    # FullLoader parses the YAML document and converts it into
    # native Python objects (here, a dictionary)
    fruits_list = yaml.load(file, Loader=yaml.FullLoader)

    print(fruits_list)

Output:

{'apples': 20, 'mangoes': 2, 'bananas': 3, 'grapes': 100, 'pineapples': 1}

In the script above we passed yaml.FullLoader as the value of the Loader parameter. FullLoader loads the full YAML language while avoiding arbitrary code execution. Instead of calling load() with Loader=yaml.FullLoader, you can also use the full_load() function, as we will see in the next example.
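
To make the relationship between these calls concrete, here is a small sketch that parses the same text three ways. The inline string is a stand-in for the contents of fruits.yaml so that the snippet runs without any files. It also calls safe_load(), which this tutorial does not otherwise use: it restricts parsing to standard YAML tags and is generally the more conservative choice for input from untrusted sources.

```python
import yaml

# Inline stand-in for the contents of fruits.yaml
FRUITS_YAML = """\
apples: 20
mangoes: 2
bananas: 3
grapes: 100
pineapples: 1
"""

# load() with an explicit loader, and the full_load() shortcut
via_load = yaml.load(FRUITS_YAML, Loader=yaml.FullLoader)
via_full_load = yaml.full_load(FRUITS_YAML)

# safe_load() only builds plain Python types (dict, list, str, int, ...)
via_safe_load = yaml.safe_load(FRUITS_YAML)

# For plain data like this, all three calls produce the same dictionary
print(via_load == via_full_load == via_safe_load)
```

For plain data such as configuration values, the choice between these functions does not change the result; it matters mainly when the input may contain YAML tags that request custom Python objects.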

Let's now try and read the second YAML file in a similar manner using a Python script:

# read_categories.py file

import yaml

with open(r'E:\data\categories.yaml') as file:
    documents = yaml.full_load(file)

    for item, doc in documents.items():
        print(item, ":", doc)

The categories.yaml file holds a single mapping with two keys, so full_load() returns one dictionary. We loop over its items to print each key together with its list of values.

Output:

sports : ['soccer', 'football', 'basketball', 'cricket', 'hockey', 'table tennis']
countries : ['Pakistan', 'USA', 'India', 'China', 'Germany', 'France', 'Spain']
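
Note that categories.yaml is one YAML document containing one mapping. A stream holds several documents only when they are separated by --- lines. As a hedged sketch of that separate case, the snippet below uses an invented two-document string (not one of the tutorial's files) and PyYAML's safe_load_all() function, which yields one Python object per document.

```python
import yaml

# Hypothetical stream with two documents, separated by a --- line
TWO_DOCS = """\
sports:
  - soccer
  - hockey
---
countries:
  - Pakistan
  - Spain
"""

# safe_load_all() returns a generator; list() collects every document
documents = list(yaml.safe_load_all(TWO_DOCS))

for doc in documents:
    print(doc)
```

Each element of documents is a separate dictionary, unlike the single dictionary returned for categories.yaml.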

As you can see from the fruits.yaml and categories.yaml examples, the library automatically handles the conversion of YAML formatted data to Python dictionaries and lists.
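
The same automatic conversion applies to scalar values. The following sketch uses a made-up YAML string (the keys are illustrative and not taken from the files above) and prints the Python type that each value is loaded as.

```python
import yaml

# Made-up sample mixing several YAML value types
SAMPLE = """\
name: table tennis
players: 2
indoor: true
rating: 4.5
tags:
  - racket
  - ball
"""

data = yaml.safe_load(SAMPLE)

# Show which Python type each YAML value was converted to
for key, value in data.items():
    print(key, ":", type(value).__name__)
```

Unquoted text becomes str, whole numbers become int, true and false become bool, decimal numbers become float, and sequences become list.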

Writing YAML Files in Python

Now that we have learned how to convert a YAML file into a Python dictionary, let's do things the other way around, i.e. serialize Python data and store it in a YAML formatted file. For this purpose, let's reuse the data that we got as output from our last program, this time arranged as a list of two dictionaries.

import yaml

dict_file = [
    {'sports': ['soccer', 'football', 'basketball', 'cricket', 'hockey', 'table tennis']},
    {'countries': ['Pakistan', 'USA', 'India', 'China', 'Germany', 'France', 'Spain']},
]

with open(r'E:\data\store_file.yaml', 'w') as file:
    # Given a file object, dump() writes to it and returns None
    yaml.dump(dict_file, file)

The dump() function takes the Python object to serialize (here, a list of dictionaries) as its first parameter and a file object as its second.

Once the above code executes, a file named store_file.yaml will be created in the E:\data directory, i.e. at the path passed to open().

# store_file.yaml file contents:

- sports:
  - soccer
  - football
  - basketball
  - cricket
  - hockey
  - table tennis
- countries:
  - Pakistan
  - USA
  - India
  - China
  - Germany
  - France
  - Spain
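
One way to confirm that nothing was lost is to read the written file back and compare it with the original object. The sketch below does this with a shortened version of the data. It writes into a temporary directory rather than E:\data so that it runs on any machine, and it reads the file back with safe_load(), which is sufficient for plain lists and dictionaries.

```python
import os
import tempfile

import yaml

data = [
    {'sports': ['soccer', 'football']},
    {'countries': ['Pakistan', 'USA']},
]

# A temporary directory stands in for the E:\data folder used above
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'store_file.yaml')

    # Serialize the Python object into the file
    with open(path, 'w') as file:
        yaml.dump(data, file)

    # Parse the file back into a Python object
    with open(path) as file:
        restored = yaml.safe_load(file)

# The round trip should reproduce the original structure
print(restored == data)
```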

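Another useful feature of the dump() function is the sort_keys parameter, which controls whether dictionary keys are written in sorted order. It defaults to True; we pass it explicitly here to make the behavior visible. To show what it does, let's apply it to our first file, i.e. fruits.yaml: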

import yaml

with open(r'E:\data\fruits.yaml') as file:
    doc = yaml.load(file, Loader=yaml.FullLoader)

    sort_file = yaml.dump(doc, sort_keys=True)
    print(sort_file)

Output:

apples: 20
bananas: 3
grapes: 100
mangoes: 2
pineapples: 1

You can see in the output that the fruits have been sorted alphabetically.
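
For comparison, here is a sketch of the opposite setting. It assumes a PyYAML version that supports the sort_keys parameter and uses an inline dictionary matching the contents of fruits.yaml. With sort_keys=False, dump() keeps the keys in the order in which they appear in the dictionary.

```python
import yaml

# Same data and key order as fruits.yaml
fruits = {'apples': 20, 'mangoes': 2, 'bananas': 3, 'grapes': 100, 'pineapples': 1}

# Without a file argument, dump() returns the YAML text as a string
sorted_text = yaml.dump(fruits, sort_keys=True)
unsorted_text = yaml.dump(fruits, sort_keys=False)

# Keys appear in insertion order: apples, mangoes, bananas, ...
print(unsorted_text)
```

Keeping the original order can be useful for configuration files that people edit by hand, where related settings are grouped deliberately.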

Conclusion

In this brief tutorial, we learned how to install Python's YAML library (pyyaml) to work with YAML formatted files. We covered loading the contents of a YAML file into our Python program as dictionaries and lists, as well as serializing Python objects into YAML files and sorting their keys. These few functions cover the most common needs when working with YAML in Python.