Convert YAML Array into Java List with SnakeYAML

Introduction

YAML is one of the most popular data serialization languages after JSON. YAML 1.2 is often described as a strict superset of JSON, meaning that valid JSON documents can generally be parsed as YAML. It was designed for human readability from the beginning and is known for its simplicity. It was also designed with flexibility and accessibility in mind, so it works with all modern programming languages and is a powerful format for writing configuration files. It is also used for data persistence, Internet messaging, cross-language data sharing, and more.

YAML was first proposed in 2001, when the name stood for "Yet Another Markup Language". It was later redefined as the recursive acronym "YAML Ain’t Markup Language". The most common top-level structure of a YAML file is a map, also known as a dictionary, hash(map), or simply an object, depending on the programming language we choose.

Whitespace and indentation are used in YAML files to denote nesting.

Note: Only spaces may be used for indentation in YAML files; tab characters are not permitted. As long as the indentation is done consistently, it doesn't matter how many spaces are utilized.
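
Because a tab in the indentation makes a YAML file invalid, it can help to check a file before parsing it. The following is a minimal sketch that uses only the Java standard library; the class name YamlIndentCheck and the method firstLineWithTabIndent are hypothetical helpers and not part of any YAML library:

```java
import java.util.Arrays;
import java.util.List;

class YamlIndentCheck {

    // Returns the 1-based number of the first line whose leading whitespace
    // contains a tab character, or -1 if no such line exists.
    static int firstLineWithTabIndent(List<String> lines) {
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            for (int j = 0; j < line.length(); j++) {
                char c = line.charAt(j);
                if (c == '\t') {
                    return i + 1;
                }
                if (c != ' ') {
                    break; // reached the first non-indentation character
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> spacesOnly = Arrays.asList("languages:", "  - Java", "  - Python");
        List<String> withTab = Arrays.asList("languages:", "\t- Java");
        System.out.println(firstLineWithTabIndent(spacesOnly)); // prints -1
        System.out.println(firstLineWithTabIndent(withTab));    // prints 2
    }
}
```

The check only looks at leading whitespace, since tabs inside the content of a line are not an indentation problem.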

YAML Syntax

YAML primarily uses three node types:

  • Maps/Dictionaries: A map node's content is an unordered collection of key/value node pairs, with the requirement that each key must be distinct. No further limitations are imposed on the nodes by YAML.
  • Arrays/Lists: An array (sequence) node's content is an ordered collection of zero or more nodes. In particular, a sequence may include the same node more than once, and it may even contain itself.
  • Literals (strings, numbers, booleans, etc.): A scalar node's content is opaque data that can be represented as a sequence of zero or more Unicode characters.

In this article, we will look specifically at converting YAML array content into a List in Java. Many open-source libraries are available, but the most popular are Jackson and SnakeYAML. In this guide, we will use SnakeYAML to parse the YAML content.

SnakeYAML

SnakeYAML is a YAML-parsing library that offers a high-level API for serializing and deserializing YAML documents. Its entry point is the Yaml class. A single document can be loaded with the load() method, and multiple documents in one source can be loaded with the loadAll() method. Both methods accept YAML data as a String or as an InputStream, which is the usual way to read a file.

Given the <key>:<value> structure innate to YAML files, SnakeYAML naturally works well with Java Maps, but we can also map documents to custom Java objects.

To include the library in our project, we add the following dependency to our pom.xml file:

<dependencies>
    <dependency>
        <groupId>org.yaml</groupId>
        <artifactId>snakeyaml</artifactId>
        <version>1.33</version>
    </dependency>
</dependencies>

Or, if you're using Gradle:

implementation group: 'org.yaml', name: 'snakeyaml', version: '1.33'

Reading a Simple YAML Array

Let's start by reading a simple array from a YAML file. Suppose we have a file named number.yml with the following data in our Java project’s resources folder:

- One
- Two
- Three
- Four

We can load the file content as an InputStream and then construct a Yaml instance, which acts as the entry point to the library. Its load() method reads and parses any InputStream that contains valid YAML data:

public void readYamlWithArray() {
    InputStream inputStream = this.getClass()
            .getClassLoader()
            .getResourceAsStream("number.yml");
    Yaml yaml = new Yaml();
    List<String> data = yaml.load(inputStream);
    System.out.println(data);
}

Here, load() returns a Java List of String values. Printing the data gives the following result:

[One, Two, Three, Four]
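
Assigning the result of load() directly to a List<String> is an unchecked cast: the element types are decided by the YAML content, so an item such as - 5 would come back as an Integer rather than a String. The following is a minimal sketch of a more defensive conversion that uses only the standard library. The helper toStringList is hypothetical, and its loaded argument stands in for whatever yaml.load(...) returned:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LoadedListConverter {

    // Converts the object returned by a YAML load call into a List<String>.
    // Throws if the document root is not a sequence; non-string scalars
    // (numbers, booleans) are converted with String.valueOf.
    static List<String> toStringList(Object loaded) {
        if (!(loaded instanceof List)) {
            throw new IllegalArgumentException("Expected a YAML sequence at the root");
        }
        List<String> result = new ArrayList<>();
        for (Object item : (List<?>) loaded) {
            result.add(String.valueOf(item));
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the result of yaml.load(...): a list with mixed scalar types.
        Object loaded = Arrays.asList("One", 2, true);
        System.out.println(toStringList(loaded)); // prints [One, 2, true]
    }
}
```

Whether to convert non-string items or reject them is a design choice; this sketch converts them so that a file mixing words and numbers still yields a usable list.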

Reading a YAML Grouped Array

Sometimes we want to define an array under a given key, which groups the array inside a YAML map node. A sample YAML file of this kind looks like this:

languages:
  - Java
  - JavaScript
  - Python
  - Golang
  - Perl
  - Shell
  - Scala

This can be treated as a Java Map in which the key is a String and the value is a list. The data is still loaded from an InputStream as before, but the result must now be declared as a Map of String to List of String:

public void readYamlWithArrayGroup() {
    InputStream inputStream = this.getClass()
            .getClassLoader()
            .getResourceAsStream("language.yml");
    Yaml yaml = new Yaml();
    Map<String, List<String>> data = yaml.load(inputStream);
    System.out.println(data);
    // Look up the list by its key and print each entry
    data.get("languages").forEach(System.out::println);
}

Running this method prints the following:

{languages=[Java, JavaScript, Python, Golang, Perl, Shell, Scala]}
Java
JavaScript
Python
Golang
Perl
Shell
Scala

Reading a YAML Multi-Line Array of Arrays

Sometimes a YAML file contains an array of arrays. For example, we can group courses and represent them as an array of arrays:

courses:
  - - C
    - Java
    - Data Structures
    - Algorithms
  - - Big Data
    - Spark
    - Kafka
    - Machine Learning

This can be parsed as a Java Map of String to List of List of String. We again load the file as an InputStream, but declare the result as follows:

public void readYamlWithMultiLineArrayGroup() {
    InputStream inputStream = this.getClass()
            .getClassLoader()
            .getResourceAsStream("courses.yml");
    Yaml yaml = new Yaml();
    // Load yaml into a Map
    Map<String, List<List<String>>> data = yaml.load(inputStream);
    System.out.println(data);
    // Look up the nested lists by their key
    List<List<String>> courses = data.get("courses");
    // Print the first sublist
    System.out.println("First Array Group:");
    courses.get(0).forEach(System.out::println);
    // Print the second sublist
    System.out.println("\nSecond Array Group:");
    courses.get(1).forEach(System.out::println);
}

Running this method prints the following:

{courses=[[C, Java, Data Structures, Algorithms], [Big Data, Spark, Kafka, Machine Learning]]}
First Array Group:
C
Java
Data Structures
Algorithms

Second Array Group:
Big Data
Spark
Kafka
Machine Learning
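
When the grouping itself is not important, the nested lists can be flattened into a single list. The following is a minimal sketch using the standard Stream API. The map is built by hand to mirror the structure shown above rather than loaded from a file, and flatten is a hypothetical helper name:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CourseFlattener {

    // Flattens a list of lists into one list, preserving the original order.
    static List<String> flatten(List<List<String>> groups) {
        return groups.stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the Map that would be loaded from courses.yml.
        Map<String, List<List<String>>> data = Collections.singletonMap(
                "courses",
                Arrays.asList(
                        Arrays.asList("C", "Java"),
                        Arrays.asList("Big Data", "Spark")));
        System.out.println(flatten(data.get("courses"))); // prints [C, Java, Big Data, Spark]
    }
}
```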

Reading a Complex Nested YAML Content as Java Bean

We have seen how to handle array content on its own, but complex nested YAML files, with maps of maps and lists of lists, are hard to parse intuitively and awkward to work with. Even in the last example, with only two nested lists, handling them as raw lists gets fairly verbose.

In these cases, it's best to create a POJO that can be mapped to the nested YAML data. Let's first create a sample YAML file, website_content.yml, containing the nested content of a website:

website: stackabuse
skills:
  - python
  - javascript
  - java
  - unix
  - machine learning
  - web development
tutorials:
  - graphs:
      name: Graphs in Python - Theory and Implementation
      tags:
        - python
        - data structures
        - algorithm
      contributors:
        - David Landup
        - Dimitrije Stamenic
        - Jovana Ninkovic
      last_updated: June 2022
  - git:
      name: Git Essentials - Developer's Guide to Git
      tags:
        - git
      contributors:
        - David Landup
        - François Dupire
        - Jovana Ninkovic
      last_updated: April 2022
  - deep learning:
      name: Practical Deep Learning for Computer Vision with Python
      tags:
        - python
        - machine learning
        - tensorflow
        - computer vision
      contributors:
        - David Landup
        - Jovana Ninkovic
      last_updated: October 2022
published: true

We define a parent Java class, WebsiteContent, that holds a List of skills and a List of Maps of tutorials. Each tutorial, in turn, contains lists of tags and contributors:

public class WebsiteContent {
    private String website;
    private List<String> skills;
    private List<Map<String, Tutorial>> tutorials;
    private Boolean published;

    // Getters and setters

    @Override
    public String toString() {
        return "WebsiteContent{" +
                "website='" + website + '\'' +
                ", skills=" + skills +
                ", tutorials=" + tutorials +
                ", published=" + published +
                '}';
    }
}

public class Tutorial {

    private String name;
    private List<String> tags;
    private List<String> contributors;
    private String lastUpdated;

    // Getters and setters

    @Override
    public String toString() {
        return "Tutorial{" +
                "name='" + name + '\'' +
                ", tags=" + tags +
                ", contributors=" + contributors +
                ", lastUpdated='" + lastUpdated + '\'' +
                '}';
    }
}

Now we can again load the data from the file as an InputStream, as we did earlier. When we create the Yaml object, we specify the type we want the data mapped to: new Constructor(WebsiteContent.class) tells SnakeYAML to read the YAML data and map it to a WebsiteContent object.

The mapping is done by name, so the property names of the target class must match the keys in the YAML file.

public void readYamlAsBeanWithNestedArrays() {
    // Read file
    InputStream inputStream = this.getClass()
            .getClassLoader()
            .getResourceAsStream("website_content.yml");
            
    // YAML -> POJO
    Yaml yaml = new Yaml(new Constructor(WebsiteContent.class));
    WebsiteContent data = yaml.load(inputStream);
    
    // Print data
    System.out.println(data);
    System.out.println("\nList of Skills: ");
    data.getSkills().forEach(System.out::println);
    System.out.println("\nList of Tutorials: ");
    data.getTutorials().forEach(System.out::println);
}

Finally, printing the data gives the following output:

WebsiteContent{website='stackabuse', skills=[python, javascript, java, unix, machine learning, web development], tutorials=[{graphs={name=Graphs in Python - Theory and Implementation, tags=[python, data structures, algorithm], contributors=[David Landup, Dimitrije Stamenic, Jovana Ninkovic], last_updated=June 2022}}, {git={name=Git Essentials - Developer's Guide to Git, tags=[git], contributors=[David Landup, François Dupire, Jovana Ninkovic], last_updated=April 2022}}, {deep learning={name=Practical Deep Learning for Computer Vision with Python, tags=[python, machine learning, tensorflow, computer vision], contributors=[David Landup, Jovana Ninkovic], last_updated=October 2022}}], published=true}

List of Skills: 
python
javascript
java
unix
machine learning
web development

List of Tutorials: 
{graphs={name=Graphs in Python - Theory and Implementation, tags=[python, data structures, algorithm], contributors=[David Landup, Dimitrije Stamenic, Jovana Ninkovic], last_updated=June 2022}}
{git={name=Git Essentials - Developer's Guide to Git, tags=[git], contributors=[David Landup, François Dupire, Jovana Ninkovic], last_updated=April 2022}}
{deep learning={name=Practical Deep Learning for Computer Vision with Python, tags=[python, machine learning, tensorflow, computer vision], contributors=[David Landup, Jovana Ninkovic], last_updated=October 2022}}

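As we can see, SnakeYAML has parsed the file into a WebsiteContent object with its skills and tutorials populated. Note, however, that each tutorial is printed in map form ({name=..., last_updated=...}) rather than in the Tutorial{...} form produced by Tutorial.toString(). This indicates that the entries nested inside List<Map<String, Tutorial>> were loaded as plain maps rather than as Tutorial instances, most likely because the nested generic type information is not applied at that depth. It would also explain why the YAML key last_updated did not have to match the Java field lastUpdated.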
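
If typed Tutorial objects are needed from map-shaped entries like the ones printed above, one option is to copy the fields by hand. The following is a minimal sketch under that assumption, using only the standard library. TutorialMapper and fromMap are hypothetical names, and the nested Tutorial class is a trimmed stand-in for the one defined earlier:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TutorialMapper {

    // Trimmed stand-in for the article's Tutorial class (assumption for this sketch).
    static class Tutorial {
        String name;
        List<String> tags = new ArrayList<>();
        String lastUpdated;
    }

    // Copies the entries of one tutorial map into a Tutorial object.
    // The YAML key "last_updated" is translated explicitly to the
    // Java field lastUpdated.
    static Tutorial fromMap(Map<String, Object> source) {
        Tutorial tutorial = new Tutorial();
        Object name = source.get("name");
        tutorial.name = name == null ? null : name.toString();
        Object tags = source.get("tags");
        if (tags instanceof List) {
            for (Object tag : (List<?>) tags) {
                tutorial.tags.add(String.valueOf(tag));
            }
        }
        Object updated = source.get("last_updated");
        tutorial.lastUpdated = updated == null ? null : updated.toString();
        return tutorial;
    }

    public static void main(String[] args) {
        // Hand-built map mirroring one tutorial entry from the output above.
        Map<String, Object> git = new LinkedHashMap<>();
        git.put("name", "Git Essentials - Developer's Guide to Git");
        git.put("tags", Arrays.asList("git"));
        git.put("last_updated", "April 2022");

        Tutorial tutorial = fromMap(git);
        System.out.println(tutorial.name + " (" + tutorial.lastUpdated + ")");
    }
}
```

Copying by hand is verbose, but it keeps the key-to-field translation visible in one place.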

Conclusion

As YAML files are widely used for DevOps and configuration-related data, it’s quite useful to be able to parse and manipulate that data in code.

SnakeYAML allows us to manage YAML files in our Java project with ease, and it requires only a little code to load YAML files into our project or to write data into YAML files. Additionally, SnakeYAML offers formatting options, so we can adjust its output to suit our needs.

Last Updated: June 15th, 2023
Author: Arpendu Kumar Garai
