Introduction
YAML is one of the most popular data serialization languages, alongside JSON. YAML 1.2 is often described as a superset of JSON, which means that valid JSON can generally be parsed as YAML. It was designed for human interaction and readability right from the beginning and is therefore known for its simplicity. It was also designed with flexibility and accessibility in mind, so it works with all modern programming languages and is a powerful format for writing configuration files. It is also used for data persistence, Internet messaging, cross-language data sharing, and more.
YAML was started in 2001 and originally stood for "Yet Another Markup Language". It was later renamed to the recursive acronym "YAML Ain’t Markup Language". The basic structure of a YAML file is a map, also known as a dictionary, hash map, or simply an object, depending on the programming language we use.
Whitespace and indentation are used in YAML files to denote nesting.
Note: Only spaces may be used for indentation in YAML files; tab characters are not permitted. As long as the indentation is done consistently, it doesn't matter how many spaces are utilized.
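The no-tabs rule can be checked before a file ever reaches a parser. The following is a minimal sketch using only the Java standard library; the class name YamlIndentCheck and the method firstLineWithTabIndent are hypothetical helpers made up for illustration, not part of any YAML library, and the check only looks at leading whitespace rather than validating YAML.

```java
import java.util.OptionalInt;

// Hypothetical helper (an assumption for illustration, not a library API):
// finds the first line whose leading whitespace contains a tab character.
class YamlIndentCheck {

    static OptionalInt firstLineWithTabIndent(String yamlText) {
        String[] lines = yamlText.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            // Scan only the leading whitespace of the line
            for (int j = 0; j < line.length(); j++) {
                char c = line.charAt(j);
                if (c == '\t') {
                    return OptionalInt.of(i + 1); // 1-based line number
                }
                if (c != ' ') {
                    break; // reached content; tabs inside values are not checked
                }
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        String spaces = "languages:\n  - Java\n  - Python\n";
        String tabs = "languages:\n\t- Java\n";
        System.out.println(firstLineWithTabIndent(spaces).isPresent()); // false
        System.out.println(firstLineWithTabIndent(tabs).getAsInt());    // 2
    }
}
```

A check like this can give a clearer error message than a parser exception when a file has been edited with tab indentation.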
YAML Syntax
YAML primarily uses three node types:
- Maps/Dictionaries: A map node's content is an unordered collection of key/value node pairs, with the requirement that each key must be unique. YAML imposes no further restrictions on the nodes.
- Arrays/Lists: An array (sequence) node's content is an ordered collection of zero or more nodes. In particular, a sequence may contain the same node more than once, and it may even contain itself.
- Literals (strings, numbers, booleans, etc.): A scalar node's content is opaque data that can be represented as a sequence of zero or more Unicode characters.
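To make the three node types concrete, here is a small sketch of how they typically correspond to Java types: maps to Map, sequences to List, and scalars to values such as String or Boolean. The structure is built by hand with the standard library (nothing is parsed), and the class name NodeTypesSketch and the sample keys are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NodeTypesSketch {

    // Builds by hand the Java structure that corresponds to this YAML:
    //   name: demo        (scalar)
    //   tags:             (sequence)
    //     - a
    //     - b
    //   published: true   (scalar)
    static Map<String, Object> sampleDocument() {
        Map<String, Object> root = new LinkedHashMap<>(); // map node
        root.put("name", "demo");
        root.put("tags", Arrays.asList("a", "b"));
        root.put("published", Boolean.TRUE);
        return root;
    }

    // Classifies a Java value by the YAML node type it corresponds to
    static String describe(Object node) {
        if (node instanceof Map) return "map";
        if (node instanceof List) return "sequence";
        return "scalar";
    }

    public static void main(String[] args) {
        Map<String, Object> doc = sampleDocument();
        System.out.println("root -> " + describe(doc));
        doc.forEach((key, value) -> System.out.println(key + " -> " + describe(value)));
    }
}
```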
In this article, we will look specifically at converting YAML array content into a List in Java. Many open-source libraries are available, the most popular being Jackson and SnakeYAML. In this guide, we will use SnakeYAML to parse the YAML content.
SnakeYAML
SnakeYAML is a YAML-parsing library that offers a high-level API for serializing and deserializing YAML documents. The entry point for SnakeYAML is the Yaml class. YAML documents can be loaded one at a time using the load() method, or in batch via the loadAll() method. Both methods accept valid YAML data as String objects as well as InputStreams, which is the typical way to read a file.
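As a rough sketch of the two entry points, the example below calls load() and loadAll() on String input instead of a file. It assumes the SnakeYAML dependency is on the classpath; the class name LoadFromStringSketch, its helper methods, and the sample YAML text are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import org.yaml.snakeyaml.Yaml;

class LoadFromStringSketch {

    // Parses a single YAML document held in a String
    static Object loadOne(String yamlText) {
        return new Yaml().load(yamlText);
    }

    // Parses every document in a multi-document String ("---" separates documents)
    static List<Object> loadEvery(String yamlText) {
        List<Object> documents = new ArrayList<>();
        // loadAll returns an Iterable, so copy the documents while iterating
        for (Object document : new Yaml().loadAll(yamlText)) {
            documents.add(document);
        }
        return documents;
    }

    public static void main(String[] args) {
        System.out.println(loadOne("- One\n- Two\n"));
        System.out.println(loadEvery("- One\n- Two\n---\nname: demo\n"));
    }
}
```

Loading from a String is handy in unit tests, where a short YAML snippet can be inlined instead of placed in the resources folder.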
Given the <key>:<value> structure innate to YAML files, SnakeYAML naturally works well with Java Maps, but we may also use custom Java objects.
To include the library in our project, add the following dependency to the pom.xml file:
<dependencies>
    <dependency>
        <groupId>org.yaml</groupId>
        <artifactId>snakeyaml</artifactId>
        <version>1.33</version>
    </dependency>
</dependencies>
Or, if you're using Gradle:
implementation group: 'org.yaml', name: 'snakeyaml', version: '1.33'
Reading a Simple YAML Array
Let's start by reading a simple array from a YAML file. Suppose we have a YAML file named number.yml with the following data in our Java project’s resources folder:
- One
- Two
- Three
- Four
We can then load the file content as an InputStream. Next, we construct the Yaml instance, which acts as the entry point to the library and lets us represent the YAML file contents programmatically. The load() method allows us to read and parse any InputStream with valid YAML data:
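import java.io.InputStream;
import java.util.List;
import org.yaml.snakeyaml.Yaml;

public void readYamlWithArray() {
    // Read number.yml from the classpath (resources folder)
    InputStream inputStream = this.getClass()
        .getClassLoader()
        .getResourceAsStream("number.yml");
    Yaml yaml = new Yaml();
    // Each element of the YAML sequence becomes a list element
    List<String> data = yaml.load(inputStream);
    System.out.println(data);
}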
The method returns a Java List of String data. Printing data gives the following result:
[One, Two, Three, Four]
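One detail worth keeping in mind: assigning the result of load() to List<String> is an unchecked generic cast, and SnakeYAML chooses each element's Java type from the scalar's content. The sketch below, which assumes SnakeYAML on the classpath with its default settings, prints the runtime class of each element in a mixed sequence; the class name ScalarTypesSketch and the sample input are hypothetical.

```java
import java.util.List;
import org.yaml.snakeyaml.Yaml;

class ScalarTypesSketch {

    // Loads a YAML sequence and returns the simple class name of each element
    static String elementTypes(String yamlText) {
        List<Object> items = new Yaml().load(yamlText);
        StringBuilder names = new StringBuilder();
        for (Object item : items) {
            if (names.length() > 0) {
                names.append(',');
            }
            names.append(item.getClass().getSimpleName());
        }
        return names.toString();
    }

    public static void main(String[] args) {
        // One stays a String, 2 is resolved as a number, true as a boolean
        System.out.println(elementTypes("- One\n- 2\n- true\n"));
    }
}
```

If a list must contain only strings, quoting the values in the YAML file (for example '2') keeps them as String elements.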
Reading a YAML Grouped Array
Sometimes we want to define an array of content under a given key. This groups the array inside a YAML map node. A sample YAML file of this sort looks like this:
languages:
- Java
- JavaScript
- Python
- Golang
- Perl
- Shell
- Scala
This can be thought of as a Java Map containing a <key>:<value> pair where the value is an array. The data is still loaded as an InputStream, as shown above, but data must now be declared as a Map of List of Strings:
public void readYamlWithArrayGroup() {
    InputStream inputStream = this.getClass()
        .getClassLoader()
        .getResourceAsStream("language.yml");
    Yaml yaml = new Yaml();
    Map<String, List<String>> data = yaml.load(inputStream);
    System.out.println(data);
    // Extract values (list) from the map
    data.values()
        .stream()
        .collect(Collectors.toList())
        .get(0)
        .forEach(System.out::println);
}
Now if we read our data, it looks like this:
{languages=[Java, JavaScript, Python, Golang, Perl, Shell, Scala]}
Java
JavaScript
Python
Golang
Perl
Shell
Scala
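When the key is known in advance, the list can also be looked up by name instead of taking the first entry of values(). The sketch below uses only the standard library: the map is built by hand as a stand-in for the result of yaml.load(inputStream), and the class name GroupedArrayLookup, the helper valuesFor, and the frameworks key are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupedArrayLookup {

    // Returns the list stored under the given key, or an empty list if the key is absent
    static List<String> valuesFor(Map<String, List<String>> data, String key) {
        return data.getOrDefault(key, Collections.emptyList());
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the map returned by yaml.load(inputStream)
        Map<String, List<String>> data = new LinkedHashMap<>();
        data.put("languages", Arrays.asList("Java", "JavaScript", "Python"));

        valuesFor(data, "languages").forEach(System.out::println);
        // A missing key yields an empty list instead of a NullPointerException
        System.out.println(valuesFor(data, "frameworks").isEmpty());
    }
}
```

Looking up by key does not depend on the order of entries in the file, which matters once the YAML document has more than one top-level key.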
Reading a YAML Multi-Line Array of Arrays
Sometimes we come across a YAML file containing an array of arrays. For example, we can group courses and represent them as an array of arrays:
courses:
  - - C
    - Java
    - Data Structures
    - Algorithms
  - - Big Data
    - Spark
    - Kafka
    - Machine Learning
This can be parsed as a Java Map of List of List of String. We can again load the InputStream as we did earlier, but the data is now declared as follows:
public void readYamlWithMultiLineArrayGroup() {
    InputStream inputStream = this.getClass()
        .getClassLoader()
        .getResourceAsStream("courses.yml");
    Yaml yaml = new Yaml();
    // Load yaml into a Map
    Map<String, List<List<String>>> data = yaml.load(inputStream);
    System.out.println(data);
    // Extract first sublist
    System.out.println("First Array Group:");
    data.values()
        .stream()
        .collect(Collectors.toList())
        .get(0)
        .get(0)
        .forEach(System.out::println);
    // Extract second sublist
    System.out.println("\nSecond Array Group:");
    data.values()
        .stream()
        .collect(Collectors.toList())
        .get(0)
        .get(1)
        .forEach(System.out::println);
}
So if we print the data, it looks like this:
{courses=[[C, Java, Data Structures, Algorithms], [Big Data, Spark, Kafka, Machine Learning]]}
First Array Group:
C
Java
Data Structures
Algorithms
Second Array Group:
Big Data
Spark
Kafka
Machine Learning
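If the number of groups is not known in advance, the nested lists can be walked in a loop or flattened with flatMap rather than indexed one by one. This is a sketch using only the standard library; the nested list is built by hand to mirror the value stored under courses, and the class name NestedArrayFlatten and method flatten are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class NestedArrayFlatten {

    // Merges all inner lists into one list, preserving their order
    static List<String> flatten(List<List<String>> groups) {
        return groups.stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the value stored under the "courses" key
        List<List<String>> courses = Arrays.asList(
                Arrays.asList("C", "Java", "Data Structures", "Algorithms"),
                Arrays.asList("Big Data", "Spark", "Kafka", "Machine Learning"));

        // Visit each group without hard-coding its index
        for (int i = 0; i < courses.size(); i++) {
            System.out.println("Group " + (i + 1) + ": " + courses.get(i));
        }
        System.out.println(flatten(courses));
    }
}
```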
Reading a Complex Nested YAML Content as Java Bean
We saw how to handle array content on its own, but complex nested YAML files, with maps of maps and lists of lists, are hard to parse intuitively and difficult to work with. Even in the last example, where we only had two nested lists, handling them as lists gets fairly verbose.
In these cases, it's best to create a POJO that can be mapped to the nested YAML data. Let's first create a sample YAML file containing the nested content of a website:
website: stackabuse
skills:
  - python
  - javascript
  - java
  - unix
  - machine learning
  - web development
tutorials:
  - graphs:
      name: Graphs in Python - Theory and Implementation
      tags:
        - python
        - data structures
        - algorithm
      contributors:
        - David Landup
        - Dimitrije Stamenic
        - Jovana Ninkovic
      last_updated: June 2022
  - git:
      name: Git Essentials - Developer's Guide to Git
      tags:
        - git
      contributors:
        - David Landup
        - François Dupire
        - Jovana Ninkovic
      last_updated: April 2022
  - deep learning:
      name: Practical Deep Learning for Computer Vision with Python
      tags:
        - python
        - machine learning
        - tensorflow
        - computer vision
      contributors:
        - David Landup
        - Jovana Ninkovic
      last_updated: October 2022
published: true
We need to define a parent Java class, WebsiteContent, that holds a List of skills and a List of Maps of tutorials, each of which in turn contains lists of tags and contributors:
public class WebsiteContent {
    private String website;
    private List<String> skills;
    private List<Map<String, Tutorial>> tutorials;
    private Boolean published;

    // Getters and setters

    @Override
    public String toString() {
        return "WebsiteContent{" +
            "website='" + website + '\'' +
            ", skills=" + skills +
            ", tutorials=" + tutorials +
            ", published=" + published +
            '}';
    }
}

public class Tutorial {
    private String name;
    private List<String> tags;
    private List<String> contributors;
    private String lastUpdated;

    // Getters and setters

    @Override
    public String toString() {
        return "Tutorial{" +
            "name='" + name + '\'' +
            ", tags=" + tags +
            ", contributors=" + contributors +
            ", lastUpdated='" + lastUpdated + '\'' +
            '}';
    }
}
Now we can again load the data from the file as an InputStream, as we did earlier. Next, when we create our Yaml object, we need to specify the type we want the data mapped to. Passing new Constructor(WebsiteContent.class) tells SnakeYAML to read the data from the YAML file and map it to a WebsiteContent object.
The mapping is done by name: the property names of our class must match the keys in the YAML file.
import java.io.InputStream;
import org.yaml.snakeyaml.Yaml;
import org.yaml.snakeyaml.constructor.Constructor;

public void readYamlAsBeanWithNestedArrays() {
    // Read file
    InputStream inputStream = this.getClass()
        .getClassLoader()
        .getResourceAsStream("website_content.yml");
    // YAML -> POJO
    Yaml yaml = new Yaml(new Constructor(WebsiteContent.class));
    WebsiteContent data = yaml.load(inputStream);
    // Print data
    System.out.println(data);
    System.out.println("\nList of Skills: ");
    data.getSkills().forEach(System.out::println);
    System.out.println("\nList of Tutorials: ");
    data.getTutorials().forEach(System.out::println);
}
Finally, when we print the data, it looks like this:
WebsiteContent{website='stackabuse', skills=[python, javascript, java, unix, machine learning, web development], tutorials=[{graphs={name=Graphs in Python - Theory and Implementation, tags=[python, data structures, algorithm], contributors=[David Landup, Dimitrije Stamenic, Jovana Ninkovic], last_updated=June 2022}}, {git={name=Git Essentials - Developer's Guide to Git, tags=[git], contributors=[David Landup, François Dupire, Jovana Ninkovic], last_updated=April 2022}}, {deep learning={name=Practical Deep Learning for Computer Vision with Python, tags=[python, machine learning, tensorflow, computer vision], contributors=[David Landup, Jovana Ninkovic], last_updated=October 2022}}], published=true}
List of Skills:
python
javascript
java
unix
machine learning
web development
List of Tutorials:
{graphs={name=Graphs in Python - Theory and Implementation, tags=[python, data structures, algorithm], contributors=[David Landup, Dimitrije Stamenic, Jovana Ninkovic], last_updated=June 2022}}
{git={name=Git Essentials - Developer's Guide to Git, tags=[git], contributors=[David Landup, François Dupire, Jovana Ninkovic], last_updated=April 2022}}
{deep learning={name=Practical Deep Learning for Computer Vision with Python, tags=[python, machine learning, tensorflow, computer vision], contributors=[David Landup, Jovana Ninkovic], last_updated=October 2022}}
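As we can see, SnakeYAML has parsed the file and populated the WebsiteContent object, including its skills list and its nested tutorials. Note that in this output the tutorial entries are printed in plain map form, with the last_updated key, rather than in the format of Tutorial's toString() method.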
Conclusion
As YAML files are widely used for DevOps and configuration data, it’s quite useful to be able to parse and manipulate them in code.
SnakeYAML allows us to manage YAML files in our Java project with ease, and it requires only a little code to load YAML files into our project or write data to YAML files. Additionally, SnakeYAML offers formatting options, so we can adjust its output to suit our needs.
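As a brief illustration of writing and formatting, the sketch below serializes a map back to YAML text with dump() and uses DumperOptions to request block style. It assumes SnakeYAML is on the classpath; the class name DumpSketch, the helper toBlockYaml, and the sample data are made up for illustration, and the exact layout of the output may vary between library versions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.yaml.snakeyaml.DumperOptions;
import org.yaml.snakeyaml.Yaml;

class DumpSketch {

    // Serializes the map as block-style YAML (one list item per line)
    static String toBlockYaml(Map<String, List<String>> data) {
        DumperOptions options = new DumperOptions();
        options.setDefaultFlowStyle(DumperOptions.FlowStyle.BLOCK);
        options.setIndent(2);
        return new Yaml(options).dump(data);
    }

    public static void main(String[] args) {
        Map<String, List<String>> data = new LinkedHashMap<>();
        data.put("languages", Arrays.asList("Java", "JavaScript", "Python"));
        System.out.print(toBlockYaml(data));
    }
}
```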