Guide to Java 8 Collectors: toSet()

Introduction

A stream represents a sequence of elements and supports different kinds of operations that lead to the desired result. The source of a stream is usually a Collection or an array, from which the data is streamed.

Streams differ from collections in several ways, most notably in that a stream is not a data structure that stores elements. Streams are functional in nature: operations on a stream produce a result and typically return another stream, but do not modify its source.

To "solidify" the changes, you collect the elements of a stream back into a Collection.
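As a minimal sketch of this idea, the snippet below runs a pipeline over a list and collects the result into a new list, leaving the source as it was. The class name SourceUnchangedExample and the helper upperCased() are made up for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SourceUnchangedExample {

    // Returns upper-cased copies of the words; the input list itself is not modified.
    static List<String> upperCased(List<String> words) {
        return words.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "short", "sentence");
        List<String> result = upperCased(words);

        System.out.println(words);  // [a, short, sentence]
        System.out.println(result); // [A, SHORT, SENTENCE]
    }
}
```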

In this guide, we'll take a look at how to collect Stream elements into a Set in Java 8.

Collectors and Stream.collect()

Collectors are implementations of the Collector interface. They provide various useful reduction operations, such as accumulating elements into collections, summarizing elements based on a specific parameter, etc.

All predefined implementations can be found within the Collectors class.

You can also implement your own collector and use it instead of the predefined ones. That said, you can get pretty far with the built-in collectors, as they cover the vast majority of use cases.
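To give a rough idea of what a hand-written collector can look like, here is a sketch built with the Collector.of() factory method. The collector toLinkedSet() and the class CustomCollectorExample are hypothetical names, not part of the JDK:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Stream;

class CustomCollectorExample {

    // Hypothetical custom collector: accumulates elements into a LinkedHashSet.
    static <T> Collector<T, ?, Set<T>> toLinkedSet() {
        return Collector.<T, Set<T>>of(
                LinkedHashSet::new,   // supplier: creates the empty container
                Set::add,             // accumulator: adds one element to the container
                (left, right) -> {    // combiner: merges two partial containers
                    left.addAll(right);
                    return left;
                });
    }

    public static void main(String[] args) {
        Set<String> words = Stream.of("b", "a", "b", "c").collect(toLinkedSet());
        System.out.println(words); // [b, a, c]
    }
}
```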

To be able to use the Collectors class in our code, we need to import it:

import java.util.stream.Collectors;

Stream.collect() performs a mutable reduction operation on the elements of the stream.

A mutable reduction operation collects input elements into a mutable container, such as a Collection, as it processes the elements of the stream.
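To make the idea of a mutable reduction more concrete, the sketch below uses the three-argument overload of Stream.collect(), which takes the container supplier, the accumulator, and the combiner directly. The class and method names are made up for illustration:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Stream;

class MutableReductionExample {

    // Spells out the three steps of a mutable reduction by hand.
    static Set<String> collectByHand(Stream<String> stream) {
        return stream.collect(
                HashSet::new,     // supplier: creates the mutable container
                HashSet::add,     // accumulator: folds one element into the container
                HashSet::addAll); // combiner: merges two containers (used by parallel streams)
    }

    public static void main(String[] args) {
        Set<String> result = collectByHand(Stream.of("a", "b", "a"));
        System.out.println(result.size()); // 2
    }
}
```

A predefined collector such as Collectors.toSet() simply packages these steps into a single reusable object.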

Guide to Collectors.toSet()

The toSet() method is used to collect a stream into a set. It works in a similar fashion to the toList() method, but ultimately collects into a different underlying data structure, by returning a Collector that accumulates the input elements into a new Set.

It's worth noting that there are no guarantees on the type, mutability, serializability, or thread-safety of the Set returned:

public static <T> Collector<T,?,Set<T>> toSet()
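Since the returned Set comes with no such guarantees, code that relies on a particular property may want to state it explicitly. The sketch below shows two possible approaches, using the standard Collectors.collectingAndThen() and Collections.unmodifiableSet() methods; the class and helper names are hypothetical:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SetGuaranteesExample {

    // Wraps the collected Set in a read-only view so callers cannot modify it.
    static Set<String> toReadOnlySet(Stream<String> stream) {
        return stream.collect(
                Collectors.collectingAndThen(Collectors.toSet(), Collections::unmodifiableSet));
    }

    // Copies the collected Set into a HashSet when a known, mutable type is needed.
    static Set<String> toMutableSet(Stream<String> stream) {
        return new HashSet<>(stream.collect(Collectors.toSet()));
    }

    public static void main(String[] args) {
        Set<String> readOnly = toReadOnlySet(Stream.of("a", "b"));
        Set<String> mutable = toMutableSet(Stream.of("a", "b"));

        mutable.add("c"); // fine: this is a plain HashSet
        System.out.println(mutable.size()); // 3

        try {
            readOnly.add("c");
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only set rejected the change");
        }
    }
}
```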

A Set doesn't allow duplicate elements. In more formal terms, a set contains no pair of elements a and b such that a.equals(b), and at most one null element.
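A small sketch of this contract, assuming a HashSet as the backing implementation (which accepts null): add() reports whether the element was actually inserted. The class name and the distinct() helper are made up for illustration:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SetSemanticsExample {

    // Collects the given values into a Set, dropping elements that are equal to one another.
    static Set<String> distinct(String... values) {
        return Stream.of(values).collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<>();
        System.out.println(words.add("forms")); // true: newly added
        System.out.println(words.add("forms")); // false: an equal element is already present
        System.out.println(words.add(null));    // true: the first null is accepted
        System.out.println(words.add(null));    // false: only one null is kept
        System.out.println(words.size());       // 2
    }
}
```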

Collecting a stream with duplicate elements into a Set is a quick way to prune away duplicates:

Stream<String> stream = 
    Stream.of("This", "forms", "forms", "a", "short", "a", "sentence", "sentence");
Set<String> sentenceSet = stream.collect(Collectors.toSet());

However, this example highlights an important characteristic of how Sets are populated: the elements don't necessarily retain their relative order when collected, like they do with, say, the toList() collector. This is because current JDK implementations of toSet() accumulate the elements into a HashSet, which organizes elements by their hash codes and doesn't guarantee that the iteration order stays constant over time.

If you need a specific Set implementation, you can supply one through the Collectors.toCollection() collector instead.

Running this piece of code results in:

[sentence, a, This, short, forms]
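If the order of the elements matters, one option is the standard Collectors.toCollection() collector, which accepts a supplier for the target collection. The following is a sketch under that assumption, with hypothetical class and method names, using a LinkedHashSet to keep encounter order and a TreeSet to sort the elements:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class OrderedSetExample {

    // LinkedHashSet keeps the elements in the order they were first encountered.
    static Set<String> inEncounterOrder(Stream<String> stream) {
        return stream.collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // TreeSet keeps the elements sorted by their natural ordering.
    static Set<String> sorted(Stream<String> stream) {
        return stream.collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        String[] words = {"This", "forms", "forms", "a", "short", "a", "sentence", "sentence"};

        System.out.println(inEncounterOrder(Stream.of(words))); // [This, forms, a, short, sentence]
        System.out.println(sorted(Stream.of(words)));           // [This, a, forms, sentence, short]
    }
}
```

Note that the natural ordering of Strings places uppercase letters before lowercase ones, which is why "This" comes first in the sorted result.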

Since we rarely work with just Strings or boxed primitives, let's define a simple class to represent a Book:

public class Book {
    private String title;
    private String author;
    private int releaseYear;
    private int soldCopies;

    // Constructor, getters and setters
}

And with that, let's create a simple list of a few books (with a duplicate entry):

List<Book> books = Arrays.asList(
    new Book("The Fellowship of the Ring", "J.R.R. Tolkien", 1954, 30),
    new Book("The Hobbit", "J.R.R. Tolkien", 1937, 40),
    new Book("Animal Farm", "George Orwell", 1945, 32),
    new Book("Nineteen Eighty-Four", "George Orwell", 1949, 50),
    new Book("Nineteen Eighty-Four", "George Orwell", 1949, 38)
);

A pretty typical pipeline consists of a filter() based on some Predicate before collecting back to a collection:

Set<String> booksGeorgeOrwell = books.stream()
    .filter(book -> book.getAuthor().equals("George Orwell")
            && book.getSoldCopies() >= 30)
    .map(Book::getTitle)
    .collect(Collectors.toSet());

System.out.println(booksGeorgeOrwell);

If you'd like to read more about filtering and predicates - read our Java 8 Streams: Definitive Guide to the filter() Method and Functional Programming in Java 8: Definitive Guide to Predicates!

As we discussed earlier, Sets don't allow duplicates. All three George Orwell entries in our List pass the filter, but two of them share the title Nineteen Eighty-Four, so we should have the following result:

[Animal Farm, Nineteen Eighty-Four]

Works great! The duplicate title was pruned away, without any explicit deduplication logic.
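The duplicate disappears here because the pipeline maps each book to its title, and String defines equals() and hashCode(). Collecting Book objects directly would only remove duplicates if Book overrides those methods as well. The sketch below illustrates this with a trimmed-down, hypothetical Book whose equality is assumed to be based on title and author only; this is an illustrative choice, not something the Book class above defines:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

class BookEqualityExample {

    // Hypothetical, trimmed-down Book: two books are equal if title and author match.
    static final class Book {
        private final String title;
        private final String author;
        private final int soldCopies;

        Book(String title, String author, int soldCopies) {
            this.title = title;
            this.author = author;
            this.soldCopies = soldCopies;
        }

        String getTitle() { return title; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Book)) return false;
            Book other = (Book) o;
            return title.equals(other.title) && author.equals(other.author);
        }

        @Override
        public int hashCode() {
            return Objects.hash(title, author);
        }
    }

    static List<Book> sampleBooks() {
        return Arrays.asList(
                new Book("Animal Farm", "George Orwell", 32),
                new Book("Nineteen Eighty-Four", "George Orwell", 50),
                new Book("Nineteen Eighty-Four", "George Orwell", 38));
    }

    // Relies on Book.equals() and Book.hashCode() to drop duplicates.
    static Set<Book> distinctBooks(List<Book> books) {
        return books.stream().collect(Collectors.toSet());
    }

    // Relies on String.equals() and String.hashCode() to drop duplicates.
    static Set<String> distinctTitles(List<Book> books) {
        return books.stream().map(Book::getTitle).collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(distinctBooks(sampleBooks()).size());  // 2
        System.out.println(distinctTitles(sampleBooks()).size()); // 2
    }
}
```

Without the equals() and hashCode() overrides, every Book instance would be treated as distinct, and distinctBooks() would return all three entries.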

Conclusion

In this guide, we've taken a look at how to collect a stream into a Set using Java 8's Stream API and the Collectors class.

Last Updated: January 10th, 2022