Guide to Java 8 Collectors: groupingBy()

Introduction

A stream represents a sequence of elements and supports different kinds of operations that lead to the desired result. The source of a stream is usually a Collection or an Array, from which the data is streamed.

Streams differ from collections in several ways, most notably in that a stream is not a data structure that stores elements. Streams are functional in nature: operations on a stream produce a result and typically return another stream, but do not modify its source.

To "solidify" the changes, you collect the elements of a stream back into a Collection.
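As a minimal sketch of that idea (the class and method names here are made up for illustration), the following maps a list through a stream and collects the result into a new list, leaving the source list as it was:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamSourceDemo {

    // Returns a new list of upper-cased names; the input list is left untouched.
    static List<String> upperCased(List<String> names) {
        return names.stream()
                .map(String::toUpperCase)      // intermediate operation: returns another stream
                .collect(Collectors.toList()); // terminal operation: gathers the results
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("John", "Mike", "Kyle");
        List<String> result = upperCased(names);

        System.out.println(names);  // [John, Mike, Kyle]
        System.out.println(result); // [JOHN, MIKE, KYLE]
    }
}
```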

Collectors and Stream.collect()

Collectors represent implementations of the Collector interface, which implements various useful reduction operations, such as accumulating elements into collections, summarizing elements based on a specific parameter, etc.

All predefined implementations can be found within the Collectors class.

You can also implement your own collector and use it instead of the predefined ones. That said, the built-in collectors get you pretty far, as they cover the vast majority of common use cases.
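To give a rough idea of what a custom collector can look like, here is a small sketch built with the Collector.of() factory method. The pipeJoining() collector is a hypothetical example invented for this illustration; it simply joins strings with a " | " separator:

```java
import java.util.StringJoiner;
import java.util.stream.Collector;
import java.util.stream.Stream;

class CustomCollectorDemo {

    // Hypothetical custom collector: joins the stream's strings with " | ".
    static Collector<String, StringJoiner, String> pipeJoining() {
        return Collector.of(
                () -> new StringJoiner(" | "), // supplier: creates the mutable container
                StringJoiner::add,             // accumulator: adds one element to it
                StringJoiner::merge,           // combiner: merges two partial containers
                StringJoiner::toString);       // finisher: converts the container to the result
    }

    static String join(Stream<String> stream) {
        return stream.collect(pipeJoining());
    }

    public static void main(String[] args) {
        System.out.println(join(Stream.of("a", "b", "c"))); // a | b | c
    }
}
```

The rest of this guide sticks to the predefined collectors from the Collectors class.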

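The examples in this guide call these methods through the class name, as in Collectors.groupingBy(), so we need to import the Collectors class:

import java.util.stream.Collectors;

Alternatively, a static import lets us drop the Collectors. prefix and call groupingBy() directly:

import static java.util.stream.Collectors.*;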

Stream.collect() performs a mutable reduction operation on the elements of the stream.

A mutable reduction operation collects input elements into a mutable container, such as a Collection, as it processes the elements of the stream.
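To make the "mutable container" part concrete, here is a small sketch using the three-argument form of Stream.collect(), which takes a supplier, an accumulator and a combiner. The class and method names are placeholders for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class MutableReductionDemo {

    // Collects the stream into an ArrayList by spelling out the three parts
    // of a mutable reduction.
    static List<String> toArrayList(Stream<String> stream) {
        return stream.collect(
                ArrayList::new,     // supplier: creates the empty mutable container
                ArrayList::add,     // accumulator: adds one element to the container
                ArrayList::addAll); // combiner: merges two partial containers (used by parallel streams)
    }

    public static void main(String[] args) {
        System.out.println(toArrayList(Stream.of("a", "b", "c"))); // [a, b, c]
    }
}
```

The predefined collectors bundle these parts for us, so in practice we'd typically write collect(Collectors.toList()) instead.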

We'll be using Stream.collect() quite often in this guide, paired with the Collectors.groupingBy() collector.

Collectors.groupingBy()

The Collectors class is vast and versatile, and one of its many methods, which is also the main topic of this article, is Collectors.groupingBy(). This method gives us functionality similar to the "GROUP BY" clause in SQL.

We use Collectors.groupingBy() to group objects by a given property and store the end result in a map.

Let's define a simple class with a few fields, a classic constructor and getters/setters. We'll be using this class to group Student instances by their subject, city and age:

public class Student {
    private String subject;
    private String name;
    private String surname;
    private String city;
    private int age;

   // Constructors, Getters, Setters, toString()
}
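The guide leaves the constructor, accessors and toString() out. One possible way to fill them in is sketched below. This is an assumption, not the original class: the constructor argument order is inferred from the constructor calls used in this guide, the toString() format from the printed output, and the setters are omitted for brevity:

```java
class Student {
    private String subject;
    private String name;
    private String surname;
    private String city;
    private int age;

    // Argument order assumed from calls such as
    // new Student("Math", "John", "Smith", "Miami", 19)
    public Student(String subject, String name, String surname, String city, int age) {
        this.subject = subject;
        this.name = name;
        this.surname = surname;
        this.city = city;
        this.age = age;
    }

    public String getSubject() { return subject; }
    public String getName() { return name; }
    public String getSurname() { return surname; }
    public String getCity() { return city; }
    public int getAge() { return age; }

    // Format assumed from the sample output: only name and surname are printed
    @Override
    public String toString() {
        return "Student{name='" + name + "', surname='" + surname + "'}";
    }
}
```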

Let's instantiate a List of students we'll be using in the examples to come:

List<Student> students = Arrays.asList(
    new Student("Math", "John", "Smith", "Miami", 19),
    new Student("Programming", "Mike", "Miles", "New York", 21),
    new Student("Math", "Michael", "Peterson", "New York", 20),
    new Student("Math", "James", "Robertson", "Miami", 20),
    new Student("Programming", "Kyle", "Miller", "Miami", 20)
);

The Collectors.groupingBy() method has three overloads within the Collectors class, each building upon the other. We'll cover each one in the following sections.

Collectors.groupingBy() with a Classification Function

The first variant of the Collectors.groupingBy() method takes only one parameter - a classification function. Its syntax is as follows:

public static <T,K> Collector<T,?,Map<K,List<T>>> 
    groupingBy(Function<? super T,? extends K> classifier)

This method returns a Collector that groups the input elements of type T according to the classification function, and returns the result in a Map.

The classification function maps elements to a key of type K. As we mentioned, the collector makes a Map<K, List<T>>, whose keys are the values resulting from applying the classification function on the input elements. The values of those keys are Lists containing the input elements which map to the associated key.
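Before applying this to students, the key/value relationship can be seen in a stripped-down sketch that groups plain strings by their length. The words and names are arbitrary sample data chosen for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupByLengthDemo {

    // The classifier's result (the word's length) becomes the map key, and every
    // word that produced that key lands in the key's List.
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(String::length));
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> grouped =
                byLength(Arrays.asList("one", "two", "three", "four", "five"));

        System.out.println(grouped.get(3)); // [one, two]
        System.out.println(grouped.get(4)); // [four, five]
        System.out.println(grouped.get(5)); // [three]
    }
}
```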

This is the simplest variant of the three. That's not to say the others are more difficult to understand; this one simply takes the fewest arguments.

Let's group our students into groups of students by their subjects:

Map<String, List<Student>> studentsBySubject = students
    .stream()
    .collect(
        Collectors.groupingBy(Student::getSubject)
    );

After this statement executes, we have a Map<K, V> where, in our case, K is either Math or Programming, and V is the List of Student objects whose subject is K. Now, if we just printed our studentsBySubject map, we'd see two groups with a couple of students each:

{
Programming=[Student{name='Mike', surname='Miles'}, Student{name='Kyle', surname='Miller'}], 
Math=[Student{name='John', surname='Smith'}, Student{name='Michael', surname='Peterson'}, Student{name='James', surname='Robertson'}]
}

This matches what we'd expect from the input: there are 2 students currently taking a Programming class, and 3 taking Math. The order of the keys is not guaranteed, since this overload makes no promises about the type of Map it returns.

Collectors.groupingBy() with a Classification Function and Downstream Collector

When just grouping isn't quite enough - you can also supply a downstream collector to the groupingBy() method:

public static <T,K,A,D> Collector<T,?,Map<K,D>> 
    groupingBy(Function<? super T,? extends K> classifier, 
               Collector<? super T,A,D> downstream)

This method returns a Collector that groups the input elements of type T according to the classification function, afterwards applying a reduction operation on the values associated with a given key using the specified downstream Collector.

As mentioned earlier, the reduction operation "reduces" the data we've collected by applying an operation that's useful in a specific situation.

If you'd like to read more about reduction in Java in great detail - read our Java 8 Streams: Definitive Guide to reduce()!

In this example we want to group the students by the city they're from, but not the entire Student objects. Say we'd like to just collect their names (reduce them to a name).

As the downstream collector here, we'll be using the Collectors.mapping() method, which takes two parameters:

  • A mapper - a function to be applied to the input elements and
  • A downstream collector – a collector which will accept mapped values

Collectors.mapping() itself does a pretty straightforward job. It adapts a collector accepting elements of one type to accept a different type by applying a mapping function to each input element before accumulation. In our case, we'll map each Student to their name and return those names as a list.
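The adapting behaviour can be sketched on plain strings first. In this illustrative example (sample words and names are assumptions, and the words are assumed to be non-empty), words are grouped by their first letter, but each group stores the words' lengths rather than the words themselves:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MappingDownstreamDemo {

    // mapping() converts every String to an Integer (its length) before
    // toList() accumulates it, so the groups hold Integers, not Strings.
    static Map<Character, List<Integer>> lengthsByFirstLetter(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(
                        word -> word.charAt(0),
                        Collectors.mapping(String::length, Collectors.toList())));
    }

    public static void main(String[] args) {
        Map<Character, List<Integer>> lengths = lengthsByFirstLetter(
                Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry"));

        System.out.println(lengths.get('a')); // [5, 7]
        System.out.println(lengths.get('b')); // [6, 9]
        System.out.println(lengths.get('c')); // [6]
    }
}
```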

For simplicity's sake, since our list holds only 5 students, Miami and New York are the only cities. To group the students in the previously mentioned manner, we need to run the following piece of code:

Map<String, List<String>> studentsByCity = students.stream()
              .collect(Collectors.groupingBy(
                  Student::getCity, 
                  Collectors.mapping(Student::getName, Collectors.toList())));
	
System.out.println(studentsByCity);

Note: instead of a List<String> we could've used a Set<String>, for instance. If we opt for that, we'd also need to replace the toList() part of our code with toSet().

This time around, we'll have a Map of cities, with a list of student names associated with a city. These are reductions of students, where we've reduced them to a name, though you could substitute this with any other reduction operation as well:

{New York=[Mike, Michael], Miami=[John, James, Kyle]}

Collectors.groupingBy() with Collectors.counting()

Again, reduction operations are very powerful and can be used to find the minimum, maximum, average, sums, as well as otherwise reduce collections into smaller cohesive wholes.

There's a wide variety of operations you can do via reduction, and if you'd like to learn more about the possibilities, again, read our Java 8 Streams: Guide to reduce()!

Instead of reducing students to their names, we can reduce lists of students to their counts, for instance, which can easily be achieved through Collectors.counting() as a wrapper for a reduction operation:

Map<Integer, Long> countByAge = students.stream()
                .collect(Collectors.groupingBy(
                    Student::getAge, 
                    Collectors.counting()));

The countByAge map will now contain groups of students, grouped by their age, and the values of these keys will be the count of students in each group:

{19=1, 20=3, 21=1}

Again, there's a wide variety of things you can do with reduction operations, and this is just a single facet of that.
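As one more illustration of a reducing downstream collector, the sketch below uses Collectors.summarizingInt() to compute the minimum, maximum, average, sum and count of ages per subject in a single pass. The nested Student class is a trimmed-down stand-in carrying only the fields needed here, with the same subjects and ages as the sample list in this guide:

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class AgeStatsDemo {

    // Trimmed-down stand-in for the guide's Student class (only the fields used here).
    static class Student {
        private final String subject;
        private final int age;

        Student(String subject, int age) {
            this.subject = subject;
            this.age = age;
        }

        String getSubject() { return subject; }
        int getAge() { return age; }
    }

    // Same subjects and ages as the guide's sample list.
    static List<Student> sampleStudents() {
        return Arrays.asList(
                new Student("Math", 19),
                new Student("Programming", 21),
                new Student("Math", 20),
                new Student("Math", 20),
                new Student("Programming", 20));
    }

    // Groups by subject, then reduces each group to summary statistics of its ages.
    static Map<String, IntSummaryStatistics> ageStatsBySubject(List<Student> students) {
        return students.stream()
                .collect(Collectors.groupingBy(
                        Student::getSubject,
                        Collectors.summarizingInt(Student::getAge)));
    }

    public static void main(String[] args) {
        Map<String, IntSummaryStatistics> stats = ageStatsBySubject(sampleStudents());

        System.out.println(stats.get("Math").getMin());            // 19
        System.out.println(stats.get("Math").getMax());            // 20
        System.out.println(stats.get("Programming").getAverage()); // 20.5
    }
}
```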

Multiple Collectors.groupingBy()

Another powerful application of the downstream collector is passing it another Collectors.groupingBy(), which nests one grouping inside another.

Say we want to first filter our students by age (keeping those aged 20 or older), and then group them by their age. Each of these groups will contain additional groups of students, grouped by their cities:

{
20={New York=[Student{name='Michael', surname='Peterson'}], Miami=[Student{name='James', surname='Robertson'}, Student{name='Kyle', surname='Miller'}]}, 
21={New York=[Student{name='Mike', surname='Miles'}]}
}
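A grouping like the one above could be produced along these lines. This is a sketch rather than the exact code behind the output shown: it assumes the filter keeps students aged 20 or older (which is what the output reflects), and it uses a trimmed-down Student stand-in with only the fields needed here:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NestedGroupingDemo {

    // Trimmed-down stand-in for the guide's Student class.
    static class Student {
        private final String name;
        private final String city;
        private final int age;

        Student(String name, String city, int age) {
            this.name = name;
            this.city = city;
            this.age = age;
        }

        String getName() { return name; }
        String getCity() { return city; }
        int getAge() { return age; }
    }

    // Same names, cities and ages as the guide's sample list.
    static List<Student> sampleStudents() {
        return Arrays.asList(
                new Student("John", "Miami", 19),
                new Student("Mike", "New York", 21),
                new Student("Michael", "New York", 20),
                new Student("James", "Miami", 20),
                new Student("Kyle", "Miami", 20));
    }

    // Outer grouping by age; the downstream collector is another groupingBy() by city.
    static Map<Integer, Map<String, List<Student>>> byAgeThenCity(List<Student> students) {
        return students.stream()
                .filter(student -> student.getAge() >= 20) // assumed threshold, see lead-in
                .collect(Collectors.groupingBy(
                        Student::getAge,
                        Collectors.groupingBy(Student::getCity)));
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, List<Student>>> grouped = byAgeThenCity(sampleStudents());

        System.out.println(grouped.containsKey(19));            // false
        System.out.println(grouped.get(20).get("Miami").size()); // 2
        System.out.println(grouped.get(21).keySet());            // [New York]
    }
}
```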

If you'd like to read more about the filtering, read our Java 8 Streams: Guide to filter()!

Collectors.groupingBy() with a Classification Function, Downstream Collector and Supplier

The third and final overloaded groupingBy() variant takes the same two parameters as before, plus one more: a supplier, passed between the classifier and the downstream collector.

The supplier provides the specific Map implementation we want to use to contain our end result:

public static <T,K,D,A,M extends Map<K,D>> Collector<T,?,M> 
    groupingBy(Function<? super T,? extends K> classifier,
               Supplier<M> mapFactory,
               Collector<? super T,A,D> downstream)

This implementation differs from the previous one only slightly. It returns a Collector that groups the input elements of type T according to the classification function, and then applies a reduction operation on the values associated with a given key using the specified downstream Collector. The resulting Map is created by the supplied mapFactory.

For this example, we'll adapt the earlier names-by-city example:

Map<String, List<String>> namesByCity = students.stream()
                .collect(Collectors.groupingBy(
                        Student::getCity,
                        TreeMap::new, 
                        Collectors.mapping(Student::getName, Collectors.toList())));

Note: We could've used any other Map implementation that Java offers - like a HashMap or a LinkedHashMap as well.

To recap, this code gives us the students' names grouped by the city they're from, and since we're using a TreeMap here, the city names will be sorted.

The only difference from earlier is that we've added another parameter - TreeMap::new that specifies the exact implementation of Map we want to use:

{Miami=[John, James, Kyle], New York=[Mike, Michael]}
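The effect of the chosen Map implementation is easier to see with keys that don't happen to arrive in sorted order. The sketch below uses arbitrary sample words (not from this guide) grouped by length, once into a TreeMap and once into a LinkedHashMap:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class MapFactoryDemo {

    // TreeMap keeps the keys sorted by their natural order.
    static Map<Integer, List<String>> sortedByLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(
                        String::length, TreeMap::new, Collectors.toList()));
    }

    // LinkedHashMap keeps the keys in the order they were first encountered.
    static Map<Integer, List<String>> inEncounterOrder(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(
                        String::length, LinkedHashMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "plum", "kiwi", "apple");

        System.out.println(sortedByLength(words));   // {3=[fig], 4=[pear, plum, kiwi], 5=[apple]}
        System.out.println(inEncounterOrder(words)); // {4=[pear, plum, kiwi], 3=[fig], 5=[apple]}
    }
}
```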

This makes the process of collecting streams into maps much easier than having to stream again and re-insert elements back using a different implementation, such as:

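Map<String, List<String>> namesByCity = students.stream()
        // First pass: group the names by city into a default, unordered map
        .collect(Collectors.groupingBy(
                Student::getCity,
                Collectors.mapping(Student::getName, Collectors.toList())))
        .entrySet()
        .stream()
        // Second pass: sort the entries by key and copy them into an ordered map
        .sorted(Map.Entry.comparingByKey())
        .collect(Collectors.toMap(
                Map.Entry::getKey,
                Map.Entry::getValue,
                (a, b) -> {
                    // Keys are already unique here, so a merge should never happen
                    throw new AssertionError();
                },
                LinkedHashMap::new));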

Long, convoluted, multiple-streamed code like this can be fully replaced with a much simpler overloaded version when you use a Supplier.

This piece of code also results in the same output as before:

{Miami=[John, James, Kyle], New York=[Mike, Michael]}

Conclusion

The Collectors class is a powerful one and allows us to collect streams into collections in various ways.

You can define your own collectors, but the built-in ones can get you very far, as they're generic enough to cover the vast majority of tasks you can think of.

In this guide, we've taken a look at the groupingBy() collector, which groups entities based on a classification function (usually boiling down to a field of an object), as well as its overloaded variants.

You've learned how to use the basic form, as well as forms with downstream collectors and suppliers to simplify code and run powerful yet simple functional operations on streams.

Last Updated: November 28th, 2021

    © 2013-2021 Stack Abuse. All rights reserved.