Guide to Java 8 Collectors: collectingAndThen()

Introduction

A stream represents a sequence of elements and supports different kinds of operations that lead to the desired result. The source of a stream is usually a Collection or an Array, from which data is streamed.

Streams differ from collections in several ways, most notably in that a stream is not a data structure that stores elements. Streams are functional in nature: operations on a stream produce a result and typically return another stream, but do not modify its source.

To "solidify" the changes, you collect the elements of a stream back into a Collection.

Collectors represent implementations of the Collector interface, which implements various useful reduction operations, such as accumulating elements into collections, summarizing elements based on a specific parameter, etc.

All predefined implementations can be found within the Collectors class.

You can also implement your own collector and use it instead of the predefined ones. Still, you can get pretty far with the built-in collectors, as they cover the vast majority of use cases.

To be able to use the class in our code we need to import it:

import static java.util.stream.Collectors.*;

Stream.collect() performs a mutable reduction operation on the elements of the stream.

A mutable reduction operation collects input elements into a mutable container, such as a Collection, as it processes the elements of the stream.

In this guide, we'll be taking a deep dive into the collectingAndThen() collector.

What Does collectingAndThen() Do?

The collectingAndThen() operation accepts two parameters:

collectingAndThen(Collector d, Function f);

It first runs a pre-existing collector, d, and then applies a final function, f, to the result of d.

Let's take a quick look at how we could use the collectingAndThen() method on a stream of integers:

Stream<Integer> s = Stream.of(12, 13, 14, 15);

Now, assume that you want to collect these values into an unmodifiable list of Integer objects. As a first attempt, we'd create a list of the Integer values:

List<Integer> list = Stream.of(12, 13, 14, 15)
    .collect(
    //Supplier
    () -> new ArrayList<Integer>(),
    //Accumulator
    (l, e) -> l.add(e),
    //Combiner
    (l, ar) -> l.addAll(ar)
);        

We've collected the stream's elements into a list using three parameters:

  • Supplier

  • Accumulator

  • Combiner

Still, for such a simple step, this is a bit too verbose. Luckily, we have the toList() method in the Collectors helper class. We could thus simplify the step by writing:

list = Stream.of(12, 13, 14, 15).collect(toList());

Granted, we've compacted the code into one line. Yet, when we check the class of the list that we've produced by:

System.out.println(list.getClass().getSimpleName());

This results in:

ArrayList

We wanted an unmodifiable list, and ArrayList isn't one. A simple fix is to call the unmodifiableList() method from Collections:

List<Integer> ul = Collections.unmodifiableList(list);

And on checking what class we've got as a result:

System.out.println(ul.getClass().getSimpleName());

We get the output:

UnmodifiableRandomAccessList

So, what is an UnmodifiableRandomAccessList? When you check the JDK's source code, you'll see that it extends UnmodifiableList.

As for UnmodifiableList, the Javadoc of Collections.unmodifiableList() describes it this way:

Returns an unmodifiable view of the specified list. This [class] allows modules to provide users with "read-only" access to internal lists

Thus far, we seem to have fulfilled our aim of creating an unmodifiable list from a stream of integer values, but we've had to do a lot of work for it.

This is the exact scenario that Java attempts to remedy with collectingAndThen().

What we want to do is collect the integers, and then do something else (convert the list into an unmodifiable one), which is exactly what we can do with collectingAndThen():

ul = Stream.of(12, 13, 14, 15)
    .collect(
    Collectors.collectingAndThen(
        Collectors.toList(),
        Collections::unmodifiableList
    )
);

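And, our result, ul, is once again an unmodifiable list: an UnmodifiableRandomAccessList (which extends UnmodifiableList) whenever toList() produces an ArrayList, as it did above. Occam's Razor strikes again! Though, there's a lot more to be said about the method.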
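To see what "unmodifiable" means in practice, here is a minimal sketch. The class and method names (UnmodifiableDemo, collectUnmodifiable, rejectsAdd) are made up for illustration. It collects the same four integers and then checks that the wrapped list rejects an add() call:

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UnmodifiableDemo {

    // Collects the values, then wraps the resulting list in a read-only view.
    static List<Integer> collectUnmodifiable() {
        return Stream.of(12, 13, 14, 15)
            .collect(Collectors.collectingAndThen(
                Collectors.toList(),
                Collections::unmodifiableList
            ));
    }

    // Returns true if the list refuses a structural change.
    static boolean rejectsAdd(List<Integer> list) {
        try {
            list.add(16);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> ul = collectUnmodifiable();
        System.out.println(ul);
        System.out.println("Rejects add(): " + rejectsAdd(ul));
    }
}
```

Reading from the list works as usual. Only mutating calls such as add() or remove() fail.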

How does it really work? Is it efficient? When should you use it? How do we put it into practice?

This guide aims to answer all of these questions.

Definition of collectingAndThen()

Method Signature

The collectingAndThen() method is a factory method in the helper class - Collectors, a part of the Stream API:

public static <T, A, R, RR> Collector<T, A, RR> collectingAndThen(
    Collector<T, A, R> downstream, 
    Function<R, RR> finisher
) {...}

Whereby the parameters represent:

  • downstream: the initial collector that accumulates the stream's elements.
  • finisher: the function that is applied to the final result of downstream.

And, the generic types represent:

  • T: the type of the stream's elements.
  • A: the mutable accumulation type that downstream uses internally while collecting.
  • R: the result type that downstream produces when it finishes collecting.
  • RR: the result type after finisher is applied to the result of downstream.

And, the return value is:

  • Collector<T, A, RR>: a collector that performs the collection of downstream and then applies finisher to its result.
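To make the type parameters more concrete, the following sketch spells each of them out with explicit variable types. The class name GenericTypesDemo and the method countWords() are made up for this example:

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GenericTypesDemo {

    static int countWords() {
        // T = String, A = ? (toList() hides its accumulation type), R = List<String>
        Collector<String, ?, List<String>> downstream = Collectors.toList();

        // Takes R = List<String> and produces RR = Integer
        Function<List<String>, Integer> finisher = List::size;

        // The adapted collector: T = String, A = ?, RR = Integer
        Collector<String, ?, Integer> sizeCollector =
            Collectors.collectingAndThen(downstream, finisher);

        return Stream.of("a", "b", "c").collect(sizeCollector);
    }

    public static void main(String[] args) {
        System.out.println(countWords());
    }
}
```

In everyday code you would rarely declare these types yourself, since the compiler infers them. Writing them out once simply shows where each letter of the signature ends up.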

Description

The official Javadoc states that the collectingAndThen() method is useful because it:

Adapts a Collector to perform an additional finishing transformation.

There's not much to be added to this - we oftentimes perform actions on collections after collecting them - and this makes it much easier and less verbose!

How Does collectingAndThen() Work?

The following UML activity diagram summarizes the flow of control in a collectingAndThen() operation. It's a high-level abstraction of what typically occurs in such an operation. Nonetheless, it shows how routines work in the streaming, collecting, and finishing steps.
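In code form, the same flow can be approximated with Collector.of(). This is a hypothetical re-implementation for illustration only, not the JDK's actual source. The name myCollectingAndThen is made up, and the sketch ignores collector characteristics:

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class AndThenSketch {

    // Illustrative stand-in for Collectors.collectingAndThen().
    static <T, A, R, RR> Collector<T, A, RR> myCollectingAndThen(
            Collector<T, A, R> downstream,
            Function<R, RR> finisher) {
        return Collector.of(
            downstream.supplier(),     // 1. create the mutable container
            downstream.accumulator(),  // 2. add each streamed element to it
            downstream.combiner(),     // 3. merge partial containers (parallel runs)
            // 4. let downstream finish, then apply the extra function
            downstream.finisher().andThen(finisher)
        );
    }

    static int countViaSketch() {
        Integer size = Stream.of("a", "b", "c")
            .collect(myCollectingAndThen(Collectors.<String>toList(), List::size));
        return size;
    }

    public static void main(String[] args) {
        System.out.println(countViaSketch());
    }
}
```

The key point is step 4: the streaming and collecting steps stay exactly as the downstream collector defines them, and only the finishing step gains an extra function.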

When Should You Use collectingAndThen()?

1. When we need an object type other than what a single collect() operation offers:

List<Integer> list = Arrays.asList(1, 2, 3);

Boolean empty = list.stream()
    .collect(collectingAndThen(
        toList(),
        List::isEmpty
    )
);

Here, we managed to get a Boolean out of the List that collect() would've returned.

2. When we need to postpone processing until we can encounter all the elements in a given stream:

String longestName = people.stream()
    .collect(collectingAndThen(
        // Encounter all the Person objects 
        // Map them to their first names
        // Collect those names in a list
        mapping(
            Person::getFirstName,
            toList()
        ),
        // Stream those names again
        // Find the longest name
        // If not available, return "?"
        l -> {
            return l
                .stream()
                .collect(maxBy(
                    comparing(String::length)
                ))
                .orElse("?");
        }
    )
);

Here, for example, we only calculated the longest string after we read all the Person names.

3. And, when we need to wrap a list to make it unmodifiable:

List<Integer> ul = Stream.of(12, 13, 14, 15)
    .collect(
    Collectors.collectingAndThen(
        Collectors.toList(),
        Collections::unmodifiableList
    )
);

Is collectingAndThen() Efficient?

In some use cases, you can replace a collectingAndThen() operation with other stream operations without changing the result of your method. This raises the question: does collectingAndThen() offer fast runtimes?

For example, assume you have a collection of names and you want to know which among them is the longest. Let's create a Person class, which would contain somebody's full name: first and last:

public class Person {
    private final String first;
    private final String last;
    
    // Constructor and getters (the fields are final, so there are no setters)
}

And say you've got an ExecutionPlan that generates quite a few Person objects:

@State(Scope.Benchmark)
public class ExecutionPlan {
    private List<Person> people;
    
    @Param({"10", "100", "1000", "10000", "100000"})
    int count;
    
    @Setup(Level.Iteration)
    public void setup() {
        people = new ArrayList<>();        
        Name fakeName = new Faker().name();
        
        for (int i = 0; i < count; i++) {
            String fName = fakeName.firstName();
            String lName = fakeName.lastName();
            Person person = new Person(fName, lName);
            
            people.add(person);
        }
    }
    
    public List<Person> getPeople() {
        return people;
    }
}

Note: To easily generate many fake objects with sensible names - we use the Java Faker library. You can also include it in your Maven projects.

The ExecutionPlan class dictates the number of Person objects that you can test. When run in a test harness (JMH), the count field makes the for loop in setup() create that many Person objects.

We will find the longest first name using two approaches:

  1. Using the Stream API's intermediate operation, sorted().
  2. Using collectingAndThen().

The first approach uses the withoutCollectingAndThen() method:

public void withoutCollectingAndThen() {
    // Typed comparator: a raw Comparator would erase the stream's element type
    Comparator<String> nameLength = Comparator.comparing(String::length)
        .reversed();
    
    String longestName = people
        .stream()
        .map(Person::getFirstName)
        .sorted(nameLength)
        .findFirst()
        .orElse("?");
}

This approach maps a stream of Person objects to their first names. Then, it sorts the names by length in descending order. It uses the static comparing() method from the Comparator interface. Because comparing() would sort in ascending order, we call reversed() on it. This makes the stream start with the longest name and end with the shortest.

We conclude the operation by calling findFirst(), which selects the first, longest name. Also, because the result is an Optional, we transform it to a String with orElse().

The second approach uses the withCollectingAndThen() method:

public void withCollectingAndThen() {    
    // Typed collector: a raw Collector would make collect() return Object
    Collector<String, ?, String> collector = collectingAndThen(
        Collectors.maxBy(Comparator.comparing(String::length)),
        s -> s.orElse("?")
    );
    
    String longestName = people.stream()
        .map(Person::getFirstName)
        .collect(collector);        
}

This approach is more concise because it contains the downstream collector, maxBy(), so we don't have to sort, reverse, and find the first element. This method is one of the Collectors class' many static methods. It's convenient to use because it returns one element only from a stream - the element with the largest value. The only thing that's left to us is to supply a Comparator implementation to help it work out this value.

In our case, we're looking for the String with the longest length so we use a Comparator.comparing(String::length). Here too, we need to deal with an Optional. The maxBy() operation produces one, which we then turn into a bare String in the finisher step.
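The two approaches can be compared side by side in a small, self-contained sketch. It uses plain strings instead of Person objects, and the class name LongestNameDemo and the sample names are made up for illustration:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class LongestNameDemo {

    // Sample data, invented for this example.
    static final List<String> NAMES =
        Arrays.asList("Ann", "Christopher", "Bob", "Maria");

    // Approach 1: sort by descending length and take the first element.
    static String withSorted(List<String> names) {
        return names.stream()
            .sorted(Comparator.comparing(String::length).reversed())
            .findFirst()
            .orElse("?");
    }

    // Approach 2: reduce to the maximum, then unwrap the Optional.
    static String withCollectingAndThen(List<String> names) {
        return names.stream()
            .collect(Collectors.collectingAndThen(
                Collectors.maxBy(Comparator.comparing(String::length)),
                longest -> longest.orElse("?")
            ));
    }

    public static void main(String[] args) {
        System.out.println(withSorted(NAMES));
        System.out.println(withCollectingAndThen(NAMES));
    }
}
```

Both methods return the same name for this input, and both fall back to "?" for an empty list. The difference lies in the work done: the first one orders every element, while the second only keeps track of the current maximum.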

If we benchmark these two methods on 10, 100, 1000, 10000 and 100000 Person instances using JMH - we get a pretty clear result:

Benchmark                                            (count)   Mode  Cnt        Score   Error  Units
CollectingAndThenBenchmark.withCollectingAndThen          10  thrpt    2  7078262.227          ops/s
CollectingAndThenBenchmark.withCollectingAndThen         100  thrpt    2  1004389.120          ops/s
CollectingAndThenBenchmark.withCollectingAndThen        1000  thrpt    2    85195.997          ops/s
CollectingAndThenBenchmark.withCollectingAndThen       10000  thrpt    2     6677.598          ops/s
CollectingAndThenBenchmark.withCollectingAndThen      100000  thrpt    2      317.106          ops/s
CollectingAndThenBenchmark.withoutCollectingAndThen       10  thrpt    2  4131641.252          ops/s
CollectingAndThenBenchmark.withoutCollectingAndThen      100  thrpt    2   294579.356          ops/s
CollectingAndThenBenchmark.withoutCollectingAndThen     1000  thrpt    2    12728.669          ops/s
CollectingAndThenBenchmark.withoutCollectingAndThen    10000  thrpt    2     1093.244          ops/s
CollectingAndThenBenchmark.withoutCollectingAndThen   100000  thrpt    2       94.732          ops/s

Note: In throughput mode (thrpt), the JMH score is not the time it takes to execute a benchmarked operation. It is the number of operations completed per second, so the higher the number is, the better.

When you test with ten Person objects, collectingAndThen() runs about 1.7 times as fast as sorted(). Whereas collectingAndThen() can run 7,078,262 operations in a second, sorted() runs 4,131,641.

But, with ten thousand of those objects, collectingAndThen() displays even more impressive results. It runs six times as fast as sorted()! On larger datasets, it very clearly outperforms the first option, so if you're dealing with many records, you'll gain significant performance benefits from collectingAndThen().
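The ratios quoted above follow directly from the throughput scores in the table. As a quick check, this small sketch (the class name SpeedupDemo is made up) divides the collectingAndThen() score by the sorted() score for three of the dataset sizes:

```java
class SpeedupDemo {

    // Ratio of two throughput scores (ops/s); values above 1 favour the first.
    static double speedup(double withScore, double withoutScore) {
        return withScore / withoutScore;
    }

    public static void main(String[] args) {
        // Scores copied from the benchmark table above.
        System.out.printf("count=10:     %.2fx%n", speedup(7078262.227, 4131641.252));
        System.out.printf("count=10000:  %.2fx%n", speedup(6677.598, 1093.244));
        System.out.printf("count=100000: %.2fx%n", speedup(317.106, 94.732));
    }
}
```

Keep in mind that the table reports only two measurement iterations (Cnt 2) and no error column, so the exact ratios are best read as rough indications.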

Find the complete test results' report on GitHub. The entire test harness is also in this GitHub repository. Go ahead and clone it, run it on your local machine, and compare the results.

Putting collectingAndThen() to Practice - Indoor Pollution Dataset Analysis

So far, we've seen that collectingAndThen() can adapt a collector with an extra step. Yet, this capability is even more powerful than you may think. You can nest collectingAndThen() within other operations that also return Collector instances. And remember, collectingAndThen() returns a Collector too. So, you can nest these other operations in it too:

stream.collect(
    groupingBy(
        outerClassifier,
        groupingBy(
            innerClassifier,
            collectingAndThen(
                downstream,
                finisher
            )
        )
    )
);

This possibility opens up a slew of code design options. You can, for example, use it to group a stream's elements. Or, to partition them according to a given Predicate.
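As a small, runnable instance of such nesting, the sketch below groups words by their first letter, then by their length, and finally converts each group's Long count into an Integer. The class name NestedCollectorsDemo and the sample words are made up for illustration:

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NestedCollectorsDemo {

    // first letter -> (word length -> number of words)
    static Map<Character, Map<Integer, Integer>> group(Stream<String> words) {
        return words.collect(
            Collectors.groupingBy(
                (String word) -> word.charAt(0),
                Collectors.groupingBy(
                    String::length,
                    // counting() yields a Long; the finisher narrows it to Integer
                    Collectors.collectingAndThen(
                        Collectors.counting(),
                        Long::intValue
                    )
                )
            )
        );
    }

    public static void main(String[] args) {
        System.out.println(
            group(Stream.of("apple", "avocado", "bean", "beet", "apricot"))
        );
    }
}
```

Here collectingAndThen() sits at the innermost level, so its finisher runs once for every (letter, length) group.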

If you'd like to read more about Predicates - read our Functional Programming in Java 8: Definitive Guide to Predicates!

We will see how this works using data of the deaths that indoor air pollution causes. This data contains the mortality rates per 100,000 people. Our World in Data (OWID) has categorized it by age and by year. It contains findings from most of the world's countries and regions. Also, it covers the years from 1990 to 2017.

Domain Design

The domain contains three main classes: Mortality, CountryStats, and StatsSource. The Mortality class contains two fields: the ageGroup and mortality. In essence, the Mortality class is a value class.

See, we have the option of dealing with ageGroup and mortality values on their own. Yet, that's only bound to clutter up the client code. The String values representing age groups wouldn't make sense when you use them on their own. The same applies to the BigDecimal values representing mortality figures.

But, when you use these two together, they clarify what your domain is all about:

public class Mortality implements Comparable<Mortality> {
    private final String ageGroup;
    private final BigDecimal mortality;
    
    //Constructor and getters...
    
    @Override
    public int compareTo(Mortality other) {
        return Comparator.comparing(Mortality::getMortality)
            .compare(this, other);
    }
}

This class also implements the Comparable interface. This is important because it helps us sort Mortality objects. The next class, CountryStats, contains mortality data for different age groups. It's another value class that holds the name of a country/region and the year in which various deaths occurred across several age groups. It thus gives a snapshot of a country's mortality rate history:

public class CountryStats {
    private final String country;
    private final String code;
    private final String year;
    private final Mortality underFive;
    private final Mortality seventyPlus;
    private final Mortality fiftyToSixtyNine;
    private final Mortality fiveToFourteen;
    private final Mortality fifteenToFourtyNine;
    
    //Constructor and getters...
    
    public Mortality getHighest() {
        Stream<Mortality> stream = Stream.of(
            underFive,
            fiveToFourteen,
            fifteenToFourtyNine,
            fiftyToSixtyNine,
            seventyPlus
        );
        
        Mortality highest = stream.collect(
            collectingAndThen(
                Collectors.maxBy(
                    Comparator.comparing(
                        Mortality::getMortality
                    )
                ),
                m -> m.orElseThrow(
                    RuntimeException::new
                )
            )
        );
        
        return highest;
    }
}

Its getHighest() method tells us which age group has the highest mortality rate. It uses the collector from maxBy() to find the Mortality object with the highest rate. But, maxBy() produces an Optional. Hence, we have an extra finishing step which unwraps the Optional. And it does so in a manner which throws a RuntimeException if the Optional is empty.
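The same maxBy()-then-unwrap pattern can be tried in isolation. The sketch below uses bare BigDecimal values as a simplified stand-in for Mortality objects. The class name HighestRateDemo, the exception type, and the sample figures are assumptions made for this example:

```java
import java.math.BigDecimal;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class HighestRateDemo {

    // Finds the largest rate, or fails loudly when the stream is empty.
    static BigDecimal highest(Stream<BigDecimal> rates) {
        return rates.collect(
            Collectors.collectingAndThen(
                Collectors.maxBy(Comparator.<BigDecimal>naturalOrder()),
                (Optional<BigDecimal> max) -> max.orElseThrow(
                    () -> new IllegalStateException("no rates supplied")
                )
            )
        );
    }

    public static void main(String[] args) {
        // Made-up values, for illustration only.
        System.out.println(highest(Stream.of(
            new BigDecimal("1.5"),
            new BigDecimal("9.25"),
            new BigDecimal("3")
        )));
    }
}
```

Throwing in the finisher suits getHighest() because a CountryStats object always supplies five values. For streams that may be empty, a default via orElse() may be the friendlier choice.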

The last class, StatsSource handles the mapping of the CSV data to CountryStats. At heart, it acts as a helper class, which gives access to the CSV file containing the mortality rates. It uses the Apache Commons CSV library to read the CSV file containing the data:

public class StatsSource {
    private List<CountryStats> stats;
    
    public List<CountryStats> getStats() {
        if (stats == null) {
            File f; //Get CSV file containing data
            Reader in = new FileReader(f);
            CSVFormat csvf = CSVFormat
                .DEFAULT
                .builder()
                .setHeader()
                .setSkipHeaderRecord(true)
                .build();
            
            // CSVParser is an Iterable<CSVRecord>, so it exposes spliterator()
            Spliterator<CSVRecord> split = csvf.parse(in)
                .spliterator();
            
            stats = StreamSupport
                // Set `true` to make stream parallel
                // Set `false` to make sequential
                .stream(split, false)
                .map(StatsSource::toStats)
                .collect(toList());                
        }
        
        return stats;
    }
    
    public static CountryStats toStats(CSVRecord r) {
        // Constructor...
    }
}

Note how it maps the lines in the file to CountryStats objects using a stream. We had the option of using StreamSupport to create a parallel stream of lines by using a true flag. But, we opted to have a serial stream instead by passing false to StreamSupport.

The data in the CSV file comes in an alphabetical order from the source. Yet, by using a parallel stream, we would lose that order.
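The StreamSupport step works for any Iterable, not only for a CSV parser. A minimal sketch with a plain list of strings standing in for the parsed records follows. The class name StreamSupportDemo and the upperCase() helper are made up for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

class StreamSupportDemo {

    // Turns any Iterable into a sequential stream and maps its elements.
    static List<String> upperCase(Iterable<String> lines) {
        Spliterator<String> split = lines.spliterator();

        return StreamSupport
            .stream(split, false) // false = sequential, true = parallel
            .map(String::toUpperCase)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(upperCase(Arrays.asList("albania", "zimbabwe")));
    }
}
```

With the flag set to false, the elements are processed one after another in the order that the Iterable supplies them.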

Using collectingAndThen() in Grouping

We want to present the data from the source in various, useful ways. We want to show, for example, pertinent data in categories of year, country, and mortality rate. A simple use case would be to present the data with only two headers: a country and the year in which it suffered the highest mortality rate for children aged under five. In other terms, this is single-level grouping.

In a tabulated format, for example, we would wish to achieve this:

Country            Year with highest mortality for kids under 5 years
Afghanistan        1997
Albania            1991
Nigeria            2000
Solomon Islands    2002
Zimbabwe           2011

A more complex one would be to list the countries by the years in which mortality occurred. And in those years, we would want to list the age group that suffered the highest mortality. In statistical terms, we're aiming for multi-level grouping of data. In simple terms, multi-level grouping is akin to creating many single-level groups. We could thus represent these statistics as:

Afghanistan

Year    Age Group Reporting Highest Mortality
1990    Under 5 years
1991    Between 50 and 69 years
2000    Over 70 years
2001    Over 70 years
2010    Under 5 years

Papua New Guinea

Year    Age Group Reporting Highest Mortality
1990    Over 70 years
1991    Over 70 years
2000    Between 5 and 14 years
2001    Between 5 and 14 years
2010    Between 15 and 49 years

And so on…for every country, from the year 1990 to 2017.

Single-level Grouping with collectingAndThen()

In declarative programming terms, we have three tasks we need the code to perform:

  1. Group the mortality data according to countries.
  2. For each country, find its highest mortality rate for children under five years.
  3. Report the year in which that high rate occurred.

Group by Country

One thing is worth considering. The CSV file we're dealing with lists mortality data for every country several times. It lists 28 entries for each country. We could thus create a Map out of these entries. The key would be the country name and the value a list of that country's CountryStats objects. And, this is the exact thing the method shouldGroupByCountry() does:

private final StatsSource src = new StatsSource();
private List<CountryStats> stats = src.getStats();
private final Supplier<RuntimeException> exc = RuntimeException::new;

@Test
public void shouldGroupByCountry() {
    Map<String, List<CountryStats>> result = stats.stream().collect(
        Collectors.groupingBy(
            CountryStats::getCountry,
            Collectors.toList()
        )
    );
    
    System.out.println(result);
}

If you'd like to read more about groupingBy() read our Guide to Java 8 Collectors: groupingBy()!

This Map is large so just printing it out to the console would make it absolutely unreadable. Instead, we can format the output by inserting this code block right after calculating the result variable:

result.entrySet()
    .stream()
    .sorted(comparing(Entry::getKey))
    .limit(2)
    .forEach(entry -> {
        entry.getValue()
            .stream()
            .sorted(comparing(CountryStats::getYear))
            .forEach(stat -> {
                System.out.printf(
                    "%s, %s: %.3f\n",
                    entry.getKey(),
                    stat.getYear(),
                    stat.getUnderFive().getMortality()
                );
            });
    });

The result value is of the type Map<String, List<CountryStats>>. To make it easier to interpret:

  • We sort the keys in an alphabetical order.
  • We instruct the stream to limit its length to only two Map elements.
  • We deal with outputting the details for every element using forEach().
    • We sort the value (a list of CountryStats values) from the key by year.
    • Then, we print the year and its mortality rate for children under five years.

With that done, we can now get an output such as this:

Afghanistan, 1990: 9301.998
Afghanistan, 1991: 9008.646
# ...
Afghanistan, 2016: 6563.177
Afghanistan, 2017: 6460.592
Albania, 1990: 390.996
Albania, 1991: 408.096
# ...
Albania, 2016: 9.087
Albania, 2017: 8.545

Find Highest Mortality Rate for Children Under 5 Years

We've been listing the mortality of children under five years for all the pertinent years. But, we are taking it a notch higher by selecting that one year that had the highest mortality.

Like collectingAndThen(), groupingBy() accepts a second parameter that processes the grouped elements further. But, unlike collectingAndThen(), whose second parameter is a finisher Function, groupingBy() takes a downstream Collector.

Working with what we have then, we pass a maxBy() to groupingBy(). This has the effect of creating a Map of type: Map<String, Optional<CountryStats>>. It is a step in the right direction because we are now dealing with one Optional wrapping a CountryStats object:

Map<String, Optional<CountryStats>> result = stats.stream().collect(
    Collectors.groupingBy(
        CountryStats::getCountry,
        Collectors.maxBy(comparing(CountryStats::getUnderFive))
    )
);

Still, this approach doesn't produce the exact output we are after. Again, we have to format the output:

result.entrySet()
    .stream()
    .sorted(comparing(Entry::getKey))
    .limit(2)
    .forEach(entry -> {
        CountryStats stat = entry
            .getValue()
            .orElseThrow(exc);
        
        System.out.printf(
            "%s, %s: %.3f\n",
            entry.getKey(),
            stat.getYear(),
            stat.getUnderFive().getMortality()
        );
    });

So that we can get this output:

Afghanistan, 1997: 14644.286
Albania, 1991: 408.096

Granted, the output cites the correct figures we were after. But, there should be another way of producing such an output. And true enough, as we will see next, that way involves using collectingAndThen().

Cite the Year with the Highest Mortality Rate for Children Under 5 Years

Our main issue with the previous attempt is that it returned an Optional as the value of the Map element. And this Optional wrapped a whole CountryStats object, which is overkill. We need the Map elements to have the country name as the key and the year as the value.

So, we will achieve that by creating the Map result with this code:

Map<String, String> result = stats.stream().collect(
    groupingBy(
        CountryStats::getCountry,
        TreeMap::new,
        Collectors.collectingAndThen(
            Collectors.maxBy(
                Comparator.comparing(
                    CountryStats::getUnderFive
                )
            ),
            stat -> {
                return stat
                    .orElseThrow(exc)
                    .getYear();
            }
        )
    )
);

We've changed the previous attempt in three ways! First, we have included a Map factory (TreeMap::new) in the groupingBy() method call. This makes groupingBy() keep the country names sorted in alphabetical order. Remember, in the previous attempts we made sorted() calls to achieve the same.

Yet, that is poor practice. A sorted() call has to buffer all the stream elements before it can pass any of them on to the next operation. And that works against the logic of processing stream elements in a lazy fashion.

The sorted() operation is a stateful intermediate operation. It could negate the gains we would get from a parallel stream, for example.
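The buffering behavior of sorted() can be observed with a short sketch that logs each element before and after the sort. The class name SortedBarrierDemo and the log labels are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SortedBarrierDemo {

    // Records when each element enters and leaves the sorted() stage.
    static List<String> trace() {
        List<String> log = new ArrayList<>();

        Stream.of(3, 1, 2)
            .peek(n -> log.add("in:" + n))
            .sorted()
            .peek(n -> log.add("out:" + n))
            .collect(Collectors.toList());

        return log;
    }

    public static void main(String[] args) {
        // prints [in:3, in:1, in:2, out:1, out:2, out:3]
        System.out.println(trace());
    }
}
```

Every "in" entry appears before the first "out" entry: on this sequential stream, nothing leaves sorted() until all elements have arrived. Without the sorted() call, the two kinds of entries would alternate element by element.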

Second, we have made it possible to get an extra step out of the maxBy() collector result. We have included collectingAndThen() to achieve that. Third, in the finishing step, we have transformed the Optional result from maxBy() into a year value.

And true enough, on printing the result to console, this is what we get:

{
Afghanistan=1997,
Albania=1991,
Algeria=1990,
American Samoa=1990,
Andean Latin America=1990,
Andorra=1990,
Angola=1995,
Antigua and Barbuda=1990,
Argentina=1991,
...,
Zambia=1991,
Zimbabwe=2011
}
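To experiment with this pattern without the CSV file, here is a self-contained sketch. The Stat class is a hypothetical, simplified stand-in for CountryStats, and the six sample rows reuse under-five figures quoted earlier in this article:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Collectors;

class PeakYearDemo {

    // Hypothetical, simplified stand-in for CountryStats.
    static final class Stat {
        final String country;
        final String year;
        final double underFiveRate;

        Stat(String country, String year, double underFiveRate) {
            this.country = country;
            this.year = year;
            this.underFiveRate = underFiveRate;
        }
    }

    // country -> year with the highest under-five rate, sorted by country
    static Map<String, String> peakYearByCountry(List<Stat> stats) {
        return stats.stream().collect(
            Collectors.groupingBy(
                (Stat s) -> s.country,
                TreeMap::new,
                Collectors.collectingAndThen(
                    Collectors.maxBy(
                        Comparator.comparingDouble((Stat s) -> s.underFiveRate)
                    ),
                    (Optional<Stat> max) ->
                        max.orElseThrow(IllegalStateException::new).year
                )
            )
        );
    }

    public static void main(String[] args) {
        List<Stat> stats = Arrays.asList(
            new Stat("Albania", "1990", 390.996),
            new Stat("Albania", "1991", 408.096),
            new Stat("Albania", "2017", 8.545),
            new Stat("Afghanistan", "1990", 9301.998),
            new Stat("Afghanistan", "1991", 9008.646),
            new Stat("Afghanistan", "1997", 14644.286)
        );

        // prints {Afghanistan=1997, Albania=1991}
        System.out.println(peakYearByCountry(stats));
    }
}
```

Inside groupingBy(), the Optional can never be empty, because a group only exists once it has at least one element. The orElseThrow() call therefore acts as a safeguard rather than an expected code path.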

Multi-level Grouping with collectingAndThen()

You could say, the previous task focused on creating data that can fit in one table. One that has two columns: a country and the year with the highest mortality of children under five. But, for our next task, we want to create data that fits many tables where each table contains two columns. That is, the year and the age group that suffered the highest mortality in that year.

Furthermore, each of these datasets should relate to a unique country. After the previous exercise, though, that isn't as hard as you might think. We could achieve the multi-level grouping with code that's as concise as this:

@Test
public void shouldCreateMultiLevelGroup() {
    Map result = stats.stream().collect(
        Collectors.groupingBy(
            CountryStats::getCountry,
            TreeMap::new,
            Collectors.groupingBy(
                CountryStats::getYear,
                TreeMap::new,
                Collectors.collectingAndThen(
                    Collectors.maxBy(
                        Comparator.comparing(
                            CountryStats::getHighest
                        )
                    ),
                    stat -> {
                        return stat
                            .orElseThrow(exc)
                            .getHighest()
                            .getAgeGroup();
                    }                  
                )
            )
        )
    );
    
    System.out.println(result);
}

Here, the main difference is that we've included an extra, nested groupingBy() operation. The outer groupingBy() ensures that the collection occurs for each country on its own. The inner groupingBy() groups the country's data by year and, through its TreeMap, keeps the years sorted. Then, the collectingAndThen() operation uses the downstream collector maxBy(). This collector extracts the CountryStats with the highest mortality across all age groups.

And in the finishing step, we find the name of the age group with the highest mortality. With these done, we get an output such as this one on the console:

{
Afghanistan={
    1990=Under 5 yrs,
    1991=Under 5 yrs,
    1992=Under 5 yrs,
    ...,
    2014=Under 5 yrs,
    2015=Under 5 yrs,
    2016=Under 5 yrs,
    2017=Under 5 yrs
},
Albania={
    1990=Over 70 yrs,
    1991=Over 70 yrs,
    1992=Over 70 yrs,
    ...,
    2014=Over 70 yrs,
    2015=Over 70 yrs,
    2016=Over 70 yrs,
    2017=Over 70 yrs
},
...,
Congo={
    1990=Between 50 and 69 yrs,
    1991=Between 50 and 69 yrs,
    1992=Between 50 and 69 yrs,
    ...,
    2014=Over 70 yrs,
    2015=Over 70 yrs,
    2016=Over 70 yrs,
    2017=Between 50 and 69 yrs
},
...
}
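A reduced version of the multi-level grouping can be sketched as follows. The Row class is a hypothetical simplification that holds one age group per row (unlike CountryStats, which holds all five), and all the rates are made-up values for illustration. The inner collector is declared separately with an explicit type to keep the nesting readable:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class MultiLevelDemo {

    // Hypothetical row: one age group's rate for one country and year.
    static final class Row {
        final String country, year, ageGroup;
        final double rate;

        Row(String country, String year, String ageGroup, double rate) {
            this.country = country;
            this.year = year;
            this.ageGroup = ageGroup;
            this.rate = rate;
        }
    }

    // country -> (year -> age group with the highest rate)
    static Map<String, Map<String, String>> worstAgeGroup(List<Row> rows) {
        // Inner level: group by year, keep only the worst age group's name
        Collector<Row, ?, Map<String, String>> byYear = Collectors.groupingBy(
            (Row r) -> r.year,
            TreeMap::new,
            Collectors.collectingAndThen(
                Collectors.maxBy(Comparator.comparingDouble((Row r) -> r.rate)),
                (Optional<Row> max) ->
                    max.orElseThrow(IllegalStateException::new).ageGroup
            )
        );

        // Outer level: group by country, then apply the inner collector
        return rows.stream().collect(
            Collectors.groupingBy((Row r) -> r.country, TreeMap::new, byYear)
        );
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("Congo", "1990", "Between 50 and 69 yrs", 3.0),
            new Row("Congo", "1990", "Over 70 yrs", 2.0),
            new Row("Congo", "2014", "Over 70 yrs", 5.0),
            new Row("Congo", "2014", "Under 5 yrs", 1.0),
            new Row("Albania", "1990", "Over 70 yrs", 4.0),
            new Row("Albania", "1990", "Under 5 yrs", 1.0)
        );

        System.out.println(worstAgeGroup(rows));
    }
}
```

Extracting the inner collector into a variable is a design choice, not a requirement. It gives the compiler an explicit target type and lets each grouping level be read on its own.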

Using collectingAndThen() in Partitioning

We may encounter a use case where we want to know which country is at the edge, meaning it shows indications of suffering from unacceptable mortality rates. Let's assume the rate at which mortality becomes a major point of concern is 100,000.

Note: This is an arbitrary rate, set for illustration purposes. In general, risk is calculated by the number of deaths per 100,000, depending on the population of the country.

A country that enjoys a rate that's lower than this shows that it's mitigating the given risk factor. It's doing something about indoor pollution, for example. But, a country whose rate is near or at that rate shows that it could need some help.

Here, our aim is to find a way to partition the mortality data into two. The first part would contain the countries whose rates haven't hit the point of concern yet (x). But, we will be seeking the country whose rate is the highest in this group. This will be the country that we identify as needing help.

The second partition will contain the countries that are experiencing very high rates. And its max will be the country/region with the worst rates. The best collecting operation for this task would be the partitioningBy() method.

According to its official Javadoc, partitioningBy():

Returns a Collector which partitions the input elements according to a Predicate, reduces the values in each partition according to another Collector, and organizes them into a Map<Boolean, D> whose values are the result of the downstream reduction.

If you'd like to read more about partitioningBy() read our Java 8 Streams: Definitive Guide to partitioningBy()!

Going by this, we need a Predicate that checks whether mortality exceeds 100,000:

Predicate<CountryStats> p = cs -> {
    return cs.getHighest()
        .getMortality()
        .doubleValue() > 100_000;
};

Then, we will need a Collector that processes the CountryStats objects in each partition. In particular, we want to know the CountryStats that doesn't fulfill the predicate but has the highest rate among those that don't. This object will be of interest because it would be about to hit the point-of-concern rate.

And as we'd seen earlier, the operation capable of such collecting is maxBy():

Collector<CountryStats, ?, Optional<CountryStats>> c = Collectors.maxBy(
    Comparator.comparing(CountryStats::getHighest)
);

Still, we want plain values, not Optional wrappers, in the Map which partitioningBy() will produce. Yet, with maxBy() alone we will get an output of:

Map<Boolean, Optional<CountryStats>> result = doPartition();

Hence, we'll rely on collectingAndThen() to adapt the Collector that maxBy() emits:

Collector<CountryStats, ?, String> c = Collectors.collectingAndThen(
    Collectors.maxBy(
        Comparator.comparing(CountryStats::getHighest)
    ),
    s -> {
        return s.orElseThrow(exc).toString();
    }
);

And when we combine all these pieces of code, we end up with:

@Test
public void shouldCreatePartition() {
    Map<Boolean, String> result = stats.stream().collect(
        Collectors.partitioningBy(
            cs -> {
                return cs
                    .getHighest()
                    .getMortality()
                    .doubleValue() > 100_000;
            },
            Collectors.collectingAndThen(
                Collectors.maxBy(
                    Comparator.comparing(
                        CountryStats::getHighest
                    )
                ),
                stat -> {
                    return stat
                        .orElseThrow(exc)
                        .toString();
                }
            )
        )
    );
    
    System.out.println(result);
}

On running this method, we get the output:

{
    false={
        country/region=Eastern Sub-Saharan Africa,
        year=1997, 
        mortality={
            ageGroup=Under 5 yrs,
            rate=99830.223
        }
    },
    true={
        country/region=World,
        year=1992,
        mortality={
            ageGroup=Over 70 yrs,
            rate=898396.486
        }
    }
}

These results mean that the Eastern Sub-Saharan Africa region hasn't hit the point of concern yet. But, it could hit it anytime. Otherwise, we're not concerned with the "World" entry because, with the rate being fixed, it has exceeded the set rate already.
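For a version that runs without the dataset, consider the sketch below. The Region class, the region names, the rates, and the threshold of 100 are all made up for illustration. One deliberate difference from the test above: the finisher uses orElse() instead of orElseThrow(), because partitioningBy() always creates both the true and the false entry, even when one partition has no elements:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class PartitionDemo {

    // Hypothetical, simplified stand-in for CountryStats.
    static final class Region {
        final String name;
        final double rate;

        Region(String name, double rate) {
            this.name = name;
            this.rate = rate;
        }
    }

    // true  -> name of the worst region above the threshold
    // false -> name of the region closest to the threshold from below
    static Map<Boolean, String> partition(List<Region> regions, double threshold) {
        return regions.stream().collect(
            Collectors.partitioningBy(
                (Region r) -> r.rate > threshold,
                Collectors.collectingAndThen(
                    Collectors.maxBy(
                        Comparator.comparingDouble((Region r) -> r.rate)
                    ),
                    // An empty partition yields "none" instead of an exception
                    (Optional<Region> max) -> max.map(r -> r.name).orElse("none")
                )
            )
        );
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
            new Region("A", 40.0),
            new Region("B", 95.0),
            new Region("C", 250.0),
            new Region("D", 120.0)
        );

        System.out.println(partition(regions, 100.0));
    }
}
```

If every region stayed below the threshold, the true entry would simply map to "none". With orElseThrow() in the finisher, the same input would abort the whole collection with an exception.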

Conclusion

The collectingAndThen() operation makes it possible to chain Collector results with extra functions. You can nest as many collectingAndThen() calls within each other as you need. Other operations, which return Collector types, can work with this nesting approach too.

Near the end of this article, we found out that it can improve data presentation. The method also enabled us to refactor out inefficient operations like sorted(). Using JMH, we measured and discovered how fast collectingAndThen() can run.

Find the complete code that this article has used in this GitHub repository.

Feel free to clone and explore the code in its entirety. Dig into the test cases, for example, to get a sense of the many uses of collectingAndThen().

Last Updated: April 12th, 2022

Hiram Kamau, Author

In addition to catching code errors and going through debugging hell, I also obsess over whether writing in an active voice is truly better than doing it in passive.
