Introduction
A stream represents a sequence of elements and supports different kinds of operations that lead to the desired result. The source of a stream is usually a Collection or an Array, from which the data is streamed.
Streams differ from collections in several ways; most notably in that the streams are not a data structure that stores elements. They're functional in nature, and it's worth noting that operations on a stream produce a result and typically return another stream, but do not modify its source.
To "solidify" the changes, you collect the elements of a stream back into a Collection.
Collectors represent implementations of the Collector interface, which implements various useful reduction operations, such as accumulating elements into collections, summarizing elements based on a specific parameter, etc.
All predefined implementations can be found within the Collectors class.
You can also implement your own collector and use it instead of the predefined ones. That said, you can get pretty far with the built-in collectors, as they cover the vast majority of cases in which you might want to use them.
To be able to use the class in our code we need to import it:
import static java.util.stream.Collectors.*;
Stream.collect() performs a mutable reduction operation on the elements of the stream.
A mutable reduction operation collects input elements into a mutable container, such as a Collection, as it processes the elements of the stream.
In this guide, we'll be taking a deep dive into the collectingAndThen() collector.
What Does collectingAndThen() Do?
The collectingAndThen() operation accepts two parameters:
collectingAndThen(Collector d, Function f);
First, it calls a preexisting collector, d, and then it applies a final function, f, to the result of d.
Let's take a quick look at how we could use the collectingAndThen() method on a stream of integers:
Stream<Integer> s = Stream.of(12, 13, 14, 15);
Now, assume that you want to collect these values into an unmodifiable list of Integer objects. As a first attempt, we'd create a list of the Integer values:
List<Integer> list = Stream.of(12, 13, 14, 15)
.collect(
//Supplier
() -> new ArrayList<Integer>(),
//Accumulator
(l, e) -> l.add(e),
//Combiner
(l, ar) -> l.addAll(ar)
);
We've collected the stream's elements into a list using three parameters:
- Supplier
- Accumulator
- Combiner
Still, for such a simple step, this is a bit too verbose. Luckily, we have the toList() method in the Collectors helper class. We could thus simplify the step by writing:
list = Stream.of(12, 13, 14, 15).collect(toList());
Granted, we've compacted the code into one line. Yet, when we check the class of the list that we've produced by:
System.out.println(list.getClass().getSimpleName());
This results in:
ArrayList
We wanted an unmodifiable list, and ArrayList isn't one. A simple fix would be to call the unmodifiableList() method from Collections:
List<Integer> ul = Collections.unmodifiableList(list);
And on checking what class we've got as a result:
System.out.println(ul.getClass().getSimpleName());
We get the output:
UnmodifiableRandomAccessList
Hey, but what is an UnmodifiableRandomAccessList? When you check the JDK's source code, you'll see that it extends UnmodifiableList, whereby the UnmodifiableList:
Returns an unmodifiable view of the specified list. This [class] allows modules to provide users with "read-only" access to internal lists
Thus far, we seem to have fulfilled our aim of creating an unmodifiable list from a stream of Integer values, but we've had to work a lot for it.
This is the exact scenario that Java attempts to remedy with collectingAndThen().
What we want to do is collect the integers, and then do something else (convert the list into an unmodifiable one), which is exactly what we can do with collectingAndThen():
ul = Stream.of(12, 13, 14, 15)
.collect(
Collectors.collectingAndThen(
Collectors.toList(),
Collections::unmodifiableList
)
);
And our result, ul, is of the type UnmodifiableList. Occam's Razor strikes again! Though, there's a lot more to be said about the method.
How does it really work? Is it efficient? When should you use it? How do we put it into practice?
This guide aims to answer all of these questions.
Definition of collectingAndThen()
Method Signature
The collectingAndThen() method is a factory method in the Collectors helper class, a part of the Stream API:
public static <T, A, R, RR> Collector<T, A, RR> collectingAndThen(
Collector<T, A, R> downstream,
Function<R, RR> finisher
) {...}
Whereby the parameters represent:
- downstream: the initial collector that the Collectors class will call.
- finisher: the function that the Collectors class will apply to the result of downstream.
And, the generic types represent:
- T: the type of the stream's elements.
- A: the type of the mutable container that downstream accumulates the elements into.
- R: the type of the result after downstream finishes collecting.
- RR: the type of the result after you apply finisher to the result of downstream.
And, the return value is:
- Collector<T, A, RR>: a collector that results from the application of finisher to the result of downstream.
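To make the four type parameters concrete, here is a minimal sketch that counts the distinct words in a stream. The class name TypeParametersDemo and the helper distinctCount() are made up for illustration; only collectingAndThen() and toSet() come from the Collectors class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class TypeParametersDemo {

    // T  = String       (type of the stream's elements)
    // A  = ?            (accumulation type of toSet(), an implementation detail)
    // R  = Set<String>  (result of the downstream collector)
    // RR = Integer      (result after the finisher, Set::size)
    static Collector<String, ?, Integer> distinctCount() {
        return Collectors.collectingAndThen(Collectors.toSet(), Set::size);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c");
        Integer count = words.stream().collect(distinctCount());
        System.out.println(count); // prints 3
    }
}
```

Callers usually declare the accumulation type as a wildcard, as above, because they never touch the intermediate container directly.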
Description
The official Javadoc states that the collectingAndThen() method is useful because it:
Adapts a Collector to perform an additional finishing transformation.
There's not much to be added to this - we often perform actions on collections after collecting them - and this makes it much easier and less verbose!
How Does collectingAndThen() Work?
The following UML activity diagram summarizes the flow of control in a collectingAndThen() operation. It's a high-level abstraction of what occurs in such an operation. Nonetheless, it shows how the routines work in the streaming, collecting, and finishing steps.
[Figure: UML activity diagram of the collectingAndThen() flow]
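The same flow can be approximated in code. The sketch below is not the JDK's source; it is a hypothetical helper, andThen(), that reuses the downstream's supplier, accumulator, and combiner, and chains the extra function onto the downstream's own finisher. For brevity it drops the downstream's characteristics, whereas the real method keeps all of them except IDENTITY_FINISH.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class AndThenSketch {

    // Hypothetical stand-in for collectingAndThen(): collection happens exactly
    // as in the downstream collector, and only the finishing step changes.
    static <T, A, R, RR> Collector<T, A, RR> andThen(
            Collector<T, A, R> downstream, Function<R, RR> finisher) {
        return Collector.of(
                downstream.supplier(),      // creates the mutable container
                downstream.accumulator(),   // adds one element to the container
                downstream.combiner(),      // merges containers (parallel streams)
                downstream.finisher().andThen(finisher)); // finish, then transform
    }

    static List<Integer> demo() {
        return Stream.of(12, 13, 14, 15)
                .collect(andThen(Collectors.toList(), Collections::unmodifiableList));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [12, 13, 14, 15]
    }
}
```

The finisher runs once, after every element has been accumulated, which is why it can safely work on the complete result.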
When Should You Use collectingAndThen()?
1. When we need an object type other than what a single collect() operation offers:
List<Integer> list = Arrays.asList(1, 2, 3);
Boolean empty = list.stream()
.collect(collectingAndThen(
toList(),
List::isEmpty
)
);
Here, we managed to get a Boolean out of the List that collect() would've returned.
2. When we need to postpone processing until we can encounter all the elements in a given stream:
String longestName = people.stream()
.collect(collectingAndThen(
// Encounter all the Person objects
// Map them to their first names
// Collect those names in a list
mapping(
Person::getFirstName,
toList()
),
// Stream those names again
// Find the longest name
// If not available, return "?"
l -> {
return l
.stream()
.collect(maxBy(
comparing(String::length)
))
.orElse("?");
}
)
);
Here, for example, we only calculated the longest string after we read all the Person names.
3. And, when we need to wrap a list to make it unmodifiable:
List<Integer> ul = Stream.of(12, 13, 14, 15)
.collect(
Collectors.collectingAndThen(
Collectors.toList(),
Collections::unmodifiableList
)
);
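As a quick check that the wrapped list really rejects changes, the following sketch tries to add an element to it. The class and method names are made up for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UnmodifiableDemo {

    // Returns true if the collected list refuses a write.
    static boolean rejectsWrites() {
        List<Integer> ul = Stream.of(12, 13, 14, 15)
                .collect(Collectors.collectingAndThen(
                        Collectors.toList(),
                        Collections::unmodifiableList));
        try {
            ul.add(16);
            return false;
        } catch (UnsupportedOperationException e) {
            // Unmodifiable views throw on every mutating call.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsWrites()); // prints true
    }
}
```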
Is collectingAndThen() Efficient?
In some use cases, you can replace a collectingAndThen() operation without changing the result of your method. It thus raises the question: would using collectingAndThen() offer fast runtimes?
For example, assume you have a collection of names and you want to know which among them is the longest. Let's create a Person class, which would contain somebody's full name, first and last:
public class Person {
private final String first;
private final String last;
// Constructor and getters
}
And say you've got an ExecutionPlan that generates quite a few Person objects:
@State(Scope.Benchmark)
public class ExecutionPlan {
private List<Person> people;
@Param({"10", "100", "1000", "10000", "100000"})
int count;
@Setup(Level.Iteration)
public void setup() {
people = new ArrayList<>();
Name fakeName = new Faker().name();
for (int i = 0; i < count; i++) {
String fName = fakeName.firstName();
String lName = fakeName.lastName();
Person person = new Person(fName, lName);
people.add(person);
}
}
public List<Person> getPeople() {
return people;
}
}
Note: To easily generate many fake objects with sensible names - we use the Java Faker library. You can also include it in your Maven projects.
The ExecutionPlan class dictates the number of Person objects that you can test. When the test harness (JMH) runs, the count field makes the for loop in setup() create that many Person objects.
We will find the longest first name using two approaches:
- Using the Stream API's intermediate operation, sorted().
- Using collectingAndThen().
The first approach uses the withoutCollectingAndThen() method:
public void withoutCollectingAndThen() {
    // Longest names first
    Comparator<String> nameLength = Comparator.comparing(String::length)
        .reversed();
    String longestName = people
        .stream()
        .map(Person::getFirstName)
        .sorted(nameLength)
        .findFirst()
        .orElse("?");
}
This approach maps a stream of Person objects to their first names. Then, it sorts the names by length in descending order. It uses the static comparing() method from the Comparator interface. Because comparing() would cause the sort to list in ascending order, we call reversed() on it. This will make the stream contain values which start with the longest and end with the shortest.
We conclude the operation by calling findFirst(), which selects the first, longest value. Also, because the result will be an Optional, we transform it to a String with orElse().
The second approach uses the withCollectingAndThen() method:
public void withCollectingAndThen() {
    // maxBy() yields an Optional, which the finisher unwraps
    Collector<String, ?, String> collector = collectingAndThen(
        Collectors.maxBy(Comparator.comparing(String::length)),
        s -> s.orElse("?")
    );
    String longestName = people.stream()
        .map(Person::getFirstName)
        .collect(collector);
}
This approach is more concise because it contains the downstream collector, maxBy(), so we don't have to sort, reverse, and find the first element. This method is one of the Collectors class' many static methods. It's convenient to use because it returns one element only from a stream - the element with the largest value. The only thing that's left to us is to supply a Comparator implementation to help it work out this value.
In our case, we're looking for the String with the greatest length, so we use Comparator.comparing(String::length). Here too, we need to deal with an Optional. The maxBy() operation produces one, which we then turn into a bare String in the finisher step.
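Before benchmarking, it helps to confirm that both approaches return the same answer. The sketch below runs them side by side on plain strings instead of Person objects; the class name, method names, and sample names are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class LongestNameDemo {

    static final Comparator<String> BY_LENGTH = Comparator.comparing(String::length);

    // Approach 1: sort everything, longest first, then take the head.
    static String viaSorted(List<String> names) {
        return names.stream()
                .sorted(BY_LENGTH.reversed())
                .findFirst()
                .orElse("?");
    }

    // Approach 2: a single pass that only tracks the current maximum.
    static String viaCollector(List<String> names) {
        return names.stream()
                .collect(Collectors.collectingAndThen(
                        Collectors.maxBy(BY_LENGTH),
                        longest -> longest.orElse("?")));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ann", "Alexander", "Bo");
        System.out.println(viaSorted(names));    // prints Alexander
        System.out.println(viaCollector(names)); // prints Alexander
        System.out.println(viaCollector(Collections.<String>emptyList())); // prints ?
    }
}
```

The first approach has to order every name before it can return one, while the second only compares each name against the best candidate so far.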
If we benchmark these two methods on 10, 100, 1000, 10000 and 100000 Person instances using JMH - we get a pretty clear result:
Benchmark (count) Mode Cnt Score Error Units
CollectingAndThenBenchmark.withCollectingAndThen 10 thrpt 2 7078262.227 ops/s
CollectingAndThenBenchmark.withCollectingAndThen 100 thrpt 2 1004389.120 ops/s
CollectingAndThenBenchmark.withCollectingAndThen 1000 thrpt 2 85195.997 ops/s
CollectingAndThenBenchmark.withCollectingAndThen 10000 thrpt 2 6677.598 ops/s
CollectingAndThenBenchmark.withCollectingAndThen 100000 thrpt 2 317.106 ops/s
CollectingAndThenBenchmark.withoutCollectingAndThen 10 thrpt 2 4131641.252 ops/s
CollectingAndThenBenchmark.withoutCollectingAndThen 100 thrpt 2 294579.356 ops/s
CollectingAndThenBenchmark.withoutCollectingAndThen 1000 thrpt 2 12728.669 ops/s
CollectingAndThenBenchmark.withoutCollectingAndThen 10000 thrpt 2 1093.244 ops/s
CollectingAndThenBenchmark.withoutCollectingAndThen 100000 thrpt 2 94.732 ops/s
Note: JMH assigns a score instead of measuring the time it takes to execute a benchmarked operation. The units used were operations per second so the higher the number is, the better, as it indicates a higher throughput.
When you test with ten Person objects, collectingAndThen() runs roughly 1.7 times as fast as sorted(). Whereas collectingAndThen() can run 7,078,262 operations in a second, sorted() runs 4,131,641.
But, with ten thousand of those objects, collectingAndThen() displays even more impressive results. It runs six times as fast as sorted()! On larger datasets, it very clearly outperforms the first option, so if you're dealing with many records, you'll gain significant performance benefits from collectingAndThen().
Find the complete test results' report on GitHub. The entire test harness is also in this GitHub repository. Go ahead and clone it, run it on your local machine, and compare the results.
Putting collectingAndThen() to Practice - Indoor Pollution Dataset Analysis
So far, we've seen that collectingAndThen() can adapt a collector with an extra step. Yet, this capability is even more powerful than you may think. You can nest collectingAndThen() within other operations that also return Collector instances. And remember, collectingAndThen() returns a Collector too. So, you can nest these other operations in it too:
stream.collect(
    groupingBy(
        outerClassifier,
        groupingBy(
            innerClassifier,
            collectingAndThen(
                downstream,
                finisher
            )
        )
    )
);
This possibility opens up a slew of code design options. You can, for example, use it to group a stream's elements. Or, to partition them according to a given Predicate.
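As a small, self-contained illustration of such nesting, the sketch below groups words by their first letter and uses collectingAndThen() to turn each group's Long count into an Integer. The class name, method name, and sample words are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class NestedCollectorsDemo {

    static Map<Character, Integer> countByInitial(List<String> words) {
        return words.stream().collect(
                Collectors.groupingBy(
                        word -> word.charAt(0),   // classifier: first letter
                        TreeMap::new,             // keeps the keys sorted
                        Collectors.collectingAndThen(
                                Collectors.counting(), // downstream: Long count
                                Long::intValue         // finisher: Long -> Integer
                        )
                )
        );
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "banana", "avocado");
        System.out.println(countByInitial(words)); // prints {a=2, b=1}
    }
}
```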
If you'd like to read more about Predicates - read our Functional Programming in Java 8: Definitive Guide to Predicates!
We will see how this works using data of the deaths that indoor air pollution causes. This data contains the mortality rates per 100,000 people. Our World in Data (OWID) has categorized it by age and by year. It contains findings from most of the world's countries and regions. Also, it covers the years from 1990 to 2017.
Domain Design
The domain contains three main classes: Mortality, CountryStats, and StatsSource. The Mortality class contains two fields: the ageGroup and mortality. In essence, the Mortality class is a value class.
See, we have the option of dealing with ageGroup and mortality values on their own. Yet, that's only bound to clutter up the client code. The String values representing age groups wouldn't make sense when you use them on their own. The same applies to the BigDecimal values representing mortality figures.
But, when you use these two together, they clarify what your domain is all about:
public class Mortality implements Comparable<Mortality> {
private final String ageGroup;
private final BigDecimal mortality;
//Constructor and getters...
@Override
public int compareTo(Mortality other) {
return Comparator.comparing(Mortality::getMortality)
.compare(this, other);
}
}
This class also implements the Comparable interface. This is important because it helps us sort Mortality objects. The next class, CountryStats, contains mortality data for different age groups. It's another value class, and it contains the name of a country/region and the year in which various deaths occurred across several age groups. It thus gives a snapshot of a country's mortality rates history:
public class CountryStats {
private final String country;
private final String code;
private final String year;
private final Mortality underFive;
private final Mortality seventyPlus;
private final Mortality fiftyToSixtyNine;
private final Mortality fiveToFourteen;
private final Mortality fifteenToFourtyNine;
//Constructor and getters...
public Mortality getHighest() {
Stream<Mortality> stream = Stream.of(
underFive,
fiveToFourteen,
fifteenToFourtyNine,
fiftyToSixtyNine,
seventyPlus
);
Mortality highest = stream.collect(
collectingAndThen(
Collectors.maxBy(
Comparator.comparing(
Mortality::getMortality
)
),
m -> m.orElseThrow(
RuntimeException::new
)
)
);
return highest;
}
}
Its getHighest() method helps us know which age group has the highest mortality rate. It uses the collector from maxBy() to find the Mortality object with the highest rate. But, maxBy() returns an Optional. Hence, we have an extra finishing step which unwraps the Optional. And it does so in a manner which throws a RuntimeException if the Optional is empty.
The last class, StatsSource, handles the mapping of the CSV data to CountryStats. At heart, it acts as a helper class, which gives access to the CSV file containing the mortality rates. It uses the Apache Commons CSV library to read the CSV file containing the data:
public class StatsSource {
private List<CountryStats> stats;
public List<CountryStats> getStats() {
if (stats == null) {
File f; //Get CSV file containing data
Reader in = new FileReader(f);
CSVFormat csvf = CSVFormat
.DEFAULT
.builder()
.setHeader()
.setSkipHeaderRecord(true)
.build();
Spliterator<CSVRecord> split = csvf.parse(in)
.spliterator();
stats = StreamSupport
// Set `true` to make stream parallel
// Set `false` to make sequential
.stream(split, false)
.map(StatsSource::toStats)
.collect(toList());
}
return stats;
}
public static CountryStats toStats(CSVRecord r) {
// Constructor...
}
}
Note how it maps the lines in the file to CountryStats objects using a stream. We had the option of using StreamSupport to create a parallel stream of lines by using a true flag. But, we opted to have a serial stream instead by passing false to StreamSupport.
The data in the CSV file comes in alphabetical order from the source. With a parallel stream, the records would no longer be processed in that order.
Using collectingAndThen() in Grouping
We want to present the data from the source in various, useful ways. We want to show, for example, pertinent data in categories of year, country, and mortality rate. A simple use case would be to present the data with only two headers: a country, and the year in which it suffered the highest mortality rate for children aged under five. In other terms, this is single-level grouping.
In a tabulated format, for example, we would wish to achieve this:
Country | Year with highest mortality for kids under 5 years |
Afghanistan | 1997 |
Albania | 1991 |
Nigeria | 2000 |
Solomon Islands | 2002 |
Zimbabwe | 2011 |
A more complex one would be to list the countries by the years in which mortality occurred. And in those years, we would want to list the age group that suffered the highest mortality. In statistical terms, we're aiming for multi-level grouping of data. In simple terms, multi-level grouping is akin to creating many single-level groups. We could thus represent these statistics as:
Afghanistan
Year | Age Group Reporting Highest Mortality |
1990 | Under 5 years |
1991 | Between 50 and 69 years |
2000 | Over 70 years |
2001 | Over 70 years |
2010 | Under 5 years |
Papua New Guinea
Year | Age Group Reporting Highest Mortality |
1990 | Over 70 years |
1991 | Over 70 years |
2000 | Between 5 and 14 years |
2001 | Between 5 and 14 years |
2010 | Between 15 and 49 years |
And so on... for every country, from the year 1990 to 2017.
Single-level Grouping with collectingAndThen()
In declarative programming terms, we have three tasks we need the code to perform:
- Group the mortality data according to countries.
- For each country, find its highest mortality rate for children under five years.
- Report the year in which that high rate occurred.
Group by Country
One thing is worth considering. The CSV file we're dealing with lists mortality data for every country several times. It lists 28 entries for each country. We could thus create a Map out of these entries. The key would be the country name and the value would be the list of that country's CountryStats entries. And, this is the exact thing the method shouldGroupByCountry() does:
private final StatsSource src = new StatsSource();
private List<CountryStats> stats = src.getStats();
private final Supplier<RuntimeException> exc = RuntimeException::new;
@Test
public void shouldGroupByCountry() {
Map<String, List<CountryStats>> result = stats.stream().collect(
Collectors.groupingBy(
CountryStats::getCountry,
Collectors.toList()
)
);
System.out.println(result);
}
If you'd like to read more about groupingBy(), read our Guide to Java 8 Collectors: groupingBy()!
This Map is large, so just printing it out to the console would make it absolutely unreadable. Instead, we can format the output by inserting this code block right after calculating the result variable:
result.entrySet()
.stream()
.sorted(comparing(Entry::getKey))
.limit(2)
.forEach(entry -> {
entry.getValue()
.stream()
.sorted(comparing(CountryStats::getYear))
.forEach(stat -> {
System.out.printf(
"%s, %s: %.3f\n",
entry.getKey(),
stat.getYear(),
stat.getUnderFive().getMortality()
);
});
});
The result value is of the type Map<String, List<CountryStats>>. To make it easier to interpret:
- We sort the keys in alphabetical order.
- We instruct the stream to limit its length to only two Map elements.
- We deal with outputting the details for every element using forEach().
  - We sort the value (a list of CountryStats values) from the key by year.
  - Then, we print the year and its mortality rate for children under five years.
With that done, we can now get an output such as this:
Afghanistan, 1990: 9301.998
Afghanistan, 1991: 9008.646
# ...
Afghanistan, 2016: 6563.177
Afghanistan, 2017: 6460.592
Albania, 1990: 390.996
Albania, 1991: 408.096
# ...
Albania, 2016: 9.087
Albania, 2017: 8.545
Find Highest Mortality Rate for Children Under 5 Years
We've been listing the mortality of children under five years for all the pertinent years. But, we are taking it a notch higher by selecting that one year that had the highest mortality.
Like collectingAndThen(), groupingBy() accepts a second parameter that processes what it has collected. But, unlike collectingAndThen(), it takes a Collector type (a downstream collector). Remember, collectingAndThen() takes a function.
Working with what we have then, we pass a maxBy() collector to groupingBy(). This has the effect of creating a Map of type Map<String, Optional<CountryStats>>. It is a step in the right direction because we are now dealing with one Optional wrapping a CountryStats object:
Map<String, Optional<CountryStats>> result = stats.stream().collect(
    Collectors.groupingBy(
        CountryStats::getCountry,
        Collectors.maxBy(comparing(CountryStats::getUnderFive))
    )
);
Still, this approach doesn't produce the exact output we are after. Again, we have to format the output:
result.entrySet()
.stream()
.sorted(comparing(Entry::getKey))
.limit(2)
.forEach(entry -> {
CountryStats stat = entry
.getValue()
.orElseThrow(exc);
System.out.printf(
"%s, %s: %.3f\n",
entry.getKey(),
stat.getYear(),
stat.getUnderFive().getMortality()
);
});
So that we can get this output:
Afghanistan, 1997: 14644.286
Albania, 1991: 408.096
Granted, the output cites the correct figures we were after. But, there should be another way of producing such an output. And true enough, as we will see next, that way involves using collectingAndThen().
Cite the Year with the Highest Mortality Rate for Children Under 5 Years
Our main issue with the previous attempt is that it returned an Optional as the value of the Map element. And this Optional wrapped a whole CountryStats object, which is overkill. We need the Map elements to have the country name as the key and the year as the value.
So, we will achieve that by creating the Map result with this code:
Map<String, String> result = stats.stream().collect(
groupingBy(
CountryStats::getCountry,
TreeMap::new,
Collectors.collectingAndThen(
Collectors.maxBy(
Comparator.comparing(
CountryStats::getUnderFive
)
),
stat -> {
return stat
.orElseThrow(exc)
.getYear();
}
)
)
);
We've changed the previous attempt in three ways! First, we have included a Map factory (TreeMap::new) in the groupingBy() method call. This makes groupingBy() keep the country names in alphabetical order. Remember, in the previous attempts we made sorted() calls to achieve the same.
Yet, that is poor practice. The sorted() operation is a stateful intermediate operation: it has to encounter all of the stream's elements before it can pass any of them on. That undercuts the logic of processing stream elements in a lazy fashion, and it would negate much of what we would gain if we used a parallel stream, for example.
Second, we have made it possible to get an extra step out of the maxBy() collector result. We have included collectingAndThen() to achieve that. Third, in the finishing step, we have transformed the Optional result from maxBy() into a year value.
And true enough, on printing the result to console, this is what we get:
{
Afghanistan=1997,
Albania=1991,
Algeria=1990,
American Samoa=1990,
Andean Latin America=1990,
Andorra=1990,
Angola=1995,
Antigua and Barbuda=1990,
Argentina=1991,
...,
Zambia=1991,
Zimbabwe=2011
}
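For readers without the dataset at hand, the same single-level grouping can be tried on a handful of rows. In the sketch below, Row is a hypothetical, flattened stand-in for CountryStats that keeps only the under-five rate, and the figures are the ones printed in the earlier outputs of this guide.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class SingleLevelGroupingDemo {

    // Hypothetical, simplified stand-in for CountryStats.
    static final class Row {
        final String country;
        final String year;
        final double underFive;

        Row(String country, String year, double underFive) {
            this.country = country;
            this.year = year;
            this.underFive = underFive;
        }

        String getCountry() { return country; }
        String getYear() { return year; }
        double getUnderFive() { return underFive; }
    }

    static List<Row> sampleRows() {
        return Arrays.asList(
                new Row("Albania", "1990", 390.996),
                new Row("Albania", "1991", 408.096),
                new Row("Afghanistan", "1990", 9301.998),
                new Row("Afghanistan", "1991", 9008.646),
                new Row("Afghanistan", "1997", 14644.286));
    }

    static Map<String, String> peakYearByCountry(List<Row> rows) {
        // Downstream: the row with the highest rate; finisher: that row's year.
        Collector<Row, ?, String> peakYear = Collectors.collectingAndThen(
                Collectors.maxBy(Comparator.comparingDouble(Row::getUnderFive)),
                best -> best.orElseThrow(IllegalStateException::new).getYear());
        return rows.stream()
                .collect(Collectors.groupingBy(Row::getCountry, TreeMap::new, peakYear));
    }

    public static void main(String[] args) {
        // prints {Afghanistan=1997, Albania=1991}
        System.out.println(peakYearByCountry(sampleRows()));
    }
}
```

Naming the inner collector in a local variable is only a readability choice here; inlining it, as in the code above, behaves the same way.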
Multi-level Grouping with collectingAndThen()
You could say, the previous task focused on creating data that can fit in one table. One that has two columns: a country and year with the highest mortality of children under five. But, for our next task, we want to create data that fits many tables where each table contains two columns. That is, the year with the highest mortality and the age group that was the most affected.
Furthermore, each of these datasets should relate to a unique country. After the previous exercise, though, that isn't as hard as you might think. We could achieve the multi-level grouping with code that's as concise as this:
@Test
public void shouldCreateMultiLevelGroup() {
Map result = stats.stream().collect(
Collectors.groupingBy(
CountryStats::getCountry,
TreeMap::new,
Collectors.groupingBy(
CountryStats::getYear,
TreeMap::new,
Collectors.collectingAndThen(
Collectors.maxBy(
Comparator.comparing(
CountryStats::getHighest
)
),
stat -> {
return stat
.orElseThrow(exc)
.getHighest()
.getAgeGroup();
}
)
)
)
);
System.out.println(result);
}
Here, the only difference is that we've included an extra, outer groupingBy() operation. This ensures that the collection occurs for each country on its own. The inner groupingBy() sorts the country's data by year. Then, the collectingAndThen() operation uses the downstream collector maxBy(). This collector extracts the CountryStats with the highest mortality across all age groups.
And in the finishing step, we find the name of the age group with the highest mortality. With these done, we get an output such as this one on the console:
{
Afghanistan={
1990=Under 5 yrs,
1991=Under 5 yrs,
1992=Under 5 yrs,
...,
2014=Under 5 yrs,
2015=Under 5 yrs,
2016=Under 5 yrs,
2017=Under 5 yrs
},
Albania={
1990=Over 70 yrs,
1991=Over 70 yrs,
1992=Over 70 yrs,
...,
2014=Over 70 yrs,
2015=Over 70 yrs,
2016=Over 70 yrs,
2017=Over 70 yrs
},
...,
Congo={
1990=Between 50 and 69 yrs,
1991=Between 50 and 69 yrs,
1992=Between 50 and 69 yrs,
...,
2014=Over 70 yrs,
2015=Over 70 yrs,
2016=Over 70 yrs,
2017=Between 50 and 69 yrs
},
...
}
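The two-level grouping can also be tried on a few rows. In this sketch, Row is a hypothetical stand-in with one row per age group instead of the five Mortality fields of CountryStats, and the countries and figures are made up purely for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class MultiLevelGroupingDemo {

    // Hypothetical row type: one age group's rate for a country and year.
    static final class Row {
        final String country;
        final String year;
        final String ageGroup;
        final double rate;

        Row(String country, String year, String ageGroup, double rate) {
            this.country = country;
            this.year = year;
            this.ageGroup = ageGroup;
            this.rate = rate;
        }

        String getCountry() { return country; }
        String getYear() { return year; }
        String getAgeGroup() { return ageGroup; }
        double getRate() { return rate; }
    }

    // Made-up figures, for illustration only.
    static List<Row> sampleRows() {
        return Arrays.asList(
                new Row("Country B", "1990", "Under 5 yrs", 5.0),
                new Row("Country A", "1991", "Under 5 yrs", 30.0),
                new Row("Country A", "1991", "Over 70 yrs", 25.0),
                new Row("Country A", "1990", "Under 5 yrs", 10.0),
                new Row("Country A", "1990", "Over 70 yrs", 20.0));
    }

    static Map<String, Map<String, String>> worstAgeGroups(List<Row> rows) {
        // Innermost step: highest-rate row, reduced to its age group.
        Collector<Row, ?, String> worstAgeGroup = Collectors.collectingAndThen(
                Collectors.maxBy(Comparator.comparingDouble(Row::getRate)),
                best -> best.orElseThrow(IllegalStateException::new).getAgeGroup());
        // Inner grouping: by year, sorted.
        Collector<Row, ?, Map<String, String>> byYear =
                Collectors.groupingBy(Row::getYear, TreeMap::new, worstAgeGroup);
        // Outer grouping: by country, sorted.
        return rows.stream()
                .collect(Collectors.groupingBy(Row::getCountry, TreeMap::new, byYear));
    }

    public static void main(String[] args) {
        // prints {Country A={1990=Over 70 yrs, 1991=Under 5 yrs}, Country B={1990=Under 5 yrs}}
        System.out.println(worstAgeGroups(sampleRows()));
    }
}
```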
Using collectingAndThen() in Partitioning
We may encounter a use case where we want to know which country is at the edge, meaning that it shows indications of suffering from unacceptable mortality rates. Let's assume the rate at which mortality becomes a major point of concern is 100,000.
Note: This is an arbitrary rate, set for illustration purposes. In general, risk is calculated by the number of deaths per 100,000, depending on the population of the country.
A country that enjoys a rate that's lower than this shows that it's mitigating the given risk factor. It's doing something about indoor pollution, for example. But, a country whose rate is near or at that rate shows that it could need some help.
Here, our aim is to find a way to partition the mortality data into two. The first part would contain the countries whose rates haven't hit the point of concern yet (x). But, we will be seeking the country whose rate is the maximum in this group. This will be the country which we will identify as needing help.
The second partition will contain the countries that are experiencing very high rates. And its max will be the country/region with the worst rates. The best collecting operation for this task would be the partitioningBy() method.
According to its official Javadoc, partitioningBy():
Returns a Collector which partitions the input elements according to a Predicate, reduces the values in each partition according to another Collector, and organizes them into a Map<Boolean, D> whose values are the result of the downstream reduction.
If you'd like to read more about partitioningBy(), read our Java 8 Streams: Definitive Guide to partitioningBy()!
Going by this, we need a Predicate that checks whether mortality exceeds 100,000:
Predicate<CountryStats> p = cs -> {
    return cs.getHighest()
        .getMortality()
        .doubleValue() > 100_000;
};
Then, we will need a Collector that reduces each partition to a single CountryStats. In the partition of countries that don't fulfil the predicate, we want the CountryStats with the highest rate. This object will be of interest because it would be about to hit the point-of-concern rate.
And as we'd seen earlier, the operation capable of such collecting is maxBy():
Collector<CountryStats, ?, Optional<CountryStats>> c = Collectors.maxBy(
Comparator.comparing(CountryStats::getHighest)
);
Still, we want plain values, rather than Optional wrappers, in the Map which partitioningBy() will produce. Yet, with maxBy() alone we will get an output of:
Map<Boolean, Optional<CountryStats>> result = doPartition();
Hence, we'll rely on collectingAndThen() to adapt the Collector that maxBy() emits:
Collector<CountryStats, ?, String> c = Collectors.collectingAndThen(
    Collectors.maxBy(
        Comparator.comparing(CountryStats::getHighest)
    ),
    s -> {
        return s.orElseThrow(exc).toString();
    }
);
And when we combine all these pieces of code, we end up with:
@Test
public void shouldCreatePartition() {
Map<Boolean, String> result = stats.stream().collect(
Collectors.partitioningBy(
cs -> {
return cs
.getHighest()
.getMortality()
.doubleValue() > 100_000;
},
Collectors.collectingAndThen(
Collectors.maxBy(
Comparator.comparing(
CountryStats::getHighest
)
),
stat -> {
return stat
.orElseThrow(exc)
.toString();
}
)
)
);
System.out.println(result);
}
On running this method, we get the output:
{
false={
country/region=Eastern Sub-Saharan Africa,
year=1997,
mortality={
ageGroup=Under 5 yrs,
rate=99830.223
}
},
true={
country/region=World,
year=1992,
mortality={
ageGroup=Over 70 yrs,
rate=898396.486
}
}
}
These results mean that the Eastern Sub-Saharan Africa region hasn't hit the point of concern yet. But, it could hit it anytime. We're not concerned with the "World" entry, because it has already exceeded the rate we fixed.
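The partitioning step can be reproduced on a few rows as well. In the sketch below, Region is a hypothetical, simplified stand-in for CountryStats; two of the rates are taken from the output above and the third ("Region A") is made up. It uses orElse() instead of orElseThrow(), because partitioningBy() always creates both the true and the false entry, so an empty partition would otherwise make the finisher throw.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class PartitioningDemo {

    // Hypothetical, simplified stand-in for CountryStats.
    static final class Region {
        final String name;
        final double rate;

        Region(String name, double rate) {
            this.name = name;
            this.rate = rate;
        }

        String getName() { return name; }
        double getRate() { return rate; }
    }

    static List<Region> sampleRegions() {
        return Arrays.asList(
                new Region("Region A", 50000.0), // made-up row
                new Region("Eastern Sub-Saharan Africa", 99830.223),
                new Region("World", 898396.486));
    }

    static Map<Boolean, String> worstPerPartition(List<Region> regions, double threshold) {
        // Reduce each partition to the name of its highest-rate region.
        Collector<Region, ?, String> worst = Collectors.collectingAndThen(
                Collectors.maxBy(Comparator.comparingDouble(Region::getRate)),
                best -> best.map(Region::getName).orElse("(none)"));
        return regions.stream()
                .collect(Collectors.partitioningBy(r -> r.getRate() > threshold, worst));
    }

    public static void main(String[] args) {
        // prints {false=Eastern Sub-Saharan Africa, true=World}
        System.out.println(worstPerPartition(sampleRegions(), 100_000));
        // No region exceeds this threshold, so the true partition is empty:
        // prints {false=World, true=(none)}
        System.out.println(worstPerPartition(sampleRegions(), 1_000_000));
    }
}
```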
Conclusion
The collectingAndThen() operation makes it possible to chain Collector results with extra functions. You can nest as many collectingAndThen() calls within each other as you need. Other operations, which return Collector types, can work with this nesting approach too.
Near the end of this article, we found out that it can improve data presentation. The method also enabled us to refactor out inefficient operations like sorted(). Using JMH, we measured and discovered how fast collectingAndThen() can run.
Find the complete code that this article has used in this GitHub repository.
Feel free to clone and explore the code in its entirety. Dig into the test cases, for example, to get a sense of the many uses of collectingAndThen().