Introduction
The reduce()
method is Java 8's answer to the need for a fold implementation in the Stream API.
Folding is a very useful and common functional programming feature. It operates on a collection of elements to return a single result using some sort of operation.
Note: Folding is also known as reducing, aggregating, accumulating and compressing, and these terms all apply to the same concept.
That being said - it's one of the most malleable, flexible and applicable operations - and it's very commonly used to calculate aggregate results of collections and widely employed in one form or another in analytical and data-driven applications. The reduce()
operation equips the Stream API with similar fold capabilities.
Thus, if you have some int
values such as, say, [11, 22, 33, 44, 55]
, you could use reduce()
to find their sum, amongst other results.
In functional programming, finding the sum of those numbers would apply steps such as these:
0 + 11 = 11
11 + 22 = 33
33 + 33 = 66
66 + 44 = 110
110 + 55 = 165
Using the reduce()
method, this is achieved as:
int[] values = new int[]{11, 22, 33, 44, 55};
IntStream stream = Arrays.stream(values);
int sum = stream.reduce(0, (left, right) -> left + right);
The sum
is:
165
The reduce()
is straightforward enough. If you look at the functional routine, for example, you could call all the values on the left side of the +
operator left
; and those on the right, right
. Then, after every sum operation, the result becomes the new left
of the next summing.
Likewise, Java's reduce()
method does exactly what the functional routine does. It even includes a starting value, 0
, which the functional routine has too.
Operation-wise, the reduce()
method adds a left
value to the next right
value. Then it adds that sum to the next right
value and so on.
You could even visualize how reduce()
implements folding on those values as:
((((0 + 11) + 22) + 33) + 44) + 55 = 165
The Stream API does not limit the folding capabilities of reduce()
to the example above, though.
It goes all out to include its functional interfaces in three reduce()
method implementations. As you will see in more detail in subsequent sections, the API offers reduce()
in flavors such as:
T reduce(T identity, BinaryOperator<T> accumulator)
This version is what we used earlier, where 0
was the identity
and (left, right) -> left + right
was the accumulator
that implemented the BinaryOperator
functional interface.
And:
Optional<T> reduce(BinaryOperator<T> accumulator)
And:
<U> U reduce(U identity,
BiFunction<U,? super T,U> accumulator,
BinaryOperator<U> combiner)
Note: The sum()
, average()
, max()
and min()
operations of the Stream API are reduction variations.
The sum()
, max()
and min()
methods are essentially wrappers for the reduce()
operation:
// Equivalent to stream.sum()
stream.reduce(0, Integer::sum);
// Equivalent to stream.max()
stream.reduce(Integer::max);
// Equivalent to stream.min()
stream.reduce(Integer::min);
In the following sections, we'll dive deep into the
reduce()
method, its variants, use-cases and good practices, leaving you with a deeper understanding and appreciation for the underlying mechanism.
reduce() Flavors and Examples
The Stream API offers three reduce()
operation variants. Let's go through each of them, their definitions and practical usage.
1. reduce() whose Result is the Same Type as the Stream's Elements
Method signature:
T reduce(T identity, BinaryOperator<T> accumulator)
Official documentation definition:
Performs a reduction on the elements of this stream, using the provided identity value and an associative accumulation function, and returns the reduced value.
By now, we know how this type of reduce()
operates. But, there is a little matter that you should be careful with when using this reduce()
type. (Actually, with any reduction operation):
The associative nature of your
reduce()
implementation.
When you use reduce()
, you should provide the possibility for your routines to run in parallel settings as well. Reduction operations are not constrained to perform sequentially.
To this end, associativity is crucial because it will enable your accumulator to produce correct results regardless of the stream elements' encounter order. If associativity didn't hold here, the accumulator would be unreliable.
Case in point: say you have three int
values, [8, 5, 6]
.
Associativity demands that operating on these values in any grouping should always produce matching results. For example:
(8 + 5) + 6 == 8 + (5 + 6)
Also, when parallelization occurs the accumulation may handle these values in even smaller units. For instance, take a stream which contains the values [7, 3, 5, 1]
. A parallel stream may make the accumulation work in a fashion like:
7 + 3 + 5 + 1 == (7 + 3) + (5 + 1)
But, these demands effectively bar you from using some types of operations with the reduce()
method. You cannot, for example, do subtraction operations with reduce()
. That is because it would violate the associativity principle.
See, say you use the values from the previous example: [8, 5, 6]
. And then try to use reduce()
to find their cumulative difference.
It would look something like this:
(8 - 5) - 6 != 8 - (5 - 6)
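You can see this divergence in running code. The sketch below reduces the same values sequentially and in parallel with a subtractive accumulator; the sequential result is deterministic, but the parallel one may differ from run to run precisely because subtraction is not associative (the parallel value shown is only one possible outcome):

```java
import java.util.Arrays;

public class NonAssociativeDemo {
    public static void main(String[] args) {
        int[] values = {8, 5, 6};

        // Sequential reduction: ((0 - 8) - 5) - 6 = -19
        int sequential = Arrays.stream(values)
                .reduce(0, (a, b) -> a - b);

        // A parallel stream may regroup the work, e.g. (0 - 8) - (5 - 6),
        // so the result is no longer guaranteed to match
        int parallel = Arrays.stream(values)
                .parallel()
                .reduce(0, (a, b) -> a - b);

        System.out.println("Sequential: " + sequential); // -19
        System.out.println("Parallel:   " + parallel);   // may or may not be -19
    }
}
```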
The identity parameter is another factor to be careful of. Choose an identity value, i
, such that: for each element e
in a stream, applying the operation op
to e and i always returns e
.
What this means is that:
e op identity = e
In the case of addition, the identity is 0
. In the case of multiplication, the identity is 1
(an identity of 0 would make every product 0, not e). In the case of string concatenation, the identity is the empty String
, and so on.
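A quick sketch makes the identity law concrete for each of those operations:

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class IdentityDemo {
    public static void main(String[] args) {
        // 0 is the identity for addition: e + 0 == e
        int sum = IntStream.of(11, 22, 33).reduce(0, Integer::sum);          // 66

        // 1 is the identity for multiplication: e * 1 == e
        int product = IntStream.of(2, 3, 4).reduce(1, (a, b) -> a * b);      // 24

        // "" is the identity for concatenation: e + "" == e
        String joined = Stream.of("a", "b", "c").reduce("", String::concat); // "abc"

        System.out.println(sum + " " + product + " " + joined);
    }
}
```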
This operation can be used in Java as follows. Note that each reduce()
call needs its own stream, since a stream cannot be consumed twice:
IntStream sumStream = IntStream.of(11, 22, 33, 44, 55);
IntStream maxStream = IntStream.of(11, 22, 33, 44, 55);
IntStream minStream = IntStream.of(11, 22, 33, 44, 55);
Stream<String> stringStream = Stream.of("Java", "Python", "JavaScript");
int sum = sumStream.reduce(0, (left, right) -> left + right);
int max = maxStream.reduce(Integer.MIN_VALUE, Integer::max);
int min = minStream.reduce(Integer.MAX_VALUE, Integer::min);
// Mapping elements to a stream of integers, so the result type matches the stream's element type
int sumOfLengths = stringStream.mapToInt(String::length)
.reduce(0, Integer::sum);
These reduce()
calls were so common that they were replaced with higher-level calls - sum()
, min()
, max()
- and you could by all means use those instead of the reduce()
calls, though keep in mind that min()
and max()
return Optional
variants:
int sum = intStream.sum();
OptionalInt max = intStream.max();
OptionalInt min = intStream.min();
Where reduce()
shines is in cases where you want any scalar result from any sequence - such as reducing a collection to an element that has the greatest length, which results in an Optional
. We'll take a look at that now.
2. reduce() whose Result is an Optional
Method signature:
Optional<T> reduce(BinaryOperator<T> accumulator)
Official documentation definition:
Performs a reduction on the elements of this stream, using an associative accumulation function, and returns an Optional describing the reduced value, if any.
Operationally, this is the simplest way of using the reduce()
method. It asks for only one parameter. A BinaryOperator
implementation, which would serve as an accumulator.
So, instead of this:
int sum = stream
.reduce(0, (left, right) -> left + right);
You would only have to do this (i.e., leave out the identity value):
Optional<Integer> sum = stream
.reduce((left, right) -> left + right);
The difference between the former and the latter is that in the latter the result may not contain any value.
That would occur when you pass an empty stream for evaluation, for example. Yet, that does not happen when you use an identity as one of the parameters because reduce()
returns the identity itself as result when you offer it an empty stream.
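The difference in empty-stream behavior is easy to demonstrate:

```java
import java.util.Optional;
import java.util.stream.Stream;

public class EmptyStreamDemo {
    public static void main(String[] args) {
        // With an identity, an empty stream reduces to the identity itself
        int sum = Stream.<Integer>empty().reduce(0, Integer::sum);

        // Without one, there is nothing to fall back on, so the
        // result is an empty Optional
        Optional<Integer> maybeSum = Stream.<Integer>empty().reduce(Integer::sum);

        System.out.println(sum);                  // 0
        System.out.println(maybeSum.isPresent()); // false
    }
}
```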
Another example would be reducing collections to certain elements, such as reducing the stream created by several Strings to a single one:
List<String> langs = List.of("Java", "Python", "JavaScript");
Optional<String> longest = langs.stream().reduce(
    (s1, s2) -> (s1.length() > s2.length()) ? s1 : s2);
What's going on here? We're streaming a list and reducing it. For each two elements (s1, s2
), their lengths are compared and, based on the result, either s1
or s2
is returned, using the ternary operator.
The element with the greatest length will be propagated through these calls and the reduction will result in it being returned and packed into an Optional
, if such an element exists:
longest.ifPresent(System.out::println);
This results in:
JavaScript
3. reduce() which Uses a Combining Function
Method signature:
<U> U reduce(U identity, BiFunction<U,? super T,U> accumulator, BinaryOperator<U> combiner)
Official documentation definition:
Performs a reduction on the elements of this stream, using the provided identity, accumulation and combining functions.
While this definition seems straightforward enough, it hides a powerful capability.
This
reduce()
variant can allow you to process a result whose type does not match that of a stream's elements.
Haven't we done this before? Not really.
int sumOfLengths = stringStream
.mapToInt(String::length)
.reduce(0, Integer::sum);
The mapToInt()
method returns an IntStream
, so even though we start out with a stream of Strings - the reduce()
method is called on an IntStream
, and returns an integer, which is the type of the elements in the stream.
The
mapToInt()
call is a quick workaround that let us "return a different type", though it didn't really - it converted the stream itself before the reduction ever ran.
Take the case where you want to calculate the cumulative length of a paragraph of words, or the length of the words like we've had before.
That suggests that you may have a stream of String
elements. Yet, you need the return type of the reduce()
operation to have an int
value to denote the length of the paragraph.
This is where the combiner comes into play:
String string = "Our Mathematical Universe: My Quest for the Ultimate Nature of Reality";
List<String> wordList = List.of(string.split(" "));
int length = wordList
.stream()
.reduce(
0,
(parLength, word) -> parLength + word.length(),
(parLength, otherParLength) -> parLength + otherParLength
);
System.out.println(String.format("The sum length of all the words in the paragraph is %d", length));
This code sums the lengths of all the words in the paragraph, split on each space (so the whitespace isn't included in the calculation), and results in:
The sum length of all the words in the paragraph is 60
The feature that is worth noting with this reduce()
variant is that it serves parallelization pretty well.
Take the accumulator in the example:
(parLength, word) -> parLength + word.length()
The reduce()
operation will call it multiple times, no doubt. Yet, in a parallelized stream there may end up being quite a few accumulators in the pipeline. And that is where the combiner function steps in.
The combiner function in the example is:
(parLength, otherParLength) -> parLength + otherParLength
It sums the results of the available accumulators to produce the final result.
And that allows the reduce()
operation to break down a chunky process into many, smaller, and probably faster operations. This also segues us into the next significantly important topic - parallelization.
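To make that division of labor concrete, here is a minimal sketch of the three-argument reduce() on a parallel stream (the words are placeholder data). The accumulator folds a String into a partial int total, and the combiner merges the partial totals produced by different threads:

```java
import java.util.List;

public class CombinerDemo {
    public static void main(String[] args) {
        List<String> words = List.of("alpha", "beta", "gamma", "delta");

        int totalLength = words.parallelStream().reduce(
                0,
                // Accumulator: folds a String into a running int
                (partial, word) -> partial + word.length(),
                // Combiner: merges partial ints from different threads
                Integer::sum);

        System.out.println(totalLength); // 5 + 4 + 5 + 5 = 19
    }
}
```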
Using reduce() with Parallel Streams
You can turn any sequential stream into a parallel one by calling the parallel()
method on it.
Likewise, let us consider a use case where you want to sum all the int
values in a given range to test how reduce()
works in parallel.
There are several ways of generating a sequence of int
values within a given range using the Stream API:
- Using
Stream.iterate
- Using
IntStream.rangeClosed
Using Stream.iterate()
int max = 1_000_000;
Stream<Integer> iterateStream = Stream.iterate(1, number -> number + 1).limit(max);
Using IntStream.rangeClosed()
IntStream rangeClosedStream = IntStream.rangeClosed(1, max);
So, if we have these two ways of producing a stream of int
values, is one more efficient than the other for our use case?
The answer is a resounding yes.
The Stream.iterate()
is not as efficient as the IntStream.rangeClosed()
when you apply the reduce()
operation to them. We'll see why shortly.
When you use the two tactics to find the sum of numbers you would write code such as this:
Integer iterateSum = iterateStream
.parallel()
.reduce(0, (number1, number2) -> number1 + number2);
int rangeClosedSum = rangeClosedStream
.parallel()
.reduce(0, (number1, number2) -> number1 + number2);
True, both ways will always produce matching results.
If you set the variable max
to 1,000,000
, for example, you will get 1,784,293,664
from both reduce()
calls - note that this is the value after int overflow, since the true sum, 500,000,500,000, does not fit in an int.
Yet, calculating iterateSum
is slower than rangeClosedSum
.
The cause for this is the fact that Stream.iterate()
applies unboxing and boxing to all the number values it encounters in its pipeline. For instance, notice that we supplied int
values to it and it returned an Integer
object as the result.
IntStream.rangeClosed()
does not suffer from this shortcoming because it deals with int
values directly and even returns an int
value as a result, for example.
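You can observe the difference yourself with a rough timing sketch. This is illustrative only, not a rigorous benchmark - absolute numbers vary by machine and JIT warm-up - but the boxed pipeline is consistently the slower of the two:

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class BoxingDemo {
    public static void main(String[] args) {
        int max = 1_000_000;

        // Boxed pipeline: every element is an Integer object
        long boxedStart = System.nanoTime();
        int iterateSum = Stream.iterate(1, n -> n + 1)
                .limit(max)
                .reduce(0, Integer::sum);
        long boxedTime = System.nanoTime() - boxedStart;

        // Primitive pipeline: plain int values throughout
        long primStart = System.nanoTime();
        int rangeSum = IntStream.rangeClosed(1, max).reduce(0, Integer::sum);
        long primTime = System.nanoTime() - primStart;

        // Both sums match (1,784,293,664 after int overflow)
        System.out.printf("iterate: %d in %dms, rangeClosed: %d in %dms%n",
                iterateSum, boxedTime / 1_000_000, rangeSum, primTime / 1_000_000);
    }
}
```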
There are some more tests on GitHub that illustrate this phenomenon. Clone that repo and run the tests to explore further for yourself how
reduce()
performs with Stream.iterate()
and IntStream.rangeClosed()
.
When to not Use reduce()
The
reduce()
operation requires the accumulator to be stateless and non-interfering.
That means the values being accumulated should ideally be immutable - and to achieve this, most accumulators create new objects to hold the value of the next accumulation.
Take a case where you want to join several elements of String
objects into one String
object. Where you want to make a sentence out of several words, for example. Or, even a word by chaining several char
values.
The official documentation offers one such example:
String concatenated = strings.reduce("", String::concat);
Here, the reduce()
operation will create very many string objects if the strings
stream has a large number of elements.
And, depending on how large the strings
stream is, the performance will take a dip fast because of all the object allocation that is going on.
To get a clearer picture of how this operation works consider its for
loop equivalent. Then, note how new String
objects materialize with every loop pass:
String concatenated = "";
for (String string : strings) {
concatenated += string;
}
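The idiomatic fix for this particular case is a mutable reduction via collect(). Collectors.joining() accumulates into a single StringBuilder internally, sidestepping the per-step String allocations:

```java
import java.util.List;
import java.util.stream.Collectors;

public class JoiningDemo {
    public static void main(String[] args) {
        List<String> strings = List.of("Java", " ", "Stream", " ", "API");

        // One StringBuilder under the hood instead of a new String per step
        String concatenated = strings.stream().collect(Collectors.joining());

        System.out.println(concatenated); // Java Stream API
    }
}
```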
You could attempt to avoid creating new objects by reaching for mutable objects instead.
However, keep in mind that using a mutable identity container like a List
exposes that container to ConcurrentModificationException
when the stream runs in parallel.
Take a case where you want to reduce()
a stream of int
values into a List
of Integer
objects. You could do something like this:
Stream<Integer> numbersStream = Arrays.asList(12, 13, 14, 15, 16, 17).stream();
List<Integer> numbersList = numbersStream.reduce(
// Identity
new ArrayList<>(),
// Accumulator
(list, number) -> {
list.add(number);
return list;
},
// Combiner
(list1, list2) -> {
list1.addAll(list2);
return list1;
}
);
This code will give you a correct result:
[12, 13, 14, 15, 16, 17]
But, it will come at a cost.
First, the accumulator in this case is interfering with the identity. It is introducing a side effect by adding a value to the list acting as identity.
Then, if you happen to turn the stream, numbersStream
, into a parallel one, you will expose the list accumulation to concurrent modification. And this is bound to make the operation throw a ConcurrentModificationException
at some point.
Thus, your entire reduce()
operation may fail altogether.
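If what you actually want is to gather stream elements into a collection, the Stream API's collect() operation is the tool built for mutable reduction. It gives each thread its own container and merges them safely, so it works even on a parallel stream:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CollectDemo {
    public static void main(String[] args) {
        // Each thread of a parallel stream accumulates into its own
        // ArrayList; the lists are then merged, preserving encounter order
        List<Integer> numbers = Stream.of(12, 13, 14, 15, 16, 17)
                .parallel()
                .collect(Collectors.toList());

        System.out.println(numbers); // [12, 13, 14, 15, 16, 17]
    }
}
```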
Putting reduce() into Practice
Because of its functional nature, the Stream API demands a total rethinking of how we design Java code. It requires the use of methods that can fit in the patterns of functional interfaces that operations such as reduce()
use.
As a result, we will design our code such that when we call the reduce()
operation on it, it results in terse code - code that you can rewrite with method references, for example.
But, first, let us explore the use case that we will use to test reduce()
operations with.
- We have a grocery store that sells various products. Examples include cheese, tomatoes, and cucumbers.
- Every product has attributes such as a name, price, and unit weight.
- Customers get products from the store through transactions.
As a manager of such a grocery store, you come in one day and ask the clerk a few questions:
- How much money did you make from all your transactions?
- How heavy were the sold items? That is, what was the cumulative weight of the products that you sold?
- What was the value of the transaction that a customer paid the most for?
- Which transaction had the lowest value (in terms of its total price value)?
Designing the Domain
We will create a class Product
to represent the items that the grocery store will stock:
public class Product {
private final String name;
private final Price price;
private final Weight weight;
public Product(String name, Price price, Weight weight) {
this.name = name;
this.price = price;
this.weight = weight;
}
// Getters
}
Notice that we have included two value classes as fields of Product
named Weight
and Price
.
Yet, if we had wanted to do it naively, we would have made these two fields have double
values.
Like this:
public Product(String name, double price, double weight) {
this.name = name;
this.price = price;
this.weight = weight;
}
There is a very good reason for not doing it that way, as you shall soon find out. For now, both Price
and Weight
are simple wrappers for double
values:
public class Price {
private final double value;
public Price(double value) {
this.value = value;
}
//Getters
}
public class Weight {
private final double value;
public Weight(double value) {
this.value = value;
}
// Getters
}
Then, we have the Transaction
class. This class will contain a Product
and the int
value which represents the quantity of the product that a customer will buy.
Thus, Transaction
should be able to inform us of the total Price
and Weight
of Product
which a customer bought. It should thus include methods such as:
public class Transaction {
private final Product product;
private final int quantity;
public Transaction(Product product, int quantity) {
this.product = product;
this.quantity = quantity;
}
//Getters omitted
public Price getTotalPrice() {
return this.product.getPrice().getTotal(quantity);
}
public Weight getTotalWeight() {
return this.product.getWeight().getTotal(quantity);
}
}
Note how the methods getTotalPrice()
and getTotalWeight()
delegate their calculations to Price
and Weight
.
These delegations are quite important, and the reason we used classes instead of simple
double
fields.
They suggest that Price
and Weight
should be able to do accumulations of their types.
And remember, the reduce()
operation always takes a BinaryOperator
as its accumulator. So, this is the juncture where we start to pre-build accumulators for our classes.
Thus, add the following methods to serve as accumulators for Price
and Weight
:
public class Price {
// Fields, constructor, getters
public Price add(Price otherPrice) {
return new Price(value + otherPrice.getValue());
}
public Price getTotal(int quantity) {
return new Price(value * quantity);
}
}
public class Weight {
// Fields, constructor, getters
public Weight add(Weight otherWeight) {
return new Weight(value + otherWeight.getValue());
}
public Weight getTotal(int quantity) {
return new Weight(value * quantity);
}
}
There are variants of the reduce()
operation which also require identity parameters. And since an identity is a starting point of a calculation (which may be the object with the lowest value), we should go ahead and create the identity versions of Price
and Weight
.
You could do this by simply including the identity versions of these classes as global variables. So, let's add the fields named NIL
to Price
and Weight
:
public class Price {
// Adding NIL
public static final Price NIL = new Price(0.0);
private final double value;
public Price(double value) {
this.value = value;
}
}
public class Weight {
// Adding NIL
public static final Weight NIL = new Weight(0.0);
private final double value;
public Weight(double value) {
this.value = value;
}
}
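Before wiring these into the full domain, a small stand-alone sketch shows how such an identity plus accumulator pair plugs into reduce(). The Price class here is a minimal stand-in for the article's version, with illustrative values:

```java
import java.util.stream.Stream;

public class PriceReduceSketch {
    // Minimal stand-in for the article's Price class
    static final class Price {
        static final Price NIL = new Price(0.0); // the identity
        final double value;
        Price(double value) { this.value = value; }
        Price add(Price other) { return new Price(value + other.value); }
    }

    public static void main(String[] args) {
        // NIL is the identity, Price::add is the BinaryOperator accumulator
        Price total = Stream.of(new Price(2.99), new Price(1.99), new Price(3.49))
                .reduce(Price.NIL, Price::add);

        System.out.printf("$%.2f%n", total.value); // $8.47
    }
}
```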
As the name NIL
suggests, these fields represent Price
or Weight
which has the minimum value. With that done, it is time to create the Grocery
object which will conduct the transactions:
public class Grocery {
public static void main(String[] args) {
//Inventory
Product orange = new Product("Orange", new Price(2.99), new Weight(2.0));
Product apple = new Product("Apple", new Price(1.99), new Weight(3.0));
Product tomato = new Product("Tomato", new Price(3.49), new Weight(4.0));
Product cucumber = new Product("Cucumber", new Price(2.29), new Weight(1.0));
Product cheese = new Product("Cheese", new Price(9.99), new Weight(1.0));
Product beef = new Product("Beef", new Price(7.99), new Weight(10.0));
//Transactions
List<Transaction> transactions = Arrays.asList(
new Transaction(orange, 14),
new Transaction(apple, 12),
new Transaction(tomato, 5),
new Transaction(cucumber, 15),
new Transaction(cheese, 8),
new Transaction(beef, 6)
);
}
}
As the code shows, the Grocery
has a few Product
objects in its inventory, and a few Transaction
events occurred.
Still, the store's manager had asked for some data regarding the transactions. We should thus proceed to put reduce()
to work to help us answer those queries.
Money Made from All Transactions
The total price of all the transactions is the result of summing each transaction's total price.
Thus, we map()
all the Transaction
elements to their Price
values first.
Then, we reduce the Price
elements to a sum of their values.
Here, the abstraction of the accumulator into the Price
object itself has made the code highly readable. Also, the inclusion of the Price.NIL
identity has made the reduce()
operation read as functionally as possible:
Price totalPrice = transactions.stream()
.map(Transaction::getTotalPrice)
.reduce(Price.NIL, Price::add);
System.out.printf("Total price of all transactions: %s\n", totalPrice);
After running that code snippet, the output you should expect is:
Total price of all transactions: $245.40
Note too that we delegate the printing of the price value to the Price
object's toString()
method to simplify debugging further:
Using the
toString()
method to provide a human friendly description of an object's value is always good practice.
@Override
public String toString() {
return String.format("$%.2f", value);
}
Total Weight of All Sold Products
Similar to what we did with Price
, here we task Weight
with summing the values of several elements.
Of course, we need to map()
each Transaction
element in the pipeline to a Weight
object first.
Then we task the Weight
elements with doing the accumulation of their values themselves:
Weight totalWeight = transactions.stream()
.map(Transaction::getTotalWeight)
.reduce(Weight.NIL, Weight::add);
System.out.printf("Total weight of all sold products: %s\n", totalWeight);
On running this snippet, you should see an output such as:
Total weight of all sold products: 167.00 lbs
Price of Highest Value Transaction
This query demands a bit of a redesign of how a Price
finds a minimum or maximum value between two Price
elements.
Remember, in the preceding tasks, all we did was accumulate the values when executing reduce()
. However, finding a minimum or maximum value is another matter altogether.
Whereas we did summing with previous accumulations, here we have to start with the value of the first Price
element. Then we will replace it with another value if that value is greater than what we have. Thus, in the end, we end up with the highest value. This logic applies to when you are seeking the minimum value too.
Hence, include this code to calculate your max and min values for Price
elements:
public class Price {
// Fields, getters, constructors, other methods
public Price getMin(Price otherPrice){
return new Price(Double.min(value, otherPrice.getValue()));
}
public Price getMax(Price otherPrice){
return new Price(Double.max(value, otherPrice.getValue()));
}
}
And when you include these capabilities in your Grocery
object's calculations, you will get a reduce()
operation that looks like this:
transactions.stream()
.map(Transaction::getTotalPrice)
.reduce(Price::getMax)
.ifPresent(price -> System.out.printf("Highest transaction price: %s\n", price));
With an output of:
Highest transaction price: $79.92
Note too, that we have used the
reduce()
variant that takes only one parameter: aBinaryOperator
. The thinking is: we do not need an identity parameter because we will not need a default starting point for this operation.
When you are seeking the maximum value from a collection of elements, you start testing those elements directly without involving any external default value.
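In fact, skipping the identity is the safer choice here. A contrived sketch shows why a default like 0 would silently corrupt a maximum over negative values (0 is not a true identity for Integer::max):

```java
import java.util.stream.IntStream;

public class MaxIdentityPitfall {
    public static void main(String[] args) {
        // With all-negative values, 0 is not neutral for max: it silently wins
        int wrongMax = IntStream.of(-5, -3, -8).reduce(0, Integer::max);

        // The identity-free variant returns the true maximum as an OptionalInt
        int realMax = IntStream.of(-5, -3, -8).reduce(Integer::max).getAsInt();

        System.out.println(wrongMax + " vs " + realMax); // 0 vs -3
    }
}
```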
Lowest Value Transaction
Continuing with the trend we started with the preceding tasks, we delegate the query on which is the lowest value transaction to the Transaction
elements themselves.
Further, because we need a result that contains an entire Transaction
element's details, we direct all the interrogation to a stream of Transaction
elements without mapping them into any other type.
Still, there is a bit of work you should do to make a Transaction
element gauge its value in terms of Price
.
First, you will need to find the minimum Price
of two Transaction
objects.
Then, check which Transaction
had that minimum Price
and return it.
You can accomplish that with a routine such as this getMin
method:
public class Transaction {
// Fields, getters, constructors, other methods
public Transaction getMin(Transaction otherTransaction) {
Price min = this.getTotalPrice().getMin(otherTransaction.getTotalPrice());
return min.equals(this.getTotalPrice()) ? this : otherTransaction;
}
}
With that done, it becomes fairly simple to incorporate the routine into a reduce()
operation such as this one:
transactions.stream()
.reduce(Transaction::getMin)
.ifPresent(transaction -> {
System.out.printf("Transaction with lowest value: %s\n", transaction);
});
To get an output of:
Transaction with lowest value { Product: Tomato; price: $3.49 Qty: 5 lbs Total price: $17.45}
Again, an output such as this one is attainable when you exploit
toString()
fully. Use it to include as much information as possible to make an object's value human friendly when you print it out.
Conclusion
As Java's implementation of the commonplace fold routine, reduce()
is fairly effective. Yet, as we have seen, it demands a total rethink of how you design your classes to be able to exploit it fully.
Keep in mind, though, that reduce()
can lower your code's performance if you use it wrongly. The operation works in both sequential and parallel streams. It can however get tricky when you use it with huge streams because reduce()
is not efficient in mutable reduction operations.
We saw a case, for example, where you could use reduce()
to concatenate String
elements. Remember String
objects are immutable. Thus, when we used reduce()
for accumulation, we actually created very many String
objects in every accumulation pass.
Yet, if you try to remedy that shortcoming by using a mutable identity container like a List
, you expose that container to ConcurrentModificationException
.
Otherwise, we have explored a use case of a grocery store's transactions. We designed the code for this scenario in such a way that every accumulation carries out small and fast calculations.
Yes, new object allocations are still there for every accumulation we call with reduce()
. But, we have made them as simple as possible. As a result, our implementation can work just as well when you parallelize the Transaction
streams.
The code used for this article comes complete with unit tests. So, feel free to explore the code and its inner workings on GitHub.