Java 8 Streams: Definitive Guide to the filter() Method

Introduction

The Java Streams API simplifies working with a collection of elements. Because streams turn these elements into a pipeline, you could test them using a set of conditions (known as predicates), before finally acting on those that meet your criteria.

The filter() method is one such operation that tests the elements in a stream. And, as you can guess, it requires a predicate for it to work.

The official documentation defines the filter() method as one which:

Returns a stream consisting of the elements of [a given] stream that match the given predicate.

The documentation, in turn, defines a predicate as:

[a boolean-valued function] of one argument

The filter() method has the signature:

Stream<T> filter(Predicate<? super T> predicate)

And, it takes a predicate (which is an implementation of a functional interface) with one method:

boolean test(T t)

Note: The filter() method is an intermediate operation. So, it is important that the predicate you pass to filter() does not modify the stream's source while the elements are being tested (it should be non-interfering). Also, the predicate should be stateless: it should return the same result every time it tests the same element.

When predicates fulfill these two requirements, they make it possible to run streams in parallel. That is because you are sure that no unexpected behavior will come out of such a process.
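To make these two requirements concrete, here is a minimal sketch. The class name PredicateRules and the sample words are made up for illustration. It runs one stateless, non-interfering predicate both sequentially and in parallel, and both runs should agree:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class PredicateRules {

    // Stateless and non-interfering: the result depends only on the argument,
    // and the predicate never touches the source list
    static final Predicate<String> FIVE_LETTERS = word -> word.length() == 5;

    static long countSequential(List<String> words) {
        return words.stream().filter(FIVE_LETTERS).count();
    }

    static long countParallel(List<String> words) {
        return words.parallelStream().filter(FIVE_LETTERS).count();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("yearly", "years", "yeast", "yellow");
        System.out.println(countSequential(words)); // 2
        System.out.println(countParallel(words));   // 2
    }
}
```

A predicate that, say, counted how many times it had been called and changed its answer accordingly would not offer this guarantee once the stream runs in parallel.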

In practice, there is no limit to the number of filter() method calls you can make on a stream. For example:

list.stream()
    .filter(predicate1)
    .filter(predicate2)
    .filter(predicate3)
    .filter(predicate4)
    .count();

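You could also combine multiple predicates into one. Note that the && operator works on boolean expressions rather than on Predicate objects, so for Predicate instances you chain them with the and() method:

list.stream()
    .filter(predicate1
            .and(predicate2)
            .and(predicate3)
            .and(predicate4))
    .count();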
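Keep in mind that && joins boolean expressions, not Predicate objects themselves. As a small sketch (the class, constant, and method names are illustrative), the snippet below shows the two usual ways of combining conditions: && inside a single lambda, and the and() and negate() default methods of the Predicate interface:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class CombinedPredicates {

    static final Predicate<String> HAS_FIVE_LETTERS = s -> s.length() == 5;
    static final Predicate<String> STARTS_WITH_Y = s -> s.startsWith("y");

    // Option 1: one lambda, conditions joined with &&
    static long countWithLambda(List<String> words) {
        return words.stream()
                .filter(s -> s.length() == 5 && !s.startsWith("y"))
                .count();
    }

    // Option 2: Predicate objects joined with and() and negate()
    static long countWithAnd(List<String> words) {
        return words.stream()
                .filter(HAS_FIVE_LETTERS.and(STARTS_WITH_Y.negate()))
                .count();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("yearly", "years", "yeast", "blues", "astra");
        System.out.println(countWithLambda(words)); // 2
        System.out.println(countWithAnd(words));    // 2
    }
}
```

Both options keep only "blues" and "astra". The second one is handy when the individual predicates are reused elsewhere.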

Yet, the classic for loop can do exactly the same thing as the filter() method. Like this, for example:

long count = 0;
for (int i = 0; i < list.size(); i++) {
    if (predicate1.test(list.get(i))
            && predicate2.test(list.get(i))
            && predicate3.test(list.get(i))
            && predicate4.test(list.get(i))) {
        count = count + 1;
    }
}

So, which approach should you settle for among these three? Is there a difference in resource efficiency among them? That is, is there an approach that runs faster than the others?

This guide will answer these questions, and give you a deeper understanding of the filter() method and how you can employ it in your Java applications today.

Also, we will put into practice what we conclude from those answers to create an interesting piece of code: one that filters an entire dictionary of words to assemble groups of anagrams. And, if you have played "Scrabble" before (or even filled in a crossword puzzle), you will appreciate why anagrams are such an important feature of words to get to know.

Understanding the filter() Method

Say you have a list of four words:

yearly
years
yeast
yellow

And say you want to know how many are five-letter words - how many of those words have a string length of 5.

Since we'll be utilizing the Stream API to process this data - let's create a Stream out of the word list, and filter() them given a Predicate, and then count() the remaining elements:

List<String> list = List.of("yearly", "years", "yeast", "yellow");

long count = list.stream().filter(s -> s.length() == 5).count();
System.out.println(String.format("There are %s words of length 5", count));

This results in:

There are 2 words of length 5

After the filter() method kicks in, given this predicate - only two elements are available in the stream, which can be collected into another collection as well:

List<String> filteredList = list.stream().filter(s -> s.length() == 5).collect(Collectors.toList());
System.out.println(filteredList);

This results in:

[years, yeast]

The filter() method returns a new stream, so we can choose to perform other stream operations, or collect it to a more tangible collection. For instance, you can stack several filter() methods consecutively:

List<String> list = List.of("yearly", "years", "yeast", "yellow", "blues", "astra");

List<String> filteredList = list.stream()
            .filter(s -> s.length() == 5)
            .filter(s -> !s.startsWith("y"))
            .filter(s -> s.contains("str"))
            .collect(Collectors.toList());
System.out.println(filteredList);

Here, we filter the list three times, creating three streams:

First  filter() results in: [years, yeast, blues, astra]
Second filter() results in: [blues, astra]
Third  filter() results in: [astra]

So we are ultimately left with:

[astra]
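The intermediate results listed above are a helpful mental model, but a stream does not build those lists one after another. As a rough sketch (FilterTrace and its log are made up for this illustration, and the side-effecting predicates are there for tracing only), the following records the order in which two filters see each word on a sequential stream:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FilterTrace {

    // Records which filter saw which word, in the order the calls happen
    static List<String> trace(List<String> words) {
        List<String> log = new ArrayList<>();
        words.stream()
                .filter(s -> { log.add("length:" + s); return s.length() == 5; })
                .filter(s -> { log.add("prefix:" + s); return !s.startsWith("y"); })
                .collect(Collectors.toList());
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace(Arrays.asList("years", "blues", "yellow")));
        // [length:years, prefix:years, length:blues, prefix:blues, length:yellow]
    }
}
```

Each word travels through the pipeline on its own, and "yellow" never reaches the second filter because the first one already rejected it.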

So, what's really happening here?

If you're new to how predicates work, the previous code might make sense on the surface, but there might still be a barrier to truly understanding what's going on - so let's break it down.

Let's start off by creating a Stream of the words:

Stream<String> words = Stream.of("yearly", "years", "yeast", "yellow");

There's no difference between creating a Stream explicitly like this, or creating one from a collection via the stream() method anonymously:

List<String> list = List.of("yearly", "years", "yeast", "yellow");

// Create Stream and return result
List<String> result = list.stream()...

Both of these construct a stream, but the latter case is more common, as you'll typically have an underlying collection to work with.

Then, we can define a predicate for matching our elements:

Predicate<String> predicate = new Predicate<String>() {
    @Override
    public boolean test(String word) {
        return word.length() == 5;
    }
};

The filter() method runs the predicate's test() method against each element, and a boolean value is returned based on the result of this method. If true, the element is not filtered out and will remain in the stream after the filter() method. If false, it's removed from the Stream, but of course, not from the underlying collection.

You could also declare this predicate using a lambda, as a short-hand version:

Predicate<String> predicate = (String word) -> word.length() == 5;

Or, in an even more concise manner:

Predicate<String> predicate = word -> word.length() == 5;

The last step is to attach the predicate to a filter() method on the words stream before asking it to count the number of elements that have passed the test:

// Put the collection of words into a stream
Stream<String> words = Stream.of("yearly", "years", "yeast", "yellow");
// Declare a predicate that allows only those words that have a length of 5
Predicate<String> predicate = word -> word.length() == 5;
// Attach the predicate to filter method and count how many words have passed the test
long count = words.filter(predicate).count();

With a sharp eye - you can see that this is effectively the same as the code we wrote first, just spelled out more explicitly!

long count = list.stream().filter(s -> s.length() == 5).count();

In this version - we simply create a Stream via the stream() method and pass the predicate as an anonymous lambda within the filter() method call.
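To convince yourself that the different ways of writing a predicate behave identically, you can call test() on them directly. The sketch below uses a hypothetical PredicateForms class to hold the anonymous-class form and the lambda form side by side:

```java
import java.util.function.Predicate;

class PredicateForms {

    // Anonymous class form
    static final Predicate<String> ANONYMOUS = new Predicate<String>() {
        @Override
        public boolean test(String word) {
            return word.length() == 5;
        }
    };

    // Lambda form of the same test
    static final Predicate<String> LAMBDA = word -> word.length() == 5;

    public static void main(String[] args) {
        System.out.println(ANONYMOUS.test("years")); // true
        System.out.println(LAMBDA.test("years"));    // true
        System.out.println(LAMBDA.test("yellow"));   // false
    }
}
```

This is all filter() does with the predicate: it calls test() for each element and keeps the ones for which the answer is true.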

Is There a 'Right' Way of Using the filter() Method?

The previous example put the filter() method to good use. Still, we can take things a notch higher. So, let us explore an even more involving use case.

You want to generate many decimal figures between E and PI. And those figures must exclude E, PI, 2.0, and 3.0. That means that a figure (f) must meet the following criteria:

f > Math.E
f < Math.PI
f != 2
f != 3

Here, PI and E come from the Java Math API. Where PI is:

The double value that is closer than any other to pi, the ratio of the circumference of a circle to its diameter.

Hence:

PI = 3.14159265358979323846;

And E is:

The double value that is closer than any other to e, the base of the natural logarithms.

Thus:

E = 2.7182818284590452354;
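The four criteria translate directly into a predicate. As a minimal sketch (FigureCriteria and IN_RANGE are names chosen for this example), we can express them with a DoublePredicate and try a few hand-picked values:

```java
import java.util.function.DoublePredicate;

class FigureCriteria {

    // f > E, f < PI, f != 2, f != 3
    static final DoublePredicate IN_RANGE =
            f -> f > Math.E && f < Math.PI && f != 2 && f != 3;

    public static void main(String[] args) {
        System.out.println(IN_RANGE.test(2.9));     // true
        System.out.println(IN_RANGE.test(3.0));     // false
        System.out.println(IN_RANGE.test(Math.PI)); // false
        System.out.println(IN_RANGE.test(2.0));     // false (2 is also below E)
    }
}
```

Since 2 is smaller than E, the f != 2 check can never change the result on its own. The f != 3 check does matter, because 3 lies between E and PI.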

Creating Random Figures

All the filtering strategies we will create need figures to work with. So, let us start by creating many random figures that are all greater than or equal to 1 and less than 4.

And, to accomplish that, we will use the abstract class FilterFigures:

import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;

public abstract class FilterFigures {
    // Generate random figures in increasing exponents of base 10
    // Thus, with an exponent of one: 10^1 = 10
    // two: 10^2 = 100
    // three: 10^3 = 1,000
    // four: 10^4 = 10,000
    // five: 10^5 = 100,000
    // six: 10^6 = 1,000,000
    // and so on
    private final double exponent;

    FilterFigures(double exponent) {
        this.exponent = exponent;
    }

    // Child classes must do their filtering here when this method is called by client code,
    // and return the number of figures that passed the filters
    public abstract long doFilter();

    // A fresh, unmodifiable list of random doubles is generated on every call
    protected List<Double> getRandomFigures() {
        return ThreadLocalRandom
                .current()
                .doubles((long) Math.pow(10, exponent), 1, 4)
                .boxed()
                .collect(Collectors
                        .collectingAndThen(Collectors.toList(),
                                           Collections::unmodifiableList));
    }
}

With this class, the exponent we pass in is applied to a base of 10 to decide how many random numbers to generate.

So, note the method getRandomFigures():

  • (1) We create a random number generator using ThreadLocalRandom.current(). You should prefer this way of creating a Random instance because as the official documentation remarks:

When applicable, use of ThreadLocalRandom rather than shared Random objects in concurrent programs will typically encounter much less overhead and contention.

  • (2) We call the generator to produce random double values. Here, we pass three arguments. First, the number of random figures we want the generator to produce, computed with Math.pow(10, exponent), which returns 10 raised to the power of the passed exponent. Second, the lowest random figure that may be included in the collection of random figures (the inclusive origin). Here that value is 1. Third, the upper bound (4), which is exclusive.

  • (3) We instruct the stream to box the primitive double values with the wrapper Double class. And why is that important? Because we want to collect the values in a List. Yet Java's List implementations like the ArrayList class cannot hold primitive values like double. They can hold Double though.

  • (4) Finally we terminate the stream of Double values using a Collector and a finisher.
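Boxing is only needed because we want a List<Double>. As an aside, and not the approach this guide uses, the DoubleStream returned by doubles() has its own filter() and count() methods, so a count can be taken without boxing at all. A minimal sketch, with PrimitiveFigures and countMatching as made-up names:

```java
import java.util.concurrent.ThreadLocalRandom;

class PrimitiveFigures {

    // Counts matching figures straight from the DoubleStream, without boxing to Double
    static long countMatching(long howMany) {
        return ThreadLocalRandom.current()
                .doubles(howMany, 1, 4) // origin 1 is inclusive, bound 4 is exclusive
                .filter(f -> f > Math.E && f < Math.PI && f != 3)
                .count();
    }

    public static void main(String[] args) {
        // The result varies per run, at roughly 14% of the input size
        System.out.println(countMatching(100_000));
    }
}
```

The trade-off is that the figures are consumed as they are generated, so there is no list left over to inspect or reuse afterwards.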

With the FilterFigures class at hand, we can then create concrete subclasses for it that use various tactics to filter the random numbers.

Using Many, Sequential filter() Methods

public class ManySequentialFilters extends FilterFigures {
    public ManySequentialFilters(double exponent) {
        super(exponent);
    }

    // This method filters the random figures and only permits those figures that are less than pi
    // (i.e., 3.14159265358979323846)
    // It permits those that are greater than the base of the natural logarithm
    // (i.e., 2.7182818284590452354)
    // It does not permit the figure 3
    // It does not permit the figure 2
    @Override
    public long doFilter() {
        return super.getRandomFigures().stream()
                .filter(figure -> figure < Math.PI)
                .filter(figure -> figure > Math.E)
                .filter(figure -> figure != 3)
                .filter(figure -> figure != 2)
                .count();
    }
}

This class applies four filters to fulfill the requirements that we set out earlier. As we saw earlier, filter() returns a new stream, with certain elements filtered out, based on the predicate. This means we can call filter() again on that stream, and so on.

Here, four new streams are created, and each time, some elements are being filtered out:

FilterFigures ff = new ManySequentialFilters(5);

long count = ff.doFilter();
System.out.println(count);

With an exponent of 5, there are quite a lot of numbers, and the count of numbers that fit our four filters is something along the lines of:

14248

Given the randomness factor, each run will result in a different count, but it should be in the same ballpark approximately.
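We can estimate that ballpark up front. Assuming the generator spreads its values uniformly over the range from 1 to 4, the share of figures that land between E and PI is the width of that window divided by the width of the whole range. The sketch below (ExpectedCount is a hypothetical helper, not part of the article's code) does the arithmetic:

```java
class ExpectedCount {

    // Share of the interval [1, 4) that lies strictly between E and PI, about 0.1411
    static double matchingFraction() {
        return (Math.PI - Math.E) / (4 - 1);
    }

    static long expectedCount(long totalFigures) {
        return Math.round(totalFigures * matchingFraction());
    }

    public static void main(String[] args) {
        System.out.println(expectedCount(100_000)); // 14110
    }
}
```

So with an exponent of 5, counts in the neighborhood of 14,000 are what we would expect, and the 14248 above fits that picture.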

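If you're interested in the figures created by the class, you can take a peek from within the same package (getRandomFigures() is protected), keeping in mind that each call generates a fresh list: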

System.out.println(ff.getRandomFigures());

Which will result in a potentially long list - with an exponent of 5, this list has 100000 elements:

2.061505905989455, 2.1559549378375986, 2.785542981180915, 3.0510231495547373, 
3.449422675836848, 3.225190770912789, 3.100194060442495, 2.4322353023765593, 
2.007779315680971, 2.8776634991278796, 1.9027959105246701, 3.763408883116875, 
3.670863706271426, 1.5414358709610365, 3.474927271813806, 1.8701468250626507, 
2.546568871253891...

Note: With larger exponents, such as 10, you'll run out of heap space unless you manually increase the JVM's heap size.

Using Combined, Sequential filter() Methods

Creating a new stream for each filter() is a bit wasteful, and if you have an arbitrary list of predicates, creating a whole lot of streams can impact the performance of your application.

You can combine multiple predicates and filter() using them in one go:

public class CombinedSequentialFilters extends FilterFigures {

    public CombinedSequentialFilters(double exponent) {
        super(exponent);
    }
    
    // This method filters random figures using a
    // predicate testing all the conditions in one go
    @Override
    public long doFilter() {
        return super.getRandomFigures()
            .stream()
            .filter(
                figure -> figure < Math.PI
                && figure > Math.E
                && figure != 3
                && figure != 2
            )
            .count();
    }
}

So, how much of an effect does this approach have on performance? We benchmark it in a later section.
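When the predicates arrive as a list whose size you do not know in advance, you cannot write them out with && by hand. One possible way to handle that, sketched here with made-up names (PredicateList, allOf), is to fold the list into a single predicate with reduce() and Predicate::and:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class PredicateList {

    // Folds any number of predicates into one, starting from "always true"
    static <T> Predicate<T> allOf(List<Predicate<T>> predicates) {
        return predicates.stream().reduce(x -> true, Predicate::and);
    }

    static long countMatching(List<Double> figures) {
        List<Predicate<Double>> rules = Arrays.asList(
                f -> f < Math.PI,
                f -> f > Math.E,
                f -> f != 3,
                f -> f != 2);
        return figures.stream().filter(allOf(rules)).count();
    }

    public static void main(String[] args) {
        System.out.println(countMatching(Arrays.asList(1.5, 2.0, 2.8, 3.0, 3.1, 3.5))); // 2
    }
}
```

Only 2.8 and 3.1 pass all four rules, and the stream still contains just one filter() call.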

Using Many, Parallel filter() Methods

public class ManyParallelFilters extends FilterFigures {

    public ManyParallelFilters(double exponent) {
        super(exponent);
    }

    @Override
    public long doFilter() {
        return super.getRandomFigures()
            .stream()
            .parallel()
            .filter(figure -> figure < Math.PI)
            .filter(figure -> figure > Math.E)
            .filter(figure -> figure != 3)
            .filter(figure -> figure != 2)
            .count();
    }
}

Again, the expected outcome of this class is similar to the two we have seen earlier. The difference here is that we have started using parallel(), which is an intermediate operation of the Streams API.

With the addition of the parallel() method, the stream can split its work across the cores that your machine has. We could also parallelize the filtering tactic of using a combined predicate.
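Because the classes above generate new random figures on every call, you cannot directly compare a sequential count with a parallel one. The following sketch (ParallelCheck is a hypothetical class written for this illustration) generates the data once and then confirms that both modes agree on the same input:

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;

class ParallelCheck {

    static List<Double> figures(int howMany) {
        return ThreadLocalRandom.current()
                .doubles(howMany, 1, 4)
                .boxed()
                .collect(Collectors.toList());
    }

    static long count(List<Double> figures, boolean parallel) {
        return (parallel ? figures.parallelStream() : figures.stream())
                .filter(f -> f > Math.E && f < Math.PI && f != 3)
                .count();
    }

    public static void main(String[] args) {
        List<Double> data = figures(100_000);
        // Same input, so both modes must report the same count
        System.out.println(count(data, false) == count(data, true)); // true
        System.out.println(data.stream().parallel().isParallel());   // true
        // Number of processors the JVM sees; the default parallelism is derived from it
        System.out.println(Runtime.getRuntime().availableProcessors());
    }
}
```

Parallel streams run on the common fork/join pool by default, so how much of your machine they actually use depends on how that pool is sized and on what else is running in it.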

Using Combined, Parallel filter() Methods

public class CombinedParallelFilters extends FilterFigures {
    public CombinedParallelFilters(double exponent) {
        super(exponent);
    }
    @Override
    public long doFilter() {
        return super.getRandomFigures()
                .stream()
                .parallel()
                .filter(figure -> figure < Math.PI 
                        && figure > Math.E
                        && figure != 3
                        && figure != 2)
                .count();
    }
}

With this class we have simply added the parallel() operation to the complex predicate we encountered earlier. The resulting count should remain in the same ballpark.

Yet, it's worth testing if we get any gains in speed by fashioning the filter() methods in varying ways. Which one is preferable from this bunch?

Choosing the Fastest Way of Using filter() Methods

A straightforward way of measuring how the various styles of using filter() perform is by timing them. So, in the FiltersTest class we have run all the classes using filter with an exponent of 7. Meaning we want each of these classes to filter 10,000,000 random doubles.

long startTime = System.currentTimeMillis();
// With an exponent of 7, the random generator will produce 10^7 random doubles - 10,000,000 figures!
int exponent = 7;
new ManySequentialFilters(exponent).doFilter();
long endTime = System.currentTimeMillis();
System.out.printf(
    "Time taken by many sequential filters = %d ms\n",
    (endTime - startTime)
);
startTime = System.currentTimeMillis();
new ManyParallelFilters(exponent).doFilter();
endTime = System.currentTimeMillis();
System.out.printf(
    "Time taken by many parallel filters = %d ms\n",
    (endTime - startTime)
);
startTime = System.currentTimeMillis();
new CombinedSequentialFilters(exponent).doFilter();
endTime = System.currentTimeMillis();
System.out.printf(
    "Time taken by combined sequential filters = %d ms\n",
    (endTime - startTime)
);
startTime = System.currentTimeMillis();
new CombinedParallelFilters(exponent).doFilter();
endTime = System.currentTimeMillis();
System.out.printf(
    "Time taken by combined parallel filters = %d ms\n",
    (endTime - startTime)
);

When you run this test, you will get results that look like these:

Time taken by many sequential filters = 2879 ms
Time taken by many parallel filters = 2227 ms
Time taken by combined sequential filters = 2665 ms
Time taken by combined parallel filters = 415 ms

Note, these results are from a computer running on ArchLinux, Java 8, with 8GiB of RAM and an Intel i5-4579T CPU @ 2.90GHz.

A very different result is achieved when run on a different machine, running Windows 10, Java 14, with 32GiB of RAM and an AMD Ryzen 7 3800X 8-Core @ 3.9GHz:

Time taken by many sequential filters = 389 ms
Time taken by many parallel filters = 295 ms
Time taken by combined sequential filters = 303 ms
Time taken by combined parallel filters = 287 ms

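Thus, depending on your machine's capabilities and architecture your results may be faster or slower. Keep in mind that each measurement also includes the time doFilter() spends generating the random figures, and that a single timed run gives only a rough indication.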

For instance, the Intel i5 processor got an obvious boost with parallelization, while the AMD Ryzen 7 processor doesn't seem to have gained much.
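If you want somewhat steadier numbers without leaving plain Java, one option is to generate the data once, run each variant a few times as a warm-up, and only then time it. The sketch below is a rough illustration under those assumptions (RoughTimer and timeMillis are made-up names, and the warm-up count of 5 is arbitrary). For serious measurements, a dedicated harness such as JMH is the usual recommendation.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.ToLongFunction;
import java.util.stream.Collectors;

class RoughTimer {

    // Runs the task a few times to warm up the JIT, then times one more run
    static <T> long timeMillis(T input, ToLongFunction<T> task, int warmUps) {
        for (int i = 0; i < warmUps; i++) {
            task.applyAsLong(input);
        }
        long start = System.nanoTime();
        task.applyAsLong(input);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Generate the data once, outside the timed region
        List<Double> figures = ThreadLocalRandom.current()
                .doubles(1_000_000, 1, 4)
                .boxed()
                .collect(Collectors.toList());

        long chained = timeMillis(figures, list -> list.stream()
                .filter(f -> f < Math.PI)
                .filter(f -> f > Math.E)
                .filter(f -> f != 3)
                .count(), 5);
        long combined = timeMillis(figures, list -> list.stream()
                .filter(f -> f < Math.PI && f > Math.E && f != 3)
                .count(), 5);

        System.out.println("chained filters: " + chained + " ms");
        System.out.println("combined filter: " + combined + " ms");
    }
}
```

The printed times will differ from machine to machine and from run to run, so treat them as a relative comparison rather than absolute figures.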

filter() Method vs. for Loop

The for loop was king before filtering came along, and the filter() method was accepted with gratitude by the developer community. It's a much more concise way to filter elements out of collections.

Using the classic Java for loop, you can still filter elements to satisfy given conditions. So, for our case we could filter the random doubles using this ClassicForLoop class:

public class ClassicForLoop extends FilterFigures {
    
    public ClassicForLoop(double exponent) {
        super(exponent);
    }
    
    @Override
    public long doFilter() {
        List<Double> randomFigures = super.getRandomFigures();
        long count = 0;
        for (int i = 0; i < randomFigures.size(); i++) {
            Double figure = randomFigures.get(i);
            if (figure < Math.PI
                    && figure > Math.E
                    && figure != 3
                    && figure != 2) {
                count = count + 1;
            }
        }
        return count;
    }
}

But, why even bother with this loop style? So far we have seen that the combined parallel filters run the fastest on certain machines. So, we should compare the latter with the for loop to see if there is a substantial difference in speeds, if nothing else.

And, for that we will add a code snippet in the FiltersTest class to measure the speed of the for loop alongside the combined parallel filters. Like this:

startTime = System.currentTimeMillis();
new ClassicForLoop(exponent).doFilter();
endTime = System.currentTimeMillis();
System.out.printf(
        "Time taken by filtering using classic for loop = %d ms\n",
                (endTime - startTime));

The results will, again, vary depending on your local machine.

Generally speaking - the for loop should outperform the filter() method on small sets, such as with exponents of up to 4, though the difference is typically a matter of milliseconds - so you practically won't notice it.

With more than ~10k doubles, the for loop typically starts underperforming compared to the combined parallel filter() approach it is measured against here.

Yet, you should still opt for the filter() method because of its readability. The loop style suffers from being too low-level: the filtering logic is mixed in with indexing and counting. And since you write code for humans to read and not for computers to compile alone, readability becomes a crucial factor.

Additionally, if your dataset starts increasing - with a plain for loop, you have no built-in way to spread the work across cores. Whereas with the filter() method, you can switch to a parallel stream, and its performance relative to the for loop starts getting better.
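When comparing the two styles, it helps to run both on the very same list so the counts can be checked against each other. Here is a minimal sketch of that idea, using an enhanced for loop instead of an indexed one (LoopVersusFilter and its method names are illustrative):

```java
import java.util.Arrays;
import java.util.List;

class LoopVersusFilter {

    static long countWithLoop(List<Double> figures) {
        long count = 0;
        // Enhanced for loop: no index bookkeeping needed
        for (double figure : figures) {
            if (figure < Math.PI && figure > Math.E && figure != 3) {
                count++;
            }
        }
        return count;
    }

    static long countWithFilter(List<Double> figures) {
        return figures.stream()
                .filter(f -> f < Math.PI && f > Math.E && f != 3)
                .count();
    }

    public static void main(String[] args) {
        List<Double> figures = Arrays.asList(1.5, 2.8, 3.0, 3.1, 3.9);
        System.out.println(countWithLoop(figures));   // 2
        System.out.println(countWithFilter(figures)); // 2
    }
}
```

Both versions express the same condition. The stream version states what is being counted, while the loop also spells out how the counting happens.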

Conclusion

The filter() method is one of the ways that you could use to make your Java code more functional in nature, as opposed to imperative or procedural. Yet, there are considerations to keep in mind with the filter() method.

Chaining many filter() methods risks slowing down your code when it runs, for example. This is because, as an intermediate operation, each filter() call creates a new stream with the elements that pass its predicate's condition. Thus, the trick remains to combine predicates in one statement to reduce the number of filter() calls you make.

You can find the code used in this article on GitHub.

Last Updated: May 8th, 2023

Author: Hiram Kamau

In addition to catching code errors and going through debugging hell, I also obsess over whether writing in an active voice is truly better than doing it in passive.

© 2013-2024 Stack Abuse. All rights reserved.