Java - Filter a Stream with Lambda Expressions

Java Streams were introduced back in Java 8, in 2014, in an effort to bring the Functional Programming paradigm to the otherwise verbose Java. Streams expose many flexible and powerful functional operations that let you process collections in one-liners.

Filtering collections based on some predicate remains one of the most commonly used functional operations, and can be performed with a Predicate instance or, more concisely, with a Lambda Expression.

In this short guide, we'll take a look at how you can filter a Java 8 Stream with Lambda Expressions.

Advice: If you aren't familiar with functional interfaces, predicates and lambda expressions - for an example-driven intuition-building guide, read our in-depth "Guide to Functional Interfaces and Lambda Expressions in Java"!

Filtering Streams in Java

In general, any Stream can be filtered by passing a predicate to the filter() method:

Stream<T> filter(Predicate<? super T> predicate)


Each element in the stream is run against the predicate, and is added to the output stream if the predicate returns true. You can supply a Predicate instance:

Predicate<String> contains = s -> s.contains("_deprecated");
List<String> results = stream.filter(contains).collect(Collectors.toList());


Or, simplify it by providing a Lambda Expression:

List<String> results = stream.filter(s -> s.contains("_deprecated"))
        .collect(Collectors.toList());


Or even collapse the Lambda Expression into a method reference:

// Keeps only the empty strings by invoking s -> s.isEmpty() on each element
List<String> results = stream.filter(String::isEmpty)
        .collect(Collectors.toList());


With method references, you can't pass arguments. You can, however, define methods in the class of the objects you're filtering and tailor them to be easily usable in filter() (as long as the method doesn't accept arguments and returns a boolean).
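As a minimal sketch of that idea: the Task class below is hypothetical and exists only for this illustration. Its isDone() method takes no arguments and returns a boolean, so it can be passed to filter() as Task::isDone.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MethodReferenceFilterDemo {

    // Hypothetical class, defined only for this illustration
    static class Task {
        private final String title;
        private final boolean done;

        Task(String title, boolean done) {
            this.title = title;
            this.done = done;
        }

        // No arguments and a boolean return type: usable as Task::isDone
        boolean isDone() {
            return done;
        }

        @Override
        public String toString() {
            return title;
        }
    }

    static List<Task> finished(List<Task> tasks) {
        return tasks.stream()
                .filter(Task::isDone) // equivalent to t -> t.isDone()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Task> tasks = Arrays.asList(
                new Task("write draft", true),
                new Task("review", false),
                new Task("publish", true));

        System.out.println(finished(tasks)); // [write draft, publish]
    }
}
```

If the condition needs an argument (say, a page threshold), a lambda expression is the simpler choice.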

Remember that streams are not collections - they're sequences of elements drawn from a source such as a collection, and you'll have to collect them back into a collection such as a List, or into a Map, to give the results permanence. Additionally, all operations on a stream are either intermediate or terminal:

• Intermediate operations return a new stream with changes from the previous operation
• Terminal operations return a data type and are meant to end a pipeline of processing on a stream

filter() is an intermediate operation, and is meant to be chained with other intermediate operations before the stream is terminated. To persist any results (such as transformed elements or filtered results), you'll have to end the pipeline with a terminal operation and assign its output to a reference variable.
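The difference between the two kinds of operations can be observed directly. The sketch below is only an illustration (the class name, the isLong() helper and the call counter are assumptions made for this example): it counts how often the predicate runs, showing that filter() on its own evaluates nothing until a terminal operation such as collect() is invoked.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyFilterDemo {

    // Counts predicate invocations so we can see when filtering actually runs
    static final AtomicInteger calls = new AtomicInteger();

    static boolean isLong(String s) {
        calls.incrementAndGet();
        return s.length() > 3;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "filter", "collect", "of");

        // Intermediate operation: returns a new Stream, no element is tested yet
        Stream<String> pending = words.stream().filter(LazyFilterDemo::isLong);
        System.out.println("Calls before terminal operation: " + calls.get()); // 0

        // Terminal operation: runs the pipeline and produces a List
        List<String> longWords = pending.collect(Collectors.toList());
        System.out.println("Calls after terminal operation: " + calls.get()); // 4
        System.out.println(longWords); // [filter, collect]
    }
}
```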

Note: Even when chaining many lambda expressions, proper line breaks usually keep the pipeline readable.

In the following examples, we'll be working with this list of books:

Book book1 = new Book("001", "Our Mathematical Universe", "Max Tegmark", 432, 2014);
Book book2 = new Book("002", "Life 3.0", "Max Tegmark", 280, 2017);
Book book3 = new Book("003", "Sapiens", "Yuval Noah Harari", 443, 2011);

List<Book> books = Arrays.asList(book1, book2, book3);
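The Book class itself isn't shown in this guide. A minimal sketch that would fit the constructor calls above might look like the following; the field names and the toString() format are assumptions, inferred from the getters and printed output used in the examples.

```java
// Assumed shape of the Book class; the guide does not show the original definition
class Book {
    private final String id;
    private final String name;
    private final String author;
    private final int pageNumber;
    private final int publishedYear;

    Book(String id, String name, String author, int pageNumber, int publishedYear) {
        this.id = id;
        this.name = name;
        this.author = author;
        this.pageNumber = pageNumber;
        this.publishedYear = publishedYear;
    }

    public String getId() { return id; }
    public String getName() { return name; }
    public String getAuthor() { return author; }
    public int getPageNumber() { return pageNumber; }
    public int getPublishedYear() { return publishedYear; }

    @Override
    public String toString() {
        return "Book{id='" + id + "', name='" + name + "', author='" + author
                + "', pageNumber=" + pageNumber + ", publishedYear=" + publishedYear + "}";
    }
}
```

The fields are final because none of the examples modify a book after it has been created.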


Filter Collection with Stream.filter()

Let's filter this collection of books. Any predicate goes, so let's, for example, keep only the books that have over 400 pages:

List<Book> results = books.stream()
        .filter(b -> b.getPageNumber() > 400)
        .collect(Collectors.toList());


This results in a list which contains:

[
Book{id='001', name='Our Mathematical Universe', author='Max Tegmark', pageNumber=432, publishedYear=2014},
Book{id='003', name='Sapiens', author='Yuval Noah Harari', pageNumber=443, publishedYear=2011}
]


When filtering, a really useful method to chain is map(), which lets you map objects to another value. For example, we can map each book to its name, and thus return only the names of the books that fit the predicate from the filter() call:

List<String> results = books.stream()
        .filter(b -> b.getPageNumber() > 400)
        .map(Book::getName)
        .collect(Collectors.toList());


This results in a list of strings:

[Our Mathematical Universe, Sapiens]


Filter Collection on Multiple Predicates with Stream.filter()

Commonly, we'd like to filter collections by more than one criterion. This can be done by chaining multiple filter() calls, or by using a short-circuiting condition (&&) that checks both criteria in a single filter() call:

List<Book> results = books.stream()
        .filter(b -> b.getPageNumber() > 400 && b.getName().length() > 10)
        .collect(Collectors.toList());

// Or

List<Book> results2 = books.stream()
        .filter(b -> b.getPageNumber() > 400)
        .filter(b -> b.getName().length() > 10)
        .collect(Collectors.toList());


When using multiple criteria, the lambda expressions can get somewhat lengthy. At this point, extracting them as standalone predicates might offer more clarity. Which approach is faster, though?
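Before turning to performance, here is one way the extraction could look. This is a sketch under assumptions: the nested Book class is a trimmed-down stand-in for the one used in this guide, and the names OVER_400_PAGES, LONG_NAME and namesMatching() are made up for the example. The predicates are combined with Predicate.and() and Predicate.negate(), both default methods of java.util.function.Predicate.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class NamedPredicatesDemo {

    // Trimmed-down stand-in for the guide's Book class (assumed shape)
    static class Book {
        private final String name;
        private final int pageNumber;

        Book(String name, int pageNumber) {
            this.name = name;
            this.pageNumber = pageNumber;
        }

        String getName() { return name; }
        int getPageNumber() { return pageNumber; }
    }

    // Each criterion gets a descriptive name and can be reused or recombined
    static final Predicate<Book> OVER_400_PAGES = b -> b.getPageNumber() > 400;
    static final Predicate<Book> LONG_NAME = b -> b.getName().length() > 10;

    static List<String> namesMatching(List<Book> books, Predicate<Book> condition) {
        return books.stream()
                .filter(condition)
                .map(Book::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Book> books = Arrays.asList(
                new Book("Our Mathematical Universe", 432),
                new Book("Life 3.0", 280),
                new Book("Sapiens", 443));

        // and() short-circuits like &&: LONG_NAME is skipped when the first test fails
        System.out.println(namesMatching(books, OVER_400_PAGES.and(LONG_NAME)));
        // [Our Mathematical Universe]

        System.out.println(namesMatching(books, OVER_400_PAGES.and(LONG_NAME.negate())));
        // [Sapiens]
    }
}
```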

Single Filter with Complex Condition or Multiple Filters?

It depends on your hardware, how large your collection is, and whether you use parallel streams or not. In general, one filter with a complex condition will outperform multiple filters with simpler conditions on small-to-medium collections, and perform at about the same level on very large collections. If your conditions are too long, you may benefit from distributing them over multiple filter() calls for the improved readability, since performance is very similar.

The best choice is to try both, note the performance on the target device, and adjust your strategy accordingly.
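A rough way to try both on your own machine is sketched below. It is a crude System.nanoTime() comparison with a few warm-up runs, not a rigorous benchmark; the data set, the two conditions and the run counts are arbitrary assumptions, and a dedicated harness such as JMH would give more trustworthy numbers.

```java
import java.util.List;
import java.util.function.LongSupplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class FilterTimingSketch {

    // Accumulates results so the measured work is not discarded
    static long sink;

    static long singleFilter(List<Integer> data) {
        return data.stream()
                .filter(n -> n % 2 == 0 && n % 3 == 0)
                .count();
    }

    static long chainedFilters(List<Integer> data) {
        return data.stream()
                .filter(n -> n % 2 == 0)
                .filter(n -> n % 3 == 0)
                .count();
    }

    // Runs the task a few times untimed, then returns the average time per timed run in ms
    static double averageMillis(LongSupplier task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            sink += task.getAsLong();
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink += task.getAsLong();
        }
        return (System.nanoTime() - start) / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.range(0, 1_000_000)
                .boxed()
                .collect(Collectors.toList());

        System.out.printf("single filter:   %.3f ms%n",
                averageMillis(() -> singleFilter(data), 5, 20));
        System.out.printf("chained filters: %.3f ms%n",
                averageMillis(() -> chainedFilters(data), 5, 20));
    }
}
```

Swapping data.stream() for data.parallelStream() in both methods lets you repeat the comparison for parallel streams.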

GitHub user volkodavs ran a filtering benchmark, measured in throughput (operations/s), and hosted the results in the "javafilters-benchmarks" repository, where they are summarized in an informative table.

The benchmark shows the advantage of a single filter clearly diminishing at larger collection sizes, with both approaches performing at around the same level. Parallel streams benefit significantly at larger collection sizes, but hurt performance at smaller sizes (below ~10k elements). It's worth noting that parallel streams maintained their throughput much better than non-parallel streams, making them significantly more robust to input size.

Last Updated: October 29th, 2022

Author: David Landup

Entrepreneur, Software and Machine Learning Engineer, with a deep fascination towards the application of Computation and Deep Learning in Life Sciences (Bioinformatics, Drug Discovery, Genomics), Neuroscience (Computational Neuroscience), robotics and BCIs.

Great passion for accessible education and promotion of reason, science, humanism, and progress.
