
Java: Finding Duplicate Elements in a Stream

Introduction

Introduced in Java 8, the Stream API is commonly used for filtering, mapping and iterating over elements. When working with streams, one of the common tasks is finding duplicate elements.

In this tutorial, we'll be covering several ways to find duplicate elements in a Java Stream.

Collectors.toSet()

The easiest way to find duplicate elements is to add them to a Set. A Set can't contain duplicate values, and Set.add() returns a boolean that tells you whether the element was actually added: true if it wasn't already present, false if an equal element was already in the set.
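As a quick illustration of that return value, the sketch below records what add() reports for each call. SetAddDemo and recordAdds() are illustrative names, not library APIs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SetAddDemo {
    // Adds each value to the set and records the boolean that add() returns
    static List<Boolean> recordAdds(Set<String> set, String... values) {
        List<Boolean> results = new ArrayList<>();
        for (String value : values) {
            results.add(set.add(value));
        }
        return results;
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        // "doe" is accepted the first time and rejected the second time
        System.out.println(recordAdds(seen, "john", "doe", "doe")); // prints [true, true, false]
        System.out.println(seen.size());                           // prints 2
    }
}
```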

Let's make a Stream of Strings with some duplicate values. Equality is checked with the equals() method (and, for hash-based sets, hashCode()), so make sure custom classes implement both consistently:

Stream<String> stream = Stream.of("john", "doe", "doe", "tom", "john");

Now, let's make a Set to track the items we've already seen. We'll use the filter() method to keep only the values that can't be added to it, i.e. the duplicates:

Set<String> items = new HashSet<>();

// add() returns false for values already in the set, so only repeats pass the filter
stream.filter(n -> !items.add(n))
        .collect(Collectors.toSet())
        .forEach(System.out::println);

Here, we try to add() each element to the Set. If it can't be added because it's a duplicate, the filter lets it through, and we collect it into a Set (so each duplicate is reported only once) and print it out:

john
doe
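The note about equals() matters as soon as you move beyond Strings. The sketch below uses a hypothetical Person class (not part of any library) that overrides both equals() and hashCode(), since a HashSet locates elements by hashCode() and then compares them with equals(). The helper findDuplicates() is also our own name; it wraps the same filter-and-add idea and, because it mutates a plain HashSet from inside filter(), it assumes a sequential stream.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CustomDuplicates {
    // Hypothetical value class used only for this example
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        // Two Person objects are equal when both fields match
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person other = (Person) o;
            return age == other.age && name.equals(other.name);
        }

        // HashSet relies on hashCode() too, so it must agree with equals()
        @Override
        public int hashCode() {
            return Objects.hash(name, age);
        }

        @Override
        public String toString() {
            return name + "(" + age + ")";
        }
    }

    // Returns each element that occurs more than once, reported a single time.
    // Uses a non-thread-safe HashSet inside filter(), so use sequential streams only.
    static <T> Set<T> findDuplicates(Stream<T> stream) {
        Set<T> seen = new HashSet<>();
        // add() returns false for elements already seen, so only repeats pass the filter
        return stream.filter(e -> !seen.add(e))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Stream<Person> people = Stream.of(
                new Person("john", 30),
                new Person("doe", 25),
                new Person("john", 30),   // equal to the first entry
                new Person("john", 31));  // same name, different age: not a duplicate
        System.out.println(findDuplicates(people)); // prints [john(30)]
    }
}
```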

Collectors.toMap()

Alternatively, you can count the occurrences of every element and store them in a map, with the elements as keys and their frequencies as values. Any key whose value is greater than 1 is a duplicate.

Let's create a List of Integers:

List<Integer> list = Arrays.asList(9, 2, 2, 7, 6, 6, 5, 7);

Then, let's collect the elements into a Map, mapping each element to an initial count of 1 and using Integer::sum to merge the counts whenever a key repeats:

Map<Integer, Integer> map = list.stream()
        .collect(Collectors.toMap(Function.identity(), value -> 1, Integer::sum));
        
System.out.println(map);

We haven't removed any elements, just counted their occurrences and stored them in a Map:

{2=2, 5=1, 6=2, 7=2, 9=1}
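Because this map holds a count for every element, one more step is needed to keep only the duplicates. The following is a minimal sketch, assuming we want the result in key order: the class and method names are illustrative, and the TreeMap is our own choice to make the printed order predictable, not something toMap() requires.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class DuplicateCounts {
    // Counts every element, then keeps only the entries whose count exceeds 1
    static Map<Integer, Integer> duplicateCounts(List<Integer> list) {
        Map<Integer, Integer> counts = list.stream()
                .collect(Collectors.toMap(Function.identity(), value -> 1, Integer::sum));
        return counts.entrySet().stream()
                .filter(entry -> entry.getValue() > 1)
                // Keys are already unique, so the merge function is never called;
                // TreeMap::new keeps the keys sorted for a predictable printout
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, TreeMap::new));
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(9, 2, 2, 7, 6, 6, 5, 7);
        System.out.println(duplicateCounts(list)); // prints {2=2, 6=2, 7=2}
    }
}
```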

Collectors.groupingBy(Function.identity(), Collectors.counting()) with Collectors.toList()

The Collectors.groupingBy() method is used for grouping elements, based on some property, and returning them as a Map instance.

In our case, the method receives two parameters: Function.identity(), which always returns its input argument (so each element becomes its own key), and Collectors.counting(), which counts the elements in each group.

We'll use groupingBy() to build a map of element frequencies, then stream the map's entries and filter() for those with a frequency higher than 1:

list.stream()
        // Creates a map {2=2, 5=1, 6=2, 7=2, 9=1}
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
        .entrySet()
        // Convert back to stream to filter
        .stream()
        .filter(element -> element.getValue() > 1)
        // Collect elements to List and print out the values
        .collect(Collectors.toList())
        .forEach(System.out::println);

This results in:

2=2
6=2
7=2

If you'd like only the duplicate elements, without their frequencies, add a map() call after filter() and before collect() to keep just the keys:

.map(Map.Entry::getKey)
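Putting the pieces together, a complete version of the groupingBy() pipeline that returns only the duplicate keys might look like the sketch below. The names GroupingDuplicates and duplicates() are hypothetical, and the extra sorted() step is our addition, since the map returned by groupingBy() has no guaranteed iteration order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class GroupingDuplicates {
    // Groups equal elements, counts each group and returns the keys seen more than once
    static List<Integer> duplicates(List<Integer> list) {
        return list.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet()
                .stream()
                .filter(entry -> entry.getValue() > 1) // counting() yields Long values
                .map(Map.Entry::getKey)                // drop the counts, keep the elements
                .sorted()                              // map order is unspecified, so sort
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(9, 2, 2, 7, 6, 6, 5, 7);
        System.out.println(duplicates(list)); // prints [2, 6, 7]
    }
}
```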

Collections.frequency()

Collections.frequency() is a static method of the java.util.Collections class that counts how many elements of a given collection are equal to a specified element, traversing the whole collection to do so. It takes two parameters: the collection and the element whose frequency is to be determined.
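To see what the filter below actually tests, this sketch maps each element of the list from the toMap() example to its Collections.frequency() count. The class and method names are illustrative. Note that every call walks the entire list, so computing a frequency for each element takes O(n²) time.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class FrequencyDemo {
    // Maps each element to the number of times it occurs in the whole list
    static List<Integer> frequencies(List<Integer> list) {
        // Collections.frequency() scans the full list on every call: O(n^2) overall
        return list.stream()
                .map(i -> Collections.frequency(list, i))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(9, 2, 2, 7, 6, 6, 5, 7);
        // Elements whose count is greater than 1 are the duplicates
        System.out.println(frequencies(list)); // prints [1, 2, 2, 2, 2, 2, 1, 2]
    }
}
```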

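Now, we'll filter() the stream, keeping each element whose frequency() in the list is greater than 1. Since frequency() walks the whole list for every element, this approach is quadratic and best suited to small collections: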

list.stream()
        .filter(i -> Collections.frequency(list, i) > 1)
        // Collect elements to a Set and print out the values
        .collect(Collectors.toSet())
        .forEach(System.out::println);

Here, we can collect to either a Set or a List. A List keeps every occurrence of each duplicated element, so values repeat; a Set keeps only one copy of each duplicated element.

This results in:

2
6
7
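The difference between collecting to a List and to a Set can be shown side by side. This is a sketch with illustrative method names; it uses Collectors.toCollection(TreeSet::new) instead of toSet() only so the set prints in sorted order.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class FrequencyListVsSet {
    // Every occurrence of a duplicated element survives the filter
    static List<Integer> allDuplicateOccurrences(List<Integer> list) {
        return list.stream()
                .filter(i -> Collections.frequency(list, i) > 1)
                .collect(Collectors.toList());
    }

    // A set keeps one copy of each duplicated element (TreeSet also sorts them)
    static Set<Integer> uniqueDuplicates(List<Integer> list) {
        return list.stream()
                .filter(i -> Collections.frequency(list, i) > 1)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(9, 2, 2, 7, 6, 6, 5, 7);
        System.out.println(allDuplicateOccurrences(list)); // prints [2, 2, 7, 6, 6, 7]
        System.out.println(uniqueDuplicates(list));        // prints [2, 6, 7]
    }
}
```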

Stream.distinct()

The distinct() method is a stateful intermediate operation: it remembers the elements it has already seen and, comparing them with equals(), passes each element through only the first time it appears. We can collect the resulting unique values into another list.

Let's make a list with some duplicate values and extract the distinct values:

List<String> list = new ArrayList<>(Arrays.asList("A", "B", "C", "D", "A", "B", "C", "A", "F", "C"));

List<String> distinctElementList = list.stream()
        .distinct()
        .collect(Collectors.toList());

The distinct list contains exactly one copy of every value. If we remove one occurrence of each of those values from the original list, only the extra occurrences, i.e. the duplicates, are left:

for (String distinctElement : distinctElementList) {
    // List.remove(Object) removes only the first matching element
    list.remove(distinctElement);
}

Now, let's print out the results:

list.forEach(System.out::print);

What remains are the duplicate elements: every value that occurred n times in the original list now occurs n - 1 times:

ABCAC

To show only one occurrence of each duplicate element instead of all of them, run the remaining list through the distinct() method again:

list.stream()
        .distinct()
        .collect(Collectors.toList())
        .forEach(System.out::print);

This results in:

ABC
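For reuse, the distinct()-and-remove steps can be wrapped in a pair of helper methods. This is a minimal sketch with hypothetical names; unlike the walkthrough above, it copies the input first so the caller's list isn't modified, which also lets it accept fixed-size lists such as those returned by Arrays.asList().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DistinctDuplicates {
    // Returns the surplus occurrences left after removing one copy of each distinct value
    static List<String> surplusOccurrences(List<String> input) {
        // Work on a mutable copy so the caller's list is not modified
        List<String> remaining = new ArrayList<>(input);
        List<String> distinct = input.stream()
                .distinct()
                .collect(Collectors.toList());
        for (String value : distinct) {
            // List.remove(Object) deletes only the first matching element
            remaining.remove(value);
        }
        return remaining;
    }

    // Collapses the surplus occurrences to one entry per duplicated value
    static List<String> uniqueDuplicates(List<String> input) {
        return surplusOccurrences(input).stream()
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("A", "B", "C", "D", "A", "B", "C", "A", "F", "C");
        System.out.println(surplusOccurrences(list)); // prints [A, B, C, A, C]
        System.out.println(uniqueDuplicates(list));   // prints [A, B, C]
    }
}
```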

Conclusion

In this article, we've gone over a few approaches to finding duplicate elements in a Java Stream.

We've covered the Stream.distinct() method from the Stream API, the Collectors.toSet(), Collectors.toMap() and Collectors.groupingBy() methods from the Collectors class, as well as the Collections.frequency() method from the Collections framework.


Last Updated: September 20th, 2023
