Java: Finding Duplicate Elements in a Stream

Introduction

Introduced in Java 8, the Stream API is commonly used for filtering, mapping and iterating over elements. When working with streams, one of the common tasks is finding duplicate elements.

In this tutorial, we'll be covering several ways to find duplicate elements in a Java Stream.

Collectors.toSet()

The easiest way to find duplicate elements is by adding the elements into a Set. A Set can't contain duplicate values, and the Set.add() method returns a boolean indicating whether the element was added: true if it was, and false if the Set already contained it.
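
As a quick illustration of that return value, here is a minimal sketch. The class name SetAddDemo and the helper addTo() are made up for this example:

```java
import java.util.HashSet;
import java.util.Set;

class SetAddDemo {
    // Returns true only if the value was not already in the set
    static boolean addTo(Set<String> set, String value) {
        return set.add(value);
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>();
        System.out.println(addTo(names, "john")); // true: first insertion
        System.out.println(addTo(names, "john")); // false: already present
        System.out.println(names.size());         // 1
    }
}
```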

Let's make a Stream of Strings with some duplicate values. Elements are compared via the equals() method (and, for a HashSet, hashCode() as well), so make sure custom classes implement both consistently:

Stream<String> stream = Stream.of("john", "doe", "doe", "tom", "john");

Now, let's make a Set to store the elements we've already seen. We'll use the filter() method to keep only the duplicate values:

Set<String> items = new HashSet<>();

stream.filter(n -> !items.add(n))
        .collect(Collectors.toSet())
        .forEach(System.out::println);

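Here, we try to add() each element to the Set. If it's not added because it's a duplicate, the element passes the filter, and we collect it and print it out. Since the lambda modifies the items Set, this approach is only suitable for sequential streams: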

john
doe
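
The same approach should carry over to custom classes, provided they implement equals() and hashCode(). The sketch below assumes a hypothetical Person class and a hypothetical helper findDuplicates(); neither comes from a library:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical value class, used only for illustration
class Person {
    private final String name;

    Person(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        return Objects.equals(name, ((Person) o).name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name);
    }
}

class PersonDuplicates {
    // Sequential streams only: the lambda mutates 'seen', which is not thread-safe
    static Set<Person> findDuplicates(Stream<Person> people) {
        Set<Person> seen = new HashSet<>();
        // add() returns false for a Person equal to one already seen
        return people.filter(p -> !seen.add(p))
                .collect(Collectors.toSet());
    }
}
```

Without the equals() and hashCode() overrides, two Person objects with the same name would be treated as different elements and no duplicate would be reported.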

Collectors.toMap()

Alternatively, you can count the occurrences of each element and keep that information in a map that contains the elements as keys and their frequency as values.

Let's create a List of Integer type:

List<Integer> list = Arrays.asList(9, 2, 2, 7, 6, 6, 5, 7);

Then, let's collect the elements into a Map and count their occurrences:

Map<Integer, Integer> map = list.stream()
        .collect(Collectors.toMap(Function.identity(), value -> 1, Integer::sum));
        
System.out.println(map);

We haven't removed any elements, just counted their occurrences and stored them into a Map:

{2=2, 5=1, 6=2, 7=2, 9=1}
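
The output above follows the iteration order of the default map, not the order of the list. If the original encounter order matters, one option is the four-argument overload of Collectors.toMap(), which accepts a map supplier. A minimal sketch, where the class name OrderedFrequency and the method count() are hypothetical:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class OrderedFrequency {
    static Map<Integer, Integer> count(List<Integer> values) {
        return values.stream()
                .collect(Collectors.toMap(
                        Function.identity(), // key: the element itself
                        value -> 1,          // initial count for a new key
                        Integer::sum,        // merge counts for a repeated key
                        LinkedHashMap::new)); // keep first-seen order of keys
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(9, 2, 2, 7, 6, 6, 5, 7);
        System.out.println(count(list)); // {9=1, 2=2, 7=2, 6=2, 5=1}
    }
}
```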

Collectors.groupingBy(Function.identity(), Collectors.counting()) with Collectors.toList()

The Collectors.groupingBy() method is used for grouping elements, based on some property, and returning them as a Map instance.

In our case, the method receives two parameters: Function.identity(), which always returns its input argument, and Collectors.counting(), which counts the elements in each group.

Then, we'll use the groupingBy() method to create a map of the frequency of these elements. After that, we can simply filter() the entries for elements that have a frequency higher than 1:

list.stream()
        // Creates a map {2=2, 5=1, 6=2, 7=2, 9=1}
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
        .entrySet()
        // Convert back to stream to filter
        .stream()
        .filter(element -> element.getValue() > 1)
        // Collect elements to List and print out the values
        .collect(Collectors.toList())
        .forEach(System.out::println);

This results in:

2=2
6=2
7=2

If you'd like to extract just the duplicate elements, without their frequency, you can throw in an additional map() into the process. After filtering, and before collecting to a list, we'll get only the keys:

.map(Map.Entry::getKey)
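
Placed into the full pipeline, that fragment could look like the sketch below. The class name DuplicateKeys and the method duplicates() are hypothetical names chosen for this example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class DuplicateKeys {
    static List<Integer> duplicates(List<Integer> values) {
        return values.stream()
                // Count how many times each element occurs
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet()
                .stream()
                // Keep only entries that occur more than once
                .filter(entry -> entry.getValue() > 1)
                // Drop the count and keep just the element
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(9, 2, 2, 7, 6, 6, 5, 7);
        // Contains 2, 6 and 7; the order depends on the intermediate map
        System.out.println(duplicates(list));
    }
}
```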

Collections.frequency()

Collections.frequency() is another method, from the java.util.Collections class, that counts the occurrences of a specified element in a collection by traversing each element. It takes two parameters: the collection and the element whose frequency is to be determined.

Now, we'll filter() the stream for each element that has a frequency() larger than 1:

list.stream()
        .filter(i -> Collections.frequency(list, i) > 1)
        // Collect elements to a Set and print out the values
        .collect(Collectors.toSet())
        .forEach(System.out::println);

Here, we can collect to either a Set or a List. A List keeps every occurrence of each duplicate element, so values repeat. A Set keeps each duplicate element only once.

This results in:

2
6
7
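
The difference between collecting to a List and to a Set can be seen side by side in the following sketch. It uses the sample values 9, 2, 2, 7, 6, 6, 5, 7, and the class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class FrequencyDemo {
    // Every occurrence of a duplicated value is kept, in encounter order
    static List<Integer> duplicatesAsList(List<Integer> list) {
        return list.stream()
                .filter(i -> Collections.frequency(list, i) > 1)
                .collect(Collectors.toList());
    }

    // Each duplicated value appears once
    static Set<Integer> duplicatesAsSet(List<Integer> list) {
        return list.stream()
                .filter(i -> Collections.frequency(list, i) > 1)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(9, 2, 2, 7, 6, 6, 5, 7);
        System.out.println(duplicatesAsList(list)); // [2, 2, 7, 6, 6, 7]
        System.out.println(duplicatesAsSet(list).size()); // 3
    }
}
```

Keep in mind that frequency() walks the whole collection on every call, so this approach is probably best suited to small inputs.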

Stream.distinct()

The distinct() method is a stateful intermediate operation (it keeps track of the elements it has already seen) and compares elements using the equals() method. It returns a stream of the distinct/unique elements, which we can collect into another list.

Let's make a list with some duplicate values and extract the distinct values:

List<String> list = new ArrayList<>(Arrays.asList("A", "B", "C", "D", "A", "B", "C", "A", "F", "C"));

List<String> distinctElementList = list.stream()
        .distinct()
        .collect(Collectors.toList());

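The distinct list contains one occurrence of each value. Since list.remove(Object) removes only the first matching occurrence, removing each distinct value once from the original list leaves just the extra occurrences, which are the duplicate elements: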

for (String distinctElement : distinctElementList) {
    list.remove(distinctElement);
}

Now, let's print out the results:

list.forEach(System.out::print);

These are the duplicate elements, with their respective occurrences:

ABCAC

If you'd like to sift through these as well, and only show one occurrence of each duplicate element (instead of all of them separately), you can run them through the distinct() method again:

list.stream()
        .distinct()
        .collect(Collectors.toList())
        .forEach(System.out::print);

This results in:

ABC
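
The steps of this section can also be wrapped into a single method that leaves the input list untouched by working on a copy. This is only a sketch, and the class name DistinctDuplicates and the method duplicates() are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DistinctDuplicates {
    static List<String> duplicates(List<String> input) {
        // Work on a copy so the caller's list is not modified
        List<String> remaining = new ArrayList<>(input);

        // Remove the first occurrence of every distinct value
        input.stream()
                .distinct()
                .forEach(value -> remaining.remove(value));

        // What is left are the extra occurrences; keep one of each
        return remaining.stream()
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("A", "B", "C", "D", "A", "B", "C", "A", "F", "C");
        System.out.println(duplicates(list)); // [A, B, C]
    }
}
```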

Conclusion

In this article, we've gone over a few approaches to finding duplicate elements in a Java Stream.

We've covered the Stream.distinct() method from the Stream API, the Collectors.toSet(), Collectors.toMap() and Collectors.groupingBy() methods from the Collectors class, as well as the Collections.frequency() method from the Collections framework.
