Spring Data MongoDB - Guide to the @Aggregation Annotation

Introduction

MongoDB is a document-based NoSQL database that stores data in BSON (Binary JSON) format.

Like with any database, you'll routinely make calls to read, write or update data stored in the document store. In many cases, retrieving data isn't as simple as just writing a single query (even though queries can get pretty complex).

If you'd like to read more about writing MongoDB queries with Spring Boot - read our Spring Data MongoDB - Guide to the @Query Annotation!

In MongoDB, aggregations are used to process many documents and return a computed result. This is achieved by creating a pipeline of operations (stages), where each stage takes in a set of documents and filters, reshapes, groups or sorts them before passing the result on to the next stage.
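To make the pipeline idea concrete, here is a minimal plain-Java analogy. It involves no MongoDB or Spring code, and the names PipelineSketch, run, matchGreaterThan and count are made up for illustration. The only point is that each stage receives the output of the previous one, and that a stage can compute a result rather than just filter:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Plain-Java analogy of an aggregation pipeline; not MongoDB or Spring code.
class PipelineSketch {

    // Feeds the output of each stage into the next one, in declaration order.
    @SafeVarargs
    static List<Integer> run(List<Integer> documents, UnaryOperator<List<Integer>>... stages) {
        List<Integer> current = documents;
        for (UnaryOperator<List<Integer>> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    // Stand-in for a $match stage: keeps only prices above the threshold.
    static UnaryOperator<List<Integer>> matchGreaterThan(int threshold) {
        return docs -> docs.stream()
                .filter(price -> price > threshold)
                .collect(Collectors.toList());
    }

    // Stand-in for a $count stage: replaces the documents with one computed value.
    static UnaryOperator<List<Integer>> count() {
        return docs -> Collections.singletonList(docs.size());
    }

    public static void main(String[] args) {
        List<Integer> prices = Arrays.asList(100000, 65000, 280000, 452000, 125000);
        // Three prices are above 100000, so the pipeline ends with a single value.
        System.out.println(run(prices, matchGreaterThan(100000), count())); // prints [3]
    }
}
```

Swapping the two stages would change the outcome, which is why a pipeline is declared as an ordered list.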

Spring Data MongoDB is Spring's module that acts as an interface between a Spring Boot application and MongoDB. Naturally, it offers a set of annotations that allow us to easily "switch" features on and off, as well as let the module itself know when it should take care of things for us.

The @Aggregation annotation is used to annotate Spring Data repository methods, and invokes the pipeline() of operations you supply to the annotation.

In this guide, we'll take a look at how to leverage the @Aggregation annotation to aggregate the results in a MongoDB database, what aggregation pipelines are, how to use named and positional method arguments for dynamic aggregations, as well as how to sort and paginate results!

Domain Model and Repository

Let's start out with our domain model and a simple repository. We'll create a Property, acting as a model for a real estate property with a couple of relevant fields:

@Document(collection = "property")
public class Property {

    @Id
    private String id;
    @Field("price")
    private int price;
    @Field("area")
    private int area;
    @Field("property_type")
    private String propertyType;
    @Field("transaction_type")
    private String transactionType;
    
    // Constructor, getters, setters, toString()
    
}

And with it, a simple associated MongoRepository:

@Repository
public interface PropertyRepository extends MongoRepository<Property, String> {}

Reminder: MongoRepository is a PagingAndSortingRepository, which is ultimately a CrudRepository.

Through aggregations, you can, naturally, sort and paginate the results as well, taking advantage of the fact that it's extending the PagingAndSortingRepository interface from Spring Data.

Understanding the @Aggregation Annotation

The @Aggregation annotation is applied on the method-level, within a @Repository. The annotation accepts a pipeline - an array of strings, where each string represents a stage in the pipeline (operation to run). Each next stage operates on the results of the previous one.

Various stages exist, and they allow you to perform a wide variety of operations. Some of the more commonly used ones are:

  • $match - Filters documents based on whether their field matches a given predicate.
  • $count - Returns the count of the documents left in the pipeline.
  • $limit - Limits the number of documents passed on to the next stage, keeping the first ones in the set up to the given limit.
  • $sample - Randomly samples a given number of documents from a set.
  • $sort - Sorts the documents given a field and sorting order.
  • $merge - Writes the documents in the pipeline into a collection.

Some of these are terminal operations (applied at the end), such as $merge. Sorting is also typically done after the filtering has finished, so that only the remaining documents need to be ordered.
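Stage order affects the result, not just the performance. The following plain-Java sketch mimics a $sort (descending) and a $limit applied in both orders. StageOrderSketch and its methods are illustrative names, and Java streams stand in for the real pipeline:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java analogy showing that stage order changes the result.
class StageOrderSketch {

    // Mimics the stage order [$sort descending, $limit n].
    static List<Integer> sortThenLimit(List<Integer> areas, int n) {
        return areas.stream()
                .sorted(Comparator.reverseOrder())
                .limit(n)
                .collect(Collectors.toList());
    }

    // Mimics the stage order [$limit n, $sort descending].
    static List<Integer> limitThenSort(List<Integer> areas, int n) {
        return areas.stream()
                .limit(n)
                .sorted(Comparator.reverseOrder())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> areas = Arrays.asList(45, 48, 75, 110, 125);
        System.out.println(sortThenLimit(areas, 2)); // prints [125, 110]
        System.out.println(limitThenSort(areas, 2)); // prints [48, 45]
    }
}
```

The first order returns the two largest areas overall, while the second only sorts whichever two documents happened to come first.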

In our case, to add an @Aggregation to our repository, we'd only have to add a method and annotate it:

@Aggregation(pipeline = {
        "Operation/Stage 1...",
        "Operation/Stage 2...",
        "Operation/Stage 3...",
})
List<Property> someMethod();

Or, you can keep them inline:

@Aggregation(pipeline = {"Operation/Stage 1...", "Operation/Stage 2...", "Operation/Stage 3..."})
List<Property> someMethod();

Depending on the number of stages you have, the latter option might get illegible fairly quickly. In general, it helps to put each stage on its own line for readability.

That being said, let's add some operations to the pipeline! Let's, for instance, search for properties that have a field that matches a given value, such as properties whose transactionType is equal to "For Sale":

@Aggregation(pipeline = {
    "{'$match':{'transaction_type':'For Sale'}}",
})
List<Property> findPropertiesForSale();

Though, having a single match like this defeats the purpose of an aggregation. Let's add some more matching conditions. You can supply any number of matching conditions here, including operators such as $gt to filter further:

@Aggregation(pipeline = {
    "{'$match':{'transaction_type':'For Sale', 'price' : {$gt : 100000}}}",
})
List<Property> findExpensivePropertiesForSale();

Now, we're searching for properties that match the transaction_type and also have a price greater than ($gt) 100000! Even with two conditions, a lone $match stage doesn't really warrant an @Aggregation, even though it's still a fully valid way to obtain results based on multiple conditions.

Additionally, it's no fun when you deal with fixed values. Who's to say this is an expensive property? It would be much more useful to be able to supply a lower mark to the method call, and use that with the $gt operator instead.

This is where named and positional method parameters come in.

Referencing Named and Positional Method Parameters

We rarely deal with just static values, since they're, well, not flexible. We want to offer flexibility to both end-users and developers. In the example before, we've used two fixed values - 'For Sale', and 100000. In this section, we'll replace those two with named and positional placeholders, and supply the values via the method's parameters!

Using named or positional arguments doesn't change the code functionally, and it's generally up to the engineer/team to decide which option to go with, based on their preferences. It's worth being consistent with one type, once you've chosen it:

@Aggregation(pipeline = {
        "{'$match':{'transaction_type': ?0, 'price' : {$gt : ?1}}}",
})
List<Property> findPropertiesByTransactionTypeAndPriceGTPositional(String transactionType, int price);

@Aggregation(pipeline = {
        "{'$match':{'transaction_type': #{#transactionType}, 'price' : {$gt : #{#price}}}}",
})
List<Property> findPropertiesByTransactionTypeAndPriceGTNamed(@Param("transactionType") String transactionType, @Param("price") int price);

The former is more concise, but it does require you to enforce the order of the arguments coming in. Additionally, if the field in the database itself isn't indicative of the type/expected value (which is bad design, but sometimes is out of your control) - using positional arguments might add to the confusion, as there's a small amount of ambiguity as to what value you can expect.

The latter is, admittedly, more verbose, but it does allow you to mix the order of parameters. There's no need to enforce their position, as they're matched by the @Param annotation with the SpEL expressions, tying them to the name in the operation pipeline.

There's no objectively better option here, nor one that's widely accepted in the industry. Choose the one you feel more comfortable with yourself.

Tip: If you've turned on DEBUG as your logging level (for instance, for the org.springframework.data.mongodb.core.MongoTemplate logger) - you'll be able to see the aggregation that's sent out to Mongo in the logs. You can copy-paste it into MongoDB Atlas to check whether it returns the correct results there, and verify whether you've accidentally messed up the positions. Chances are - your pipeline is fine, but you've just mixed up the positions, so the result is empty.
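To see why mixed-up positions produce an empty result, here is a small sketch that fills ?0 and ?1 into a stage string by index. This is only an illustration: bindPositional is a hypothetical helper, and Spring Data MongoDB performs its own parameter binding, which this plain string replacement does not attempt to replicate:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: shows how index-based placeholders depend on argument order.
class PlaceholderSketch {

    private static final Pattern POSITIONAL = Pattern.compile("\\?(\\d+)");

    // Replaces each ?N with the N-th argument; strings are quoted, other values are not.
    static String bindPositional(String stage, Object... args) {
        Matcher matcher = POSITIONAL.matcher(stage);
        StringBuffer bound = new StringBuffer();
        while (matcher.find()) {
            Object value = args[Integer.parseInt(matcher.group(1))];
            String rendered = (value instanceof String) ? "'" + value + "'" : String.valueOf(value);
            matcher.appendReplacement(bound, Matcher.quoteReplacement(rendered));
        }
        matcher.appendTail(bound);
        return bound.toString();
    }

    public static void main(String[] args) {
        String stage = "{'$match':{'transaction_type': ?0, 'price' : {$gt : ?1}}}";
        // Correct order: the type lands in ?0 and the price in ?1.
        System.out.println(bindPositional(stage, "For Sale", 100000));
        // Swapped order: still a well-formed stage, but it matches nothing useful.
        System.out.println(bindPositional(stage, 100000, "For Sale"));
    }
}
```

The swapped call yields a stage that compares transaction_type to a number and price to a string, which is the kind of mistake that shows up in the logs as a valid pipeline with no results.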

Now, you can supply values to the method calls, and they'll be used in the @Aggregation dynamically! This allows you to reuse the same methods for various calls, such as, for instance, getting active properties. This is going to be a common call, so whether you retrieve 5, 10 or 100 of them, you can reuse the same method.

When you're dealing with larger corpora of data, it's worth looking into sorting and paging, as well. End-users shouldn't be expected to sort through the data themselves.

Sorting and Paging

Sorting is typically done at the end, since sorting beforehand might end up being redundant. You can either apply sorting methods after the aggregation takes place, or during the aggregation.

Here, we'll explore the prospect of applying sorting methods inside the aggregation itself. We'll sample some number of properties, and sort them, say, by area. This could be any other field, such as price, datePublished, sponsored, etc. The $sort operation accepts a field to sort by, as well as the order (where 1 is ascending and -1 is descending).

@Aggregation(pipeline = {
        "{'$match':{'transaction_type':?0, 'price': {$gt: ?1} }}",
        "{'$sample':{size:?2}}",
        "{'$sort':{'area':-1}}"
})
List<Property> findPropertiesByTransactionTypeAndPriceGT(String transactionType, int price, int sampleSize);

Here, we've sorted the properties by area, in descending order - meaning, the properties with the largest area will show up first in the sort. The transaction type, price and sample size are all variable and can be dynamically set.

If you'd like to incorporate Pagination into this, the standard Spring Boot pagination approach is applied - you just add a Pageable pageable to the method definition and call:

@Aggregation(pipeline = {
        "{'$match':{'transaction_type':?0, 'price': {$gt: ?1} }}",
        "{'$sample':{size:?2}}",
        "{'$sort':{'area':-1}}"
})
Iterable<Property> findPropertiesByTransactionTypeAndPriceGTPageable(String transactionType, int price, int sampleSize, Pageable pageable);

When calling the method from a controller, you'll want to construct a Pageable object to pass in:

int page = 1;
int size = 5;

Pageable pageable = PageRequest.of(page, size);
Iterable<Property> properties = propertyRepository.findPropertiesByTransactionTypeAndPriceGTPageable("For Sale", 100000, 5, pageable);

This requests the second page (index 1), with up to 5 results per page.

Note: Since we've already sorted the properties in the aggregation, there's no need to include any additional sorting configuration there. Alternatively, you can skip the $sort in the aggregation, and sort it via the Pageable instance.
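As a rough mental model of what a page request means, the page index and size translate into an offset (page index multiplied by size) and a limit. The sketch below applies that arithmetic to an in-memory list. PagingSketch and page are hypothetical names, and the assumption is only that paging behaves like skipping the offset and taking the next size results:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Plain-Java illustration of page index and size as skip and limit.
class PagingSketch {

    // Returns the zero-based page with the given index, holding at most pageSize items.
    static <T> List<T> page(List<T> results, int pageIndex, int pageSize) {
        long offset = (long) pageIndex * pageSize;
        return results.stream()
                .skip(offset)
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> results = IntStream.range(0, 12).boxed().collect(Collectors.toList());
        System.out.println(page(results, 1, 5)); // prints [5, 6, 7, 8, 9]
        System.out.println(page(results, 2, 5)); // prints [10, 11]
        System.out.println(page(results, 3, 5)); // prints []
    }
}
```

Note that a page can be shorter than the requested size, or empty, when the offset reaches the end of the results.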

Creating a REST API

Let's quickly spin up a REST API that exposes the results of these methods to an end user, and send a curl request to validate the results, starting with the controller with an autowired repository:

@RestController
public class HomeController {
    @Autowired
    private PropertyRepository propertyRepository;
    
}

If you'd like to read more about the @RestController and @Autowired annotations, read our @Controller and @RestController Annotations in Spring Boot and the @Autowired section of Spring Annotations: Core Framework Annotations!

First, we'll want to add a few properties to the database:

@GetMapping("/addProperties")
public ResponseEntity<List<Property>> addProperties() {

    List<Property> propertyList = List.of(
            new Property(100000, 45, "Apartment", "For Sale"),
            new Property(65000, 48, "Apartment", "For Sale"),
            new Property(280000, 75, "Apartment", "For Sale"),
            new Property(452000, 110, "House", "For Sale"),
            new Property(400000, 125, "House", "For Rent"),
            new Property(125000, 100, "Apartment", "For Sale"),
            new Property(95000, 70, "House", "For Rent"),
            new Property(35000, 25, "Apartment", "For Sale")
    );

    for (Property property : propertyList) {
        propertyRepository.save(property);
    }

    return ResponseEntity.ok().body(propertyList);
}

Now, let's curl a request to this endpoint to add the properties to the database:

$ curl localhost:8080/addProperties

[ {
  "id" : "61dedea6799b5758bb857292",
  "price" : 100000,
  "area" : 45,
  "propertyType" : "Apartment",
  "transactionType" : "For Sale"
}, {
  "id" : "61dedea6799b5758bb857293",
  "price" : 65000,
  "area" : 48,
  "propertyType" : "Apartment",
  "transactionType" : "For Sale"
},
...

Note: For a pretty-printed response, remember to set Jackson's INDENT_OUTPUT to true in your application.properties (spring.jackson.serialization.indent_output=true).

And now, let's define a /getProperties endpoint, which calls one of the PropertyRepository methods that performs an aggregation:

@GetMapping("/getProperties")
public ResponseEntity<List<Property>> home() {
    return ResponseEntity
            .ok()
            .body(propertyRepository.findPropertiesByTransactionTypeAndPriceGT("For Sale", 100000, 5));
}

This should return up to 5 randomly selected properties from the set of properties that are for sale and over 100k in price, sorted by their area in descending order. If fewer than 5 properties match - all of the matching properties are returned:

$ curl localhost:8080/getProperties

[ {
  "id" : "61dedea6799b5758bb857295",
  "price" : 452000,
  "area" : 110,
  "propertyType" : "House",
  "transactionType" : "For Sale"
}, {
  "id" : "61dedea6799b5758bb857297",
  "price" : 125000,
  "area" : 100,
  "propertyType" : "Apartment",
  "transactionType" : "For Sale"
}, {
  "id" : "61dedea6799b5758bb857294",
  "price" : 280000,
  "area" : 75,
  "propertyType" : "Apartment",
  "transactionType" : "For Sale"
} ]

Works like a charm!
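As a sanity check of that output, the same three steps can be replayed over the seed data in plain Java, without MongoDB. This is a sketch under assumptions: Listing is a hypothetical stand-in for Property holding only the fields the pipeline touches, and a shuffle approximates $sample:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

// Replays $match, $sample and $sort over the seed data in memory.
class SeedDataCheck {

    // Hypothetical stand-in for Property with only the fields used here.
    static final class Listing {
        final int price;
        final int area;
        final String transactionType;

        Listing(int price, int area, String transactionType) {
            this.price = price;
            this.area = area;
            this.transactionType = transactionType;
        }
    }

    // The eight properties added through /addProperties.
    static List<Listing> seed() {
        return Arrays.asList(
                new Listing(100000, 45, "For Sale"), new Listing(65000, 48, "For Sale"),
                new Listing(280000, 75, "For Sale"), new Listing(452000, 110, "For Sale"),
                new Listing(400000, 125, "For Rent"), new Listing(125000, 100, "For Sale"),
                new Listing(95000, 70, "For Rent"), new Listing(35000, 25, "For Sale"));
    }

    static List<Integer> sampledAreas(List<Listing> all, String type, int minPrice,
                                      int sampleSize, Random random) {
        // $match: transaction type equals `type` and price is strictly greater than minPrice.
        List<Listing> matched = all.stream()
                .filter(l -> l.transactionType.equals(type) && l.price > minPrice)
                .collect(Collectors.toCollection(ArrayList::new));
        // $sample approximation: shuffle, then keep at most sampleSize documents.
        Collections.shuffle(matched, random);
        List<Listing> sampled = matched.subList(0, Math.min(sampleSize, matched.size()));
        // $sort with area: -1 (descending), reduced to the area values for brevity.
        return sampled.stream()
                .sorted(Comparator.comparingInt((Listing l) -> l.area).reversed())
                .map(l -> l.area)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sampledAreas(seed(), "For Sale", 100000, 5, new Random())); // prints [110, 100, 75]
    }
}
```

Only three properties are for sale with a price strictly above 100000 (the one priced at exactly 100000 is excluded by $gt), so a sample size of 5 returns all of them regardless of the random draw.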

Conclusion

In this guide, we've gone over the @Aggregation annotation in the Spring Data MongoDB module. We've covered what aggregations are, when they can be used, and how they differ from regular queries.

We've reviewed some of the common operations in an aggregation pipeline, before writing our own pipelines with static and dynamic arguments. We've explored positional and named parameters for the aggregations, sorted and paginated the results, and finally, spun up a simple REST API to serve them.

Last Updated: January 12th, 2022
Author: David Landup
    © 2013-2022 Stack Abuse. All rights reserved.