Spring Cloud: Distributed Tracing with Sleuth

Overview

In this article, we'll introduce you to Spring Cloud Sleuth, which is a distributed tracing framework for a microservice architecture in the Spring ecosystem.

In a typical microservice architecture, we have many small applications that are deployed separately and often need to communicate with each other. One of the challenges developers face is following a single request through the logs of all these services, whether to debug a failure or to find out which downstream service is adding latency.

To further add to the complexity, some services can have multiple instances running. It's difficult to track particular request logs in multiple services, especially if a particular service has many instances.

Spring Cloud Sleuth automatically adds trace metadata to your logs and to inter-service communication (via request headers), so it's easy to track a request with tools such as Zipkin or the ELK stack.

This article assumes that you already know the basic components of Spring Cloud. We have published several articles covering Spring Cloud if you want to read more.

Setup

In order to demonstrate the concept of tracing, we'll use a few services:

  • Eureka Server: Acts as a service registry and runs on port 8761.
  • Address Service: A simple REST service that exposes a single endpoint, /address/{customerId}, and runs on port 8070.
  • Customer Service: A simple REST service that exposes a single endpoint, /customer/{customerId}, and runs on port 8060.
  • Portal Service: A simple REST service that exposes a single endpoint, /fullDetails/{customerId}, and runs on port 8050. This service internally calls address-service and customer-service and combines their data in its response.
  • Gateway: The single point of entry to our microservice architecture, built using Spring Cloud Gateway and running on port 8080.

spring-cloud-sleuth-request-flow

And here is what the Eureka server looks like when all the services are running:

spring-cloud-sleuth-eureka

Let's see what's written in each controller class, starting from AddressController of the address-service:

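import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping(value = "/address")
public class AddressController {

    private static final Logger log = LoggerFactory.getLogger(AddressController.class);

    // Returns a placeholder address for the given customer.
    @GetMapping(value = "/{customerId}")
    public String address(@PathVariable(name = "customerId", required = true) long customerId) {
        log.info("GET /address/{}", customerId);
        return "Address of id=" + customerId;
    }
}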

CustomerController of customer-service:

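import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping(value = "/customer")
public class CustomerController {

    private static final Logger log = LoggerFactory.getLogger(CustomerController.class);

    // Returns placeholder customer details for the given customer.
    @GetMapping(value = "/{customerId}")
    public String customer(@PathVariable(name = "customerId", required = true) long customerId) {
        log.info("GET /customer/{}", customerId);
        return "Customer details of id=" + customerId;
    }
}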

PortalController of portal-service:

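import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
public class PortalController {

    private static final Logger log = LoggerFactory.getLogger(PortalController.class);

    @Autowired
    RestTemplate restTemplate;

    // Calls customer-service and address-service and combines both responses.
    @GetMapping(value = "/fullDetails/{customerId}")
    public String fullDetails(@PathVariable(name = "customerId", required = true) long customerId) {
        log.info("GET /fullDetails/{}", customerId);

        // The host names are service IDs registered in Eureka, not DNS names.
        String customerResponse = restTemplate.getForObject("http://customer-service/customer/" + customerId, String.class);
        String addressResponse = restTemplate.getForObject("http://address-service/address/" + customerId, String.class);

        return customerResponse + "<br>" + addressResponse;
    }
}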

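Note that the service-name URLs above only resolve if the injected RestTemplate is declared as a @LoadBalanced Spring bean. Declaring it as a bean also matters later on, because Sleuth instruments only RestTemplate instances that are managed by Spring.

To check that everything works, let's call the portal-service endpoint through the gateway by navigating to http://localhost:8080/portal-service/fullDetails/12 in a browser. You should see something like this: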

spring-cloud-sleuth-test-url

Now, imagine tracing these logs across different servers. Even if the log files are pushed to a common location and fed to a log aggregator, it would be difficult to find the full trace of a single request across multiple services at a given point in time.

Adding Spring Cloud Sleuth

Spring Cloud Sleuth adds unique IDs to your logs. These IDs stay the same as a request passes through multiple microservices, so common log aggregators can use them to show how a request flows.

To add this functionality, we need to add a dependency in the pom.xml file of each downstream service:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>

Restart all the applications, hit the http://localhost:8080/portal-service/fullDetails/12 endpoint again, and check the logs of each service.

Portal service logs:

spring-cloud-sleuth-logs1

Address service logs:

spring-cloud-sleuth-logs2

Customer service logs:

spring-cloud-sleuth-logs3

Spring Cloud Sleuth adds two types of IDs to your logging:

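  • Trace Id: A unique ID that remains the same for the entire request as it passes through multiple microservices.
  • Span Id: A unique ID for each unit of work within the trace, such as the handling of the request by one microservice.

Basically, a trace contains multiple spans, each with its own span ID, and log aggregation tools use these IDs to reconstruct the path of a request.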
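
The relationship between the two IDs can be sketched in plain Java. This is an illustrative model, not Sleuth's implementation: the class name TraceContextSketch and its methods are hypothetical, the root span reusing one value for both IDs is a choice made for the sketch, and the log prefix assumes the [application,traceId,spanId,exportable] layout that Sleuth 2.x prints by default.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical model of a trace context; Sleuth's real classes are richer.
final class TraceContextSketch {
    final String traceId;      // shared by every span of one request
    final String spanId;       // unique to one unit of work
    final String parentSpanId; // null for the root span

    private TraceContextSketch(String traceId, String spanId, String parentSpanId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
    }

    // Random 64-bit ID rendered as 16 lowercase hex characters.
    static String newId() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }

    // Root span: created by the first traced service that sees the request.
    // Reusing one value for trace ID and span ID is an assumption of this sketch.
    static TraceContextSketch newRoot() {
        String id = newId();
        return new TraceContextSketch(id, id, null);
    }

    // Child span: same trace ID, fresh span ID, parent set to the caller's span.
    TraceContextSketch newChild() {
        return new TraceContextSketch(traceId, newId(), spanId);
    }

    // Log prefix in the assumed "[application,traceId,spanId,exportable]" layout.
    String logPrefix(String serviceName, boolean exportable) {
        return "[" + serviceName + "," + traceId + "," + spanId + "," + exportable + "]";
    }

    public static void main(String[] args) {
        TraceContextSketch portal = newRoot();            // portal-service handles the request
        TraceContextSketch customer = portal.newChild();  // call to customer-service
        TraceContextSketch address = portal.newChild();   // call to address-service
        System.out.println(portal.logPrefix("portal-service", true));
        System.out.println(customer.logPrefix("customer-service", true));
        System.out.println(address.logPrefix("address-service", true));
    }
}
```

All three printed prefixes share the same second field (the trace ID), while the third field (the span ID) differs per service.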

Sleuth not only adds these IDs to our logs but also propagates them to the next service call (HTTP or MQ based). It can also send a random sample of traces to an external application like Zipkin out of the box.
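
For HTTP calls, Sleuth 2.x propagates this context using the B3 header names shown below. The following is a minimal sketch of what ends up on the wire, not Sleuth's own code: the class B3PropagationSketch and the ID values are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that mimics B3 header propagation between two services.
final class B3PropagationSketch {

    // Caller side: headers attached to the outgoing HTTP request.
    static Map<String, String> inject(String traceId, String spanId, String parentSpanId, boolean sampled) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("X-B3-TraceId", traceId);
        headers.put("X-B3-SpanId", spanId);
        if (parentSpanId != null) {
            headers.put("X-B3-ParentSpanId", parentSpanId);
        }
        headers.put("X-B3-Sampled", sampled ? "1" : "0");
        return headers;
    }

    // Receiver side: the trace ID to continue, or null if the request carries none.
    // Real HTTP header lookups are case-insensitive; this sketch uses exact names.
    static String extractTraceId(Map<String, String> headers) {
        return headers.get("X-B3-TraceId");
    }

    public static void main(String[] args) {
        // Example IDs are placeholders, not values from a real trace.
        Map<String, String> outgoing = inject("463ac35c9f6413ad", "a2fb4a1d1a96d312", "463ac35c9f6413ad", true);
        for (Map.Entry<String, String> header : outgoing.entrySet()) {
            System.out.println(header.getKey() + ": " + header.getValue());
        }
        System.out.println("Receiver continues trace " + extractTraceId(outgoing));
    }
}
```

Because the receiving service reads the trace ID from the request instead of generating a new one, its log lines carry the same trace ID as the caller's.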

Log Aggregation with Zipkin

Zipkin is a distributed tracing system that is commonly used to troubleshoot latency problems in service architectures.

To run a Zipkin server, you can follow a quick and simple guide here.

I used the Java way to run it, by executing the commands:

$ curl -sSL https://zipkin.io/quickstart.sh | bash -s
$ java -jar zipkin.jar

You can also run it via Docker or straight from the source code.

By default, the Zipkin server runs on port 9411. Navigate your browser to http://localhost:9411/zipkin/ to access its home page:

spring-cloud-sleuth-zipkin-home-page

Sleuth Integration with Zipkin

Now, we have to tell Sleuth to send data to the Zipkin server. First we need to add another dependency to the pom.xml file of each service:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>

After this, we need to add the following properties to the application.properties file of each service:

spring.sleuth.sampler.probability=1.0
spring.zipkin.baseUrl=http://localhost:9411/

The spring.zipkin.baseUrl property tells Spring and Sleuth where to push data. Also, by default, Spring Cloud Sleuth does not export every span. This means that the trace and span IDs can appear in logs without being exported to a remote store like Zipkin.

In order to export all spans to the Zipkin server, we need to set a sampling rate using spring.sleuth.sampler.probability. The value is a fraction between 0.0 and 1.0, so a value of 1.0 means that all the spans will be sent to the Zipkin server too.
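
To make the meaning of the sampling probability concrete, here is a minimal sketch of a probability-based sampler in plain Java. Sleuth documents this property as a fraction between 0.0 and 1.0, so 1.0 corresponds to sampling every request. The class ProbabilitySamplerSketch is hypothetical and Sleuth's own sampler is implemented differently; rejecting out-of-range values is a choice made for this sketch.

```java
import java.util.Random;

// Hypothetical sampler that decides per request whether its spans are exported.
final class ProbabilitySamplerSketch {
    private final double probability;
    private final Random random;

    ProbabilitySamplerSketch(double probability, Random random) {
        // Range check is this sketch's own behavior, not a statement about Sleuth.
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException(
                "probability must be between 0.0 and 1.0, got " + probability);
        }
        this.probability = probability;
        this.random = random;
    }

    // nextDouble() is in [0.0, 1.0), so 1.0 always samples and 0.0 never does.
    boolean isSampled() {
        return random.nextDouble() < probability;
    }

    public static void main(String[] args) {
        ProbabilitySamplerSketch all = new ProbabilitySamplerSketch(1.0, new Random());
        ProbabilitySamplerSketch tenth = new ProbabilitySamplerSketch(0.1, new Random());
        int exported = 0;
        for (int i = 0; i < 1000; i++) {
            if (tenth.isSampled()) {
                exported++;
            }
        }
        System.out.println("probability=1.0 samples this request: " + all.isSampled());
        System.out.println("probability=0.1 sampled " + exported + " of 1000 requests");
    }
}
```

Sampling everything is convenient for a local demo like this one; a lower probability trades completeness for less tracing traffic.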

Now, let's restart all the applications and hit the http://localhost:8080/portal-service/fullDetails/12 endpoint once more.

Then, on the Zipkin home page at http://localhost:9411/zipkin/, click "Find Traces":

spring-cloud-sleuth-zipkin-find-traces

Clicking on a trace takes us to its detail page:

spring-cloud-sleuth-zipkin-find-traces-details

Above, we can see that the request took around 16 ms overall, along with a tree showing the time taken by each service.

Typically, we use the ELK stack to visualize logs for debugging purposes. To integrate it with Sleuth, we can follow the explanation here.
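
Whichever aggregator is used, the search key is the trace ID. As a rough sketch of what such a query does, the following plain Java filters aggregated log lines by trace ID. It assumes the [application,traceId,spanId,exportable] prefix that Sleuth 2.x prints by default; the class TraceLogFilterSketch and the sample log lines are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical filter that mimics a "search by trace ID" query in a log aggregator.
final class TraceLogFilterSketch {

    // Matches a prefix such as "[portal-service,<traceId>,<spanId>,true]".
    private static final Pattern PREFIX =
        Pattern.compile("\\[([^,\\]]*),([0-9a-f]+),([0-9a-f]+),(true|false)\\]");

    // Returns the lines whose prefix carries the given trace ID, in input order.
    static List<String> linesForTrace(List<String> lines, String traceId) {
        List<String> matches = new ArrayList<>();
        for (String line : lines) {
            Matcher m = PREFIX.matcher(line);
            if (m.find() && m.group(2).equals(traceId)) {
                matches.add(line);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up log lines from three services, collected in one place.
        List<String> aggregated = Arrays.asList(
            "INFO [portal-service,aaaa000000000001,aaaa000000000001,true] GET /fullDetails/12",
            "INFO [customer-service,aaaa000000000001,bbbb000000000002,true] GET /customer/12",
            "INFO [address-service,aaaa000000000001,cccc000000000003,true] GET /address/12",
            "INFO [address-service,dddd000000000004,dddd000000000004,true] GET /address/99");
        for (String line : linesForTrace(aggregated, "aaaa000000000001")) {
            System.out.println(line);
        }
    }
}
```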

Conclusion

In this article, we've covered how to use Spring Cloud Sleuth in an existing Spring-based microservice application. We saw how it helps trace the logs of a single request that spans multiple services. We also integrated it with a Zipkin server to see the latency of each sub-service in the overall request and response.

As always, the code for the examples used in this article can be found on GitHub.