Author: Aditya Anasane
Hi readers, this is not a tutorial, just a short informative blog on a small topic: the Correlation ID and its importance in distributed systems and microservice-based architectures.
What is it?
A Correlation ID can be defined as an ‘identifier value attached to messages and request headers which allows referencing a particular transaction or event’.
(While debugging Mule applications, we can locate the correlationId field, as shown in the screenshot below.)
Why do we need a correlation ID?
In monolithic architectures, transactions happened step by step, so it was easy to follow the flow of messages and events. When processing is synchronous, errors are easy to trace when they occur.
Today, as we all know, enterprise systems are adopting loosely coupled, microservice-based architectures, and their request-response interactions are becoming highly asynchronous as well.
- Client applications talk to numerous microservices, so the flow of events for placing an order, filing a claim, etc. is spread across many logs. How can we troubleshoot such a request and trace its path through all these distributed services?
- How does a requester that has received a reply know which request the reply belongs to?
- In a travel planning application, we send a single request, and this message is consumed by the hotel, airline and car-rental microservices. These separate services respond asynchronously.
In scenarios like these, where many requests and responses flow across widely distributed microservices, how do we keep track of one single transaction?
The answer to the questions above is the correlation ID.
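To make the travel-planning question concrete, here is a minimal Python sketch (not Mule code, and not tied to any messaging library) of a requester that tags each request with a fresh correlation ID and uses it to match asynchronous replies. The `Reply` and `Requester` types are hypothetical, and the hotel, airline and car-rental replies are simulated, arriving interleaved and out of order; the sketch assumes each service copies the correlation ID from the request into its reply.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Reply:
    correlation_id: str  # assumed to be copied by the service from the request it answers
    service: str
    body: str


class Requester:
    """Matches asynchronous replies to the request they answer, by correlation ID."""

    def __init__(self):
        self.expected = {}  # correlation ID -> services we are still waiting on
        self.replies = {}   # correlation ID -> {service: reply body}

    def send(self, services):
        correlation_id = str(uuid.uuid4())
        self.expected[correlation_id] = set(services)
        self.replies[correlation_id] = {}
        # A real system would publish the request, carrying correlation_id, here.
        return correlation_id

    def on_reply(self, reply):
        # The correlation ID tells the requester which request this reply is for.
        if reply.correlation_id not in self.expected:
            return False  # unknown request: ignore or dead-letter it
        self.replies[reply.correlation_id][reply.service] = reply.body
        return True

    def is_complete(self, correlation_id):
        return set(self.replies[correlation_id]) == self.expected[correlation_id]


requester = Requester()
trip_a = requester.send(["hotel", "airline", "car-rental"])
trip_b = requester.send(["hotel", "airline", "car-rental"])

# Replies for two different trips arrive interleaved and out of order.
for reply in [
    Reply(trip_b, "airline", "flight B"),
    Reply(trip_a, "hotel", "hotel A"),
    Reply(trip_a, "car-rental", "car A"),
    Reply(trip_b, "hotel", "hotel B"),
    Reply(trip_a, "airline", "flight A"),
]:
    requester.on_reply(reply)

print(requester.is_complete(trip_a))  # True
print(requester.is_complete(trip_b))  # False: car-rental has not replied yet
```

Without the correlation ID, the requester would have no reliable way to tell "flight A" from "flight B" once both are in flight.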
How does it help?
The idea is simple. When a user-facing service receives a request, it creates a correlation ID and:
- passes it along in an HTTP header to every other service it calls
- includes it in every log message
The correlation ID can then be used to quickly find all the log messages relevant to that request.
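Both steps can be sketched in plain Python with the standard `logging` and `contextvars` modules. This is an illustration under stated assumptions, not how Mule does it: the header name `X-Correlation-ID`, the `order-service` logger and `handle_request` are hypothetical choices. A logging filter stamps the current correlation ID onto every record, so no individual log call has to remember to include it.

```python
import contextvars
import logging
import uuid

# Assumed header name: there is no standard one, so pick one and use it everywhere.
CORRELATION_HEADER = "X-Correlation-ID"

# Correlation ID of the request currently being handled ("-" when there is none).
current_correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Copies the current correlation ID onto every log record."""

    def filter(self, record):
        record.correlation_id = current_correlation_id.get()
        return True


logger = logging.getLogger("order-service")  # hypothetical service name
handler = logging.StreamHandler()
handler.addFilter(CorrelationIdFilter())
handler.setFormatter(logging.Formatter("%(correlation_id)s %(name)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the sketch's output on this one handler


def handle_request(order_id):
    """Hypothetical entry point of a user-facing service.

    Creates a correlation ID, logs with it, and returns the headers
    to attach to every downstream call.
    """
    correlation_id = str(uuid.uuid4())
    token = current_correlation_id.set(correlation_id)
    try:
        logger.info("order %s received", order_id)
        outgoing_headers = {CORRELATION_HEADER: correlation_id}
        logger.info("calling payment service")
        return outgoing_headers
    finally:
        # Restore the previous value so the ID does not leak into the next request.
        current_correlation_id.reset(token)


handle_request("A-1")  # both log lines start with the same generated UUID
```

Using a context variable rather than a global keeps concurrent requests (threads or asyncio tasks) from overwriting each other's IDs.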
The diagrams below illustrate how we can “tag” a message to identify its context. (In short, the correlation ID “marks” or “tags” related messages so that they can be identified and grouped later on.)
When a correlation ID has been set, Mule will preserve it across transports. This means that Mule will propagate the correlation ID alongside the message payload with transports that support metadata (like HTTP or JMS).
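Outside Mule, this "preserve it across transports" behaviour has to be coded by hand. The Python sketch below is a hypothetical illustration of the rule, not Mule's implementation: the `Message` type, the `forward` helper and the `X-Correlation-ID` header name are assumptions. Each hop copies the correlation ID from the inbound message's metadata to the outbound one, and only the first hop, which receives no ID, creates one.

```python
import uuid
from dataclasses import dataclass, field

CORRELATION_HEADER = "X-Correlation-ID"  # assumed header name


@dataclass
class Message:
    payload: str
    headers: dict = field(default_factory=dict)  # transport metadata, e.g. HTTP or JMS headers


def forward(inbound, new_payload):
    """Build an outbound message, preserving the inbound correlation ID."""
    correlation_id = inbound.headers.get(CORRELATION_HEADER)
    if correlation_id is None:
        # Only the first service in the chain (no ID yet) creates a new one.
        correlation_id = str(uuid.uuid4())
    return Message(new_payload, {CORRELATION_HEADER: correlation_id})


# A request passes through three services; each forwards the same ID unchanged.
incoming = Message("place order")
to_inventory = forward(incoming, "reserve stock")
to_payment = forward(to_inventory, "charge card")
to_shipping = forward(to_payment, "schedule delivery")

for service, msg in [("inventory", to_inventory), ("payment", to_payment), ("shipping", to_shipping)]:
    print(f"[{msg.headers[CORRELATION_HEADER]}] {service}: {msg.payload}")
```

The important design choice is that downstream services never mint a fresh ID when one is present; doing so would break the chain that ties the transaction together.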
As a result, each component can keep an audit log that records the correlation ID. When an error occurs in a particular component, it is easy to group the related log entries and find the root cause of the problem.
There is no standard format for the correlation identifier. It can be a UUID or any other identifier that is unique and meaningful in the application domain. (Mule returns this value in a header named X-MULE_CORRELATION_ID.)
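As a rough illustration of the two options, here is a Python sketch of an opaque UUID-based ID next to a domain-meaningful one. The `PREFIX-BUSINESSKEY-SUFFIX` layout is purely hypothetical; pick whatever format suits your domain, as long as it is unique per transaction.

```python
import uuid


def uuid_correlation_id():
    """Opaque, random identifier (UUID version 4)."""
    return str(uuid.uuid4())


def domain_correlation_id(prefix, business_key):
    """Hypothetical domain-meaningful format: <prefix>-<business key>-<random suffix>.

    The short random suffix keeps two attempts at the same business
    operation (e.g. a retried claim) apart in the logs.
    """
    return f"{prefix}-{business_key}-{uuid.uuid4().hex[:8]}"


print(uuid_correlation_id())                     # a random UUID string
print(domain_correlation_id("CLAIM", "C1001"))  # e.g. CLAIM-C1001- followed by 8 hex digits
```

A UUID needs no coordination between services and is practically collision-free; a domain-style ID is easier for a human to search for in logs, but its short suffix only stays unique among attempts that share the same business key.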
The correlation ID is the glue that binds a transaction together.
In asynchronous systems, log entries from many transactions are all mixed up. If we set the correlation ID in the event header and propagate it to all the downstream flows, we can easily pick out the logs that belong to a particular transaction flow.
So when you can group a transaction’s events under a single unifying value, the correlation ID, you can spend your time fixing the problem rather than finding it.
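Here is a small Python sketch of that idea: untangling interleaved log lines by correlation ID and then locating the first error in one transaction's trail. The log lines and their `<correlation-id> <service> <level> <message>` format are made up for illustration; real log formats will differ.

```python
from collections import defaultdict

# Hypothetical interleaved log lines from three services, in arrival order.
# Assumed format: "<correlation-id> <service> <level> <message>"
LOG_LINES = [
    "c-101 order INFO order received",
    "c-202 order INFO order received",
    "c-101 payment INFO card charged",
    "c-202 payment ERROR card declined",
    "c-101 shipping INFO delivery scheduled",
    "c-202 shipping WARN no payment confirmation",
]


def group_by_correlation_id(lines):
    """Collect the log entries of each transaction, preserving their order."""
    groups = defaultdict(list)
    for line in lines:
        correlation_id, service, level, message = line.split(maxsplit=3)
        groups[correlation_id].append((service, level, message))
    return groups


def first_error(entries):
    """Return the first ERROR entry in a transaction's trail, or None."""
    return next((entry for entry in entries if entry[1] == "ERROR"), None)


groups = group_by_correlation_id(LOG_LINES)
for service, level, message in groups["c-202"]:
    print(service, level, message)
print(first_error(groups["c-202"]))  # ('payment', 'ERROR', 'card declined')
```

Once the lines are grouped, the shipping warning in transaction `c-202` is easy to trace back to its cause: the declined card in the payment service.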
(The information here has been compiled from the following blogs and videos.)