Skip to content

Message Ordering

Gaurav Ashok edited this page May 14, 2024 · 4 revisions

Message Ordering / Grouping

The messages can be produced and delivered by Varadhi in an unordered or ordered (aka grouped) manner.

In Varadhi, we often refer to ordering as grouping because, to provide message ordering, Varadhi requires that messages are associated with a GroupId. Such messages are said to be "grouped" by their GroupId, hence the usage of the term "grouping" or "grouped".

Relevant Headers

Each message received by Varadhi must have a MessageID (header: X_MESSAGE_ID) and an optional GroupID (header: X_GROUP_ID) associated with it. The GroupId header is mandatory if message ordering is required end-to-end.

Note that the above header names are just examples. Actual names can be configured as part of the deployment, allowing maintainers to choose their own convention.

Unordered vs Ordered

When the grouped is set to false (the default value) for a topic and its subscriptions, Varadhi will deliver messages out of order.

If the grouped config is true for a topic and its subscriptions, then Varadhi will, in essence, make sure to deliver messages with the same GroupId in the same order as they were received from the message producer, even in the case of delivery / api failures.

Effect of enabling grouped on a topic

When grouped is enabled for a topic, 2 things happen:

  • Varadhi will start making an assertion that each message must now contain a GroupID header (= X_GROUP_ID). Without this header, Varadhi will return a BadRequest (400) error.
  • Varadhi will use the GroupID to hash it to a specific partition within the topic so that messages with the same GroupId land in the same partition, helping with the ordering provided by the underlying messaging stack.

Effect of enabling grouped on subscription

When grouped is enabled for a subscription, various things happen:

Behaviour during successful message deliveries

The consumer for the subscription, managed by Varadhi and responsible for delivering the messages, will poll messages from the topic's partitions and attempt to deliver messages in the same order as they are present in the partition. It will also try to parallelize the delivery of messages that belong to different GroupIds. Note that the ordering across different GroupIds is not maintained even if they come from the same partition.

For example, if the partition had messages with IDs in the format (GroupId, MessageId), such as g1.m1, g2.m2, g1.m3, g3.m4, g1.m5, g2.m6, g4.m7, g3.m8, and so on, where the read pointer is at g1.m1.

Then, Varadhi can deliver them as follows (assuming consumer count / parallelism = 2):

Note that, the messages from group "g2" end up being delivered concurrently with the messages from group "g1". Also note that g4.m7 got delivered before g1.m5 even though g4.m7 was produced after it.

The point to be taken is that in Varadhi, the granularity at which the ordering of messages is enforced is GroupId and not a partition. This may sound different from Apache Kafka if you are familiar with its consumption semantics, and it may sound similar to key-shared subscription from Apache Pulsar.

Behaviour during message delivery failures

TODO: add link to info on retries and dead-lettering.

The gist is that if Varadhi encounters a non-successful (non-2xx in case of HTTP) code during message delivery, then that message will be treated as a "failed" message and moved to one of the RQ / DLQ. Additionally, the GroupID for this message will be marked as "failed" in Varadhi's GroupID State which is maintained per subscription.

This "failed" state signifies that some previous message of this group has not yet been delivered and is somewhere in the RQ / DLQ. In such a case, if a new message of the same GroupID is encountered, it will be placed alongside the last known failed message, and its delivery will not even be attempted for the time being. This new message is only attempted for delivery if the message before it was successfully delivered while consuming from RQ / DLQ.

Example: Consider the previous example, where g1.m1 couldn't be delivered and varadhi got a error code (lets say a 5xx in case of HTTP). In such a case this message will be put in RQ for delayed retry (assuming 1 time step delivery below).

green == delivered successful
red == delivery failed with non 2xx

strikethrough == message not delivered and moved to a different topic for future delivery.