Would anyone know if there is any way to guarantee ordering of messages across partitions? We need to implement “at least once” delivery with message ordering, but everything I read points to an ordering guarantee within a partition, not across partitions…
Also, all the articles are relatively old, so I don’t know whether the most recent versions provide this capability, or whether there is a good design pattern that can be used.
We are getting messages/events from a mainframe, and they come at us from batch jobs… so a drop of 1 million messages within a few milliseconds. When that happens across multiple multi-threaded jobs, we struggle to maintain the order in which they were placed.
One obvious option is to add a timestamp at the source, but we would prefer to avoid that if possible.
Can you expand on why you need strict ordering across partitions? What’s the process/requirement that this is supporting?
Usually strict ordering is needed to support particular business logic, which can be enforced by setting the key on messages correctly so that they are all written to the same partition.
Robin meant that if you use keys on your records, you will be able to control which partitions your records end up in. Though I’m afraid this is only partially true.
Kafka distributes records across partitions based on criteria specified by an entity called the partitioner. Every producer has one. The default one is called DefaultPartitioner, and it uses a simple algorithm that prioritizes partition affinity over distribution. When the affinity cache is empty, which is usually the case the first time a partition is used, the partitioner relies on logic to route the record to a specific partition based on the key specified in the record. Under the same logic, if you don’t set a key on the record, the partitioner will pick a random partition to use.
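To make the key-based routing concrete, here is a minimal sketch in plain Python with no Kafka client involved. It assumes a topic with six partitions and uses CRC32 as a stand-in hash (Kafka's Java client hashes the key bytes with murmur2 instead); `pick_partition`, `produce`, and the sample records are hypothetical names and data made up for illustration.

```python
import zlib
from collections import defaultdict


def pick_partition(key: bytes, num_partitions: int) -> int:
    # Stand-in hash for illustration only: Kafka's Java client uses murmur2, not CRC32.
    return zlib.crc32(key) % num_partitions


def produce(records, num_partitions):
    """Append (key, value) records to in-memory partition logs, in send order."""
    logs = defaultdict(list)
    for key, value in records:
        logs[pick_partition(key, num_partitions)].append((key, value))
    return logs


# Hypothetical events; customer-456 is interleaved to show it does not disturb customer-123.
records = [
    (b"customer-123", "amazon -34"),
    (b"customer-456", "costco -10"),
    (b"customer-123", "walmart -40"),
    (b"customer-123", "target +50"),
]

logs = produce(records, num_partitions=6)
p = pick_partition(b"customer-123", 6)
customer_123_events = [v for k, v in logs[p] if k == b"customer-123"]
print(customer_123_events)  # ['amazon -34', 'walmart -40', 'target +50']
```

All records with the same key land in one partition log, in the order they were sent. Note that the mapping depends on the partition count: with a hash-modulo scheme like this one, changing the number of partitions can send the same key to a different partition.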
Pragmatically speaking, you have to configure your mainframe, or the middleware that pulls from the mainframe and sends data to Kafka, to set these keys on the records so that ordering within a partition can be obtained.
To better illustrate what I am trying to say, I recommend you read this blog post that I wrote a while ago, which explains how partitioning/assignment works at a record level in Kafka. It uses a buckets pattern to implement a reasonable level of ordering across partitions.
Thank you for the detailed explanation. I am still reading through the blog post but it is beginning to make sense now.
@rmoff The key on the messages is the same. For example, Customer 123 spent $34 at Amazon, a minute later spent $40 at Walmart, and a second later got a credit of $50 from Target.
Then, if we process only the messages up to Walmart, before the Target credit, he has reached his credit limit and should get a payment declined at Kohls. But if we process the credit back from Target first, the Kohls payment should go through.
Now, as crazy as it sounds, we don’t have a timestamp on these events, and the key is the customer ID. My question/concern is that I want to keep the order of the events identical when we ship them to another consumer.
You are right that since the key is identical, they should end up going to the same partition, but I am not sure whether that is guaranteed. I also feel like this is probably a fairly common issue and there should be a design pattern to handle it that I am not aware of.
Also, our consumer is DocumentDB, and we apply upserts due to the nature of the middleware that’s producing events. If we don’t process events in exactly the order they were produced, we risk data inconsistency: the middleware shows the current record as Kohls, but on the Kafka consumer side, if ordering isn’t guaranteed, we may show Target or Walmart as the latest transaction.
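One pattern that might help with the upsert side, sketched here in plain Python rather than against a real database: store the Kafka offset of the last applied event alongside the document, and skip any upsert whose offset is not newer. This assumes all events for a customer stay in a single partition (same key, unchanged partition count), since offsets only increase within one partition. The `upsert` helper, the dict standing in for the document store, and the offsets are all hypothetical.

```python
def upsert(store: dict, customer_id: str, offset: int, event: dict) -> bool:
    """Apply the event only if its offset is newer than the one already stored."""
    current = store.get(customer_id)
    if current is not None and offset <= current["offset"]:
        return False  # duplicate or stale redelivery: leave the document untouched
    store[customer_id] = {"offset": offset, **event}
    return True


store = {}  # stand-in for the document database
# Hypothetical deliveries as (offset, event); offsets 11 and 12 are replayed,
# as could happen with at-least-once delivery after a consumer restart.
deliveries = [
    (10, {"merchant": "Amazon"}),
    (11, {"merchant": "Walmart"}),
    (12, {"merchant": "Target"}),
    (11, {"merchant": "Walmart"}),  # redelivered
    (12, {"merchant": "Target"}),   # redelivered
    (13, {"merchant": "Kohls"}),
]

applied = [upsert(store, "123", off, ev) for off, ev in deliveries]
print(applied)                   # [True, True, True, False, False, True]
print(store["123"]["merchant"])  # Kohls
```

With the guard in place, a replayed Walmart or Target event never overwrites a newer record, so the latest transaction stays consistent with what was produced. In a real document store the compare-and-write would need to be atomic, which this in-memory sketch does not address.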