Hi all,
I have been trying to understand the behavior of Kafka in the following scenario:
Assume I have a topic T with P partitions, and two consumers, C1 and C2, that read this topic using two different consumer groups, G1 and G2 respectively. The consumer groups are not pre-defined; each one is created by its consumer when that consumer first subscribes.
If my understanding of the book is correct, both consumers C1 and C2 will receive all the messages posted to the topic, independently of each other. That is, if messages are placed as follows:
G1 - P0 - m_01, m_02, m_03 …
G1 - P1 - m_11, m_12, m_13 …
…
G1 - Pp - m_p1, m_p2, m_p3…
G2 - P0 - m_01, m_02, m_03 …
G2 - P1 - m_11, m_12, m_13 …
…
G2 - Pp - m_p1, m_p2, m_p3…
If client C1 works with G1 and C2 works with G2, both of them would get all the messages, in offset order within each partition. The offsets of the two consumer groups would be independent of each other, e.g. the current offset for G1-P0 may be 100 while the current offset for G2-P0 is 27. However, the end offset of each partition would be the same for both groups, assuming the topic is fed by the same producers.
Is this understanding correct?
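To make the model I have in mind concrete, here is a small Python sketch. It is only a toy simulation, not the Kafka client API: the class ToyTopic and its methods are names I made up for illustration, and it assumes each group simply keeps its own "next offset to read" per partition.

```python
from collections import defaultdict


class ToyTopic:
    """Toy stand-in for a topic: one append-only list per partition."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        # committed[group][partition] -> next offset that group will read.
        # Assumption: a group seen for the first time starts at offset 0.
        self.committed = defaultdict(lambda: defaultdict(int))

    def produce(self, partition, message):
        # Producers append to the shared log; groups play no role here.
        self.partitions[partition].append(message)

    def poll(self, group, partition, max_records):
        # Read from this group's own position and advance only that position.
        start = self.committed[group][partition]
        records = self.partitions[partition][start:start + max_records]
        self.committed[group][partition] = start + len(records)
        return records

    def end_offset(self, partition):
        # The end offset belongs to the partition, not to any group.
        return len(self.partitions[partition])


topic = ToyTopic(num_partitions=2)
for i in range(1, 6):
    topic.produce(0, f"m_0{i}")

g1_records = topic.poll("G1", 0, max_records=5)  # G1 reads everything in P0
g2_records = topic.poll("G2", 0, max_records=2)  # G2 is slower on the same log
g1_position = topic.committed["G1"][0]
g2_position = topic.committed["G2"][0]
```

In this sketch G1 and G2 see the same messages in the same order, their positions in P0 differ, and end_offset(0) is a single value shared by both groups. That is the behavior I am assuming real Kafka has.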
If yes, suppose my group G2 starts 1 hour after G1 (and after topic creation), message retention is set to 30 minutes, and auto.offset.reset is set to earliest. Would this imply that G2 is delivered all messages from (G2 start time - 30 minutes) onward, across all partitions, including already-processed ones that have not yet been deleted (or compacted) from the topic? And what happens when auto.offset.reset is set to latest? Or does the setting not matter at all in this case, and G2 will get all messages since topic creation?
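In case it helps pin down what I am asking, this is how I currently assume the starting position is chosen for one partition. Again, this is a toy Python sketch and not Kafka code: starting_offset is a hypothetical helper, and the numbers assume one message per minute and that retention removes messages exactly at the 30 minute mark, which may well be a simplification of what the broker really does.

```python
def starting_offset(committed, log_start, log_end, auto_offset_reset):
    """Hypothetical helper: where a group starts reading one partition.

    committed  -- the group's committed offset, or None for a brand-new group
    log_start  -- offset of the oldest message still retained
    log_end    -- offset the next produced message will get
    """
    # A valid committed offset wins; the reset policy is not consulted.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # No usable committed offset: fall back to the reset policy.
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    raise ValueError(f"unsupported auto.offset.reset: {auto_offset_reset}")


# Assumed scenario: 60 messages produced in the first hour (offsets 0..59),
# and the first 30 already removed by the 30 minute retention.
LOG_START, LOG_END = 30, 60

g2_earliest = starting_offset(None, LOG_START, LOG_END, "earliest")
g2_latest = starting_offset(None, LOG_START, LOG_END, "latest")
g1_resume = starting_offset(45, LOG_START, LOG_END, "latest")
```

Under these assumptions, a new G2 with earliest would start at the oldest retained message and replay the last 30 minutes, while with latest it would start at the end and see only messages produced after it joins. G1, which already has a committed offset, would be unaffected by the setting. Please correct me if this model is wrong.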
I hope my question makes sense. If not, I’ll try to clarify.
Regards,
Shishir