I’m doing some tests on compacted topics and I noticed unexpected consumer behavior. I created a compacted topic and a producer that generates hundreds of messages per second, using 5 keys with random values.
After a few seconds of message generation, if I inspect the topic content with Offset Explorer, I can see that compaction is working correctly (only the last value generated for each key is kept).
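For context, the topic setup is roughly equivalent to the following (the topic name, broker address, and tuning values here are placeholders, not my exact configuration; the relevant part is `cleanup.policy=compact`):

```shell
# Illustrative compacted-topic creation; values below are assumptions,
# chosen to make compaction kick in quickly on a test cluster.
kafka-topics.sh --create --topic test-compacted \
  --bootstrap-server localhost:9092 \
  --partitions 1 --replication-factor 1 \
  --config cleanup.policy=compact \
  --config segment.ms=100 \
  --config min.cleanable.dirty.ratio=0.01
```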
The problem appears with the following workflow:
- The producer generates a bulk of messages on the compacted topic.
- The consumer connects to the cluster, subscribes to the compacted topic, and after a first poll it correctly receives only the most recent value for each key.
- The consumer stays connected and subscribed, but waits for user interaction before performing the next poll.
- The producer generates another bulk of messages.
- Using Offset Explorer, I can see that compaction has run successfully on this second bulk as well.
- The consumer performs a second poll and receives all the messages generated in the last bulk, not only those retained by compaction.
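To make my expectation in the last step concrete: I assumed a poll over a compacted topic would return only the last record per key, as in this toy sketch (pure C++, no Kafka involved; the helper name is my own, not part of any client API):

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// What I expected the second poll to deliver: the compacted view of a bulk,
// i.e. only the most recent value for each key. Later records overwrite
// earlier ones, mimicking log compaction's "keep the last value per key".
std::map<std::string, std::string> latestPerKey(
    const std::vector<std::pair<std::string, std::string>>& records) {
  std::map<std::string, std::string> latest;
  for (const auto& rec : records) {
    latest[rec.first] = rec.second;  // a later record replaces an earlier one
  }
  return latest;
}
```

With 5 keys, this view of a bulk of hundreds of records contains at most 5 entries; instead, the second poll hands me the entire bulk.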
I have written different consumer and producer clients (in C# and C++) and experimented with configuration parameters for both the consumer and the compacted topic, but it seems that the only way for a consumer to receive only the compacted messages is to reconnect to the cluster or to change its group id. Otherwise, a connected consumer always receives all the messages produced.
Is this the expected behavior? I cannot figure out what I’m doing wrong.