This is originally from a Slack thread. Copied here to make it available permanently.
You can join the community Slack here
Hi guys, I have a question. I have a Kafka cluster with a KTable which gets updated as a result of other tables/streams being joined together. This table is then converted to a stream whose elements are pushed into a topic which is growing massively in terms of number of elements/size (maybe I should talk in terms of partitions here if you look at the internals). The stream's elements are keyed, and I am really only interested in keeping the last value for each key. From what I can see, I can control the retention of messages in terms of bytes and milliseconds. I was wondering if there's a way to keep only the last message for each keyed entity. I know it's a log, it's immutable, etc., but I need to understand how I can approach this problem. Thanks guys
That is what compacted topics are for, and Kafka Streams KTables leverage this.
However, just because a topic is compacted doesn't mean there are no duplicates; it just means that older values for a key can eventually be deleted.
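To see why a full replay gives the current value per key regardless of whether compaction has run yet, here is a toy model in plain Java (no Kafka dependency; all names here are made up for illustration):

```java
import java.util.*;

// Toy model of a keyed log: compaction eventually retains only the latest
// record per key, but until it runs, older duplicates may still be present.
public class CompactionSketch {
    record Rec(String key, String value) {}

    // Replaying the whole log into a map (as Kafka Streams does into RocksDB
    // on restore) yields the current value per key, duplicates or not.
    static Map<String, String> replay(List<Rec> log) {
        Map<String, String> state = new LinkedHashMap<>();
        for (Rec r : log) {
            state.put(r.key(), r.value()); // later records overwrite earlier ones
        }
        return state;
    }

    public static void main(String[] args) {
        List<Rec> log = List.of(
            new Rec("user-1", "v1"),
            new Rec("user-2", "v1"),
            new Rec("user-1", "v2")); // older user-1 record is now compactable
        System.out.println(replay(log)); // prints {user-1=v2, user-2=v1}
    }
}
```

Whether or not the broker has physically removed the older `user-1` record, the replayed state is the same, which is why duplicates in a compacted topic are harmless to a restoring consumer.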
This is one of the reasons Kafka Streams fully reads a topic into RocksDB on a restart: to make sure it has the current value for each key.
So by taking a table and converting it into a stream, you are going from a compacted topic (provided the Streams application configured the topic) to a non-compacted topic.
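In the Streams DSL that transition looks roughly like the sketch below (topic and store names are assumed for illustration; this needs the kafka-streams dependency and a running cluster, so it is a sketch rather than a runnable program):

```java
// Sketch only: assumed topic/store names, not a complete application.
StreamsBuilder builder = new StreamsBuilder();

// The KTable's changelog is backed by a compacted topic that Streams manages.
KTable<String, String> latest = builder.table(
    "joined-input",
    Materialized.as("latest-values-store"));

// toStream() turns every table update into a stream record; the output topic
// below is a regular topic, NOT compacted unless you configure it yourself.
latest.toStream().to("latest-values-out",
    Produced.with(Serdes.String(), Serdes.String()));
```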
Typically, if I need to move data from a table to a stream, I give the topic backing that stream a low retention time, since the system of record for the data in that stream is maintained in the KTable (compacted topic).
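Lowering retention on the stream's backing topic can be done with the AdminClient; a sketch under assumed names (it needs the kafka-clients dependency and a reachable broker, so it is illustrative only):

```java
// Sketch only: "latest-values-out" and the broker address are assumptions.
Properties props = new Properties();
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
try (Admin admin = Admin.create(props)) {
    ConfigResource topic =
        new ConfigResource(ConfigResource.Type.TOPIC, "latest-values-out");
    // Keep records for only one hour; the KTable's compacted changelog
    // remains the system of record, so expiring old stream records is safe.
    AlterConfigOp setRetention = new AlterConfigOp(
        new ConfigEntry(TopicConfig.RETENTION_MS_CONFIG, "3600000"),
        AlterConfigOp.OpType.SET);
    admin.incrementalAlterConfigs(Map.of(topic, List.of(setRetention))).all().get();
}
```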
So the pieces are there to get you what you need (I believe), but the nuances will take a little more understanding of your business needs: 1) maybe you don't need to take the table and convert it to a stream, or 2) maybe that stream could be windowed, or 3) you do need a stream, so you lower its retention and have a means to republish from the source KTable as needed.
Thanks @nbuesing, it's a very good answer.