Kafka Topic and Tombstone messages

Hello

I’ve read that, with compaction enabled, a tombstone message leads to the last message for a key and eventually the tombstone itself being deleted from a topic, removing ALL messages with that key

I was wondering if there is a way to achieve delete without compaction?

So a topic would have multiple events/messages per key and, without the process of compaction, a tombstone message would delete ALL messages with that tombstone’s key? (i.e. all messages with different keys would still exist in full)
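For reference, a tombstone is simply a record with a key and a null value. A minimal producer sketch (the topic name `students`, the key `student-42`, the broker address, and string serdes are all placeholder assumptions):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TombstoneExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A tombstone is a record with a key and a null value. On a compacted
            // topic the broker eventually removes all earlier records with this key,
            // and later (after delete.retention.ms) the tombstone itself.
            producer.send(new ProducerRecord<>("students", "student-42", null));
        }
    }
}
```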

cheers

n99

What’s the use case you’re trying to solve here? AFAIK what you describe isn’t possible, but if you can give some more context it might be easier for people to suggest a solution.

Hi

Many thanks for your time in reading my question.

I was thinking about full message replay for new consumers so they get a full set of messages.

Log compaction sounds like a great way to ensure you don’t run out of resources as you have the latest state.

Adding in tombstone messages sounds like a way to further save on resources and remove deleted data from the topic.
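For context, compaction is enabled per topic via `cleanup.policy=compact`, and `delete.retention.ms` then controls how long tombstones themselves survive. A minimal AdminClient sketch (topic name, partition/replica counts, retention value, and broker address are placeholder assumptions):

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address

        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("students", 3, (short) 1)
                .configs(Map.of(
                    "cleanup.policy", "compact",         // keep the latest value per key
                    "delete.retention.ms", "86400000")); // keep tombstones for 1 day
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```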

However, it might be useful to keep all non-delete messages, as you might deal with creates and updates differently. In that case you would not enable log compaction.

However, you might still want the tombstone “compaction” behaviour to save on resources and remove deleted data from the topic?

It feels like a tombstone-only-compaction option is what I’m after…?

Cheers

Nomit

Hi

Thought I’d say more about the use case. I’ve been watching the Confluent 101 courses at What is Apache Kafka®? Online Course for Beginners and they are fab, and I would love an excuse to use this exciting tech!

We are an education institution with a rolling student population that typically ages out after 3 years. We also have external suppliers that require a current and up-to-date list of these users. Maybe this could be a Kafka topic populated by CDC with Debezium?

I would like to use Kafka in a situation where we have a new external supplier and I can just point them (via other tech cogs and wheels) to a topic.

If I point the new consumer to an infinite-retention topic with no log compaction, then that topic will provide a stream of creates, updates, and deletes for students who have long since left. We don’t want/need to send this type of data out.

If I use tombstone events and log compaction then, while the topic will hold a more up-to-date picture, I lose some granularity of events for current students.

I was wondering if there is a halfway point between the two? Infinite retention with only tombstone-activated compaction?

There might be better patterns where I use one topic for initial population and then another for updates?

But I thought I’d check I’m not missing something.

Cheers for any advice.

n99

Hi there,

If I’m hearing this right, are you after a way to keep all updates to a key but still be able to delete/cleanup for a key sometimes?

So, since automatic compaction is an infrastructure-side optimization anyway, the active topic can sometimes contain both the key-value and key-null combinations.

I’d like to offer a way out that gets around this problem. Every time you do an update for a key, you use a surrogate key instead of just the key itself (like key-seqnumber). That way, each key update is unique.

Now, how do you do a delete? Well, if you have a Kafka Streams application with a state store holding an index of all known keys (and their sequence numbers), you could do a lookup and emit a tombstone for each of the key-seqno combinations. The design of the state store is an interesting exercise: you could have one entry per key in the state store, where the value is a collection of sequence numbers (maybe some sort of encoded bitset for efficiency, since they’re just monotonically increasing numbers). When you want to do a delete, you signal the Kafka Streams instance somehow (you’d have to publish a non-tombstone message for that key to a signalling topic), and that instance would look up the sequence numbers for the key and emit the right list of tombstones to clear up all those updates.
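Here’s a minimal Processor API sketch of that idea. The topic names (`student-updates`, `student-deletes`), the `<key>-<seqno>` surrogate-key scheme, and the comma-separated string encoding of the index are all placeholder assumptions; a real implementation would want the more compact encoding mentioned above:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class TombstoneFanOut {

    static final String STORE = "key-index";

    // Indexes every update: key "student-42-7" adds seqno "7" to the store entry for "student-42".
    static class IndexProcessor implements Processor<String, String, String, String> {
        private KeyValueStore<String, String> store;

        @Override
        public void init(ProcessorContext<String, String> context) {
            store = context.getStateStore(STORE);
        }

        @Override
        public void process(Record<String, String> record) {
            if (record.value() == null) return;            // ignore tombstones we emitted ourselves
            int sep = record.key().lastIndexOf('-');       // surrogate key = "<key>-<seqno>"
            String baseKey = record.key().substring(0, sep);
            String seqNo = record.key().substring(sep + 1);
            String known = store.get(baseKey);
            store.put(baseKey, known == null ? seqNo : known + "," + seqNo);
        }
    }

    // On a delete signal for a base key, emits one tombstone per known surrogate key.
    static class DeleteProcessor implements Processor<String, String, String, String> {
        private ProcessorContext<String, String> context;
        private KeyValueStore<String, String> store;

        @Override
        public void init(ProcessorContext<String, String> context) {
            this.context = context;
            store = context.getStateStore(STORE);
        }

        @Override
        public void process(Record<String, String> record) {
            String known = store.get(record.key());
            if (known == null) return;
            for (String seqNo : known.split(",")) {
                // Null value = tombstone for that surrogate key.
                context.forward(record.withKey(record.key() + "-" + seqNo).withValue((String) null));
            }
            store.delete(record.key());
        }
    }

    public static void main(String[] args) {
        StoreBuilder<KeyValueStore<String, String>> storeBuilder =
            Stores.keyValueStoreBuilder(Stores.persistentKeyValueStore(STORE),
                                        Serdes.String(), Serdes.String());

        Topology topology = new Topology()
            .addSource("updates", "student-updates")        // assumed topic names
            .addSource("signals", "student-deletes")
            .addProcessor("index", IndexProcessor::new, "updates")
            .addProcessor("delete", DeleteProcessor::new, "signals")
            .addStateStore(storeBuilder, "index", "delete") // both processors share the store
            .addSink("tombstones", "student-updates", "delete");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "tombstone-fan-out");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.StringSerde.class);
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.StringSerde.class);
        new KafkaStreams(topology, props).start();
    }
}
```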

Does this give you some ideas? You can riff off this and modify it to your needs, I’m sure. The brute-force way of doing it would be a command-line tool which does a full scan of the topic for a key, determines the full list of update entries, and emits the requisite number of tombstones. This is an expensive operation, but if you rarely do deletes it might be a simpler option.
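A sketch of that brute-force tool, under the same assumptions as above (surrogate keys of the form `<key>-<seqno>` in a topic named `student-updates`); note the empty-poll break is only a crude end-of-topic heuristic:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class BruteForceDelete {
    public static void main(String[] args) {
        String topic = "student-updates";  // assumed topic name
        String keyToDelete = args[0];      // base key to delete, e.g. "student-42"

        Properties cProps = new Properties();
        cProps.put("bootstrap.servers", "localhost:9092");
        cProps.put("group.id", "brute-force-delete");
        cProps.put("auto.offset.reset", "earliest");  // scan from the beginning of the topic
        cProps.put("key.deserializer", StringDeserializer.class.getName());
        cProps.put("value.deserializer", StringDeserializer.class.getName());

        Properties pProps = new Properties();
        pProps.put("bootstrap.servers", "localhost:9092");
        pProps.put("key.serializer", StringSerializer.class.getName());
        pProps.put("value.serializer", StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pProps)) {
            consumer.subscribe(List.of(topic));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                if (records.isEmpty()) break;  // crude end-of-topic check
                for (ConsumerRecord<String, String> r : records) {
                    // Emit a tombstone for every surrogate key ("<key>-<seqno>") of the base key.
                    if (r.value() != null && r.key().startsWith(keyToDelete + "-")) {
                        producer.send(new ProducerRecord<>(topic, r.key(), null));
                    }
                }
            }
        }
    }
}
```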


Hi
Yes that’s right… Keep all events for a key but have the option to delete all events for a key with a tombstone for that key.
Thanks for that - sounds like a good approach if it’s not supported out of the box.
Cheers
n99
