Hi codonnell, sorry for the delay on your follow-ups:
If we use compaction, we would need to ensure the latest event on the topic for any given entity has its full state, right?
Yes, correct.
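To make the "full state" requirement concrete, here's a toy sketch in plain Python (no Kafka client; the data is made up) of what compaction does and why partial-update events break under it. Compaction keeps only the latest record per key, so anything not carried in that latest record is gone.

```python
def compact(records):
    """Simulate log compaction: keep only the last value seen for each key."""
    latest = {}
    for key, value in records:
        latest[key] = value  # later records overwrite earlier ones per key
    return latest

# Full-state events survive compaction intact:
full_state = [
    ("user-1", {"name": "Ada", "email": "ada@example.com"}),
    ("user-1", {"name": "Ada", "email": "ada@new.example.com"}),
]
print(compact(full_state))  # surviving record still has name + email

# Delta events do NOT: the surviving record is just the last partial update.
deltas = [
    ("user-1", {"name": "Ada"}),
    ("user-1", {"email": "ada@new.example.com"}),  # "name" is lost after compaction
]
print(compact(deltas))
```

In the second case a consumer that bootstraps from the compacted topic never learns the user's name, which is exactly the failure mode you want to rule out.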
take a snapshot once a week if retention is limited to 2 weeks
This is Lambda architecture, and it’s one that I am personally not that fond of. The main difficulty is that you now need to maintain two sources of data: the snapshot and the event stream. You also need to ensure that the two sources are consistent with one another. For example, a thought experiment: will two separate instances of precisely the same application end up in the exact same state, if one is consuming from the event stream since t=0, and the other bootstrapped from the snapshot at t=now? The answer should be yes, but in practice I find that having two code paths (one to maintain the stream, another to take and expose the snapshot) doesn’t always end up with the same results, and chasing down the inconsistencies can be painful.
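The thought experiment can be sketched in a few lines of plain Python (the event shape and names are invented for illustration). One path folds the whole stream from t=0; the other bootstraps from a snapshot taken at some offset and folds only the tail. Here the snapshot is built with the same fold function, so the two paths trivially agree; the real-world trouble starts when the snapshotting path is separate code.

```python
def apply(state, event):
    """Fold one event into the state (here: set one field on an entity)."""
    entity = dict(state.get(event["id"], {}))
    entity[event["field"]] = event["value"]
    return {**state, event["id"]: entity}

events = [
    {"id": "acct-1", "field": "balance", "value": 100},
    {"id": "acct-1", "field": "owner", "value": "codonnell"},
    {"id": "acct-1", "field": "balance", "value": 250},
]

# Path 1: replay every event from t=0.
state_from_stream = {}
for e in events:
    state_from_stream = apply(state_from_stream, e)

# Path 2: bootstrap from a snapshot taken after the first two events...
snapshot_offset = 2
snapshot = {}
for e in events[:snapshot_offset]:
    snapshot = apply(snapshot, e)
# ...then consume only the remainder of the stream.
state_from_snapshot = snapshot
for e in events[snapshot_offset:]:
    state_from_snapshot = apply(state_from_snapshot, e)

# The invariant the architecture depends on:
assert state_from_stream == state_from_snapshot
```

If the snapshot producer serializes state even slightly differently from what the stream fold computes (dropped fields, different null handling, timing of the cut), that final assertion is exactly what fails in production.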
A further complication arises when you’re trying to mix multiple snapshots + streams together. Say you have two sources upstream, each with a snapshot + stream component. If you’re doing any sort of time-based or ordering-sensitive calculations, your code may end up looking quite complicated, as you need to manage the “seams” between the snapshots and the streams. And this isn’t just the seam between the data in snapshotA and streamA, but between snapshotA and streamB, snapshotA and snapshotB, streamA and streamB, etc.
It really depends on what you’re trying to do with the data in the event stream. If you only care about transferring state and not driving any business logic, then snapshot+stream can work fine. But usually we’re using a stream because we want to process business logic, and you can get into some hairy situations when using Lambda style architectures.
My main concern is that most engineers would need to learn and operate these tools to create a data product for their service, increasing the barrier to entry. I’m also concerned that it would be frustrating to reconstruct a business event via kafka streams that was readily available at the application layer.
You may want to look into the outbox pattern and see if you can get the application developers on board with denormalizing data at write time to the outbox. This can reduce some of the overhead of recomposing data outside, while isolating the internal model of the relational database.
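A minimal sketch of the outbox pattern, using sqlite3 so it's self-contained (table and column names here are illustrative, not a standard schema). The key point is that the business write and a denormalized, ready-to-publish event row land in the same transaction, so whatever reads the outbox table (e.g. a CDC connector) gets a complete business event without joining against the internal relational model.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE TABLE outbox    (id INTEGER PRIMARY KEY AUTOINCREMENT,
                            aggregate_id TEXT, event_type TEXT, payload TEXT);
""")
conn.execute("INSERT INTO customers VALUES (7, 'codonnell')")

with conn:  # one transaction: both rows commit, or neither does
    conn.execute("INSERT INTO orders VALUES (1, 7, 99.50)")
    # Denormalize at write time: the event already carries the customer name,
    # so downstream consumers never need to reconstruct it from the tables.
    payload = json.dumps({"order_id": 1, "customer": "codonnell", "total": 99.50})
    conn.execute(
        "INSERT INTO outbox (aggregate_id, event_type, payload) VALUES (?, ?, ?)",
        ("order-1", "OrderPlaced", payload),
    )

row = conn.execute("SELECT payload FROM outbox").fetchone()
print(row[0])
```

The trade-off is that application developers take on the denormalization work inside their write path, which is exactly the buy-in you'd need to get from them.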
As for running Kafka Connect as a Service, yeah, I hear you. I think your comment does illustrate the difficult-to-quantify accumulation of barriers. I’ve been partially responsible for running and maintaining Kafka and Kafka Connect in the past - it’s one of the reasons I (biasedly, of course - incoming sales pitch) think that SaaS solutions like Confluent Cloud offer a lot of value. It can be hard to quantify the overhead spent fixing connectors, scaling the cluster, etc etc etc, but when that burden is lifted it really opens up what can be done with event streams and provides a lot of operational mobility.
Thank you again for your response!
You’re welcome. Again, sorry for the delay on getting back to you. I need to double check my forum notification settings.