We are building a microservice-based system with mostly asynchronous, event-based communication between services. We have high-throughput requirements with guaranteed message delivery - exactly-once in some areas and at-least-once in others - so Kafka suits us well.
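For reference, the delivery guarantees above roughly map onto producer settings like the following (a sketch only; the `transactional.id` value is a placeholder):

```properties
# At-least-once: idempotent producer, wait for full acknowledgement
enable.idempotence=true
acks=all

# Exactly-once (transactional) producing additionally needs a stable
# transactional id per producer instance
transactional.id=orders-service-tx-1
```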
I am looking at how best to build our services. My current conclusions and some questions are below - hopefully they make sense, but I may have misunderstood some aspects.
One option is to build the services state-based: a database holds the state, and CDC via Kafka Connect harvests change events from it. Where we want exactly-once, this means we need to store the processor's topic offsets in the database alongside the state. It also leaves some challenges around high availability and disaster recovery: we need a replicated database as well as replicated topics, since overall we have two sources of truth.
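As a sketch of the CDC option, a Debezium source connector registered with Kafka Connect might look like this (all host, database, and table names here are hypothetical):

```json
{
  "name": "orders-db-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "orders-db",
    "database.port": "5432",
    "database.user": "cdc",
    "database.password": "********",
    "database.dbname": "orders",
    "topic.prefix": "orders",
    "table.include.list": "public.orders"
  }
}
```

Each captured table then becomes a change-event topic, which is exactly where the "two sources of truth" issue comes from: the database row and the emitted event both claim to be the state.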
We prefer the idea of being fully event-based, which seems to mean event sourcing. Then we do not have the problem of two sources of truth - we just have Kafka with the events. I think we can use Kafka transactions and exactly-once semantics where we need to guarantee exactly-once processing.
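Concretely, the exactly-once processing I am counting on here is a Kafka Streams setting that wraps the consume-process-produce cycle (including state-store updates) in one transaction; a minimal config sketch, with a hypothetical application id:

```properties
application.id=order-aggregates
processing.guarantee=exactly_once_v2
```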
However, AFAICS, when doing this we are not strictly event sourcing: all the aggregate state used by the state-change logic has to live in persistent state stores backed by compacted changelog topics (i.e. state, not events), with events then published to separate event topics for other services. Transactions then let us guarantee that an aggregate has incorporated the latest event before the next one is read and processed - trying to rebuild that state with a KTable over the published events does not, I think, give the same guarantee.
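To make the distinction concrete, here is a minimal Kafka Streams sketch of the pattern I mean: the aggregate's state lives in a named state store (backed by a compacted changelog topic that Kafka Streams manages), and what other services see is a separate event topic. All topic, store, and class names are hypothetical, and the state-change logic is a placeholder.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

public class OrderTopologySketch {

    public StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Commands (or upstream events) that drive the aggregate's state changes.
        KStream<String, String> commands = builder.stream(
                "order-commands",
                Consumed.with(Serdes.String(), Serdes.String()));

        // Aggregate state is kept in a persistent state store; Kafka Streams
        // backs it with a compacted changelog topic - i.e. state, not events.
        KTable<String, String> orderState = commands
                .groupByKey()
                .reduce(
                        OrderTopologySketch::apply,
                        Materialized.as("order-state-store"));

        // Events for other services go to a separate, non-compacted event
        // topic. (In a real service the published event would be derived from
        // the state change, not just the new state.)
        orderState.toStream().to(
                "order-events",
                Produced.with(Serdes.String(), Serdes.String()));

        return builder;
    }

    // Placeholder for the real state-change logic.
    private static String apply(String currentState, String command) {
        return command;
    }
}
```

With `processing.guarantee=exactly_once_v2`, the state-store update, its changelog write, and the publish to `order-events` are committed atomically, which is the guarantee I do not think a KTable rebuilt from the published event topic can give.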
I am hoping someone can confirm whether I am correct in thinking this.
Thus we would be fully event-based but not event-sourced. This is OK, although less neat, as there is a slight disjoint between the internal mechanism (changelog-backed state) and the external one (published events).