Hey all,
I have been using Kafka Streams for a problem I am working on, and I started thinking about implementing the same functionality with ksqlDB, to leverage the existing aggregations and functions as well as the flexibility of writing ad hoc queries rather than deploying new application instances. But I am having a hard time wrapping my head around one aspect; maybe somebody here can help me with it. If all works out, I am planning to write my own UDFs and UDAFs to extend the available vocabulary for my processing needs.
I have a Kafka topic where I am receiving events with polymorphic schemas. All events for a given entity share the same key, and they have a specific ordering that I need to preserve and leverage.
My events are roughly like this:
{"eventType": "A", "body": {/* A schema */}}
{"eventType": "B", "body": {/* B schema */}}
{"eventType": "C", "body": {/* C schema */}}
And I am receiving events in one of the following orderings, where time flows from left to right (per key):
- A → B → C
- A → C
- B → C
repeating over time.
My hard requirement is this: I want tables built from the A and B events and a stream from the C events, and I want to join each C event against the A and B tables. I need to make sure that the tables are updated BEFORE my join runs.
In other words, I would like to be able to write stream-table n-way joins that are triggered by each C event, with the hard requirement that the tables are updated first. A rough sketch of what I have in mind follows.
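Here is a minimal sketch of what I am imagining in ksqlDB. The topic name `events`, the `id` key column, and the `VARCHAR` body are placeholders I made up for illustration (each event type would really need its own schema):

```sql
-- Hypothetical source topic 'events', keyed by an entity id.
CREATE STREAM all_events (
  id VARCHAR KEY,
  eventType VARCHAR,
  body VARCHAR  -- simplified; each eventType really has its own schema
) WITH (KAFKA_TOPIC = 'events', VALUE_FORMAT = 'JSON');

-- Latest A event per key.
CREATE TABLE a_table AS
  SELECT id, LATEST_BY_OFFSET(body) AS a_body
  FROM all_events
  WHERE eventType = 'A'
  GROUP BY id;

-- Latest B event per key.
CREATE TABLE b_table AS
  SELECT id, LATEST_BY_OFFSET(body) AS b_body
  FROM all_events
  WHERE eventType = 'B'
  GROUP BY id;

-- Every C event should trigger this join, but this is exactly where I
-- worry the tables may not yet reflect the A/B events that preceded it.
CREATE STREAM c_enriched AS
  SELECT c.id, c.body AS c_body, a.a_body, b.b_body
  FROM all_events c
  LEFT JOIN a_table a ON c.id = a.id
  LEFT JOIN b_table b ON c.id = b.id
  WHERE c.eventType = 'C';
```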
If I understand correctly, ksqlDB uses a separate consumer for each `CREATE TABLE` statement, and although Kafka guarantees ordering within a partition, the lag between the table consumers and the stream consumer means this ordering guarantee does not carry across queries. For example, the C stream's consumer could already be processing a C event while the A table's consumer still lags behind the A event that preceded it on the partition, so the join would run against a stale table.
In the Kafka Streams implementation, I am using the `.branch()` operator on the same `KStream` to create my `KTable`s, and I am fairly positive that works the way I expect; a sketch of that topology follows.
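For reference, here is a minimal sketch of that topology. The topic names, the String key, and the raw-JSON string values are placeholder assumptions, and the eventType check is deliberately crude (real code would parse the JSON); newer Kafka Streams versions supersede `branch()` with `split()`, but the shape is the same:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class BranchTopology {

  public static void build(StreamsBuilder builder) {
    // One source stream; values are the raw JSON strings, keyed by entity id.
    KStream<String, String> events = builder.stream(
        "events", Consumed.with(Serdes.String(), Serdes.String()));

    // Branch the single stream by eventType, so all three branches live in
    // the same sub-topology and are fed by the same consumer.
    @SuppressWarnings("unchecked")
    KStream<String, String>[] branches = events.branch(
        (k, v) -> v.contains("\"eventType\": \"A\""),
        (k, v) -> v.contains("\"eventType\": \"B\""),
        (k, v) -> v.contains("\"eventType\": \"C\""));

    // Materialize the A and B branches as tables (latest value per key).
    KTable<String, String> aTable = branches[0].toTable();
    KTable<String, String> bTable = branches[1].toTable();

    // Each C event triggers the joins. Because all branches belong to the
    // same task, records are processed in offset order, so the tables have
    // already seen any A/B events that preceded this C on the partition.
    branches[2]
        .leftJoin(aTable, (c, a) -> c + "|" + a)
        .leftJoin(bTable, (ca, b) -> ca + "|" + b)
        .to("c-enriched", Produced.with(Serdes.String(), Serdes.String()));
  }
}
```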
Am I approaching this wrong? I would appreciate any help.
Thanks