Is KTable a possible solution?

I have a use case that I am not sure KTable is a good fit for.
In our organization we have several enterprise processes gathering information.
Each process is independent of the others, but now there is a need to gather a piece of information from each process and make it accessible to “everybody”, almost like a cache.
The main entry point for each process is a Kafka topic where the information we want to share is located.
So the question: I am considering having a Kafka consumer (or in practice a Kafka Streams application) listening to all the “entry topics” (several topics), fetching the information needed, and then publishing that information to a KTable.
The problem: the data in each process is different, so I need to “merge” all the streams into one KTable. The ID (social security number) will always be the same, but the value must contain the key/value pairs from each of the relevant enterprise processes:

key: social security number
value:
{attribute1: value,
 attribute2: value,
 ...etc}

attribute1 is from enterprise topic 1, attribute2 from enterprise topic 2, etc.
The main thing is that if attribute2 is NOT present in an update, it should NOT remove the attribute if it is already present.

In short, this KTable should act as a cache for relevant data for a specific period of time.
I plan to look up the value in the KTable while processing each enterprise event, and then publish an update (or nothing), depending on whether relevant data was received.

The data from the KTable will be used to add data (joins) to other Kafka streams in the organization.

It might be that a database is a better fit here, so feel free to tell me that KTable is NOT the best fit for this problem.

Hi hansj.melby

The problem: the data in each process is different, so I need to “merge” all the streams into one KTable. The ID (social security number) will always be the same, but the value must contain the key/value pairs from each of the relevant enterprise processes.

You could use the merge function on the various KStreams to create a single stream, then map the stream to standardize the contents into the format you require.
Check out How to merge many streams into one stream using Kafka Streams for more details on merging. developer.confluent.io also has a number of other recipes and examples that may help you get a better understanding of what Kafka Streams (or ksqlDB) can do for you.
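Once the streams are merged, building the KTable typically means grouping by SSN and aggregating, and the aggregator is exactly where your “don’t remove an attribute that is already present” rule lives. Here is a minimal plain-Java sketch of that merge step (class and attribute names are hypothetical; the Kafka Streams wiring around it — `groupByKey().aggregate(...)` — is omitted):

```java
import java.util.HashMap;
import java.util.Map;

public class ProfileMerge {

    // Merge an incoming partial update into the current aggregate for one SSN.
    // Attributes missing from (or null in) the update keep their current value,
    // so a record from enterprise topic 2 never wipes out attribute1.
    public static Map<String, String> merge(Map<String, String> current,
                                            Map<String, String> update) {
        Map<String, String> result = new HashMap<>(current);
        update.forEach((attribute, value) -> {
            if (value != null) {
                result.put(attribute, value); // only overwrite with real values
            }
        });
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> current = new HashMap<>();
        current.put("attribute1", "from-topic-1");

        Map<String, String> update = new HashMap<>();
        update.put("attribute2", "from-topic-2"); // attribute1 absent from update

        // attribute1 survives the merge even though the update lacks it
        System.out.println(merge(current, update));
    }
}
```

In a Streams topology this function would be the body of the `Aggregator` you pass to `aggregate()`, with an empty map as the `Initializer`.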

In short, this KTable should act as a cache for relevant data for a specific period of time.
I plan to look up the value in the KTable while processing each enterprise event, and then publish an update (or nothing), depending on whether relevant data was received.

If you only want a specific period of time then you may want to look into windowing, as windows allow you to specify a size and expiry time. The default KTable retains records indefinitely.

Search for “windows” on this page to see everything available: Confluent Recipes & Tutorials

eg: Sliding Windows: How to create sliding windows using Kafka Streams
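As a rough illustration of what the simplest (tumbling) window does: each event timestamp is assigned to a fixed-size, epoch-aligned bucket, and whole buckets expire together once retention passes. A plain-Java sketch of that bucketing (the method is made up for illustration; in Kafka Streams you would use `TimeWindows` instead of computing this yourself):

```java
import java.time.Duration;

public class WindowSketch {

    // Tumbling-window bucketing: an event belongs to the window that starts at
    // timestamp - (timestamp % windowSize), aligned to the epoch.
    public static long windowStart(long timestampMs, Duration windowSize) {
        long sizeMs = windowSize.toMillis();
        return timestampMs - (timestampMs % sizeMs);
    }

    public static void main(String[] args) {
        Duration oneHour = Duration.ofHours(1);
        // Two events 500 ms apart land in the same one-hour window...
        System.out.println(windowStart(3_600_000L, oneHour)); // prints 3600000
        System.out.println(windowStart(3_600_500L, oneHour)); // prints 3600000
        // ...while an event an hour later starts a new window.
        System.out.println(windowStart(7_200_100L, oneHour)); // prints 7200000
    }
}
```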

The data from the KTable will be used to add data (joins) to other Kafka streams in the organization.

KTables are only usable within the Kafka Streams application that defined them - you wouldn’t be able to use a single KTable in one application to join on another KTable in a different application.

What you could do, however, is create your KTable and then use kTableName.toStream().to( <output topic name> ) to write the table out as a stream. Then, your other Kafka Streams applications can simply declare their own KTable within their own application runtime, and join the data in as they see fit.
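The downstream application then simply re-materializes that output topic as its own table. Conceptually, a KTable is just the replay of a changelog into a key/value store, where a null value (a tombstone) deletes the key. A plain-Java sketch of that materialization (the method name is made up; Kafka Streams does this for you when you declare a KTable over the topic):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ChangelogSketch {

    // Replay a changelog of (key, value) records into a table.
    // A null value is a tombstone: it removes the key, just as tombstones
    // do in a compacted Kafka topic backing a KTable.
    public static Map<String, String> materialize(List<Map.Entry<String, String>> changelog) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Map.Entry<String, String> record : changelog) {
            if (record.getValue() == null) {
                table.remove(record.getKey()); // tombstone
            } else {
                table.put(record.getKey(), record.getValue());
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> changelog = List.of(
                new SimpleEntry<>("ssn-1", "{attribute1: a}"),
                new SimpleEntry<>("ssn-1", "{attribute1: a, attribute2: b}"),
                new SimpleEntry<>("ssn-2", "{attribute1: c}"));

        // ssn-1 ends up with the latest value; ssn-2 keeps its only value
        System.out.println(materialize(changelog));
    }
}
```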

It might be that a database is a better fit here, so feel free to tell me that KTable is NOT the best fit for this problem.

An external database will give you the ability to query and join the data from other runtimes. It won’t, however, deliver any event-driven functionality to your other applications, so it depends on your specific use cases.

eg: If you want to send an email to each SSN as events occur, Kafka Streams with KTable is a very good choice.
