Creating a large number of ksqlDB streams on the fly

We have a use case where a large number of events stream into a Kafka topic, almost 100K messages per day. On the consumer side we need to apply a predefined set of rules to the incoming event stream and pass an event downstream only when rule evaluation succeeds. The number of rules could be around 10K in the average case. We want to avoid database lookups, as they can be expensive, so we are exploring ksqlDB, which fits our use case well: each rule maps directly onto the WHERE clause of an ANSI SQL query. Hence we are planning to model it as follows.

  1. First, we will create a standard ksqlDB stream for incoming events, named eventStream.
  2. Second, we will create a stream on the fly for each rule, where the rule's condition is concatenated into the WHERE clause of the SELECT query:
    CREATE STREAM ruleStream_1
        WITH (kafka_topic='xyz',
              value_format='json') AS
        SELECT 1 AS column1, ARRAY[1, 2, 3] AS column2, *
        FROM eventStream
        WHERE (column3=123 AND column4='testValue');

    CREATE STREAM ruleStream_2
        WITH (kafka_topic='xyz',
              value_format='json') AS
        SELECT 1 AS column1, ARRAY[1, 2, 3] AS column2, *
        FROM eventStream
        WHERE (column3=123 AND column4='testValue');
    ....
    ....
    CREATE STREAM ruleStream_n

We are thinking of deploying a ksqlDB server cluster in interactive mode and using the Java client to create and drop streams on the fly whenever the user changes the corresponding rule definitions.
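To make the "create/drop on the fly" part concrete, here is a minimal sketch of generating the per-rule statements that the ksqlDB Java client would then submit with `executeStatement`. The helper names (`buildRuleStream`, `dropRuleStream`) are hypothetical; the stream/topic names follow the question.

```java
// Sketch: building the per-rule DDL strings. In a real application these
// would be passed to io.confluent.ksql.api.client.Client#executeStatement.
public class RuleStreamSql {

    /** Builds the CREATE STREAM ... AS SELECT for one rule; the rule's
     *  condition string becomes the WHERE clause. */
    static String buildRuleStream(int ruleId, String topic, String condition) {
        return "CREATE STREAM ruleStream_" + ruleId + " "
             + "WITH (kafka_topic='" + topic + "', value_format='json') AS "
             + "SELECT * FROM eventStream WHERE (" + condition + ");";
    }

    /** Drops the stream (and its backing topic) when the rule is deleted. */
    static String dropRuleStream(int ruleId) {
        return "DROP STREAM IF EXISTS ruleStream_" + ruleId + " DELETE TOPIC;";
    }

    public static void main(String[] args) {
        // e.g. client.executeStatement(buildRuleStream(1, "xyz", ...)).get();
        System.out.println(buildRuleStream(1, "xyz",
                "column3=123 AND column4='testValue'"));
        System.out.println(dropRuleStream(1));
    }
}
```

Note that each such statement starts a persistent query on the server, which is exactly why the count of rules matters.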

We are planning to do performance benchmarking, but before that we wanted to validate: is it advisable to create 10K streams in ksqlDB, and can it cause performance issues?

Hard to say, but it does not sound efficient. Your load is actually not very high: a single stateless query (depending on your hardware) should be able to process on the order of 100K messages per second, and you only have 100K per day, i.e., roughly one per second.

However, the issue is that each query reads the input topic independently, i.e., if you deploy 10K queries, you read the input data 10,000 times.

It might be more efficient to fall back to Kafka Streams (if writing a Java application is an option for you), as it would allow you to read the data only once and evaluate all rules against each record in a single pass.
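The core of that alternative is a single consumer evaluating every rule per record, instead of one persistent query per rule. A minimal plain-Java sketch of that evaluation step (rule IDs and field names are hypothetical; in a real Kafka Streams application this loop would sit inside a `flatMapValues` or `process` step of the topology):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class SinglePassRules {

    /** A rule: an id plus a predicate over the event's fields. */
    static final class Rule {
        final String id;
        final Predicate<Map<String, Object>> condition;
        Rule(String id, Predicate<Map<String, Object>> condition) {
            this.id = id;
            this.condition = condition;
        }
    }

    /** Evaluates every rule against one event; with 10K rules in memory,
     *  the input topic is still consumed only once. */
    static List<String> matchingRules(List<Rule> rules, Map<String, Object> event) {
        List<String> hits = new ArrayList<>();
        for (Rule r : rules) {
            if (r.condition.test(event)) {
                hits.add(r.id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            new Rule("rule_1", e -> Integer.valueOf(123).equals(e.get("column3"))),
            new Rule("rule_2", e -> "testValue".equals(e.get("column4"))));
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("column3", 123);
        event.put("column4", "other");
        System.out.println(matchingRules(rules, event)); // prints [rule_1]
    }
}
```

Matching events (or rule hits) can then be forwarded to the downstream topic from the same topology, so the per-record cost grows with the number of rules while the topic-read cost stays constant.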