We have a use case where a large number of events stream onto a Kafka topic, roughly 100k messages per day. On the consumer side we need to apply a predefined set of rules to the incoming event stream and pass an event downstream only when rule evaluation succeeds. The number of rules could be around 10k in the average case. We want to avoid a database lookup per event, as that can be expensive, so we are exploring ksqlDB since it fits our use case. Each rule maps directly to the WHERE clause of an ANSI SQL query, so we are planning to model it as follows.
- First, we will create a standard ksqlDB stream for incoming events named `eventStream` (a sketch of this base stream is shown after the rule-stream examples below).
- Second, we will create streams on the fly for each rule, where the rule's condition becomes the WHERE clause of the SELECT query:
```sql
CREATE STREAM ruleStream_1
  WITH (kafka_topic='xyz', value_format='json') AS
  SELECT 1 AS column1, ARRAY[1, 2, 3] AS column2, *
  FROM eventStream
  WHERE (column3 = 123 AND column4 = 'testValue');

CREATE STREAM ruleStream_2
  WITH (kafka_topic='xyz', value_format='json') AS
  SELECT 1 AS column1, ARRAY[1, 2, 3] AS column2, *
  FROM eventStream
  WHERE (column3 = 123 AND column4 = 'testValue');

....
....

CREATE STREAM ruleStream_n ...;
```
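For reference, the base stream from the first step would look roughly like the sketch below. The source topic name and the schema are assumptions; only `column3` and `column4` are taken from the rule conditions above.

```sql
-- Minimal sketch of the base stream; 'incoming_events' and the column types
-- are placeholders, not part of our actual schema.
CREATE STREAM eventStream (
  column3 INT,
  column4 VARCHAR
) WITH (
  kafka_topic='incoming_events',
  value_format='json'
);
```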
We are thinking of deploying a ksqlDB server cluster in interactive mode and using the Java client to create and drop streams on the fly whenever a user changes the corresponding rule definition.
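A minimal sketch of that management piece with the ksqlDB Java client might look like the following. The host, port, class name, and the way a rule is turned into a WHERE clause are assumptions for illustration, not our actual implementation.

```java
import io.confluent.ksql.api.client.Client;
import io.confluent.ksql.api.client.ClientOptions;

public class RuleStreamManager {

  // Host and port are placeholders for our interactive ksqlDB cluster.
  private final Client client = Client.create(
      ClientOptions.create().setHost("ksqldb-server").setPort(8088));

  // Create one persistent query per rule; whereClause comes from the rule definition.
  public void createRuleStream(int ruleId, String whereClause) {
    String sql = String.format(
        "CREATE STREAM ruleStream_%d WITH (kafka_topic='xyz', value_format='json') AS "
            + "SELECT 1 AS column1, ARRAY[1, 2, 3] AS column2, * "
            + "FROM eventStream WHERE %s;",
        ruleId, whereClause);
    client.executeStatement(sql).join(); // blocks until the server accepts the statement
  }

  // Remove the rule stream when the rule is deleted or changed. Depending on the
  // ksqlDB version, the persistent query writing to it may need to be terminated first.
  public void dropRuleStream(int ruleId) {
    client.executeStatement("DROP STREAM IF EXISTS ruleStream_" + ruleId + ";").join();
  }
}
```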
We are planning to do performance benchmarking on this, but before that we wanted to validate: is it advisable to create 10k streams in ksqlDB, and can it cause performance issues?