Are ksqlDB push queries distributed across cluster?

We have a ksqlDB cluster with 5 nodes and a stream created that reads from a topic:

CREATE OR REPLACE STREAM topic_stream 
WITH (
    KAFKA_TOPIC='kafka_topic',
    VALUE_FORMAT='AVRO'
);

We also have a push query that reads from this ksqlDB stream

SELECT * FROM topic_stream WHERE session_id = '${sessionId}' EMIT CHANGES;

When the push query is started does the work get distributed across all 5 servers?

When we run this query during high traffic we noticed only 1 server has max CPU and the query starts lagging.
How do we parallelize push queries across our cluster? I couldn’t find any documentation on this.

Thank you.

There’s some relevant documentation about this here.

How many partitions does kafka_topic have? This dictates the amount of parallelism you can achieve.

Also, what is the key? I’m not positive about this, but it may be the case that if session_id is the key and you filter like this that ksqlDB / Kafka Streams will be smart about it and won’t try to distribute execution

You might also get a sense of how the query will execute by running EXPLAIN against the query.

Hi thanks for the response.

Our topic has 99 partitions spread across 9 brokers in our kafka cluster.
session_id isn’t the key for the topic (i dont believe we’re using any keys when producing records to the topic)

we set ksql.query.pull.table.scan.enabled to true so we are able to filter on non-key’d fields

However we were able to work around this by creating a persistent query which reads from our kafka topic

CREATE OR REPLACE STREAM persistent_query_stream AS SELECT * FROM topic_stream;

then changed our push query to read from this new persistent query

SELECT * FROM persistent_query_stream WHERE session_id = '${sessionId}' EMIT CHANGES;

according to this confluent article:

To utilize this new scalable codepath, push queries v2 require that a push query select from an existing persistent query

this seemed to solve our lag issues and the CPU usage is more distributed across our ksqlDB cluster, instead of 1 server working on the push query

1 Like

This topic was automatically closed after 30 days. New replies are no longer allowed.