Debugging slow throughput kSQL query - group by

calvin-kargo · 24 August 2023 09:08

Hi, i have an existing super slow kSQL query running in production (with throughput ~4 message per second) which aggregate multiple data by a single key, using GROUP BY and REDUCE. The idea of this query is to aggregate multiple employees data, into a single list based on user_id.

The existing kSQL server is well provisioned (8 vCPU, 20 GB ram, 100 GB SSD, kSQL version 0.26.0). and currently see high utilization in CPU but low memory usage and Disk IO usage

What could i done/try to better optimize this query? is there such things as profiling/query plan analysis with kSQL?

kSQL query

CREATE TABLE user_employees_aggregate_data
  WITH (
      FORMAT = 'AVRO',
      KAFKA_TOPIC = 'employees-aggregate-data'
    )
  AS SELECT 
    user_ksuid AS ID,
    AS_VALUE(user_ksuid) AS `_id`, 
    FILTER(REDUCE(
      COLLECT_LIST(
        STRUCT(
          `company_ksuid` := company_ksuid,
          `email` := email,
          `ksuid` := ksuid,
          `name` := name,
          `status` := status,
          `access_types` := access_types
        )
      ),
      MAP('' := STRUCT(
        `company_ksuid` := '',
        `email` := '',
        `ksuid` := '',
        `name` := '',
        `status` := '',
        `access_types` := ARRAY['']
      )),
      (s, v) => MAP_UNION(s, MAP(v->`ksuid` := v))
    ), (k, v) => k != '') AS `employee_data`
  FROM employee
  GROUP BY user_ksuid
  EMIT CHANGES;

system · 23 September 2023 09:08

This topic was automatically closed after 30 days. New replies are no longer allowed.

Topic		Replies	Views
kSQL Data Streaming - grouping a stream's data on timestamp as string ksqlDB	2	4326	10 May 2021
Cache/Query latest value of each key in a topic? ksqlDB	2	3697	2 July 2021
Batch processing in kstreams Kafka Streams	9	5861	19 September 2021
Simple pull query takes too long time (2-3 sec) ksqlDB	1	3258	2 November 2022
KGroupedStream / Reduce: Dramatic Performance Degradation Over Time Stream Processing	10	33	4 September 2024

Debugging slow throughput kSQL query - group by

Related topics