Debugging slow throughput kSQL query - group by

Hi, i have an existing super slow kSQL query running in production (with throughput ~4 message per second) which aggregate multiple data by a single key, using GROUP BY and REDUCE. The idea of this query is to aggregate multiple employees data, into a single list based on user_id.

The existing kSQL server is well provisioned (8 vCPU, 20 GB ram, 100 GB SSD, kSQL version 0.26.0). and currently see high utilization in CPU but low memory usage and Disk IO usage

What could i done/try to better optimize this query? is there such things as profiling/query plan analysis with kSQL?

kSQL query

CREATE TABLE user_employees_aggregate_data
  WITH (
      FORMAT = 'AVRO',
      KAFKA_TOPIC = 'employees-aggregate-data'
    )
  AS SELECT 
    user_ksuid AS ID,
    AS_VALUE(user_ksuid) AS `_id`, 
    FILTER(REDUCE(
      COLLECT_LIST(
        STRUCT(
          `company_ksuid` := company_ksuid,
          `email` := email,
          `ksuid` := ksuid,
          `name` := name,
          `status` := status,
          `access_types` := access_types
        )
      ),
      MAP('' := STRUCT(
        `company_ksuid` := '',
        `email` := '',
        `ksuid` := '',
        `name` := '',
        `status` := '',
        `access_types` := ARRAY['']
      )),
      (s, v) => MAP_UNION(s, MAP(v->`ksuid` := v))
    ), (k, v) => k != '') AS `employee_data`
  FROM employee
  GROUP BY user_ksuid
  EMIT CHANGES;

This topic was automatically closed after 30 days. New replies are no longer allowed.