Hi, i have an existing super slow kSQL query running in production (with throughput ~4 message per second) which aggregate multiple data by a single key, using
GROUP BY and
REDUCE. The idea of this query is to aggregate multiple
employees data, into a single list based on
The existing kSQL server is well provisioned (8 vCPU, 20 GB ram, 100 GB SSD, kSQL version 0.26.0). and currently see high utilization in CPU but low memory usage and Disk IO usage
What could i done/try to better optimize this query? is there such things as profiling/query plan analysis with kSQL?
CREATE TABLE user_employees_aggregate_data WITH ( FORMAT = 'AVRO', KAFKA_TOPIC = 'employees-aggregate-data' ) AS SELECT user_ksuid AS ID, AS_VALUE(user_ksuid) AS `_id`, FILTER(REDUCE( COLLECT_LIST( STRUCT( `company_ksuid` := company_ksuid, `email` := email, `ksuid` := ksuid, `name` := name, `status` := status, `access_types` := access_types ) ), MAP('' := STRUCT( `company_ksuid` := '', `email` := '', `ksuid` := '', `name` := '', `status` := '', `access_types` := ARRAY[''] )), (s, v) => MAP_UNION(s, MAP(v->`ksuid` := v)) ), (k, v) => k != '') AS `employee_data` FROM employee GROUP BY user_ksuid EMIT CHANGES;