Hi, i have an existing super slow kSQL query running in production (with throughput ~4 message per second) which aggregate multiple data by a single key, using GROUP BY
and REDUCE
. The idea of this query is to aggregate multiple employees
data, into a single list based on user_id
.
The existing kSQL server is well provisioned (8 vCPU, 20 GB ram, 100 GB SSD, kSQL version 0.26.0). and currently see high utilization in CPU but low memory usage and Disk IO usage
What could i done/try to better optimize this query? is there such things as profiling/query plan analysis with kSQL?
kSQL query
CREATE TABLE user_employees_aggregate_data
WITH (
FORMAT = 'AVRO',
KAFKA_TOPIC = 'employees-aggregate-data'
)
AS SELECT
user_ksuid AS ID,
AS_VALUE(user_ksuid) AS `_id`,
FILTER(REDUCE(
COLLECT_LIST(
STRUCT(
`company_ksuid` := company_ksuid,
`email` := email,
`ksuid` := ksuid,
`name` := name,
`status` := status,
`access_types` := access_types
)
),
MAP('' := STRUCT(
`company_ksuid` := '',
`email` := '',
`ksuid` := '',
`name` := '',
`status` := '',
`access_types` := ARRAY['']
)),
(s, v) => MAP_UNION(s, MAP(v->`ksuid` := v))
), (k, v) => k != '') AS `employee_data`
FROM employee
GROUP BY user_ksuid
EMIT CHANGES;