Hi Team,
I use the Kafka Java client library to produce and consume. I see occasional, seemingly random spikes in record queue time even though request-latency stays low, in the millisecond range. This happens with or without compression enabled.
Producer & Topic settings:
- batch.size=262144
- linger.ms=200
- acks=all
- min.insync.replicas=2
- replication.factor=3
- partitions=8
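For reference, the producer-side settings in the list above could be expressed roughly as follows. This is only a sketch: the bootstrap address and the serializer classes are placeholders assumed for illustration, not values from my setup, and the class name is hypothetical. min.insync.replicas, replication.factor and partitions are topic/broker-side settings, so they do not appear in the producer properties.

```java
import java.util.Properties;

public class ProducerSettingsSketch {

    /** Builds the producer-side properties from the list above. */
    public static Properties producerProps() {
        Properties props = new Properties();

        // Assumed placeholders, not taken from the post.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // Settings quoted in the post.
        props.put("batch.size", "262144"); // 256 KiB per partition batch
        props.put("linger.ms", "200");     // wait up to 200 ms to fill a batch
        props.put("acks", "all");          // wait for all in-sync replicas

        return props;
    }

    public static void main(String[] args) {
        // Print the effective settings; in a real application these would be
        // passed to the KafkaProducer constructor instead.
        producerProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```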
Setup:
A single producer writes to multiple topics, and each record is assigned to a partition based on a unique ID.
Most of the time the application metrics look smooth, but I get spikes in record-queue-time-max in the tens of seconds. I have already looked at URPs, flush rates, and other client-side metrics.
Could you suggest which other metrics I could look at to make educated guesses about tuning these settings?
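To make the question concrete, here is a minimal sketch of how producer metrics could be sampled in-process over JMX and logged next to record-queue-time-max, so a spike can be lined up against the other values. It assumes the producer runs in the same JVM with the default JMX reporter enabled, which registers MBeans under kafka.producer:type=producer-metrics. The class name and the choice of attributes are mine; the list is a starting point, not a claim about where the problem is.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.AttributeNotFoundException;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ProducerMetricsDump {

    // Producer metrics that seem worth comparing against a queue-time spike.
    public static final String[] ATTRS = {
        "record-queue-time-avg", "record-queue-time-max",
        "request-latency-avg", "request-latency-max",
        "batch-size-avg", "records-per-request-avg", "requests-in-flight",
        "buffer-available-bytes", "bufferpool-wait-ratio", "waiting-threads",
        "record-retry-rate", "produce-throttle-time-max"
    };

    /** Returns "clientId/metric" -> value for every producer in this JVM. */
    public static Map<String, Object> snapshot() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Map<String, Object> out = new TreeMap<>();
        try {
            // Wildcard matches one MBean per producer client-id.
            ObjectName pattern = new ObjectName(
                    "kafka.producer:type=producer-metrics,client-id=*");
            for (ObjectName name : server.queryNames(pattern, null)) {
                String clientId = name.getKeyProperty("client-id");
                for (String attr : ATTRS) {
                    try {
                        out.put(clientId + "/" + attr,
                                server.getAttribute(name, attr));
                    } catch (AttributeNotFoundException e) {
                        // Metric not exposed by this client version; skip it.
                    }
                }
            }
        } catch (JMException e) {
            throw new IllegalStateException("JMX lookup failed", e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Without a running producer in this JVM the snapshot is empty.
        snapshot().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Calling snapshot() on a timer (every few seconds, say) and logging the result would show whether a queue-time spike coincides with, for example, a drop in buffer-available-bytes or a rise in waiting-threads.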
EDIT: I have now also tried max.block.ms=2000, just to rule out blocking on metadata fetches.