Spikes in record-queue-time-max on Kafka producer

Hi Team,

I use the Kafka Java client library to produce and consume. I see intermittent spikes in record queue time even though request latency stays low, in the millisecond range. This happens with or without compression enabled.

Producer & Topic settings:

  • batch.size=262144
  • linger.ms=200
  • acks=all
  • min.insync.replicas=2
  • replication.factor=3
  • partitions=8
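
For reference, the producer-side part of these settings can be sketched in Java as below. Only batch.size, linger.ms, and acks are producer configs; min.insync.replicas, replication.factor, and the partition count are set on the topic or broker, so they are left out. The class name ProducerSettings is hypothetical, and bootstrap servers and serializers are omitted.

```java
import java.util.Properties;

public class ProducerSettings {
    // Builds only the producer-level settings listed above.
    // Topic-level settings (min.insync.replicas, replication.factor, partitions)
    // are not producer configs and are intentionally not included.
    public static Properties build() {
        Properties props = new Properties();
        props.setProperty("batch.size", "262144"); // 256 KiB per-partition batch
        props.setProperty("linger.ms", "200");     // wait up to 200 ms to fill a batch
        props.setProperty("acks", "all");          // wait for all in-sync replicas
        return props;
    }

    public static void main(String[] args) {
        // Print the settings so they can be compared with the running producer's config.
        build().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```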

A single producer writes to multiple topics, and each record is assigned to a partition based on a unique ID.

Most of the time the application metrics show everything running smoothly, but I get spikes in record-queue-time-max in the tens of seconds. I have looked at under-replicated partitions (URPs), flush rates, and other client-side metrics.
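
To make the symptom concrete, here is a minimal sketch of the condition described above: record-queue-time-max above some threshold while request-latency-max stays low. It assumes the metric values have already been copied into a plain map keyed by metric name (for example from the producer's metrics() map). The class name, method name, and thresholds are hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class QueueSpikeCheck {
    // Returns true when queue time exceeds queueThresholdMs while request
    // latency is below latencyThresholdMs. A missing metric reads as NaN,
    // and any comparison with NaN is false, so the result is then false.
    public static boolean isQueueSpike(Map<String, Double> snapshot,
                                       double queueThresholdMs,
                                       double latencyThresholdMs) {
        double queueMax = snapshot.getOrDefault("record-queue-time-max", Double.NaN);
        double latencyMax = snapshot.getOrDefault("request-latency-max", Double.NaN);
        return queueMax > queueThresholdMs && latencyMax < latencyThresholdMs;
    }

    public static void main(String[] args) {
        // Hypothetical sample values, not measurements from my application.
        Map<String, Double> snapshot = new HashMap<>();
        snapshot.put("record-queue-time-max", 12000.0);
        snapshot.put("request-latency-max", 8.0);
        System.out.println(isQueueSpike(snapshot, 1000.0, 100.0));
    }
}
```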

Could you suggest which other metrics I could look at to make educated guesses about tuning these settings?

EDIT: I have now also tried max.block.ms=2000, just to rule out blocking on metadata fetches.