There is a producer metric, request-latency-avg, that is not covered in Confluent's producer metrics documentation. Could I also get the details of this metric: exactly which point to which point does the recorded time span?
kafka.network:type=RequestMetrics,name=TotalTimeMs,request={Produce|FetchConsumer|FetchFollower}
Total time in ms to serve the specified request
The explanation for this metric is also not clear. Can we have a more precise explanation of it, and can we have this metric at the topic level?
What do you mean by producer send here? Does it include the time the record resides in the producer buffer, or does it start from the point the record is removed from the producer buffer?
It would be helpful if we could confirm this conclusively, perhaps by referencing the Kafka code.
From my understanding, yes.
I have never tested this in detail, but if you play around with buffer.memory and keep everything else fixed, I guess you should be able to check/measure this.
record-queue-time-avg:
This is the average amount of time your record batches spend in the send buffer prior to being flushed.
→ how long it takes to fill a batch
request-latency-avg:
I double-checked once again, and it’s a measure of the time from when KafkaProducer.send() was called until the producer receives a response from the broker.
→ so “after” the buffering
From my understanding, it’s the time taken from the client’s NIC to the broker’s NIC,
plus the time the brokers need to handle the message (replication and so on).
Hey there, I think we’ve pretty much nailed it down here, but I wanted to make sure we have this clarified once and for all.
request-latency-avg is a little misleading as it’s not clear exactly what request is referring to. It does not correspond to each call to producer.send() as might be expected. Rather, as you’ve reasoned through here, it refers to the internal request that the producer makes to send records to the broker. record-queue-time-avg is the time the records sit in the producer queue/buffer between the call to producer.send() and the aforementioned request to the broker.
To summarize: request-latency-avg and record-queue-time-avg refer to two completely different time periods in the record send lifecycle; there’s no overlap between the two.
When producer.send() is called, records will first sit in the producer buffer/queue until batch.size or linger.ms is reached or we call producer.flush(). This contributes to record-queue-time-avg. (And this is also why the record-queue-time-avg is so close to linger.ms in the example you printed, @tushargoyal22.)
After this, the producer makes a request to the broker, at which point we start the clock for request-latency-avg.
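
To make the split concrete, here is a small arithmetic sketch of the two intervals. It is not Kafka code and does not use any Kafka API: the BatchTimeline class, its field names, and the timestamps are all made up for illustration, and it assumes one batch per request and linger.ms=10 with light traffic, so each batch waits the full linger time before it is sent.

```python
from dataclasses import dataclass


@dataclass
class BatchTimeline:
    """Hypothetical timestamps (in ms) for one record batch; not a Kafka class."""
    appended_ms: float      # producer.send() put the first record into the batch
    request_sent_ms: float  # the producer issued the request to the broker
    response_ms: float      # the producer received the broker's response

    def queue_time_ms(self) -> float:
        # Interval that feeds record-queue-time-avg: time spent in the buffer.
        return self.request_sent_ms - self.appended_ms

    def request_latency_ms(self) -> float:
        # Interval that feeds request-latency-avg: request sent until response.
        return self.response_ms - self.request_sent_ms


def averages(batches):
    """Return (average queue time, average request latency) over all batches."""
    n = len(batches)
    queue_avg = sum(b.queue_time_ms() for b in batches) / n
    latency_avg = sum(b.request_latency_ms() for b in batches) / n
    return queue_avg, latency_avg


# Made-up numbers: every batch waits 10 ms in the buffer (assumed linger.ms=10),
# then the broker round trip takes 4, 6, and 5 ms respectively.
batches = [
    BatchTimeline(appended_ms=0.0, request_sent_ms=10.0, response_ms=14.0),
    BatchTimeline(appended_ms=20.0, request_sent_ms=30.0, response_ms=36.0),
    BatchTimeline(appended_ms=40.0, request_sent_ms=50.0, response_ms=55.0),
]

queue_avg, latency_avg = averages(batches)
print(f"queue time avg: {queue_avg} ms, request latency avg: {latency_avg} ms")
```

In this toy model the two intervals share only the instant the request is sent, so for each batch they add up to the full time from the first send() to the broker's response. A real producer can put several batches into one request, so treat this only as a picture of where each clock starts and stops.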