Kafka metric detailed info / documentation

There is a producer metric (request-latency-avg) that is not covered in the producer metrics documentation on Confluent. Also, can I please get the details of this metric: from which point to which point exactly does it measure time?

kafka.network:type=RequestMetrics,name=TotalTimeMs,request={Produce|FetchConsumer|FetchFollower}
Total time in ms to serve the specified request

The explanation for this metric is also not clear. Can we have a more precise explanation of it? I also wanted to ask whether we can have this metric at the topic level.

hey @tushargoyal22

afaik request-latency-avg is the time between a producer send and the producer receiving a response from the broker.

fwiw: there is a KIP in discussion regarding client metrics
https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability

best,
michael

What do you mean by producer send here? Does it include the time the record resides in the producer buffer, or is it measured from the point the record is removed from the producer buffer?

It would be helpful if we could conclude this confidently, maybe by referencing the Kafka code.

@mmuehlbeyer @Bruno @mjsax

I mean, does this include the time between the application calling send() and the record being sent to the broker?

producer buffer is not the only relevant part here

ack settings for the topic/broker also come into play
I assume it’s relevant that the data is written and “not only” sent to the broker?

best,
michael

Yes, that I understood. The question is whether it includes the time the record resides in the producer buffer.

I want to ask this because it can help me conclude whether the broker is slow or producer!

from my understanding yes
never tested in detail, but if you play around with buffer.memory and keep everything else the same, I guess you should be able to check/measure this.

best,
michael

also would recommend the following hands-on

as well as

best,
michael

The figures in the tutorial seem to be completely misleading

How can the request-latency-avg be less than record-queue-time-avg?

From where can we get the best understanding of what it means?

record-queue-time-avg:
This is the average amount of time your record batches spend in the send buffer prior to being flushed
→ how long to fill a batch

request-latency-avg:
double checked once again and it’s a measure of the amount of time between when KafkaProducer.send() was called and when the producer receives a response from the broker.
→ so “after” the buffering

So can we say request-latency-avg is the time from the point the batch got flushed to the time the producer receives the response?

And record-queue-time-avg is the time from the point KafkaProducer.send() was called to the time the batch got flushed?

Am I correct?

from my understanding it’s the time taken from the client’s NIC to the broker’s NIC,
plus the time the broker needs to handle the message (replication and so on)

Hey there, I think we’ve pretty much nailed it down here, but I wanted to make sure we have this clarified once and for all.

request-latency-avg is a little misleading as it’s not clear exactly what request is referring to. It does not correspond to each call to producer.send() as might be expected. Rather, as you’ve reasoned through here, it refers to the internal request that the producer makes to send records to the broker. record-queue-time-avg is the time the records sit in the producer queue/buffer between the call to producer.send() and the aforementioned request to the broker.

To summarize: request-latency-avg and record-queue-time-avg refer to two completely different time periods in the record send lifecycle; there’s no overlap between the two.

  1. When producer.send() is called, records will first sit in the producer buffer/queue until batch.size or linger.ms is reached or we call producer.flush(). This contributes to record-queue-time-avg. (And this is also why the record-queue-time-avg is so close to linger.ms in the example you printed, @tushargoyal22.)

  2. After this, the producer makes a request to the broker, at which point we start the clock for request-latency-avg.
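
For what it's worth, the two steps above can be sketched as a toy timeline. This is only an illustration, not the Kafka client code: `BatchTimeline`, `averages`, and the timestamps are all made up for the example, and the model simplifies things by treating one send() call as one batch. It shows why request-latency-avg can be much smaller than record-queue-time-avg: the two intervals sit back to back and do not overlap.

```python
from dataclasses import dataclass


@dataclass
class BatchTimeline:
    """Hypothetical timestamps (ms) for one batch in a toy model of the send path."""
    send_called_ms: float        # application calls producer.send()
    request_sent_ms: float       # batch leaves the buffer, request goes to the broker
    response_received_ms: float  # producer receives the broker's response

    def record_queue_time_ms(self) -> float:
        # Step 1: time spent waiting in the producer buffer/queue.
        return self.request_sent_ms - self.send_called_ms

    def request_latency_ms(self) -> float:
        # Step 2: time from sending the request to receiving the response.
        return self.response_received_ms - self.request_sent_ms


def averages(timelines):
    """Return (avg queue time, avg request latency) over the given batches."""
    n = len(timelines)
    queue_avg = sum(t.record_queue_time_ms() for t in timelines) / n
    latency_avg = sum(t.request_latency_ms() for t in timelines) / n
    return queue_avg, latency_avg


# Made-up numbers: batches wait about 100 ms in the buffer (think linger.ms=100),
# then the broker round trip takes a few milliseconds.
timelines = [
    BatchTimeline(send_called_ms=0.0, request_sent_ms=100.0, response_received_ms=104.0),
    BatchTimeline(send_called_ms=50.0, request_sent_ms=150.0, response_received_ms=156.0),
]

queue_avg, latency_avg = averages(timelines)
print(f"record-queue-time-avg ~ {queue_avg} ms, request-latency-avg ~ {latency_avg} ms")
```

In this model, a high queue time with a low request latency points at producer-side batching (linger.ms, batch.size), while a high request latency points at the network or the broker.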
