Kafka Metric detailed info/ documentation

tushargoyal22 · 4 November 2022 07:03

There is a producer metric (request-latency-avg) that is not covered in the producer metrics documentation on Confluent. Can I also please get the details of this metric: from which point to which point exactly does it record the time spent?

kafka.network:type=RequestMetrics,name=TotalTimeMs,request={Produce|FetchConsumer|FetchFollower}
Total time in ms to serve the specified request

The explanation for this metric is also not clear. Can we have a more precise explanation of it, and can we have this metric at the topic level?
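
For readers unfamiliar with the notation: the braces in the MBean name above are documentation shorthand for three separate MBeans, one per request type. A minimal sketch of that expansion follows; the helper name `expand_mbean_pattern` is hypothetical and only illustrates the naming, it does not query a broker.

```python
import re


def expand_mbean_pattern(pattern: str) -> list[str]:
    """Expand one {A|B|C} alternation into one MBean name per option."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        # No alternation present: the pattern is already a single name.
        return [pattern]
    prefix = pattern[:match.start()]
    suffix = pattern[match.end():]
    return [prefix + option + suffix for option in match.group(1).split("|")]


# The pattern exactly as quoted from the documentation.
PATTERN = (
    "kafka.network:type=RequestMetrics,name=TotalTimeMs,"
    "request={Produce|FetchConsumer|FetchFollower}"
)

for name in expand_mbean_pattern(PATTERN):
    print(name)
```

Note that the only variable key in this name is `request`; the pattern as quoted carries no topic key.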

mmuehlbeyer · 4 November 2022 08:39

hey @tushargoyal22

afaik request-latency-avg is the time between a producer send and the producer receiving a response from the broker.

fwiw: there is a KIP in discussion regarding client metrics
https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability

best,
michael

tushargoyal22 · 4 November 2022 08:42

What do you mean by producer send here? Does it include the time the record resides in the producer buffer, or is it measured from the point the record is removed from the producer buffer?

It would be helpful if we could confirm this with confidence, maybe by referencing the Kafka code.

@mmuehlbeyer @Bruno @mjsax

tushargoyal22 · 4 November 2022 08:51

I mean, does this include the time between the application calling send() and the record being sent to the broker?

mmuehlbeyer · 4 November 2022 08:58

producer buffer is not the only relevant part here

ack settings also come into play
I assume it’s relevant that the data is written and “not only” sent to the broker?

best,
michael

tushargoyal22 · 4 November 2022 09:05

Yes, that I understood. The question is whether it includes the time the record resides in the producer buffer.

I want to ask this because it can help me conclude whether the broker is slow or the producer!

mmuehlbeyer · 4 November 2022 11:45

from my understanding yes
never tested in detail, but if you play around with buffer.memory and keep everything else fixed, I guess you should be able to check/measure this.

best,
michael

mmuehlbeyer · 4 November 2022 11:47

also would recommend the following hands-on

as well as

best,
michael

tushargoyal22 · 4 November 2022 13:05

The figures in the tutorial seem to be completely misleading.

How can request-latency-avg be less than record-queue-time-avg?

Where can we get the best understanding of what it means?

mmuehlbeyer · 4 November 2022 14:02

record-queue-time-avg:
This is the average amount of time your record batches spend in the send buffer prior to being flushed.
→ how long to fill a batch

request-latency-avg:
double checked once again and it’s a measure of the amount of time from when KafkaProducer.send() was called until the producer receives a response from the broker.
→ so “after” the buffering

tushargoyal22 · 4 November 2022 14:45

So we can say request-latency-avg is the time from the point the batch got flushed to the time the producer receives the response?

And record-queue-time-avg is the time from the point KafkaProducer.send() was called to the time the batch got flushed?

Am I correct?

mmuehlbeyer · 4 November 2022 15:27

from my understanding it’s the time taken from the client’s NIC to the broker’s NIC,
plus the time the brokers need to handle the message (replication and so on)

danicafine · 7 November 2022 20:57

Hey there, I think we’ve pretty much nailed it down here, but I wanted to make sure we have this clarified once and for all.

request-latency-avg is a little misleading as it’s not clear exactly what request is referring to. It does not correspond to each call to producer.send() as might be expected. Rather, as you’ve reasoned through here, it refers to the internal request that the producer makes to send records to the broker. record-queue-time-avg is the time the records sit in the producer queue/buffer between the call to producer.send() and the aforementioned request to the broker.

To summarize: request-latency-avg and record-queue-time-avg refer to two completely different time periods in the record send lifecycle; there’s no overlap between the two.

When producer.send() is called, records will first sit in the producer buffer/queue until batch.size or linger.ms is reached or we call producer.flush(). This contributes to record-queue-time-avg. (And this is also why the record-queue-time-avg is so close to linger.ms in the example you printed, @tushargoyal22.)
After this, the producer makes a request to the broker, at which point we start the clock for request-latency-avg.
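
The two non-overlapping periods described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Kafka code: the class `BatchTimeline`, its field names, and the timestamps are all hypothetical, and each batch is reduced to three points in time.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class BatchTimeline:
    """Hypothetical timestamps (ms) for one batch in the send lifecycle."""
    send_called_ms: float        # application calls producer.send()
    request_sent_ms: float       # batch leaves the buffer, request goes to the broker
    response_received_ms: float  # producer receives the broker's response

    def queue_time_ms(self) -> float:
        # Period that contributes to record-queue-time-avg.
        return self.request_sent_ms - self.send_called_ms

    def request_latency_ms(self) -> float:
        # Period that contributes to request-latency-avg.
        return self.response_received_ms - self.request_sent_ms


# Made-up numbers: each batch waits about 100 ms in the buffer
# (as if linger.ms were 100), then the broker answers within a few ms.
timelines = [
    BatchTimeline(send_called_ms=0.0, request_sent_ms=100.0, response_received_ms=108.0),
    BatchTimeline(send_called_ms=200.0, request_sent_ms=300.0, response_received_ms=315.0),
]

avg_queue = mean(t.queue_time_ms() for t in timelines)
avg_latency = mean(t.request_latency_ms() for t in timelines)

print(f"record-queue-time-avg (model): {avg_queue} ms")
print(f"request-latency-avg (model):   {avg_latency} ms")
```

Because the two periods are back to back rather than nested, nothing prevents the request latency from being much smaller than the queue time, which is the pattern questioned earlier in the thread.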

system · 14 November 2022 20:57

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
