Hi All,
We are facing an issue with the Kafka consumer poll behavior in our application. We have configured max.poll.records to 3000 and implemented logic such that if a poll returns at least 3000 records, we process the records and commit the offsets; otherwise, we ignore that poll.
This setup has been working fine, but recently, we observed an unusual behavior. For almost 48 hours, the consumer consistently polled the same number of records (1500) in every poll, which is less than the configured threshold of 3000. As a result, we were unable to process the records, leading to a significant business impact. After this period, the issue resolved itself, and the consumer started polling with the expected record count (>=3000).
This behavior has been occurring intermittently for the past month, causing disruptions to our application’s processing.
Below are our consumer configurations; we are using the defaults for everything other than the following.
auto.offset.reset = earliest
enable.auto.commit = false
max.poll.interval.ms = 360000
max.poll.records = 3000
max.partition.fetch.bytes = 2000000
heartbeat.interval.ms = 12000
session.timeout.ms = 120000
Each record is roughly 160 bytes, so 3000 * 160 = 480,000 bytes (~0.48 MB).
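For reference, this is roughly how we build the consumer; the bootstrap servers, group id, and deserializers below are placeholders rather than our actual values:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerSetup {
    public static KafkaConsumer<String, String> buildConsumer() {
        Properties props = new Properties();

        // Placeholder connection details
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // Settings listed above; everything else is left at its default
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "360000");
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "3000");
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "2000000");
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "12000");
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "120000");

        return new KafkaConsumer<>(props);
    }
}
```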
Has anyone else faced a similar issue or have insights into what might be causing this behavior? Are there any fixes or configurations we should check to prevent this from happening again?
Any suggestions or guidance would be greatly appreciated.
You can look at increasing fetch.min.bytes and fetch.max.wait.ms to influence the behavior. Influence is the key word though: AFAIK there is no way to guarantee preventing what you are seeing, given that multiple parallel fetches might happen under the hood, so successive polls might greedily return fewer records than you want. The crux of the issue is that max.poll.records is strictly an upper bound when you are looking for a lower bound. I will poke around to see if there is another way, but currently I think you will need to either implement the lower-bound threshold client-side or reconsider whether you need a hard lower bound on the record count.
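To make that concrete, here is a minimal sketch of what such tuning could look like, using your ~160 bytes/record estimate; the exact values are purely illustrative:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class FetchTuning {
    // Illustrative values only: ask the broker to hold a fetch response until it has
    // roughly 3000 records' worth of data (3000 * 160 bytes ~= 480 KB) or 1 second has
    // passed, whichever comes first. This biases polls toward fuller batches but does
    // not guarantee them.
    public static void applyFetchTuning(Properties props) {
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "480000");
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "1000");
    }
}
```

Keep in mind that fetch.min.bytes applies per fetch request to each broker, so even with this in place a single poll can still return fewer than 3000 records.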
Thank you for your insights on this. We have attempted to increase the fetch.min.bytes and fetch.max.wait.ms settings; however, these changes did not have any impact on the issue. Unfortunately, we are still experiencing the problem intermittently, at least twice a week. Could you kindly share any further suggestions?
That’s not entirely unexpected. There isn’t a way to guarantee this (there is no lower-bound analogue of max.poll.records). A couple of options that you can consider:
- Buffer in your application: call poll multiple times if needed until you have accumulated >= 3000 records (see the sketch at the end of this reply)
- Take a look at Kafka Streams. You can use it to buffer a number of records and optionally apply an upper time bound so a batch is let through even if you haven’t hit 3000. This Stack Overflow post has some ideas and sample code.
I would probably lean toward option 1, though if Kafka Streams could replace the application entirely then option 2 is enticing. IOW, IMO it comes down to how good a fit Kafka Streams is for your application overall.
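To make option 1 concrete, here is a rough sketch of client-side buffering; the topic name, batch-size constant, time bound, and processBatch method are placeholders you would replace with your own logic and error handling:

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BufferingLoop {

    private static final int MIN_BATCH_SIZE = 3000;   // your lower-bound threshold
    private static final long MAX_WAIT_MS = 60_000;   // optional time bound so a partial batch is eventually flushed

    // Sketch only: accumulate records across polls until there are at least
    // MIN_BATCH_SIZE of them (or MAX_WAIT_MS has passed), then process and commit.
    public static void run(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(Collections.singletonList("example-topic")); // placeholder topic

        List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
        long batchStart = System.currentTimeMillis();

        while (true) {
            ConsumerRecords<String, String> polled = consumer.poll(Duration.ofMillis(500));
            polled.forEach(buffer::add);

            boolean enoughRecords = buffer.size() >= MIN_BATCH_SIZE;
            boolean waitedLongEnough = !buffer.isEmpty()
                    && System.currentTimeMillis() - batchStart >= MAX_WAIT_MS;

            if (enoughRecords || waitedLongEnough) {
                processBatch(buffer);     // placeholder for your processing logic
                consumer.commitSync();    // commits offsets for everything polled so far
                buffer.clear();
                batchStart = System.currentTimeMillis();
            }
        }
    }

    private static void processBatch(List<ConsumerRecord<String, String>> batch) {
        // placeholder: your existing batch-processing code goes here
    }
}
```

One thing to keep in mind: since enable.auto.commit is false and offsets are committed only after the buffered batch is processed, a restart before the commit means those records will be re-delivered, so the processing should tolerate reprocessing. Because the loop keeps calling poll while buffering, it also stays within max.poll.interval.ms.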