Issue with consumer batch poll

Hi All,
We are facing an issue with the Kafka consumer poll behavior in our application. We have configured max.poll.records to 3000 and implemented logic such that if the number of polled records is greater than or equal to 3000, we process the records and commit the offsets. Otherwise, we ignore the poll.
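
For reference, the consume loop is roughly the following (simplified; "consumer" is an already-constructed KafkaConsumer<String, String>, process() is our application logic, and the 1-second poll timeout is illustrative):

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
    if (records.count() >= 3000) {
        process(records);       // application-specific handling of the batch
        consumer.commitSync();  // enable.auto.commit=false, so we commit manually
    }
    // if fewer than 3000 records came back, we skip this batch and do not commit
}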

This setup has been working fine, but recently, we observed an unusual behavior. For almost 48 hours, the consumer consistently polled the same number of records (1500) in every poll, which is less than the configured threshold of 3000. As a result, we were unable to process the records, leading to a significant business impact. After this period, the issue resolved itself, and the consumer started polling with the expected record count (>=3000).

This behavior has been occurring intermittently for the past month, causing disruptions to our application’s processing.

Following are the consumer configurations; we are using default values for everything other than the settings below.

auto.offset.reset = earliest
enable.auto.commit = false
max.poll.interval.ms = 360000
max.poll.records = 3000
max.partition.fetch.bytes = 2000000
heartbeat.interval.ms = 12000
session.timeout.ms = 120000

Each record is roughly 160 bytes, so 3000 * 160 = 480,000 bytes (~0.48 MB) per full batch.
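
In code, this corresponds roughly to the following consumer properties (a sketch; bootstrap servers, group id, and deserializers are omitted here, and the constant names come from org.apache.kafka.clients.consumer.ConsumerConfig):

Properties props = new Properties();
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "360000");
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "3000");
props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "2000000");
props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "12000");
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "120000");
// everything else is left at its default value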

Has anyone else faced a similar issue or have insights into what might be causing this behavior? Are there any fixes or configurations we should check to prevent this from happening again?

Any suggestions or guidance would be greatly appreciated.

You can look at increasing fetch.min.bytes and fetch.max.wait.ms to influence the behavior. Influence is the key word, though: AFAIK there is no way to guarantee a minimum batch size, given that multiple parallel fetches might happen under the hood, so successive polls might greedily return fewer records than you want. The crux of the issue is that max.poll.records is strictly an upper bound, while you are looking for a lower bound. I will poke around to see if there is another way, but for now I think you will need to either implement the lower-bound threshold client-side (accumulating records across polls until you have enough) or reconsider whether you need a hard lower bound on the number of records per batch.
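
For what it's worth, here is a rough sketch of what the client-side lower bound could look like, combined with the fetch tuning mentioned above. The bootstrap servers, group id, topic name, poll timeout, and fetch values are illustrative placeholders, not recommendations:

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class BatchingConsumer {
    private static final int MIN_BATCH = 3000; // the lower bound you want to enforce

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "3000");
        // Ask the broker to wait for more data before answering a fetch.
        // This nudges polls toward larger batches but does not guarantee them.
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "480000");  // ~3000 * 160 bytes
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "5000");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("my-topic")); // placeholder topic
            List<ConsumerRecord<String, String>> buffer = new ArrayList<>();

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                records.forEach(buffer::add);

                // Enforce the lower bound on the client side: keep polling (which also
                // keeps the consumer alive with respect to max.poll.interval.ms) until
                // enough records have accumulated, then process and commit in one go.
                if (buffer.size() >= MIN_BATCH) {
                    processBatch(buffer);  // application-specific processing
                    consumer.commitSync(); // commits offsets for everything polled so far
                    buffer.clear();
                }
            }
        }
    }

    private static void processBatch(List<ConsumerRecord<String, String>> batch) {
        // placeholder for the real processing logic
    }
}

One caveat with this approach: records sitting in the buffer are not yet committed, so a rebalance or restart will re-deliver them, and processing each full batch still has to finish within max.poll.interval.ms.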