Kafka streams application getting stuck after broker restarts

Hello All,

We have a simple Kafka Streams application that reads events from an input topic and processes them to an output topic. The application logic is pretty straightforward, but the application gets stuck when we restart the brokers for maintenance.
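Simplified, the topology is essentially a read-process-write pipeline like the sketch below (the class name, output topic name, serdes, and bootstrap server are placeholders, and the real per-record processing is omitted):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class SampleIngester {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "us-sample-ingester");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("sample_txn_input", Consumed.with(Serdes.String(), Serdes.String()))
               .mapValues(value -> value) // stand-in for the real processing logic
               .to("sample_txn_output", Produced.with(Serdes.String(), Serdes.String())); // placeholder output topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}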

Here is the stack trace.

2021-05-11T23:01:37,990 [us-sample-ingester-a139e474-7c0f-4f70-8167-d303d5297cbb-StreamThread-12] ERROR org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [us-sample-ingester-a139e474-7c0f-4f70-8167-d303d5297cbb-StreamThread-12] Encountered the following unexpected Kafka exception during processing, this usually indicate Streams internal errors:
org.apache.kafka.streams.errors.StreamsException: task [0_41] Abort sending since an error caught with a previous record (timestamp 1620773943503) to topic sample_txn_input due to org.apache.kafka.common.errors.TimeoutException: Expiring 4 record(s) for sample_txn_input-12:120007 ms has passed since batch creation
You can increase the producer configs `delivery.timeout.ms` and/or `retries` to avoid this error. Note that `retries` is set to infinite by default.
	at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:144) ~[kafka-streams-2.4.0.jar:?]
	at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:52) ~[kafka-streams-2.4.0.jar:?]
	at org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:204) ~[kafka-streams-2.4.0.jar:?]
	at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1348) ~[kafka-clients-2.4.0.jar:?]
	at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:230) ~[kafka-clients-2.4.0.jar:?]
	at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:196) ~[kafka-clients-2.4.0.jar:?]
	at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:730) ~[kafka-clients-2.4.0.jar:?]
	at org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:391) ~[kafka-clients-2.4.0.jar:?]
	at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:334) ~[kafka-clients-2.4.0.jar:?]
	at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:244) ~[kafka-clients-2.4.0.jar:?]
	at java.lang.Thread.run(Thread.java:834) ~[?:?]
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 4 record(s) for sample_txn_input-12:120007 ms has passed since batch creation

@DileepMandapam have you tried changing the following producer properties to larger values?

request.timeout.ms  = 30000
delivery.timeout.ms = 180000 

You can find documentation on these producer configs in the Apache Kafka documentation.
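Since these are producer-level configs, in a Kafka Streams app they go onto the same Properties you pass to KafkaStreams, using the producer prefix. A minimal sketch (the values here are illustrative, not tuned recommendations, and the application id / bootstrap server are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "us-sample-ingester");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
// Give the internal producer more time to ride out a rolling broker restart
// before in-flight batches are expired (illustrative values only).
props.put(StreamsConfig.producerPrefix(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG), 60000);
props.put(StreamsConfig.producerPrefix(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG), 300000);

The TimeoutException in your stack trace is the Streams app's internal producer expiring batches after delivery.timeout.ms, which is the same knob the error message itself points at.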

Rick,
No, I did not. Let me increase these timeouts to larger values. Just curious, though: is there any other solution?

Thanks
Dileep…

@DileepMandapam Sorry, I’ve shared all I could find related to this error that occurs when the brokers are brought down.