Kafka Retry Mechanism

We are trying to understand how Kafka's producer retry logic works. To test it, we ran the following producer code against a local Kafka cluster on Ubuntu:

prop.setProperty(ProducerConfig.RETRIES_CONFIG, "5");
prop.setProperty(ProducerConfig.ACKS_CONFIG, "1");
prop.setProperty(ProducerConfig.BUFFER_MEMORY_CONFIG, "16384");
prop.setProperty(ProducerConfig.MAX_BLOCK_MS_CONFIG, "15000");
prop.setProperty(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "100");

final KafkaProducer<String, String> kprod = new KafkaProducer<String, String>(prop);
for (int i = 0; i < 500; i++) {
    // send() is asynchronous: it only queues the record, so the line printed below does not confirm delivery
    kprod.send(new ProducerRecord<>("sample-topic", "key_" + i, "Reference site about Lorem Ipsum, giving information on its origins, as well as a random Lipsum generator Value_" + i), null);
    System.out.println("Record Sent Successfully with Value: " + i);
}
System.out.println("\n\n\n\n\n\n\n\n---------------------------500 message loop complete --------------------------\n\n\n\n\n");

kprod.send(new ProducerRecord<>("sample-topic", "key_" + 1000, "Reference site about Lorem Ipsum, giving information on its origins, as well as a random Lipsum generator Value_" + 1000));


The broker was manually terminated while the loop was executing. At that point the consumer had received 233 messages; it received none after that.

Once the loop had finished and the producer code went to sleep, we restarted the broker and expected the messages after the 233rd to reach the consumer. However, no messages arrived, and we saw the following error instead:

ERROR [Consumer clientId=consumer-console-consumer-16102-1, groupId=console-consumer-16102] LeaveGroup request with Generation{generationId=1, memberId='consumer-console-consumer-16102-1-12a7d5ff-518a-4e6c-a8c6-d60c2454b572', protocol='range'} failed with error: The coordinator is loading and hence can't process requests. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)

The message sent outside the loop (key_1000) was also not received. We expected at least some of the messages to reach the consumer once the producer code reached the flush() method, but this did not happen, and we are trying to understand this Kafka behavior.

Any help would be appreciated. Thanks.

Thread.sleep could have weird side effects. It also depends on the timing: with 5 retries and a 100 ms backoff, the producer spends only about half a second retrying, while the broker needs a lot more time than that before it can handle requests again.
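
The "about half a second" figure comes from multiplying the retry count by the backoff. Below is a minimal sketch of that arithmetic. It uses plain java.util.Properties with the string keys behind RETRIES_CONFIG and RETRY_BACKOFF_MS_CONFIG ("retries" and "retry.backoff.ms"). The RetryBudget class, its method names, and the second set of values (60 retries, 1000 ms) are hypothetical and only for illustration. It counts the backoff pauses only, so treat the result as a rough estimate, not as a description of how the client actually schedules retries.

```java
import java.util.Properties;

// Sketch only: estimates how long a producer keeps retrying, counting just the
// backoff pauses between attempts. Time spent on the requests themselves is ignored.
final class RetryBudget {

    // Upper bound on the time spent waiting between retries, in milliseconds.
    static long backoffBudgetMs(Properties config) {
        long retries = Long.parseLong(config.getProperty("retries"));
        long backoffMs = Long.parseLong(config.getProperty("retry.backoff.ms"));
        return retries * backoffMs;
    }

    // Builds a Properties object holding only the two retry-related keys.
    static Properties retryConfig(String retries, String backoffMs) {
        Properties config = new Properties();
        config.setProperty("retries", retries);
        config.setProperty("retry.backoff.ms", backoffMs);
        return config;
    }

    public static void main(String[] args) {
        // Values from the question: 5 retries, 100 ms apart.
        System.out.println(backoffBudgetMs(retryConfig("5", "100")) + " ms");
        // Hypothetical alternative, chosen only for illustration.
        System.out.println(backoffBudgetMs(retryConfig("60", "1000")) + " ms");
    }
}
```

Run as a single file, this prints 500 ms for the settings in the question and 60000 ms for the illustrative ones. If the broker needs tens of seconds to come back and finish loading, a budget of roughly 500 ms is likely to be used up long before it can accept requests.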