Error sending fetch request / Join group failed

Hi, I have the following Kafka Streams application:

  • Spring Cloud Stream Kafka (functional style)
  • running on Kubernetes, 3 pods
  • num.stream.threads = 1 per pod

I am running into an issue where:

  • the consumers are unable to commit offsets. I can see that all 3 pods are receiving messages from Kafka and updating the DB; however, the offsets in the consumer group are not moving at all.
  • the consumer group also reports: Consumer group 'Location-dev-Sync' has no active members.

Below are the 3 exceptions I see in the logs:

  • Error sending fetch request (sessionId=1445202357, epoch=INITIAL) to node 0: org.apache.kafka.common.errors.TimeoutException: Failed to send request after 30000 ms.
  • Join group failed with org.apache.kafka.common.errors.MemberIdRequiredException: The group member needs to have a valid member id before actually entering a consumer group.
  • Join group failed with org.apache.kafka.common.errors.UnknownMemberIdException: The coordinator is not aware of this member.

I originally had the commit interval set to 1 second and thought that might be causing the issue, so I removed that setting and let it fall back to the default (30 seconds). I still see the issue; it is intermittent and seems to happen at random. Any pointers, please?
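
For reference, the commit interval change described above can be sketched roughly as follows. This is only an illustration using plain java.util.Properties: the class and method names are hypothetical, bootstrap.servers and other required settings are omitted, and in a Spring Cloud Stream application these keys would normally be supplied through the binder configuration rather than built by hand. The keys `application.id` and `commit.interval.ms` are the standard Kafka Streams config names.

```java
import java.util.Properties;

public class CommitIntervalSketch {
    // Standard Kafka Streams config key controlling how often offsets are committed.
    public static final String COMMIT_INTERVAL_MS = "commit.interval.ms";

    // Builds a partial Streams configuration (hypothetical helper).
    // A null interval leaves the key unset so the Kafka Streams default applies.
    public static Properties streamsProps(String applicationId, Long commitIntervalMs) {
        Properties props = new Properties();
        // In Kafka Streams the application.id is also used as the consumer group id.
        props.setProperty("application.id", applicationId);
        if (commitIntervalMs != null) {
            props.setProperty(COMMIT_INTERVAL_MS, Long.toString(commitIntervalMs));
        }
        return props;
    }

    public static void main(String[] args) {
        // Original setup: explicit 1 second commit interval.
        Properties before = streamsProps("lpLocation-dev-lpMasterSync", 1000L);
        // Current setup: key removed, so the default is used.
        Properties after = streamsProps("lpLocation-dev-lpMasterSync", null);
        System.out.println("explicit interval set: " + before.containsKey(COMMIT_INTERVAL_MS));
        System.out.println("interval left to default: " + !after.containsKey(COMMIT_INTERVAL_MS));
    }
}
```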

@hnazslacks Can you verify whether the errors are in the Kafka broker logs or in your client application logs? Can you also verify that your applications are configured with the correct consumer group id (by logging the consumer's configuration output)?

You also mention that the issue is "intermittent", but the broker is reporting that the consumer group has no active members. So are the consumers able to commit an offset successfully at all, or do they fail to do so intermittently?

Thanks for the reply @rick.

"Can you verify whether the errors are in the Kafka broker logs or in your client application logs?"

  • The logs above are from the client side, but the broker is also reporting this:

Member lpLocation-dev-lpMasterSync-08f64051-8c08-44f3-8469-51654fe963d7-StreamThread-1-consumer-a371b1c5-5021-41a7-9d28-b904ca46a340 in group lpLocation-dev-lpMasterSync has failed, removing it from the group

"Can you also verify that your applications are configured with the correct consumer group id (by logging the consumer's configuration output)?"

  • Yes, the application name is set and shows up as the consumer group name (lpLocation-dev-lpMasterSync)

"So are the consumers able to commit an offset successfully at all, or do they fail to do so intermittently?"

  • Consumers are getting the messages, but they are not able to commit the offsets.

I left this running overnight, and this morning I can see that 2 consumers have joined the group and are committing offsets; however, the third is still not in the consumer group.

Could this be related to the heartbeat interval / session timeout settings?
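
To make that question concrete, here is a minimal sketch of the consumer settings I mean. The keys `session.timeout.ms`, `heartbeat.interval.ms` and `max.poll.interval.ms` are the standard Kafka consumer config names; the class name, helper methods and the numbers are hypothetical placeholders, not values taken from my application or recommendations. The check encodes the guideline from the Kafka documentation that the heartbeat interval should typically be no higher than one third of the session timeout.

```java
import java.util.Properties;

public class GroupTimeoutSketch {
    // Collects the group-membership timeouts into consumer properties (hypothetical helper).
    public static Properties consumerTimeouts(int sessionTimeoutMs,
                                              int heartbeatIntervalMs,
                                              int maxPollIntervalMs) {
        Properties props = new Properties();
        // How long the coordinator waits without a heartbeat before removing the member.
        props.setProperty("session.timeout.ms", Integer.toString(sessionTimeoutMs));
        // How often the consumer sends heartbeats to the group coordinator.
        props.setProperty("heartbeat.interval.ms", Integer.toString(heartbeatIntervalMs));
        // Maximum allowed gap between poll() calls before the member leaves the group.
        props.setProperty("max.poll.interval.ms", Integer.toString(maxPollIntervalMs));
        return props;
    }

    // True when heartbeat.interval.ms is at most one third of session.timeout.ms.
    public static boolean heartbeatWithinGuideline(Properties props) {
        long session = Long.parseLong(props.getProperty("session.timeout.ms"));
        long heartbeat = Long.parseLong(props.getProperty("heartbeat.interval.ms"));
        return heartbeat * 3 <= session;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Properties props = consumerTimeouts(45_000, 3_000, 300_000);
        System.out.println("heartbeat within guideline: " + heartbeatWithinGuideline(props));
    }
}
```

If the session timeout turns out to be too short relative to how long the pods can stall (slow DB writes, GC pauses, network hiccups to node 0), that could plausibly explain the broker removing members from the group, but that is an assumption I have not verified.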