We have a dedicated cluster set up using Azure Private Link. It spans 3 AZs, as described in "Use Azure Private Link connections with Confluent Cloud" (Confluent Documentation).
Each of the 3 AZs is associated with a private endpoint in Azure, and each of those endpoints has an IP address. DNS is set up so that every broker in an AZ resolves to the same IP address: the IP of the private endpoint associated with that AZ.
Given that, how does a request sent to that IP address get to the proper broker? We are seeing a problem where the Kafka client is doing the following when trying to connect a consumer:
- We send a GroupCoordinator request to one of the brokers, and get back a message saying that the group coordinator is e-0069.az2.
- We send a JoinGroup request to e-0069.az2 to try and join the consumer group.
- Sometimes, the JoinGroup response contains a NOT_COORDINATOR error; sometimes, the JoinGroup response succeeds.
I see us sending the same bytes to the same IP address whether the JoinGroup request succeeds or fails, which makes me think that sometimes, when we think we’re sending to e-0069.az2, we’re actually sending to e-0086.az2, the other broker in that AZ.
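For anyone who wants to confirm the DNS behaviour described above (every broker in an AZ resolving to the same private endpoint IP), here is a minimal sketch. The function name `group_brokers_by_ip`, the stub lookup table, and the IP address in it are hypothetical placeholders, not values from our cluster; the broker names are the abbreviated ones used in this post, so the real fully qualified hostnames would need to be substituted. The example runs against a stub so it works without network access; leaving out the `resolve` argument makes it use real DNS via `socket.gethostbyname`.

```python
import socket
from collections import defaultdict


def group_brokers_by_ip(hostnames, resolve=socket.gethostbyname):
    """Return a dict mapping each resolved IP to the hostnames that share it."""
    by_ip = defaultdict(list)
    for host in hostnames:
        by_ip[resolve(host)].append(host)
    return dict(by_ip)


# Stub lookup table standing in for real DNS (hypothetical IP address).
stub_dns = {
    "e-0069.az2": "10.0.2.4",
    "e-0086.az2": "10.0.2.4",
}

shared = group_brokers_by_ip(["e-0069.az2", "e-0086.az2"], resolve=stub_dns.__getitem__)
for ip, hosts in shared.items():
    print(ip, hosts)
```

If both brokers land under a single IP, as in the stub, the destination IP alone cannot tell the two brokers apart, which matches what I see on the wire.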
Thanks in advance for any insights!