I have had an issue for the past few days when trying to connect to a Confluent Cloud Kafka cluster (Basic type) from a Java application.
My client application runs in our company's internal network, where an enterprise firewall controls inbound and outbound traffic. Following the Confluent Cloud UI and related documentation, we allowed outbound traffic to the broker endpoint shown in the UI on port 9092. However, the connection still didn't work, either from my application or from the Apache Kafka CLI scripts. The connection worked as expected when tried from a machine outside our company network.
After some investigation, we called the "describe cluster" method from outside our company network, which listed all Kafka backend nodes. We then added allow rules to our firewall for all of those nodes, and the connection started to work.
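For reference, the "describe cluster" call mentioned above can be made programmatically with the Kafka AdminClient from the kafka-clients library. A minimal sketch (the bootstrap address, API key, and secret are placeholders, and a reachable cluster is required for it to return anything):

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.Node;

public class ListBrokers {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "pkc-12345.us-west-2.aws.confluent.cloud:9092");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"API_KEY\" password=\"API_SECRET\";");
        try (AdminClient admin = AdminClient.create(props)) {
            // describeCluster() returns the full broker list; each of these
            // hosts must be reachable through the firewall, not just the
            // bootstrap endpoint.
            for (Node node : admin.describeCluster().nodes().get()) {
                System.out.println(node.host() + ":" + node.port());
            }
        }
    }
}
```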
My question is whether any documentation mentions that traffic to all backend nodes needs to be allowed through the firewall, because I didn't find anything and spent a lot of time on the investigation.
It's really weird that traffic to all backend nodes needs to be enabled, because:
if the broker endpoint is a load balancer, it's uncommon for clients to also communicate directly with the backend nodes.
if the broker endpoint is a Kafka broker, a connection to a single broker should be enough to bootstrap access to the whole cluster.
I ran Wireshark and captured the network traffic, which confirmed my theory that the client communicates with all the backend nodes.
Thank you for pointing me to the proper documentation,
Petr
This is how it works. The bootstrap servers endpoint in Confluent Cloud is a load balancer, but after establishing the initial connection, clients communicate directly with the brokers. This communication pattern is part of the Kafka protocol, i.e., direct connectivity to every broker is required for any Kafka deployment. From the consumer configuration documentation on bootstrap.servers:
The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers.
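As an illustration of that bootstrapping behavior, a minimal Java client configuration for Confluent Cloud might look like the sketch below. The hostname, API key, and secret are placeholders; the property names are the standard kafka-clients ones, and bootstrap.servers is only used for the first metadata request:

```java
import java.util.Properties;

public class CloudConsumerConfig {
    public static Properties build(String bootstrap, String apiKey, String apiSecret) {
        Properties props = new Properties();
        // Only used for the initial metadata request; the client then
        // connects directly to every broker returned in the metadata,
        // so the firewall must allow all of those hosts too.
        props.put("bootstrap.servers", bootstrap);
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"" + apiKey + "\" password=\"" + apiSecret + "\";");
        props.put("group.id", "firewall-test");
        return props;
    }

    public static void main(String[] args) {
        Properties p = build("pkc-12345.us-west-2.aws.confluent.cloud:9092",
                "API_KEY", "API_SECRET");
        System.out.println(p.getProperty("security.protocol"));
    }
}
```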
Because Confluent Cloud doesn't currently support static ingress IPs, adding all broker DNS entries to the firewall is the recommended approach. Important caveats: this applies only to clusters with public endpoints, and this guidance will likely change in the future. If packet sniffing isn't giving you all the brokers, you can use kcat (formerly kafkacat) with the -L flag to list the cluster metadata:
Metadata for all topics (from broker -1: sasl_ssl://pkc-12345.us-west-2.aws.confluent.cloud:9092/bootstrap):
12 brokers:
broker 0 at b0-pkc-12345.us-west-2.aws.confluent.cloud:9092 (controller)
broker 1 at b1-pkc-12345.us-west-2.aws.confluent.cloud:9092
broker 2 at b2-pkc-12345.us-west-2.aws.confluent.cloud:9092
broker 3 at b3-pkc-12345.us-west-2.aws.confluent.cloud:9092
broker 4 at b4-pkc-12345.us-west-2.aws.confluent.cloud:9092
broker 5 at b5-pkc-12345.us-west-2.aws.confluent.cloud:9092
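To turn output like the above into a host list for firewall rules, a small helper can extract the broker hostnames. This is a hypothetical utility (the class name, regex, and sample input are mine), shown here because hand-copying a dozen hostnames is error-prone:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BrokerHosts {
    // Matches lines like "broker 0 at b0-pkc-12345...cloud:9092 (controller)"
    // and captures only the hostname before the port.
    private static final Pattern BROKER_LINE =
        Pattern.compile("broker\\s+\\d+\\s+at\\s+([^\\s:]+):\\d+");

    public static List<String> extract(String metadataOutput) {
        List<String> hosts = new ArrayList<>();
        Matcher m = BROKER_LINE.matcher(metadataOutput);
        while (m.find()) {
            hosts.add(m.group(1));
        }
        return hosts;
    }

    public static void main(String[] args) {
        String sample =
            " 2 brokers:\n"
          + "  broker 0 at b0-pkc-12345.us-west-2.aws.confluent.cloud:9092 (controller)\n"
          + "  broker 1 at b1-pkc-12345.us-west-2.aws.confluent.cloud:9092\n";
        // prints each broker hostname on its own line
        for (String host : extract(sample)) {
            System.out.println(host);
        }
    }
}
```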
In the end, I was able to discover all Kafka nodes by running the kafka-broker-api-versions.bat script from the Confluent Platform distribution and allowing them all through the firewall.
What I'm still missing is guidance or documentation I can point our customers to when they run into the same issue. The bootstrap.servers description is a good start, and I completely missed that info, but I can't find any mention of communication with all backend nodes when browsing through the documentation about connection issues.
Thank you again for your explanation, and I hope this topic helps improve the documentation even more.