Kafka does not work through NAT as expected

First off, welcome to the forum and bravo on a great post with clear details of the problem and even a diagram :clap:

Let’s address the non-VPN case first, and then I’ll come back to the VPN afterwards.

Public IP → NAT → Internal

When your external client connects to AA.XX.XX.XX, it is routed by NAT to the internal IP and ports of the Kafka brokers. This is one of the three 10.XX.XX.X_:9092 boxes.

What this means is that your traffic hits the brokers on the internal listener (labelled SASL_SSL in your config) and not the EXTERNAL listener. Because of that, when the broker replies it provides the metadata of the internal listener.

Since your client receives the internal listener metadata, it then tries to connect to 10.XX.XX.X_:9092 directly, which fails because that address is not accessible externally.

But why does it work on VPN?

The same as above happens - the external connection goes through NAT, is translated to 10.XX.XX.X_:9092, and internal listener metadata is sent back.

But because the client is on the VPN, its subsequent connection to the address in the metadata it received (10.XX.XX.X_:9092) works, since it can reach those IPs/ports directly.
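To make the two cases above concrete, here is a toy model in Python. It is only a sketch of the behaviour described, not how the broker is implemented: the 10.0.0.x addresses stand in for the redacted internal IPs, and the names `BROKERS`, `NAT`, `metadata_via_nat` and `can_connect` are made up for illustration.

```python
# Toy model only; real brokers do this inside the Kafka protocol.
# 10.0.0.x stands in for the redacted internal addresses (assumption).
BROKERS = {
    1: {"listeners": {9092: "SASL_SSL"}, "advertised": {"SASL_SSL": "10.0.0.1:9092"}},
    2: {"listeners": {9092: "SASL_SSL"}, "advertised": {"SASL_SSL": "10.0.0.2:9092"}},
    3: {"listeners": {9092: "SASL_SSL"}, "advertised": {"SASL_SSL": "10.0.0.3:9092"}},
}

# Original NAT rule: public port 9092 -> broker 1, internal port 9092.
NAT = {9092: (1, 9092)}


def metadata_via_nat(public_port):
    """Return the broker addresses a client is told to use after bootstrapping."""
    broker_id, internal_port = NAT[public_port]
    # The broker answers for whichever listener the connection arrived on.
    listener = BROKERS[broker_id]["listeners"][internal_port]
    return [b["advertised"][listener] for b in BROKERS.values()]


def can_connect(address, on_vpn):
    """In this model, 10.x addresses are reachable only over the VPN."""
    return on_vpn or not address.startswith("10.")


addresses = metadata_via_nat(9092)
print(addresses)
print("without VPN:", [can_connect(a, on_vpn=False) for a in addresses])
print("with VPN:   ", [can_connect(a, on_vpn=True) for a in addresses])
```

In this model the bootstrap connection succeeds either way; what differs is whether the addresses handed back in the metadata are reachable from where the client sits.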

How to fix it?

Instead of NAT’ing the external IP to the same internal IP/port that is used internally and on the VPN, you need to NAT each external IP/port to one of the brokers and a new, unused port on that broker. Then configure each broker with a listener on that port, and specify the corresponding external IP/port in advertised.listeners.

So you’ll have three brokers configured thus. Note that the EXTERNAL port in advertised.listeners varies for each broker, whilst listeners remains constant.

  • KAFKA01 (10.XX.XX.XX)

    listeners=SASL_SSL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9095
    advertised.listeners=SASL_SSL://10.XX.XX.XX:9092,EXTERNAL://AA.XX.XX.XX:9093
    listener.security.protocol.map=SASL_SSL:SASL_SSL,EXTERNAL:SASL_SSL
    
  • KAFKA02 (10.XX.XX.XY)

    listeners=SASL_SSL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9095
    advertised.listeners=SASL_SSL://10.XX.XX.XY:9092,EXTERNAL://AA.XX.XX.XX:9094
    listener.security.protocol.map=SASL_SSL:SASL_SSL,EXTERNAL:SASL_SSL
    
  • KAFKA03 (10.XX.XX.XZ)

    listeners=SASL_SSL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9095
    advertised.listeners=SASL_SSL://10.XX.XX.XZ:9092,EXTERNAL://AA.XX.XX.XX:9095
    listener.security.protocol.map=SASL_SSL:SASL_SSL,EXTERNAL:SASL_SSL
    

Now you configure NAT thus:

  • AA.XX.XX.XX:9093 -> 10.XX.XX.XX:9095
  • AA.XX.XX.XX:9094 -> 10.XX.XX.XY:9095
  • AA.XX.XX.XX:9095 -> 10.XX.XX.XZ:9095
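One way to sanity-check the broker settings against the NAT rules above is a small script like the one below. It is only a sketch: 198.51.100.7 and 10.0.0.x are hypothetical stand-ins for the redacted AA.XX.XX.XX and 10.XX.XX.X_ addresses, and the helper names are made up for illustration.

```python
PUBLIC_IP = "198.51.100.7"  # hypothetical stand-in for AA.XX.XX.XX

# (public IP, public port) -> (internal IP, internal port), as in the NAT list.
NAT = {
    (PUBLIC_IP, 9093): ("10.0.0.1", 9095),
    (PUBLIC_IP, 9094): ("10.0.0.2", 9095),
    (PUBLIC_IP, 9095): ("10.0.0.3", 9095),
}


def parse_listeners(value):
    """Parse 'NAME://host:port,NAME2://host:port' into {name: (host, port)}."""
    result = {}
    for entry in value.split(","):
        name, _, address = entry.strip().partition("://")
        host, _, port = address.rpartition(":")
        result[name] = (host, int(port))
    return result


def external_listener_ok(internal_ip, listeners, advertised, nat, name="EXTERNAL"):
    """True if the advertised external address is NAT'd to this broker's listener port."""
    _, listen_port = parse_listeners(listeners)[name]
    advertised_address = parse_listeners(advertised)[name]
    return nat.get(advertised_address) == (internal_ip, listen_port)


LISTENERS = "SASL_SSL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9095"
for internal_ip, public_port in [("10.0.0.1", 9093), ("10.0.0.2", 9094), ("10.0.0.3", 9095)]:
    advertised = f"SASL_SSL://{internal_ip}:9092,EXTERNAL://{PUBLIC_IP}:{public_port}"
    print(internal_ip, external_listener_ok(internal_ip, LISTENERS, advertised, NAT))
```

A broker whose advertised EXTERNAL port is missing from the NAT table, or points at another broker, would show up as `False` here.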

So a client connecting externally uses the same IP address but different ports, and traffic to each of those ports is routed to one of the three brokers internally. That traffic hits the broker on its EXTERNAL listener port, so the metadata the broker sends back will include the correct EXTERNAL listener (the NAT’d IP + port).

To check which listener metadata the brokers return, you can use kcat (formerly called kafkacat) with the -L flag. You can also use the Python or Golang programs here to help validate it:
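Separately from the programs mentioned above, a minimal Python sketch of the same kind of check might look like this. It assumes the confluent-kafka package is installed and that you pass in your own client settings (bootstrap server, SASL credentials and so on); the function names and the 198.51.100.7 address are hypothetical.

```python
def advertised_brokers(client_config, timeout=10.0):
    """Return sorted (host, port) pairs from the cluster metadata."""
    # Imported here so the helper below also works without the package installed.
    from confluent_kafka.admin import AdminClient

    metadata = AdminClient(client_config).list_topics(timeout=timeout)
    return sorted((b.host, b.port) for b in metadata.brokers.values())


def unexpected_addresses(advertised, expected):
    """Return advertised addresses that are not in the expected external set."""
    expected = set(expected)
    return [address for address in advertised if address not in expected]


# Example use from an external client (values are placeholders, not real settings):
#   config = {"bootstrap.servers": "198.51.100.7:9093", "security.protocol": "SASL_SSL", ...}
#   expected = [("198.51.100.7", 9093), ("198.51.100.7", 9094), ("198.51.100.7", 9095)]
#   print(unexpected_addresses(advertised_brokers(config), expected))
```

An empty list means every broker advertised an address on the external IP; any 10.x entries mean the connection still landed on the internal listener.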

