SASL/Kerberos (KRaft) - Brokers fail to authenticate

Hello,
I don't have previous experience configuring KRaft, and I'm trying to set up a Kafka cluster using KRaft in combined mode.
My goal is to have the Kafka servers authenticate via Kerberos (SASL_PLAINTEXT protocol).
I followed the official Confluent documentation, such as
https://docs.confluent.io/platform/current/kafka/authentication_sasl/authentication_sasl_gssapi.html

I'm stuck at the point where my Kafka service starts, communicates with the other controllers, and looks stable. But the issue appears when I try to contact the brokers using commands such as:

/usr/bin/kafka-topics --list --bootstrap-server hostname

or

/usr/bin/kafka-metadata-quorum --bootstrap-server hostname:9092 describe --status

In /var/log/kafka/server.log these messages repeat until the command times out (the first octets of the IP addresses are replaced with X.X.X):

DEBUG Accepted connection from /X.X.X.116:58312 on /X.X.X.117:9092 and assigned it to processor 10, sendBufferSize [actual|requested]: [102400|102400] recvBufferSize [actual|requested]: [102400|102400] (kafka.network.DataPlaneAcceptor)
DEBUG Processor 10 listening to new connection from /X.X.X.116:58312 (kafka.network.Processor)
INFO [SocketServer listenerType=BROKER, nodeId=1] Failed authentication with /X.X.X.116 (channelId=X.X.X.117:9092-X.X.X.116:58312-0) (Unexpected Kafka request of type METADATA during SASL handshake.) (org.apache.kafka.common.network.Selector)

I have tried many things, but nothing worked and I'm out of ideas. I would be very grateful for any advice.


Here are details of my kafka cluster:

/usr/lib/systemd/system/confluent-kafka.service

[Unit]
Description=Apache Kafka - broker
Documentation=http://docs.confluent.io/
After=network.target confluent-zookeeper.target

[Service]
Type=simple
User=cp-kafka
Group=confluent
ExecStart=/usr/bin/kafka-server-start /etc/kafka/kraft/server.properties
Environment="KAFKA_OPTS=-Dsun.security.krb5.debug=true -Djava.security.krb5.conf=/etc/krb5.conf -Djava.security.auth.login.config=/etc/kafka/kraft/kafka_server_jaas.conf"
LimitNOFILE=1000000
TimeoutStopSec=180
Restart=no

[Install]
WantedBy=multi-user.target

However, the Kerberos debug option (-Dsun.security.krb5.debug=true) doesn't seem to be working; I couldn't see any new messages in /var/log/kafka/server.log.

/etc/kafka/kraft/server.properties (actual hostnames replaced with generic placeholders)

process.roles=broker,controller
node.id=1
controller.quorum.voters=1@hostname1:9093,2@hostname2:9093,3@hostname3:9093

listeners=SASL_PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
advertised.listeners=SASL_PLAINTEXT://hostname1:9092
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:SASL_PLAINTEXT,SASL_PLAINTEXT:SASL_PLAINTEXT

num.network.threads=16
num.io.threads=12
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600

metadata.log.dir=/var/lib/kraft
log.dirs=/mnt/disk-1/kafka

num.partitions=12
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
log.retention.hours=336
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
group.initial.rebalance.delay.ms=0

confluent.license.topic.replication.factor=1
confluent.metadata.topic.replication.factor=1
confluent.security.event.logger.exporter.kafka.topic.replicas=1

delete.topic.enable=true
auto.create.topics.enable=false

sasl.enabled.mechanisms=GSSAPI
sasl.mechanism.inter.broker.protocol=GSSAPI
sasl.mechanism.controller.protocol=GSSAPI

security.inter.broker.protocol=SASL_PLAINTEXT
security.protocol=SASL_PLAINTEXT

listener.name.sasl_plaintext.gssapi.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
useKeyTab=true \
storeKey=true \
keyTab="/etc/security/keytabs/hostname1.keytab" \
principal="kafka/hostname1@REALM";

sasl.kerberos.service.name=kafka

/etc/kafka/kraft/kafka_server_jaas.conf

KafkaServer {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
storeKey=true
keyTab="/etc/security/keytabs/hostname1.keytab"
principal="kafka/hostname1@REALM";
};

I wasn't actually planning on using a dedicated JAAS file, as the Confluent docs literally say: "While use of separate JAAS files is supported, it is not the recommended approach. Instead, use the listener configuration specified…"
But when I followed the docs, Kafka wouldn't start at all, with an error that KafkaServer is not defined and that I need to specify a JAAS config file. So I created the JAAS file with basically the same Krb5LoginModule.

I can provide additional details/configs if needed.


One thing I noticed, and couldn't wrap my head around, is that the controllers seem to be able to authenticate, but the brokers can't…
As seen in /var/log/kafka/server.log:

DEBUG Accepted connection from /X.X.X.117:46588 on /X.X.X.117:9093 and assigned it to processor 10, sendBufferSize [actual|requested]: [102400|102400] recvBufferSize [actual|requested]: [102400|102400] (kafka.network.DataPlaneAcceptor)
DEBUG Processor 10 listening to new connection from /X.X.X.117:46588 (kafka.network.Processor)
DEBUG Accepted connection from /X.X.X.116:36048 on /X.X.X.117:9093 and assigned it to processor 11, sendBufferSize [actual|requested]: [102400|102400] recvBufferSize [actual|requested]: [102400|102400] (kafka.network.DataPlaneAcceptor)
DEBUG Processor 11 listening to new connection from /X.X.X.116:36048 (kafka.network.Processor)
INFO Successfully authenticated client: authenticationID=kafka/hostname1@REALM; authorizationID=kafka/hostname1@REALM. (org.apache.kafka.common.security.authenticator.SaslServerCallbackHandler)
INFO Successfully authenticated client: authenticationID=kafka/hostname2@REALM; authorizationID=kafka/hostname2@REALM. (org.apache.kafka.common.security.authenticator.SaslServerCallbackHandler)

Thank you for your time.
Regards, Lukas

I believe the Kerberos authentication is actually working, although I'm not sure if fully or only partially.
In the journalctl log I found that during startup the Kerberos authentication was successful and the server received a valid ticket, as seen here:

kafka-server-start[70482]: [2024-05-27 09:53:52,543] INFO [broker-1-to-controller-forwarding-channel-manager]: Starting (kafka.server.BrokerToControllerRequestThread)
kafka-server-start[70482]: [2024-05-27 09:53:52,564] INFO Updated connection-accept-rate max connection creation rate to 2147483647 (kafka.network.ConnectionQuotas)
kafka-server-start[70482]: Looking for keys for: kafka/hostname1@REALM
kafka-server-start[70482]: Added key: 17version: 2
kafka-server-start[70482]: Added key: 18version: 2
kafka-server-start[70482]: Looking for keys for: kafka/hostname1@REALM
kafka-server-start[70482]: Added key: 17version: 2
kafka-server-start[70482]: Added key: 18version: 2
kafka-server-start[70482]: default etypes for default_tkt_enctypes: 18 17.
kafka-server-start[70482]: >>> KrbAsReq creating message
kafka-server-start[70482]: >>> KrbKdcReq send: kdc=hostname_krb_server TCP:88, timeout=30000, number of retries =3, #bytes=173
kafka-server-start[70482]: >>> KDCCommunication: kdc=hostname_krb_server TCP:88, timeout=30000,Attempt =1, #bytes=173
kafka-server-start[70482]: >>>DEBUG: TCPClient reading 738 bytes
kafka-server-start[70482]: >>> KrbKdcReq send: #bytes read=738
kafka-server-start[70482]: >>> KdcAccessibility: remove hostname_krb_server
kafka-server-start[70482]: Looking for keys for: kafka/hostname1@REALM
kafka-server-start[70482]: Added key: 17version: 2
kafka-server-start[70482]: Added key: 18version: 2
kafka-server-start[70482]: >>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
kafka-server-start[70482]: >>> KrbAsRep cons in KrbAsReq.getReply kafka/hostname1
kafka-server-start[70482]: [2024-05-27 09:53:52,568] INFO Successfully logged in. (org.apache.kafka.common.security.authenticator.AbstractLogin)
kafka-server-start[70482]: [2024-05-27 09:53:52,569] INFO [Principal=kafka/hostname1@REALM]: TGT refresh thread started. (org.apache.kafka.common.security.kerberos.KerberosLogin)
kafka-server-start[70482]: [2024-05-27 09:53:52,569] INFO [Principal=kafka/hostname1@REALM]: TGT valid starting at: Mon May 27 09:53:52 CEST 2024 (org.apache.kafka.common.security.kerberos.KerberosLogin)
kafka-server-start[70482]: [2024-05-27 09:53:52,569] INFO [Principal=kafka/hostname1@REALM]: TGT expires: Tue May 28 09:53:52 CEST 2024 (org.apache.kafka.common.security.kerberos.KerberosLogin)
kafka-server-start[70482]: [2024-05-27 09:53:52,570] INFO [Principal=kafka/hostname1@REALM]: TGT refresh sleeping until: Tue May 28 06:00:02 CEST 2024 (org.apache.kafka.common.security.kerberos.KerberosLogin)
kafka-server-start[70482]: [2024-05-27 09:53:52,578] INFO [SocketServer listenerType=BROKER, nodeId=1] Created data-plane acceptor and processors for endpoint : ListenerName(SASL_PLAINTEXT) (kafka.network.SocketServer)
kafka-server-start[70482]: [2024-05-27 09:53:52,590] INFO [broker-1-to-controller-alter-partition-channel-manager]: Starting (kafka.server.BrokerToControllerRequestThread)
kafka-server-start[70482]: [2024-05-27 09:53:52,600] INFO [MetadataLoader id=1] initializeNewPublishers: the loader is still catching up because we still don’t know the high water mark yet. (org.apache.kafka.image.loader.MetadataLoader)

And here:

kafka-server-start[70482]: Found KeyTab /etc/security/keytabs/hostname1.keytab for kafka/hostname1@REALM
kafka-server-start[70482]: Found ticket for kafka/hostname1@REALM to go to krbtgt/REALM@REALM expiring on Tue May 28 09:53:51 CEST 2024
kafka-server-start[70482]: Entered Krb5Context.acceptSecContext with state=STATE_NEW
kafka-server-start[70482]: Looking for keys for: kafka/hostname1@REALM
kafka-server-start[70482]: Added key: 17version: 2
kafka-server-start[70482]: Added key: 18version: 2
kafka-server-start[70482]: >>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
kafka-server-start[70482]: default etypes for permitted_enctypes: 18 17.
kafka-server-start[70482]: >>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
kafka-server-start[70482]: MemoryCache: add 1716796435/731701/68AEFD1232CF55C627CEE659AB367B3FE3A282612AC44032B3FF22F77AA573DD/kafka/hostname2@REALM to kafka/hostname2@REALM|kafka/hostname1@REALM
kafka-server-start[70482]: >>> KrbApReq: authenticate succeed.
kafka-server-start[70482]: Krb5Context setting peerSeqNumber to: 634357655
kafka-server-start[70482]: Krb5Context setting mySeqNumber to: 634357655
kafka-server-start[70482]: Krb5Context.wrap: data=[01 01 00 00 ]
kafka-server-start[70482]: Krb5Context.wrap: token=[05 04 01 ff 00 0c 00 00 00 00 00 00 25 cf 87 97 01 01 00 00 6d 54 9d fb 98 c5 be 64 3a 4e 14 92 ]
kafka-server-start[70482]: Krb5Context.unwrap: token=[05 04 00 ff 00 0c 00 00 00 00 00 00 25 cf 87 97 01 00 00 00 6b 61 66 6b 61 2f 63 7a 6d 6f 72 6b 38 2e 6f 73 6b 61 72 6d 6f 62 69 6c 2e 63 7a 40 4d 4f 52 50 48 45 55 53 c2 6d db d5 85 aa 5c da c>
kafka-server-start[70482]: Krb5Context.unwrap: data=[01 00 00 00 6b 61 66 6b 61 2f 63 7a 6d 6f 72 6b 38 2e 6f 73 6b 61 72 6d 6f 62 69 6c 2e 63 7a 40 4d 4f 52 50 48 45 55 53 ]
kafka-server-start[70482]: [2024-05-27 09:53:55,754] INFO Successfully authenticated client: authenticationID=kafka/hostname2@REALM; authorizationID=kafka/hostname2@REALM. (org.apache.kafka.common.security.authenticat>

But when it comes to calling the Kafka broker with the aforementioned commands, it still fails with the SASL handshake error. Sadly, there is no additional info in the log, even when I tried turning on debug logging…

INFO [SocketServer listenerType=BROKER, nodeId=1] Failed authentication with /X.X.X.116 (channelId=X.X.X.117:9092-X.X.X:45120-13) (Unexpected Kafka request of type METADATA during SASL handshake.) (org.apache.kafka.common.network.Selector)

Why does Kerberos work, but the SASL handshake does not?
I'm no expert on authentication, but it seems to me the problem is not with Kerberos but with something else…
Is there a way to debug the handshake so I can see why it fails?

hey @LukasK

welcome :slight_smile:

Your log message is seen when a client tries to use the wrong authentication method against your SASL endpoints and fails to authenticate.

For example, the client uses a security.protocol of PLAINTEXT against your SASL endpoint, or a sasl.mechanism of PLAIN rather than GSSAPI.
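
As a rough sketch, a client config matching a SASL_PLAINTEXT/GSSAPI broker listener would look something like this (adjust to your setup; passed to the CLI tools via --command-config):

security.protocol=SASL_PLAINTEXT
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka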

HTH,
Michael

Thank you for the input. I figured as much from my research on the internet, but I have no idea where the misconfiguration might be. :see_no_evil:

Nevertheless, I focused on the authentication configs again and did some tests.
I reconfigured the cluster NOT to use authentication, leaving just PLAINTEXT, deleting any security protocols/mechanisms, and keeping only the default settings.

SERVER 1:

process.roles=broker,controller
node.id=1
controller.quorum.voters=1@hostname1:9093,2@hostname2:9093

listeners=PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
advertised.listeners=PLAINTEXT://hostname1:9092
inter.broker.listener.name=PLAINTEXT
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT

SERVER 2:

process.roles=broker,controller
node.id=2
controller.quorum.voters=1@hostname1:9093,2@hostname2:9093

listeners=PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
advertised.listeners=PLAINTEXT://hostname2:9092
inter.broker.listener.name=PLAINTEXT
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT

This configuration with just PLAINTEXT works.
I can successfully run commands like:

[root@hostname1 kraft]# /usr/bin/kafka-metadata-quorum --bootstrap-server hostname1:9092 describe --status
ClusterId: Q8t6MsfhRIe4y1HBW43sKQ
LeaderId: 1
LeaderEpoch: 300
HighWatermark: 764661
MaxFollowerLag: 764662
MaxFollowerLagTimeMs: -1
CurrentVoters: [1,2]
CurrentObservers:
[root@hostname1 kraft]#

This confirmed that the Kafka cluster itself is working :+1: :+1:, and that the problem really is in the GSSAPI/Kerberos authentication.

So then I reconfigured the cluster to GSSAPI and SASL_PLAINTEXT, following the Confluent documentation here: Configuring GSSAPI | Confluent Documentation
The changes to the config consist of commenting out "inter.broker.listener.name" and adding the "KERBEROS config" section with the settings given in the official documentation.
SERVER 1:

process.roles=broker,controller
node.id=1
controller.quorum.voters=1@hostname1:9093,2@hostname2:9093

listeners=SASL_PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
advertised.listeners=SASL_PLAINTEXT://hostname1:9092
#inter.broker.listener.name=PLAINTEXT
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:SASL_PLAINTEXT,SASL_PLAINTEXT:SASL_PLAINTEXT

#------>>KERBEROS config----------
sasl.enabled.mechanisms=GSSAPI
sasl.mechanism.inter.broker.protocol=GSSAPI
security.inter.broker.protocol=SASL_PLAINTEXT

listener.name.controller.gssapi.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
useKeyTab=true \
storeKey=true \
keyTab="/etc/security/keytabs/hostname1.keytab" \
principal="kafka/hostname1@REALM";

listener.name.sasl_plaintext.gssapi.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
useKeyTab=true \
storeKey=true \
keyTab="/etc/security/keytabs/hostname1.keytab" \
principal="kafka/hostname1@REALM";

sasl.kerberos.service.name=kafka
#--------KERBEROS config<<--------

SERVER 2:

process.roles=broker,controller
node.id=2
controller.quorum.voters=1@hostname1:9093,2@hostname2:9093

listeners=SASL_PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
advertised.listeners=SASL_PLAINTEXT://hostname2:9092
#inter.broker.listener.name=PLAINTEXT
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:SASL_PLAINTEXT,SASL_PLAINTEXT:SASL_PLAINTEXT

#------>>KERBEROS config----------
sasl.enabled.mechanisms=GSSAPI
sasl.mechanism.inter.broker.protocol=GSSAPI
security.inter.broker.protocol=SASL_PLAINTEXT

listener.name.controller.gssapi.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
useKeyTab=true \
storeKey=true \
keyTab="/etc/security/keytabs/hostname2.keytab" \
principal="kafka/hostname2@REALM";

listener.name.sasl_plaintext.gssapi.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
useKeyTab=true \
storeKey=true \
keyTab="/etc/security/keytabs/hostname2.keytab" \
principal="kafka/hostname2@REALM";

sasl.kerberos.service.name=kafka
#--------KERBEROS config<<--------

And now running the same command is unsuccessful and ends with a timeout:

[root@hostname1 kraft]# /usr/bin/kafka-metadata-quorum --bootstrap-server hostname1:9092 describe --status

and the log shows the repeating error:

INFO [SocketServer listenerType=BROKER, nodeId=1] Failed authentication with /X.X.X.117 (channelId=X.X.X.117:9092-X.X.X.117:41222-1) (Unexpected Kafka request of type METADATA during SASL handshake.) (org.apache.kafka.common.network.Selector)

I'm even calling the command from node 1 against the same node 1, so I don't understand how there could be a misconfiguration (a different authentication method) when the server connects to itself. It's basically a GSSAPI/SASL_PLAINTEXT client connecting to a GSSAPI/SASL_PLAINTEXT server.

Enabling GSSAPI changes just a few lines; everything else in the configuration is unchanged. I cannot find any misconfiguration.

It seems to me that the documentation is not correct, maybe it's missing something, or there is a bug in KRaft and it's not really possible to make Kerberos work with KRaft. :roll_eyes:

I also checked the configuration of another Kafka cluster running an older Confluent release that uses ZooKeeper instead of KRaft. Its GSSAPI/SASL_PLAINTEXT config is literally the same, and there it works.

I think I have provided every config needed, so if anyone can spot what is missing in my configuration or sees an error, it would be much appreciated. Otherwise I don't see how to make KRaft work with Kerberos, and I'm not sure how to proceed, as ZooKeeper is deprecated and KRaft is not functional for me.

Thanks

Figured it out. :raised_hands:

As suggested, the root cause of the problem is a mismatch in authentication between client and server.
And I was looking in the wrong place all this time.

The configuration I posted is indeed correct.
BUT the commands I used to test it were incorrect.

I wrongly assumed that when I run the commands on the cluster itself, they pick up the configuration from the config files I have set, but that's not true. In fact, I need to use the extended command, specifying all the config files, just as I would when running the command from a separate client.

For example, instead of

/usr/bin/kafka-metadata-quorum --bootstrap-server hostname1:9092 describe --status

run this

KAFKA_OPTS="-Djava.security.auth.login.config=/etc/kafka/kraft/kafka_client_jaas.conf" /usr/bin/kafka-metadata-quorum --command-config /etc/kafka/kraft/client.properties --bootstrap-server hostname1:9092 describe --status

The commands require the KAFKA_OPTS variable to be set, but pointing to a different JAAS config, one that defines a KafkaClient section instead of KafkaServer.
kafka_client_jaas.conf:

KafkaClient {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
storeKey=true
keyTab="/etc/security/keytabs/hostname1.keytab"
principal="kafka/hostname1@REALM";
};

and also separate properties file
client.properties:

security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka

I assume the separate JAAS file is not even needed; it should be possible to fit the Krb5LoginModule into the properties file using sasl.jaas.config=.
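
Something along these lines (an untested sketch, using the same placeholder keytab and principal as above):

security.protocol=SASL_PLAINTEXT
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka
sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
useKeyTab=true \
storeKey=true \
keyTab="/etc/security/keytabs/hostname1.keytab" \
principal="kafka/hostname1@REALM";

That would remove the need for both the separate JAAS file and the KAFKA_OPTS variable.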

I hope this helps someone dealing with the same/similar issues!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.