ksqlDB not connecting to Schema Registry in headless mode

Hi All,

My development environment is a docker-compose file that runs version 6.2.0 of all the usual suspects:

zookeeper       2181/tcp, 2888/tcp, 3888/tcp
broker  0.0.0.0:9092->9092/tcp, :::9092->9092/tcp, 0.0.0.0:9101->9101/tcp, :::9101->9101/tcp
control-center  0.0.0.0:9021->9021/tcp, :::9021->9021/tcp
ksqldb-cli
rest-proxy      0.0.0.0:8082->8082/tcp, :::8082->8082/tcp
schema-registry 0.0.0.0:8081->8081/tcp, :::8081->8081/tcp
kafkacat

Plus a couple of SQL Server database mocks and a custom container (based on cp-kafka-connect-base:6.2.0) that installs and configures the Debezium and JDBC connectors for SQL Server:

edipa-onprem-connect    0.0.0.0:8083->8083/tcp, :::8083->8083/tcp, 9092/tcp
parldata        0.0.0.0:21433->1433/tcp, :::21433->1433/tcp
prismdata       0.0.0.0:11433->1433/tcp, :::11433->1433/tcp

This all works as expected.

When I use a vanilla confluentinc/cp-ksqldb-server:6.2.0 image, I can interactively create ksqlDB streams against the topics created by Debezium and JDBC without any issue.

However, if I use a custom container based on the confluentinc/cp-ksqldb-server:6.2.0 image, but with a baked-in queries file:

FROM    confluentinc/cp-ksqldb-server:6.2.0
EXPOSE  8088
USER    appuser
ENV     KSQL_KSQL_SERVICE_ID=MembersKsqlCluster
ENV     KSQL_KSQL_QUERIES_FILE=/config/members-v1.sql

WORKDIR /config
COPY config .

The queries file contains the same statements that worked in interactive mode, but in headless mode they fail when ksqlDB attempts to execute the DDL:

$ docker-compose logs ksqldb-server
ksqldb-server           | [2021-09-10 13:46:29,718] ERROR Failed to start KSQL Server with query file: /config/members-v1.sql (io.confluent.ksql.rest.server.StandaloneExecutor:132)
ksqldb-server           | io.confluent.ksql.util.KsqlStatementException: Schema registry fetch for topic value request failed. Topic: prismdata_cdc_prismdata.dbo.Affiliation
ksqldb-server           | Caused by: Could not connect to the server. Please check the server details are
ksqldb-server           |       correct and that the server is running.
ksqldb-server           | Statement: CREATE STREAM IF NOT EXISTS PRISMDATA_AFFILIATION WITH (KAFKA_TOPIC='prismdata_cdc_prismdata.dbo.Affiliation', KEY_FORMAT='KAFKA', VALUE_FORMAT='AVRO');
ksqldb-server           |       at io.confluent.ksql.schema.ksql.inference.DefaultSchemaInjector.inject(DefaultSchemaInjector.java:92)

The log suggests that ksqldb-server cannot reach Schema Registry at the point where it applies the statements from the queries file.

I’ve tried cranking up all the various ksqlDB timeouts, thinking that my 16 GB Windows 11/WSL2 dev environment might be struggling under the load:

ksqldb-server:    
    image: ${DOCKER_REGISTRY-}edipa/kafka/ksql/members-ksqldb:latest
    #image: confluentinc/cp-ksqldb-server:6.2.0
    hostname: ksqldb-server
    container_name: ksqldb-server
    depends_on:
      - broker
      - schema-registry
      - edipa-onprem-connect
    ports:
      - "8088:8088"
    
#    volumes:
#      - ./extensions:/etc/ksqldb/ext
    environment:
      KSQL_CONFIG_DIR: "/etc/ksql"
      KSQL_LOG4J_OPTS: "-Dlog4j.configuration=file:/etc/ksqldb/log4j.properties"
      KSQL_KSQL_EXTENSION_DIR: "/etc/ksqldb/ext/"
      KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: 'true'
      KSQL_BOOTSTRAP_SERVERS: "broker:29092"
      KSQL_HOST_NAME: ksqldb-server      
      KSQL_LISTENERS: "http://0.0.0.0:8088, http://ksqldb-server:28088"
      KSQL_CACHE_MAX_BYTES_BUFFERING: 0
      KSQL_KSQL_SCHEMA_REGISTRY_URL: "http://schema-registry:28081"
      KSQL_PRODUCER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor"
      KSQL_CONSUMER_INTERCEPTOR_CLASSES: "io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor"
      KSQL_KSQL_CONNECT_URL: "http://edipa-onprem-connect:8083"
      KSQL_KSQL_LOGGING_PROCESSING_TOPIC_REPLICATION_FACTOR: 1
      KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: 'true'
      KSQL_KSQL_SCHEMA_REGISTRY_DISCOVERY_TIMEOUT: 90000
      KSQL_KSQL_SCHEMA_REGISTRY_DISCOVERY_RETRIES: 9

And similarly for Schema Registry:

schema-registry:
    image: confluentinc/cp-schema-registry:6.2.0
    hostname: schema-registry
    container_name: schema-registry
    depends_on:
      - broker
    ports:
      - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: 'broker:29092'
      SCHEMA_REGISTRY_LISTENERS: http://localhost:8081,http://schema-registry:28081
      SCHEMA_REGISTRY_CUB_KAFKA_TIMEOUT: 300
      SCHEMA_REGISTRY_KAFKASTORE_INIT_TIMEOUT_MS: 90000
      SCHEMA_REGISTRY_KAFKASTORE_TIMEOUT_MS: 2000
      SCHEMA_REGISTRY_KAFKASTORE_ZK_SESSION_TIMEOUT_MS: 90000

If I exec into another container on the compose network, I can confirm that schema-registry is up and accepting connections:

$ docker exec -it --user root ksqldb-cli bash
[root@d6db8ad3ecdf appuser]# nc -zv schema-registry 28081
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 172.18.0.8:28081.
Ncat: 0 bytes sent, 0 bytes received in 0.03 seconds.

Interestingly, if I try to hit the Schema Registry API from the Docker host, I get an empty reply:

$ curl -H "Accept: application/vnd.schemaregistry.v1+json" localhost:8081/schemas/ids/1
curl: (52) Empty reply from server

But there are no errors logged by schema-registry:

$ docker-compose logs schema-registry  |grep -i error
schema-registry         | [2021-09-10 13:46:28,879] INFO Finished rebalance with leader election result: Assignment{version=1, error=0, leader='sr-1-168e9190-3ae3-495b-96bc-9e63c7a3cdc7', leaderIdentity=version=1,host=schema-registry,port=8081,scheme=http,leaderEligibility=true} (io.confluent.kafka.schemaregistry.leaderelector.kafka.KafkaGroupLeaderElector)

Not sure how to fix this :confused: Do I just crank all the timeouts to eleven? Is there some problem with the dependencies I’ve specified in the docker-compose file for ksqldb-server or schema-registry?

I’d prefer the dev env to work completely disconnected, but am thinking that maybe the solution is to use ccloud’s broker, ZooKeeper and Schema Registry. I can’t use ccloud for Connect (it has to be self-managed), and would prefer to keep ksqlDB self-managed too.

BTW, I saw @cschwarzfischer’s post indicating that headless mode is being phased out. Is the headless approach no longer the best way to deploy ksqlDB into a production/k8s environment?

Any tips or advice on how best to run ksqlDB in Kubernetes are most welcome.

Thanks all!
_T

An update for the interested:

So the issue appears to be that the SQL Server containers were still doing their setup (attaching DBs, running scripts) when ksqlDB started. This meant that Connect hadn’t had enough time to do its part (creating the topics and registering the schemas) before ksqlDB tried to access the registry.

My interim solution is to define ksqlDB in a second docker-compose file that starts it and attaches it to the Docker network created by the first compose file. I’m running the second compose file manually for now, but I think I can write a bash script that uses jq to check the number of subjects in Schema Registry as a prerequisite to starting ksqlDB, e.g.:

curl -X GET http://localhost:8081/subjects | jq '. | length'
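A minimal sketch of that script, assuming `curl` and `jq` are installed where it runs, that Schema Registry answers on `SR_URL` (defaulting to the `http://localhost:8081` used in the one-liner above), and that you know how many subjects the queries file needs. `subject_count` and `wait_for_count` are hypothetical helper names; the retry loop takes the counting command as arguments so it can be reused for other readiness checks.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for gating the second docker-compose on Schema Registry contents.

# Assumption: Schema Registry is reachable at this URL from where the script runs.
SR_URL="${SR_URL:-http://localhost:8081}"

# Print the number of registered subjects (GET /subjects returns a JSON array of names).
subject_count() {
  curl -fsS "$SR_URL/subjects" | jq 'length'
}

# wait_for_count MIN ATTEMPTS DELAY CMD [ARGS...]
# Re-runs CMD until it prints an integer >= MIN. Gives up (returns 1) after ATTEMPTS
# tries, sleeping DELAY seconds between tries. If CMD fails (e.g. Schema Registry is
# not up yet) or prints something non-numeric, that attempt counts as "not ready".
wait_for_count() {
  local min=$1 attempts=$2 delay=$3
  shift 3
  local i n
  for ((i = 1; i <= attempts; i++)); do
    n=$("$@" 2>/dev/null) || n=""
    if [[ $n =~ ^[0-9]+$ ]] && ((n >= min)); then
      return 0
    fi
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

It could then be used as `wait_for_count 5 60 5 subject_count && docker-compose -f ksqldb.yml up -d`, where the expected count `5`, the 60 × 5 s budget and the `ksqldb.yml` filename are placeholders.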

Alas tho, now that’s out of the way, a new error emerges:

[2021-09-12 14:12:44,041] ERROR Failed to start KSQL Server with query file: /config/members-v1.sql (io.confluent.ksql.rest.server.StandaloneExecutor:132)
ksqldb-server    | io.confluent.ksql.util.KsqlException: The SQL file does not contain any persistent queries. i.e. it contains no 'INSERT INTO', 'CREATE TABLE x AS SELECT' or 'CREATE STREAM x AS SELECT' style statements.
ksqldb-server    |      at io.confluent.ksql.rest.server.StandaloneExecutor.validateStatements(StandaloneExecutor.java:205)

This happens even though I can see that the queries file is there and the environment appears to be set properly:

$ docker run --rm -it --user root edipaacrhddcace.azurecr.io/edipa/kafka/ksql/members-ksqldb:1.0.0 bash
[root@c5e1ea89022a config]# printenv
LANG=C.UTF-8
HOSTNAME=c5e1ea89022a
container=oci
PWD=/config
HOME=/root
KSQL_KSQL_SERVICE_ID=MembersKsqlCluster
COMPONENT=ksqldb-server
TERM=xterm
KSQL_KSQL_QUERIES_FILE=/config/members-v1.sql
SHLVL=1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
CUB_CLASSPATH="/usr/share/java/cp-base-new/*"
KSQL_CLASSPATH=/usr/share/java/ksqldb-server/*
_=/usr/bin/printenv
[root@c5e1ea89022a config]# cat /config/members-v1.sql | head -n 1
CREATE  STREAM IF NOT EXISTS prismdata_Affiliation WITH (KAFKA_TOPIC='prismdata_cdc_prismdata.dbo.Affiliation', VALUE_FORMAT='AVRO');
[root@c5e1ea89022a config]#

Any ideas on why this is happening? Is this related to issue 1530? Or does it not like the CREATE STREAM WITH syntax I’m using?
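The message itself spells out the rule: in headless mode, the queries file must contain at least one persistent query (`INSERT INTO ... SELECT`, `CREATE STREAM x AS SELECT` or `CREATE TABLE x AS SELECT`). The line shown from members-v1.sql is a plain `CREATE STREAM ... WITH (...)`, which only declares a stream over an existing topic and starts no query, so a file made up only of such declarations would trip this check. Below is a rough pre-build check, offered as a sketch: `has_persistent_query` is a hypothetical helper, and it is a regex heuristic rather than ksqlDB's own parser (it strips `--` comments but not `/* */` comments or string literals).

```shell
#!/usr/bin/env bash
# has_persistent_query FILE
# Heuristic: succeeds if FILE appears to contain at least one persistent query,
# i.e. CREATE STREAM/TABLE ... AS SELECT or INSERT INTO ... SELECT.
# INSERT INTO ... VALUES and plain CREATE STREAM ... WITH (...) do not count.
has_persistent_query() {
  # 1. strip "--" line comments
  # 2. join lines so multi-line statements become one line
  # 3. look for "AS SELECT" / "INSERT INTO ... SELECT" inside one statement ([^;]*)
  sed 's/--.*$//' "$1" | tr '\n' ' ' |
    grep -Ei '(CREATE[^;]*[[:space:]]AS|INSERT[[:space:]]+INTO[^;]*)[[:space:]]+SELECT[[:space:]]' >/dev/null
}
```

For example, `has_persistent_query config/members-v1.sql || echo "no persistent queries"` could run before `docker build`.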

Cheers,

The solution to the connection issue was to write a script that guaranteed the database scripts had run, CDC had run, and the Schema Registry entries that the headless queries file relied on had been created.
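The script itself isn't included here. As an illustration only, this is a sketch of just the Schema Registry part of such a check. It assumes the default `TopicNameStrategy`, under which a topic's value schema is registered under the subject `<topic>-value`, and that Schema Registry is reachable at `SR_URL`. `missing_subjects` and `required_subjects_present` are hypothetical names, and the `grep` match is a simplification of proper JSON parsing.

```shell
#!/usr/bin/env bash
# Assumption: Schema Registry URL as seen from where this check runs.
SR_URL="${SR_URL:-http://localhost:8081}"

# missing_subjects JSON TOPIC...
# JSON is the array returned by GET /subjects. Prints "<topic>-value" for each
# topic whose value subject is absent (default TopicNameStrategy naming).
missing_subjects() {
  local json=$1 topic
  shift
  for topic in "$@"; do
    # Match the quoted subject name exactly; -F treats dots in topic names literally.
    if ! grep -qF "\"${topic}-value\"" <<<"$json"; then
      echo "${topic}-value"
    fi
  done
}

# required_subjects_present TOPIC...
# Succeeds only if Schema Registry answers and every topic's value subject exists.
required_subjects_present() {
  local json
  json=$(curl -fsS "$SR_URL/subjects") || return 1
  [[ -z $(missing_subjects "$json" "$@") ]]
}
```

For example, `until required_subjects_present prismdata_cdc_prismdata.dbo.Affiliation; do sleep 5; done` could run before starting the ksqlDB compose file. The topic list would have to mirror the `KAFKA_TOPIC` values in the queries file.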

Hey @_Tim, glad you got it working, and thanks for sharing the solution :+1: