ksqlDB always creates a new Avro schema version

Hi everyone!

When I create this stream:

create stream `TEST` with (key_format='kafka',value_format='avro',partitions=3,value_avro_schema_full_name='Namespace.Avro.Test') as 
      select
        extractjsonfield(event, '$.keyId'),
        cast(extractjsonfield(event, '$.Field1') as int) `Field1`
      from events
      partition by extractjsonfield(event, '$.keyId')
      emit changes;

It always creates a new schema version, but I want it to use the latest existing schema version.

My docker-compose.yml file contains this section:

  ksqldb-server:
    image: confluentinc/cp-ksqldb-server:6.1.1
    hostname: ksqldb-server
    container_name: ksqldb-server
    ports:
      - "8088:8088"
    environment:
      KSQL_LISTENERS: "http://0.0.0.0:8088"
      KSQL_BOOTSTRAP_SERVERS: "broker:29092"
      KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: 'true'
      KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: 'true'
      KSQL_SECURITY_PROTOCOL: SSL
      KSQL_KSQL_SCHEMA_REGISTRY_URL: http://schema-registry:8081 
      KSQL_AUTO_REGISTER_SCHEMAS: "false"
      KSQL_USE_LATEST_VERSION: "true"

Are the last two environment variables correct?

Can you help me please?

Thanks

This is currently a known limitation in ksqlDB. See “Support using existing output schema”, Issue #5256 in confluentinc/ksql on GitHub.

Hi mikebin, thanks for your reply!

Why does the “create stream” command create two schema versions? The two versions are nearly identical; the only difference is the “connect.name” field.
The “create stream” command:

create stream `TEST` with (key_format='kafka',value_format='avro',partitions=3,value_avro_schema_full_name='Namespace.Avro.Test') as 
      select
        extractjsonfield(event, '$.keyId'),
        cast(extractjsonfield(event, '$.Field1') as int) `Field1`
      from events
      partition by extractjsonfield(event, '$.keyId')
      emit changes;

Can I avoid creating the schema version with the “connect.name” field?
The schema version with the “connect.name” field:

{
    "subject": "TEST-value",
    "version": 1,
    "id": 83,
    "schema": "{\"type\":\"record\",\"name\":\"Test\",\"namespace\":\"Namespace.Avro\",\"fields\":[{\"name\":\"Field1\",\"type\":[\"null\",\"int\"],\"default\":null}],\"connect.name\":\"Namespace.Avro.Test\"}"
}

Thanks

The issue of two schema versions being registered looks related to “New schema version registered on INSERT VALUES”, Issue #6091 in confluentinc/ksql on GitHub.
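
To confirm that two registered versions really differ only by the “connect.name” attribute, a small check like the one below may help. It is only a sketch: the schema without “connect.name” is an assumption (the thread shows only the version that has it), and the helper names strip_connect_name and differ_only_by_connect_name are hypothetical.

```python
import json


def strip_connect_name(schema):
    """Return a copy of a parsed Avro schema with every "connect.name" key removed."""
    if isinstance(schema, dict):
        return {
            key: strip_connect_name(value)
            for key, value in schema.items()
            if key != "connect.name"
        }
    if isinstance(schema, list):
        return [strip_connect_name(item) for item in schema]
    return schema


def differ_only_by_connect_name(schema_a: str, schema_b: str) -> bool:
    """True if the schemas are different, but equal once "connect.name" is ignored."""
    a = json.loads(schema_a)
    b = json.loads(schema_b)
    return a != b and strip_connect_name(a) == strip_connect_name(b)


# Schema string taken from the version shown in the thread.
with_connect = (
    '{"type":"record","name":"Test","namespace":"Namespace.Avro",'
    '"fields":[{"name":"Field1","type":["null","int"],"default":null}],'
    '"connect.name":"Namespace.Avro.Test"}'
)
# Assumed shape of the other version: identical, minus "connect.name".
without_connect = (
    '{"type":"record","name":"Test","namespace":"Namespace.Avro",'
    '"fields":[{"name":"Field1","type":["null","int"],"default":null}]}'
)

print(differ_only_by_connect_name(with_connect, without_connect))
```

The "schema" value returned by Schema Registry for each version can be passed in directly as a string, since it is itself JSON-encoded.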

Hi mikebin, thank you very much for your reply!

Does ksqlDB currently support Avro unions with schema references?
Avro unions link: https://www.confluent.io/blog/multiple-event-types-in-the-same-kafka-topic/

Is this the issue related to Avro unions in ksqlDB?

Thanks in advance

ksqlDB does not currently have specific support for union schemas as described in the blog post you referenced. However, you could define a “superset” schema that consists of all the possible fields across all schemas used in a topic. The GitHub issue you referenced is relevant, along with “Support multi schema topics”, Issue #1267 in confluentinc/ksql on GitHub.
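
As a rough illustration of the “superset” idea, the sketch below merges the fields of several Avro record schemas into one record in which every field is nullable with a null default. The event schemas (EventA, EventB), the field Field2, the record name AllEvents, and the helper names are all hypothetical, and this only builds the schema; it does not show how ksqlDB would consume it.

```python
import json


def nullable(avro_type):
    """Return a union type with "null" first, so that a null default is valid."""
    if isinstance(avro_type, list):
        return ["null"] + [t for t in avro_type if t != "null"]
    return ["null", avro_type]


def superset_schema(name, namespace, schemas):
    """Merge the fields of several record schemas into one all-optional record."""
    fields = {}
    for schema in schemas:
        for field in schema["fields"]:
            merged_type = nullable(field["type"])
            existing = fields.get(field["name"])
            if existing is not None and existing["type"] != merged_type:
                raise ValueError(f"conflicting types for field {field['name']!r}")
            fields[field["name"]] = {
                "name": field["name"],
                "type": merged_type,
                "default": None,
            }
    return {
        "type": "record",
        "name": name,
        "namespace": namespace,
        "fields": list(fields.values()),
    }


# Hypothetical event types sharing one topic.
event_a = {
    "type": "record",
    "name": "EventA",
    "fields": [
        {"name": "keyId", "type": "string"},
        {"name": "Field1", "type": "int"},
    ],
}
event_b = {
    "type": "record",
    "name": "EventB",
    "fields": [
        {"name": "keyId", "type": "string"},
        {"name": "Field2", "type": "string"},
    ],
}

merged = superset_schema("AllEvents", "Namespace.Avro", [event_a, event_b])
print(json.dumps(merged, indent=2))
```

Fields that appear in only one event type are simply null for the others. The sketch rejects fields that share a name but not a type, since a single superset column cannot represent both.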