Hi everyone,
I am looking for advice on how to handle Avro schema namespaces when ingesting data from multiple identical SQL Server databases (sharding scenario) using the FullyManaged SQL Server Connector v2.
The Scenario: We have N databases with identical schemas (e.g., ShardedDB_01, ShardedDB_02…) writing to the same Kafka topics. In our legacy pipeline (Connector v1), the default behavior produced a uniform namespace for all records, regardless of the physical source database. This allowed our Java Consumers to simply use SpecificRecord with a single generated avsc class for all incoming messages, without any custom configuration.
The Problem with v2: The Debezium v2 connector now generates schemas where the database name is hardcoded into the namespace by default:
- Source A: my.prefix.ShardedDB_01.dbo.MyTable
- Source B: my.prefix.ShardedDB_02.dbo.MyTable
Even though the fields are identical, Schema Registry treats these as completely different schemas/namespaces.
The Impact: Because of this fragmentation, we cannot use the standard SpecificDeserializer anymore. The consumer expects a specific class (e.g., com.mycompany.avro.MyTable), but receives records whose schemas point to dynamic, DB-specific namespaces. We are forced to fall back to GenericRecord and lose type safety, which is a significant regression from our v1 experience.
My Questions:
- Source Side (Normalization): Is there a native Debezium v2 configuration or a standard SMT pattern to exclude the database name from the namespace, restoring the v1-like uniform behavior (my.prefix.dbo.MyTable)?
- Note: We attempted a recursive custom SMT to rewrite the namespace, but traversing Debezium’s complex, deep structures (before/after structs) proved too memory-intensive, causing OOM errors on high-throughput workers.
- Consumer Side: If normalization at the source is not possible, how do you handle SpecificRecord deserialization in a sharded v2 scenario? Is there a standard way to map multiple writer schemas to a single reader schema without maintaining N duplicate POJOs?
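To make the target concrete, here is a minimal sketch in plain Java of the name mapping we are after. It is only string manipulation, not a Debezium or Schema Registry API. The class name NamespaceNormalizer and the ShardedDB_<digits> pattern are assumptions for illustration, based on our shard naming:

```java
import java.util.regex.Pattern;

// Hypothetical helper: illustrates the desired mapping only.
final class NamespaceNormalizer {

    // Assumed shard naming convention: ".ShardedDB_" followed by digits,
    // matched only when another name segment follows it.
    private static final Pattern SHARD_SEGMENT =
            Pattern.compile("\\.ShardedDB_\\d+(?=\\.)");

    // Drops the shard database segment from a fully qualified schema name.
    // Names without such a segment are returned unchanged.
    static String normalize(String schemaName) {
        return SHARD_SEGMENT.matcher(schemaName).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(normalize("my.prefix.ShardedDB_01.dbo.MyTable"));
        System.out.println(normalize("my.prefix.ShardedDB_02.dbo.MyTable"));
    }
}
```

Both calls should yield my.prefix.dbo.MyTable, which is the single name our generated class was built against in v1.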
Any help to restore the “uniform schema” capability would be greatly appreciated.
Thanks!