Debezium SQL Server v2: Best practices for handling Schema Namespace fragmentation across multiple DB shards (SpecificRecord issue)

Hi everyone,

I am looking for advice on how to handle Avro schema namespaces when ingesting data from multiple identical SQL Server databases (sharding scenario) using the FullyManaged SQL Server Connector v2.

The Scenario: We have N databases with identical schemas (e.g., ShardedDB_01, ShardedDB_02…) writing to the same Kafka topics. In our legacy pipeline (Connector v1), the default behavior produced a uniform namespace for all records, regardless of the physical source database. This allowed our Java consumers to use SpecificRecord with a single class generated from one .avsc file for all incoming messages, without any custom configuration.

The Problem with v2: The Debezium v2 connector now generates schemas where the database name is hardcoded into the namespace by default:

  • Source A: my.prefix.ShardedDB_01.dbo.MyTable
  • Source B: my.prefix.ShardedDB_02.dbo.MyTable

Even though the fields are identical, Schema Registry treats these as completely different schemas/namespaces.

The Impact: Because of this fragmentation, we can no longer use the standard SpecificRecord deserialization. The consumer expects a specific class (e.g., com.mycompany.avro.MyTable), but receives records whose schemas point to dynamic, DB-specific namespaces. We are forced to fall back to GenericRecord and lose type safety, which is a significant regression from our v1 experience.

My Questions:

  1. Source Side (Normalization): Is there a native Debezium v2 configuration or a standard SMT pattern to exclude the database name from the namespace (restoring the v1-like uniform behavior: my.prefix.dbo.MyTable)?
     • Note: We attempted a recursive custom SMT to rewrite the namespace, but traversing Debezium’s deeply nested structures (before/after structs) proved too memory-intensive, causing OOM errors on high-throughput workers.
  2. Consumer Side: If normalization at the source is not possible, how do you handle SpecificRecord deserialization in a sharded v2 scenario? Is there a standard way to map multiple writer schemas to a single reader schema without maintaining N duplicate POJOs?
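
On the memory concern in the note under question 1: one thing that might help is rewriting each distinct source schema once and reusing the result, rather than rebuilding it for every record. Below is a minimal sketch of a bounded LRU memo cache using only the JDK. The class name RewriteCache and the rewrite function passed to it are hypothetical and not part of Debezium or Kafka Connect; whether this removes the OOM depends on where the memory is actually going.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical helper (not a Debezium or Kafka Connect class): a bounded,
 * thread-safe memo cache so an expensive rewrite runs once per distinct key
 * (for example, once per source schema) instead of once per record.
 */
final class RewriteCache<K, V> {
    private final Map<K, V> cache;
    private final Function<K, V> rewrite;
    private int misses = 0;

    RewriteCache(final int maxEntries, Function<K, V> rewrite) {
        this.rewrite = rewrite;
        // Access-ordered LinkedHashMap: the eldest entry is the least recently used.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached result, computing it on first use (rewrite must not return null). */
    synchronized V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = rewrite.apply(key);
            misses++;
            cache.put(key, value);
        }
        return value;
    }

    /** Number of times the rewrite function actually ran. */
    synchronized int misses() {
        return misses;
    }
}
```

With N shards and a fixed set of tables, the number of distinct schemas is small, so a cache sized slightly above that count would keep the rewrite off the per-record path.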

Any help to restore the “uniform schema” capability would be greatly appreciated.
Thanks!

Hi @229178

are you looking for something like this:

hth,

michael

Hi Michael, thanks for the link.

However, SetSchemaMetadata appears to support only static schema name definitions.

Since I am ingesting N different tables from X different databases, I cannot hardcode a single static value; I need the schema names to be generated dynamically.
Additionally, SetSchemaMetadata is shallow and does not fix the nested before/after namespace issues, leaving the internal structures incompatible for SpecificRecord.
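
To make the "dynamic" requirement concrete, here is a minimal sketch of the name mapping I have in mind, using only the JDK. It assumes every shard name matches a known pattern (ShardedDB_ followed by digits, as in my examples) and sits directly after the prefix; the class NamespaceNormalizer is hypothetical, not an existing Debezium or Connect API. It only computes the target name. Applying it to the Connect schemas, including the nested before/after structs, would still need a transform that rebuilds those schemas.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: strips the database segment from a schema name of the
 * form <prefix>.<database>.<schema>.<table>[.<suffix>].
 */
final class NamespaceNormalizer {
    private final String prefix;
    private final Pattern pattern;

    /**
     * @param prefix        the leading namespace, e.g. "my.prefix"
     * @param databaseRegex regex matching a shard database name, e.g. "ShardedDB_\\d+"
     */
    NamespaceNormalizer(String prefix, String databaseRegex) {
        this.prefix = prefix;
        this.pattern = Pattern.compile(
                "^" + Pattern.quote(prefix) + "\\.(?:" + databaseRegex + ")\\.(.+)$");
    }

    /** Returns the name without the database segment, or the input unchanged if it does not match. */
    String normalize(String schemaName) {
        if (schemaName == null) {
            return null;
        }
        Matcher m = pattern.matcher(schemaName);
        return m.matches() ? prefix + "." + m.group(1) : schemaName;
    }
}
```

For example, with prefix my.prefix and database pattern ShardedDB_\d+, both my.prefix.ShardedDB_01.dbo.MyTable and my.prefix.ShardedDB_02.dbo.MyTable map to my.prefix.dbo.MyTable, and a trailing segment such as .Value is preserved. Names that do not contain a shard segment are returned unchanged, so the mapping is safe to apply more than once.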