How to enable validation of Avro default fields in Confluent Cloud Schema Registry?

I know how to enable validation of Avro defaults in Schema Registry on Confluent Platform.
You just need to set:

SCHEMA_REGISTRY_SCHEMA_PROVIDERS_AVRO_VALIDATE_DEFAULTS: 'true'

in your cp-schema-registry container (or make the equivalent change to its server config if running outside of containers).

But I can’t find a way to enable that in Confluent Cloud’s Schema Registry. Is it possible?

I notice that if one enters a schema via Confluent Cloud Console, the UI provides a “validate” button which will complain if the schema contains invalid default fields (for example, “default: null, type: string”). But if one publishes a message to Confluent Cloud using such a schema (generated offline), Schema Registry allows the schema. (And if you then try to view the messages in the corresponding topic via Confluent Cloud Console’s message viewer, they appear as raw bytes, evidently due to a deserialization error parsing the invalid default in the schema.)
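To make the "invalid default" case concrete, here is a minimal sketch of the kind of check involved. It is not the validator that Schema Registry or the Avro library uses; it only looks at top-level record fields with primitive types, and the function name `invalid_defaults` and the sample schema are made up for illustration.

```python
import json


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


# Simplified JSON-side checks for some Avro primitive types.
# Complex types (unions, records, arrays, ...) and bytes are skipped.
PRIMITIVE_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": _is_int,
    "long": _is_int,
    "float": lambda v: _is_int(v) or isinstance(v, float),
    "double": lambda v: _is_int(v) or isinstance(v, float),
    "string": lambda v: isinstance(v, str),
}


def invalid_defaults(schema_json):
    """Return names of top-level fields whose default does not fit a primitive type."""
    schema = json.loads(schema_json)
    bad = []
    for field in schema.get("fields", []):
        field_type = field.get("type")
        if "default" not in field or not isinstance(field_type, str):
            continue  # no default, or a complex type this sketch does not handle
        check = PRIMITIVE_CHECKS.get(field_type)
        if check is not None and not check(field["default"]):
            bad.append(field["name"])
    return bad


# Hypothetical schema: "broken" declares type string but defaults to null.
SCHEMA = """
{"type": "record", "name": "Example", "fields": [
  {"name": "ok", "type": "string", "default": ""},
  {"name": "broken", "type": "string", "default": null}
]}
"""

print(invalid_defaults(SCHEMA))
```

With the sample schema above, only the `broken` field is reported.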

So it appears the Confluent Cloud Console applies “avro validate default”, but Confluent Cloud Schema Registry does not. I would like to find a way to configure Confluent Cloud Schema Registry to validate Avro defaults. Has anyone found a way to do this?

hey @Ben

did you check

Best,
Michael

That’s something else. “Broker-Side Schema Validation” means that when a producer publishes a message, the broker would check that the message references a schema.

I.e., that’s for validating messages. The feature is confusingly named: it’s not validating any schemas, it’s validating messages to be sure they were formatted according to a schema.

I’m asking about validating default fields within Avro schemas.

Sorry, I misunderstood at first sight.

For some testing I used http://avro.tarantool.org/ some time ago,
as well as leocalm/avro_validator on GitHub (a pure Python Avro schema validator).

I’m aware of many ways to validate Avro schemas, including their defaults. But I work at a large company and would like to enable Avro default field validation in Confluent Cloud’s Schema Registry so that no invalid schemas are allowed. That’s safer than relying on all developers always validating their schemas via offline tools before publishing. Stream Governance should include schema quality, and it could, if Confluent Cloud would offer a way to enable “avro.validate.defaults” in its Schema Registry.

P.S. It’s unfortunate that the Avro community gave so much importance to backwards compatibility that they made validation of defaults an opt-in feature. This issue could have been avoided completely if “avro.validate.defaults” were the norm everywhere.

Update: currently there is no way to enable this in Confluent Cloud.

One workaround is that you can invoke validation of default fields when posting to Confluent Cloud Schema Registry by passing an additional querystring flag:

subjects/[subject-name]/versions?normalize=true

But most clients don’t explicitly call Schema Registry. That’s done automatically by client libraries, so depending on your stack you’d need to make some custom patch to your client library, or pre-register your schema ahead of time. That workaround doesn’t solve our use case, which is a company with a large number of developers working with a range of tech stacks. We want the Confluent Cloud Schema Registry to validate default fields automatically, not only when an additional flag is passed.
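For the "pre-register your schema ahead of time" route, the request could be built roughly as follows. This is only a sketch using the Python standard library: the helper name `build_register_request`, the base URL, and the credentials are placeholders, and it assumes HTTP basic auth with a Schema Registry API key and secret. It builds the request without sending it.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_register_request(base_url, subject, schema_str, api_key, api_secret):
    """Build (but do not send) a POST that registers a schema with normalize=true."""
    quoted_subject = urllib.parse.quote(subject, safe="")
    url = f"{base_url.rstrip('/')}/subjects/{quoted_subject}/versions?normalize=true"
    # The schema itself travels as a JSON string inside the request body.
    body = json.dumps({"schemaType": "AVRO", "schema": schema_str}).encode("utf-8")
    token = base64.b64encode(f"{api_key}:{api_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/vnd.schemaregistry.v1+json",
            "Authorization": f"Basic {token}",
        },
    )


# Placeholder values; substitute your own endpoint, subject, and credentials.
request = build_register_request(
    "https://schema-registry.example.com",
    "orders-value",
    '{"type": "record", "name": "Order", "fields": []}',
    "API_KEY",
    "API_SECRET",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen(request)` would actually send it; a schema rejected by the registry would then surface as an `urllib.error.HTTPError`.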

There’s a Confluent feature request to add an option to configure Schema Registry to validate default fields in Avro: FF-11188 (if you too want this feature, please help lobby Confluent to prioritize implementing it).

Per discussion, Schema Registry is multi-tenant, even if your cluster is dedicated (a surprise to me), so simply updating the Schema Registry server config is apparently not an option due to its impact on other customers who have (and want?) schemas with invalid default fields.

I doubt many customers genuinely want schemas with invalid default fields (what would be the benefit?). I suspect this is another unintended consequence of the Avro community choosing to make Avro default field validation opt-in rather than opt-out.

I may be misunderstanding the functionality, but last I read the Avro spec, defaults are only applicable to the consumer/deserializer, and not used as part of compatibility validation when introduced by a serializer/producer-request?

In any case, another alternative workaround would be to run Schema Registry on your own, pointed at the same _schemas topic, but with a different group.id, then migrate your clients to that, with the server configs that you need.

Defaults are part of compatibility validation. If you have forwards compatibility for example and you remove a field, that’s allowed if the field was optional (had a default) and not if the field was required (link). It’s true that default fields (what to actually use as a default) are only relevant to a consumer, not a producer, but when a producer publishes a schema, that schema is evaluated for its overall fitness (whether it’s syntactically valid, whether it’s compatible, etc.), not just its fitness from the POV of a producer.

Agree that running Schema Registry on our own is a possible workaround, but then we’d 1) need to tell all developers in all our different products to ignore the online client code provided by Confluent Cloud and instead use our alternate schema registry, with its own authentication/RBAC/etc., which would need to be managed separately, and 2) lose related Stream Governance features such as tags, business metadata, discoverability via search in Confluent Cloud, etc.

For now we’re simply urging all developers to validate default fields on their own before posting to Confluent Cloud Schema Registry. That’s better than nothing, but not as safe as having SR reject invalid schemas.

You can now turn on automatic normalization for schemas, which will cause defaults to be validated as well. Use PUT /config and set the normalize flag to true. See the Schema Registry API Reference in the Confluent Documentation.
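As a rough illustration of that call, the sketch below builds a `PUT /config` request with `normalize` set to true, using only the Python standard library. The helper name `build_normalize_config_request`, the base URL, and the credentials are placeholders, basic auth with an API key and secret is assumed, and the request is built but not sent; I have not verified it against a live Confluent Cloud registry.

```python
import base64
import json
import urllib.request


def build_normalize_config_request(base_url, api_key, api_secret):
    """Build (but do not send) a PUT /config request that sets normalize to true."""
    url = f"{base_url.rstrip('/')}/config"
    body = json.dumps({"normalize": True}).encode("utf-8")
    token = base64.b64encode(f"{api_key}:{api_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/vnd.schemaregistry.v1+json",
            "Authorization": f"Basic {token}",
        },
    )


# Placeholder endpoint and credentials; substitute your own.
config_request = build_normalize_config_request(
    "https://schema-registry.example.com", "API_KEY", "API_SECRET"
)
print(config_request.get_method(), config_request.full_url)
```

Sending it would be a matter of calling `urllib.request.urlopen(config_request)`.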
