How to enable validation of Avro default fields in Confluent Cloud Schema Registry?

I know how to enable validation of Avro defaults in Schema Registry on Confluent Platform.
You just need to set “avro.validate.defaults=true” in your cp-schema-registry container (or make the equivalent change to its server config if running outside of containers).
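For reference, the cp-schema-registry Docker image maps server configs to environment variables (prefix SCHEMA_REGISTRY_, dots become underscores), so “avro.validate.defaults” would look roughly like this in a docker-compose file — a sketch based on that standard convention, with the image tag and other required settings elided:

```
schema-registry:
  image: confluentinc/cp-schema-registry
  environment:
    SCHEMA_REGISTRY_AVRO_VALIDATE_DEFAULTS: "true"
```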

But I can’t find a way to enable that in Confluent Cloud’s Schema Registry. Is it possible?

I notice that if one enters a schema via Confluent Cloud Console, the UI provides a “validate” button which will complain if the schema contains invalid default fields (for example, “default: null, type: string”). But if one publishes a message to Confluent Cloud using such a schema (generated offline), Schema Registry allows the schema. (And if you then try to view the messages in the corresponding topic via Confluent Cloud Console’s message viewer, they appear as raw bytes, evidently due to a deserialization error parsing the invalid default in the schema.)
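For concreteness, a schema with that kind of invalid default looks like this (hypothetical record). Per the Avro spec, a field’s default must match the field’s type; null is only a valid default for type "null", or for a union whose first branch is "null":

```
{
  "type": "record",
  "name": "Example",
  "fields": [
    {"name": "note", "type": "string", "default": null}
  ]
}
```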

So it appears the Confluent Cloud Console applies “avro.validate.defaults”, but Confluent Cloud Schema Registry does not. I would like to find a way to configure Confluent Cloud Schema Registry to validate Avro defaults. Has anyone found a way to do this?

hey @Ben

did you check Broker-Side Schema Validation?
That’s something else. “Broker-Side Schema Validation” means that when a producer publishes a message, the broker would check that the message references a schema.

i.e. that’s for validating messages (the feature is confusingly named: it’s not validating any schemas, it’s validating messages to be sure they were formatted according to a registered schema).
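For contrast, broker-side schema validation is enabled per topic on Confluent Server / Confluent Cloud via a topic property — a sketch, with the broker address and topic name as placeholders:

```
kafka-topics --bootstrap-server <broker> --create --topic orders \
  --config confluent.value.schema.validation=true
```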

I’m asking about validating default fields within Avro schemas.

sorry, I misunderstood at first glance

for some testing, some time ago, I’ve used GitHub - leocalm/avro_validator: a pure Python Avro schema validator

I’m aware of many ways to validate Avro schemas, including their defaults. But I work at a large company and would like to enable Avro default field validation in Confluent Cloud’s Schema Registry so that no invalid schemas are allowed. That’s safer than relying on all developers always validating their schemas via offline tools before publishing. Stream Governance should include schema quality, and it could, if Confluent Cloud offered a way to enable “avro.validate.defaults” in its Schema Registry.

p.s. It’s unfortunate that the Avro community gave so much importance to backwards compatibility that they made validation of defaults an opt-in feature. This issue could have been avoided completely if “avro.validate.defaults” were the norm everywhere.

Update: currently there is no way to enable this in Confluent Cloud.

One workaround is that you can invoke validation of default fields when posting to Confluent Cloud Schema Registry by passing an additional querystring flag:
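A sketch of that per-request workaround, assuming the flag in question is the normalize=true query parameter on the registration endpoint (the update later in this thread ties normalization to default validation); endpoint, subject, credentials, and schema body are all placeholders:

```
curl -u "$SR_API_KEY:$SR_API_SECRET" \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -X POST "https://<sr-endpoint>/subjects/orders-value/versions?normalize=true" \
  --data '{"schemaType": "AVRO", "schema": "..."}'
```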


But most clients don’t explicitly call Schema Registry. That’s done automatically by client libraries, so depending on your stack you’d need to make some custom patch to your client library, or pre-register your schema ahead of time. That workaround doesn’t solve our use case, which is a company with a large number of developers working with a range of tech stacks. We want the Confluent Cloud Schema Registry to validate default fields automatically, not only when an additional flag is passed.

There’s a Confluent feature request to add an option to configure Schema Registry to validate default fields in Avro: FF-11188 (if you too want this feature, please help lobby Confluent to prioritize implementing it).

Per discussion, Schema Registry is multi-tenant, even if your cluster is dedicated (a surprise to me), so simply updating the Schema Registry server config is apparently not an option due to its impact on other customers who have (and want?) schemas with invalid default fields.

I doubt many customers genuinely want schemas with invalid default fields (what would be the benefit?); I suspect this is another unintended consequence of the Avro community choosing to make Avro default field validation opt-in rather than opt-out.

I may be misunderstanding the functionality, but last I read the Avro spec, defaults are only applicable to the consumer/deserializer, and not used as part of compatibility validation when introduced by a serializer/producer-request?

In any case, another alternative workaround would be to run Schema Registry on your own, pointed at the same _schemas topic but with the server configs that you need, then migrate your clients to that.

Defaults are part of compatibility validation. With FORWARD compatibility, for example, removing a field is allowed if the field was optional (had a default) and not if the field was required (link). It’s true that default values (what to actually substitute for a missing field) are only relevant to a consumer, not a producer, but when a producer publishes a schema, that schema is evaluated for its overall fitness (whether it’s syntactically valid, whether it’s compatible, etc.), not just its fitness from the POV of a producer.
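A hypothetical illustration: under FORWARD compatibility, evolving v1 to v2 below is allowed because “note” had a default; dropping “id” instead would be rejected because it’s required:

```
v1:
{"type": "record", "name": "Order", "fields": [
  {"name": "id", "type": "long"},
  {"name": "note", "type": ["null", "string"], "default": null}]}

v2 ("note" removed -- allowed under FORWARD because it was optional):
{"type": "record", "name": "Order", "fields": [
  {"name": "id", "type": "long"}]}
```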

Agree that running Schema Registry on our own is a possible workaround, but then we’d need to 1) tell all developers in all our different products to ignore the online client code provided by Confluent Cloud and instead use our alternate Schema Registry, with its own authentication/RBAC/etc., which would need to be managed separately, and 2) lose related Stream Governance features such as tags, business metadata, discoverability via search in Confluent Cloud, etc.

For now we’re simply urging all developers to validate default fields on their own before posting to Confluent Cloud Schema Registry. That’s better than nothing, but not as safe as having SR reject invalid schemas.
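To make that easier to automate, a team could wire a pre-flight check into CI. The following is an illustrative sketch only — a minimal hand-rolled check, not a complete Avro validator (a real project should use a proper library such as the avro_validator mentioned above) — showing the core idea of matching a field’s default against its declared type:

```python
import json

# Simplified sketch -- NOT a full Avro validator. It only checks that
# top-level record field defaults match their declared primitive type.
# Per the Avro spec, a union field's default must match the FIRST branch.
_PRIMITIVE_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "double": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "bytes": lambda v: isinstance(v, str),  # bytes defaults are JSON strings
}

def invalid_defaults(schema_json):
    """Return the names of record fields whose default doesn't match their type."""
    schema = json.loads(schema_json)
    bad = []
    for field in schema.get("fields", []):
        if "default" not in field:
            continue
        ftype = field["type"]
        if isinstance(ftype, list):  # union: default must match first branch
            ftype = ftype[0]
        check = _PRIMITIVE_CHECKS.get(ftype)
        if check is not None and not check(field["default"]):
            bad.append(field["name"])
    return bad

# The invalid example from earlier in the thread: a string field with a null default.
schema = ('{"type": "record", "name": "Example", "fields": '
          '[{"name": "note", "type": "string", "default": null}]}')
print(invalid_defaults(schema))  # → ['note']
```

Running this against each schema file before registration at least catches the “default: null, type: string” class of mistakes, though it covers far less than the real “avro.validate.defaults” check would.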

You can now turn on automatic normalization for schemas, which will cause defaults to be validated as well. Use PUT /config and set the normalize flag to true. See Schema Registry API Reference | Confluent Documentation
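A sketch of that call (endpoint and credentials are placeholders; a per-subject variant would target /config/<subject> instead):

```
curl -u "$SR_API_KEY:$SR_API_SECRET" \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -X PUT "https://<sr-endpoint>/config" \
  --data '{"normalize": true}'
```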

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.