Protobuf schema validation surprise/questions

I’m testing out protobuf schemas / validation with schema registry for the first time. I’m using the protobuf serializer/deserializer and auto schema registration.

For a given topic, I first published a compound protobuf msg (UserHeartRate) which uses other msg types defined in the same .proto file as fields. Both the supporting msg definitions and the compound msg definition are defined at the top level of the file – the supporting msg definitions aren’t nested in the compound msg definition.

I then attempted to publish one of the supporting msg types (User) to the same topic, and expected this to fail since it’s not compatible with the UserHeartRate msg schema. But it was serialized and published w/ no errors, which seems to indicate that any msg type defined in a .proto file is acceptable per schema validation, not just the first msg type published. That was surprising and troubling, as it isn’t what I expected.

So I figured maybe the supporting msg types need to be nested within the main, compound msg type to prevent this, so that only the compound type is allowed.

So I did the same exercise with a new topic using a modified version of the .proto that only had UserHeartRate2 defined at the top level and had the supporting types nested. But again, I was able to publish a UserHeartRate2.User2 msg to the topic I’d just published a UserHeartRate2 to.

Why is this possible, and how does one “fix” this behavior to disallow publishing of supporting msg types to a topic where only the top-level msg type should be allowed?

Is my understanding correct that it’s the protobuf serializer that checks the msg compatibility with the schema?

Below are the 2 versions of the .proto file, which show up exactly like this when I view the topic schema in Control Center.

Thanks for your help.

syntax = "proto3";

package com.livongo.protobuf.kafka.internal_non_production.template;

import "livongo/protobuf/common/types/absoluteTimestamp.proto";

message Uuid {
    string value = 1;
}

message User {
    Uuid uuid = 1;
    string pid = 2;
    bool active = 3;
}

message DeviceAssignment {
    string sim_id = 1;
    Uuid user_uuid = 2;
}

message UserHeartRate {
    Uuid user_uuid = 1;
    uint32 rate = 2;
    com.livongo.protobuf.common.types.AbsoluteTimestamp reading_time = 3;
}

syntax = "proto3";

package com.livongo.protobuf.kafka.internal_non_production.template;

import "livongo/protobuf/common/types/absoluteTimestamp.proto";

message UserHeartRate2 {
    Uuid2 user_uuid = 1;
    uint32 rate = 2;
    com.livongo.protobuf.common.types.AbsoluteTimestamp reading_time = 3;

    message Uuid2 {
        string value = 1;
    }

    message User2 {
        Uuid2 uuid = 1;
        string pid = 2;
        bool active = 3;
    }

    message DeviceAssignment2 {
        string sim_id = 1;
        Uuid2 user_uuid = 2;
    }
}

What you got is expected. The serializer also encodes which message type was used into the bytes written to Kafka, so any message defined in the registered schema, top-level or nested, can be serialized against it. If you want a 1-to-1 relation between the schema and the message, the schema must contain only one message. You can do that, for example, by using schema references to bring in the other messages instead of defining them all in the same schema.
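
To make the "message is encoded in the bytes" point concrete, here is a minimal Python sketch that reads the header the Confluent Protobuf serializer puts in front of each payload. It assumes the documented wire format: a magic byte 0, a 4-byte big-endian schema ID, then a list of message indexes written as zig-zag varints (length first), with a single 0 byte as shorthand for "first message in the file". The function names and the schema ID 42 used in the notes are made up for illustration; this is not code from any Confluent library.

```python
import struct


def read_zigzag_varint(buf, pos):
    """Decode one zig-zag varint starting at buf[pos]; return (value, next_pos)."""
    shift = 0
    raw = 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this varint
            break
        shift += 7
    # Undo zig-zag: even raw values are non-negative, odd ones negative.
    return (raw >> 1) ^ -(raw & 1), pos


def parse_header(payload):
    """Return (schema_id, message_indexes, offset_of_protobuf_data)."""
    magic, schema_id = struct.unpack_from(">bI", payload, 0)
    if magic != 0:
        raise ValueError("payload does not start with magic byte 0")
    count, pos = read_zigzag_varint(payload, 5)
    if count == 0:
        # Shorthand for the path [0]: the first top-level message.
        return schema_id, [0], pos
    indexes = []
    for _ in range(count):
        idx, pos = read_zigzag_varint(payload, pos)
        indexes.append(idx)
    return schema_id, indexes, pos
```

With the first .proto above, a UserHeartRate record would carry the index path [3] and a User record the path [1] (Uuid is 0, User 1, DeviceAssignment 2, UserHeartRate 3). With the nested version, UserHeartRate2 is [0] and UserHeartRate2.User2 is [0, 1]. The schema itself is the same in each case, so presumably only the index path differs between the records, which is why nesting doesn't change the outcome.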