I apologize if this is against the rules, but I posted a question on GitHub and am trying to understand the general process of using a producer (Go, confluent-kafka-go) with Schema Registry for a Protobuf use case.
I haven't gotten a response from anyone on GitHub yet, so I'm trying here to see if anyone can help. Here is the GitHub issue with all the details:
Opened 01:13 AM, 09 Dec 2022 UTC
Description
===========
Hi,
I am new to Kafka Connect, Protobuf serialization, and this entire process in general. My current task is to evaluate the data flow from Kafka topics into TimescaleDB using the JDBC sink connector together with the Kafka Schema Registry. I have almost everything up and running and am now trying to test the end-to-end flow with a sample producer.
However, I have a few general questions about the Kafka Protobuf producer; an example is provided in this repo here: https://github.com/confluentinc/confluent-kafka-go/blob/5ba3caae52b04aaf7b4e22d5e30737b42ca948cd/schemaregistry/serde/protobuf/protobuf.go
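To make my mental model concrete, here is roughly how I understand the setup from that example. This is a sketch only: the `localhost` URLs, the topic name, and the generated `pb` package are placeholders I made up, not something taken from the repo.

```go
package main

import (
	"github.com/confluentinc/confluent-kafka-go/kafka"
	"github.com/confluentinc/confluent-kafka-go/schemaregistry"
	"github.com/confluentinc/confluent-kafka-go/schemaregistry/serde"
	"github.com/confluentinc/confluent-kafka-go/schemaregistry/serde/protobuf"

	// Hypothetical package generated via protoc-gen-go from the .proto files shown below.
	pb "example.com/myproject/gen/proto"
)

func main() {
	// Schema Registry client -- note that only the registry URL is configured here.
	client, err := schemaregistry.NewClient(schemaregistry.NewConfig("http://localhost:8081"))
	if err != nil {
		panic(err)
	}

	// Protobuf serializer for message values, using the default serializer config.
	ser, err := protobuf.NewSerializer(client, serde.ValueSerde, protobuf.NewSerializerConfig())
	if err != nil {
		panic(err)
	}

	producer, err := kafka.NewProducer(&kafka.ConfigMap{"bootstrap.servers": "localhost:9092"})
	if err != nil {
		panic(err)
	}
	defer producer.Close()

	topic := "testrecord" // placeholder topic name

	// Serialize talks to the Schema Registry using a subject it derives from the topic.
	payload, err := ser.Serialize(topic, &pb.TestRecord{ClusterName: "c1", Hostname: "host-1"})
	if err != nil {
		panic(err)
	}

	err = producer.Produce(&kafka.Message{
		TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
		Value:          payload,
	}, nil)
	if err != nil {
		panic(err)
	}
	producer.Flush(15 * 1000)
}
```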
I was hoping someone could explain to me a few things here:
In the referenced example above, only the Schema Registry URL is given. But let's say you have multiple Protobuf schema subjects in the Schema Registry (for example, `proto.testrecord` and `proto.anotherrecord`). In this scenario, the two schemas are:
`proto.testrecord`
```proto
message TestRecord {
  string cluster_name = 1;
  string id = 2;
  string hostname = 3;
  string metric = 4;
  int64 value = 5;
  string value_text = 6;
  int64 timestamp = 7;
}
```
`proto.anotherrecord`
```proto
message AnotherRecord {
  string source_name = 1;
  string map_id = 2;
  string hostname = 3;
  string metric_group = 4;
  int64 value = 5;
  string value_text = 6;
  int64 timestamp = 7;
}
```
Let's say you also have two topics: producer 1 should use the first schema subject and producer 2 should use the second, to validate/conform the data for inserts. Now let's say you are creating the producer for the first topic (producer 1), and it should use the `proto.testrecord` subject from the Schema Registry for serialization. How would you configure/tell the producer to use the correct schema subject, and also an exact version (if multiple versions existed)? Or are you not supposed to specify those because of the way the process works?
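For what it's worth, my current assumption (which I'd like confirmed) is that the serializer never takes a subject name directly: it derives the subject from the topic via a subject-name strategy, which defaults to the topic-name strategy, i.e. `<topic>-value` for a value serializer. Under that assumption, and continuing the sketch above (same `ser` and hypothetical `pb` package), the two producers would each resolve their own subject simply by serializing for their own topic:

```go
// Assumption: with the default topic-name strategy, the subject is derived
// from the topic, not chosen explicitly in code.

// Producer 1 writes to topic "testrecord"; the serializer would resolve the
// subject "testrecord-value" (not "proto.testrecord").
payload1, err := ser.Serialize("testrecord", &pb.TestRecord{Hostname: "host-1"})
if err != nil {
	panic(err)
}

// Producer 2 writes to topic "anotherrecord"; the subject resolved would be
// "anotherrecord-value".
payload2, err := ser.Serialize("anotherrecord", &pb.AnotherRecord{Hostname: "host-2"})
if err != nil {
	panic(err)
}

_, _ = payload1, payload2
```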
I noticed the repo example provided doesn't specify any of that information and I am trying to understand exactly how it knows or defaults to a certain subject and version.
According to this: https://github.com/confluentinc/confluent-kafka-go/blob/5ba3caae52b04aaf7b4e22d5e30737b42ca948cd/schemaregistry/serde/config.go#L36, it looks like you can specify a schema ID (via the `UseSchemaID` field), which tells the serializer which registered schema to use, but I'm unsure how to specify a version.
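To make that part of the question concrete, here is how I would guess those options are meant to be used, based only on the field names in that config file. Again this is a sketch continuing the setup above (same `client`), not something I have confirmed:

```go
// Assumption: instead of auto-registering whatever schema the generated type
// carries, the serializer can be told to reuse an already-registered schema.
serCfg := protobuf.NewSerializerConfig()
serCfg.AutoRegisterSchemas = false // don't register a new schema on produce
serCfg.UseLatestVersion = true     // use the latest version registered under the resolved subject
// serCfg.UseSchemaID = 42         // or pin a specific schema ID -- a registry-wide ID,
//                                 // not a (subject, version) pair

ser2, err := protobuf.NewSerializer(client, serde.ValueSerde, serCfg)
if err != nil {
	panic(err)
}
_ = ser2
```

If that reading is right, pinning an exact version would presumably mean looking up the schema ID for that subject and version via the registry client and passing it as `UseSchemaID`, but that is exactly the sort of thing I am hoping someone can confirm.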
I have probably missed something in my reading and am not understanding this correctly, so I could use some help/discussion.
Any help is appreciated, thanks!
For the benefit of future readers of this post, an answer was posted on the GitHub issue.