Ingest protobuf data from an API endpoint and produce to a Kafka topic as Avro

Hi all!

I’m pretty new to Kafka and the world of streaming. I have a Python app that gets data from an API endpoint, where it’s serialized as protobuf. Currently, I am deserializing to JSON using Google’s protobuf API (from google.protobuf.json_format import MessageToJson) and producing that to a Kafka topic. However, I want to produce this data as Avro; essentially, I need to convert protobuf to Avro somehow.
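For reference, here’s roughly what my current pipeline looks like (the my_pb2.MyEvent class, the endpoint, and the topic name are all placeholders, not my real names):

```python
import requests
from confluent_kafka import Producer
from google.protobuf.json_format import MessageToJson

import my_pb2  # placeholder: module generated by protoc from the .proto file

producer = Producer({"bootstrap.servers": "localhost:9092"})

# Fetch the protobuf-serialized payload from the API (placeholder URL).
resp = requests.get("https://api.example.com/events")

# Parse the raw bytes into the generated message class.
event = my_pb2.MyEvent()
event.ParseFromString(resp.content)

# Convert to a JSON string and produce it -- the step I want to replace with Avro.
producer.produce("events-json", value=MessageToJson(event))
producer.flush()
```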

My first thought was to use Schema Registry: register the protobuf schema and an Avro schema that I manually created from it. Then, use KafkaProtobufDeserializer as the value.deserializer on the data I get from the API and serialize the result using KafkaAvroSerializer. I’m not sure if this approach is possible; even if I can register both schemas in the Schema Registry (or create two registries), can KafkaAvroSerializer serialize data that wasn’t deserialized with KafkaAvroDeserializer? Can KafkaProtobufDeserializer deserialize data that was serialized with KafkaProtobufSerializer?

Looking online for a solution, Kafka Connect kept popping up. Is Kafka Connect available for the Python client? Can I use Kafka Connect to get data from an API endpoint (HTTP source) and convert the data from protobuf to Avro? Is it possible to create a custom Kafka HTTP source connector in Python?

Any help and suggestions are appreciated here! Thank you in advance :slight_smile:

Kafka only stores bytes, and the serializer and deserializer must match; it won’t automatically convert formats for you. You’ve got your examples backwards: the constraint is on the deserializer, which cannot deserialize data that wasn’t produced by its matching serializer (for instance, KafkaProtobufDeserializer expects the Confluent wire format, which your raw API bytes won’t have). Serializers don’t care where the data came from: once you deserialize something into an in-memory data structure, you’re free to re-serialize it to another format.

In other words, you’ll need to parse and convert the data within your Python app. The Registry won’t do that conversion for you.
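To sketch what that looks like with the confluent-kafka Python client: parse the protobuf bytes with your generated message class, turn the message into a dict, and hand that dict to an AvroSerializer. Everything named here (my_pb2.MyEvent, the endpoint, the topic, and the Avro schema) is a placeholder, so treat this as a rough outline rather than a drop-in solution:

```python
import requests
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import MessageField, SerializationContext
from google.protobuf.json_format import MessageToDict

import my_pb2  # placeholder: module generated by protoc

# Placeholder Avro schema, hand-written to mirror the protobuf message.
AVRO_SCHEMA = """
{
  "type": "record",
  "name": "MyEvent",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "count", "type": "int"}
  ]
}
"""

sr_client = SchemaRegistryClient({"url": "http://localhost:8081"})
avro_serializer = AvroSerializer(sr_client, AVRO_SCHEMA)
producer = Producer({"bootstrap.servers": "localhost:9092"})

# 1. Deserialize: raw protobuf bytes from the API -> in-memory message.
resp = requests.get("https://api.example.com/events")  # placeholder endpoint
event = my_pb2.MyEvent()
event.ParseFromString(resp.content)

# 2. Convert: protobuf message -> plain dict.
record = MessageToDict(event, preserving_proto_field_name=True)

# 3. Re-serialize: dict -> Avro bytes (the serializer registers/looks up the
#    schema in the Registry and adds the Confluent framing).
payload = avro_serializer(record, SerializationContext("events-avro", MessageField.VALUE))
producer.produce("events-avro", value=payload)
producer.flush()
```

One gotcha: MessageToDict follows the proto3 JSON mapping, so 64-bit integer fields come back as strings and would need explicit casting before Avro serialization.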

Kafka Connect is a Java server with its own REST API; it doesn’t accept Protobuf API requests. You could write a source connector in a JVM language (rather than Python) that polls data from an external server using a Protobuf-based client, but you’d still have to convert the data to Avro, or more specifically, to a Connect Struct class.

Alternatively, there’s no major benefit to having Avro, IMO. Keep data as Protobuf, then the Registry will help.
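As a rough sketch of that, again with placeholder names: the ProtobufSerializer re-frames your parsed message in the Confluent wire format and registers its schema in the Registry, so no format conversion is needed at all.

```python
import requests
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.protobuf import ProtobufSerializer
from confluent_kafka.serialization import MessageField, SerializationContext

import my_pb2  # placeholder: module generated by protoc

sr_client = SchemaRegistryClient({"url": "http://localhost:8081"})
protobuf_serializer = ProtobufSerializer(
    my_pb2.MyEvent, sr_client, {"use.deprecated.format": False}
)
producer = Producer({"bootstrap.servers": "localhost:9092"})

# Parse the raw protobuf bytes from the API into the generated message class.
resp = requests.get("https://api.example.com/events")  # placeholder endpoint
event = my_pb2.MyEvent()
event.ParseFromString(resp.content)

# Re-serialize with the Registry-aware serializer and produce; downstream
# consumers can then use KafkaProtobufDeserializer plus the Registry as usual.
payload = protobuf_serializer(event, SerializationContext("events-proto", MessageField.VALUE))
producer.produce("events-proto", value=payload)
producer.flush()
```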

Hi! Thanks for your response and for explaining everything.

I was definitely making this more complicated than I needed to. I didn’t think the Registry would help, but Kafka Connect’s connectors seemed like they would work. I think I can either parse the protobuf myself and work with that, or use a protobuf-to-avro package.

I just wanted to experiment with Avro; I appreciate how Avro handles schemas.

> appreciate how Avro handles schemas

Protobuf schemas aren’t too dissimilar. If you’ve used a gRPC API, it will already have a schema, and there’s no need to translate to Avro.
