GenericAvroSerde vs SpecificAvroSerde

Great thread between Andre Almeida and @nbuesing on the difference between GenericAvroSerde and SpecificAvroSerde. I’ve had this same question in the past, so I figured this could be useful for posterity! Pulled from the Confluent Community Slack.


Andre Almeida: Hi guys, taking my first steps in Kafka Streams (after doing a lot with ksqlDB) and I have a noob question. What’s the fundamental difference between GenericAvroSerde and SpecificAvroSerde? In which situation should each one be used?

Neil Buesing: So this is all based on Avro having a generic record parser and a specific record parser. If you know your structure at compile time of your Kafka Streams application, having POJOs that have been built from your Avro definition will make the code you write in your application a lot cleaner: less code like String foo = (String) record.get("field"), more strongly typed, but more coupled. Now if data structures are not known at compile time (ksqlDB, for example), then you need the generic record parser, and therefore the GenericAvroSerde. Also, if you are working with data structures that are known at the time you build your Kafka Streams application, I strongly recommend using the specific record parser, especially if you use logical types. Until recently GenericRecord would not apply the logical type converters. See pull 1762/63 as well.
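To make that trade-off concrete, here is a minimal sketch of the two access styles. The Payment class and the customer_id field are hypothetical, standing in for a POJO generated from your .avsc definition:

```java
import org.apache.avro.generic.GenericRecord;

public class FieldAccessExamples {

    // GenericRecord: field names are plain strings and every value needs a cast,
    // so a typo or a wrong cast only fails at runtime.
    static String customerIdGeneric(GenericRecord record) {
        // toString() because the value may come back as a Utf8, not a String
        return record.get("customer_id").toString();
    }

    // SpecificRecord: Payment is a (hypothetical) generated POJO, so field names
    // and types are checked by the compiler.
    static String customerIdSpecific(Payment payment) {
        return payment.getCustomerId();
    }
}
```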

DGS-785 Add logical type converter to Kafka avro serializer/deserializer (#17… by rayokota · Pull Request #1781 · confluentinc/schema-registry

Neil Buesing: For 90% of the streams applications I have written where Avro was involved, I used the specific-record DatumReader and POJOs generated from the gradle-avro-plugin. For applications that are configuration driven, I will use generic records. Along with the logical type concern, String types are another concern: unless you know the magic Avro setting, GenericRecord will have strings as Utf8 objects (not Strings), though Utf8 does extend CharSequence.

GitHub - davidmc24/gradle-avro-plugin: A Gradle plugin to allow easily performing Java code generation for Apache Avro. It supports JSON schema declaration files, JSON protocol declaration files, and Avro IDL files.
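For reference, wiring either Serde into a Streams topology looks roughly like this. Payment again stands in for a class generated by a plugin like the one above, and the topic names and Schema Registry URL are placeholders:

```java
import io.confluent.kafka.streams.serdes.avro.GenericAvroSerde;
import io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;

import java.util.Map;

public class SerdeWiring {

    public static void main(String[] args) {
        // Both Serdes need to know where Schema Registry lives (placeholder URL)
        Map<String, String> serdeConfig =
                Map.of("schema.registry.url", "http://localhost:8081");

        // Specific: typed against the generated Payment class
        SpecificAvroSerde<Payment> paymentSerde = new SpecificAvroSerde<>();
        paymentSerde.configure(serdeConfig, false); // false = this is a value serde

        // Generic: no compile-time knowledge of the schema
        GenericAvroSerde genericSerde = new GenericAvroSerde();
        genericSerde.configure(serdeConfig, false);

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("payments", Consumed.with(Serdes.String(), paymentSerde));
        builder.stream("some-connector-topic", Consumed.with(Serdes.String(), genericSerde));
    }
}
```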

Andre Almeida: Oh great, I think I now understand how it works and the major difference between them. Thank you very much!! For my case, I think the best option for now is generic records, because I want to read from topics where the schema is being managed by the Kafka connector itself (Debezium).

Andre Almeida: I really appreciate your awesome help!

Neil Buesing: Yes, generic would be very good for your use case. Just keep in mind logical types. Decimals from the DB can be tricky. Debezium allows you to say you want decimals as strings if that becomes problematic. I once did jdbc_source → generic → elastic sink, and I told the Elastic sink that the index was pre-created, so all decimals ended up as bytes, not converted from logical types…

Neil Buesing: I am hopeful that the logical type support added to the Kafka deserializer addresses that issue by applying the converters, so GenericRecord actually has a BigDecimal (and not a byte[]), but I haven’t tested that yet since that fix came after my involvement in that project.
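If you do end up with raw bytes for a logical decimal in a GenericRecord, Avro’s own conversion classes can be applied by hand. A rough sketch, assuming the field is a non-nullable bytes field carrying a decimal logical type:

```java
import org.apache.avro.Conversions;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;

import java.math.BigDecimal;
import java.nio.ByteBuffer;

public class DecimalWorkaround {

    // Convert a logical-decimal field to BigDecimal, whether or not the
    // deserializer already applied the logical type converter.
    static BigDecimal readDecimal(GenericRecord record, String fieldName) {
        Schema fieldSchema = record.getSchema().getField(fieldName).schema();
        Object value = record.get(fieldName);
        if (value instanceof BigDecimal) {
            return (BigDecimal) value; // newer serializers may already have converted it
        }
        // Fall back to Avro's built-in decimal conversion on the raw bytes
        return new Conversions.DecimalConversion()
                .fromBytes((ByteBuffer) value, fieldSchema, fieldSchema.getLogicalType());
    }
}
```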

Andre Almeida: Yep, valid points you raised. But for now I’m just trying to read from a topic and print it to the console, with no luck so far; I got some deserialization errors, at least that’s what the logs are telling me. But well, tomorrow is another day. Ty for your input!
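For that "read a topic and print it to the console" goal, a minimal Streams sketch could look like the following. The topic name, bootstrap servers, and registry URL are placeholders, and it assumes String keys, which a Debezium topic often won’t have; a key or value Serde mismatch like that is one common source of deserialization errors:

```java
import io.confluent.kafka.streams.serdes.avro.GenericAvroSerde;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Printed;

import java.util.Map;
import java.util.Properties;

public class PrintTopic {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "print-topic-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        // Generic value serde pointed at Schema Registry (placeholder URL)
        GenericAvroSerde valueSerde = new GenericAvroSerde();
        valueSerde.configure(Map.of("schema.registry.url", "http://localhost:8081"), false);

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, GenericRecord>stream("dbz.inventory.customers", // placeholder topic
                        Consumed.with(Serdes.String(), valueSerde))
                .print(Printed.toSysOut()); // dump each record to stdout

        new KafkaStreams(builder.build(), props).start();
    }
}
```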


Fun fact: with Rust it’s almost the same, but in that case, to get the ‘specific’ variant the user has to make an additional call, since we can’t use reflection there. The Rust Avro library also had a bug where it didn’t work well with schema evolution: it would still try to produce the ‘specific’ variant even if the schema had evolved. But a fix is almost merged: AVRO-3240: fix deserializer schema backward compatibility by ultrabug · Pull Request #1379 · apache/avro · GitHub. Also, the Rust Avro library was recently moved to Apache :raised_hands:.
