Confluent Cloud Schema Registry, JSON schemas, and Spark

Hey there!

I’m attempting to consume from a Confluent Cloud Kafka topic using Spark, and I’d like to use the built-in Schema Registry, where the schema has already been created/registered, to define the schema on the consumer side. I’m running into a few hiccups trying to make this work. (It should be noted that I’ve scoured the internet for 6+ hours looking for answers, including time spent in this forum, before deciding to post here.)

Schema Registry Used: Yes, Confluent Cloud
Secured: Yes
Topic Schema Format: JSON schema
Consumer: Spark readStream (in Databricks)
References: consume-avro-data-from-kafka-topics-and-secured-schema-registry-with-databricks-confluent-cloud
Databricks Spark with Schema Registry

The above article assumes the topic messages are Avro-serialized with an Avro schema. That’s not my case: they are JSON-serialized with a JSON schema. Can I still make use of the Schema Registry to define the schema in Spark, or will I need to build a custom process for retrieving the schema?

I’m happy to provide more context as needed in order to find an optimal solution.


You can always consume byte arrays from Kafka with Spark (which is the default, anyway) and wrap the JSONSchemaDeserializer in a Spark UDF which you’d invoke through a dataframe select function.
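To make the UDF idea concrete, here’s a minimal sketch of the decoding step such a UDF would wrap. It assumes the producer used Confluent’s standard JSON Schema serializer, whose wire format prefixes every record value with a magic byte (0x00) and a 4-byte big-endian schema ID before the UTF-8 JSON body; the function name and the Spark wiring in the comments are illustrative, not an official API.

```python
import json
import struct

def decode_confluent_json(payload: bytes) -> dict:
    """Strip the 5-byte Schema Registry wire-format header and parse the JSON body.

    Confluent wire format: [magic byte 0x00][4-byte big-endian schema ID][payload].
    """
    if len(payload) < 5 or payload[0] != 0:
        raise ValueError("not a Confluent wire-format payload")
    (schema_id,) = struct.unpack(">I", payload[1:5])  # available if you need to look up the schema
    return json.loads(payload[5:].decode("utf-8"))

# In Spark you would wrap this in a UDF and apply it to the Kafka value column,
# roughly (untested sketch):
#   from pyspark.sql.functions import udf, col
#   from pyspark.sql.types import StringType
#   decode_udf = udf(lambda b: json.dumps(decode_confluent_json(bytes(b))), StringType())
#   df.select(decode_udf(col("value")).alias("json_str"))
```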

This blog shows Avro, yes, but at least the steps to extract and download the schema will be useful, since the wire format for Registry payloads is the same. I’m not sure how well Spark works with JSON Schema, however, so you might still need to manually define a Spark StructType.
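If you do end up defining the Spark schema yourself, one option is to derive it from the JSON Schema you download from the Registry. Here’s a rough sketch that maps a flat JSON Schema (the `schema` field returned by the Registry’s `/subjects/<subject>/versions/latest` endpoint) to a Spark SQL DDL string that `from_json()` accepts. It only handles top-level primitive fields; nested objects and arrays would need recursion, and the type mapping shown is my assumption, not an official conversion.

```python
import json

# Assumed mapping from JSON Schema primitive types to Spark SQL types.
_TYPE_MAP = {"string": "STRING", "integer": "BIGINT", "number": "DOUBLE", "boolean": "BOOLEAN"}

def json_schema_to_ddl(schema_str: str) -> str:
    """Convert a flat JSON Schema document into a Spark SQL DDL column list."""
    schema = json.loads(schema_str)
    fields = []
    for name, spec in schema.get("properties", {}).items():
        spark_type = _TYPE_MAP.get(spec.get("type"), "STRING")  # fall back to STRING
        fields.append(f"{name} {spark_type}")
    return ", ".join(fields)

# Usage with Spark (illustrative names, untested):
#   ddl = json_schema_to_ddl(registry_response["schema"])
#   df.select(from_json(col("json_str"), ddl).alias("data"))
```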