Confluent Kafka to Delta Lake in Azure Platform

Hi

I’m NK, a Solution Architect at Swedbank Sweden. I’m looking for a direct integration from Kafka to Delta Lake on Azure. Has anyone implemented this? I’m looking for something like the Delta Lake Sink connector; Confluent provides one, but it only supports AWS.

Thanks
NK

Hi Narasimha,

Unfortunately, as you said, there is currently no connector available for Delta Lake on Azure Databricks.

There are a couple of workarounds you can try:

The easiest way should be to create a pipeline in Databricks using Spark Structured Streaming that reads a topic from your Kafka cluster and writes the resulting DataFrame to a Delta Lake table. There is a complete example in this blog post.
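As a rough illustration, here is a minimal PySpark sketch of that pipeline. It assumes a Databricks notebook (where `spark` is already defined); the broker address, topic name, and table paths are placeholders you would replace with your own.

```python
from pyspark.sql.functions import col

# Read the Kafka topic as a stream (broker and topic are placeholders)
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "my-topic")
      .option("startingOffsets", "earliest")
      .load())

# Kafka delivers key/value as binary; cast them to strings before writing
events = df.select(
    col("key").cast("string"),
    col("value").cast("string"),
    col("timestamp"))

# Append the stream to a Delta table, tracking progress via a checkpoint
(events.writeStream
 .format("delta")
 .option("checkpointLocation", "/mnt/delta/events/_checkpoints")
 .outputMode("append")
 .start("/mnt/delta/events"))
```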

One possibility I’m exploring is using the ADLS Sink connector to write events as files into Azure Storage, and then ingesting those files incrementally into Delta Lake with Databricks Auto Loader. This may not be the most efficient approach, because it splits the process in two: writing to storage as a kind of staging area, and then the Delta integration.
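For reference, the Auto Loader side would look roughly like this in PySpark. The `abfss://` paths and the JSON file format are assumptions about what the sink connector lands in your storage account, so adjust them to your setup.

```python
# Incrementally discover and ingest the files landed by the ADLS Sink connector
# (paths, schema location, and the JSON format are placeholder assumptions)
raw = (spark.readStream
       .format("cloudFiles")
       .option("cloudFiles.format", "json")
       .option("cloudFiles.schemaLocation",
               "abfss://staging@myaccount.dfs.core.windows.net/_schemas")
       .load("abfss://staging@myaccount.dfs.core.windows.net/kafka-events"))

# Append the newly discovered files to a Delta table, then stop
(raw.writeStream
 .format("delta")
 .option("checkpointLocation",
         "abfss://staging@myaccount.dfs.core.windows.net/_checkpoints")
 .trigger(availableNow=True)
 .start("abfss://lake@myaccount.dfs.core.windows.net/delta/kafka_events"))
```

The `availableNow` trigger processes everything that has landed so far and then stops, which fits the batch-like staging pattern; drop it if you want a continuously running stream instead.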

There are also libraries outside of Databricks that let you send data from Kafka to Delta Lake, which you could try. In this one, Azure support is still under development, but since it is based on delta-rs it should be available soon.

Maybe someone else here on the forum has a better alternative. Let me know what you think.

Hi Kuro

Thank you for the reply… Yes, we thought of Spark Streaming with Kafka, but it involves hosting an application outside the Kafka network. We’re trying to use Kafka components as much as possible.

Auto Loader is not an option for us either, because it’s a streaming process on top of batch ADLS files.

Thanks
NK

