I’m NK, Solution Architect for Swedbank Sweden. I’m looking for a direct integration from Kafka to Delta Lake in Azure. Has anyone implemented this kind of integration? I’m looking for something like the Delta Lake Sink connector — Confluent provides one, but it supports only AWS.
Unfortunately, as you said, there is currently not a connector available for Delta Lake in Azure Databricks.
There are a couple of workarounds you can try:
The easiest way is probably to create a pipeline in Databricks using Spark Structured Streaming that reads a topic from your Kafka cluster and writes the resulting DataFrame to a Delta Lake table. There is a complete example in this blog post.
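A minimal PySpark sketch of that pipeline might look like the following. All the specifics — broker addresses, topic name, checkpoint and table paths — are placeholders you would replace with your own values:

```python
# Sketch of a Kafka -> Delta Structured Streaming job.
# Broker, topic, and path values below are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-delta").getOrCreate()

# Read the Kafka topic as a streaming DataFrame
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder brokers
    .option("subscribe", "my-topic")                    # placeholder topic
    .option("startingOffsets", "earliest")
    .load()
)

# Kafka delivers key/value as binary; cast them before storing
parsed = events.selectExpr(
    "CAST(key AS STRING) AS key",
    "CAST(value AS STRING) AS value",
    "timestamp",
)

# Write the stream to a Delta table; the checkpoint gives you
# exactly-once semantics across restarts
(
    parsed.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/mnt/checkpoints/my-topic")  # placeholder path
    .start("/mnt/delta/events")                                 # placeholder table path
)
```

The nice part of this approach is that the checkpoint directory tracks Kafka offsets for you, so the job resumes cleanly after a restart without duplicating records.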
One possibility I’m exploring is using the ADLS Sink connector to write events as files into Azure Storage, and then ingesting those files incrementally into Delta Lake with Databricks Auto Loader. This is probably not the most efficient approach, because it splits the process into two stages: writing to storage as a staging area, and then the Delta ingestion.
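The second stage of that two-step approach could be sketched like this with Auto Loader (the `cloudFiles` source). Again, the storage account, container, and paths are hypothetical, and this assumes the sink connector writes JSON files:

```python
# Sketch of the Auto Loader ingestion step.
# Storage account, container, and all paths are hypothetical placeholders;
# assumes the ADLS Sink connector lands JSON files in the landing folder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("autoloader-ingest").getOrCreate()

# Auto Loader incrementally discovers new files as they arrive in storage
raw = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/schemas/kafka-events")  # placeholder
    .load("abfss://landing@mystorageaccount.dfs.core.windows.net/topics/")  # placeholder
)

# Append the newly discovered files into a Delta table
(
    raw.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/kafka-events")  # placeholder
    .start("/mnt/delta/kafka_events")  # placeholder Delta path
)
```

One upside of the staging step is durability: the raw files in ADLS double as an archive you can replay if the downstream table ever needs to be rebuilt.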
There are also libraries outside of Databricks that allow you to send data from Kafka to Delta Lake that you can try. In this one, Azure support is currently under development, but since it is based on delta-rs it should be available soon.
Perhaps someone else here on the forum will come up with a better alternative. Let me know what you think.