After testing Kafka Connect (source connectors) on local VMs (Windows 10 WSL), a client of mine is now ready to start provisioning Kafka in the cloud.
We will start with 3 source connectors (these will be self-managed), and Kafka will need to be connected to a Spark environment on Azure HDInsight.
So the question is which of these options to go with:
- Go with Confluent Cloud, run Kafka Connect (in distributed mode) on Azure VMs, and connect Azure HDInsight Spark to Confluent Cloud.
- Provision the whole Kafka environment (workers, ZooKeeper nodes, etc.) through Azure HDInsight, since we already need Azure for the self-managed connectors and Spark.
- Provision my own Confluent Platform cluster on Azure VMs, with further Azure VMs running Kafka Connect in distributed mode.
The environment has to reside on Azure. It will not have a high data throughput load due to the nature of the data, and because this is a POC the provisioning cost needs to be managed closely (i.e. kept as cost-effective as possible).
The connectors to be used are SFTP, HTTP REST and SpoolDir.
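
For context, here is a minimal sketch of how the self-managed connectors would be registered with a distributed Connect worker through its REST API (`POST /connectors`), which works the same way in all three options. The connector name, topic, paths, file pattern and worker URL are hypothetical placeholders, and the SpoolDir property names are as I understand them from the connector's documentation, so check them against the version you install.

```python
import json
import urllib.request


def build_connector_payload(name, connector_class, settings):
    """Build the JSON body for POST /connectors on a distributed Connect worker."""
    config = {"connector.class": connector_class, "tasks.max": "1"}
    config.update(settings)
    return {"name": name, "config": config}


def register_connector(connect_url, payload):
    """POST the payload to the Connect REST API and return the parsed response."""
    request = urllib.request.Request(
        url=f"{connect_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical SpoolDir CSV source; all values below are placeholders.
spooldir_payload = build_connector_payload(
    name="poc-spooldir-source",
    connector_class=(
        "com.github.jcustenborder.kafka.connect.spooldir."
        "SpoolDirCsvSourceConnector"
    ),
    settings={
        "topic": "poc.spooldir",
        "input.path": "/data/in",
        "finished.path": "/data/done",
        "error.path": "/data/error",
        "input.file.pattern": ".*\\.csv",
    },
)

print(json.dumps(spooldir_payload, indent=2))

# Against a running worker (8083 is the default REST port):
# register_connector("http://localhost:8083", spooldir_payload)
```

The SFTP and HTTP connectors would be registered the same way with their own `connector.class` and settings, so the choice between the three options mainly changes where the brokers live and how the workers authenticate to them, not how the connectors are deployed.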