Getting ready to deploy Kafka for POC

rvaneynd · 29 March 2021 15:47

After testing Kafka Connect (source connectors) on local VMs (Windows 10 WSL), a client of mine is now ready to start provisioning Kafka in the cloud.
We will be starting with 3 source connectors (these will be self-managed), and Kafka will need to be connected to a Spark environment on Azure HDInsight.

So the question is which of these options to go with:

1. Go with Confluent Cloud, with Azure VMs running Kafka Connect (in distributed mode), and connect Azure HDInsight Spark to Confluent Cloud.
2. Provision the Kafka environment (workers, ZooKeeper nodes, etc.) entirely through Azure HDInsight, as we already need Azure for the self-managed connectors and Spark.
3. Provision my own Confluent Cloud on Azure VMs, with Azure VMs running Kafka Connect in distributed mode.

The environment has to reside on Azure. Due to the nature of the data it will not have a high throughput load, and since this is for a POC the provisioning cost needs to be managed closely (i.e. kept as cost-effective as possible).
The connectors to be used are SFTP, HTTP REST, and SpoolDir.

Any advice?

Thank you,
Raoul

rmoff · 29 March 2021 19:13

Can you clarify the difference between options 1 and 3 on your list? In option 3 did you have in mind a self-managed Confluent deployment?

rvaneynd · 29 March 2021 20:47

Robin,

Thank you for the response.
Option 1 is using the current Confluent Cloud environment on Confluent.io, whereas for me Option 3 was downloading the Confluent Community edition and installing it on the customer's Azure environment. But looking at the licensing structure, I do not believe this is possible. So let's disregard Option 3 completely.

I am currently reading your excellent blog post "Running a self-managed Kafka Connect worker for Confluent Cloud".
This is basically my Option 1 and looks the most interesting, but because one of the connectors we will be using (SpoolDir CSV) has to pull data from a remote server using either sshfs or CIFS, I believe we will not be able to use Docker. I would also like to run it as a distributed Kafka Connect cluster for my source connectors (I hope this makes sense).

Raoul
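
For readers following along: in distributed mode, connectors are submitted to the Connect cluster over its REST API rather than through a local properties file. The snippet below is only a rough sketch of what that could look like for the SpoolDir CSV connector. The worker URL, connector name, topic name, and the `/mnt/remote/...` paths (standing in for an sshfs or CIFS mount) are assumptions made up for illustration and are not taken from this thread.

```python
import json
import urllib.request


def build_connector_request(connect_url, name, config):
    """Build (but do not send) a PUT /connectors/<name>/config request.

    PUT on this endpoint creates the connector if it is missing and
    updates it otherwise, so re-running a deployment is safe.
    """
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url.rstrip('/')}/connectors/{name}/config",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical values: the paths assume the remote share is already mounted
# on the worker host (e.g. via sshfs or CIFS) under /mnt/remote.
spooldir_config = {
    "connector.class": "com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector",
    "topic": "poc-csv-input",
    "input.path": "/mnt/remote/unprocessed",
    "finished.path": "/mnt/remote/processed",
    "error.path": "/mnt/remote/error",
    "input.file.pattern": ".*\\.csv",
    "csv.first.row.as.header": "true",
    "schema.generation.enabled": "true",
}

request = build_connector_request(
    "http://localhost:8083",  # assumed address of one Connect worker
    "spooldir-csv-source",    # assumed connector name
    spooldir_config,
)

# To actually submit it, uncomment the next line once a worker is running:
# urllib.request.urlopen(request)
print(request.get_method(), request.full_url)
```

One thing to keep in mind with a distributed cluster is that a task can be scheduled on any worker, so the mounted directory would need to be available at the same path on every worker that might run it.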

rmoff · 30 March 2021 08:04

OK, I understand better now. So you’ve got:

1. Use Confluent Cloud
2. Use HDInsight
3. Self-manage Confluent Platform

You’ll need to do your own evaluation of 1 vs 2 - this might help in terms of what to look at and evaluate.

In terms of deploying Kafka Connect as a distributed cluster, you can do this using Docker if you want. Exactly how you'd do it depends on your target runtime environment, though.
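
To give a feel for what a self-managed distributed worker pointed at Confluent Cloud involves, here is a minimal sketch that renders a worker properties file. It is not taken from the blog post mentioned above. The bootstrap address, API key and secret, group id, and converter choices are placeholders and assumptions to be replaced with your own values. Every worker started with the same `group.id` and the same three storage topics joins the same Connect cluster.

```python
def confluent_cloud_worker_properties(bootstrap_servers, api_key, api_secret,
                                      group_id="poc-connect-cluster"):
    """Render a distributed-mode Kafka Connect worker config as text.

    All argument values are placeholders supplied by the caller.
    """
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{api_key}" password="{api_secret}";'
    )
    security = {
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.jaas.config": jaas,
    }
    props = {
        "bootstrap.servers": bootstrap_servers,
        # Workers sharing this group.id form one distributed cluster.
        "group.id": group_id,
        # Internal topics where the cluster keeps its own state.
        "config.storage.topic": f"{group_id}-configs",
        "offset.storage.topic": f"{group_id}-offsets",
        "status.storage.topic": f"{group_id}-status",
        "config.storage.replication.factor": "3",
        "offset.storage.replication.factor": "3",
        "status.storage.replication.factor": "3",
        # Converter choice is an assumption; pick what your data needs.
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
    }
    # The worker itself, plus the producers (used by source connectors) and
    # consumers (used by sink connectors) it creates, all need credentials.
    for prefix in ("", "producer.", "consumer."):
        for key, value in security.items():
            props[prefix + key] = value
    return "\n".join(f"{key}={value}" for key, value in props.items())


worker_properties = confluent_cloud_worker_properties(
    bootstrap_servers="<your-cluster>.confluent.cloud:9092",  # placeholder
    api_key="<API_KEY>",        # placeholder
    api_secret="<API_SECRET>",  # placeholder
)
print(worker_properties)
```

The same settings can be passed as environment variables to a Docker image or written to a properties file on a plain VM, which is why the choice of runtime mostly changes how the config is delivered rather than what it contains.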

rvaneynd · 30 March 2021 10:10

Robin,

Thank you. Interesting reading, and it addresses my concerns regarding sizing the correct hardware and managing a Kafka environment.

Raoul.

system · 12 April 2021 15:27

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
