Currently I have a distributed worker running on Linux, sending JSON to Elasticsearch using the Elasticsearch sink connector.
If I wanted to make this highly available, I would create another Linux server running the distributed worker. However, if I run the distributed worker on both machines, each with its own Elasticsearch connector set up, then when I send a message both workers send the same message to Elasticsearch, and Elasticsearch deletes the duplicate documents.
Is this how it is expected to work, or is there some configuration that needs to be done?
It sounds like you’ve not formed the workers into a Kafka Connect group. When configured correctly, the work is split across tasks, which execute on one worker or the other, not both (unless there is more than one task).
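For reference, a minimal sketch of the distributed-mode worker properties that make two workers join the same group (broker addresses and topic names here are placeholders; substitute your own). The key point is that `group.id` and the three internal storage topics must be identical on both workers:

```properties
# worker.properties – same values on BOTH Linux machines
bootstrap.servers=kafka1:9092

# Workers with the same group.id form one Connect cluster
group.id=connect-cluster

key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter

# Internal topics shared by all workers in the group
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
```

With this in place you submit the Elasticsearch connector once via the REST API of either worker, rather than configuring a separate connector on each machine.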
I seem to have gotten a lot further now. I now see the leader URL, with the IP of one of the workers, in both workers' logs.
When I send data it is no longer duplicated, but it seems it is only sent to one of the workers.
For rest.advertised.host.name in the worker config files I am just putting the IP address of the Linux machine. I assume that the leader URL is just whichever worker was picked as leader.
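For illustration, this is roughly what the REST settings look like per worker (the IP addresses are made-up examples). Each worker advertises its own address so the other workers can forward REST requests to the leader:

```properties
# Worker A only (this machine's own IP, not the leader's)
rest.port=8083
rest.advertised.host.name=10.0.0.1

# Worker B would instead set:
# rest.advertised.host.name=10.0.0.2
```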
In a Kafka Connect sink, the tasks are essentially consumer threads and are assigned partitions to read from. If you have 10 partitions and tasks.max set to 5, each task will receive 2 partitions to read from and track the offsets for. If you have configured tasks.max to a number above the partition count, Connect will launch a number of tasks equal to the partition count of the topics it's reading.
If you change the partition count of the topic, you'll have to relaunch your connector; if tasks.max is still greater than the new partition count, Connect will again start as many tasks as there are partitions.
If there are multiple Connect workers, the tasks will be distributed across the workers.
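The partition-to-task arithmetic above can be sketched as follows. This is illustrative only: the real assignment is done by the consumer group protocol inside Connect, not by this code, but the effective result (tasks capped at the partition count, partitions spread evenly) is the same:

```python
def assign_partitions(num_partitions, tasks_max):
    """Round-robin partitions across at most tasks_max tasks.

    Mirrors the behaviour described above: no more tasks are
    useful than there are partitions to read.
    """
    num_tasks = min(tasks_max, num_partitions)
    assignment = {t: [] for t in range(num_tasks)}
    for p in range(num_partitions):
        assignment[p % num_tasks].append(p)
    return assignment

# 10 partitions with tasks.max=5 -> each task reads 2 partitions
print(assign_partitions(10, 5))
# 3 partitions with tasks.max=8 -> only 3 tasks do any work
print(len(assign_partitions(3, 8)))
```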
(adapted from a SO answer I made a few years back)
Hi,
when running workers in distributed mode, GROUP_ID: 4 is an important one: it should match the first worker's setting, and it is required in order to determine the cluster that the worker will be part of.