requestTaskReconfiguration() task downtime

Hello,

I am wondering how I can dynamically reconfigure tasks without stopping & restarting the tasks (for the same connector) whose config does not change.

For context, I am developing a very simple custom connector that listens to many TCP sockets concurrently in a select() loop and forwards the binary data read off the socket to a Kafka topic.

I need to listen to ~1200 devices (sockets) concurrently (configs available in a database) and in a distributed way so naturally Connect lends itself to this problem quite well and works great for static config. With very simple config changes I can distribute the ~1200 sockets in many permutations: say 100 tasks with 12 sockets each or 300 tasks with 4 sockets each. This is so great!

My only problem is that occasionally we add a new device to the DB, remove a device from the DB, or the IP address for a device changes. I am currently reacting to these changes in a monitoring thread in my connector which calls requestTaskReconfiguration() and again all the devices are spread across tasks as expected. However, all tasks (for this connector) have to stop and be restarted when this happens which causes data loss. The loss is because these devices are “dumb” - there’s no concept of “partition” or “offset”; rather if a client is connected it sends data, if not it drops data. So unfortunately a task can’t just “pick up where it left off.”

If only 1 device changes IP address, I would like only the task listening to that device to be affected instead of stopping and starting all other tasks for this connector (which means closing and reopening all the other sockets). And ideally, all other sockets in the same task wouldn’t have to close and reopen either.

Or if a new device is added to our DB, ideally a single task will get new config that has the added device, but all other tasks whose config does not change should not be stopped.

I’ve read about Kafka’s incremental cooperative rebalancing which I thought would be the solution but, after further digging, it appears to only affect new workers joining or leaving (temporarily or permanently) the cluster and not task reconfiguration.

My current thought is to manually assign devices to logical groups (say in the DB or elsewhere - outside the scope of Connect & connector/task config) and have each group be the task’s config so each task is then responsible for monitoring changes to its group and it can keep unaffected sockets open when a change occurs. This is then outside the scope of the normal connector/task lifecycle.

Maybe a way to sum up what I’m looking for would be “incremental task reconfiguration.”

Any guidance would be greatly appreciated.

Thank you.