Hi all,
We have a replication solution using Kafka Connect. Data is read from MS SQL Server using the Debezium SQL Server connector into multiple topics, and then written to PostgreSQL using the JDBC sink connector. Each topic/table has a dedicated sink connector. We use "pk.mode": "record_value", which generates UPSERTs. Each topic has 10 partitions, and there are 2 workers in the Kafka Connect cluster.
For now we are using "tasks.max": "1" in the sink, which means that all writes for a given table are done by a single worker in a single thread.
As the JDBC sink connector supports multiple tasks, should we be using more than one task in this scenario? Are there any best practices or recommendations in this matter?
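For reference, each sink is configured roughly like the sketch below (connector name, topic, connection details, and the pk.fields value are placeholders; the real configs differ only in those):

```json
{
  "name": "pg-sink-some-table",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "sqlserver.dbo.some_table",
    "connection.url": "jdbc:postgresql://...",
    "insert.mode": "upsert",
    "pk.mode": "record_value",
    "pk.fields": "id"
  }
}
```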
I see a couple of potential issues here:
- Parallel writes might cause locking on the DB side. I saw at least one issue where deadlocks occurred with multiple tasks (Multiple tasks in kafka-connect-jdbc sink causing deadlock · Issue #385 · confluentinc/kafka-connect-jdbc · GitHub)
- Can we ensure that changes are applied in the same order? For example, say we have a backlog of CDC changes for the database record with id=1:
id=1, value='A'
id=1, value='B'
id=1, value='C'
Is it possible that these messages will be distributed across multiple tasks/workers, and the UPSERTs then applied in the wrong order? (See the sketch after this list for what these events look like as Kafka records.)
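To make that concrete: if I understand correctly, with Debezium's default settings the Kafka message key is the table's primary key, so all three change events should carry the same key. Roughly (Debezium envelope and schema fields trimmed, field names assumed):

```
key: {"id": 1}   value (after): {"id": 1, "value": "A"}
key: {"id": 1}   value (after): {"id": 1, "value": "B"}
key: {"id": 1}   value (after): {"id": 1, "value": "C"}
```

So the question is whether records with the same key can still end up being processed out of order once more than one sink task is consuming the topic.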
Any thoughts or suggestions?
Thanks