Kafka Connect Worker guarantees

mohanr · 10 February 2022 15:04

Hi,
I was wondering whether the Connect cluster can benefit from the Streams framework, changelog topics and ‘exactly-once’ delivery. Is there buffering in the worker cluster?
I read the ‘backpressure’ section in Streams Architecture | Confluent Documentation.
The two types of CDC mechanisms that could impact the worker cluster in our case are log tailing and JDBC query polling.
Thanks

sarwarbhuiyan · 24 February 2022 11:54

The Connect framework does some in-memory buffering, but only for the short batches it is consuming from or producing to topics. With source connectors, we can’t assume that the source system can be re-read after an error. If it can (say, a SQL database with a table that has an id and/or timestamp column), the Connect worker keeps track of the logical offsets it has processed in a connect-offsets topic. Sink connectors, on the other hand, read from Kafka topics as regular consumers, so they can simply try again. This way, for the most part, you achieve at-least-once semantics. There are some edge cases, such as non-retriable source systems or permanent errors from sink systems, which could require skipping records and continuing.
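
To make the source-side behaviour concrete, here is a minimal toy sketch in plain Python. It is not the Connect API: the list `topic` stands in for a Kafka topic, the dict `offsets` stands in for the connect-offsets topic, and `crash_before_commit_at` is a hypothetical knob used only to simulate a failure between producing a batch and storing its offset.

```python
# Toy model only: plain Python objects stand in for the source table,
# the Kafka topic, and the connect-offsets topic. No Kafka client is used.

def poll_batch(table, last_id, batch_size):
    """Return up to batch_size rows with id greater than last_id (like WHERE id > ?)."""
    return [row for row in table if row["id"] > last_id][:batch_size]

def run_task(table, topic, offsets, batch_size=2, crash_before_commit_at=None):
    """Copy rows to the topic, storing the last id after each batch.

    crash_before_commit_at is a hypothetical knob: once the batch ending at
    that id has been produced, the task "crashes" before the offset is stored.
    """
    while True:
        last_id = offsets.get("last_id", 0)
        batch = poll_batch(table, last_id, batch_size)
        if not batch:
            return "done"
        topic.extend(row["id"] for row in batch)   # produce the batch
        if batch[-1]["id"] == crash_before_commit_at:
            return "crashed"                       # offset never stored
        offsets["last_id"] = batch[-1]["id"]       # store the source offset

table = [{"id": i} for i in range(1, 6)]   # rows 1..5, sorted by id
topic, offsets = [], {}

first = run_task(table, topic, offsets, crash_before_commit_at=4)
second = run_task(table, topic, offsets)   # restart resumes from the stored offset

print(first, second)   # crashed done
print(topic)           # [1, 2, 3, 4, 3, 4, 5]
```

Rows 3 and 4 appear twice because the restart resumes from the last stored offset (2), but nothing is lost. That is the at-least-once outcome described above.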

The ideas from changelog topics in Streams might be interesting, but would require a complete rethinking of the Connect architecture. Also, in Kafka Streams you usually only deal with Kafka-to-Kafka topologies, which can use transactions to ensure exactly-once processing. As soon as you make external calls such as REST APIs or SQL queries, we can no longer make that guarantee, so we’re back to at-least-once processing (unless you skip over permanent exceptions).
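
A similarly hedged toy sketch of the sink side shows why an external call breaks the exactly-once guarantee. Again, this is plain Python rather than the Connect API: `external_calls` stands in for a REST endpoint or SQL database, `committed` for the consumer group's committed offset, and `crash_after` is a hypothetical knob that simulates a failure after the external call but before the offset commit.

```python
# Toy model only: one list stands in for the Kafka topic, another for the
# external system. No Kafka client or real network call is involved.

def run_sink(topic, committed, external_calls, crash_after=None):
    """Deliver records from the committed position onward.

    crash_after is a hypothetical knob: the task dies right after the external
    call for that record, before the consumer offset is committed.
    """
    position = committed["offset"]
    while position < len(topic):
        record = topic[position]
        external_calls.append(record)      # side effect, e.g. REST call or SQL INSERT
        if record == crash_after:
            return                         # crash: offset not committed
        position += 1
        committed["offset"] = position     # commit only after the side effect

topic = ["a", "b", "c"]
committed = {"offset": 0}
external_calls = []

run_sink(topic, committed, external_calls, crash_after="b")
run_sink(topic, committed, external_calls)   # restart from the committed offset

print(external_calls)   # ['a', 'b', 'b', 'c']
```

The external system sees "b" twice because the side effect and the offset commit are not part of one atomic transaction, which is the gap that Kafka-to-Kafka transactions close.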

I hope this gives some colour on this topic.
