Can a source connector update its offset without producing records?

Hi,
This is the same question stated in this topic, but the original was never answered and is now locked.

I have a source connector that, at times, needs to skip large volumes of data, so it can go a long time without producing any records. If the connector task restarts before it finally produces a new record (and updates the offset), it loses all that progress and has to reprocess all the skipped data. Is there a way for the offset to be updated even when no record is produced? If this is not natively supported, are there common strategies that people use to work around the current design?


Hi. I’m the author of the forum post you linked to. As a workaround, we publish a tiny synthetic record to a “black hole” topic. The only purpose of the record is to tell the Kafka Connect framework about the source offset associated with the otherwise ignored event. For more details, see the documentation for the Couchbase Kafka connector setting for this option.
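A minimal sketch of the bookkeeping behind this kind of workaround, not the Couchbase connector's actual code. `OffsetMarker` and `SkippedOffsetTracker` are hypothetical names, and `OffsetMarker` is only a stand-in for the partition, offset and topic that a real task would hand to Kafka Connect's `SourceRecord`. The sketch assumes a single numeric source position and emits one marker after every N skipped events, so the black hole topic stays small.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

// Stand-in (assumption) for the fields a real task would pass to
// org.apache.kafka.connect.source.SourceRecord: partition, offset, topic.
final class OffsetMarker {
    final String topic;
    final Map<String, ?> sourcePartition;
    final Map<String, ?> sourceOffset;

    OffsetMarker(String topic, Map<String, ?> sourcePartition, Map<String, ?> sourceOffset) {
        this.topic = topic;
        this.sourcePartition = sourcePartition;
        this.sourceOffset = sourceOffset;
    }
}

// Hypothetical helper: counts skipped events and decides when a
// synthetic offset-only marker should be sent to the black hole topic.
final class SkippedOffsetTracker {
    private final String blackHoleTopic;
    private final Map<String, ?> sourcePartition;
    private final long markerInterval;
    private long skippedSinceLastOffset = 0;

    SkippedOffsetTracker(String blackHoleTopic, Map<String, ?> sourcePartition, long markerInterval) {
        this.blackHoleTopic = blackHoleTopic;
        this.sourcePartition = sourcePartition;
        this.markerInterval = markerInterval;
    }

    // Call for every source event the task decides to ignore.
    // Returns a marker once markerInterval events have been skipped in a row.
    Optional<OffsetMarker> onSkipped(long position) {
        skippedSinceLastOffset++;
        if (skippedSinceLastOffset < markerInterval) {
            return Optional.empty();
        }
        skippedSinceLastOffset = 0;
        Map<String, ?> offset = Collections.singletonMap("position", position);
        return Optional.of(new OffsetMarker(blackHoleTopic, sourcePartition, offset));
    }

    // Call whenever a real record is produced; its own offset already
    // records progress, so the skip counter starts over.
    void onRealRecord() {
        skippedSinceLastOffset = 0;
    }
}
```

In a real task, `poll()` would convert each returned marker into a `SourceRecord` built from the marker's partition, offset and topic plus a small placeholder value, and return it alongside any real records. Emitting a marker only every N skips is a design choice of this sketch; a time-based interval would work just as well.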

We would also love to hear what other people are doing, and whether there have been any developments on the Kafka side to address KAFKA-3821 (“Allow Kafka Connect source tasks to produce offset without writing to topics”) in the ASF JIRA.

Thanks,
David

Hey David,

Thank you very much for responding! After I read your post, I briefly looked at the Couchbase connector to see if I could figure out if you had solved the issue and how, but I couldn’t find it. Let’s see if we get more suggestions, but I’ll probably follow your approach :)

Cheers,
Joaquim

Hello.

One idea: have you tried writing your own custom SMT (Single Message Transform) that returns null from its ‘apply’ method to signal that the current record should be skipped?

https://docs.confluent.io/platform/current/connect/javadocs/javadoc/org/apache/kafka/connect/transforms/Transformation.html
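A rough sketch of the drop-by-returning-null idea, using only plain Java so it runs without the Kafka Connect jar. `RecordTransform` is a stand-in (assumption) that mirrors just the `apply` method of the real `Transformation` interface, which also requires `config()`, `close()` and `configure(...)`; `DropIf` is a hypothetical name.

```java
import java.util.function.Predicate;

// Stand-in (assumption) for org.apache.kafka.connect.transforms.Transformation,
// reduced to the single method that matters for this idea.
interface RecordTransform<R> {
    R apply(R record);
}

// Hypothetical transform: returns null for records that should be skipped,
// and passes every other record through unchanged.
final class DropIf<R> implements RecordTransform<R> {
    private final Predicate<R> shouldDrop;

    DropIf(Predicate<R> shouldDrop) {
        this.shouldDrop = shouldDrop;
    }

    @Override
    public R apply(R record) {
        // A null return value tells the framework to discard the record.
        return shouldDrop.test(record) ? null : record;
    }
}
```

For this to help with offsets, the task would still have to return the skipped events from `poll()` with their source offsets attached, and whether the offset of a dropped record actually gets committed is worth verifying on the Connect version in use. Kafka Connect also ships a built-in `Filter` transformation that can be combined with predicates, which may remove the need for a custom class.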
