We are building a custom sink connector for Kafka Connect that targets exactly-once delivery semantics. Some aspects of task and partition rebalancing could break exactly-once delivery, so we have the following questions:
- Data Flush: When a rebalance occurs, is data flushed from the tasks' current collections/buffers? Specifically, how does a rebalance affect records that the connector has ingested but not yet persisted to the sink?
- Offset Commits: How are offsets managed during a rebalance? Is it possible for offsets to be committed before the corresponding data has been flushed to the sink? That would break our exactly-once target.
- Best Practices: For those who have built sink connectors with exactly-once semantics, are there any best practices or pitfalls to watch for when handling rebalances?
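To make the Data Flush and Offset Commits questions concrete, here is a minimal, stdlib-only Java sketch of one commonly described pattern for exactly-once sinks. Each partition's next offset is written to the sink in the same write as the data, `preCommit()` reports only those persisted offsets, and `open()` rewinds to them. Everything here is hypothetical: the class `OffsetsInSinkModel`, the in-memory `SinkPartition` "table", and the assumption that writing rows plus offset is atomic. The method names only mirror the real `SinkTask` hooks (`open`, `put`, `preCommit`, `close`), but the types are simplified stand-ins. This is a sketch under those assumptions, not a description of the framework's actual rebalance behavior, which is exactly what the questions above ask about.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model of a sink task that stores each partition's next offset
 * in the sink together with the data. Method names mirror Kafka Connect's
 * SinkTask hooks, but these are simplified stand-in types, not the real API.
 */
final class OffsetsInSinkModel {
    /** A consumed record: partition, Kafka offset, payload. */
    record Rec(String partition, long offset, String value) {}

    /** Hypothetical sink "table": rows plus next offset, updated together. */
    static final class SinkPartition {
        final List<String> rows = new ArrayList<>();
        long nextOffset = 0; // 0 = nothing stored yet (a simplification)
    }

    final Map<String, SinkPartition> sink;                          // survives task restarts
    private final Map<String, List<Rec>> buffer = new HashMap<>();  // lost on close/crash
    private final Set<String> assigned = new HashSet<>();

    OffsetsInSinkModel(Map<String, SinkPartition> sink) {
        this.sink = sink;
    }

    /** Like open(): read rewind offsets from the sink itself. */
    Map<String, Long> open(Collection<String> partitions) {
        Map<String, Long> seekTo = new HashMap<>();
        for (String p : partitions) {
            assigned.add(p);
            seekTo.put(p, sink.computeIfAbsent(p, k -> new SinkPartition()).nextOffset);
        }
        // In real Connect these would be passed to SinkTaskContext.offset(...).
        return seekTo;
    }

    /** Like put(): buffer only, skipping records the sink already holds. */
    void put(Collection<Rec> records) {
        for (Rec r : records) {
            SinkPartition sp = sink.computeIfAbsent(r.partition(), k -> new SinkPartition());
            if (r.offset() < sp.nextOffset) continue; // replayed duplicate
            buffer.computeIfAbsent(r.partition(), k -> new ArrayList<>()).add(r);
        }
    }

    /** Write buffered rows and the new next offset in one (assumed atomic) step. */
    void flush(String partition) {
        List<Rec> pending = buffer.remove(partition);
        if (pending == null || pending.isEmpty()) return;
        SinkPartition sp = sink.computeIfAbsent(partition, k -> new SinkPartition());
        for (Rec r : pending) sp.rows.add(r.value());
        sp.nextOffset = pending.get(pending.size() - 1).offset() + 1;
    }

    /** Like preCommit(): report only offsets already durable in the sink. */
    Map<String, Long> preCommit() {
        Map<String, Long> committable = new HashMap<>();
        for (String p : assigned) {
            committable.put(p, sink.computeIfAbsent(p, k -> new SinkPartition()).nextOffset);
        }
        return committable;
    }

    /** Like close(): drop unflushed rows; the next owner rewinds via open(). */
    void close(Collection<String> partitions) {
        for (String p : partitions) {
            buffer.remove(p);
            assigned.remove(p);
        }
    }
}
```

In this model, a record that was buffered but not flushed when `close()` runs is simply dropped and replayed to the next owner, while a replayed record that is already in the sink is skipped because its offset is below the stored `nextOffset`. Dropping in `close()` is only safe here because the offsets live in the sink; flushing in `close()` instead is an equally plausible design choice, and which one is preferable is part of what the Best Practices question asks.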
Thanks for your help,