Tasks and Partitions Rebalancing Mechanism


We are building a custom sink connector on Kafka Connect and are aiming for exactly-once delivery semantics. Some aspects of task and partition rebalancing look like they could break that guarantee, so we have the following questions:

  1. Data Flush: When a rebalance occurs, is the data flushed from the current collections/buffers of the tasks? Specifically, how does the rebalance impact the state of records that have been ingested by the connector but not yet persisted to the sink?
  2. Offset Commits: How are offsets managed during a rebalance? Can offsets be committed for records that have not yet been flushed to the sink? This matters because we are targeting exactly-once delivery.
  3. Best Practices: For those who have developed sink connectors aiming for exactly-once semantics, are there any best practices or considerations when handling rebalances?
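
To make the concern behind questions 1 and 2 concrete, here is a minimal sketch of the bookkeeping we have in mind. It is plain Java with no Kafka dependency, and every name in it (BufferedSinkSketch, flushToSink, committableOffsets, onRevoked) is hypothetical. The methods only loosely mirror SinkTask's put, preCommit and close; how the framework actually sequences those calls during a rebalance is exactly what we are unsure about.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a buffering sink task; not the Kafka Connect API.
class BufferedSinkSketch {
    // partition -> offsets of records received but not yet written to the sink
    private final Map<Integer, List<Long>> buffer = new HashMap<>();
    // partition -> next offset to consume after the last durable write
    private final Map<Integer, Long> flushed = new HashMap<>();

    // Roughly what put() would do: buffer the record, write nothing yet.
    void put(int partition, long offset) {
        buffer.computeIfAbsent(partition, p -> new ArrayList<>()).add(offset);
    }

    // Pretend the buffered records for this partition were written durably.
    void flushToSink(int partition) {
        List<Long> pending = buffer.remove(partition);
        if (pending == null || pending.isEmpty()) {
            return;
        }
        long last = pending.get(pending.size() - 1);
        flushed.put(partition, last + 1);
    }

    // Roughly what preCommit() would return: only offsets that are safe to commit.
    Map<Integer, Long> committableOffsets() {
        return new HashMap<>(flushed);
    }

    // On revocation (assumption): drop unflushed records so that the next owner
    // re-reads them, and return the last safe offset, or null if none exists.
    Long onRevoked(int partition) {
        buffer.remove(partition);
        return flushed.remove(partition);
    }

    int buffered(int partition) {
        List<Long> pending = buffer.get(partition);
        return pending == null ? 0 : pending.size();
    }

    public static void main(String[] args) {
        BufferedSinkSketch task = new BufferedSinkSketch();
        task.put(0, 10L);
        task.put(0, 11L);
        System.out.println("before flush: " + task.committableOffsets());
        task.flushToSink(0);
        System.out.println("after flush: " + task.committableOffsets());
    }
}
```

In this model an offset only becomes committable after the corresponding write has succeeded, and a revoked partition loses its unflushed buffer. What we do not know is whether Kafka Connect guarantees an equivalent ordering when a rebalance happens.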

Thanks for your help,