I have a Kafka Connect cluster running the Confluent S3 Sink Connector, configured for exactly-once semantics (EOS).
Our current `flush.size` is set to a rather high value, which was ideal for reading in historical data. Having now caught up to the current upstream, this high value naturally causes high latency in transferring records.
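For reference, here is a trimmed-down sketch of the relevant connector settings (the topic, bucket, and values are placeholders, not our actual config); the only change under consideration is lowering `flush.size`:

```json
{
  "connector.class": "io.confluent.connect.s3.S3SinkConnector",
  "storage.class": "io.confluent.connect.s3.storage.S3Storage",
  "format.class": "io.confluent.connect.s3.format.avro.AvroFormat",
  "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",
  "topics": "my-topic",
  "s3.bucket.name": "my-bucket",
  "flush.size": "1000000"
}
```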
All other things being equal, I don’t see why changing `flush.size` should invalidate EOS. It perhaps wouldn’t be “exactly-once” in the meta sense that the files would not look exactly the same if I were to restart from the beginning (including clearing outputs, for the sake of argument), since the file boundaries would fall at different offsets, but I don’t see why the actual record content would be duplicated or lost. However, in cases such as these the devil is often in the details…
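To make that reasoning concrete: as I understand it, the connector’s EOS guarantee rests on deterministic file boundaries, with each S3 object key encoding the topic, partition, and starting offset of the records it contains. A hypothetical before/after for one partition (illustrative keys, assuming the default partitioner and offset zero-padding):

```
# flush.size = 3: files of three records each
topics/my-topic/partition=0/my-topic+0+0000000000.avro   (offsets 0-2)
topics/my-topic/partition=0/my-topic+0+0000000003.avro   (offsets 3-5)

# after lowering flush.size to 2, resuming from committed offset 6
topics/my-topic/partition=0/my-topic+0+0000000006.avro   (offsets 6-7)
topics/my-topic/partition=0/my-topic+0+0000000008.avro   (offsets 8-9)
```

The boundaries move, but every offset still lands in exactly one file, which is why I suspect EOS survives the change.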
I do have some ways to check this empirically, but that would entail considerably more work in identifying and cleaning up any duplicates, since there are no automatic deduplication processes in place at the moment (we operate under the so-far-verified assumption that the connector provides EOS).
Am I wrong? Would changing (only) `flush.size` invalidate EOS?