I have a Kafka Connect cluster running the Confluent S3 Sink Connector, configured for exactly-once semantics (EOS).
Our current `flush.size` is set to a rather high value, which was ideal for reading in historical data. Having now caught up to the current upstream, this high value naturally causes high latency in transferring records.
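For reference, here is a trimmed-down sketch of the relevant connector settings (the topic, bucket, and values are placeholders, not our actual config); the only change under consideration is lowering `flush.size`:

```json
{
  "connector.class": "io.confluent.connect.s3.S3SinkConnector",
  "storage.class": "io.confluent.connect.s3.storage.S3Storage",
  "format.class": "io.confluent.connect.s3.format.avro.AvroFormat",
  "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",
  "topics": "my-topic",
  "s3.bucket.name": "my-bucket",
  "flush.size": "1000000"
}
```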
All other things being equal, I don’t see why changing `flush.size` should invalidate EOS. It perhaps wouldn’t be “exactly-once” in the meta sense that the files would not look exactly the same if I were to restart from the beginning (including clearing outputs, for the sake of argument), since the file boundaries would fall at different offsets, but I don’t see why the actual record content would be duplicated or lost. However, in cases such as these the devil is often in the details…
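To make that reasoning concrete: as I understand it, the connector’s EOS guarantee rests on deterministic file boundaries, with each S3 object key encoding the topic, partition, and starting offset of the records it contains. A hypothetical before/after for one partition (illustrative keys, assuming the default partitioner and offset zero-padding):

```
# flush.size = 3: files of three records each
topics/my-topic/partition=0/my-topic+0+0000000000.avro   (offsets 0-2)
topics/my-topic/partition=0/my-topic+0+0000000003.avro   (offsets 3-5)

# after lowering flush.size to 2, resuming from committed offset 6
topics/my-topic/partition=0/my-topic+0+0000000006.avro   (offsets 6-7)
topics/my-topic/partition=0/my-topic+0+0000000008.avro   (offsets 8-9)
```

The boundaries move, but every offset still lands in exactly one file, which is why I suspect EOS survives the change.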
I do have some ways to check this empirically, but that would entail considerably more work in identifying and cleaning up any duplicates, since there are no automatic deduplication processes in place at the moment (we operate under the so-far-verified assumption that the connector provides EOS).
Am I wrong? Would changing (only) `flush.size` invalidate EOS?