S3 sink connector

Hi,
I’m using the S3 sink connector with the following configs:

    "partitioner.class":"io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
    "locale":"en",
    "path.format":"'date'=YYYY-MM-dd",
    "partition.duration.ms":"86400000",
    "rotate.interval.ms":"3600000",
    "timestamp.extractor":"RecordField",
    "timestamp.field": "dateCreated"

and also:

    "flush.size": "100"

My issue is that in S3 I see files in past partitions still being modified, which prevents me from running daily jobs on those partitions (say, at 00:00 UTC).

What might be the reason? How can I make sure all data is flushed at 00:00?

Hi @amir, I think you should use rotate.schedule.interval.ms instead of rotate.interval.ms for your scenario. This setting is useful when you need to commit data based on the current server time rather than on record timestamps. → Amazon S3 Sink Connector Configuration Properties | Confluent Documentation
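
A rough sketch of that change, replacing the rotate.interval.ms line and assuming you want hourly commits with day boundaries in UTC (both the interval and the timezone value are my assumptions, adjust them to your setup). As far as I know, rotate.schedule.interval.ms requires timezone to be set:

    "rotate.schedule.interval.ms":"3600000",
    "timezone":"UTC",

Note that with "timestamp.extractor":"RecordField" the partition is still chosen from dateCreated, so a record that arrives late with an old dateCreated will still be written to that older partition.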

You might also want to try the DailyPartitioner class.
It’s basically a TimeBasedPartitioner with path.format='year'=YYYY/'month'=MM/'day'=dd and partition.duration.ms=86400000. Or you can simply try changing path.format in your current config. → Amazon S3 Sink Connector for Confluent Platform | Confluent Documentation
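
For the DailyPartitioner route, a minimal untested sketch (the UTC timezone is an assumption; the other values are copied from your config). path.format and partition.duration.ms are left out because the class sets them itself:

    "partitioner.class":"io.confluent.connect.storage.partitioner.DailyPartitioner",
    "locale":"en",
    "timezone":"UTC",
    "timestamp.extractor":"RecordField",
    "timestamp.field":"dateCreated"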


Hi batu,
I tried rotate.schedule.interval.ms instead of rotate.interval.ms. Initially it looks like it enforces commits every 30 minutes (the interval I chose), but after a day had passed I still observed data being modified in the past day's partition.
