S3 sink connector

Hi,
I’m using the S3 sink connector with the following configs:

    "partitioner.class":"io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
    "locale":"en",
    "path.format":"'date'=YYYY-MM-dd",
    "partition.duration.ms":"86400000",
    "rotate.interval.ms":"3600000",
    "timestamp.extractor":"RecordField",
    "timestamp.field": "dateCreated"

and also:

    "flush.size": "100"

My issue is that in S3 I can see files in past partitions are still being modified, which prevents me from running daily jobs (say, at 00:00 UTC) against those past partitions.

What might be the reason, and how can I make sure all data is flushed by 00:00?

Hi @amir, I think you should use rotate.schedule.interval.ms instead of rotate.interval.ms for your scenario. With rotate.interval.ms the rotation clock is driven by your timestamp extractor (the dateCreated field), so a file is only closed when a later record arrives, which is why files in past partitions keep changing. rotate.schedule.interval.ms is useful when you have to commit your data based on the current server time instead. → https://docs.confluent.io/kafka-connect-s3-sink/current/configuration_options.html#connector
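
A minimal sketch of the adjusted config, assuming your other settings stay the same. Note that rotate.schedule.interval.ms requires timezone to be set; UTC is an assumption here, chosen to line up with your 00:00 UTC jobs:

    "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
    "locale": "en",
    "timezone": "UTC",
    "path.format": "'date'=YYYY-MM-dd",
    "partition.duration.ms": "86400000",
    "rotate.schedule.interval.ms": "3600000",
    "timestamp.extractor": "RecordField",
    "timestamp.field": "dateCreated"

With this, open files are committed on a wall-clock schedule, so shortly after 00:00 UTC the previous day's partition should stop changing (barring late-arriving records, which the RecordField extractor would still write into past partitions).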

And you might also want to try the DailyPartitioner class.
It’s basically a TimeBasedPartitioner with path.format='year'=YYYY/'month'=MM/'day'=dd and partition.duration.ms=86400000. Or you can simply try changing path.format in your current config. → https://docs.confluent.io/kafka-connect-s3-sink/current/index.html#partitioning-records-into-s3-objects
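
For reference, a minimal sketch using DailyPartitioner; path.format and partition.duration.ms are then set for you, but note the resulting S3 keys use the 'year'=YYYY/'month'=MM/'day'=dd layout rather than your current 'date'=YYYY-MM-dd (the UTC timezone is again an assumption):

    "partitioner.class": "io.confluent.connect.storage.partitioner.DailyPartitioner",
    "locale": "en",
    "timezone": "UTC",
    "timestamp.extractor": "RecordField",
    "timestamp.field": "dateCreated"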

