Hi,
I’m using the s3 sink connector with the following configs:
"partitioner.class":"io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
"locale":"en",
"path.format":"'date'=YYYY-MM-dd",
"partition.duration.ms":"86400000",
"rotate.interval.ms":"3600000",
"timestamp.extractor":"RecordField",
"timestamp.field": "dateCreated"
and also:
"flush.size": "100"
My issue is that in S3, files in past partitions keep being modified. This prevents me from running daily jobs (say at 00:00 UTC) over the past partitions.
What might be the reason? How can I make sure all data is flushed by 00:00?
Hi @amir , I think you should use rotate.schedule.interval.ms
instead of rotate.interval.ms
for your scenario. That configuration is useful when you need to commit your data based on the current server time (wall clock) rather than on record timestamps. → https://docs.confluent.io/kafka-connect-s3-sink/current/configuration_options.html#connector
And you might also want to try the DailyPartitioner class.
It’s basically a TimeBasedPartitioner with path.format='year'=YYYY/'month'=MM/'day'=dd
and partition.duration.ms=86400000
. Or you can simply try changing path.format
in your current config. → https://docs.confluent.io/kafka-connect-s3-sink/current/index.html#partitioning-records-into-s3-objects
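For reference, a DailyPartitioner variant might look like this (a sketch; since the partitioner implies the path format and a one-day partition duration, those two settings can be dropped):

```json
{
  "partitioner.class": "io.confluent.connect.storage.partitioner.DailyPartitioner",
  "locale": "en",
  "timezone": "UTC",
  "rotate.schedule.interval.ms": "3600000",
  "timestamp.extractor": "RecordField",
  "timestamp.field": "dateCreated",
  "flush.size": "100"
}
```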
system closed this topic on 10 June 2021 08:12:
This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.