Understanding rotate.schedule.interval.ms

jlambert · 23 November 2022 23:29

We are using a Debezium → S3 sink connector pipeline to replicate CDC events to S3. This pipeline is working exactly as expected.

A few of the tables, and thus topics, are extremely low volume: days can easily pass between updates. With rotate.interval.ms, this means there is always one lagging record waiting to be flushed to S3, because the connector waits for another record to determine that the time window is complete and needs to be flushed.
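A simplified Python model of the record-time behaviour described above may help. It assumes that a file is committed only when a newly arriving record's timestamp is at least rotate.interval.ms past the first record in the open file, and that the triggering record starts the next file. The class and its names are hypothetical; this is an illustration, not the connector's actual code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecordTimeRotator:
    """Hypothetical, simplified model of rotate.interval.ms.

    Assumption: rotation is evaluated only when a record is written, using
    record timestamps rather than wall-clock time.
    """

    rotate_interval_ms: int
    open_file: List[int] = field(default_factory=list)
    committed: List[List[int]] = field(default_factory=list)

    def write(self, ts_ms: int) -> None:
        # A later record closes the window; the triggering record starts the next file.
        if self.open_file and ts_ms - self.open_file[0] >= self.rotate_interval_ms:
            self.committed.append(self.open_file)
            self.open_file = []
        self.open_file.append(ts_ms)

    @property
    def lag(self) -> int:
        # Records still buffered in the open file (in this model, the "lag").
        return len(self.open_file)


if __name__ == "__main__":
    hour = 3_600_000
    r = RecordTimeRotator(rotate_interval_ms=hour)
    r.write(0)              # first update: lag 1, nothing can close its window yet
    r.write(3 * 24 * hour)  # an update days later commits the first file
    print(r.committed, r.lag)  # [[0]] 1
```

In this model the newest record always stays buffered until some later record arrives, which matches the single lagging record described for the low-volume topics.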

After some reading, it looks like the setting I am looking for is rotate.schedule.interval.ms, which is intended to flush topics based on wall-clock time rather than on the timestamps in the messages. I have set it to 3x rotate.interval.ms, so that if a table/topic isn't being continuously updated, whatever has been buffered is flushed after a while.
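For reference, a sketch of the relevant part of such a connector config, built as a Python dict in the string-valued form the Kafka Connect REST API accepts. The topic, bucket, region, format, flush.size, and timezone values are placeholders, credentials and partitioner settings are omitted, and the helper name build_s3_sink_config is hypothetical.

```python
import json


def build_s3_sink_config(rotate_interval_ms: int, schedule_multiplier: int = 3) -> dict:
    """Hypothetical helper: S3 sink settings relevant to file rotation.

    All topic/bucket/region/format values below are placeholders.
    """
    return {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "topics": "dbserver1.inventory.customers",  # placeholder Debezium topic
        "s3.bucket.name": "my-cdc-bucket",          # placeholder bucket
        "s3.region": "us-east-1",                   # placeholder region
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        "flush.size": "1000",                       # placeholder
        # Record-time rotation: a later record is needed to close the window.
        "rotate.interval.ms": str(rotate_interval_ms),
        # Wall-clock rotation, set to 3x rotate.interval.ms as described above.
        "rotate.schedule.interval.ms": str(rotate_interval_ms * schedule_multiplier),
        # Timezone is set in the described setup; "UTC" is an assumed value.
        "timezone": "UTC",
    }


if __name__ == "__main__":
    print(json.dumps(build_s3_sink_config(600_000), indent=2))
```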

After setting it and restarting the connector, I don't see any of the monitored dangling messages being flushed on the schedule interval. To verify that the pipeline was working, I updated a row and saw the lag move to 2; the rotate.interval.ms flush then dropped the lag back to 1, but that last message has still not been flushed after many schedule intervals.
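The behaviour the schedule setting was expected to produce can be modelled the same way. This hypothetical sketch assumes the deadline runs from when the open file received its first record and that tick() is called periodically even when no records arrive. The real connector may align the deadline differently (for example, relative to the configured timezone) and may only evaluate it at certain points, which this model does not capture.

```python
from typing import List, Optional


class WallClockRotator:
    """Hypothetical, simplified model of rotate.schedule.interval.ms.

    Assumptions: the deadline is measured from the open file's first record,
    and tick() is invoked periodically even without new records.
    """

    def __init__(self, schedule_interval_ms: int) -> None:
        self.schedule_interval_ms = schedule_interval_ms
        self.open_file: List[str] = []
        self.opened_at_ms: Optional[int] = None
        self.committed: List[List[str]] = []

    def write(self, record: str, now_ms: int) -> None:
        if not self.open_file:
            self.opened_at_ms = now_ms  # start the wall-clock deadline
        self.open_file.append(record)
        self.tick(now_ms)

    def tick(self, now_ms: int) -> None:
        # Rotation depends only on wall-clock time, not on record timestamps.
        if self.open_file and now_ms - self.opened_at_ms >= self.schedule_interval_ms:
            self.committed.append(self.open_file)
            self.open_file = []
            self.opened_at_ms = None


if __name__ == "__main__":
    w = WallClockRotator(schedule_interval_ms=1_800_000)
    w.write("row-update", now_ms=0)
    w.tick(now_ms=1_800_000)  # no new records, but the schedule interval elapsed
    print(w.committed)  # [['row-update']]
```

Under these assumptions the single dangling record would be committed once the schedule interval elapses, which is the outcome that was not observed here.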

Am I misunderstanding how this parameter is used? Are there other parameters that need to be set (timezone is set) for this to work, or what else can I do to troubleshoot this?
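One troubleshooting step is to confirm that the running connector actually picked up the new settings after the restart. This sketch reads the live config through the Kafka Connect REST endpoint GET /connectors/{name}/config and applies two basic checks. The helper names are hypothetical and the checks are not exhaustive.

```python
import json
import urllib.request
from typing import Dict, List


def fetch_connector_config(base_url: str, name: str) -> Dict[str, str]:
    """Read the live config via the Kafka Connect REST API."""
    with urllib.request.urlopen(f"{base_url}/connectors/{name}/config") as resp:
        return json.load(resp)


def check_rotation_settings(config: Dict[str, str]) -> List[str]:
    """Return human-readable problems with the rotation-related settings."""
    problems = []
    schedule_ms = int(config.get("rotate.schedule.interval.ms", "-1"))
    if schedule_ms <= 0:
        problems.append("rotate.schedule.interval.ms is not set to a positive value")
    if not config.get("timezone"):
        problems.append("timezone is empty")
    return problems
```

For example, check_rotation_settings(fetch_connector_config("http://localhost:8083", "s3-sink")), where the worker URL and connector name are placeholders. An empty list only means these two settings look sane; it does not explain why the scheduled flush is not happening.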

system · 23 December 2022 23:29

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
