We are using a Debezium → S3 sink pipeline to replicate CDC data to S3. Apart from the issue below, the pipeline is working exactly as expected.
A few of the tables, and thus topics, are extremely low volume. It can easily be days between updates, which means there’s always one lagging record waiting to be flushed to S3: with the rotate.interval.ms configuration, the connector waits for another record to determine that a time window is complete and needs to be flushed.
After doing some reading, it looks like the setting I am looking for is rotate.schedule.interval.ms, which is intended to flush topics based on wall-clock time rather than the timestamps in the messages. I have set this to 3x rotate.interval.ms, so that if a table/topic isn’t being continuously updated, we flush what we have after a while.
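For illustration, the settings involved would look something like the sketch below, in Kafka Connect properties form. The millisecond values and the UTC timezone are placeholders assumed for the example, not the exact values from the connector in question; only the 3x ratio between the two intervals reflects the setup described above.

```properties
# Illustrative placeholder values (assumptions), not the real connector config.

# Rotate based on record timestamps: a file is closed only once a later
# record shows that the time window has elapsed.
rotate.interval.ms=600000

# Rotate based on wall-clock time, set to 3x rotate.interval.ms.
rotate.schedule.interval.ms=1800000

# Timezone is set, as noted in the question; UTC is assumed here.
timezone=UTC
```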
Having set that and restarted the connector, I don’t see any of the monitored dangling messages being flushed on the schedule interval. To verify the pipeline was working, I updated a row and saw the lag move to 2, followed by a flush on rotate.interval.ms that dropped the lag back to 1. However, that last message has still not been flushed after many schedule interval periods.
Am I misunderstanding the usage of this parameter? Are there other parameters that need to be set for this to work (timezone is already set)? Or what else can I do to troubleshoot this?