Incresing the number of tasks for an S3 sink connector

shlomi · 30 March 2021 13:55

Hey all,

I have a source connector (debezium) that fetch data from Postgres into Kafka. In addition, I have a S3 sink that writes that data from Kafka into S3. When I create the S3 sink connector I set the number of tasks to 1.
Now I want to allow more tasks / threads / workers, in order to lower the lag from Kafka to S3, I have modified the connector, to have 17 tasks and I can see that Kafka connect list 17 tasks, but while looking at the logs it still looks like the first task (with id=0) it doing all the work.

Any ideas on how to fix it?

currently I’m running a single Docker container on a single EC2 instance, running both connectors (debezium and S3).

Thanks.

rmoff · 30 March 2021 14:31

How many partitions does the topic have? IIRC the connector can parallelise operation over multiple partitions only.

shlomi · 30 March 2021 14:50

That makes sense. I have 17 topics and each one has only 1 partition, so I thought that each task can process a different topic / partition (in my case).

rmoff · 30 March 2021 15:29

Yes, I would expect to see it do that. Can you share your connector config?

shlomi · 30 March 2021 17:41

sorry wrong thread, will update with the connector config later. Thanks

shlomi · 31 March 2021 07:09

{
"config": {
    "behavior.on.null.values": "ignore",
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "flush.size": "1000000",
    "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
    "locale": "US",
    "partition.duration.ms": "31556952000",
    "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
    "path.format": "'year'=YYYY/",
    "rotate.schedule.interval.ms": "180000",
    "s3.bucket.name": "s3_sink_data",
    "s3.region": "eu-west-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "tasks.max": "17",
    "timezone": "UTC",
    "topics.dir": "db_data",
    "topics.regex": "db_data.public.*"
},
"name": "db-to-s3-sink"
}

could it be that since I created the connector with only 1 task all the existing partitions have been “assigned” to the first task?

system · 30 April 2021 07:09

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Custom kafka sink connector always runs with only one task Self-Managed Connectors	0	884	12 January 2024
S3 Sink Connector writes only 3 messages for each file Kafka Connect	4	3134	7 April 2022
What is the actual maximum value for max.tasks property? Kafka Connect	5	2557	30 October 2023
How to increase the partition of existing topic while using S3 sink connector Kafka Connect	11	3769	11 June 2021
More than 1 task on single table Kafka Connect	5	3726	16 November 2021

Incresing the number of tasks for an S3 sink connector

Related topics