I’m using a 100-partition topic with 3 replicas and min.insync.replicas=2 in an MSK Serverless cluster. Datagen is producing data using the “Orders” quickstart with unlimited throughput, and I’m consuming this data with a Confluent S3 Sink connector.
My EC2 instance running the sink connector ingests data from my MSK cluster at 3.73 GB/min over a 15-minute window, but uploads only 2.5 GB/min to S3 in the same time frame. All of the instance’s resources are underutilized, and I’m using an S3 endpoint in my VPC. I measured the time taken to upload 2.5 GB worth of files to S3 using the AWS CLI and it came out to 0m18.036s, which is far faster than what my S3 connector process is managing.
Following is my S3 sink config:
All other settings have been left as default.
I’m using an m5.4xlarge instance to run this connector in standalone mode. CPU utilization is 40%, burst balance is 99.1%, Network IN and Network OUT are well under the EC2 limits, and the heap size for my Connect process is 56 GB.
Did you try
Also check the docs regarding flush.size, etc.
Hi Michael, thanks for replying. I have not touched s3.part.size; it’s currently set at its default value of 25 MB. My flush.size is 50000 and the size of each message is 160 bytes. The size of each object uploaded to my S3 bucket is 7.6 MB, which matches the product of my flush.size and message size: 50000 × 160 bytes = 8,000,000 bytes ≈ 7.6 MiB.
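A quick sanity check of that arithmetic (the record size of 160 bytes and flush.size of 50000 are taken from above):

```python
# Expected S3 object size per commit, assuming a fixed 160-byte record
# and flush.size=50000 records per partition per object.
flush_size = 50_000
record_bytes = 160

object_bytes = flush_size * record_bytes
print(object_bytes)                       # 8000000 bytes
print(round(object_bytes / 1024**2, 2))   # 7.63 MiB, i.e. the ~7.6 MB objects seen in S3
```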
Even if I assume that nearly 7 MB of messages remain unflushed for each of my 100 partitions at the time I record the stats, that would be 7 × 100 = 700 MB across the 100 partitions, which wouldn’t explain the drift between ingress and egress for the connector.
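Putting that bound next to the backlog implied by the ingress/egress numbers from the first post (a back-of-the-envelope sketch; all figures come from this thread):

```python
# Upper bound on data sitting unflushed in the connector:
# at most ~7 MB of un-committed records per partition.
partitions = 100
unflushed_mb_per_partition = 7
max_unflushed_gb = partitions * unflushed_mb_per_partition / 1000
print(max_unflushed_gb)          # 0.7 GB

# Backlog implied by the measured ingress/egress gap over 15 minutes.
ingress_gb_per_min = 3.73
egress_gb_per_min = 2.5
backlog_gb = (ingress_gb_per_min - egress_gb_per_min) * 15
print(round(backlog_gb, 2))      # 18.45 GB -- far more than the 0.7 GB that could be buffered
```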
As for the docs on flush.size, I found these lines to be pretty relevant:
flush.size specifies the number of records per partition the connector needs to write before completing a multipart upload to S3.
So, in my case, it would be ~700 MB of messages being uploaded in 25 MB parts to S3.
I feel like lowering flush.size from 50k would improve my upload speed, but I didn’t find anything explicit in the docs to back up this conjecture.
My understanding is also that lowering flush.size could speed up the upload.
I’ll test it myself; it might take some time.
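For the test, a minimal sketch of the properties I’d change (the values here are illustrative, not recommendations, and rotate.interval.ms is an optional extra, not something from this thread):

```properties
# Illustrative sketch only -- tune for your own throughput.
# Smaller flush.size -> smaller objects, committed more often.
flush.size=5000
# Optionally also rotate files on a time basis so slow partitions
# don't hold data back (the default is -1, i.e. disabled).
rotate.interval.ms=60000
# Default multipart part size (25 MB), left unchanged.
s3.part.size=26214400
```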