I am using kafka-producer-perf-test.sh to do some performance testing on my end. My topics are hosted locally (using Docker) and also on Confluent Cloud. I am trying to compare the two setups (mostly the network latency and the side effects it has on the subsequent metrics).
Oddly, I am running into a problem I was not expecting, regarding the batch size of each request being sent.
My setup is as follows (in both cases):
- topic with 3 partitions
- batch.size = 16384 (default batch size)
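For reference, the local 3-partition topic can be created along these lines. This is a sketch, assuming kafka-topics.sh is available at the same path style as the perf-test script; adjust the path for your installation:

```shell
# Sketch: create the local test topic with 3 partitions and 1 replica
# (matches the perf_test_1_replica_3_partition topic used in the test below)
/path/to/kafka-topics.sh --create \
  --topic perf_test_1_replica_3_partition \
  --partitions 3 \
  --replication-factor 1 \
  --bootstrap-server localhost:9092
```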
On running the performance test, I see this difference in the batch sizes:
- Local cluster: batch-size-avg : 46,539.622 (bytes) - roughly batch.size * 3 (the number of partitions)
- Confluent Cloud: batch-size-avg : 15,555.225 (bytes) - roughly batch.size * 1 (1/3 of the partitions)
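To make the gap explicit, here is the same comparison expressed as a multiple of the configured batch.size (just the arithmetic on the observed metric values):

```shell
# Express each observed batch-size-avg as a multiple of batch.size (16384)
awk 'BEGIN {
  printf "local cluster:   %.2f x batch.size\n", 46539.622 / 16384;
  printf "confluent cloud: %.2f x batch.size\n", 15555.225 / 16384;
}'
# local cluster:   2.84 x batch.size
# confluent cloud: 0.95 x batch.size
```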
What could be the reason for the decreased batch size (and hence throughput) on Confluent Cloud? It looks like batching is happening for only 1 partition per request (instead of all 3 partitions, as seen on the local cluster).
I should mention that in both cases the records are generated with null keys. Below are the scripts used to run the tests.
Local cluster:

```shell
/path/to/kafka-producer-perf-test.sh \
  --topic perf_test_1_replica_3_partition \
  --num-records 100000 \
  --record-size 1024 \
  --throughput -1 \
  --producer-props \
    bootstrap.servers=localhost:9092 \
    acks=1 \
    max.in.flight.requests.per.connection=1 \
    batch.size=16384 \
  --print-metrics
```

Confluent cluster:

```shell
/path/to/kafka-producer-perf-test.sh \
  --topic perf_test_3_replica_3_partition \
  --num-records 100000 \
  --record-size 1024 \
  --throughput -1 \
  --producer.config /path/to//producer.config \
  --producer-props \
    bootstrap.servers=***-*****.centralus.azure.confluent.cloud:9092 \
    acks=1 \
    max.in.flight.requests.per.connection=1 \
    batch.size=16384 \
  --print-metrics
```