I am using kafka-producer-perf-test.sh to do some performance testing. My topics are hosted both locally (using Docker) and on Confluent Cloud, and I am trying to compare the two setups (mostly the network latency and the side effects it has on the subsequent metrics).
Oddly, I am running into a problem I was not expecting, regarding the batch size of each request being sent.
My setup is as follows (in both cases):
- topic with 3 partitions
- acks=1
- max.in.flight.requests.per.connection=1
- batch.size=16384 (the default batch size)
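(For reference, the partition layout can be double-checked with kafka-topics.sh; the command below is a sketch assuming the same local bootstrap server used in the test script, and for the Confluent Cloud topic it would additionally need --command-config pointing at the same credentials file used for the producer.)
/path/to/kafka-topics.sh \
--describe \
--topic perf_test_1_replica_3_partition \
--bootstrap-server localhost:9092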
On running the performance test, I see this difference in the reported batch-size-avg metric:
Local Cluster
batch-size-avg : 46,539.622 (bytes) - roughly batch.size * 3, i.e. about one full 16,384-byte batch per partition (3 * 16,384 = 49,152)
Confluent Cluster
batch-size-avg : 15,555.225 (bytes) - roughly batch.size * 1, i.e. about a single 16,384-byte batch (a third of the local figure)
What could be the reason for the decreased throughput on Confluent Cloud? It looks like batching is happening for only one partition per request, instead of all three partitions as seen with the local cluster.
I should mention that in both cases the records are generated with null keys. Below are the scripts used to run the tests.
Local Cluster
/path/to/kafka-producer-perf-test.sh \
--topic perf_test_1_replica_3_partition \
--num-records 100000 \
--record-size 1024 \
--throughput -1 \
--producer-props \
bootstrap.servers=localhost:9092 \
acks=1 \
max.in.flight.requests.per.connection=1 \
batch.size=16384 \
--print-metrics
Confluent Cluster
/path/to/kafka-producer-perf-test.sh \
--topic perf_test_3_replica_3_partition \
--num-records 100000 \
--record-size 1024 \
--throughput -1 \
--producer.config /path/to/producer.config \
--producer-props bootstrap.servers=***-*****.centralus.azure.confluent.cloud:9092 \
acks=1 \
max.in.flight.requests.per.connection=1 \
batch.size=16384 \
--print-metrics
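For completeness, the producer.config referenced above holds the standard Confluent Cloud connection settings, roughly along these lines (the API key and secret are placeholders, not the actual values):
# standard Confluent Cloud SASL settings - key/secret are placeholders
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='<API_KEY>' password='<API_SECRET>';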
Thanks
Gautam