So, in my projects, something happens a lot : I write a Kafka Java producer, and I have to make sure all the important configuration is properly set.
This quick guide is intended as reference to make sure everything is covered!
bootstrap.servers
: Kafka server URL. Put several URLs in there, 3 is recommended for production deployment.
key.serializer
: mandatory even if you use null keys. The fully classified name of your serializer. See class implementing org.apache.kafka.common.serialization
. If you use avro, use io.confluent.kafka.serializers.KafkaAvroSerializer.
value.serializer
: same as key serializer, but for values.
schema.registry.url
: if you are using avro for serdes, URL of your schema-registry.
partitioner.class
: if you want to use fancy partitioning, set this property. See this great article to understand what are the differences between existing partitioners.
acks
: set this to ‘all
’ if you care about not losing your data. If you have a specific use case, consider other values. Set min.insync.replicas
to 2 on the topic for acks to work properly.
retries
: I always set this to Integer.MAX_VALUE
, so 2147483647
. The number of retries is then bound by delivery.timeout.ms
.
enable.idempotence
: set this to true for minimum overhead, maximum data quality. Prevents duplicates and out of order messages at almost no cost. Prerequisite: max.in.flight.requests.per.connection
must be <= 5
(this is the default) and acks must be all
.
linger.ms
: how long messages accumulate before they are sent to the broker in batches. Test what works for you according to your message size and your desired latency is throughput.
batch.size
: max size of each batch before they are sent. Batch will be sent if either the batch size is greater than batch.size, or the time elapsed since batch creation is greater than linger.ms. Test your setup to determine the optimal value.
compression.type
: if you get poor performance during your tests, compression can help significantly - up to 40% of performance gain in my experience. This is recommended if you use a serialization format which is text and token heavy (XML or Json).