Java Client quick reference configuration

In my projects, one situation comes up a lot: I write a Kafka Java producer and have to make sure all the important configuration is properly set.

This quick guide is intended as a reference to make sure everything is covered!

bootstrap.servers : comma-separated list of Kafka broker addresses (host:port). Put several brokers in there; 3 is recommended for a production deployment.

key.serializer : mandatory even if you use null keys. The fully qualified class name of your serializer. See the classes implementing org.apache.kafka.common.serialization.Serializer. If you use Avro, use io.confluent.kafka.serializers.KafkaAvroSerializer.

value.serializer : same as key serializer, but for values.

schema.registry.url : if you are using Avro for serdes, the URL of your Schema Registry.
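As a rough illustration of the connection and serialization settings above, here is a minimal sketch that builds them as plain string properties. The class name, the method name and the host names are made up for the example, and it only relies on java.util.Properties; in a real project the result would be passed to the KafkaProducer constructor, and the Avro serializer needs the Confluent serializer library on the classpath.

```java
import java.util.Properties;

// Sketch only: connection and serialization settings as plain string keys.
// ProducerConnectionConfig and build(...) are hypothetical names.
final class ProducerConnectionConfig {

    // brokers: comma-separated host:port pairs.
    // schemaRegistryUrl: only needed with Avro; pass null for String values.
    static Properties build(String brokers, String schemaRegistryUrl) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", brokers);
        // A key serializer is required even when every key is null.
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        if (schemaRegistryUrl != null) {
            // Avro values: the serializer looks up schemas in the registry.
            props.setProperty("value.serializer",
                    "io.confluent.kafka.serializers.KafkaAvroSerializer");
            props.setProperty("schema.registry.url", schemaRegistryUrl);
        } else {
            props.setProperty("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical hosts, three brokers as suggested for production.
        Properties props = build(
                "broker1:9092,broker2:9092,broker3:9092",
                "http://schema-registry:8081");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```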

partitioner.class : if you want to use fancy partitioning, set this property. See this great article to understand the differences between the existing partitioners.

acks : set this to ‘all’ if you care about not losing your data. If you have a specific use case, consider other values. Set min.insync.replicas to 2 on the topic; otherwise acks=all can be satisfied by the leader alone when the other replicas have fallen out of sync.

retries : I always set this to Integer.MAX_VALUE, so 2147483647. The number of retries is then bound by delivery.timeout.ms.

enable.idempotence : set this to true for minimum overhead, maximum data quality. Prevents duplicates and out-of-order messages at almost no cost. Prerequisites: max.in.flight.requests.per.connection must be <= 5 (5 is the default), retries must be greater than 0, and acks must be all.
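To make the durability settings above concrete, here is a minimal sketch that groups them in one place. The class and method names are hypothetical, and the 120000 ms value used in main is only a placeholder to tune for your own use case. The code relies only on java.util.Properties; the result is meant to be merged into the properties given to the producer.

```java
import java.util.Properties;

// Sketch only: durability-oriented producer settings as plain string keys.
// ProducerReliabilityConfig and build(...) are hypothetical names.
final class ProducerReliabilityConfig {

    // deliveryTimeoutMs: overall time budget for one send, retries included.
    static Properties build(int deliveryTimeoutMs) {
        if (deliveryTimeoutMs <= 0) {
            throw new IllegalArgumentException("deliveryTimeoutMs must be positive");
        }
        Properties props = new Properties();
        // Wait for all in-sync replicas before a write is acknowledged.
        props.setProperty("acks", "all");
        // Effectively unlimited retries; delivery.timeout.ms is the real bound.
        props.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
        props.setProperty("delivery.timeout.ms", String.valueOf(deliveryTimeoutMs));
        // Idempotence needs acks=all, retries > 0 and at most 5 in-flight requests.
        props.setProperty("enable.idempotence", "true");
        props.setProperty("max.in.flight.requests.per.connection", "5");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder timeout of two minutes; tune it for your use case.
        Properties props = build(120_000);
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Keeping delivery.timeout.ms as the only parameter reflects the idea that, with retries at Integer.MAX_VALUE, the timeout is the setting that actually decides when a send gives up.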

linger.ms : how long messages accumulate before they are sent to the broker in batches. Test what works for you according to your message size and your desired trade-off between latency and throughput.

batch.size : maximum size in bytes of each batch. A batch is sent as soon as it reaches batch.size or the time elapsed since its creation exceeds linger.ms, whichever comes first. Test your setup to determine the optimal value.

compression.type : if you get poor performance during your tests, compression can help significantly: up to a 40% performance gain in my experience. This is recommended if you use a text-based, token-heavy serialization format (XML or JSON).
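The three throughput settings above can be sketched the same way. Everything here is illustrative: the class and method names are hypothetical, and the values used in main (20 ms, 64 KiB, lz4) are just starting points for your own tests, not recommendations. The list of codecs is the set of values the producer accepts for compression.type.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Sketch only: batching and compression settings as plain string keys.
// ProducerBatchingConfig and build(...) are hypothetical names.
final class ProducerBatchingConfig {

    // Values accepted by the producer for compression.type.
    private static final List<String> CODECS =
            Arrays.asList("none", "gzip", "snappy", "lz4", "zstd");

    static Properties build(int lingerMs, int batchSizeBytes, String compressionType) {
        if (lingerMs < 0 || batchSizeBytes < 0) {
            throw new IllegalArgumentException("lingerMs and batchSizeBytes must be >= 0");
        }
        if (!CODECS.contains(compressionType)) {
            throw new IllegalArgumentException("unknown compression type: " + compressionType);
        }
        Properties props = new Properties();
        // Upper bound on how long a record waits for its batch to fill.
        props.setProperty("linger.ms", String.valueOf(lingerMs));
        // Maximum batch size in bytes, per partition.
        props.setProperty("batch.size", String.valueOf(batchSizeBytes));
        // Compression is applied to whole batches, so larger batches help.
        props.setProperty("compression.type", compressionType);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values to start testing from.
        Properties props = build(20, 64 * 1024, "lz4");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```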


This is cool :)

MAX_INT retries is now the default, so it is perhaps worth highlighting that users may want to tune delivery.timeout.ms (you mentioned it, but in the context of retries; it makes more sense the other way around).


Thank you @gwenshap for your feedback. I’m working on the consumer version, and it will be interesting to know which settings are important for them. I’ll post them as an edit soon.