In my projects I often write a Kafka Java producer, and each time I have to make sure all the important configuration is properly set.
This quick guide is intended as a reference to make sure everything is covered!
bootstrap.servers : comma-separated list of Kafka broker addresses (host:port). Put several addresses in there; 3 is recommended for production deployments.
key.serializer : mandatory even if you use null keys. The fully qualified class name of your serializer. See the classes implementing org.apache.kafka.common.serialization.Serializer. If you use Avro, use io.confluent.kafka.serializers.KafkaAvroSerializer.
value.serializer : same as key serializer, but for values.
schema.registry.url : if you are using Avro for serdes, the URL of your Schema Registry.
enable.idempotence : set this to true for minimum overhead, maximum data quality. Prevents duplicates and out-of-order messages at almost no cost. Prerequisites: max.in.flight.requests.per.connection must be <= 5 (this is the default) and acks must be all.
linger.ms : how long messages accumulate before they are sent to the broker in batches. Test what works for you according to your message size and your desired latency vs. throughput trade-off.
batch.size : maximum size, in bytes, of a batch before it is sent. A batch is sent when either it reaches batch.size or the time elapsed since its creation exceeds linger.ms. Test your setup to determine the optimal value.
compression.type : if you get poor performance during your tests, compression can help significantly: up to a 40% performance gain in my experience. This is recommended if you use a text-based, token-heavy serialization format (XML or JSON).
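As a rough sketch of how these settings fit together, here is one way to collect them in a plain java.util.Properties object. The class name ProducerConfigSketch, the broker addresses, and the values chosen for linger.ms, batch.size and compression.type are placeholders I made up for illustration, not recommendations; the String serializers assume plain text keys and values.

```java
import java.util.Properties;

// Hypothetical helper, not part of any Kafka library: it only gathers
// the settings from the checklist into a single Properties object.
final class ProducerConfigSketch {

    static Properties build(String bootstrapServers) {
        Properties props = new Properties();

        // Connection: comma-separated host:port list of brokers.
        props.setProperty("bootstrap.servers", bootstrapServers);

        // Serialization: both serializers are mandatory, even with null keys.
        // Assumes String keys and values; with Avro you would swap in
        // io.confluent.kafka.serializers.KafkaAvroSerializer and also set
        // schema.registry.url.
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // Idempotence and its prerequisites (acks=all, at most 5 in-flight requests).
        props.setProperty("enable.idempotence", "true");
        props.setProperty("acks", "all");
        props.setProperty("max.in.flight.requests.per.connection", "5");

        // Batching and compression: placeholder values, to be tuned by testing.
        props.setProperty("linger.ms", "20");         // milliseconds
        props.setProperty("batch.size", "32768");     // bytes
        props.setProperty("compression.type", "lz4");

        return props;
    }

    public static void main(String[] args) {
        // Placeholder broker addresses; print the resulting configuration.
        Properties props = build("broker1:9092,broker2:9092,broker3:9092");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

With kafka-clients on the classpath, the resulting object can be handed to the producer constructor, for example new KafkaProducer<String, String>(props).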