🎧 Capacity Planning Your Apache Kafka Cluster

alice.richardson · 30 August 2022 07:19

There’s a new Streaming Audio episode - check it out!

How do you plan Apache Kafka® capacity and Kafka Streams sizing for optimal performance?

When Jason Bell (Principal Engineer, Dataworks and founder of Synthetica Data) begins to plan a Kafka cluster, he starts with a deep inspection of the customer's data itself, determining its volume as well as its contents: Is it JSON, straight pieces of text, or images? He then determines if Kafka is a good fit for the project overall, a decision he bases on volume, the desired architecture, and potential cost.

Next, the cluster is conceived in terms of some rule-of-thumb numbers. For example, Jason's minimum number of brokers for a cluster is three or four. This means he has a leader, a follower and at least one backup. A ZooKeeper quorum is also a set of three. For other elements, he works with pairs, an active and a standby—this applies to Kafka Connect and Schema Registry. Finally, there's Prometheus monitoring and Grafana alerting to add. Jason points out that these numbers are different for multi-data-center architectures.

Jason never assumes that everyone knows how Kafka works, because some software teams include specialists working on a producer or a consumer, who don't work directly with Kafka itself. They may not know how to adequately measure their Kafka volume themselves, so he often begins the collaborative process of graphing message volumes. He considers, for example, how many messages there are daily, and whether there is a peak time. Each industry is different, with some focusing on daily batch data (banking), and others fielding incredible amounts of continuous data (IoT data streaming from cars).
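As a rough illustration of the kind of volume graphing described above, here is a minimal Python sketch that turns 24 hourly message counts into a daily total, an average rate, and a peak-hour rate. The function name `summarize_volume` and the sample numbers are hypothetical and not taken from the episode.

```python
def summarize_volume(hourly_counts):
    """Summarize one day of message counts (one entry per hour).

    Returns the daily total, the average rate per second over the day,
    the busiest hour (0-23), and the average rate per second in that hour.
    """
    if len(hourly_counts) != 24:
        raise ValueError("expected 24 hourly counts")
    total = sum(hourly_counts)
    # Index of the hour with the most messages.
    peak_hour = max(range(24), key=lambda h: hourly_counts[h])
    return {
        "total_per_day": total,
        "avg_per_sec": total / 86_400,          # seconds in a day
        "peak_hour": peak_hour,
        "peak_per_sec": hourly_counts[peak_hour] / 3_600,  # seconds in an hour
    }


# Hypothetical day: steady traffic with a spike at 09:00.
sample = [3_600] * 24
sample[9] = 36_000
summary = summarize_volume(sample)
print(summary)
```

Sizing for the peak-hour rate rather than the daily average is the point of the exercise: a batch-heavy workload can have a modest average and a very demanding peak.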

Extensive testing is necessary to ensure that the data patterns are adequately accommodated. Jason sets up a short-lived system that is identical to the main system. He finds that teams usually have not adequately tested across domain boundaries or the network. Developers tend to think in terms of numbers of messages, but not in terms of overall network traffic or of how many consumers they'll actually need. Latency must also be considered: for example, if the compression on the producer's side doesn't match the compression on the consumer's side, latency will increase.
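To make the difference between message counts and network traffic concrete, here is a back-of-the-envelope Python sketch. It assumes a simple model that is not from the episode: every produced byte is copied to the other replicas once and read once by each consumer group. The function name, parameters, and sample figures are hypothetical.

```python
def estimate_network_mb_per_sec(msgs_per_sec, avg_msg_bytes,
                                replication_factor=3, consumer_groups=1,
                                compression_ratio=1.0):
    """Rough cluster-wide network estimate in MB/s (1 MB = 1,000,000 bytes).

    compression_ratio is the assumed on-the-wire size as a fraction of the
    uncompressed size (1.0 means no compression).
    """
    # Bytes per second arriving from producers.
    ingress = msgs_per_sec * avg_msg_bytes * compression_ratio
    # Each message is copied to (replication_factor - 1) follower replicas.
    replication = ingress * (replication_factor - 1)
    # Each consumer group reads the full stream once.
    egress = ingress * consumer_groups
    mb = 1_000_000
    return {
        "ingress": ingress / mb,
        "replication": replication / mb,
        "egress": egress / mb,
        "total": (ingress + replication + egress) / mb,
    }


# Hypothetical workload: 10,000 msgs/s of 1 KB each, two consumer groups.
traffic = estimate_network_mb_per_sec(10_000, 1_000, consumer_groups=2)
print(traffic)
```

Under these assumptions, 10 MB/s of produced data becomes 50 MB/s of total traffic, which is why counting messages alone tends to underestimate the load.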

Kafka Connect sink connectors require special consideration when Jason is establishing a cluster. Failure strategies need to be well thought out, including retries and how to deal with the potentially large number of messages that can accumulate in a dead letter queue. He suggests that more attention should generally be paid to the Kafka Connect elements of a cluster, something that can actually be addressed with bash scripts.
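For readers who have not set up a failure strategy before, the following Python sketch builds the error-handling portion of a sink connector configuration using Kafka Connect's `errors.*` settings for retries and a dead letter queue. The helper name, the topic name, and the chosen values are hypothetical examples, not recommendations from the episode.

```python
import json


def build_sink_error_config(dlq_topic, retry_timeout_ms=60_000,
                            retry_delay_max_ms=5_000, dlq_replication=3):
    """Return the error-handling settings for a sink connector as a dict.

    Kafka Connect expects string values in connector configs, so numbers
    are converted with str().
    """
    return {
        # Keep running on bad records instead of failing the task.
        "errors.tolerance": "all",
        # Retry a failed operation for up to this long, backing off
        # up to the given maximum delay between attempts.
        "errors.retry.timeout": str(retry_timeout_ms),
        "errors.retry.delay.max.ms": str(retry_delay_max_ms),
        # Route records that still fail to a dead letter queue topic.
        "errors.deadletterqueue.topic.name": dlq_topic,
        "errors.deadletterqueue.topic.replication.factor": str(dlq_replication),
        # Add headers describing why each record failed.
        "errors.deadletterqueue.context.headers.enable": "true",
        # Also log failures on the Connect worker.
        "errors.log.enable": "true",
    }


# "orders-sink-dlq" is a made-up topic name for illustration.
error_config = build_sink_error_config("orders-sink-dlq")
print(json.dumps(error_config, indent=2))
```

These settings would be merged into the rest of the connector's configuration. The dead letter queue topic still needs monitoring of its own, since the number of messages accumulating there can grow large.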

Finally, Kris and Jason cover his preference for Kafka Streams over ksqlDB from a network perspective.

EPISODE LINKS

Listen to the episode
