Best practices for increasing the replication factor


We want to add a 4th broker node to an Apache Kafka 2.6 cluster (not the Confluent Platform) and are looking for best practices to increase the replication factor of the current topics.
As far as I read here (Will topics still be available while their partitions are being reassigned?), it should be no problem to increase the replication factor without downtime for the clients. But I guess there are some best practices we should follow, and I have a few questions about this:

  1. The replicas configuration is ordered (example [2,3,1]). What does that mean for leader and followers? Is “2” always the leader after starting the cluster? Or is it even always the leader whenever all replicas are available (so it gets elected back when a failed node rejoins the cluster)?

  2. When I add the new broker 4 at the end of the replicas list, does that mean this broker is very unlikely to ever become the leader of this partition?

  3. Should I distribute the new broker evenly across the replica lists of my partitions (about 6000)?

  4. Do I cause stress on the cluster if I insert the new broker at a certain position (for example the first one; is the new replica then unnecessarily chosen to be the leader right after getting in sync)?

  5. Is it advisable to put all partitions (6000) in one reassignment JSON file and process them together, or should I split that into smaller parts?

  6. Any other things I should consider?
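To make questions 1 and 2 concrete, here is a minimal sketch of what a reassignment JSON could look like if the new broker 4 is appended as the last replica of every partition (topic name and replica orders are invented for illustration). The first broker in each `replicas` list is the partition's preferred leader:

```shell
# Hypothetical reassignment file: broker 4 is appended as the LAST replica
# of each partition, so it is not the preferred leader of any of them.
# Topic name and partition layout are made up for illustration.
cat > add-broker-4.json <<'EOF'
{
  "version": 1,
  "partitions": [
    {"topic": "example-topic", "partition": 0, "replicas": [2, 3, 1, 4]},
    {"topic": "example-topic", "partition": 1, "replicas": [3, 1, 2, 4]}
  ]
}
EOF
```

Whether the preferred leader actually ends up leading depends on preferred leader election (with `auto.leader.rebalance.enable=true`, the default, the controller periodically moves leadership back to the preferred replica).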

Sorry, that is a long list of questions, but I feel I should know the answers and not assume how the cluster behaves. Of course we will test the procedure, but this might not be possible with the full data load. Thus all your comments are really useful to me.

Thanks in advance



just to be sure

which config are we talking about?

I guess the ISR replica list?


That is the list I get when I run `kafka-topics --describe`, so yes, I guess it is the ISR replica list.

Would you share the output of the

kafka-topics describe

There are about 6000 partitions. I don’t mind sharing an example, but why (I am not in the office and would deliver this later)? We have added another node to a cluster with 3 nodes. The number of replicas of the existing topics is 3. The ISR list contains 3 entries. The order of the nodes is different for different topics.

We have topics with log compaction as well as without. What further insights relevant here can be expected from the describe output?
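For what it’s worth, a made-up example of the describe output may help pin down terminology (broker IDs and topic name are assumptions): the ordered list discussed above is the `Replicas` column, not the `Isr` column.

```shell
# Hypothetical describe output for a single partition:
kafka-topics.sh --zookeeper $yourzk --describe --topic example-topic
# Topic: example-topic  Partition: 0  Leader: 2  Replicas: 2,3,1  Isr: 3,1,2
#
# "Replicas" is the ordered assignment; its first entry (2) is the
# preferred leader. "Isr" is the set of replicas currently in sync
# with the leader, and its order carries no meaning.
```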


The describe was just out of curiosity and to have a common starting point.
Since you already added the broker to the cluster, you could use the partition reassignment tool to move some partitions to your new broker.

for reference:

Partitions will then be moved to the new broker; after successful replication of a partition and joining the in-sync replica list, an existing replica (e.g. on broker 3) will be removed.

which kafka/ confluent version are you running?

another possibility might be to increase the replication factor.

Yeah, I will use the partition reassignment tool. And I am on Apache Kafka version 2.6 and not using the Confluent Platform (as mentioned in my post).

The process of what I have to do is relatively clear to me, but I have not yet done it at this scale. We will do tests on examples and test data, but I would like to estimate what the effects are in PROD, and how specifically to configure or distribute the new broker in the individual partitions. I would be very grateful for any advice on this. Feel free to have another look at my post.

I did this a few times on PROD systems with nearly no impact on the applications.

You could prevent impact on the network side by using the `--throttle` parameter during the rebalance.
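As a sketch (the throttle value and file names are assumptions on my side), the throttle is passed together with `--execute`:

```shell
# Hypothetical: cap reassignment replication traffic at ~50 MB/s per broker
# so the rebalance does not saturate the network for regular clients.
kafka-reassign-partitions.sh --zookeeper $yourzk \
  --reassignment-json-file new-assign.json \
  --execute --throttle 50000000
```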

In my experience it was easier to move whole topics to the new broker,
so I would start moving the existing topics’ data like the following:

`kafka-reassign-partitions.sh --zookeeper $yourzk --topics-to-move-json-file yourtopic.json --broker-list "1,2,3,4" --generate`

copy the output to a new file and execute:

`kafka-reassign-partitions.sh --zookeeper $yourzk --reassignment-json-file new-assign.json --execute`

The cluster will start the reassignment and will move only some of the partitions to the new node.
A new partition leader will be selected, but it is not necessarily the newly added broker.
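One step worth adding to the procedure above: once the reassignment finishes, run the tool with `--verify` against the same file. Besides reporting completion per partition, this also removes any replication throttle that was set during `--execute` (the file name here matches the earlier command and is an assumption):

```shell
# Check whether the reassignment has completed; --verify also clears
# the replication throttle that was configured with --execute.
kafka-reassign-partitions.sh --zookeeper $yourzk \
  --reassignment-json-file new-assign.json --verify
```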