I'm experiencing an issue with Kafka MirrorMaker 2 in our active-active replication setup between two Kafka clusters. Our design runs two MM2 instances, one in each DC, with each instance replicating in a single direction only; if both work as expected, together they should give us active-active replication. However, after starting both MM2 instances, replication is only unidirectional, not bidirectional. The property file for each MM2 instance is attached below.
MM2 1
connect-mirror-maker.properties: |
# Kafka Connect for MirrorMaker 2
clusters=clusterA1,clusterB1
clusterA1.bootstrap.servers=my-kafka.kafka-replicate-1:9092
clusterB1.bootstrap.servers=my-kafka.kafka-replicate-2:9092
# Customize these values with the address of your target cluster
clusterA1.config.storage.replication.factor=3
clusterB1.config.storage.replication.factor=3
clusterA1.offset.storage.replication.factor=3
clusterB1.offset.storage.replication.factor=3
clusterA1.status.storage.replication.factor=3
clusterB1.status.storage.replication.factor=3
clusterB1->clusterA1.enabled=true
clusterA1->clusterB1.enabled=false
# MirrorMaker configuration. Default value for the following settings is 3
offset-syncs.topic.replication.factor=3
heartbeats.topic.replication.factor=3
checkpoints.topic.replication.factor=3
topics=.*
groups=.*
tasks.max=2
replication.factor=3
refresh.topics.enabled=true
sync.topic.configs.enabled=true
refresh.topics.interval.seconds=30
topics.blacklist=.*[\-\.]internal, .*\.replica, __consumer_offsets,cluster.*\..*
groups.blacklist=console-consumer-.*, connect-.*, __.*
clusterB1->clusterA1.emit.heartbeats.enabled=true
clusterB1->clusterA1.emit.checkpoints.enabled=true
clusterA1->clusterB1.emit.heartbeats.enabled=false
clusterA1->clusterB1.emit.checkpoints.enabled=false
MM2 2
connect-mirror-maker.properties: |
# Kafka Connect for MirrorMaker 2
clusters=clusterA2,clusterB2
clusterA2.bootstrap.servers=my-kafka.kafka-replicate-1:9092
clusterB2.bootstrap.servers=my-kafka.kafka-replicate-2:9092
# Customize these values with the address of your target cluster
clusterA2.config.storage.replication.factor=3
clusterB2.config.storage.replication.factor=3
clusterA2.offset.storage.replication.factor=3
clusterB2.offset.storage.replication.factor=3
clusterA2.status.storage.replication.factor=3
clusterB2.status.storage.replication.factor=3
clusterB2->clusterA2.enabled=false
clusterA2->clusterB2.enabled=true
# MirrorMaker configuration. Default value for the following settings is 3
offset-syncs.topic.replication.factor=3
heartbeats.topic.replication.factor=3
checkpoints.topic.replication.factor=3
topics=.*
groups=.*
tasks.max=2
replication.factor=3
refresh.topics.enabled=true
sync.topic.configs.enabled=true
refresh.topics.interval.seconds=30
topics.blacklist=.*[\-\.]internal, .*\.replica, __consumer_offsets,cluster.*\..*
groups.blacklist=console-consumer-.*, connect-.*, __.*
clusterB2->clusterA2.emit.heartbeats.enabled=false
clusterB2->clusterA2.emit.checkpoints.enabled=false
clusterA2->clusterB2.emit.heartbeats.enabled=true
clusterA2->clusterB2.emit.checkpoints.enabled=true
Our Kafka version is 3.5.1. For MM2 we use the confluentinc/cp-kafka-connect:7.5.1 Docker image.
The reason we point both profiles at the same Kafka servers but give them different cluster aliases is that we suspect the Connect workers were being placed into the same replication group and conflicting with each other. For example, if I bring up MM2 with profile 1 first and then MM2 with profile 2, topics from cluster B replicate to cluster A successfully, but not vice versa.
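For reference, the documented alternative we deliberately split apart is a single MM2 process with both flows enabled in one properties file. A rough sketch follows; the aliases and bootstrap addresses match our environment, and the rest follows the minimal example in the Kafka MM2 docs rather than our exact file:

# Single-process bidirectional MM2, as sketched in the Kafka docs
clusters=clusterA,clusterB
clusterA.bootstrap.servers=my-kafka.kafka-replicate-1:9092
clusterB.bootstrap.servers=my-kafka.kafka-replicate-2:9092
# Both directions enabled in one worker group, so a single leader
# owns connector reconfiguration for both flows
clusterA->clusterB.enabled=true
clusterB->clusterA.enabled=true
clusterA->clusterB.topics=.*
clusterB->clusterA.topics=.*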
We receive some errors, though we are not sure they are directly related to the replication problem. For example:
- [2023-11-13 22:15:05,206] INFO [Worker clientId=clusterA2->clusterB2, groupId=clusterA2-mm2] Group coordinator my-kafka-controller-1.my-kafka-controller-headless.kafka-replicate-2.svc.cluster.local:9092 (id: 2147483646 rack: null) is unavailable or invalid due to cause: error response NOT_COORDINATOR.
- org.apache.kafka.connect.runtime.distributed.NotLeaderException: This worker is not able to communicate with the leader of the cluster, which is required for dynamically-reconfiguring connectors. If running MirrorMaker 2 in dedicated mode, consider enabling inter-worker communication via the 'dedicated.mode.enable.internal.rest' property.
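That second message points at the inter-worker REST server from KIP-710, which dedicated-mode MM2 uses so a follower worker can forward connector reconfiguration to the leader. If that turns out to be the culprit, the addition would presumably look something like the following in each properties file; the listener address and port are placeholders we would still have to expose between the pods, and we have not verified that this fixes the replication issue:

# Enable the internal REST server for inter-worker communication (KIP-710)
dedicated.mode.enable.internal.rest=true
# Worker REST listener; 8083 is only a placeholder port
listeners=http://0.0.0.0:8083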
Has anyone successfully set up MM2 active-active replication with a design similar to ours, and can you tell me which configuration I got wrong or might be missing? Thanks