I have a three-node Kafka cluster, version 2.0.1, with the following specs: CPU: 12, RAM: 16.
Yesterday we had an incident: we updated the service that works with Kafka (increased its number of streams), and after that messages stopped arriving in our Kafka. When I looked at the Kafka logs, I found the following messages:
journalctl -u confluent-kafka -f
[2021-12-09 00:52:26,778] ERROR [ReplicaManager broker=1] Error processing append operation on partition command.alphaPOST.ToleuGateway (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.UnknownProducerIdException: Found no record of producerId=483486 on the broker. It is possible that the last message with the producerId=483486 has been removed due to hitting the retention limit.
[2021-12-09 00:52:36,043] INFO [Partition command.alphaPOST.CrmGateway-1 broker=1] Shrinking ISR from 1,3,2 to 1 (kafka.cluster.Partition)
[2021-12-09 00:52:36,054] INFO [Partition command.alphaPOST.NotificationCoreLowPriority-8 broker=1] Shrinking ISR from 1,3,2 to 1 (kafka.cluster.Partition)
[2021-12-09 00:52:36,078] INFO [Partition user-manager-core-6f6644b6b-bdfhh.core.user-manager.alpha.response-0 broker=1] Shrinking ISR from 1,3,2 to 1,2 (kafka.cluster.Partition)
[2021-12-09 00:52:36,087] INFO [Partition conductor-worker-d599bff85-tc57n.worker.alpha.response-0 broker=1] Shrinking ISR from 1,3,2 to 1 (kafka.cluster.Partition)
[2021-12-09 00:52:36,108] INFO [Partition error.alpha.alpha.40035001-0 broker=1] Shrinking ISR from 1,3,2 to 1 (kafka.cluster.Partition)
[2021-12-09 00:52:36,108] INFO [Partition command.alphaPOST.CrmGateway-1 broker=1] Expanding ISR from 1 to 1,3 (kafka.cluster.Partition)
[2021-12-09 00:52:36,122] INFO [Partition error.alpha.alpha.40035001-0 broker=1] Expanding ISR from 1 to 1,3 (kafka.cluster.Partition)
[2021-12-09 00:52:36,122] INFO [Partition command.alphaPOST.CrmGateway-1 broker=1] Expanding ISR from 1,3 to 1,3,2 (kafka.cluster.Partition)
[2021-12-09 00:52:36,140] INFO [Partition error.alpha.alpha.40035001-0 broker=1] Expanding ISR from 1,3 to 1,3,2 (kafka.cluster.Partition)
As you can see, the ISR (in-sync replica set) of these partitions is constantly shrinking and expanding, which apparently blocked writes to the topics.
I found a known bug, KAFKA-4477 ("Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted."). However, that bug report describes an old version, and the 2.0.1 we are running is considerably newer.
On the dev environment we use vanilla (non-Confluent) Kafka 2.6.0, and I saw similar messages in the logs there too:
server.log.2021-09-16-22: [2021-09-16 23:01:05,321] INFO [Partition product-worker-dfc5444d8-75cbs.zeebe.worker.product.alpha.alpha.response-0 broker=0] Shrinking ISR from 2,1,0 to 0. Leader: (highWatermark: 0, endOffset: 0). Out of sync replicas: (brokerId: 2, endOffset: -1) (brokerId: 1, endOffset: -1). (kafka.cluster.Partition)
server.log.2021-09-16-22: [2021-09-16 23:01:05,323] INFO [Partition -core-57c4c9df98-mhlcl.core.registry.alpha.alpha.response-0 broker=0] Shrinking ISR from 2,1,0 to 2,0. Leader: (highWatermark: 0, endOffset: 0). Out of sync replicas: (brokerId: 1, endOffset: -1). (kafka.cluster.Partition)
So is this bug still unresolved in version 2.6.0, or is there another reason?
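
In case it helps to quantify how often each partition's ISR flaps, here is a minimal Python sketch that tallies the "Shrinking ISR" and "Expanding ISR" events per partition. It only assumes the log lines look like the ones quoted in this post; the function name `count_isr_events` and the sample lines are illustrative and not part of Kafka.

```python
import re
from collections import Counter

# Matches broker log lines such as
# "[Partition foo-1 broker=1] Shrinking ISR from 1,3,2 to 1 ...".
# Optional spaces around "=" are tolerated in case the log was reformatted.
ISR_EVENT = re.compile(
    r"\[Partition (?P<partition>\S+) broker\s*=\s*\d+\] "
    r"(?P<action>Shrinking|Expanding) ISR from"
)


def count_isr_events(lines):
    """Return a dict mapping partition name to a Counter of ISR actions."""
    counts = {}
    for line in lines:
        match = ISR_EVENT.search(line)
        if match:
            per_partition = counts.setdefault(match["partition"], Counter())
            per_partition[match["action"]] += 1
    return counts


# Two lines copied from the broker log in this post, used as sample input.
sample = [
    "[2021-12-09 00:52:36,043] INFO [Partition command.alphaPOST.CrmGateway-1 "
    "broker=1] Shrinking ISR from 1,3,2 to 1 (kafka.cluster.Partition)",
    "[2021-12-09 00:52:36,108] INFO [Partition command.alphaPOST.CrmGateway-1 "
    "broker=1] Expanding ISR from 1 to 1,3 (kafka.cluster.Partition)",
]

for partition, events in sorted(count_isr_events(sample).items()):
    print(partition, "shrunk:", events["Shrinking"], "expanded:", events["Expanding"])
```

To run it over real logs, the lines of a saved `journalctl -u confluent-kafka` dump or of a `server.log` file can be passed to `count_isr_events` in place of `sample`. Partitions with many shrink/expand pairs in a short window are the ones whose followers keep falling behind the leader and catching up again.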