Hi again,
We finally deployed cp-kafka to our production system, and unfortunately we are experiencing some issues.
I am not sure how severe they are, since things are more or less working, but we get frequent org.apache.kafka.common.errors.OutOfOrderSequenceException error messages.
Errors on the leader
- [2025-02-05 12:34:28,863] ERROR [ReplicaManager broker=6] Error processing append operation on partition humio-ingest-4 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1007 at offset 15261130 in partition humio-ingest-4: 607459 (incoming seq. number), 606828 (current end sequence number)
- [2025-02-05 12:47:40,775] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1005 at offset 974182 in partition global-events-0: 9452 (incoming seq. number), 9457 (current end sequence number)
- [2025-02-05 12:51:30,088] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 0 at offset 975431 in partition global-events-0: 18605 (incoming seq. number), 18610 (current end sequence number)
- [2025-02-05 12:51:36,462] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1002 at offset 975448 in partition global-events-0: 9965 (incoming seq. number), 9970 (current end sequence number)
- [2025-02-05 12:51:36,865] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1002 at offset 975448 in partition global-events-0: 9965 (incoming seq. number), 9970 (current end sequence number)
- [2025-02-05 12:51:37,734] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1002 at offset 975448 in partition global-events-0: 9965 (incoming seq. number), 9970 (current end sequence number)
- [2025-02-05 12:51:38,737] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1002 at offset 975448 in partition global-events-0: 9965 (incoming seq. number), 9970 (current end sequence number)
- [2025-02-05 12:51:39,739] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1002 at offset 975448 in partition global-events-0: 9965 (incoming seq. number), 9970 (current end sequence number)
- [2025-02-05 12:51:40,664] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1002 at offset 975512 in partition global-events-0: 9965 (incoming seq. number), 9971 (current end sequence number)
- [2025-02-05 12:51:41,666] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1002 at offset 975529 in partition global-events-0: 9965 (incoming seq. number), 9971 (current end sequence number)
- [2025-02-05 12:51:42,673] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1002 at offset 975537 in partition global-events-0: 9965 (incoming seq. number), 9971 (current end sequence number)
- [2025-02-05 12:51:43,675] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1002 at offset 975546 in partition global-events-0: 9965 (incoming seq. number), 9971 (current end sequence number)
- [2025-02-05 12:51:44,677] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1002 at offset 975551 in partition global-events-0: 9965 (incoming seq. number), 9971 (current end sequence number)
- [2025-02-05 12:51:45,679] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1002 at offset 975565 in partition global-events-0: 9965 (incoming seq. number), 9971 (current end sequence number)
- [2025-02-05 12:51:46,681] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1002 at offset 975573 in partition global-events-0: 9965 (incoming seq. number), 9971 (current end sequence number)
- [2025-02-05 12:51:47,683] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1002 at offset 975577 in partition global-events-0: 9965 (incoming seq. number), 9971 (current end sequence number)
- [2025-02-05 12:51:48,685] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1002 at offset 975580 in partition global-events-0: 9965 (incoming seq. number), 9971 (current end sequence number)
- [2025-02-05 12:59:46,574] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1002 at offset 978182 in partition global-events-0: 10373 (incoming seq. number), 10378 (current end sequence number)
- [2025-02-05 13:00:39,962] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1000 at offset 978387 in partition global-events-0: 16735 (incoming seq. number), 16740 (current end sequence number)
- [2025-02-05 13:03:37,506] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1000 at offset 979483 in partition global-events-0: 16883 (incoming seq. number), 16888 (current end sequence number)
- [2025-02-05 13:20:59,405] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1005 at offset 985473 in partition global-events-0: 11364 (incoming seq. number), 11369 (current end sequence number)
- [2025-02-05 13:31:19,588] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1005 at offset 989146 in partition global-events-0: 12075 (incoming seq. number), 12080 (current end sequence number)
- [2025-02-05 13:36:13,680] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1000 at offset 990859 in partition global-events-0: 18842 (incoming seq. number), 18847 (current end sequence number)
- [2025-02-05 13:40:44,089] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1006 at offset 992355 in partition global-events-0: 30213 (incoming seq. number), 30218 (current end sequence number)
- [2025-02-05 13:48:51,455] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 0 at offset 995193 in partition global-events-0: 22487 (incoming seq. number), 22492 (current end sequence number)
- [2025-02-05 13:53:13,274] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1002 at offset 996477 in partition global-events-0: 13900 (incoming seq. number), 13905 (current end sequence number)
- [2025-02-05 14:04:55,596] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1002 at offset 1000429 in partition global-events-0: 14642 (incoming seq. number), 14647 (current end sequence number)
- [2025-02-05 14:18:49,955] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1000 at offset 1005123 in partition global-events-0: 21299 (incoming seq. number), 21304 (current end sequence number)
- [2025-02-05 14:35:08,230] ERROR [ReplicaManager broker=6] Error processing append operation on partition global-events-0 (kafka.server.ReplicaManager)
- org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1006 at offset 1010649 in partition global-events-0: 35137 (incoming seq. number), 35142 (current end sequence number)
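In case it helps with reading the list above, here is a minimal sketch of how the gap between the incoming sequence number and the broker's current end sequence number could be pulled out of these lines. It assumes the message format stays exactly as pasted here; the names `SEQ_ERROR`, `sequence_gaps` and `sample_lines` are made up for illustration and are not part of Kafka. A negative gap means the incoming batch starts behind what the broker has already recorded, a positive gap means it starts ahead.

```python
import re
from collections import Counter

# Matches the broker message format seen in the log lines above.
SEQ_ERROR = re.compile(
    r"Out of order sequence number for producer (?P<producer>\d+) "
    r"at offset (?P<offset>\d+) in partition (?P<partition>[^\s:]+): "
    r"(?P<incoming>\d+) \(incoming seq\. number\), "
    r"(?P<current>\d+) \(current end sequence number\)"
)


def sequence_gaps(lines):
    """Return (partition, producer, incoming - current_end) per matching line."""
    gaps = []
    for line in lines:
        match = SEQ_ERROR.search(line)
        if match:
            gap = int(match["incoming"]) - int(match["current"])
            gaps.append((match["partition"], int(match["producer"]), gap))
    return gaps


# Three messages copied from the leader log above.
sample_lines = [
    "org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1007 at offset 15261130 in partition humio-ingest-4: 607459 (incoming seq. number), 606828 (current end sequence number)",
    "org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1005 at offset 974182 in partition global-events-0: 9452 (incoming seq. number), 9457 (current end sequence number)",
    "org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order sequence number for producer 1002 at offset 975512 in partition global-events-0: 9965 (incoming seq. number), 9971 (current end sequence number)",
]

gaps = sequence_gaps(sample_lines)
for partition, producer, gap in gaps:
    print(partition, producer, gap)

# How often each gap value occurs, to spot a recurring pattern.
gap_counts = Counter(gap for _, _, gap in gaps)
```

On the lines pasted above, most global-events-0 entries come out 5 or 6 behind the current end sequence number, while the single humio-ingest-4 entry is 631 ahead.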
Errors on a non-leader broker
===> Configuring …
Running in KRaft mode…
SSL is enabled.
===> Running preflight checks …
===> Check if /var/lib/kafka/data is writable …
===> Running in KRaft mode, skipping Zookeeper health check…
===> Using provided cluster id …
===> Launching …
===> Launching kafka …
[2025-02-04 01:47:11,591] ERROR [ReplicaManager broker=5] Error processing append operation on partition humio-ingest-1 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.InvalidProducerEpochException: Epoch of producer 2 at offset 2272665 in humio-ingest-1 is 72, which is smaller than the last seen epoch 74
[2025-02-04 01:47:11,706] ERROR [ReplicaManager broker=5] Error processing append operation on partition humio-ingest-18 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.InvalidProducerEpochException: Epoch of producer 2 at offset 5360966 in humio-ingest-18 is 72, which is smaller than the last seen epoch 74
[2025-02-04 01:47:11,709] ERROR [ReplicaManager broker=5] Error processing append operation on partition humio-ingest-8 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.InvalidProducerEpochException: Epoch of producer 2 at offset 1096921 in humio-ingest-8 is 72, which is smaller than the last seen epoch 74
[2025-02-04 01:47:11,710] ERROR [ReplicaManager broker=5] Error processing append operation on partition humio-ingest-21 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.InvalidProducerEpochException: Epoch of producer 2 at offset 2972471 in humio-ingest-21 is 72, which is smaller than the last seen epoch 74
[2025-02-04 01:47:11,710] ERROR [ReplicaManager broker=5] Error processing append operation on partition humio-ingest-11 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.InvalidProducerEpochException: Epoch of producer 2 at offset 3943298 in humio-ingest-11 is 72, which is smaller than the last seen epoch 74
[2025-02-05 06:46:28,324] WARN [ReplicaFetcher replicaId=5, leaderId=6, fetcherId=0] Partition transientChatter-events-3 marked as failed (kafka.server.ReplicaFetcherThread)
[2025-02-05 06:46:28,352] WARN [ReplicaFetcher replicaId=5, leaderId=6, fetcherId=0] Partition transientChatter-events-8 marked as failed (kafka.server.ReplicaFetcherThread)
[2025-02-05 06:46:28,353] WARN [ReplicaFetcher replicaId=5, leaderId=6, fetcherId=0] Partition transientChatter-events-0 marked as failed (kafka.server.ReplicaFetcherThread)
[2025-02-05 06:46:28,353] WARN [ReplicaFetcher replicaId=5, leaderId=6, fetcherId=0] Partition global-events-0 marked as failed (kafka.server.ReplicaFetcherThread)
[2025-02-05 06:46:28,490] WARN [UnifiedLog partition=global-events-0, dir=/data/cpkafka-data] Non-monotonic update of high watermark from (offset=856339, segment=[840919:25808681]) to (offset=856337, segment=[-1:-1]) (kafka.log.UnifiedLog)
[2025-02-05 06:51:27,312] WARN [UnifiedLog partition=transientChatter-events-3, dir=/data/cpkafka-data] Non-monotonic update of high watermark from (offset=172818, segment=[96158:102485808]) to (offset=172817, segment=[-1:-1]) (kafka.log.UnifiedLog)
[2025-02-05 06:51:27,476] WARN [UnifiedLog partition=global-events-0, dir=/data/cpkafka-data] Non-monotonic update of high watermark from (offset=857967, segment=[840919:29013233]) to (offset=857965, segment=[-1:-1]) (kafka.log.UnifiedLog)
[2025-02-05 11:21:31,493] WARN [ReplicaFetcher replicaId=5, leaderId=4, fetcherId=0] Partition transientChatter-events-10 marked as failed (kafka.server.ReplicaFetcherThread)
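To put a number on the "Non-monotonic update of high watermark" warnings above, a small sketch along the following lines could compute how far the high watermark moved backwards in each case. It assumes the warning text stays exactly as pasted here; `HW_WARN`, `hw_regressions` and `hw_lines` are hypothetical names used only for this example.

```python
import re

# Matches the UnifiedLog warning format seen in the follower log above.
HW_WARN = re.compile(
    r"\[UnifiedLog partition=(?P<partition>[^,\s]+), .*?"
    r"Non-monotonic update of high watermark from \(offset=(?P<old>\d+)"
    r".*? to \(offset=(?P<new>\d+)"
)


def hw_regressions(lines):
    """Return (partition, old_offset - new_offset) per matching warning."""
    result = []
    for line in lines:
        match = HW_WARN.search(line)
        if match:
            step_back = int(match["old"]) - int(match["new"])
            result.append((match["partition"], step_back))
    return result


# The three warnings copied from the follower log above.
hw_lines = [
    "[2025-02-05 06:46:28,490] WARN [UnifiedLog partition=global-events-0, dir=/data/cpkafka-data] Non-monotonic update of high watermark from (offset=856339, segment=[840919:25808681]) to (offset=856337, segment=[-1:-1]) (kafka.log.UnifiedLog)",
    "[2025-02-05 06:51:27,312] WARN [UnifiedLog partition=transientChatter-events-3, dir=/data/cpkafka-data] Non-monotonic update of high watermark from (offset=172818, segment=[96158:102485808]) to (offset=172817, segment=[-1:-1]) (kafka.log.UnifiedLog)",
    "[2025-02-05 06:51:27,476] WARN [UnifiedLog partition=global-events-0, dir=/data/cpkafka-data] Non-monotonic update of high watermark from (offset=857967, segment=[840919:29013233]) to (offset=857965, segment=[-1:-1]) (kafka.log.UnifiedLog)",
]

regressions = hw_regressions(hw_lines)
for partition, step_back in regressions:
    print(partition, step_back)
```

For the three warnings shown, the high watermark steps back by only one or two offsets each time.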
Errors on the controller
- [2025-02-03 11:04:44,783] WARN [QuorumController id=3] Performing controller activation. The metadata log appears to be empty. Appending 1 bootstrap record(s) in metadata transaction at metadata.version 3.8-IV0 from bootstrap source ‘the binary bootstrap metadata file: /data/cpkafka-data/bootstrap.checkpoint’. Setting the ZK migration state to NONE since this is a de-novo KRaft cluster. (org.apache.kafka.controller.QuorumController)
- [2025-02-03 11:05:12,157] WARN [QuorumController id=3] Broker 4 registered with feature metadata.version that is unknown to the controller (org.apache.kafka.controller.ClusterControlManager)
- [2025-02-05 06:46:54,394] WARN [QuorumController id=3] Broker 6 registered with feature metadata.version that is unknown to the controller (org.apache.kafka.controller.ClusterControlManager)
- [2025-02-05 11:22:07,155] WARN [QuorumController id=3] Broker 4 registered with feature metadata.version that is unknown to the controller (org.apache.kafka.controller.ClusterControlManager)
The error in the title is what the client sees, and I assume it is correlated, i.e. the client wants to push data, but it is out of sequence and therefore not processed.
The question is why these messages are out of sequence every so often. I don’t think it affects all of them: we process millions of data rows per day and only get a couple of hundred errors.
The client app (Logscale) is pointing to Kafka, but I can’t see any reason behind the errors, which is why I am here again.
Our config is fairly trivial (ignoring podman and SELinux, which I have checked). Apart from the settings for communication, SSL and logging, the only options changed from the defaults are:
-e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
-e KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=0 \
-e KAFKA_TRANSACTION_STATE_LOG_MIN_ISR=1 \
-e KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR=1 \
Topics and their details are handled by the applications, so I am not sure whether any of that could cause the issues we see.
- I wonder why a controller and a broker on the same release (“release”: “7.8.0-83”) would not handle metadata identically, but I don’t assume that’s an issue.
- I have no idea whether the client or the broker is handling the offsets and confusing them, or whether they get lost/dropped or whatever.
I also checked NTP/time settings, network packet drops and everything else I could come up with, to no avail…
Thanks,
cheers