Continuing the discussion from Streams app with EOS gets stuck restoring after upgrade to 2.8:
Thanks, I will try to do that…
We were able to reproduce it again this morning:
We force-deleted all 3 pods running Kafka.
After that, one of the partitions could not be restored (as reported in the previous post).
For that partition, we noticed this log on the broker:
[2021-08-27 17:45:32,799] INFO [Transaction Marker Channel Manager 1002]: Couldn’t find leader endpoint for partitions Set(__consumer_offsets-11, command-expiry-store-changelog-9) while trying to send transaction markers for commands-processor-0_9, these partitions are likely deleted already and hence can be skipped (kafka.coordinator.transaction.TransactionMarkerChannelManager)
Then we stopped the Kafka Streams app and restarted Kafka cleanly (with a proper grace period).
After restarting the Kafka Streams app once more, we noticed this message in the app log:
2021-08-27 18:34:42,413 INFO [Consumer clientId=commands-processor-76602c87-f682-4648-859b-8fa9b6b937f3-StreamThread-1-consumer, groupId=commands-processor] The following partitions still have unstable offsets which are not cleared on the broker side: [commands-9], this could be either transactional offsets waiting for completion, or normal offsets waiting for replication after appending to local log [commands-processor-76602c87-f682-4648-859b-8fa9b6b937f3-StreamThread-1] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
Is there any way to clean up that transaction?
Kafka Streams config (a minimal sketch of how these settings are wired together follows the list):
StreamsConfig.RETRY_BACKOFF_MS_CONFIG, 2000
StreamsConfig.REPLICATION_FACTOR_CONFIG, 2
StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000
StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 24MB
ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"
StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE
producer.delivery.timeout.ms=120000
consumer.session.timeout.ms=30000
consumer.heartbeat.interval.ms=10000
consumer.max.poll.interval.ms=300000
num.stream.threads=1
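
For context, here is a minimal sketch of how settings like these are typically assembled into a KafkaStreams instance. The application id, bootstrap servers, and the placeholder topology are illustrative assumptions, not the actual app code:

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class CommandsProcessorSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholders -- not the real application id / bootstrap servers
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "commands-processor");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");

        // Streams-level settings from the list above
        props.put(StreamsConfig.RETRY_BACKOFF_MS_CONFIG, 2000);
        props.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 2);
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);
        props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 24 * 1024 * 1024L); // 24 MB
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 1);

        // Settings forwarded to the embedded consumer / producer clients
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(StreamsConfig.producerPrefix("delivery.timeout.ms"), 120000);
        props.put(StreamsConfig.consumerPrefix("session.timeout.ms"), 30000);
        props.put(StreamsConfig.consumerPrefix("heartbeat.interval.ms"), 10000);
        props.put(StreamsConfig.consumerPrefix("max.poll.interval.ms"), 300000);

        StreamsBuilder builder = new StreamsBuilder();
        // Placeholder source; the real topology (including the command-expiry-store) is omitted
        builder.stream("commands");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```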