Continuing the discussion from Streams app with EOS gets stuck restoring after upgrade to 2.8:
Thanks, I will try to do that…
We were able to reproduce it again this morning:
We force-deleted all 3 pods running Kafka.
After that, one of the partitions could not be restored (as reported in the previous post).
For that partition, we noticed this log on the broker:
[2021-08-27 17:45:32,799] INFO [Transaction Marker Channel Manager 1002]: Couldn’t find leader endpoint for partitions Set(__consumer_offsets-11, command-expiry-store-changelog-9) while trying to send transaction markers for commands-processor-0_9, these partitions are likely deleted already and hence can be skipped (kafka.coordinator.transaction.TransactionMarkerChannelManager)
Then we stopped the Kafka Streams app and restarted Kafka cleanly (with a proper grace period).
After restarting the Kafka Streams app once more, we noticed this message in the app log:
2021-08-27 18:34:42,413 INFO [Consumer clientId=commands-processor-76602c87-f682-4648-859b-8fa9b6b937f3-StreamThread-1-consumer, groupId=commands-processor] The following partitions still have unstable offsets which are not cleared on the broker side: [commands-9], this could be either transactional offsets waiting for completion, or normal offsets waiting for replication after appending to local log [commands-processor-76602c87-f682-4648-859b-8fa9b6b937f3-StreamThread-1] (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
Is there any way to clean up that transaction?
Kafka Streams config (a minimal sketch of how these settings are wired together follows the list):
StreamsConfig.RETRY_BACKOFF_MS_CONFIG, 2000
StreamsConfig.REPLICATION_FACTOR_CONFIG, 2
StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000
StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 24MB
ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"
StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE
producer.delivery.timeout.ms=120000
consumer.session.timeout.ms=30000
consumer.heartbeat.interval.ms=10000
consumer.max.poll.interval.ms=300000
num.stream.threads=1
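
For context, here is a minimal sketch of how settings like these are typically assembled into a KafkaStreams instance. The application id, bootstrap servers, and the placeholder topology are illustrative assumptions, not the actual app code:

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class CommandsProcessorSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholders -- not the real application id / bootstrap servers
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "commands-processor");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");

        // Streams-level settings from the list above
        props.put(StreamsConfig.RETRY_BACKOFF_MS_CONFIG, 2000);
        props.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 2);
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);
        props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 24 * 1024 * 1024L); // 24 MB
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 1);

        // Settings forwarded to the embedded consumer / producer clients
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(StreamsConfig.producerPrefix("delivery.timeout.ms"), 120000);
        props.put(StreamsConfig.consumerPrefix("session.timeout.ms"), 30000);
        props.put(StreamsConfig.consumerPrefix("heartbeat.interval.ms"), 10000);
        props.put(StreamsConfig.consumerPrefix("max.poll.interval.ms"), 300000);

        StreamsBuilder builder = new StreamsBuilder();
        // Placeholder source; the real topology (including the command-expiry-store) is omitted
        builder.stream("commands");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```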