Hi, we have a Kafka Streams application. We also have raw Kafka producers and consumers.
We are trying to detect two scenarios:
- When the cluster becomes unavailable for a prolonged duration, i.e. when there is an outage.
- When some of the core topics, which were previously created, go missing.
We need to handle the two cases differently, so we need some way to distinguish them.
Based on our observations so far, Kafka seems to be designed to be resilient to broker failures and keeps retrying internally. In such cases, Kafka Streams does not throw any exception.
Also, when the topics do not exist, we get a TimeoutException in the Kafka producer, but its type does not tell us whether the cause was unavailable brokers or missing topics. The exception message does describe the cause, but we don't want to rely on the message text; we would rather have a distinct exception type for each case.
UnknownTopicOrPartitionException used to be thrown in older versions, but that does not seem to be the case in the latest versions.
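To make the requirement concrete, here is a minimal sketch of the type-based distinction we are after. It runs a probe and walks the exception cause chain instead of reading message text. The class name ClusterProbe, the Status enum and the classify method are hypothetical names made up for illustration. The two nested exception classes are stand-ins, so the sketch compiles without kafka-clients on the classpath. In real code they would be replaced by the classes of the same name in org.apache.kafka.common.errors.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;

public class ClusterProbe {

    // Stand-in for org.apache.kafka.common.errors.UnknownTopicOrPartitionException (assumption).
    static class UnknownTopicOrPartitionException extends RuntimeException {}

    // Stand-in for org.apache.kafka.common.errors.TimeoutException (assumption).
    static class TimeoutException extends RuntimeException {}

    // Hypothetical result type for the two scenarios we want to tell apart.
    enum Status { HEALTHY, TOPIC_MISSING, CLUSTER_UNAVAILABLE, OTHER_FAILURE }

    // Runs the probe and classifies any failure by exception type,
    // looking through wrappers such as ExecutionException.
    static Status classify(Callable<?> probe) {
        try {
            probe.call();
            return Status.HEALTHY;
        } catch (Exception e) {
            for (Throwable t = e; t != null; t = t.getCause()) {
                if (t instanceof UnknownTopicOrPartitionException) {
                    return Status.TOPIC_MISSING;
                }
                if (t instanceof TimeoutException) {
                    return Status.CLUSTER_UNAVAILABLE;
                }
            }
            return Status.OTHER_FAILURE;
        }
    }

    public static void main(String[] args) {
        // Simulated probes; a real probe would call the Kafka admin client.
        System.out.println(classify(() -> "ok"));
        System.out.println(classify(() -> {
            throw new ExecutionException(new UnknownTopicOrPartitionException());
        }));
        System.out.println(classify(() -> {
            throw new ExecutionException(new TimeoutException());
        }));
        // prints HEALTHY, TOPIC_MISSING, CLUSTER_UNAVAILABLE (one per line)
    }
}
```

With the real client, the probe could be something like `() -> admin.describeTopics(topics).allTopicNames().get()` (Kafka 3.1+; older clients expose `all()` instead), where `admin` is an `Admin` instance and `topics` is the list of core topic names. Both names are hypothetical. The sketch assumes that a missing topic surfaces there as UnknownTopicOrPartitionException and an unreachable cluster as TimeoutException, which would need to be verified against the client version in use.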
Any pointers to existing patterns for handling such scenarios would be appreciated.