We have developed a SourceConnector with 10 tasks reading changes from a database. Occasionally, poll() stops being called. The connector status still shows RUNNING. Where can we find why poll() is not being called?
Do you know if stop() was called?
How about anything in the worker log? It may help to enable trace logging, e.g., to be able to see the trace / info / debug logging in here.
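For reference, here is a rough sketch of raising a logger to TRACE at runtime through the worker's REST interface, rather than editing the log4j configuration and restarting. It assumes a Kafka version that exposes the `/admin/loggers` endpoint (2.4 or later, as far as I know), a worker listening on `http://localhost:8083`, and that `org.apache.kafka.connect.runtime.WorkerSourceTask` is the logger of interest. The class name `ConnectLogLevel` and the `buildRequest` helper are made up for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of Kafka Connect.
final class ConnectLogLevel {

    // Builds PUT {workerUrl}/admin/loggers/{logger} with a JSON body like {"level": "TRACE"}.
    static HttpRequest buildRequest(String workerUrl, String logger, String level) {
        return HttpRequest.newBuilder()
                .uri(URI.create(workerUrl + "/admin/loggers/" + logger))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString("{\"level\": \"" + level + "\"}"))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumed worker address and logger name; adjust for your deployment.
        HttpRequest request = buildRequest(
                "http://localhost:8083",
                "org.apache.kafka.connect.runtime.WorkerSourceTask",
                "TRACE");
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

The change applies to the worker that receives the request and is not kept across a restart, so it would need to be sent to each worker running the affected tasks.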
Seems to be due to KAFKA-10792, "Source tasks can block herder thread by hanging during stop".
Do you know that stop is hanging? (based on logging or other evidence)
What version of Kafka are you on? KAFKA-10792 appears to be fixed in recent versions. I see a stop / poll semantics issue being discussed in KAFKA-15090, which might be related to what you’re seeing.
Here is what the issue was with our connector. I checked our connector Jira and spoke with our developer after I posted this.
Our connector assumes that start() completes before stop() is called, but that doesn’t seem to be guaranteed: stop() can be called before start() has made connections to the Couchbase cluster. The connections that stop() is supposed to close do not exist yet, so stop() just exits. Then start() creates the connections, but the connector is now stopped, so those connections just sit idle, with the Couchbase server processes blocking on them. We do have a fix, and I am trying to (a) reproduce the problem that the customer had and (b) demonstrate that it does not happen with the fix. KAFKA-10792 came up when the developer was researching the issue and they made a note of it, but they also said that even on versions where that is fixed, our connector could still have the start()/stop() issue.
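To illustrate the race described above, here is a minimal sketch of one way to guard against stop() running before start() has finished. It is not our actual fix and it deliberately does not depend on Kafka Connect: `GuardedTask` only mirrors the start()/stop() shape of a task, and `FakeConnection` is a made-up stand-in for a real cluster connection. The assumption is that stop() may be invoked from another thread at any point relative to start().

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a cluster connection; counts how many are open.
final class FakeConnection implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();
    private boolean closed;

    FakeConnection() {
        OPEN.incrementAndGet();
    }

    @Override
    public synchronized void close() {
        if (!closed) {
            closed = true;
            OPEN.decrementAndGet();
        }
    }
}

// Mirrors the start()/stop() lifecycle of a task (illustrative only).
final class GuardedTask {
    private final Object lock = new Object();
    private boolean stopRequested;      // guarded by lock
    private FakeConnection connection;  // guarded by lock

    void start() {
        // Connect outside the lock so a slow connect cannot block stop().
        FakeConnection fresh = new FakeConnection();
        synchronized (lock) {
            if (stopRequested) {
                // stop() already ran: close instead of leaving an idle connection.
                fresh.close();
                return;
            }
            connection = fresh;
        }
    }

    void stop() {
        FakeConnection toClose;
        synchronized (lock) {
            // Remember the request so a later or in-flight start() can see it.
            stopRequested = true;
            toClose = connection;
            connection = null;
        }
        if (toClose != null) {
            toClose.close();
        }
    }

    boolean isConnected() {
        synchronized (lock) {
            return connection != null;
        }
    }
}
```

With this shape, calling stop() and then start() leaves no open connection, because start() notices the earlier stop request and closes what it just opened. The normal start() then stop() order behaves as before.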
Update: All sorted out and tested. I was able to reproduce the issue on our old connector and show that the new connector does not have that issue.