We have long running JDBC source connectors connected to Oracle and seeing a pattern where after about 24 hrs the tasks are failing with “Closed connection” error.
Shouldn’t the connector detect a stale/dropped DB connection and attempt a retry?
Following is code where it is failing in connector sourcein TableQuerier.java:107
The JDBC source connector does detect when connections are stale/closed. All the code uses the following implementation for database connections:
As you can see on line 49 there is a method called getConnection() that is called throughout the whole code and it does a verification against the database regarding the health of the connection. Turn your log to DEBUG and look up for the following message:
Unable to check if the underlying connection is valid
If it appears in the logs then it may be your database that is not being able to check the connections given the SQL statement provided for tests.
Also, the connections are created depending on a logic that tries to search for a suitable dialect in the classloader based on the JDBC URL specified in the property connection.url. Double check if the JAR file that you have deployed on the Kafka Connect servers can handle connection checks appropriately. If you are in doubt about this you can always force the dialect by specifying it on the property dialect.name.
Finally, if your JDBC connections are long running as you say then the database itself might be rejecting new connection attempts because the pool limit has been exhausted. While increasing the pool limit might be tempting; what I think you should check is whether your Kafka Connect tasks could finish up their doings as quickly as possible as how the tasks are designed to be. Taking too much time to complete may indicate that the amount of records to be fetched from the database are too high given the infrastructure you are running for Kafka Connect. More servers may spread the tasks over the cluster and increase your capacity to complete the doings quickly.