Troubleshooting Debezium Connector: Intermittent Failures

Hi,

Background: -
We’re running the Debezium connector on Kafka Connect, deployed as a StatefulSet on EKS.

  • Kafka Connect is running in distributed mode, hosted on AWS EKS.

Problem: -

  • We’re intermittently encountering an issue where the Debezium connector stops pushing events to the Kafka topic.

  • This happens once or twice a week across all deployed environments.

Troubleshooting Steps Taken:

  • There are no errors in the logs, which makes troubleshooting challenging.
  • The REST API “GET /connectors?expand=status” still shows the connector and its task in the RUNNING state.
  • The connector did not resume after we restarted it with “POST /connectors/(string:name)/restart”.
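
One thing worth ruling out here: in Kafka Connect, a plain “POST /connectors/(string:name)/restart” restarts only the Connector instance, not its tasks, and the work of reading the binlog happens in the task. Kafka Connect 3.0 and later accept the includeTasks and onlyFailed query parameters on that endpoint. Below is a minimal Python sketch of checking status and restarting the connector together with its tasks. The base URL http://localhost:8083 is an assumption, and the helper names are made up for illustration.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

CONNECT_URL = "http://localhost:8083"  # assumption: point this at your Connect service


def restart_url(base, name, include_tasks=True, only_failed=False):
    """Build the restart URL. includeTasks/onlyFailed need Kafka Connect 3.0+."""
    query = urlencode({
        "includeTasks": str(include_tasks).lower(),
        "onlyFailed": str(only_failed).lower(),
    })
    return f"{base}/connectors/{quote(name, safe='')}/restart?{query}"


def summarize_status(status):
    """Reduce a GET /connectors/{name}/status body to (connector_state, task_states)."""
    tasks = [task["state"] for task in status.get("tasks", [])]
    return status["connector"]["state"], tasks


def get_status(base, name):
    """Fetch the status document for one connector."""
    url = f"{base}/connectors/{quote(name, safe='')}/status"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def restart_with_tasks(base, name):
    """Restart the Connector instance and all of its tasks; returns the HTTP status code."""
    req = urllib.request.Request(restart_url(base, name), data=b"", method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

If the task still reports RUNNING and stays silent after a restart that includes tasks, that points away from the task lifecycle and toward something shared at the worker level, which would match the observation that only a pod restart helps.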

Observations: -
The connector resumes pushing events to the Kafka topic once the Kafka Connect EKS pods are restarted.

Versions:
Debezium version: 2.5.0.Final
Kafka and Kafka Connect version: 2.12.3.7.0
Connector library: debezium-connector-mysql-2.5.0.Final.jar
Database: AWS RDS
DB version: 8.0.mysql_aurora.3.05.2

We are using the following config for the Debezium connector:

"name": "debezium-connector_${DEBEIZUM_SPRINT_NUM}",
"config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "${DB_WRITER_URL}",
    "database.port": "${DB_PORT_NUMBER}",
    "database.user": "${DB_USER_NAME}",
    "database.password": "${DB_PASSWORD}",
    "database.ssl.mode": "required",
    "database.server.id": "${RANDOM_NUMBER}",
    "database.server.name": "${ENVIRONMENT}",
    "schema.history.internal.kafka.bootstrap.servers": "kafka.${HALF_URL}:9092",
    "schema.history.internal.kafka.topic": "dbhistory",
    "table.include.list": "",
    "column.include.list": "",
    "time.precision.mode": "connect",
    "include.schema.changes": "true",
    "decimal.handling.mode": "double",
    "bigint.unsigned.handling.mode": "long",
    "event.processing.failure.handling.mode": "fail",
    "inconsistent.schema.handling.mode": "warn",
    "heartbeat.interval.ms": 60000,
    "connect.timeout.ms": 60000,
    "tombstones.on.delete": true,
    "binary.handling.mode": "bytes",
    "skipped.operations": "t",
    "topic.prefix": "cdc_event_schema",
    "transforms": "route",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "([^.]+)\\\\.([^.]+)\\\\.([^.]+)",
    "transforms.route.replacement": "\$2.\$3",
    "snapshot.mode": "schema_only"
}
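
To illustrate what the route transform in this config does to topic names, here is a small Python sketch. It assumes that, once the shell escaping in the template is resolved, the effective regex is ([^.]+)\.([^.]+)\.([^.]+) and the replacement is $2.$3. The topic name cdc_event_schema.mydb.orders is a made-up example. RegexRouter rewrites a topic only when the regex matches the whole name, which re.fullmatch mirrors here.

```python
import re

# Assumed effective regex after the template's shell escaping is resolved.
ROUTE_REGEX = re.compile(r"([^.]+)\.([^.]+)\.([^.]+)")


def route(topic):
    """Mimic RegexRouter with replacement "$2.$3": drop the first of three segments."""
    match = ROUTE_REGEX.fullmatch(topic)
    if match is None:
        return topic  # non-matching topic names pass through unchanged
    return f"{match.group(2)}.{match.group(3)}"


print(route("cdc_event_schema.mydb.orders"))  # mydb.orders
print(route("cdc_event_schema"))              # cdc_event_schema
```

Topic names that do not have exactly three dot-separated segments, such as a name equal to the bare topic.prefix, are left as they are. When checking whether events are arriving, look at the routed names rather than the prefixed ones.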

Needed Assistance: -

  • We’ve already checked the configurations for potential issues, but haven’t identified anything specific.
  • We’d appreciate any insights on how to diagnose the root cause.
  • Are there any known configuration issues or limitations that could cause this behavior?
  • Additionally, any advice on debugging strategies for Kafka Connect and Debezium connectors would be extremely helpful.

Observation by Debezium Team: -

The Debezium community reported that the logs contained messages indicating the connector was functioning correctly, but that Kafka Connect might be experiencing performance bottlenecks when writing to the broker. For further context and discussion, please refer to the Debezium community forum thread.

https://debezium.zulipchat.com/#narrow/stream/348104-community-mysql-mariadb/topic/Troubleshooting.20Debezium.20Connector.3A.20.20Intermittent.20Failures
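
Since the logs show no errors at the default level, one debugging option is to raise logger levels at runtime through Kafka Connect’s “PUT /admin/loggers/(string:logger)” endpoint while the connector is stalled, rather than restarting the pods and losing the stalled state. A minimal Python sketch follows. The base URL is an assumption, the helper names are hypothetical, and the logger names are only suggestions for where to look.

```python
import json
import urllib.request
from urllib.parse import quote

CONNECT_URL = "http://localhost:8083"  # assumption: one specific Connect worker

# Suggested loggers: the MySQL connector, the Connect source task, the producer client.
SUGGESTED_LOGGERS = [
    "io.debezium.connector.mysql",
    "org.apache.kafka.connect.runtime.WorkerSourceTask",
    "org.apache.kafka.clients.producer",
]


def logger_request(base, logger, level="DEBUG"):
    """Build the PUT /admin/loggers/{logger} request with a {"level": ...} body."""
    body = json.dumps({"level": level}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/admin/loggers/{quote(logger, safe='')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def set_log_level(base, logger, level="DEBUG"):
    """Send the request to one worker; returns the HTTP status code."""
    with urllib.request.urlopen(logger_request(base, logger, level)) as resp:
        return resp.status
```

By default the change applies only to the worker that receives the request and is not persisted across restarts, so in a multi-pod StatefulSet it would need to be sent to each pod, or at least to the pod hosting the task.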

Hello! We started to experience the same issue without doing any upgrades. Our deployment is rather old (Debezium 1.8 and Kafka 2.7), and suddenly we’re getting almost daily failures. The symptoms are exactly the ones you have described. This had been working flawlessly for years.

Were you able to find a solution to, or the cause of, this issue?