Issue with Kafka Connect (Debezium PostgreSQL) Replication Slots Becoming Inactive

Hello everyone,

I’m facing an issue while streaming data between PostgreSQL databases using Kafka Connect 7.5.0 with the Debezium PostgreSQL connector. The problem is that replication slots enter the extended state and eventually become inactive, even though data streams successfully while a slot is in the extended state.


Environment Details:

  • Kafka Connect Version: 7.5.0
  • Connector Plugin: io.debezium.connector.postgresql.PostgresConnector
  • PostgreSQL Plugin: pgoutput
  • Database Configuration:
    • Multiple PostgreSQL databases hosted on a single VM node
    • Each source connector produces multiple Kafka topics

Observed Issue:

  • Replication slots transition to the extended state within a few hours.
  • After a couple of days, these slots turn inactive.
  • Despite this, data continues to stream while a slot is in the extended state.
  • I suspect this results from inefficient offset management or delayed WAL cleanup due to misconfiguration (see the status query sketched below).
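
For context, the "extended" state here is the wal_status column of pg_replication_slots: on PostgreSQL 13 and later it means the slot is holding more WAL than max_wal_size would normally retain. A minimal status query, assuming PostgreSQL 13+ (the wal_status and safe_wal_size columns do not exist on older versions):

SELECT
  slot_name,
  active,       -- false means no walsender process is attached
  wal_status,   -- reserved / extended / unreserved / lost
  pg_size_pretty(safe_wal_size) AS safe_wal_size  -- NULL when max_slot_wal_keep_size = -1
FROM pg_replication_slots;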

PostgreSQL Settings:

  • max_wal_size = 2GB
  • Using logical replication with pgoutput
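
A related safeguard worth knowing about (PostgreSQL 13+): max_slot_wal_keep_size caps how much WAL a slot may pin, so a stuck slot cannot fill the disk. The trade-off is that a slot pushed past the cap becomes lost, and the connector then needs a new slot and a fresh snapshot. A sketch, with '10GB' as an assumed placeholder value:

-- cap the WAL any replication slot may retain (PostgreSQL 13+); 10GB is a placeholder
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
SELECT pg_reload_conf();  -- this setting takes effect on reload, no restart needed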

Sample Connector Configuration:

{
  "name": "mdm_source_tables_connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "xx.xx.xx.xxx",
    "database.port": 5432,
    "database.user": "xxx",
    "database.password": "xxxx",
    "database.dbname": "mdmdb",
    "database.server.name": "xx.xx.xx.xxx",
    "table.include.list": "mdm.vas_master,mdm.product_master,…",
    "snapshot": "incremental",
    "snapshot.mode": "initial",
    "plugin.name": "pgoutput",
    "slot.name": "slot_mdm_source_tables_small_trg2",
    "publication.name": "dbz_mdm_small",
    "publication.autocreate.mode": "filtered",
    "topic.prefix": "mdm",
    "database.history.kafka.bootstrap.servers": "localhost:9092",
    "database.history.kafka.topic": "schema-changes.mdm",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://schema-registry:8081",
    "key.converter.enhanced.avro.schema.support": true,
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081",
    "value.converter.schemas.enable": true,
    "transforms": "unwrap,removeFields",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
    "transforms.removeFields.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
    "transforms.removeFields.blacklist": "before,source",
    "include.schema.changes": true,
    "poll.interval.ms": "1000",
    "offset.flush.interval.ms": "5000",
    "heartbeat.interval.ms": "5000",
    "max.batch.size": "4096",
    "max.queue.size": "16384",
    "slot.initialization.mode": "minimal",
    "max.queue.size.in.bytes": "10485760",
    "database.connection.timeout.ms": "30000"
  }
}
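
One point that often matters with several databases on one instance: WAL is shared across all databases, and Debezium only advances the slot's confirmed position when it receives events for the tables it captures. On a quiet database, the slot can therefore pin WAL written by the busier databases. heartbeat.interval.ms alone is not always enough; Debezium's companion setting is heartbeat.action.query, which makes the connector periodically write a small change into the source database so the slot's position keeps advancing. A sketch of the two settings, where mdm.debezium_heartbeat is a hypothetical single-row table (id int primary key, ts timestamptz) you would create yourself:

"heartbeat.interval.ms": "5000",
"heartbeat.action.query": "INSERT INTO mdm.debezium_heartbeat (id, ts) VALUES (1, now()) ON CONFLICT (id) DO UPDATE SET ts = EXCLUDED.ts"

With pgoutput and publication.autocreate.mode set to filtered, the heartbeat table typically also has to appear in table.include.list so that it is added to the publication and its events actually reach, and are acknowledged by, the connector.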


What I Have Tried:

  • Started the connector in always snapshot mode initially.
  • Switched to initial mode once the topics started getting created.
  • Verified the WAL size limits (max_wal_size = 2GB).
  • Monitoring shows the connector in the RUNNING state while the slot is extended, and data streams normally; eventually, after the slot goes inactive, the connector still shows RUNNING but data no longer reaches the Kafka topics (see the query after this list).
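
When a slot shows active = false, no walsender process is attached to it, which matches the "running but not streaming" symptom: the Connect worker can report RUNNING while the task's replication connection is gone. A quick cross-check from the database side, assuming you can query the source instance:

-- one row per attached walsender; an inactive slot has no matching row here
SELECT application_name, state, sent_lsn, replay_lsn
FROM pg_stat_replication;

It is also worth querying the task-level status from the Kafka Connect REST API (GET /connectors/<name>/status), since an individual task can fail while the connector itself still reports RUNNING.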

Questions:

  1. Why are replication slots becoming inactive even though streaming works for a while?
  2. Is there any Debezium/Kafka Connect tuning required to avoid this state?
  3. How can I ensure continuous streaming without hitting WAL overflow or slot inactivity?
  4. Is the connector setup optimal for large table sets across multiple databases?

Any suggestions, best practices, or configuration improvements are greatly appreciated.

Thank you in advance!

hey @sasidhar welcome :slight_smile:

did you check the database logs and activity?

what does the following SQL show you:


SELECT
  slot_name,
  pg_size_pretty(
    pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal,
  active,
  restart_lsn
FROM pg_replication_slots;

best,
michael

-- sessions connected to the instance; look for the Debezium replication connection
SELECT pid AS process_id,
  usename AS username,
  datname AS database_name,
  client_addr AS client_address,
  application_name,
  backend_start,
  state,
  state_change
FROM pg_stat_activity
WHERE usename IS NOT NULL;