Hi Folks,
We have a simple single-node setup that replicates data from SQL Server to Snowflake using the Debezium source and Snowflake sink connectors. Occasionally the sink connectors would fail with an out-of-memory error. These failures were easily resolved by restarting the connectors, but we decided to increase the heap size using KAFKA_HEAP_OPTS="-Xms256M -Xmx16G".
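For context, the heap setting is passed into the container through an environment variable that our compose file (shown further down) reads; the .env wiring below is a sketch of how ours is laid out:

# .env (sketch of our setup) -- CONNECT_HEAP_SETTINGS is referenced
# by the KAFKA_HEAP_OPTS entry in the compose file below
CONNECT_HEAP_SETTINGS=-Xms256M -Xmx16G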
The Kafka Connect container was re-created when the heap size was increased. Everything ran fine for a few hours, and then the Connect worker became unhealthy. The logs show that pretty much all tasks are failing with out-of-memory errors. Restarting the Connect worker helps for about 10 minutes, and then we see the same behavior all over again.
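For completeness, this is how we restart the worker (assuming the connect-01 service name from the compose file below):

# Restart only the Connect worker service
docker-compose restart connect-01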
In addition to the connectors failing, we noticed the following errors in the logs:
ERROR Unexpected exception in Thread[KafkaBasedLog Work Thread - ConnectConfigs,5,main] (org.apache.kafka.connect.util.KafkaBasedLog)
java.lang.OutOfMemoryError: Java heap space
ERROR Unexpected exception in Thread[KafkaBasedLog Work Thread - ConnectOffsets,5,main] (org.apache.kafka.connect.util.KafkaBasedLog)
java.lang.OutOfMemoryError: Java heap space
WARN Could not stop task (org.apache.kafka.connect.runtime.WorkerSourceTask)
java.lang.OutOfMemoryError: Java heap space
Exception in thread "KafkaBasedLog Work Thread - ConnectStatus" java.lang.OutOfMemoryError: Java heap space
Uncaught exception in herder work thread, exiting: (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
java.lang.OutOfMemoryError: Java heap space
Connectors failing due to heap space issues was something we could easily handle, but since increasing the heap the Connect worker itself is failing. Restarting it, and increasing or decreasing the heap size, is not helping at all. The REST API is unresponsive when the Connect worker becomes unhealthy.
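While the worker is still healthy, we check and restart failed tasks through the standard Connect REST API; a sketch of what we run (the connector name here is a placeholder):

# Check connector and task state ("snowflake-sink" is a placeholder name)
curl -s http://localhost:8083/connectors/snowflake-sink/status
# Restart a failed task (task 0 here) without restarting the whole connector
curl -s -X POST http://localhost:8083/connectors/snowflake-sink/tasks/0/restart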
We are at the POC stage, working with 300+ tables; things were running great with a daily load of 40 million events until we made this change.
Confluent Platform version: 6.2.0
connect-01:
  container_name: connect-01
  image: confluentinc/cp-kafka-connect:${KAFKA_VERSION}
  networks:
    - pipeline
  restart: unless-stopped
  depends_on:
    - kafka
    - zookeeper
  ports:
    - 8083:8083
  environment:
    CONNECT_GROUP_ID: 1
    CONNECT_REST_PORT: 8083
    CONNECT_BOOTSTRAP_SERVERS: 'kafka:9092'
    CONNECT_REST_ADVERTISED_HOST_NAME: connect-01
    CONNECT_STATUS_STORAGE_TOPIC: ConnectStatus
    CONNECT_OFFSET_STORAGE_TOPIC: ConnectOffsets
    CONNECT_CONFIG_STORAGE_TOPIC: ConnectConfigs
    CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
    CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
    CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
    CONNECT_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
    CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
    CONNECT_INTERNAL_KEY_CONVERTER: 'org.apache.kafka.connect.json.JsonConverter'
    CONNECT_INTERNAL_VALUE_CONVERTER: 'org.apache.kafka.connect.json.JsonConverter'
    CONNECT_PLUGIN_PATH: /usr/share/java,/etc/kafka-connect/jars/
    KAFKA_HEAP_OPTS: ${CONNECT_HEAP_SETTINGS}
  volumes:
    - $PWD/plugins/debezium/$DBZ_VERSION:/etc/kafka-connect/jars/debezium
    - $PWD/plugins/snowflake/$SF_VERSION:/etc/kafka-connect/jars/snowflake
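To get more data on the next failure, we plan to capture a heap dump automatically; a sketch of the extra flags (standard HotSpot JVM options) we intend to append to the heap settings:

# Sketch: heap-dump flags appended to the existing heap settings in .env
CONNECT_HEAP_SETTINGS=-Xms256M -Xmx16G -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/connect-heapdump.hprof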
I would greatly appreciate any advice on resolving this issue. Thanks for your time.
-Shiva