Kafka Connect Elasticsearch sink stops sending records

robbyd · 11 February 2021 18:39

Hey guys another quick question that you can help me out with.
Everything seems to be running fine for the most part. If i put 1 million records into a kafka producer the ES connector will start picking up all the records and put them in ES.

For some reason the workers will just stop sending data after 200,000-300,000. I don’t see any errors in the logs. If I restart the workers they pick up again. Any idea what could cause this?

rmoff · 11 February 2021 19:38

What’s the status of the connector and tasks within it? You can check using the Kafka Connect REST API:

Exploring the Kafka Connect REST API - YouTube
Connect REST Interface | Confluent Documentation

robbyd · 11 February 2021 20:08

So I am using the curl commands. When I use curl 'localhost:8083/connectors/elastic-search-topic12/tasks' | jq I get below

[
  {
    "id": {
      "connector": "elastic-search-topic12",
      "task": 0
    },
    "config": {
      "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
      "type.name": "_doc",
      "topics": "topic1",
      "tasks.max": "2",
      "transforms.routeTS.type": "org.apache.kafka.connect.transforms.TimestampRouter",
      "transforms": "routeTS",
      "transforms.routeTS.topic.format": "${topic}-${timestamp}",
      "key.ignore": "true",
      "schema.ignore": "true",
      "transforms.routeTS.timestamp.format": "YYYY-MM-dd",
      "task.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkTask",
      "value.converter.schemas.enable": "false",
      "name": "elastic-search-topic12",
      "connection.url": "https://",
      "value.converter": "org.apache.kafka.connect.json.JsonConverter",
      "key.converter": "org.apache.kafka.connect.storage.StringConverter"
    }
  },
  {
    "id": {
      "connector": "elastic-search-topic12",
      "task": 1
    },
    "config": {
      "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
      "type.name": "_doc",
      "topics": "topic1",
      "tasks.max": "2",
      "transforms.routeTS.type": "org.apache.kafka.connect.transforms.TimestampRouter",
      "transforms": "routeTS",
      "transforms.routeTS.topic.format": "${topic}-${timestamp}",
      "key.ignore": "true",
      "schema.ignore": "true",
      "transforms.routeTS.timestamp.format": "YYYY-MM-dd",
      "task.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkTask",
      "value.converter.schemas.enable": "false",
      "name": "elastic-search-topic12",
      "connection.url": "https://",
      "value.converter": "org.apache.kafka.connect.json.JsonConverter",
      "key.converter": "org.apache.kafka.connect.storage.StringConverter"
    }
  }
]

When I use curl 'localhost:8083/connectors/elastic-search-topic12/status' | jq
I get this below.

 {
   "error_code": 404,
   "message": "No status found for connector elastic-search-topic12"
 }

Any idea why I don’t see a status. Data does go through when I send it though. Am i missing something?

rmoff · 11 February 2021 21:44

How many distributed workers do you have running?

robbyd · 11 February 2021 22:00

So currently I have 2 running. Each one is running on its own linux machine. I have this added in both worker configs.

rest.advertised.host.name=172.31..
rest.advertised.port=8083

Each broadcasting its own IP. It seems to work fine from what I can see when I send messages through kafka but something could be wrong with my set up.

Thanks again for helping out.

rmoff · 11 February 2021 22:06

Can you share the two connect worker properties configuration files?

robbyd · 11 February 2021 22:35

Sure here is the first config file.

# A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
#bootstrap.servers=b-1.test-cluster.uxyzje.c2.kafka.us-east-1.amazonaws.com:9092,b-2.test-cluster.uxyzje.c2.kafka.us-east-1.amazonaws.com:9092
bootstrap.servers=b-1.****.kafka.us-east-1.amazonaws.com:9092,b-2.****.kafka.us-east-1.amazonaws.com:9092,b-3.****.kafka.us-east-1.amazonaws.com:9092,b-4.****.c8.kafka.us-east-1.amazonaws.com:9092
# unique name for the cluster, used in forming the Connect cluster group. Note that this must not conflict with consumer group IDs
group.id=connect-cluster2

# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply
# it to
key.converter.schemas.enable=true	
value.converter.schemas.enable=true

# Topic to use for storing offsets. This topic should have many partitions and be replicated and compacted.
# Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
# to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
offset.storage.topic=connect-offsets
offset.storage.replication.factor=4
#offset.storage.partitions=25

# Topic to use for storing connector and task configurations; note that this should be a single partition, highly replicated,
# and compacted topic. Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
# to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
config.storage.topic=connect-configs
config.storage.replication.factor=4

# Topic to use for storing statuses. This topic can have multiple partitions and should be replicated and compacted.
# Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
# to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
status.storage.topic=connect-status
status.storage.replication.factor=4
#status.storage.partitions=5

# Flush much faster than normal, which is useful for testing/debugging
offset.flush.interval.ms=10000

# These are provided to inform the user about the presence of the REST host and port configs 
# Hostname & Port for the REST API to listen on. If this is set, it will bind to the interface used to listen to requests.
rest.host.name=
rest.port=8083

consumer.max.poll.interval.ms=3000000

# The Hostname & Port that will be given out to other workers to connect to i.e. URLs that are routable from other servers.
rest.advertised.host.name=172.31.133.111
rest.advertised.port=8083

# Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins
# (connectors, converters, transformations). The list should consist of top level directories that include 
# any combination of: 
# a) directories immediately containing jars with plugins and their dependencies
# b) uber-jars with plugins and their dependencies
# c) directories immediately containing the package directory structure of classes of plugins and their dependencies
# Examples: 
# plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
plugin.path=/usr/share/java,/home/ubuntu/confluent-6.0.0/share/confluent-hub-components

Here is the second config file.

# A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.
#bootstrap.servers=b-1.test-cluster.uxyzje.c2.kafka.us-east-1.amazonaws.com:9092,b-2.test-cluster.uxyzje.c2.kafka.us-east-1.amazonaws.com:9092
bootstrap.servers=b-1.****.kafka.us-east-1.amazonaws.com:9092,b-2.****.kafka.us-east-1.amazonaws.com:9092,b-3.****.kafka.us-east-1.amazonaws.com:9092,b-4.****.c8.kafka.us-east-1.amazonaws.com:9092
# unique name for the cluster, used in forming the Connect cluster group. Note that this must not conflict with consumer group IDs
group.id=connect-cluster2

# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply
# it to
key.converter.schemas.enable=true	
value.converter.schemas.enable=true

# Topic to use for storing offsets. This topic should have many partitions and be replicated and compacted.
# Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
# to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
offset.storage.topic=connect-offsets
offset.storage.replication.factor=4
#offset.storage.partitions=25

# Topic to use for storing connector and task configurations; note that this should be a single partition, highly replicated,
# and compacted topic. Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
# to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
config.storage.topic=connect-configs
config.storage.replication.factor=4

# Topic to use for storing statuses. This topic can have multiple partitions and should be replicated and compacted.
# Kafka Connect will attempt to create the topic automatically when needed, but you can always manually create
# the topic before starting Kafka Connect if a specific topic configuration is needed.
# Most users will want to use the built-in default replication factor of 3 or in some cases even specify a larger value.
# Since this means there must be at least as many brokers as the maximum replication factor used, we'd like to be able
# to run this example on a single-broker cluster and so here we instead set the replication factor to 1.
status.storage.topic=connect-status
status.storage.replication.factor=4
#status.storage.partitions=5

# Flush much faster than normal, which is useful for testing/debugging
offset.flush.interval.ms=10000

# These are provided to inform the user about the presence of the REST host and port configs 
# Hostname & Port for the REST API to listen on. If this is set, it will bind to the interface used to listen to requests.
rest.host.name=
rest.port=8083

consumer.max.poll.interval.ms=3000000

# The Hostname & Port that will be given out to other workers to connect to i.e. URLs that are routable from other servers.
rest.advertised.host.name=172.31.137.67
rest.advertised.port=8083

# Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins
# (connectors, converters, transformations). The list should consist of top level directories that include 
# any combination of: 
# a) directories immediately containing jars with plugins and their dependencies
# b) uber-jars with plugins and their dependencies
# c) directories immediately containing the package directory structure of classes of plugins and their dependencies
# Examples: 
# plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
plugin.path=/usr/share/java,/home/ubuntu/confluent-6.0.0/share/confluent-hub-components

rmoff · 11 February 2021 22:38

What’s the output of this?

curl -s http://localhost:8083/connectors?expand=info&expand=status

robbyd · 11 February 2021 23:33

Here are the results

{
    "elastic-search-topic12": {
        "info": {
            "name": "elastic-search-topic12",
            "config": {
                "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
                "type.name": "_doc",
                "topics": "topic1",
                "tasks.max": "2",
                "transforms.routeTS.type": "org.apache.kafka.connect.transforms.TimestampRouter",
                "transforms": "routeTS",
                "transforms.routeTS.topic.format": "${topic}-${timestamp}",
                "key.ignore": "true",
                "schema.ignore": "true",
                "transforms.routeTS.timestamp.format": "YYYY-MM-dd",
                "value.converter.schemas.enable": "false",
                "name": "elastic-search-topic12",
                "connection.url": "https://****",
                "value.converter": "org.apache.kafka.connect.json.JsonConverter",
                "key.converter": "org.apache.kafka.connect.storage.StringConverter"
            },
            "tasks": [{
                    "connector": "elastic-search-topic12",
                    "task": 0
                }, {
                    "connector": "elastic-search-topic12",
                    "task": 1
                }
            ],
            "type": "sink"
        }
    }
}

[1]+ Done curl -s http://localhost:8083/connectors?expand=info

rmoff · 12 February 2021 10:47

I’m a bit puzzled by the output because this is the kind of thing I get for http://localhost:8083/connectors?expand=info&expand=status, note the status object.

{
  "source-datagen-orders-us": {
    "info": {
      "name": "source-datagen-orders-us",
      "config": {
        "connector.class": "io.confluent.kafka.connect.datagen.DatagenConnector",
        "topic.creation.default.partitions": "6",
        "tasks.max": "1",
        "topic.creation.default.replication.factor": "1",
        "name": "source-datagen-orders-us",
        "kafka.topic": "orders",
        "schema.keyfield": "orderid",
        "schema.filename": "/data/orders_us.avsc"
      },
      "tasks": [
        {
          "connector": "source-datagen-orders-us",
          "task": 0
        }
      ],
      "type": "source"
    },
    "status": {
      "name": "source-datagen-orders-us",
      "connector": {
        "state": "RUNNING",
        "worker_id": "ksqldb:8083"
      },
      "tasks": [
        {
          "id": 0,
          "state": "RUNNING",
          "worker_id": "ksqldb:8083"
        }
      ],
      "type": "source"
    }
  }
}

From the looks of it you only ran curl -s http://localhost:8083/connectors?expand=info (maybe you got the & with a space before it?)

robbyd · 12 February 2021 13:06

So when I run that first it does not get formatted correctly. On the original is formatted it myself. Here is a picture of the command and results.

When you hit enter that last line shows up. Not sure why I can not see the status of the connector.
I have also added another connector since the last time I posted that.

rmoff · 12 February 2021 13:51

try quotation marks:

curl -s "http://localhost:8083/connectors?expand=info&expand=status"

robbyd · 12 February 2021 14:05

So I just see an empty set. Like there is nothing running. Shows { }.
Does this mean they are not running? Confused though cause data is being sent through and im not seeing any issue right now.

In my worker configs i have both of these enable is this correct? Just curious.

rest.host.name=
rest.port=8083

consumer.max.poll.interval.ms=3000000

# The Hostname & Port that will be given out to other workers to connect to i.e. URLs that are routable from other servers.
rest.advertised.host.name=172.31.137.67
rest.advertised.port=8083

robbyd · 12 February 2021 18:26

Something is really odd. If I create a test topic and send like 150 messages it goes in just fine but for some reason its just sending so much data. So Messages are in ES but my network output is huge and does not stop. I have deleted all other connectors and just using this test one.

See the output below.

Any idea what is causing this? When i kill the process it goes back down. If I restart its lower but nothing is being sent.

I can try maybe starting from scratch see if that helps.

Topic		Replies	Views
Kafka Connect Elasticsearch sink bandwidth Kafka Connect	4	3384	3 March 2021
Elasticsearch sink connector instance goes dormant and not consuming data Kafka Connect	1	3127	25 July 2021
Elasticsearch Sink Connector, unrecoverable exception Kafka Connect	4	7878	30 July 2021
Kafka connect to ES Kafka Connect	9	3427	8 February 2021
Elastic sink Connector "failed" status Kafka Connect	3	6585	4 September 2021

Kafka Connect Elasticsearch sink stops sending records

Related topics