Hello, I am pretty new to Kafka Connect, only about two weeks in, so bear with me.
Our environment is entirely in AWS, so we have our Kafka brokers and Elasticsearch (ES) there.
We are currently trying to send data from SQL Server to ES.
We have gotten all the data in via a producer and are now trying to stream the data to ES.
I am currently running in distributed mode with an ES sink connector.
Once I run it, it seems to start pushing data into ES pretty quickly, but then it kind of gets stuck.
I am able to query the data that does come in, though.
This is where it usually gets stuck.
After a little while it then throws the error below:
sending LeaveGroup request to coordinator b-2.*****.amazonaws.com:9092 due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
I am not sure if it is too much data being pushed through or something wrong with my configuration. I am also not sure which config file this max.poll.interval.ms setting can be added to.
In a nutshell, what is happening is that the time required to send each batch to ES is longer than the maximum time allowed between subsequent Kafka poll() calls, so the consumer is considered failed and leaves the group.
As mentioned in the log message, you have to either increase max.poll.interval.ms or make the batches sent to ES smaller. You can specify either of them in your configuration. The property keys are, respectively:
max.poll.interval.ms
batch.size
The max.poll.interval.ms setting is not part of the ES Sink Connector configuration, but of the common Connect consumer configuration (you can think of it as a parent class). More information here:
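To make that concrete, here is a minimal sketch of where the override could live. The file names, the 600000 ms value, and the per-connector route are assumptions for illustration; the consumer.override. prefix additionally requires connector client config overrides to be enabled on the worker (Kafka 2.3+).

    # Worker-level (e.g. connect-distributed.properties):
    # applies to the consumers of all sink connectors on this worker
    consumer.max.poll.interval.ms=600000

    # Or per connector, if the worker allows it:
    # worker config:    connector.client.config.override.policy=All
    # connector config: "consumer.override.max.poll.interval.ms": "600000"

After changing the worker properties you need to restart the Connect worker for the change to take effect.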
So anything to do with the consumer, I need to put consumer. in front of it?
For example, for batch.size I would use consumer.batch.size. Is that correct?
Also, I am running this with bin/connect-distributed -daemon. Is this the best way to run it in the background, or is something like a service better? How do users usually do this in production? If you do run it as a daemon, what's the best way to stop the worker?
Thank you again, you have been a really great help.
batch.size is not a consumer setting but a setting defined by the ES Sink Connector, so no consumer prefix is needed. More details here.
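For example, since you are running in distributed mode, the connector config is submitted as JSON to the Connect REST API. A rough sketch, where the connector name, topic, and ES endpoint are placeholders for your own values:

    {
      "name": "es-sink-example",
      "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "topics": "my-topic",
        "connection.url": "http://your-es-endpoint:9200",
        "batch.size": "100"
      }
    }

You would POST that to the worker's REST endpoint (port 8083 by default), e.g. curl -X POST -H "Content-Type: application/json" --data @es-sink.json http://localhost:8083/connectors.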
For production environments, it's recommended to run Kafka Connect in distributed mode. You can do it in the background or as a service, up to you. To stop the process, send a kill signal to the Connect process: kill <pid>.
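If you want something more manageable than the -daemon flag, one common approach is to wrap the worker in a systemd unit. This is only a sketch; the install paths, user, and unit name are assumptions you would adapt to your environment:

    [Unit]
    Description=Kafka Connect (distributed mode)
    After=network.target

    [Service]
    Type=simple
    User=kafka
    # Paths below are assumed; point them at your own installation
    ExecStart=/opt/confluent/bin/connect-distributed /opt/confluent/etc/kafka/connect-distributed.properties
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

With that in place, systemctl stop kafka-connect gives you a clean shutdown instead of finding the PID and sending a kill signal by hand.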
So I was able to add those new parameters and I no longer get the error. It seems to work for the first 30 seconds and sends lots of data, but after that it seems to get stuck. I don't see any errors though. The only thing I see in the log is this.
Any idea what else can be done to try to get this data into ES? I set the batch size to 100 but it still doesn't seem to work. Could it be something in my ES config file?
Does tasks.max affect anything in the connector?