kuro
13 April 2022 07:23
1
Hey community,
We are working on deploying Kafka Connect in distributed mode hosted on AWS ECS with autoscaling via Fargate. To do this, we provide AWS ECR with a Dockerfile which contains our Connect instance configuration:
FROM confluentinc/cp-kafka-connect-base:latest
EXPOSE 8083
COPY ./plugins/ /opt/kafka/plugins/
ENV CONNECT_PLUGIN_PATH=/opt/kafka/plugins,/usr/share/java
ENV CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR=1
ENV CONNECT_CONFIG_STORAGE_TOPIC=connect-configs
ENV CONNECT_GROUP_ID=ibdata-mock-connect-cluster
ENV CONNECT_INTERNAL_KEY_CONVERTER=org.apache.kafka.connect.json.JsonConverter
ENV CONNECT_INTERNAL_VALUE_CONVERTER=org.apache.kafka.connect.json.JsonConverter
ENV CONNECT_KEY_CONVERTER=org.apache.kafka.connect.storage.StringConverter
ENV CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR=1
ENV CONNECT_OFFSET_STORAGE_TOPIC=connect-offsets
ENV CONNECT_REST_PORT=8083
ENV CONNECT_STATUS_STORAGE_REPLICATION_FACTOR=1
ENV CONNECT_STATUS_STORAGE_TOPIC=connect-status
ENV CONNECT_VALUE_CONVERTER=io.confluent.connect.avro.AvroConverter
ENV CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL=xxx
ENV CONNECT_BOOTSTRAP_SERVERS=xxx
ENV AWS_ACCESS_KEY_ID=xxx
ENV AWS_SECRET_ACCESS_KEY=xxx
ENV CONNECT_REST_ADVERTISED_HOST_NAME=test
The xxx variables will be the same for all the workers in the cluster, but that's not the case for CONNECT_REST_ADVERTISED_HOST_NAME. Currently, with ECS autoscaling, every worker deployed in the Connect cluster shares the same REST advertised host name, and that does not work properly (thanks, Robin Moffatt).
Is there any way to configure this variable dynamically for each container started from this Dockerfile? Or maybe a workaround to make this work with the same advertised host name for all instances?
Thanks a lot in advance!
Brais
hey @kuro
welcome
Are you looking for something like this?
#!/bin/bash
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: MIT-0
set -x
# Fetch this task's metadata from the ECS task metadata endpoint
JSON=$(curl "${ECS_CONTAINER_METADATA_URI}/task")
echo "$JSON"
# Extract the first IPv4 address of the task's first container
TASK=$(echo "$JSON" | jq -r '.Containers[0].Networks[0].IPv4Addresses[0]')
echo "$TASK"
# Look up the MSK cluster ARN via service discovery, then its TLS bootstrap brokers
BROKER_ARN=$(aws servicediscovery discover-instances --region "$REGION" --namespace-name FargateWorkshopNamespace --service-name "$MSK_SERVICE" | jq -r '.Instances[0].Attributes.broker_arn')
BOOTSTRAP_SERVERS=$(aws kafka get-bootstrap-brokers --region "$REGION" --cluster-arn "$BROKER_ARN" | jq -r '.BootstrapBrokerStringTls')
# Start the worker with the per-task values
CONNECT_REST_ADVERTISED_HOST_NAME=$TASK CONNECT_BOOTSTRAP_SERVERS=$BOOTSTRAP_SERVERS /etc/confluent/docker/run
best,
michael
kuro
13 April 2022 09:21
3
Hello @mmuehlbeyer,
That's exactly what we were looking for! I've tested the configuration for our Connect scenario and it worked: the variable was assigned properly at deploy time.
Here is the final product:
set_env.sh
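#!/bin/bash
set -x
# Fetch this task's metadata from the ECS task metadata endpoint
JSON=$(curl "${ECS_CONTAINER_METADATA_URI}/task")
echo "$JSON"
# Extract the first IPv4 address of the task's first container
TASK=$(echo "$JSON" | jq -r '.Containers[0].Networks[0].IPv4Addresses[0]')
echo "$TASK"
# Start the worker, advertising the task IP as its REST host name
CONNECT_REST_ADVERTISED_HOST_NAME=$TASK /etc/confluent/docker/run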
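One thing I have not added yet, but might be worth sketching: if the metadata lookup fails, jq -r prints "null" (or nothing at all), and the worker would start advertising that as its host name. A minimal guard, assuming we only expect IPv4 addresses, could look like this. The helper names is_ipv4 and start_worker are hypothetical, they are not part of the Confluent image:

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if $1 looks like a dotted-quad IPv4 address.
# This is a shape check only; it does not validate that each octet is <= 255.
is_ipv4() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Hypothetical wrapper: refuse to start the worker with an empty or malformed address.
start_worker() {
  ip="$1"
  if ! is_ipv4 "$ip"; then
    echo "could not determine task IP (got: '$ip')" >&2
    return 1
  fi
  # Replace the shell with the image's launch script, as set_env.sh does
  CONNECT_REST_ADVERTISED_HOST_NAME="$ip" exec /etc/confluent/docker/run
}
```

In set_env.sh the last line would then become something like: start_worker "$TASK" || exit 1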
Dockerfile
FROM confluentinc/cp-kafka-connect-base:latest
EXPOSE 8083
COPY ./plugins/ /opt/kafka/plugins/
ENV CONNECT_PLUGIN_PATH=/opt/kafka/plugins,/usr/share/java
ENV CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR=1
ENV CONNECT_CONFIG_STORAGE_TOPIC=connect-configs
ENV CONNECT_GROUP_ID=ibdata-mock-connect-cluster
ENV CONNECT_INTERNAL_KEY_CONVERTER=org.apache.kafka.connect.json.JsonConverter
ENV CONNECT_INTERNAL_VALUE_CONVERTER=org.apache.kafka.connect.json.JsonConverter
ENV CONNECT_KEY_CONVERTER=org.apache.kafka.connect.storage.StringConverter
ENV CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR=1
ENV CONNECT_OFFSET_STORAGE_TOPIC=connect-offsets
ENV CONNECT_REST_PORT=8083
ENV CONNECT_STATUS_STORAGE_REPLICATION_FACTOR=1
ENV CONNECT_STATUS_STORAGE_TOPIC=connect-status
ENV CONNECT_VALUE_CONVERTER=io.confluent.connect.avro.AvroConverter
ENV CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL=xxx
ENV CONNECT_BOOTSTRAP_SERVERS=xxx
ENV AWS_ACCESS_KEY_ID=xxx
ENV AWS_SECRET_ACCESS_KEY=xxx
RUN wget -O /usr/local/bin/jq https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64 && chmod +x /usr/local/bin/jq
COPY set_env.sh /etc/set_env.sh
CMD ["/etc/set_env.sh"]
I think it would be great to have the container's IP as the default value of CONNECT_REST_ADVERTISED_HOST_NAME, in the same way that the port defaults to "8083" here:
https://github.com/confluentinc/kafka-images/blob/4b8b751f0b0ea8a4473eedc1c5540a9e8fb9021c/kafka-connect-base/include/etc/confluent/docker/configure
Maybe with an approach similar to what KAFKA_JMX_HOSTNAME uses in the launch file?
export KAFKA_JMX_HOSTNAME=${KAFKA_JMX_HOSTNAME:-$(hostname -i | cut -d" " -f1)}
Do you think that could be a useful addition to the configuration?
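To make the idea concrete, here is a rough sketch of how such a default might behave. It is only an illustration, not what the image does today: detect_ip and resolve_advertised_host are hypothetical names, and it assumes "hostname -i" is available in the container, as in the KAFKA_JMX_HOSTNAME line above.

```shell
#!/bin/bash
# Hypothetical helper: prints the first address reported by "hostname -i".
detect_ip() {
  hostname -i | cut -d" " -f1
}

# Hypothetical helper: keeps an explicitly set host name and only falls back
# to the detected IP when the variable is unset or empty.
resolve_advertised_host() {
  echo "${CONNECT_REST_ADVERTISED_HOST_NAME:-$(detect_ip)}"
}
```

The launch script could then do: export CONNECT_REST_ADVERTISED_HOST_NAME=$(resolve_advertised_host). Because of the ":-" expansion, anyone who already sets the variable would see no change in behaviour.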
Thanks again for your help, I owe you a beer.
Cheers,
Brais
hi @kuro
ok, cool
Hmm, it might be useful, but if the IP you get is a private one you might run into some issues.
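If you want to at least notice that case, a small check against the RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) could be added before starting the worker. This is just a sketch; is_private_ipv4 is a hypothetical helper name and it only does prefix matching on a dotted-quad string:

```shell
#!/bin/bash
# Hypothetical helper: succeeds if $1 falls in one of the RFC 1918 private IPv4 ranges.
is_private_ipv4() {
  case "$1" in
    10.*|192.168.*) return 0 ;;                             # 10.0.0.0/8, 192.168.0.0/16
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;      # 172.16.0.0/12
    *) return 1 ;;
  esac
}
```

A private address is probably fine as long as all workers can reach each other inside the same VPC; it is more likely to matter when something outside that network has to follow the advertised address.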
best,
michael