Kafka-connect-HDFS

Hi! I can’t download the HDFS Sink connector!
I’m trying to install a Kafka connector in Docker, and I need to download the JAR of the kafka-connect-hdfs connector before running it. When I go to the HDFS 3 Sink Connector | Confluent Hub page, the download button is not clickable. Could someone help me, please?

Here is the configuration of the service:

kafka-connect:
  image: confluentinc/cp-kafka-connect:latest
  container_name: kafka-connect
  volumes:
    - ./connector:/usr/share/java
  environment:
    CONNECT_BOOTSTRAP_SERVERS: 'kafka:9092'
    CONNECT_GROUP_ID: 'kafka-connect-group'
    CONNECT_CONFIG_STORAGE_TOPIC: 'kafka-connect-config'
    CONNECT_OFFSET_STORAGE_TOPIC: 'kafka-connect-offset'
    CONNECT_STATUS_STORAGE_TOPIC: 'kafka-connect-status'
    CONNECT_VALUE_CONVERTER: io.confluent.connect.avro.AvroConverter
    CONNECT_REST_ADVERTISED_HOST_NAME: 'kafka-connect'
    CONNECT_KEY_CONVERTER: io.confluent.connect.avro.AvroConverter
    CONNECT_PLUGIN_PATH: '/usr/share/java'
  ports:
    - 8083:8083

hey @Damilola

I would recommend installing it via the command line instead of downloading the JAR manually.

Add the following to your compose file:

command:
        - bash
        - -c
        - |
          echo "Installing Kafka Connect hdfs"
          confluent-hub install --no-prompt confluentinc/kafka-connect-hdfs3:1.1.25
          #
          echo "Launching Kafka Connect worker"
          /etc/confluent/docker/run &
          #
          echo "Waiting for Kafka Connect to start listening on 0.0.0.0:8083 ⏳"
          while : ; do
            curl_status=$$(curl -s -o /dev/null -w %{http_code} http://0.0.0.0:8083/connectors)
            echo -e $$(date) " Kafka Connect listener HTTP state: " $$curl_status " (waiting for 200)"
            if [ $$curl_status -eq 200 ] ; then
              break
            fi
            sleep 5
          done
          sleep infinity
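
Once the worker is up, you can confirm the plugin was picked up by listing the installed connector plugins via the Connect REST API (assuming port 8083 is mapped to the host as in your snippet):

    curl -s http://localhost:8083/connector-plugins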

best,
michael

Okay. Thanks
I’ll try it

It works. Thanks

But I have another question.
This is a snippet from my docker-compose. How do I connect kafka-connect to HDFS (the namenode)? Or is it when the data is copied that I specify the HDFS URL? Here (localhost://50070)

  namenode:
    image: bde2020/hadoop-namenode:2.0.0-hadoop2.7.4-java8
    container_name: namenode
    volumes:
      - namenode:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop-hive.env
    ports:
      - "50070:50070"
    networks:
      - elk

  datanode:
    image: bde2020/hadoop-datanode:2.0.0-hadoop2.7.4-java8
    container_name: datanode
    volumes:
      - datanode:/hadoop/dfs/data
    env_file:
      - ./hadoop-hive.env
    environment:
      SERVICE_PRECONDITION: "namenode:50070"
    ports:
      - "50075:50075"
    networks:
      - elk

  kafka-connect:
    image: confluentinc/cp-kafka-connect:latest
    container_name: kafka-connect
    hostname: connect
    depends_on:
      - schema_registry
      - kafka
      - zookeeper
    environment:
      CONNECT_BOOTSTRAP_SERVERS: 'kafka:9092'
      CONNECT_REST_ADVERTISED_HOST_NAME: connect
      CONNECT_REST_PORT: 8083
      CONNECT_GROUP_ID: compose-connect-group
      CONNECT_CONFIG_STORAGE_TOPIC: docker-connect-configs
      CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_OFFSET_FLUSH_INTERVAL_MS: 10000
      CONNECT_OFFSET_STORAGE_TOPIC: docker-connect-offsets
      CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_STATUS_STORAGE_TOPIC: docker-connect-status
      CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
      CONNECT_KEY_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL: 'http://schema_registry:8082'
      CONNECT_VALUE_CONVERTER: io.confluent.connect.avro.AvroConverter
      CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: 'http://schema_registry:8082'
      CONNECT_INTERNAL_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_INTERNAL_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
      CONNECT_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      CONNECT_PLUGIN_PATH: /usr/share/java,/usr/share/confluent-hub-components
      CONNECT_LOG4J_LOGGERS: org.apache.zookeeper=ERROR,org.I0Itec.zkclient=ERROR,org.reflections=ERROR
    ports:
      - 8083:8083
    command: 
      - bash
      - -c
      - |
        confluent-hub install --no-prompt confluentinc/kafka-connect-hdfs:10.2.1
        /etc/confluent/docker/run
    networks:
      - elk

it’s done via the connector configuration rather than the worker: when you create the connector over the REST API, you point hdfs.url (and optionally store.url) at your namenode. Note that 50070 is the namenode’s web UI port; the HDFS RPC port defaults to 8020 on Hadoop 2.x, so the value would look like hdfs://namenode:8020.

see HDFS 3 Sink Connector Configuration Properties | Confluent Documentation for reference
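
For example, here is a minimal sketch of creating the connector against the worker’s REST API. The connector name, topic, and flush.size are placeholders; hdfs://namenode:8020 assumes the default Hadoop 2.x RPC port and the namenode service name from your compose file. The connector.class below matches the kafka-connect-hdfs plugin you installed; for the hdfs3 plugin it would be io.confluent.connect.hdfs3.Hdfs3SinkConnector.

    curl -X POST http://localhost:8083/connectors \
      -H "Content-Type: application/json" \
      -d '{
        "name": "hdfs-sink",
        "config": {
          "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
          "tasks.max": "1",
          "topics": "my-topic",
          "hdfs.url": "hdfs://namenode:8020",
          "flush.size": "3"
        }
      }'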

best,
michael
