Unhealthy Connect Loop - Confluent for Kubernetes Quickstart

I am currently trying to install all of the Confluent services to our Kubernetes cluster, following this tutorial: Confluent for Kubernetes Quickstart | Confluent Documentation
While provisioning the Connect container in step 3.1 of the tutorial, I noticed that an unhealthy restart loop is triggered.

When inspecting the connect pod:

Normal   Created    2m32s                kubelet            Created container config-init-container
Normal   Started    2m31s                kubelet            Started container config-init-container
Warning  Unhealthy  27s                  kubelet            Readiness probe failed: Get "http://MYIP:8083/v1/metadata/id": dial tcp MYIP:8083: connect: connection refused
Warning  BackOff    23s (x2 over 25s)    kubelet            Back-off restarting failed container
Normal   Pulled     12s (x3 over 2m31s)  kubelet            Container image "confluentinc/cp-server-connect-operator:6.1.0.0" already present on machine
Normal   Created    11s (x3 over 2m31s)  kubelet            Created container connect
Normal   Started    11s (x3 over 2m31s)  kubelet            Started container connect

Therefore, I attempted to increase the initial delay in the Connect pod specification. However, it is not clear how to do this on the io.confluent.platform.v1beta1.Connect.spec. I also noticed that on the official Connect Helm chart the livenessProbe can be configured, but I didn’t find the same flexibility for the readinessProbe. I will nevertheless try to run the Confluent Connect Helm chart and compare whether it results in the same problems as the Confluent for Kubernetes Quickstart.

Just to mention that I did not experience the same problem with the Connect service when using the GitHub Helm charts.

Hi - there are a few things to look at.

Does Kafka come up with no errors?
Does $ kubectl logs kafka-0 show any errors?
Does $ kubectl get pods show Kafka pods as 3/3 (assuming you are running 3 brokers)?

If Kafka is up, then it could be that the Connect containers need more time to pass their health checks. This is how you can change that in any component CustomResource:

kind: Connect
...
spec:
  replicas: 1
  ...
  podTemplate:
    resources:
      requests:
        cpu: 2000m
        memory: 8Gi
    probe:
      liveness:
        periodSeconds: 10
        failureThreshold: 5
        timeoutSeconds: 500
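
Since the events above show the readiness probe failing (not the liveness probe), you may want to relax the readiness check as well. Assuming your CFK version's podTemplate.probe also accepts a readiness block with the same fields as liveness (the exact field support depends on the CRD version you have installed), a sketch would look like:

```
kind: Connect
...
spec:
  podTemplate:
    probe:
      readiness:
        # assumed field names, mirroring the liveness block above
        initialDelaySeconds: 120   # wait longer before the first readiness check
        periodSeconds: 10
        failureThreshold: 5
```

You can verify which probe fields your installed CRD actually accepts with $ kubectl explain connect.spec.podTemplate.probe before applying this.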