Producer slows down if multiple consumers lag


We have a centralized Kafka cluster in the organization I work for.
Most of the data is written to a single topic (almost 5 billion events are ingested per day) and then read by multiple consumers.

Sometimes, due to internal constraints, a subset of these consumers starts lagging. This is not a problem for our application, since it does not affect any user-facing system.

What we have noticed, though, is that whenever a substantial number of consumers start lagging, our write latencies increase as well. This seems counterintuitive, given that producers and consumers are isolated from each other.

Is this known behaviour, or is it something unique to our setup?

What values do you have set for the following producer configurations?

  • acks
  • buffer.memory
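To illustrate why acks matters here, below is a deliberately simplified model (an assumption for illustration, not how the broker is actually implemented) of how long a produce request waits for its acknowledgement. With acks=all the leader only responds once the in-sync followers have the batch, so the slowest in-sync replica sets the pace, and anything that slows replication on the brokers shows up as producer write latency. The function and its inputs are hypothetical, and the model ignores batching, pipelining and followers dropping out of the ISR.

```python
def produce_ack_wait_ms(acks, leader_append_ms, follower_replication_ms):
    """Hypothetical, simplified model of how long a produce request waits
    for its acknowledgement under each producer `acks` setting.

    leader_append_ms: time for the leader to append the batch to its log.
    follower_replication_ms: per-follower time to replicate the batch
    (illustrative inputs only; real brokers overlap many of these steps).
    """
    if acks == "0":
        # The producer does not wait for any broker acknowledgement.
        return 0.0
    if acks == "1":
        # The leader responds once the batch is in its own local log.
        return leader_append_ms
    if acks in ("all", "-1"):
        # The leader responds only after every in-sync follower has the
        # batch, so the slowest in-sync follower determines the wait.
        return leader_append_ms + max(follower_replication_ms, default=0.0)
    raise ValueError(f"unsupported acks value: {acks!r}")


# Example with made-up timings: one slow follower dominates under acks=all.
print(produce_ack_wait_ms("1", 2.0, [5.0, 40.0]))
print(produce_ack_wait_ms("all", 2.0, [5.0, 40.0]))
```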

From your observations, it looks like back-pressure is building up on the producer side, which may or may not be related to the consumer behavior.
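As a rough illustration of how that back-pressure reaches the application: the Java producer buffers unsent records in up to buffer.memory bytes, and once that buffer is full, send() blocks for up to max.block.ms before failing with a timeout. The sketch below estimates how long the buffer lasts when brokers acknowledge data more slowly than it is produced. The function name and all rates are hypothetical, and it treats throughput as constant.

```python
import math


def seconds_until_buffer_full(buffer_memory_bytes, produce_bytes_per_s, drain_bytes_per_s):
    """Estimate how long a producer's send buffer lasts when brokers
    drain (acknowledge) data more slowly than the application produces it.
    Hypothetical helper assuming constant rates."""
    net_fill_rate = produce_bytes_per_s - drain_bytes_per_s
    if net_fill_rate <= 0:
        # Brokers keep up, so the buffer never fills and send() never blocks.
        return math.inf
    return buffer_memory_bytes / net_fill_rate


# Made-up numbers: a 32 MiB buffer, 10 MB/s produced, brokers draining 8 MB/s.
print(seconds_until_buffer_full(33_554_432, 10_000_000, 8_000_000))
```

Even a modest slowdown on the broker side fills the buffer within seconds in this example, after which every send() call waits, which would look like higher write latency from the application's point of view.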

Have you been able to collect the following producer-side metrics while the behavior is occurring? They may give more insight into what is happening.

  • select-rate
  • record-queue-time-avg
  • request-latency-avg
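One way to use these metrics is to take a snapshot while the consumers are healthy and another during a lag episode, then check which values grew. The helper below is a hypothetical sketch: it assumes the metric values have already been collected (for example via JMX) into plain dicts keyed by metric name, and the 2x threshold is arbitrary. It only flags increases, so a drop in a metric such as select-rate would need to be checked separately.

```python
def metrics_that_moved(baseline, during_lag, ratio=2.0):
    """Return (metric, growth) pairs for metrics whose value during the lag
    episode is at least `ratio` times their baseline value, largest first.
    Hypothetical helper; inputs are plain {metric_name: value} dicts."""
    moved = {}
    for name, base in baseline.items():
        current = during_lag.get(name)
        if current is None or base <= 0:
            # Skip metrics missing from the second snapshot or with no usable baseline.
            continue
        growth = current / base
        if growth >= ratio:
            moved[name] = growth
    return sorted(moved.items(), key=lambda kv: kv[1], reverse=True)


# Made-up snapshots, for illustration only.
baseline = {"select-rate": 1000.0, "record-queue-time-avg": 5.0, "request-latency-avg": 10.0}
during = {"select-rate": 900.0, "record-queue-time-avg": 50.0, "request-latency-avg": 40.0}
print(metrics_that_moved(baseline, during))
```

If request-latency-avg grows together with record-queue-time-avg, that would suggest the brokers are responding more slowly; a rise in queue time on its own would point more toward buffering on the client.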