I just started diving into Kafka in order to assess whether it is adequate as a solution for a specific requirement.
We are aiming for a redundant, highly available event aggregation system with 3 combined controller/broker nodes (one each in NA, EU, and Asia) and a relatively small, dynamic set of 300-600 producer nodes located all around the globe.
Ideally each producer would select the broker with the lowest latency, but as far as I can tell, in a stock community Kafka setup every producer always connects to the broker that leads the target partition, regardless of proximity. Is that right?
(That would be acceptable, since each produced event datapoint always fits into a single TCP packet, so the added intercontinental latency per event should be tolerable.)
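To make the question concrete, here is roughly the producer setup I have in mind. The hostnames are placeholders, and my understanding of the metadata flow may well be wrong:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TelemetryProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Bootstrap list spanning all three regions (placeholder hostnames).
        // As I understand it, this list is only used for initial metadata
        // discovery; the actual produce requests go to whichever broker
        // leads the target partition, wherever that broker happens to be.
        props.put("bootstrap.servers",
                "kafka-na.example.com:9092,kafka-eu.example.com:9092,kafka-as.example.com:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // Wait for all in-sync replicas, i.e. a cross-continent
        // round trip per acknowledged event in our topology.
        props.put("acks", "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("telemetry", "producer-042", "{\"cpu\":0.42}"));
        }
    }
}
```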
The documentation, however, says that Kafka cluster nodes should be located close together, with low latency and predictable, reliable bandwidth between them.
Not only would we have a relatively small set of producers (300-600); each producer would also emit only one event every 15 seconds on average. At peak, a producer can emit up to one event per second, but over any 5-minute window this reliably averages out to one event every 15 seconds.
In a second stage, this telemetry aggregation system would be extended with additional metrics, where each producer can emit up to 250 events per second in short bursts, which again averages out to at most 250 events every 15 seconds (600 producers x 250 events/s = 150k events/s at peak).
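For reference, the back-of-envelope numbers, assuming 512 bytes per event (the payload size is an assumption on my side; we only know each event fits into a single TCP packet):

```java
public class LoadEstimate {
    public static void main(String[] args) {
        int producers = 600;
        double avgEventsPerSec = 1.0 / 15;    // stage 1: one event per 15 s per producer
        int peakEventsPerSec = 250;           // stage 2: burst rate per producer
        int payloadBytes = 512;               // assumed average payload size

        double avgTotal = producers * avgEventsPerSec;         // 40 events/s
        long peakTotal = (long) producers * peakEventsPerSec;  // 150,000 events/s
        double peakMBps = peakTotal * payloadBytes / 1e6;      // ~76.8 MB/s

        System.out.printf("avg: %.0f events/s, peak: %d events/s (~%.1f MB/s)%n",
                avgTotal, peakTotal, peakMBps);
    }
}
```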
My question now is: would this relatively moderate load allow us to safely run a cluster with 3 controller/brokers on 3 different continents? Or would it still be better to position the 3 nodes in the same region, in different datacenters and networks, but close together with <10 ms latency?
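In case it helps to pin down what "safely" would mean: for the cross-continent variant I would expect a topic like the following, where min.insync.replicas=2 implies that every acks=all write has to cross at least one intercontinental link before it is acknowledged. A sketch via the AdminClient; hostname, topic name, and the partition count of 6 are all placeholders:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTelemetryTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-na.example.com:9092");

        try (Admin admin = Admin.create(props)) {
            // One replica per continent; min.insync.replicas=2 should let us
            // tolerate the loss of one continent while still accepting
            // acks=all writes.
            NewTopic topic = new NewTopic("telemetry", 6, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```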