I’m building a large-scale chat app (~3.3 million users, 25k groups, 60–50k users per group). I’m considering storing all chat messages in Kafka instead of a traditional DB.
Key goals:
- Fetch historical messages when a user comes online.
- Deliver real-time messages.
- Avoid an external DB to reduce latency.
Is Kafka suitable for this use case as a message store? Or should I use Kafka for real-time only and store messages externally?
What would be an ideal Kafka topic/partitioning strategy in such a multi-tenant system?
Architecture overview (what I tried):
- Each client has:
  - A `groupmessages-{clientId}` topic
  - A `usermessages-{clientId}` topic with 15 partitions
- I use Kafka Streams to:
  - Consume group messages
  - Fan out each group message to all its members
  - Produce one message per user into `usermessages-{clientId}`, using `userId` as the record key
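A rough sketch of the fan-out step described above, written as a plain-Python model rather than with the Kafka Streams API. The names `GroupMessage`, `UserRecord`, `fan_out`, the `members` table, and the message rate are all hypothetical and only serve the illustration. It shows that each group message becomes one record per member, keyed by `userId`, so the write volume scales with group size.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class GroupMessage:
    group_id: str
    sender_id: str
    text: str


@dataclass(frozen=True)
class UserRecord:
    key: str             # userId, used as the record key
    value: GroupMessage  # the duplicated group message


def fan_out(msg: GroupMessage, members: Dict[str, List[str]]) -> Iterator[UserRecord]:
    """Yield one record per member of the message's group (sender included)."""
    for user_id in members.get(msg.group_id, []):
        yield UserRecord(key=user_id, value=msg)


# Hypothetical membership table with a single small group.
members = {"g1": ["alice", "bob", "carol"]}
records = list(fan_out(GroupMessage("g1", "alice", "hi"), members))
print([r.key for r in records])

# Write amplification for the largest group size mentioned in the post.
largest_group = 50_000
messages_per_minute = 10  # assumed rate, for illustration only
print(largest_group * messages_per_minute)  # 500000 records per minute
```

In this model a single busy 50k-member group already produces hundreds of thousands of per-user records per minute, which is the duplication cost the questions below ask about.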
Client Behavior:
- When a user comes online, they:
  - Subscribe to `usermessages-{clientId}`
  - Use their `userId` as the key to get historical messages (retention-based)
  - Stay connected for real-time messages
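To illustrate what "using `userId` as the key" means on the read side, here is a small plain-Python model of a 15-partition topic. It is a sketch under stated assumptions, not Kafka client code: CRC32 stands in for the hash (Kafka's default partitioner applies murmur2 to the serialized key), and `partition_for`, `read_history`, and the 1,000 simulated users are hypothetical. Since a Kafka consumer reads a partition sequentially by offset and cannot look records up by key, the model scans the whole partition and filters.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 15  # partition count from the post


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash (CRC32). Only the "hash(key) mod partition count"
    # shape matches Kafka's default partitioner, not the actual values.
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


# Simulated topic: partition index -> ordered list of (key, value) records.
topic = defaultdict(list)
for i in range(1000):  # assumed user count, for illustration only
    user_id = f"user-{i}"
    topic[partition_for(user_id)].append((user_id, f"message for {user_id}"))


def read_history(user_id: str):
    """Scan the user's partition and keep only records with a matching key."""
    partition = topic[partition_for(user_id)]
    mine = [value for key, value in partition if key == user_id]
    return mine, len(partition)


mine, scanned = read_history("user-42")
print(len(mine), "relevant out of", scanned, "records scanned")
```

With many users sharing each partition, every connecting user in this model reads far more records than it keeps, and that ratio grows with the number of users per partition.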
Problem:
When I simulate adding 50 users, Kafka starts disconnecting or acting unstable. Each user triggers a subscription to the shared topic. I assume something about my design (partitioning, consumers, backpressure?) is causing overload or poor key distribution.
Questions:
- Is this per-user duplication to `usermessages` via Kafka Streams a scalable pattern?
- Is 15 partitions enough for a topic shared by thousands of users (potentially millions)?
- Could too many consumer groups or heavy reads on one topic cause these disconnects?
- Would splitting per-user topics (e.g., `usermessages-{userId}`) be more scalable?
- Is Kafka designed to be used as a historical message store for millions of users, or should I offload to DBs like Cassandra for old messages?
My goal is to:
- Keep real-time delivery fast
- Allow quick retrieval of old messages (when a user comes online)
- Avoid overloading brokers and consumers
Any guidance on:
- Partitioning strategy
- Kafka Streams performance tuning
- Alternatives to fan-out duplication
…would be much appreciated!