"Kafka for Storing and Streaming Messages in a Large-Scale Chat Application – Best Practices?"

I’m building a large-scale chat app (~3.3 million users, 25k groups, 60–50k users per group). I’m considering storing all chat messages in Kafka instead of a traditional DB.

Key goals:

  • Fetch historical messages when a user comes online.
  • Deliver real-time messages.
  • Avoid an external DB to reduce latency.

Is Kafka suitable for this use case as a message store? Or should I use Kafka for real-time only and store messages externally?

What would be an ideal Kafka topic/partitioning strategy in such a multi-tenant system?

Architecture Overview (what I tried):

  • Each client has:
    • A groupmessages-{clientId} topic
    • A usermessages-{clientId} topic with 15 partitions
  • I use Kafka Streams to:
    1. Consume group messages
    2. Fan out each group message to all its members
    3. Produce one message per user into usermessages-{clientId} using userId as the record key
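
To make the cost of step 3 concrete, here is a back-of-the-envelope sketch of the write amplification the fan-out implies. It is plain Python arithmetic, not Kafka Streams code. The function name `fanout_records` and the rate of 10 messages per second are hypothetical placeholders; only the 50k group size comes from the numbers above.

```python
# Rough fan-out cost for the design above: one copy of each group
# message is produced per group member.
# The message rate used below is a hypothetical placeholder, not a measurement.

def fanout_records(group_size: int, messages: int) -> int:
    """Records written to usermessages-{clientId} when each group
    message is copied once per member."""
    return group_size * messages

largest_group = 50_000  # upper bound on group size from the post

# A single message in the largest group becomes 50,000 records.
print(fanout_records(largest_group, messages=1))   # 50000

# Hypothetical: 10 messages per second in that one group.
print(fanout_records(largest_group, messages=10))  # 500000
```

So even a modest message rate in one large group turns into hundreds of thousands of records per second on the shared topic, before counting any other group.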

Client Behavior:

  • When a user comes online, they:
    • Subscribe to usermessages-{clientId}
    • Use their userId as the key to get historical messages (retention-based)
    • Stay connected for real-time messages
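
One thing worth spelling out about the "use userId as the key" step: as far as I understand, Kafka has no per-key index, so a consumer cannot fetch only the records for one key. It has to read the partition sequentially and discard everything else. The sketch below simulates that with an in-memory list standing in for a partition. `Record`, `read_history`, and the toy data are hypothetical names for illustration, not Kafka APIs.

```python
from collections import namedtuple

# Minimal stand-in for a Kafka record: a key and a value.
Record = namedtuple("Record", ["key", "value"])

def read_history(partition_log, user_id):
    """Scan a partition from the beginning and keep one user's records.
    Returns (matching_records, records_scanned)."""
    scanned = 0
    matches = []
    for record in partition_log:
        scanned += 1
        if record.key == user_id:
            matches.append(record)
    return matches, scanned

# Toy partition shared by 1,000 hypothetical users, 5 messages each.
partition_log = [
    Record(f"user-{u}", f"msg-{m}")
    for m in range(5)
    for u in range(1000)
]

matches, scanned = read_history(partition_log, "user-42")
print(len(matches), scanned)  # 5 5000
```

In this toy case the client reads 5,000 records to recover 5 of its own. With every online user doing the same scan against the same shared partitions, the read load grows with both the number of users and the amount of retained history.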

Problem:

When I simulate adding 50 users, Kafka starts disconnecting or acting unstable. Each user triggers a subscription to the shared topic. I assume something about my design (partitioning, consumers, backpressure?) is causing overload or poor key distribution.

Questions:

  1. Is this per-user duplication to usermessages via Kafka Streams a scalable pattern?
  2. Is 15 partitions enough for a topic shared by thousands of users (potentially millions)?
  3. Could too many consumer groups or heavy reads on one topic cause these disconnects?
  4. Would splitting into per-user topics (e.g., usermessages-{userId}) be more scalable?
  5. Is Kafka designed to be used as a historical message store for millions of users, or should I offload to DBs like Cassandra for old messages?
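
For question 2, here is a rough sketch of how userId keys spread over 15 partitions. It assumes key-based partitioning of the form hash(key) mod partition count. Kafka's default partitioner uses murmur2; `zlib.crc32` is swapped in here only to keep the sketch stdlib-only, so the exact per-partition counts will differ from a real cluster. The final figure additionally assumes all ~3.3 million users belong to a single client, which is a worst case and not something stated above.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 15  # partition count of usermessages-{clientId} in the post

def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stand-in for key-based partitioning: hash the key, then take it
    modulo the partition count. crc32 replaces Kafka's murmur2 here
    purely for illustration."""
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions

# Spread a hypothetical sample of user ids over the partitions.
sample_users = [f"user-{i}" for i in range(30_000)]
counts = Counter(partition_for(u) for u in sample_users)
print(sorted(counts.items()))

# Worst-case assumption: every user in the post belongs to one client.
total_users = 3_300_000
print(total_users // NUM_PARTITIONS)  # 220000 users sharing each partition
```

Even with a perfectly even spread, each partition would carry the interleaved messages of a very large number of users, which is what makes per-user reads from it expensive.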

My goal is to:

  • Keep real-time delivery fast
  • Allow quick retrieval of old messages (when a user comes online)
  • Avoid overloading brokers and consumers

Any guidance on:

  • Partitioning strategy
  • Kafka Streams performance tuning
  • Alternatives to fan-out duplication

…would be much appreciated!