Discussion/Feedback on design for market data distribution system

We’ve built a market data distribution system using Kafka with Producers and Consumers coded in both .NET and Java.

The main considerations for market data distribution are:

  • low latency: measured as the end-to-end time from receiving a data tick from the market to receipt by the Consumer app (see the latency sketch after this list)

  • high volume: at market open and close, the number and rate of ticks is very high. For example, at end of day on the last trading day (7/22/2021), the peak load across all streaming symbols was:
    1,736 messages per second, sustained over 1 minute (each message is about 50 bytes of binary data)

  • multiple users (consumers) consuming the same data: several Client (consumer) apps subscribe to the same symbols and need to see the same messages. Each client app needs to receive messages for 600-1,500 symbols.
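
Regarding the latency measurement, here is a minimal Java sketch of how end-to-end latency can be approximated on the consumer side. It assumes the producer stamps each record when the tick arrives from the feed (Kafka's default CreateTime) and that producer and consumer clocks are reasonably synchronized:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;

final class LatencyProbe {
    // Approximate end-to-end latency for one tick. Assumes the record
    // timestamp is CreateTime, stamped by the producer at feed receipt,
    // and that producer/consumer clocks are reasonably in sync.
    static long endToEndLatencyMs(ConsumerRecord<String, byte[]> record) {
        return System.currentTimeMillis() - record.timestamp();
    }
}
```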

To meet these needs, and given the nature of market data, it seems like we made some “unorthodox” choices about how to structure the Kafka integration. With a greater knowledge of Kafka, there could have been (or could still be) wiser choices:

  • Topics: each symbol is its own topic, which means up to 2,000 topics updated continuously. This seemed the right choice to let clients subscribe to only the symbols they want, without receiving data for irrelevant symbols (see the topic-creation sketch after this list).

  • Partitions: only 1 Partition per Topic: the app consuming the data needs to receive the messages from the Topic in exact order. Multiple consumers in a group over multiple partitions would push managing the message ordering onto the client.

  • Cleanup Policy: Log Compaction (primarily). Only the latest market data tick (message) is of interest to the client, so Kafka messages use the market data symbol as the key, which minimizes the number of old (stale) ticks retained per symbol (the producer sketch after this list shows the keying). On the other hand, some symbols receive data very slowly, so time-based retention could delete a message that is still needed. The cleanup policy is the same for all topics, since the rate of messages per symbol is not predictable.

  • Consumer Groups: every Consumer is its own unique group. There are 2 reasons for this:
    1 - as noted above, messages for a symbol must be received by the Client application in the order they were broadcast by the external system.
    2 - multiple Client apps (Consumers) subscribe to the same symbol, and each needs to receive at least the latest message in the topic.

  • Increasing throughput of Consumer Apps: given the 1 Partition per Topic configuration, we enabled the client apps to configure how many Consumers to create, and to round-robin the symbol subscriptions across all the Consumers. So instead of having multiple Consumers per Topic, the Topics are split across the multiple Consumers (see the consumer sketch after this list).
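
To make these choices concrete, here is a minimal Java sketch of the per-symbol topic setup: 1 partition each, with log compaction, created via the AdminClient. The broker address and replication factor are placeholders for a single-machine setup:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateSymbolTopics {
    public static void main(String[] args) throws Exception {
        List<String> symbols = List.of("AAPL", "MSFT", "IBM"); // up to ~2,000 in practice

        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            List<NewTopic> topics = new ArrayList<>();
            for (String symbol : symbols) {
                // 1 partition preserves total order within the symbol;
                // compaction keeps the latest tick per key (the key is the
                // symbol, set by the producer).
                NewTopic topic = new NewTopic(symbol, 1, (short) 1);
                topic.configs(Map.of(
                        TopicConfig.CLEANUP_POLICY_CONFIG,
                        TopicConfig.CLEANUP_POLICY_COMPACT));
                topics.add(topic);
            }
            admin.createTopics(topics).all().get();
        }
    }
}
```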
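
On the producer side, a minimal sketch of the keying (topic name, broker address, and payload are placeholders). The topic and the key are both the symbol; the key is what lets compaction retain only the newest tick per symbol:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class TickProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.LINGER_MS_CONFIG, "0"); // favor latency over batching

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            String symbol = "AAPL";
            byte[] tick = new byte[50]; // ~50-byte binary tick payload

            // Topic and key are both the symbol: the topic isolates the stream,
            // the key lets log compaction keep only the newest tick per symbol.
            producer.send(new ProducerRecord<>(symbol, symbol, tick));
        }
    }
}
```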
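
And a sketch of the consumer side: each Consumer gets its own unique group.id so every client app sees every message, and the client splits its symbol subscriptions round-robin across however many Consumers it configures. Setting auto.offset.reset=earliest is an assumption here, so a newly started consumer picks up the compacted latest tick for each symbol:

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.UUID;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ClientApp {
    public static void main(String[] args) {
        List<String> symbols = List.of("AAPL", "MSFT", "IBM", "ORCL"); // 600-1,500 in practice
        int numConsumers = 2; // configurable per client app

        // Round-robin the symbol topics across the consumers.
        List<List<String>> slices = new ArrayList<>();
        for (int i = 0; i < numConsumers; i++) slices.add(new ArrayList<>());
        for (int i = 0; i < symbols.size(); i++) slices.get(i % numConsumers).add(symbols.get(i));

        for (List<String> slice : slices) {
            new Thread(() -> {
                Properties props = new Properties();
                props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
                // Unique group per consumer: no sharing within a group, so
                // every client app receives every message for its symbols.
                props.put(ConsumerConfig.GROUP_ID_CONFIG, "client-" + UUID.randomUUID());
                props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
                props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
                props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

                try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
                    consumer.subscribe(slice);
                    while (true) {
                        for (ConsumerRecord<String, byte[]> rec : consumer.poll(Duration.ofMillis(100))) {
                            // process tick: rec.key() is the symbol, rec.value() the ~50-byte payload
                        }
                    }
                }
            }).start();
        }
    }
}
```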

If anyone is interested in discussing these points, I would greatly appreciate hearing different ideas about how these requirements could have been met with alternate Kafka configurations. It would also be great to talk about how this system could be scaled if the load grows beyond what 1 Kafka machine can handle.

Hello! Having dealt with market data in Kafka quite a lot, I find this very interesting!

One symbol per topic sounds quite tedious unless this is a well-defined universe of tickers that you don’t expect to change too frequently. If you have multiple applications consuming from these topics, why not have one topic per application (with a single consumer for that application) and write all of the symbols that that application cares about to that topic? If you use one partition for each of those application-specific topics and use log compaction, you’ll still get the data in order and the latest values.
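
As a rough sketch of what I mean (the application topic names and routing table are made up), the producer would route each tick to the topic of every application that subscribes to that symbol, still keyed by symbol so compaction keeps the latest value per symbol within each application’s topic:

```java
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AppTopicRouter {
    // Hypothetical routing table: which application topics want which symbol.
    private static final Map<String, List<String>> TOPICS_BY_SYMBOL = Map.of(
            "AAPL", List.of("app-risk", "app-quotes"),
            "MSFT", List.of("app-quotes"));

    private final KafkaProducer<String, byte[]> producer;

    public AppTopicRouter(KafkaProducer<String, byte[]> producer) {
        this.producer = producer;
    }

    // One single-partition, compacted topic per application: in-order delivery
    // plus the latest tick per symbol, without thousands of per-symbol topics.
    public void publish(String symbol, byte[] tick) {
        for (String appTopic : TOPICS_BY_SYMBOL.getOrDefault(symbol, List.of())) {
            producer.send(new ProducerRecord<>(appTopic, symbol, tick));
        }
    }
}
```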

I’d love to hear more about your specific use case and some of the alternative designs you may have come up with!