Interactive Query in Kafka Streams: The Next Generation [Kafka Summit 2022]

Date: April 26, 2022
Time: 4:00 PM - 4:45 PM BST


  • Vasiliki Papavasileiou, Software Engineer, Confluent
  • John Roesler, Staff Software Engineer II, Confluent

Kafka Streams offers Interactive Queries (IQ), which let you interact with the internal stream processing state from outside the application. Over the years, this functionality has proven invaluable to users for everything from debugging to serving low-latency queries straight from the Streams runtime.

However, the actual interfaces for IQ were designed in the very early days of Kafka Streams and have proven cumbersome to use. Adding a new custom query, such as reverse scan, requires changes to more than 6,000 lines of code spanning 108 files. There is currently no way to customize how a query executes, for instance, whether or not it should use the caching layer. Moreover, there is no way to extend the query result with extra information, such as which store layers or segments participated in the query, the execution time, or cache hits and misses. Finally, IQ lets users trade consistency for availability by querying standby stores during rebalances, but the only guarantee this offers is eventual consistency. Although useful as a concept, eventual consistency makes it difficult to provide applications with a good user experience.
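To make the consistency problem concrete, here is a minimal Java sketch, assuming a simplified model in which an active replica and a standby replica each rebuild a store by replaying the same changelog. The class names (`ChangelogRecord`, `ReplicaStore`, `StaleStandbyDemo`) are hypothetical and are not part of the Kafka Streams API. It shows how a query served by a lagging standby can return a value older than the one the user just wrote.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; these are not Kafka Streams classes.
// One update to a key, as it would appear in a store's changelog.
class ChangelogRecord {
    final String key;
    final long value;

    ChangelogRecord(String key, long value) {
        this.key = key;
        this.value = value;
    }
}

// A replica rebuilds its state by replaying changelog records in order.
class ReplicaStore {
    private final Map<String, Long> data = new HashMap<>();
    private int appliedOffset = 0; // number of changelog records replayed so far

    // Replay records until `offset` records have been applied (or the log ends).
    void catchUpTo(List<ChangelogRecord> changelog, int offset) {
        while (appliedOffset < offset && appliedOffset < changelog.size()) {
            ChangelogRecord record = changelog.get(appliedOffset++);
            data.put(record.key, record.value);
        }
    }

    Long get(String key) {
        return data.get(key);
    }
}

class StaleStandbyDemo {
    public static void main(String[] args) {
        List<ChangelogRecord> changelog = new ArrayList<>();
        changelog.add(new ChangelogRecord("alice", 1L));
        changelog.add(new ChangelogRecord("alice", 2L)); // the user's most recent write

        ReplicaStore active = new ReplicaStore();
        ReplicaStore standby = new ReplicaStore();
        active.catchUpTo(changelog, changelog.size()); // fully caught up
        standby.catchUpTo(changelog, 1);               // still one record behind

        // A query routed to the standby during a rebalance sees the older value.
        System.out.println("active=" + active.get("alice") + " standby=" + standby.get("alice"));
    }
}
```

Running `StaleStandbyDemo` prints `active=2 standby=1`. Both answers are eventually consistent, but the caller cannot tell from the value alone which replica answered or how far behind it was.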

In this presentation, we unveil the next generation of Interactive Query (IQv2), which addresses all of these shortcomings. We demonstrate the key benefits of the new query API:

  1. Customizability: You can use IQv2 to easily plug in your own store implementations and queries. For example, you can use it to implement complex queries that push filters and indexes down into the storage layer itself.
  2. Control: It gives you far more control over how each query executes and exposes all the execution details you need to power high-performance, production-grade queries.
  3. Consistency: It provides a tunable consistency model, allowing you to choose your tradeoff between consistency, availability, and latency.
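The three ideas above can be sketched in plain Java, assuming a deliberately simplified model: each query is a typed object, the store dispatches on the query type, the caller can ask for execution details, and a single update counter stands in for the store's position so the caller can demand a minimum freshness. Every name here (`TypedQuery`, `GetKey`, `ReverseKeyScan`, `Outcome`, `SketchStore`) is hypothetical; the actual IQv2 classes in Kafka Streams, such as `StateQueryRequest` and `KeyQuery`, differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, simplified model of the IQv2 ideas; none of these are Kafka Streams classes.
// A query is a typed object: R is the result type the caller gets back.
interface TypedQuery<R> {}

// Point lookup: the value stored for one key.
final class GetKey implements TypedQuery<Long> {
    final String key;

    GetKey(String key) {
        this.key = key;
    }
}

// A custom query a store author might plug in: all keys in descending order.
final class ReverseKeyScan implements TypedQuery<List<String>> {}

// Result plus optional execution details; `failure` is null on success.
final class Outcome<R> {
    final R value;
    final String failure;
    final List<String> executionInfo;

    Outcome(R value, String failure, List<String> executionInfo) {
        this.value = value;
        this.failure = failure;
        this.executionInfo = executionInfo;
    }

    boolean isSuccess() {
        return failure == null;
    }
}

final class SketchStore {
    private final TreeMap<String, Long> data = new TreeMap<>();
    private long position = 0; // number of updates applied; stands in for a real position

    void put(String key, long value) {
        data.put(key, value);
        position++;
    }

    @SuppressWarnings("unchecked")
    <R> Outcome<R> query(TypedQuery<R> query, long minPosition, boolean collectExecutionInfo) {
        long start = System.nanoTime();
        List<String> info = new ArrayList<>();
        // Consistency: refuse to answer if this replica has not reached the caller's bound.
        if (position < minPosition) {
            return new Outcome<>(null, "NOT_UP_TO_BOUND", info);
        }
        Object result;
        if (query instanceof GetKey) {
            result = data.get(((GetKey) query).key);
        } else if (query instanceof ReverseKeyScan) {
            result = new ArrayList<>(data.descendingKeySet());
        } else {
            // Customizability: a query type this store does not know fails cleanly.
            return new Outcome<>(null, "UNKNOWN_QUERY_TYPE", info);
        }
        // Control: execution details are gathered only when the caller asks for them.
        if (collectExecutionInfo) {
            info.add("SketchStore ran " + query.getClass().getSimpleName()
                    + " at position " + position + " in " + (System.nanoTime() - start) + " ns");
        }
        // Safe by construction: each known query class fixes R to the type computed above.
        return new Outcome<>((R) result, null, info);
    }
}

class IqSketchDemo {
    public static void main(String[] args) {
        SketchStore store = new SketchStore();
        store.put("apple", 3L);
        store.put("banana", 5L);

        Outcome<Long> lookup = store.query(new GetKey("banana"), 2, true);
        System.out.println(lookup.value + " " + lookup.executionInfo);

        Outcome<List<String>> scan = store.query(new ReverseKeyScan(), 0, false);
        System.out.println(scan.value); // [banana, apple]

        Outcome<Long> tooFresh = store.query(new GetKey("apple"), 3, false);
        System.out.println(tooFresh.failure); // NOT_UP_TO_BOUND
    }
}
```

Returning a failure such as `NOT_UP_TO_BOUND` instead of silently serving stale data lets the caller decide whether to retry, query another replica, or accept weaker consistency, which is the kind of tunable tradeoff the consistency item describes.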

Attend this talk if you want to learn how to write high-performance, semantically strong applications in a modern, data-in-motion environment.