What's the difference between using Kafka Streams and the Kafka Consumer?

rmoff · 3 December 2020 14:47

How’s it different writing an application with the Consumer API to process data?

miguno · 3 December 2020 15:05

The Kafka Streams library, which is part of Apache Kafka, is built on top of the Kafka producer and consumer clients. Kafka Streams is significantly more powerful and more expressive than the plain clients.

How’s it different writing an application with the Consumer API to process data?

It’s much simpler and quicker to write a real-world application start to finish with Kafka Streams than with the plain consumer.

Here are some of the features of the Kafka Streams API, most of which the consumer client does not support. To get them with the plain consumer you would have to implement them yourself, essentially re-implementing Kafka Streams.

Supports exactly-once processing semantics (EOS) via Kafka transactions.
Supports fault-tolerant stateful (as well as stateless, of course) processing including streaming joins, aggregations, and windowing. In other words, it supports management of your application’s processing state out-of-the-box.
Supports event-time processing as well as processing based on processing-time and ingestion-time. It also seamlessly processes out-of-order data.
Has first-class support for both streams and tables, which is where stream processing meets databases. In practice, most stream processing applications need both streams AND tables to implement their use cases. If a stream processing technology lacks either abstraction (say, no support for tables), you are either stuck or must implement that functionality yourself (good luck with that…).
Supports interactive queries (also called ‘queryable state’) to expose the latest processing results to other applications and services
Is more expressive: it ships with (1) a functional programming style DSL with operations such as map, filter, reduce as well as (2) an imperative style Processor API for e.g. doing complex event processing (CEP), and (3) you can even combine the DSL and the Processor API.
Has its own testing kit for unit and integration testing.
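
To make the streams-and-tables point a bit more concrete, here is a rough sketch in plain Java (no Kafka dependency, so this is not the Kafka Streams API). It shows the kind of table state you end up maintaining by hand when you only have a consumer loop. The Event class, the method names, and the sample data are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StreamTableSketch {

    /** Hypothetical stand-in for one consumed record: a key and a numeric value. */
    public static final class Event {
        final String key;
        final long value;

        public Event(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Table view of a stream: the latest value seen for each key (upsert semantics). */
    public static Map<String, Long> latestPerKey(List<Event> stream) {
        Map<String, Long> table = new LinkedHashMap<>();
        for (Event e : stream) {
            table.put(e.key, e.value); // a newer record overwrites the older one
        }
        return table;
    }

    /** Aggregated table: a running sum per key, similar in spirit to a grouped aggregation. */
    public static Map<String, Long> sumPerKey(List<Event> stream) {
        Map<String, Long> table = new LinkedHashMap<>();
        for (Event e : stream) {
            table.merge(e.key, e.value, Long::sum); // add to the existing sum, or start one
        }
        return table;
    }

    public static void main(String[] args) {
        List<Event> stream = new ArrayList<>();
        stream.add(new Event("alice", 1));
        stream.add(new Event("bob", 5));
        stream.add(new Event("alice", 3));

        System.out.println(latestPerKey(stream)); // prints {alice=3, bob=5}
        System.out.println(sumPerKey(stream));    // prints {alice=4, bob=5}
    }
}
```

The maps above live only in memory, so a restart or a rebalance loses them. Making that state fault-tolerant, windowed, and queryable is the part that Kafka Streams handles for you and that you would otherwise have to build around the consumer yourself.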

See http://docs.confluent.io/current/streams/introduction.html for a more detailed but still high-level introduction to the Kafka Streams API, which should also help you understand how it differs from the lower-level Kafka consumer client.
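
As one small example of the difference: in Kafka Streams, exactly-once processing is switched on through configuration rather than code. The sketch below builds such a configuration with plain java.util.Properties and string keys, so it compiles without the Kafka jars. The class name, method name, application id, and broker address are placeholders I made up. The value exactly_once_v2 assumes Kafka Streams 3.0 or newer with brokers on 2.5 or newer; older releases use exactly_once instead.

```java
import java.util.Properties;

public class StreamsConfigSketch {

    /** Builds a minimal Kafka Streams configuration with exactly-once processing enabled. */
    public static Properties exactlyOnceConfig(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        // Required settings for any Kafka Streams application.
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        // Assumption: Kafka Streams 3.0+ and brokers 2.5+. Use "exactly_once" on older versions.
        props.put("processing.guarantee", "exactly_once_v2");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        Properties props = exactlyOnceConfig("my-streams-app", "localhost:9092");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In a real application you would pass these properties to the KafkaStreams constructor together with your topology. With the plain clients you would instead have to wire up a transactional producer and commit the consumer offsets inside the same transaction yourself.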

Beyond Kafka Streams, you can also use the event streaming database ksqlDB to process your data in Kafka. ksqlDB is built on top of Kafka Streams. It supports essentially the same features as Kafka Streams, but you write streaming SQL instead of Java or Scala. You can interact with ksqlDB via a CLI or a REST API; it also has a native Java client in case you don’t want to use REST.

Hope this helps!

rmoff · 4 December 2020 10:06

This is great, thanks @miguno!

