🎧 Flink vs Kafka Streams/ksqlDB: Comparing Stream Processing Tools

alice.richardson · 26 May 2022 13:08

There’s a new Streaming Audio episode - check it out!

Stream processing can be hard or easy depending on the approach you take and the tools you choose. This sentiment is at the heart of the discussion with Matthias J. Sax (Apache Kafka® PMC member; Software Engineer, ksqlDB and Kafka Streams, Confluent) and Jeff Bean (Sr. Technical Marketing Manager, Confluent). Drawing on immense collective experience in Kafka, ksqlDB, Kafka Streams, and Apache Flink®, they delve into the types of stream processing operations and explain the different ways of addressing the challenges each one poses.

The best stream processing tools they consider are Flink along with the options from the Kafka ecosystem: Java-based Kafka Streams and its SQL-wrapped variant, ksqlDB. Flink and ksqlDB tend to be used by divergent types of teams, since they differ in terms of both design and philosophy.

Why Use Apache Flink?

The teams using Flink are often highly specialized, with deep expertise, and with an absolute focus on stream processing. They tend to be responsible for unusually large, industry-outlying amounts of both state and scale, and they usually require complex aggregations. Flink can excel in these use cases, which potentially makes the difficulty of its learning curve and implementation worthwhile.

Why Use ksqlDB/Kafka Streams?

Conversely, teams employing ksqlDB/Kafka Streams require less expertise to get started, and less expertise and time to manage their solutions. Jeff notes that the skills of a developer may not even be needed in some cases; those of a data analyst may suffice. ksqlDB and Kafka Streams integrate seamlessly with Kafka itself, as well as with external systems through Kafka Connect. In addition to being easy to adopt, ksqlDB is also deployed in production stream processing applications that require large scale and state.

There are also other considerations beyond the strictly architectural. Local support availability, the administrative overhead of using a library versus a separate framework, and the availability of stream processing as a fully managed service all matter.

Choosing a stream processing tool is a fraught decision, partly because switching between them isn't trivial: the frameworks are different, the APIs are different, and the interfaces are different. In addition to the high-level discussion, Jeff and Matthias share many details you can use to understand the options, covering deployment models, transactions, batching, and parallelism, as well as a few interesting tangential topics along the way, such as the tyranny of state and the Turing completeness of SQL.

EPISODE LINKS

Listen to the episode
