Batch processing in kstreams

As you pointed out, there is ksqlDB, and we will keep investing heavily to make it more expressive and to address its current limitations. Personally, I think that ksqlDB already offers a programmatic way to execute SQL statements over STREAMs and TABLEs. Doing aggregations and joins is actually straightforward with both Kafka Streams and ksqlDB, so you should check it out. Whether we will ever add a "de-duplication" operator to ksqlDB is still an open question (if there is enough user demand, we might). On the other hand, you can already do it today: cf. "Tombstone message in Table when filtering duplicate events".

For Kafka Streams, there is actually a KIP to add a "distinct" operation to the DSL, as pointed out above. There is also a KIP to add more built-in aggregation functions: KIP-747: Add support for basic aggregation APIs (Apache Kafka wiki).

As you can see, there are many things in flight.

For ksqlDB, you can also follow the KLIPs if you are interested in future development: see the ksql/design-proposals directory in the confluentinc/ksql repository on GitHub.