Watch the recording and ask questions about this event in this thread!
- When: 21 September 2022 at 17:00 (UTC)
- Speaker(s): Ian Feeney and Roman Kolesnev
- Talk(s): Spatial Data in Motion: Near real-time processing of spatial data in Kafka Streams
Spatial Data in Motion: Near real-time processing of spatial data in Kafka Streams
Ian Feeney, Customer Innovation Engineer, Confluent
Roman Kolesnev, Staff Customer Innovation Engineer, Confluent
Ian is a Customer Innovation Engineer in the Customer Solutions and Innovation Division at Confluent, where he is a member of a team helping customers get the most out of Confluent Platform. He started his career as a graduate developer at the Royal Bank of Scotland before moving into the geospatial field, working for UK organizations such as the Forestry Commission, The Registers of Scotland, and Ordnance Survey. Ian is passionate about unlocking the power of spatial data to make the world a better place.
Roman is a Staff Customer Innovation Engineer at Confluent in the Customer Solutions & Innovation Division Labs team. His experience includes building business-critical event streaming applications and distributed systems in the financial and technology sectors.
Kafka Streams applications can process fast-moving, unbounded streams of data. This gives us the capability to process and react to events from many sources in near real time as they converge in Kafka. However, if the events in these data streams have a spatial component and their spatial relationships with each other determine how they should be processed or reacted to, this raises some fundamental challenges. Determining that, for example, a person is within an area or that routes are intersecting requires access to geospatial operations which are not readily available in Kafka Streams.
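To make the challenge concrete, here is a minimal sketch (not from the talk) of the kind of geospatial predicate the abstract mentions: deciding whether a point lies within a circular area using great-circle (haversine) distance. The class name, coordinates, and radius are illustrative assumptions; in a Kafka Streams application this check would run inside a processing step, which is exactly the capability the DSL does not provide out of the box.

```java
// Illustrative sketch: a "is this point within N km of a centre?" predicate,
// implemented with the haversine great-circle distance in plain Java.
public class SpatialPredicate {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance in kilometres between two lat/lon points (degrees).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    static boolean withinKm(double lat1, double lon1,
                            double lat2, double lon2, double km) {
        return haversineKm(lat1, lon1, lat2, lon2) <= km;
    }

    public static void main(String[] args) {
        // Two points in central Edinburgh, well under 2 km apart.
        System.out.println(withinKm(55.9486, -3.1999, 55.9520, -3.1883, 2.0));
    }
}
```

A real deployment would use a library such as Spatial4j for these operations rather than hand-rolled trigonometry, but the sketch shows why plain key-based DSL operations are not enough on their own.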
In this talk, we will first set the scene with a geospatial 101. Then, using a simplified taxi-hailing use case, we will look at two approaches for processing spatial data with Kafka Streams. The first is a naive approach that uses the Kafka Streams DSL, geohashing, and the Java Spatial4j library. The second is a prototype that replaces the RocksDB state store with Apache Lucene (an embedded storage engine with powerful indexing, search, and geospatial capabilities) and implements a stateful spatial join with the Transformer API.
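The geohashing idea behind the first approach can be sketched in a few lines of plain Java: a geohash interleaves longitude and latitude bits and encodes them in base32, so nearby points share a common prefix. That prefix can then serve as a Kafka message key so that spatially close events land in the same partition for a DSL join. This encoder is a standard textbook implementation, not code from the talk.

```java
// Illustrative geohash encoder: nearby coordinates share a hash prefix,
// which makes the prefix usable as a co-partitioning key in Kafka Streams.
public class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int precision) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true; // geohash starts by bisecting longitude
        int bit = 0, ch = 0;
        while (hash.length() < precision) {
            double[] range = evenBit ? lonRange : latRange;
            double value = evenBit ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            ch <<= 1;
            if (value >= mid) { ch |= 1; range[0] = mid; } else { range[1] = mid; }
            evenBit = !evenBit;
            if (++bit == 5) { // 5 bits per base32 character
                hash.append(BASE32.charAt(ch));
                bit = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        // Classic reference point (57.64911, 10.40744) encodes to "u4pruy" at length 6.
        System.out.println(encode(57.64911, 10.40744, 6));
    }
}
```

The "naive" label in the abstract hints at the trade-off: geohash cells are rectangular and fixed-grid, so points near cell boundaries can be close in space yet differ in prefix, which is one motivation for the Lucene-backed approach.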
Overall, this presentation will give you an understanding of how you might go about building custom processing capabilities on top of Kafka Streams for your own use cases.
Can you explain how a given event came to be? Is it an aggregation, combining multiple events from different sources? What are its origins?
Given the growing complexity of event streaming architectures - stateful processing, joins, fan-outs, multi-cluster flows - it is increasingly important to be able to accurately answer those questions, understand data flows and capture data provenance.
This talk will walk through how to use and extend OpenTelemetry Java agent auto instrumentation to achieve full end-to-end traceability in Kafka event streaming architectures involving multi-cluster deployments, the Connect platform and stateful KStream applications.
We will cover:
- Distributed Tracing concepts - context propagation and the OpenTelemetry implementation stack;
- Java agent auto instrumentation, problems faced when instrumenting service platforms (Connect), stateful applications (KStreams) and how auto instrumentation can be extended using loadable extensions to solve those problems;
- Demo of an end-to-end tracing implementation and a highlight of the interesting use cases it enables.
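The context propagation mentioned above rests on the W3C Trace Context standard: the producer-side instrumentation injects a `traceparent` header into each Kafka record, and the consumer side extracts it to continue the same trace. A minimal sketch of that mechanism, with a plain `Map` standing in for Kafka record headers (the class and method names are illustrative, not the OpenTelemetry API):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of W3C Trace Context propagation over record headers.
// Format: traceparent = version "-" trace-id (32 hex) "-" parent-id (16 hex) "-" flags
public class TraceContextDemo {

    // Producer side: inject the current trace context into the headers.
    static void inject(Map<String, String> headers, String traceId, String spanId) {
        headers.put("traceparent", "00-" + traceId + "-" + spanId + "-01");
    }

    // Consumer side: extract the trace-id so downstream spans join the same trace.
    static String extractTraceId(Map<String, String> headers) {
        String tp = headers.get("traceparent");
        return tp == null ? null : tp.split("-")[1];
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        // Example trace-id/span-id values taken from the W3C Trace Context spec.
        inject(headers, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7");
        System.out.println(extractTraceId(headers));
    }
}
```

In practice the OpenTelemetry Java agent performs this injection and extraction automatically for the Kafka clients; the extension points discussed in the talk matter where auto instrumentation alone cannot carry the context, such as through Connect workers or KStreams state stores.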