🎧 Data Management and Digital Transformation with Apache Kafka at Van Oord

alice.richardson · 29 April 2021 13:57

There’s a new Streaming Audio episode - check it out!

Imagine if you could create a better world for future generations simply by delivering marine ingenuity.

Van Oord is a Dutch family-owned company that has served as an international marine contractor for over 150 years, focusing on dredging, land infrastructure in the Netherlands, and offshore wind and oil & gas infrastructure.

Real-time insights into costs, project progress, and the performance of vessels and equipment are essential for surviving as a business. Becoming a data-driven company requires that all data be connected, synchronized, and visualized: in other words, truly digitized.

This requires a central nervous system that supports:

Legacy (monolith environment) as well as microservices
ELT/ETL/streaming ETL
All types of data, including transactional, streaming, geo, machine, and (sea) survey/bathymetry
Master data/enterprise common data model

The need for agility and speed makes it necessary to have a fully integrated DevOps and infrastructure-as-code environment. Thousands of topics need to be developed, updated, tested, accepted, and deployed each day. Together with the various scripts for connectors, this calls for a holistic data management solution in which data lineage, data governance, and enterprise architecture are integral parts.

Thus, Marlon Hiralal (Enterprise/Data Management Architect, Van Oord) and Andreas Wombacher (Data Engineer, Van Oord) turned to Confluent for a three-month proof of concept and explored the pre-prep stage of using Apache Kafka® on Van Oord’s vessels.

Since the environment at Van Oord is dynamic with regard to the application landscape and the services offered, a stable environment with controlled continuous integration and deployment is essential. Beyond the software components themselves, this also applies to configurations and infrastructure, so the concept of CI/CD is applied together with infrastructure as code. The result: using Terraform and Confluent together.

Publishing information is treated as a product at Van Oord. An information product is a set of Kafka topics: topics to communicate change (via change data capture) and topics for sharing the state of a data source (Kafka tables). The set of all information products forms the enterprise data model.
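To illustrate how the two kinds of topics relate, here is a minimal Python sketch, not Van Oord's implementation. It assumes a change topic carries keyed events and that a `None` value marks a deletion; `ChangeEvent`, `fold_changes`, and the vessel keys are hypothetical names chosen for the example. Folding the change events in order yields the latest state per key, which is roughly what a state topic (table) shares.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One record on a hypothetical change topic: a key plus its new value.

    A value of None is treated as a delete marker (an assumption for this sketch).
    """
    key: str
    value: Optional[dict]


def fold_changes(events: Iterable[ChangeEvent]) -> Dict[str, dict]:
    """Fold an ordered change stream into the latest state per key."""
    state: Dict[str, dict] = {}
    for event in events:
        if event.value is None:
            state.pop(event.key, None)  # delete marker: drop the key if present
        else:
            state[event.key] = event.value  # later events overwrite earlier ones
    return state


# Hypothetical change events for two vessels.
changes = [
    ChangeEvent("vessel-1", {"status": "dredging"}),
    ChangeEvent("vessel-2", {"status": "in port"}),
    ChangeEvent("vessel-1", {"status": "sailing"}),
    ChangeEvent("vessel-2", None),
]

table = fold_changes(changes)
print(table)  # {'vessel-1': {'status': 'sailing'}}
```

A consumer that only needs the current state can read the table, while a consumer that needs every intermediate step reads the change events.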

Apache Atlas is used as a data dictionary and governance tool to capture the meaning of different information products. All changes in the data dictionary are available as an information product in Confluent, allowing for consumers of information products to subscribe to the information and be notified about changes.

Van Oord’s enterprise architecture model must remain up to date and aligned with the current implementation. This is achieved by automatically inspecting and analyzing Confluent data flows. Fortunately, Confluent fits seamlessly into this holistic reference architecture. The basis of the reference architecture is a change data capture (CDC) layer and a persistent layer, which makes Confluent the core component of Van Oord’s future-proof digital data management solution.

EPISODE LINKS

Listen to the episode
