🎧 The Evolution of Apache Kafka: From In-House Infrastructure to Managed Cloud Service ft. Jay Kreps

There’s a new Streaming Audio episode - check it out!

When it comes to Apache Kafka®, there’s no one better to tell the story than Jay Kreps (Co-Founder and CEO, Confluent), one of the original creators of Kafka. In this episode, he talks about the evolution of Kafka from in-house infrastructure to a managed cloud service and discusses what’s next for infrastructure engineers who used to self-manage the workload.

Kafka started out at LinkedIn as a distributed stream processing framework and was core to their central data pipeline. At the time, the challenge was to address scalability for real-time data feeds. The platform’s initial data system was built on Apache™ Hadoop®, but the team later realized that operationalizing and scaling the system required a considerable amount of work.

When they started re-engineering the infrastructure, Jay observed a big gap in data streaming—on one end, data was being acted on continuously, while on the other, it was only examined once a day in batch analytics—with no real-time interconnection in between. This ushered in efforts to build a distributed system that connects applications, data systems, and organizations for real-time data. That goal led to the birth of Kafka and eventually a company around it—Confluent.

Over time, Confluent progressed from focusing solely on Kafka as a software product to a more holistic view—Kafka as a complete central nervous system for data, integrating connectors and stream processing with a fully managed cloud service.

Now as organizations make a similar shift from in-house infrastructure to fully-managed services, Jay outlines five guiding points to keep in mind:

  1. Cloud-native systems abstract away the operational effort so you don’t have to worry about the underlying infrastructure (see the client sketch after this list)
  2. It’s important to have a complete ecosystem for Kafka, including connectors, a SQL layer, and data governance
  3. A distributed system should allow data to be accessible everywhere and across organizations
  4. A dependable storage infrastructure layer, such as Amazon S3, is critical
  5. A cost-effective model keeps the system sustainable and easy to build around
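
As a rough illustration of the first point: with a fully managed Kafka service, the producer code you write is the same as it would be against a self-managed cluster—only the connection configuration changes. This is a minimal sketch in Java; the bootstrap endpoint, API key/secret placeholders, and the `orders` topic are hypothetical, and the exact security settings depend on your provider.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ManagedKafkaProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical endpoint and credentials for a fully managed cluster;
        // substitute the values your provider gives you.
        props.put("bootstrap.servers", "pkc-xxxxx.us-east-1.example.cloud:9092");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"<API_KEY>\" password=\"<API_SECRET>\";");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // The client logic is identical to a self-managed setup: create a
        // producer, send a record to a topic, and flush before exiting.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "order-42", "{\"status\":\"created\"}"));
            producer.flush();
        }
    }
}
```

The point of the sketch is that none of the code above concerns brokers, partitions on disk, or cluster sizing—the operational side lives behind the managed endpoint.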


EPISODE LINKS

🎧 Listen to the episode
