There’s a new Streaming Audio episode - check it out!
When it comes to Apache Kafka®, there’s no one better to tell the story than Jay Kreps (Co-Founder and CEO, Confluent), one of the original creators of Kafka. In this episode, he talks about the evolution of Kafka from in-house infrastructure to a managed cloud service and discusses what’s next for infrastructure engineers who used to self-manage the workload.
Kafka started out at LinkedIn as a distributed stream processing framework and was core to their central data pipeline. At the time, the challenge was to address scalability for real-time data feeds. The social media platform’s initial data system was built on Apache™Hadoop®, but the team later realized that operationalizing and scaling the system required a considerable amount of work.
When they started re-engineering the infrastructure, Jay observed a big gap in data streaming—on one end, data was being looked at constantly for analytics, while on the other end, data was being looked at once a day—missing real-time data interconnection. This ushered in efforts to build a distributed system that connects applications, data systems, and organizations for real-time data. That goal led to the birth of Kafka and eventually a company around it—Confluent.
Over time, Confluent progressed from focussing solely on Kafka as a software product to a more holistic view—Kafka as a complete central nervous system for data, integrating connectors and stream processing with a fully-managed cloud service.
Now as organizations make a similar shift from in-house infrastructure to fully-managed services, Jay outlines five guiding points to keep in mind:
- Cloud-native systems abstract away operational efforts for you without infrastructure concerns
- It’s important to have a complete ecosystem for Kafka, including connectors, a SQL layer, and data governance
- A distributed system should allow data to be accessible everywhere and across organizations
- Identifying a reliable storage infrastructure layer that is dependable, such as Amazon S3 is critical
- Cost-effective models mean sustainability and systems that are easy to build around
EPISODE LINKS
- The Hitchhiker’s Guide to the Galaxy
- Hedonic treadmill
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Intro to Event-Driven Microservices with Confluent
- Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)