🎧 Reddit Sentiment Analysis with Apache Kafka-Based Microservices

alice.richardson · 8 September 2022 07:19

There’s a new Streaming Audio episode - check it out!

How do you analyze Reddit sentiment with Apache Kafka® and microservices? Bringing the fresh perspective of someone who is both new to Kafka and the industry, Shufan Liu, nascent Developer Advocate at Confluent, discusses projects he has worked on during his summer internship—a Cluster Linking extension to a conceptual data pipeline project, and a microservice-based Reddit sentiment-analysis project. Shufan demonstrates that it’s possible to quickly get up to speed with the tools in the Kafka ecosystem and to start building something productive early on in your journey.

Shufan's Cluster Linking project extends a demo by Danica Fine (Senior Developer Advocate, Confluent) that uses a Kafka-based data pipeline to address the challenge of automatic houseplant watering. He discusses his contribution to the project and shares details in his blog—Data Enrichment in Existing Data Pipelines Using Confluent Cloud.

The second project Shufan presents is a sentiment analysis system that gathers data from a given subreddit, then assigns the data a sentiment score. He points out that its results would be hard to duplicate manually by simply reading through a subreddit—you really need the assistance of AI. The project consists of four microservices:

A user input service that collects requests in a Kafka topic; each request consists of the desired subreddit and the dates between which data should be collected
An API polling service that fetches the requests from the user input service, collects the relevant data from the Reddit API, then appends it to a new topic
A sentiment analysis service that scores the data in the API polling service's topic using the Python library NLTK, with averages calculated in ksqlDB
A results-displaying service that consumes from a topic with the calculations
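
To make the data flow between the four services a little more concrete, here is a minimal sketch in plain Python. It is not Shufan's code: the `SentimentRequest` class, its field names, the JSON encoding, and the `average_sentiment` helper are all assumptions made for illustration. The averaging function is only a stand-in for the calculation the project does in ksqlDB, and the scores passed to it are made-up numbers rather than NLTK output.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date
from statistics import mean


@dataclass
class SentimentRequest:
    """Hypothetical shape of one message on the user input topic."""
    subreddit: str
    start_date: str  # ISO 8601 date, e.g. "2022-06-01"
    end_date: str

    def to_bytes(self) -> bytes:
        # Kafka message values are bytes; JSON is one common encoding (assumed here).
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "SentimentRequest":
        # What the API polling service might do after consuming a request.
        req = cls(**json.loads(raw.decode("utf-8")))
        if date.fromisoformat(req.start_date) > date.fromisoformat(req.end_date):
            raise ValueError("start_date must not be after end_date")
        return req


def average_sentiment(scores):
    """Plain-Python stand-in for the per-subreddit average computed in ksqlDB.

    `scores` is an iterable of (subreddit, score) pairs.
    """
    grouped = {}
    for subreddit, score in scores:
        grouped.setdefault(subreddit, []).append(score)
    return {name: mean(values) for name, values in grouped.items()}


# Round-trip a request as it would travel through a topic.
request = SentimentRequest("gaming", "2022-06-01", "2022-06-30")
round_tripped = SentimentRequest.from_bytes(request.to_bytes())

# Made-up scores standing in for the sentiment analysis service's output.
averages = average_sentiment([("gaming", 0.5), ("gaming", -0.25), ("stocks", 0.75)])
```

In a real deployment the bytes from `to_bytes()` would be produced to a topic with a Kafka client, and the grouping would run continuously over the stream instead of over an in-memory list.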

Interesting subreddits that Shufan has analyzed for sentiment include gaming forums before and after key releases; crypto and stock trading forums at various meaningful points in time; and sports-related forums both before the season and several games into it.

EPISODE LINKS

Listen to the episode
