How to load data from flat file in a "no code" environment

Greg · 9 March 2022 03:30

Hi, I am brand new to Confluent/Kafka, and would like to use Confluent Cloud with ksqlDB for “no code” data transformations before ingesting into my viz tool. I see there are no managed source connectors for CSV files as such (I don’t want to set up an SFTP server just for this purpose). I would greatly appreciate some guidance on the simplest way to proceed. (Note: I have seen the Confluent videos by @rmoff on loading from CSV using kafkacat/spooldir in a shell environment. I am not opposed to using a CLI, but I am trying to understand where the boundaries lie for the simplest possible no code architecture for my use case.)

My goal is to prove the solution myself manually (no code), then scale/automate it as my company grows; this is why I am choosing Confluent rather than fragmented solutions for E, L, and T.

Thanks!

rick · 9 March 2022 13:59

Welcome @Greg

The main reason there is no fully managed, file-based (e.g. CSV) source connector is that such a connector needs access to a file system to read the records and load them into Kafka. In Confluent Cloud, there is no shared managed file system where you could put the file for the connector to read. Managed databases (like MongoDB Atlas) and fully managed object storage (like Amazon S3) expose hosted APIs, which source connectors use to read the data before loading it into Kafka.

Maybe your data could be loaded into one of these managed services first (S3 possibly), and then sourced into Kafka with Connect.

Good Luck!

Greg · 9 March 2022 17:11

@rick - thanks very much for your response. I would love to use a fully managed S3 source connector, but don’t see one listed among the 27 such connectors in Confluent Cloud. If the connector has to be self-managed, does this mean I need to run an instance of Confluent Platform outside of Confluent Cloud, and if so, what’s the best way to get started with this?

rick · 9 March 2022 19:41

@Greg I’m sorry, I misspoke: there is no managed S3 source connector.

You have a few options for loading files this way. You don’t need to run a full Confluent Platform; you can just run a Kafka Connect instance that connects to your Confluent Cloud cluster. Here are some Confluent documents on that process: Connect Kafka Connect to Confluent Cloud | Confluent Documentation
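As a rough illustration of what "just a Kafka Connect instance" involves, the sketch below renders a distributed worker properties file that authenticates to a Confluent Cloud cluster over SASL_SSL with an API key. The function name, the `group_id` default, the converter choices, and the endpoint and credentials in the usage line are all placeholders assumed for this example; the linked Confluent documentation remains the authority on the full set of settings.

```python
def connect_worker_properties(bootstrap_servers, api_key, api_secret,
                              group_id="connect-cluster"):
    """Render a minimal distributed Kafka Connect worker config as text.

    All argument values are supplied by the caller; nothing here is
    specific to any real cluster.
    """
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{api_key}" password="{api_secret}";'
    )
    security = {
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.jaas.config": jaas,
    }
    props = {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        # Converter choices are an assumption: string keys, schemaless JSON values.
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
        # Internal topics the distributed worker uses to store its state.
        "config.storage.topic": f"{group_id}-configs",
        "offset.storage.topic": f"{group_id}-offsets",
        "status.storage.topic": f"{group_id}-status",
        "config.storage.replication.factor": "3",
        "offset.storage.replication.factor": "3",
        "status.storage.replication.factor": "3",
    }
    # The worker itself and the producers/consumers it creates for
    # connectors each need the same security settings.
    for prefix in ("", "producer.", "consumer."):
        for name, value in security.items():
            props[prefix + name] = value
    return "\n".join(f"{k}={v}" for k, v in props.items()) + "\n"


if __name__ == "__main__":
    # Placeholder endpoint and credentials, for illustration only.
    print(connect_worker_properties("<BOOTSTRAP_SERVER>:9092",
                                    "<API_KEY>", "<API_SECRET>"))
```

The output can be saved as a worker properties file and passed to the Connect distributed startup script; a file-reading connector such as spooldir would then be installed on that worker.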

Some people choose a lighter-weight tool than Connect for this job. Connect is designed as an “always on” distributed system, so it might be deemed too “heavy” for loading files on an ad hoc basis. One popular tool for this job is Filebeat from Elastic.
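For readers who can tolerate a small script rather than strictly "no code", the lighter-weight idea can be sketched in Python: read the CSV, turn each row into a JSON message, and hand it to a producer. The helper names (`csv_to_records`, `load`, `make_confluent_producer`) are hypothetical, and the last helper assumes the `confluent-kafka` Python package is installed. This is one possible approach, not the method of any tool mentioned in this thread.

```python
import csv
import io
import json


def csv_to_records(csv_text, key_field=None):
    """Convert CSV text (with a header row) into (key, value) byte pairs.

    Each row becomes a JSON object; the optional key_field column is
    used as the message key.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        key = row[key_field].encode("utf-8") if key_field else None
        value = json.dumps(row).encode("utf-8")
        records.append((key, value))
    return records


def load(records, produce):
    """Send every record through the supplied produce(key, value) callable."""
    count = 0
    for key, value in records:
        produce(key, value)
        count += 1
    return count


def make_confluent_producer(topic, config):
    """Build a produce callable backed by confluent-kafka (assumed installed).

    `config` is the usual client dict, e.g. bootstrap.servers,
    security.protocol=SASL_SSL, sasl.mechanisms=PLAIN, sasl.username and
    sasl.password set to a cluster API key and secret.
    """
    from confluent_kafka import Producer  # imported lazily on purpose

    producer = Producer(config)

    def produce(key, value):
        producer.produce(topic, key=key, value=value)
        producer.poll(0)  # serve delivery callbacks without blocking

    return produce, producer.flush


if __name__ == "__main__":
    sample = "id,name\n1,Ann\n2,Bob\n"
    # Dry run: print instead of producing to a real cluster.
    load(csv_to_records(sample, key_field="id"), lambda k, v: print(k, v))
```

Separating parsing from sending means the CSV handling can be checked locally before any cluster credentials are involved; calling the flush function returned by `make_confluent_producer` at the end ensures buffered messages are delivered.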

Of course, if your records could originate from a managed system, like a database table, then a managed connector would be a great fit. Hope this helps.

Greg · 9 March 2022 20:10

Hi @rick - thank you! This helps a lot - it’s just the sort of perspective I needed. I’ll check out both options. Many thanks.

rick · 28 March 2022 17:54

@Greg FYI: Amazon S3 Source connector for Confluent Cloud Quick Start | Confluent Documentation
