How to load data from a flat file in a "no code" environment

Hi, I am brand new to Confluent/Kafka, and would like to use Confluent Cloud with ksqlDB for “no code” data transformations before ingesting into my viz tool. I see there are no managed source connectors for CSVs as such (I don’t want to set up an SFTP server just for this purpose). I would greatly appreciate some guidance on the simplest way to proceed. (Note: I have seen the Confluent videos (@rmoff) on loading from CSV using kafkacat/spooldir in a shell environment. I am not opposed to using a CLI, but I am trying to understand where the boundaries lie for the simplest possible no code architecture for my use case.)

My goal is to prove the solution myself manually (no code), then scale/automate it as my company grows; this is why I am choosing Confluent rather than fragmented solutions for E, L, and T.

Thanks!

Welcome @Greg

The main reason there is no fully managed file-based source connector (for CSV files, for example) is that such a connector needs access to a file system to read the records and load them into Kafka. In Confluent Cloud, there is no shared managed file system where you could put the file for the connector to read. Managed databases (like MongoDB Atlas) or fully managed object storage (like Amazon S3) provide hosted APIs for access to the data, and source connectors use those APIs to read the data before loading it into Kafka.

Maybe your data could be loaded into one of these managed services first (S3 possibly), before using Connect to source into Kafka.

Good Luck!

@rick - thanks very much for your response. I would love to use a fully managed S3 source connector, but don’t see one listed among the 27 such connectors in Confluent Cloud. If I go self-managed, does this mean I need to run an instance of Confluent Platform outside of Confluent Cloud, and if so, what’s the best way to get started with this?

@Greg I’m sorry, I misspoke: there is not a managed S3 source connector.

You have a few options for loading files this way. You don’t need to run a full Confluent Platform; you can just run a Kafka Connect instance. Here are some Confluent documents on that process: Connect Kafka Connect to Confluent Cloud | Confluent Documentation

Some people choose to use a lighter weight tool than Connect for this job. Because Connect is designed as an “always on” distributed system, it might be deemed too “heavy” for loading files on an ad hoc basis. A popular example tool for this job is Filebeat from Elastic.
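To give a feel for how small a lighter weight loader can be, here is a rough Python sketch (not "no code", but close to the minimum). It is only an illustration under assumptions: the function names `csv_to_records` and `load_csv`, the topic name, and the sample columns are all made up for this example, and `produce` is any callable you supply that accepts `(topic, value=..., key=...)`.

```python
import csv
import io
import json
from typing import Callable, Iterable, Iterator, Optional, Tuple


def csv_to_records(
    lines: Iterable[str], key_field: Optional[str] = None
) -> Iterator[Tuple[Optional[bytes], bytes]]:
    """Yield one (key, value) pair of bytes per CSV data row.

    The first row is treated as the header. Each value is the row
    serialized as a JSON object; the key is taken from `key_field`
    when one is given, otherwise it is None.
    """
    for row in csv.DictReader(lines):
        key = row[key_field].encode("utf-8") if key_field else None
        value = json.dumps(row).encode("utf-8")
        yield key, value


def load_csv(
    lines: Iterable[str],
    topic: str,
    produce: Callable[..., None],
    key_field: Optional[str] = None,
) -> int:
    """Send every CSV row through `produce` and return the row count."""
    count = 0
    for key, value in csv_to_records(lines, key_field):
        produce(topic, value=value, key=key)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical sample data; a real file would be opened with
    # open(path, newline="") and passed in place of `sample`.
    sample = io.StringIO("id,name\n1,alice\n2,bob\n")
    sent = []
    # Stand-in for a real Kafka producer: just collect the messages.
    load_csv(
        sample,
        "customers_raw",
        lambda topic, value, key: sent.append((topic, key, value)),
        key_field="id",
    )
    for message in sent:
        print(message)
```

If you use the confluent-kafka Python client, the `produce` method of its `Producer` should fit this call shape (followed by a `flush()` once the file is done), but check the client documentation and your Confluent Cloud connection settings rather than taking this sketch as the recommended approach.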

Of course, if your records could originate from a managed system, like a database table, then a managed connector would be great to use. Hope this helps

Hi @rick - thank you! This helps a lot - it’s just the sort of perspective I needed. I’ll check out both options. Many thanks.

@Greg FYI: Amazon S3 Source connector for Confluent Cloud Quick Start | Confluent Documentation
