Hi, I am brand new to Confluent/Kafka, and would like to use Confluent Cloud with ksqlDB for “no code” data transformations before ingesting into my viz tool. I see there are no managed source connectors for CSVs as such (I don’t want to set up an SFTP server just for this purpose). I would greatly appreciate some guidance on the simplest way to proceed. Note: I have seen the Confluent videos by @rmoff on loading from CSV using kafkacat/spooldir in a shell environment. I am not opposed to using a CLI, but I am trying to understand where the boundaries lie for the simplest possible no-code architecture for my use case.
My goal is to prove the solution myself manually (no code), then scale/automate it as my company grows; this is why I am choosing Confluent rather than fragmented solutions for E, L, and T.
The main reason there is no fully managed source connector for files such as CSVs is that the connector would need access to a file system to read the records and load them into Kafka. In Confluent Cloud, there is no shared managed file system where you could put the file for the connector to read. Managed databases (like MongoDB Atlas) and fully managed object stores (like Amazon S3) provide hosted APIs, which source connectors use to read the data before loading it into Kafka.
Maybe your data could be loaded into one of these managed services first (possibly S3), and then sourced into Kafka using Connect.
@rick - thanks very much for your response. I would love to use a fully managed S3 source connector, but don’t see one listed among the 27 such connectors in Confluent Cloud. If it has to be self-managed, does this mean I need to run an instance of Confluent Platform outside of Confluent Cloud, and if so, what’s the best way to get started with this?
Some people choose a lighter-weight tool than Connect for this job. Because Connect is designed as an “always on distributed system”, it might be deemed too “heavy” for loading files on an ad hoc basis. A popular tool for this job is Filebeat from Elastic.
Of course, if your records could originate from a managed system, like a database table, then a managed connector would be great to use. Hope this helps.
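To give a rough idea of what a lighter-weight, ad hoc load can look like, here is a minimal sketch that turns each CSV row into one JSON record per line, using only the Python standard library. It does not talk to Kafka itself; the assumption is that the output gets piped into a command-line producer such as the kafkacat tool mentioned earlier in the thread. The script name and the function name `csv_to_json_lines` are made up for illustration.

```python
import csv
import json
import sys


def csv_to_json_lines(csv_lines):
    """Yield one JSON string per CSV row, keyed by the header row.

    `csv_lines` is any iterable of text lines (an open file works).
    All values stay strings; type conversion is left to a later step.
    """
    reader = csv.DictReader(csv_lines)
    for row in reader:
        yield json.dumps(row)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python csv_to_json.py data.csv
    with open(sys.argv[1], newline="") as f:
        for record in csv_to_json_lines(f):
            print(record)
```

Something like `python csv_to_json.py data.csv | kafkacat -P -b <broker> -t <topic>` would then produce one message per row. The broker address, topic, and the security settings Confluent Cloud requires are left out here and would need to be filled in from your own cluster configuration.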