Hey everyone,
I am new to this community and would like to request your help.
I’m exploring how to ingest files into Kafka in a distributed environment, and I’m trying to figure out the most reliable way to do it using Kafka Connect.
Here’s my use case:
Files land in a folder and become “ready”; as soon as they are, I need Kafka Connect to pick them up.
The file structure is a bit unusual:
A text header block at the top:
The first line contains fixed-position/fixed-width fields
The following lines are key: value pairs
Then there’s an empty line
And immediately after that comes binary data that I also need to extract
So the file is basically a mix of structured text and binary payload.
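To make the layout concrete, here’s a minimal sketch of how such a file could be split into its three parts. The field widths and names (`record_id`, `date`) are made-up placeholders, not from any real spec, and it assumes the first blank line is what separates the text header from the binary payload:

```python
def parse_file(raw: bytes):
    """Split a file into (fixed-width fields, key/value pairs, binary payload)."""
    # The first blank line separates the text header from the binary payload.
    header_end = raw.index(b"\n\n")
    header_text = raw[:header_end].decode("utf-8")
    binary_payload = raw[header_end + 2:]

    lines = header_text.splitlines()

    # Hypothetical fixed-width layout: cols 0-9 = record id, 10-17 = date.
    first = lines[0]
    fixed = {"record_id": first[0:10].strip(), "date": first[10:18].strip()}

    # Remaining header lines are "key: value" pairs.
    kv = {}
    for line in lines[1:]:
        key, _, value = line.partition(":")
        kv[key.strip()] = value.strip()

    return fixed, kv, binary_payload
```

Whatever ends up doing the ingestion (Spooldir or a custom connector), the parsing step itself boils down to something like this; the open question is where that logic can live.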
My initial thought was to use the Spooldir Kafka Connect connector, since it naturally watches a directory and handles file offsets, fault tolerance, etc.
But I’m not sure whether Spooldir can handle this level of custom parsing, especially the mix of fixed-width metadata, key/value pairs, and a binary section.
What I’m trying to understand is:
Is Spooldir customizable enough for this kind of parsing?
If not, what is the “ideal” and most reliable way to approach this in the Kafka ecosystem?
Should I write a custom Kafka connector for this format?
I’m mainly looking for the cleanest long-term solution, not a hacky one.
Any guidance or shared experience would be super helpful.
Thanks!