Kafka heterogeneous struct file connector

Hey everyone,

I am new to this community and would like to request your help.

I’m exploring how to ingest files into Kafka in a distributed environment, and I’m trying to figure out the most reliable way to do it using Kafka Connect.

Here’s my use case:

I have files that become “ready” in a folder, and as soon as they’re ready, I need Kafka Connect to pick them up.

The file structure is a bit unusual:

  • A block of text at the top:

      • The first line contains fixed-position/fixed-width fields

      • The following lines are key: value pairs

  • Then there’s an empty line

  • And right after that, there’s binary data that I also need to extract

So the file is basically a mix of structured text and binary payload.

My initial thought was to use the Spooldir Kafka Connect connector, since it naturally watches a directory and handles file offsets, fault tolerance, etc.

But I’m not sure whether Spooldir can handle this level of custom parsing, especially the mix of fixed-width metadata, key/value pairs, and a binary section in a single file.

What I’m trying to understand is:

Is Spooldir customizable enough for this kind of parsing?

If not, what is the “ideal” and most reliable way to approach this in the Kafka ecosystem?

Should I write a custom Kafka connector for this format?

I’m mainly looking for the cleanest long-term solution, not a hacky one.

Any guidance or shared experience would be super helpful.

Thanks!

Does each file map to one record in Kafka? If so, you could try the SpoolDirBinaryFileSourceConnector together with a custom SMT (Single Message Transform) to handle the parsing. Is the binary data Base64-encoded in the file, or is it a raw byte array? Parsing would be trickier in the latter case, but it should still be doable.
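
To make the parsing part concrete, here is a rough sketch in plain Java of what the SMT would have to do with the file bytes. It is not tied to any Connect API. The class and method names (MixedFileParser, ParsedFile, fixedField) are made up for illustration, and it assumes LF line endings, UTF-8 text, no blank lines inside the text block, and a raw (not Base64) binary section. The fixed-width offsets are whatever your format defines.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for the layout described above: text block, blank line, raw bytes. */
public class MixedFileParser {

    /** The three pieces extracted from one file. */
    public static final class ParsedFile {
        public final String headerLine;          // raw fixed-width first line
        public final Map<String, String> fields; // "key: value" lines, in file order
        public final byte[] payload;             // bytes after the blank line, untouched

        ParsedFile(String headerLine, Map<String, String> fields, byte[] payload) {
            this.headerLine = headerLine;
            this.fields = fields;
            this.payload = payload;
        }
    }

    public static ParsedFile parse(byte[] content) {
        // Work on bytes, not a String, so the binary section is never decoded.
        int sep = indexOfBlankLine(content);
        if (sep < 0) {
            throw new IllegalArgumentException("No blank line between text and binary data");
        }
        // Assumption: the text block is UTF-8.
        String text = new String(content, 0, sep, StandardCharsets.UTF_8);
        byte[] payload = Arrays.copyOfRange(content, sep + 2, content.length);

        String[] lines = text.split("\n");
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Not a key: value line: " + lines[i]);
            }
            fields.put(lines[i].substring(0, colon).trim(),
                       lines[i].substring(colon + 1).trim());
        }
        return new ParsedFile(lines[0], fields, payload);
    }

    /** Index of the first "\n\n" (assumes LF line endings), or -1 if there is none. */
    private static int indexOfBlankLine(byte[] data) {
        for (int i = 0; i + 1 < data.length; i++) {
            if (data[i] == '\n' && data[i + 1] == '\n') {
                return i;
            }
        }
        return -1;
    }

    /** Cuts one fixed-width field [start, end) out of the header line and trims padding. */
    public static String fixedField(String header, int start, int end) {
        if (start >= header.length()) {
            return "";
        }
        return header.substring(start, Math.min(end, header.length())).trim();
    }
}
```

In an SMT, you would call something like parse() on the record's byte[] value inside apply() and build the new key, value, or headers from the result. Searching for the separator at the byte level is what keeps the raw binary case manageable: only the text block is ever decoded as a string.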

Some resources on writing an SMT:

  • blog
  • docs including pointer to a bunch of examples