How to stop RowFileInputReader after matching a specific line pattern (Text Header + Large Binary Tail)

Hi everyone,

I am using kafka-connect-file-pulse to process files that have a specific hybrid structure:

  1. A multi-line text header
  2. An empty line, or a line of custom characters, acting as a separator
  3. A large amount of binary data

My Goal: I want to ingest the entire header as a single Kafka record and then immediately stop reading the file to avoid the overhead of processing the large binary tail.

Possible Configuration:
I am thinking of using a MultiRowFilter to group the header lines and a DropFilter to discard the binary records.
While the DropFilter prevents the binary data from being sent to Kafka, the RowFileInputReader still reads the entire file line by line, which is wasteful when the binary tail is large.

So I would like to know:

  • Is there a way to signal the reader to stop and close the file stream as soon as a line matching a specific pattern is found?
  • Is there a way to read a fixed number of lines and discard the rest?

I’m not finding a built-in way to short-circuit the reader. It doesn’t look too complicated to implement, though: it’s pretty similar to read.max.wait.ms, except that you would stop iterating on a pattern match rather than on elapsed time. See here. Basically, that change plus some config plumbing = win? See if you get similar advice on this GH issue and, if you’re interested, you might submit a PR to contribute this feature.
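
To make the idea concrete, here is a minimal standalone sketch of "stop iterating on a pattern match or after a fixed number of lines". It does not use the File Pulse API: the class HeaderOnlyReader, the method readHeader, and the maxLines parameter are hypothetical names made up for illustration, and a real change would have to live inside the reader's own iterator and be wired to new config properties.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of kafka-connect-file-pulse.
public class HeaderOnlyReader {

    /**
     * Collects lines until one fully matches stopPattern or maxLines lines
     * have been collected, whichever comes first. The matching line itself
     * is not included in the result. No further lines are requested after
     * that point, although BufferedReader may already have buffered a few
     * kilobytes past the separator.
     */
    public static List<String> readHeader(Reader source, Pattern stopPattern, int maxLines)
            throws IOException {
        List<String> header = new ArrayList<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while (header.size() < maxLines && (line = reader.readLine()) != null) {
            if (stopPattern.matcher(line).matches()) {
                break; // separator found: stop iterating instead of dropping records later
            }
            header.add(line);
        }
        return header;
    }

    public static void main(String[] args) throws IOException {
        // Two header lines, an empty separator line, then a stand-in for the binary tail.
        String file = "name: sample\nversion: 1\n\nBINARY-TAIL";
        Pattern separator = Pattern.compile("^\\s*$"); // empty or whitespace-only line
        List<String> header = readHeader(new StringReader(file), separator, 100);
        System.out.println(String.join("\n", header));
    }
}
```

Running main prints the two header lines and never iterates over the tail. The same loop covers both questions above: pass a pattern that never matches and a small maxLines to read a fixed number of lines, or a large maxLines to rely on the separator alone.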