Hi everyone,
I am using kafka-connect-file-pulse to process files that have a specific hybrid structure:
- A multi-line text header
- A separator line (either an empty line or a line of custom characters)
- A large amount of binary data.
My goal: I want to ingest the entire header as a single Kafka record and then immediately stop reading the file, to avoid the overhead of processing the large binary tail.
Possible configuration:
I am thinking of using a MultiRowFilter to group the header lines into one record and a DropFilter to discard the binary records.
However, while the DropFilter prevents the binary data from being sent to Kafka, the RowFileInputReader still reads the entire file line by line, which is wasteful when the binary part is large.
So I would like to know:
- Is there a way to signal the reader to stop and close the file stream as soon as a line matching a specific pattern is found?
- Alternatively, is there a way to read only a fixed number of lines and discard the rest?
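
To make the behaviour I am after concrete, here is a minimal sketch in plain Java. It does not use any FilePulse API: `HeaderOnlyReader`, `readHeader`, the separator pattern and the sample input are all hypothetical names and values chosen for illustration. It covers both variants of the question: stop at the first line that matches a pattern, or stop after a fixed number of lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of kafka-connect-file-pulse.
public class HeaderOnlyReader {

    /**
     * Collects lines until one fully matches `separator` or until `maxLines`
     * lines have been collected, then closes the source. No further lines are
     * requested after that point (BufferedReader may already have pulled one
     * buffer's worth of characters past the header).
     */
    public static List<String> readHeader(Reader source, Pattern separator, int maxLines)
            throws IOException {
        List<String> header = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // The size check comes first so no extra line is read once the limit is hit.
            while (header.size() < maxLines && (line = reader.readLine()) != null) {
                if (separator.matcher(line).matches()) {
                    break; // separator found: stop before touching the binary tail
                }
                header.add(line);
            }
        }
        return header;
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample: two header lines, an empty separator line, then the payload.
        String sample = "name: demo\nversion: 1\n\nBINARY-PAYLOAD-NOT-READ";
        // Assumed separator: an empty/blank line or a line of three or more dashes.
        Pattern separator = Pattern.compile("\\s*|-{3,}");
        List<String> header = readHeader(new StringReader(sample), separator, 100);
        // The whole header would become the value of a single Kafka record.
        System.out.println(String.join("\n", header));
    }
}
```

What I am looking for is the equivalent of that `break` inside the connector's reader, so that the file is marked as completed right after the header record is emitted.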