AWS S3 source connector, compressed input file with document array - HOW

georgelza · 18 June 2023 06:36

Hi all, hope someone can assist.

This relates to the AWS S3 Source Connector.

I have an S3 folder structure. (////)

Can the connector intelligently consume from the newest hour as new hours, new days and new months are created?

On top of this, I've got an inbound stream of JSON documents, gzip-compressed as .json.gz.

Each JSON file is structured as per below.
{
[
{doc1},
{doc2},
{doc3},
{doc4},
{doc5}
]
}
Looking for suggestions on how to process this so that each document ends up as a single message on a topic.
Can the S3 source connector do this? Everything is hosted on AWS, so I'm thinking of a Lambda: an uncompress step, then a second step that takes the entire JSON document and iterates over the array, posting the individual documents onto a new processed topic.
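A rough sketch of those two steps in Python, in case it helps the discussion. It assumes the decompressed file parses as a top-level JSON array of documents and that the whole file fits in memory. The names `split_documents`, `publish_all` and the `send` callback are made up for illustration; `send` stands in for whatever producer call is actually used, and reading the object from S3 is left out.

```python
import gzip
import json
from typing import Callable, Iterator


def split_documents(gz_bytes: bytes) -> Iterator[str]:
    """Step 1 and 2: decompress a .json.gz payload, then yield each array element as its own JSON string."""
    payload = json.loads(gzip.decompress(gz_bytes).decode("utf-8"))
    # Assumption: the file is a top-level JSON array of documents.
    if not isinstance(payload, list):
        raise ValueError("expected a top-level JSON array of documents")
    for doc in payload:
        yield json.dumps(doc, separators=(",", ":"))


def publish_all(gz_bytes: bytes, send: Callable[[str], None]) -> int:
    """Hand every document to `send` as a separate message; returns the number sent."""
    count = 0
    for message in split_documents(gz_bytes):
        send(message)  # hypothetical hook, e.g. a wrapper around a Kafka producer
        count += 1
    return count


# Usage with an in-memory stand-in for the compressed S3 object and the producer.
sample = gzip.compress(json.dumps([{"id": 1}, {"id": 2}, {"id": 3}]).encode("utf-8"))
sent = []
n = publish_all(sample, sent.append)
```

Keeping the producer behind a plain callback means the split logic can be tested without AWS or a broker.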

First prize, however, would be if the S3 source connector could do this all by itself.

G

georgelza · 19 June 2023 10:17

Answer on this…

The source connector can't currently do a pre-processing step (decompress). On top of that, the JSON document after decompression is larger than is advisable to post onto a topic as a single message. (If it were smaller, an SMT step could have done the "decompile" of the document comprising an array of documents.)

G
