I have a bunch of log files, which are basically just JSON arrays. I need to load them into a Kafka topic and from there into a PostgreSQL database. Right now I'm using a producer written in Python to parse the JSON, serialize the records as Avro, and send them to Kafka. From there, a JDBC sink connector loads them into PostgreSQL. Is there a better way to do it? My main concern is that because I'm writing my own producer code, I'd have to maintain it going forward. So I was trying to find an open-source connector that does the job, but sadly none of them work (I tried FilePulse and Spooldir). The log files are generated by a proprietary data agent deployed on the system.
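For reference, the JDBC-sink half of the pipeline is plain Kafka Connect configuration. This is a minimal sketch; the connector name, topic, connection URL, and credentials are placeholders, not values from this thread:

```properties
# Hypothetical JDBC sink config -- adjust names/URLs for your environment
name=logs-jdbc-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
topics=logs
connection.url=jdbc:postgresql://localhost:5432/logs_db
connection.user=postgres
connection.password=secret
insert.mode=insert
auto.create=true
# Avro-serialized values need the Avro converter plus a Schema Registry
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
```

With `auto.create=true` the connector creates the target table from the Avro schema, so each record in the topic becomes a row in PostgreSQL.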
Welcome to the forum
Kafka Connect is indeed generally a great way to do this kind of integration. Can you expand a bit on “none of them work” in terms of the filepulse and spooldir connectors?
Hey, thank you for your response. Your tutorials really helped me set everything up.
So, I need each JSON array element to become its own row in the database. The explode-array configuration in FilePulse was throwing an error; I raised an issue on GitHub but haven't had a response yet. I then tried Spooldir, but it stored the whole array as a single event in the topic. So I ended up writing my own producer.
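For anyone landing here later, the core of such a producer is small: read each file's top-level array and emit one message per element. This is a minimal sketch, not the OP's actual code; the `send` callable and the `logs` topic name are placeholders — in practice `send` would wrap something like `confluent_kafka.Producer.produce` with an Avro serializer:

```python
import json


def explode_array(path):
    """Yield each element of a JSON-array log file as its own record."""
    with open(path) as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"{path}: expected a top-level JSON array")
    yield from data


def produce_rows(records, send, topic="logs"):
    """Send one message per array element.

    `send(topic, value_bytes)` is any callable; in a real pipeline it
    would wrap a Kafka producer's produce() call with Avro serialization.
    Returns the number of messages sent.
    """
    count = 0
    for rec in records:
        send(topic, json.dumps(rec).encode("utf-8"))
        count += 1
    return count
```

Keeping the send callable pluggable makes the explode logic trivially unit-testable without a broker, which helps with the maintenance concern.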
This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.