I am Siva Masilamani, working as a Solutions Architect for an insurance company.
I am very new to Kafka and its ecosystem. I have learned a few things over the past few months and was able to run Kafka, Schema Registry, and Connect as a POC. We have strong use cases for Kafka in our company and are planning to integrate a lot of applications using Kafka as one of the main tools.

Currently I am working on an application that uses an Oracle database, and our task is to get the data from the Oracle DB into Elasticsearch. It may sound easy, as I could use the JDBC source connector and the Elasticsearch sink connector to do that; however, a lot needs to happen in between. The problem is that the database gives us only the ID of the data, and we then need to call another web service with that ID, which eventually returns the JSON data that needs to be published to Elasticsearch.
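For context, this is roughly how the source side is wired up today. A minimal sketch of the JDBC source connector config I am using; the table name, ID column, connection details, and topic prefix are all placeholders, not our real values:

```json
{
  "name": "oracle-id-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1",
    "connection.user": "kafka_connect",
    "connection.password": "********",
    "table.whitelist": "CLAIM_EVENTS",
    "mode": "incrementing",
    "incrementing.column.name": "ID",
    "topic.prefix": "oracle-ids-",
    "poll.interval.ms": "5000"
  }
}
```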
This is what I have done so far: I use the JDBC source connector to monitor the table that gives us the ID, and I wrote my own HTTP connector that makes the REST call, captures the JSON response, and uses a producer inside the same connector to publish it to a new topic, which is consumed by the Elasticsearch sink connector. I am not even sure whether this is good practice, but it works and I can see the data end up in Elasticsearch. However, one thing is bugging me: I currently publish the JSON to Elasticsearch as a plain string, but I want to convert it to Avro, possibly dynamically, and publish the schema to Schema Registry when I produce the data to that topic. This is where I need help, as I could not find anything on converting a JSON response into Avro or on producing Avro records with a registered schema from my connector.
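To make the question concrete, here is a rough sketch of what I imagine the Avro path could look like, using Avro's GenericRecord plus Confluent's KafkaAvroSerializer (which registers the schema with Schema Registry under "<topic>-value"). The schema, topic name, and payload below are made up for illustration, and the schema is hand-written rather than derived dynamically, which is exactly the part I am unsure about:

```java
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

import io.confluent.kafka.serializers.KafkaAvroSerializer;

public class JsonToAvroSketch {

    // Hand-written schema for illustration; in reality the schema would
    // have to come from somewhere per payload, which is part of my question.
    private static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Claim\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"string\"},"
        + "{\"name\":\"status\",\"type\":\"string\"}]}");

    public static void main(String[] args) throws Exception {
        // JSON string as returned by the web service (made-up payload).
        String json = "{\"id\":\"42\",\"status\":\"OPEN\"}";

        // Avro's JSON decoder can turn JSON into a GenericRecord, but only
        // when the JSON matches Avro's JSON encoding (unions must be
        // wrapped, etc.); for a flat record of primitives like this one,
        // plain JSON works as-is.
        Decoder decoder = DecoderFactory.get().jsonDecoder(SCHEMA, json);
        GenericRecord record = new GenericDatumReader<GenericRecord>(SCHEMA)
                .read(null, decoder);

        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                KafkaAvroSerializer.class.getName());
        // KafkaAvroSerializer registers the schema automatically.
        props.put("schema.registry.url", "http://localhost:8081");

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("claims-avro", "42", record));
        }
    }
}
```

Is something along these lines a reasonable direction, or is there a better way to do this from inside a connector?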
Also, I need advice on how to do the initial load of the data, as the table in question contains 5+ million rows, which presumably means I need to call my endpoint the same number of times in each of our environments (dev/QA/UAT and finally PROD).