Kafka Connect to REST API to ES cache

Hello

I am Siva Masilamani, working as a Solutions Architect for an insurance company.

I am very new to Kafka and its ecosystem. I have learnt a few things over the past few months and was able to run Kafka, Schema Registry, and Kafka Connect as a POC. We have strong use cases for Kafka in our company and are planning to integrate a lot of applications using Kafka as one of the main tools. Currently I am working on an application that uses an Oracle database, and our task is to get the data from the Oracle DB into Elasticsearch. It may sound easy, as I could use the JDBC source connector and the Elasticsearch sink connector to do that; however, a lot needs to happen in between. The problem is that the database gives us only the ID of the data, and we then need to call another web service with that ID, which eventually returns the JSON data that needs to be published to Elasticsearch.

This is what I have done so far: I use the JDBC source connector to monitor the table that gives us the ID, and I wrote our own HTTP connector that makes the REST call, captures the JSON response, and then uses a producer from the same connector to publish it to a new topic, which is consumed by the Elasticsearch sink connector. I am not even sure whether what I am doing is good practice, but it is working and I can see the data end up in Elasticsearch. However, one thing is bugging me: I currently publish the JSON as a plain string, but I want to convert it into Avro, possibly dynamically, and publish the schema to the Schema Registry when I produce the data to that topic. This is where I need help, as I could not find any guidance on converting the JSON response into Avro or on producing the Avro schema from my connector.
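
To make it concrete, here is a rough sketch of what I imagine the producing side could look like with Confluent's KafkaAvroSerializer, which as far as I understand registers the schema with the Schema Registry automatically on first use. The schema, topic name, broker address, and registry URL below are all made up for illustration:

```java
import io.confluent.kafka.serializers.KafkaAvroSerializer;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class AvroPublishSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");           // placeholder
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", KafkaAvroSerializer.class.getName());
        props.put("schema.registry.url", "http://localhost:8081");  // placeholder

        // Hand-written Avro schema for the fields I expect back from the REST call (made up).
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"PolicyRecord\",\"fields\":["
              + "{\"name\":\"id\",\"type\":\"string\"},"
              + "{\"name\":\"status\",\"type\":\"string\"}]}");

        GenericRecord value = new GenericData.Record(schema);
        value.put("id", "12345");
        value.put("status", "ACTIVE");

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            // KafkaAvroSerializer registers the schema (under "topic2-value" by default)
            // with the Schema Registry the first time it serializes a record.
            producer.send(new ProducerRecord<>("topic2", "12345", value));
            producer.flush();
        }
    }
}
```

I am not sure whether defining the schema by hand like this is acceptable, or whether I should be deriving it from the JSON response at runtime.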

Also, I need advice on how to do the initial load of the data, as the table contains 5+ million rows, which presumably means I need to call my endpoint the same number of times in each of the environments (dev/QA/UAT and finally PROD).

Thanks


Hi @saachinsiva, welcome to the forum :slight_smile:

A few thoughts:

  1. Where is the data held that you’re doing the REST lookup calls against? Do you have the option of getting that source into Kafka? You could then do the joins within Kafka. This would also help massively with your initial bulk load and testing use cases.
  2. Rather than writing a custom connector to do the lookups, I think Kafka Streams would be a better fit here (see the sketch after this list).
  3. When you write back to Kafka you can use different serialisers. Since I’m not a Java coder I can’t give you the specifics, but my understanding is that you use the Schema Registry serde.
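
I’m not a Java coder, so treat the sketch below as pseudocode rather than something I’ve run; the application ID, topic names, and lookup URL are all placeholders. The idea is that Streams handles the consuming, offset management, and producing for you, and your code is just the lookup:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Properties;

public class RestLookupStreamSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "rest-lookup-enricher");  // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        HttpClient http = HttpClient.newHttpClient();
        StreamsBuilder builder = new StreamsBuilder();

        // Consume the IDs written by the JDBC source connector, enrich each one by
        // calling the REST service, and write the JSON payload to the topic that the
        // Elasticsearch sink connector reads from. Topic names are placeholders.
        builder.<String, String>stream("topic1")
               .mapValues(id -> {
                   try {
                       HttpRequest request = HttpRequest.newBuilder(
                               URI.create("https://lookup.example.com/records/" + id)) // hypothetical endpoint
                               .build();
                       return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
                   } catch (Exception e) {
                       throw new RuntimeException("REST lookup failed for id " + id, e);
                   }
               })
               .to("topic2");

        new KafkaStreams(builder.build(), props).start();
    }
}
```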

Hello @rmoff

Thank you for getting back to me. The system we make the REST call to stores highly sensitive information, and it would be really difficult to get access to its database, as that needs very high-level approval. Even if we did manage to get it, the data format (JSON) that we want to publish to Elasticsearch is currently prepared by a huge set of Oracle packages that pull data from a lot of different tables, so even if we managed to get the data into Kafka, we would need to replicate the logic in the Oracle stored procedures, which is a daunting task.

Can you please explain the second point a bit more? The way I understand it, that is possible only if we manage to do the first step.

I will definitely take a look at the link you shared; that is probably what I need right now.

Thanks again.

This is the bit I was referencing:

I took this to understand that you wrote a connector for Kafka Connect; is that not the case?

Thanks again. Yes, I wrote our own connector that makes the REST call and publishes the resulting JSON to the topic, which is consumed by the ES sink connector.

e.g. JDBC source connector → Topic 1 → MyConnector → Topic 2 → ES sink connector → Elasticsearch.

Here my connector just gets the ID from Topic 1, makes the REST API call, and puts the resulting JSON string into Topic 2. This is where I need help converting to Avro instead of just a JSON string, but the link you shared for the serializer might help me here.
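
If the shape of the REST response is fixed, I am thinking I could parse the JSON string straight into an Avro GenericRecord with Avro's JSON decoder and then hand it to the KafkaAvroSerializer as in my earlier sketch. This is only a rough sketch with a made-up schema and field names; one caveat I am aware of is that Avro's JSON decoder expects Avro's own JSON encoding, so nullable (union) fields would need the `{"string": ...}` wrapping or a remapping step (e.g. via Jackson) first:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;

public class JsonToAvroSketch {

    // Parse a JSON document into an Avro GenericRecord using the given schema.
    static GenericRecord toGenericRecord(String json, Schema schema) throws Exception {
        DatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
        Decoder decoder = DecoderFactory.get().jsonDecoder(schema, json);
        return reader.read(null, decoder);
    }

    public static void main(String[] args) throws Exception {
        // Made-up schema matching the fields I expect back from the REST call.
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"PolicyRecord\",\"fields\":["
              + "{\"name\":\"id\",\"type\":\"string\"},"
              + "{\"name\":\"status\",\"type\":\"string\"}]}");

        GenericRecord record = toGenericRecord(
                "{\"id\":\"12345\",\"status\":\"ACTIVE\"}", schema);

        // The record could then be sent with KafkaAvroSerializer as in the earlier sketch.
        System.out.println(record);
    }
}
```

Does that sound like a reasonable approach?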