Kafka Connector, both a source and a sink

Hi all.

I want to connect our system, based on Kafka, to a REST server. The server is able to create a resource and will respond with a resource ID. We need to propagate this ID back to our system, so the resources on our end can be properly linked.

Let’s call this resource a Note.

  1. So, our system (user interaction) will create our Note.
  2. It will be pushed out to the Kafka topic “out-note”.
  3. Kafka Connect will pick it up and do a “POST /note” on the REST server.
  4. The REST server will respond with {“id”: “01234-212314-222111-12346”, …}
  5. This response should be pushed back to Kafka, to the topic “in-note-updates”.

The last step is my problem. It sounds like the adapter should be both a source and a sink, which is not the regular interface of Kafka Connect.

I could try to establish some sort of back-channel, like writing to a file and using a File Source connector, but that is awkward and error-prone.

Another option is to supply Kafka credentials to my adapter, so it can push a message to the “in-note-updates” topic. But that also feels wrong.

How do people usually solve synchronization with a request/response based service?

Is there a feature of Kafka Connect in particular that you’re looking to make use of in this? It sounds more like you just need a regular Kafka consumer/producer.

Hi RMoff.

The reason I am looking at Kafka Connect is the non-functional goodies it brings: I only have to write the Task, which is the gist of what needs to be done, and let KC handle workload balancing and execution management. Logging and observability also come into the picture.

But you are right, this is, in essence, a regular 2-way producer/consumer.

Is it then recommended to abandon KC for such use-cases? Doesn’t this use case fall under the “charter of integration”? I would expect that I am not the only one who has faced a request/response external endpoint and wanted to use KC for the integration.

Is it then recommended to abandon KC for such use-cases?

Possibly, yes. Because of this:

Which is not the regular interface of Kafka Connect.

If it doesn’t fit naturally, then it sounds like it’s the wrong fit. But perhaps others will have different opinions and will weigh in here :slight_smile:

@nikola.milutinovic technically you can have a connector that moves data in both directions, but you have to pick whether it is deployed as a source or a sink. For illustration, think of a sink connector that copies from one topic to another: you can imagine a sink connector that reads from the out-note topic and whose put method makes the API calls and writes the responses back to a different Kafka topic, as sketched below.
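
To make that concrete, here is a rough sketch of what such a sink task could look like. This is not a real connector, just an illustration: the class name, the config keys (rest.url, reply.topic, reply.bootstrap.servers) and the assumption that record values are already JSON strings (e.g. via the StringConverter) are all made up for the example, and error handling is reduced to the bare minimum.

```java
// Hypothetical sketch: a sink task whose put() POSTs each record to a REST
// endpoint and publishes the response to another Kafka topic.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Collection;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.connect.errors.ConnectException;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class RestNoteSinkTask extends SinkTask {

    private HttpClient http;
    private KafkaProducer<String, String> producer;
    private String restUrl;    // e.g. http://rest-server/note  (made-up config key rest.url)
    private String replyTopic; // e.g. in-note-updates          (made-up config key reply.topic)

    @Override
    public void start(Map<String, String> props) {
        restUrl = props.get("rest.url");
        replyTopic = props.get("reply.topic");
        http = HttpClient.newHttpClient();

        // The task needs its own producer (and its own Kafka connection config)
        // to write the response back -- Kafka Connect does not manage this side.
        Properties p = new Properties();
        p.put("bootstrap.servers", props.get("reply.bootstrap.servers"));
        p.put("key.serializer", StringSerializer.class.getName());
        p.put("value.serializer", StringSerializer.class.getName());
        producer = new KafkaProducer<>(p);
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        for (SinkRecord record : records) {
            try {
                // Assumes the record value is already a JSON string.
                HttpRequest request = HttpRequest.newBuilder(URI.create(restUrl))
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(String.valueOf(record.value())))
                        .build();
                HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());

                // Publish the response (which carries the external ID) back to Kafka.
                producer.send(new ProducerRecord<>(replyTopic,
                        record.key() == null ? null : record.key().toString(),
                        response.body()));
            } catch (Exception e) {
                // A ConnectException fails the task; throwing RetriableException instead
                // would make Connect retry the batch.
                throw new ConnectException("POST to " + restUrl + " failed", e);
            }
        }
        producer.flush();
    }

    @Override
    public void stop() {
        if (producer != null) {
            producer.close();
        }
    }

    @Override
    public String version() {
        return "0.1";
    }
}
```

Note that the task has to create its own KafkaProducer for the write-back, which is essentially the “supply Kafka credentials to my adapter” option from the original question; Kafka Connect only manages the consuming side of a sink task. The SinkConnector class and config definitions are omitted here.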

That being said, this makes me think that the Produce / Consume API might be better:

I.e., the fact that you are asking about synchronization across topics makes me think that a tightly coupled request / response using Kafka transactions will give you tighter control over failure scenarios. How bad is it if POST /note gets called multiple times for the same note? How bad is it if it doesn’t get called for a note?

@dtroiano Thank you for an example of a 2-way connector. I was thinking along those lines myself. As for transactional support, you may well be on the right track there.

We cannot allow a single note to be “lost”. Multiple calls should not be a big problem. However, missing the link between the external note ID and the internal note is also not acceptable.

So, maybe a custom-written client that has full freedom (and responsibility) is in order. Thank you for sorting things out for me.

:+1: this sounds like a good direction. Regarding transactions, you basically have the consume-process-produce pattern described here:

To use transactional semantics in a consume-process-produce pattern and ensure each message is processed exactly once, a client application should set enable.auto.commit=false and commit offsets manually using the sendOffsetsToTransaction() method in the KafkaProducer interface.

But offset management and transactions might not even be required for you, since you say “Multiple calls should not be a big problem.” It will come down to whether you want the convenience and lower overhead (e.g., automatic offset management and no transactions) in exchange for a higher risk of duplicate calls in failure scenarios.
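
For reference, here is a bare-bones sketch of that consume-process-produce loop with transactions, assuming the topic names from this thread; the broker address, group id, transactional.id, and the callRestServer() helper (standing in for the POST /note call) are all placeholders.

```java
// Minimal consume-process-produce sketch with Kafka transactions.
// Topic names come from the thread; everything else is a placeholder.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class NoteBridge {

    private static final HttpClient HTTP = HttpClient.newHttpClient();

    public static void main(String[] args) {
        Properties cp = new Properties();
        cp.put("bootstrap.servers", "kafka:9092");
        cp.put("group.id", "note-bridge");
        cp.put("enable.auto.commit", "false");       // offsets are committed via the transaction
        cp.put("isolation.level", "read_committed");
        cp.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        cp.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties pp = new Properties();
        pp.put("bootstrap.servers", "kafka:9092");
        pp.put("transactional.id", "note-bridge-1"); // enables transactions (and idempotence)
        pp.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        pp.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pp)) {

            producer.initTransactions();
            consumer.subscribe(List.of("out-note"));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) {
                    continue;
                }

                producer.beginTransaction();
                try {
                    Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                    for (ConsumerRecord<String, String> record : records) {
                        // POST /note and capture the response containing the external id.
                        String response = callRestServer(record.value());
                        producer.send(new ProducerRecord<>("in-note-updates", record.key(), response));
                        offsets.put(new TopicPartition(record.topic(), record.partition()),
                                    new OffsetAndMetadata(record.offset() + 1));
                    }
                    // Commit the consumed offsets atomically with the produced records.
                    producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                    producer.commitTransaction();
                } catch (Exception e) {
                    // Abort and exit; on restart the consumer resumes from the last committed
                    // offsets, so no note is lost. The POST itself is outside the transaction,
                    // so it may be repeated on the retry.
                    producer.abortTransaction();
                    throw new RuntimeException(e);
                }
            }
        }
    }

    // Hypothetical helper: POST the note JSON to the REST server and return the response body.
    private static String callRestServer(String noteJson) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://rest-server/note"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(noteJson))
                .build();
        return HTTP.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

If you instead keep automatic offset commits and drop the transactional settings, the loop gets simpler and cheaper; in a synchronous loop like this, offsets only advance after a batch has been processed, so notes shouldn’t be lost, but a crash between the POST and the commit can lead to the duplicate calls mentioned above.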