Best way to consume, transform, and produce between two Confluent Kafka topics without ksqlDB

Hello,
If using ksqlDB is not an option, what is the pattern Confluent recommends for consuming events from a Confluent Kafka topic, transforming both the event data and its structure, and publishing the result to another topic in the same cluster?

We have a Confluent cluster deployed on-prem. At present we have developed a custom connector, deployed on Kafka Connect, that does this. Are there any known issues or limitations with that approach?

Let me know if you need any further details. Thanks for your inputs.

You could start by using the Kafka producer/consumer libraries directly. If stream processing is really a core need for your company, maybe adopt Flink (but it comes with a learning curve, environment setup, etc.).
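In case it helps, here is a minimal sketch of that plain consumer/producer approach using the Java kafka-clients library. The topic names, bootstrap servers, group id, and the `toUpperCase()` transform are placeholders, not anything from this thread:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ConsumeTransformProduce {

    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "transformer-app");
        consumerProps.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {

            consumer.subscribe(List.of("source-topic"));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Placeholder transformation; reshape the value however you need here.
                    String transformed = record.value().toUpperCase();
                    producer.send(new ProducerRecord<>("target-topic", record.key(), transformed));
                }
                // Flush so the batch is acknowledged before committing offsets (at-least-once).
                producer.flush();
                consumer.commitSync();
            }
        }
    }
}
```

This gives at-least-once delivery; if you need exactly-once you would have to wire up the transactional producer yourself, which is one of the things Kafka Streams handles for you.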


Thanks Xuanzi. Do you mean a standalone application that does this instead of a custom Kafka connector?
If so, are there any known limitations of using a custom connector?

MirrorMaker 2 is the only connector that both consumes from and produces to Kafka. It’s not meant for transforming data, only for transferring bytes (mirroring topics).

Kafka Streams is the answer you’re looking for; it’s what ksqlDB is an abstraction over.

The answer is not specific to Confluent.
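For reference, a consume-transform-produce pipeline in Kafka Streams can be a single small topology. This is just a sketch assuming String keys and values; the application id, topic names, and the `mapValues` transform are placeholders:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class TransformApp {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "topic-transformer");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Read from the source topic, transform each value, write to the target topic.
        KStream<String, String> source = builder.stream("source-topic");
        source.mapValues(value -> value.toUpperCase())
              .to("target-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Close the topology cleanly on JVM shutdown.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Reshaping the record structure (dropping or renaming fields, converting formats, etc.) would go in that `mapValues`/`map` step, with Serdes matching your actual data formats.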

Thanks for your reply. I understand that ksqlDB or Kafka Streams is the recommended, out-of-the-box way to do this. MirrorMaker is not an option, since there is some transformation to be made.

What issues might we face if we use a custom Kafka Connect connector to do this, apart from having to maintain the custom connector code?

I have not tried this, but you could look at MirrorMaker 2 with SMTs (see the sketch below). If you need to operate on more than one record at a time, you will need Kafka Streams, ksqlDB, or your own consumer/producer code.
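If anyone wants to experiment with that route, a MirrorMaker 2 source connector with a built-in SMT submitted to Kafka Connect might look roughly like this. The connector and SMT class names are from Apache Kafka; the aliases, topic, field name, and bootstrap servers are made-up placeholders, and I haven’t verified this end to end:

```json
{
  "name": "topic-copy-with-mask",
  "config": {
    "connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
    "source.cluster.alias": "src",
    "target.cluster.alias": "dst",
    "source.cluster.bootstrap.servers": "localhost:9092",
    "target.cluster.bootstrap.servers": "localhost:9092",
    "topics": "source-topic",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "transforms": "mask",
    "transforms.mask.type": "org.apache.kafka.connect.transforms.MaskField$Value",
    "transforms.mask.fields": "card_number"
  }
}
```

Two caveats: by default MirrorMaker 2 prefixes the destination topic with the source cluster alias (here it would become `src.source-topic`), so within a single cluster you get a renamed copy rather than an arbitrary target topic, and SMTs only ever see one record at a time.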


Flink or Spark would be more flexible options, but Kafka Streams solves your use case exactly. Kafka Connect SMT code can be very rigid, and you actually need to run a Connect cluster for it, whereas a Kafka Streams app is just a JAR file you run.

As mentioned, since the data stays within the same cluster, this isn’t a good use case for Kafka Connect.
