Kafka Connect JDBC sink to Oracle DB is not performing well

I am trying to evaluate the performance of the Kafka Connect JDBC sink connector from Kafka topics to an Oracle EXA database. We have an event-driven architecture, and we want to sink the records from a Kafka topic once we receive a trigger. In my POC I created the connector after the Kafka topic was filled with records (the data comes from the DB and is processed in Drools, after which it is put into the Kafka topics). I created a couple of connectors for two different tables, with 931 and 77 columns respectively. The connector for the table with 77 columns looks like this:

{
  "name": "testdb-sink-XXX_A_B_C_123",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "input.data.format": "JSON",
    "connection.host": "********",
    "connection.port": "1521",
    "connection.user": "ABC_USER",
    "connection.password": "ABC",
    "connection.url": "jdbc:oracle:thin:@//xyz.df.test.ccc.intranet:1521/brown_dog",
    "db.name": "brown_dog",
    "tasks.max": "10",
    "insert.mode.databaselevel": true,
    "table.name.format": "XXX_A_B_C",
    "topics": "123-result-XXX_A_B_C",
    "confluent.topic.security.protocol": "PLAINTEXT",
    "transforms": "changeCase,timestampNowField,dropID,TimestampConverter1,TimestampConverter2,TimestampConverter3",
    "transforms.changeCase.type": "com.github.jcustenborder.kafka.connect.transform.common.ChangeCase$Value",
    "transforms.changeCase.from": "LOWER_CAMEL",
    "transforms.changeCase.to": "UPPER_UNDERSCORE",
    "transforms.timestampNowField.type": "com.github.jcustenborder.kafka.connect.transform.common.TimestampNowField$Value",
    "transforms.timestampNowField.fields": "CREATION_DATE",
    "transforms.dropID.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
    "transforms.dropID.blacklist": "ID",
    "transforms.TimestampConverter1.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
    "transforms.TimestampConverter1.format": "dd-MM-yy",
    "transforms.TimestampConverter1.field": "CREATION_DATE",
    "transforms.TimestampConverter1.target.type": "Date",
    "transforms.TimestampConverter2.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
    "transforms.TimestampConverter2.format": "dd-MM-yy",
    "transforms.TimestampConverter2.field": "LAST_UPDATE_DATE",
    "transforms.TimestampConverter2.target.type": "Date",
    "transforms.TimestampConverter3.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
    "transforms.TimestampConverter3.format": "dd-MM-yy",
    "transforms.TimestampConverter3.field": "REP_DATE",
    "transforms.TimestampConverter3.target.type": "Date"
  }
}

This works, and I have been able to transfer records from the topic to the database.
The topic has 10 partitions, and I found that at any time only 5 database connections are open.
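
A quick way to see how many of the 10 tasks were actually started is the Connect REST API status endpoint. A minimal sketch, assuming the worker's default REST port 8083 is published from the Docker container; the response below is abbreviated and illustrative, not my actual output:

GET http://localhost:8083/connectors/testdb-sink-XXX_A_B_C_123/status

{
  "name": "testdb-sink-XXX_A_B_C_123",
  "connector": { "state": "RUNNING", "worker_id": "connect:8083" },
  "tasks": [
    { "id": 0, "state": "RUNNING", "worker_id": "connect:8083" },
    { "id": 1, "state": "RUNNING", "worker_id": "connect:8083" }
  ]
}

If fewer than 10 tasks show up here, the 5-connection ceiling is coming from Connect itself rather than from Oracle.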
The sink table has a composite primary key; one column, ID, is populated from a sequence and is null in the Kafka topic, so I read that Connect will use the existing sequence to populate the column.
The Connect server runs in a Docker container on my local machine.
When the connector is created, I find that the rate of transfer from the Kafka topic to the database is very slow, around 20,000 records per minute. There were 1.5 million records in the Kafka topic, and it took more than an hour to transfer them to the DB.
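
As far as I understand, the sink writes in batches per poll, so the relevant knobs to experiment with would be the connector's batch.size (default 3000) and the consumer overrides for how much each poll fetches. A hedged sketch of the extra config lines, assuming the worker allows client overrides (connector.client.config.override.policy=All); the values are starting points, not tested recommendations:

    "batch.size": "3000",
    "consumer.override.max.poll.records": "3000",
    "consumer.override.fetch.min.bytes": "1048576"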

Is this the expected performance? What could be the reason for it?

What version of the connector are you using? About a year ago there was an issue introduced when CLOB support was improved: setNString was used for VARCHAR columns, which prevented the primary-key index from being used.

It has been fixed since then, somewhere between 10.0.1 and 10.0.3 if my memory is right, but it has been 8 months since I have done any work with sinking to Oracle via the connector.

In addition to upgrading / checking the version, I would take the SQL the connector uses and run an explain plan on it, to make sure a full table scan isn't happening.
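
Something along these lines, with the actual statement the connector issues and your real key columns substituted in; the KEY_COL names below are hypothetical placeholders:

-- Substitute the real statement and PK columns from your sink table
EXPLAIN PLAN FOR
  UPDATE XXX_A_B_C
     SET LAST_UPDATE_DATE = :1
   WHERE KEY_COL_1 = :2
     AND KEY_COL_2 = :3;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

In the plan output, a TABLE ACCESS FULL on XXX_A_B_C instead of an access through the PK index is the red flag.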
