I am testing the performance of the Kafka Connect JDBC sink connector, moving records from Kafka topics into an Oracle EXA database. We have an event-driven architecture, and we want to sink the records from a Kafka topic once we receive a trigger. In my POC I created the connector after the Kafka topic had been filled with records (the data comes from the DB and is processed in Drools, after which it is written to the Kafka topics). I created connectors for 2 different tables, with 931 and 77 columns respectively. The connector for the 77-column table looks like this -
{
  "name": "testdb-sink-XXX_A_B_C_123",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "input.data.format": "JSON",
    "connection.host": "********",
    "connection.port": "1521",
    "connection.user": "ABC_USER",
    "connection.password": "ABC",
    "connection.url": "jdbc:oracle:thin:@//xyz.df.test.ccc.intranet:1521/brown_dog",
    "db.name": "brown_dog",
    "tasks.max": "10",
    "insert.mode.databaselevel": true,
    "table.name.format": "XXX_A_B_C",
    "topics": "123-result-XXX_A_B_C",
    "confluent.topic.security.protocol": "PLAINTEXT",
    "transforms": "changeCase,timestampNowField,dropID,TimestampConverter1,TimestampConverter2,TimestampConverter3",
    "transforms.changeCase.type": "com.github.jcustenborder.kafka.connect.transform.common.ChangeCase$Value",
    "transforms.changeCase.from": "LOWER_CAMEL",
    "transforms.changeCase.to": "UPPER_UNDERSCORE",
    "transforms.timestampNowField.type": "com.github.jcustenborder.kafka.connect.transform.common.TimestampNowField$Value",
    "transforms.timestampNowField.fields": "CREATION_DATE",
    "transforms.dropID.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
    "transforms.dropID.blacklist": "ID",
    "transforms.TimestampConverter1.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
    "transforms.TimestampConverter1.format": "DD-MM-YY",
    "transforms.TimestampConverter1.field": "CREATION_DATE",
    "transforms.TimestampConverter1.target.type": "Date",
    "transforms.TimestampConverter2.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
    "transforms.TimestampConverter2.format": "DD-MM-YY",
    "transforms.TimestampConverter2.field": "LAST_UPDATE_DATE",
    "transforms.TimestampConverter2.target.type": "Date",
    "transforms.TimestampConverter3.type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
    "transforms.TimestampConverter3.format": "DD-MM-YY",
    "transforms.TimestampConverter3.field": "REP_DATE",
    "transforms.TimestampConverter3.target.type": "Date"
  }
}
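For reference, this is what the transform chain does to each record: changeCase renames lowerCamelCase fields to UPPER_UNDERSCORE, timestampNowField stamps CREATION_DATE with the current time, dropID removes the ID field, and the three TimestampConverters then turn the date fields into the Connect Date logical type. With made-up field names (these are illustrative, not my real schema), a record changes roughly like this:

// Hypothetical record as produced to the topic
{ "id": null, "someField": "x", "lastUpdateDate": 1673776800000, "repDate": 1673776800000 }

// After changeCase, timestampNowField and dropID, before the date conversions
{ "SOME_FIELD": "x", "LAST_UPDATE_DATE": 1673776800000, "REP_DATE": 1673776800000, "CREATION_DATE": 1700000000000 }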
This works, and I have been able to transfer records from the topic to the database.
The topic has 10 partitions, yet I found that only 5 database connections are open at any given time.
The sink table has a composite primary key. One column, ID, is populated by a database sequence and is null in the Kafka records; from what I have read, Connect will let the existing sequence populate that column.
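Note that the config above does not set pk.mode at all. If it is relevant: my understanding is that declaring the key columns explicitly would look like the fragment below (pk.mode and pk.fields are standard JdbcSinkConnector options; the column names are placeholders, not my real key):

// Hypothetical fragment -- COL_A and COL_B stand in for my real key columns
"pk.mode": "record_value",
"pk.fields": "COL_A,COL_B"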
The Connect server runs in a Docker container on my local machine.
When the connector is created, I find that the transfer rate from the Kafka topic to the database is very slow, around 20,000 records per minute (roughly 330 records per second). There were 1.5 million records in the Kafka topic, and it took more than an hour (75 minutes at that rate) to transfer them all to the DB.
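I have not tuned any batching options. As far as I understand, the relevant knobs on this path are the sink's batch.size (the JDBC sink default is 3000) and the consumer's max.poll.records (Kafka default 500), which can be overridden per connector when the worker permits client overrides. A sketch of what I could add to the config:

// Hypothetical tuning fragment; consumer.override.* requires
// connector.client.config.override.policy=All on the Connect worker
"batch.size": "3000",
"consumer.override.max.poll.records": "3000"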
Is this the expected performance? What could be the reason for it?