Bad performance while writing to Oracle from Kafka using Kafka Connect

We are using Kafka Connect to write to Oracle and having performance issues while writing to Oracle and based on our understanding so far, It is actually dependent of data and the data type we have defined in Kafka Connect and Oracle.

Test 1:
We got good write performance with sample data set (Dataset 1) in this test.

Test 2:
With another sample dataset (Dataset2), write performance was almost half of what it was in first test.

Upon diving deep, we found that the difference between two data sets was that one of the field (STUDENT_ID) was having all not null values populated in dataset1 whereas in dataset2, it was a mix of NULL and NOT NULL values.

We found that STUDENT_ID field had a data type mismatch between kafka connect and Oracle. Kafka connect had INT64 whereas oracle had VARCHAR(50). so we changed kafka connect data type to STRING and with this change, we were able to get good write performance even with dataset2

Test 3:
During production run, we again ended up with poor write performance and this time we have no clue which field (and its values) are causing this.
Below is the mapping of data type between Kafka connect and Oracle.

Oracle          |	  Kafka Connect
NUMBER(19,0)    |	  int64
NUMBER          |   bytes (org.apache.kafka.connect.data.Decimal)
VARCHAR2(10)    |   string
VARCHAR2(1)     |   string
CHAR(4)         |   string
VARCHAR2(10)    |   string
TIMESTAMP       |   int64 (org.apache.kafka.connect.data.Timestamp)
CLOB            |   string

Connector configuration below

{
    "tasks.max": 50,
    "insert.mode" : "insert",
    "value.converter.schemas.enable" : "true",
    "value.converter" : "org.apache.kafka.connect.json.JsonConverter",
    "key.converter" : "org.apache.kafka.connect.storage.StringConverter",
    "consumer.auto.offset.reset" : "latest",
    "auto.offset.reset" : "latest",
    "consumer.override.max.poll.records":5000,
    "consumer.override.max.partition.fetch.bytes": "3145728",
    "consumer.override.group.id":"xy.toto.prim",
    "batch.size":5000                
}

Please let me know if you have any suggestions on how to find the root cause for slow performance in Test 3.

Note 1 : While comparing the Test1 and Test2, we suspected that CLOB field might be the cause for poor write performance but it wasn’t the case. Even after removing CLOB field, we got bad performance and it was only once we corrected the data type for STUDENT_ID field on Kafka side, we got the good performance.

Note 2 : We are running this connector from last two years and haven’t faced issue in last two years and its the first time we are getting issues and had to change the data type of STUDENT_ID to get good performance

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.