While working with a JDBC Sink Connector, I noticed that some SMTs choke on a tombstone (null value) while others handle tombstones fine.
For example:
"transforms": "flattenKey,valueToJSON,wrapValue,addTimestamp",
"transforms.flattenKey.type": "org.apache.kafka.connect.transforms.Flatten$Key",
"transforms.flattenKey.delimiter": "_",
"transforms.valueToJSON.type": "com.github.jcustenborder.kafka.connect.transform.common.ToJSON$Value",
"transforms.valueToJSON.schemas.enable": "false",
"transforms.valueToJSON.predicate": "tombstone",
"transforms.valueToJSON.negate": true,
"transforms.wrapValue.type":"org.apache.kafka.connect.transforms.HoistField$Value",
"transforms.wrapValue.field":"matrix",
"transforms.wrapValue.predicate": "tombstone",
"transforms.wrapValue.negate": true,
"transforms.addTimestamp.type": "org.apache.kafka.connect.transforms.InsertField$Value",
"transforms.addTimestamp.timestamp.field": "message_timestamp",
"predicates": "tombstone",
"predicates.tombstone.type": "org.apache.kafka.connect.transforms.predicates.RecordIsTombstone"
To avoid the cryptic error “java.lang.ClassCastException: class java.util.HashMap cannot be cast to class org.apache.kafka.connect.data.Struct” when processing a tombstone record, I had to attach a negated RecordIsTombstone predicate to ToJSON (a community SMT) and to HoistField. InsertField did not need one.
Digging into the source, I found that InsertField handles the case where the key or value is null:
Thanks to this null check, there’s no need to add a predicate to skip InsertField$Value when the value is null.
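For anyone who hasn’t seen this kind of guard, the pattern is roughly an early return when the value is null. The sketch below is not the Kafka Connect code: SimpleRecord and InsertTimestampSketch are made-up stand-ins that use plain maps, so it runs without any Connect dependency.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Connect record; a null value models a tombstone.
class SimpleRecord {
    final String key;
    final Map<String, Object> value;
    final long timestamp;

    SimpleRecord(String key, Map<String, Object> value, long timestamp) {
        this.key = key;
        this.value = value;
        this.timestamp = timestamp;
    }
}

// Hypothetical transform that adds the record timestamp as a field of the value.
class InsertTimestampSketch {
    private final String fieldName;

    InsertTimestampSketch(String fieldName) {
        this.fieldName = fieldName;
    }

    SimpleRecord apply(SimpleRecord record) {
        // Guard: return tombstones unchanged instead of trying to copy a null value.
        if (record.value == null) {
            return record;
        }
        // Copy the value so the input record is not mutated, then add the new field.
        Map<String, Object> updated = new HashMap<>(record.value);
        updated.put(fieldName, record.timestamp);
        return new SimpleRecord(record.key, updated, record.timestamp);
    }
}
```

An SMT without such a guard goes straight to handling the value as a map or Struct, which is where a tombstone tends to blow up.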
It would help if the docs listed how the individual SMTs behave when dealing with a null key/value. I don’t find any mention of that here:
Of course we can always find this out by trial and error or by studying the source code. But if Confluent were to make a best practice of describing how its SMTs handle a null key/value, that would have two benefits:
- Save developers time when working with Confluent’s official list of SMTs
- Inspire developers who write their own SMTs to likewise document how they handle a null key/value
Perhaps a standard way of dealing with nulls (“no-op if key/value is null”) could be promoted, and SMT authors would only need to document their behavior when it differs.
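To make the “no-op if value is null” idea concrete, here is a minimal sketch of a wrapper that applies it to any transform. NullSafe and skipNullValue are hypothetical names, not part of Kafka Connect, and the record type is left generic so the example stays self-contained.

```java
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical helper (not a Kafka Connect class): wraps a record transform so that
// records whose value is null are returned unchanged and the delegate is never called.
final class NullSafe {
    private NullSafe() {}

    static <R> UnaryOperator<R> skipNullValue(Function<R, ?> valueOf, UnaryOperator<R> delegate) {
        return record -> valueOf.apply(record) == null ? record : delegate.apply(record);
    }
}
```

In a real connector the same effect is what the negated RecordIsTombstone predicate in the config above gives you. The point of the sketch is only that the default is cheap to implement, so documenting the exceptions would be the smaller burden.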