Suggestion: SMT support for null key/value should be documented

Ben · 23 September 2021 13:41

While working with a JDBC Sink Connector, I noticed that some SMTs choke on a tombstone (null value) while others handle tombstones fine.

For example:

  "transforms": "flattenKey,valueToJSON,wrapValue,addTimestamp",
  "transforms.flattenKey.type": "org.apache.kafka.connect.transforms.Flatten$Key",
  "transforms.flattenKey.delimiter": "_",
  "transforms.valueToJSON.type": "com.github.jcustenborder.kafka.connect.transform.common.ToJSON$Value",
  "transforms.valueToJSON.schemas.enable": "false",
  "transforms.valueToJSON.predicate": "tombstone",
  "transforms.valueToJSON.negate": true,
  "transforms.wrapValue.type":"org.apache.kafka.connect.transforms.HoistField$Value",
  "transforms.wrapValue.field":"matrix",
  "transforms.wrapValue.predicate": "tombstone",
  "transforms.wrapValue.negate": true,

  "transforms.addTimestamp.type": "org.apache.kafka.connect.transforms.InsertField$Value",
  "transforms.addTimestamp.timestamp.field": "message_timestamp",

  "predicates": "tombstone",
  "predicates.tombstone.type": "org.apache.kafka.connect.transforms.predicates.RecordIsTombstone"

To avoid the cryptic error “java.lang.ClassCastException: class java.util.HashMap cannot be cast to class org.apache.kafka.connect.data.Struct” when processing a tombstone record, I had to add a negated predicate of RecordIsTombstone for ToJSON (community SMT) and HoistField, but did not need to add that to InsertField.
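To make the predicate/negate combination above concrete, here is a minimal sketch of the gating rule as I understand it: a transform runs only when the predicate result differs from `negate`, so `RecordIsTombstone` with `negate: true` means "run on everything except tombstones". This is not Kafka Connect code. `PredicateGateSketch`, `applyGated` and `hoist` are hypothetical stand-ins that work on plain maps rather than Connect records.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Stand-in sketch: plain Map values instead of Kafka Connect records.
final class PredicateGateSketch {

    // Apply the transform only when the predicate result differs from `negate`;
    // otherwise hand the value back untouched.
    static <V> V applyGated(V value, Predicate<V> predicate, boolean negate, UnaryOperator<V> transform) {
        return (negate ^ predicate.test(value)) ? transform.apply(value) : value;
    }

    // Hypothetical stand-in for a hoist-style transform (NOT the real HoistField):
    // it wraps whatever it is given under a single field, null included.
    static Map<String, Object> hoist(Map<String, Object> value) {
        Map<String, Object> wrapped = new HashMap<>();
        wrapped.put("matrix", value);
        return wrapped;
    }

    public static void main(String[] args) {
        // Plays the role of RecordIsTombstone in this sketch.
        Predicate<Map<String, Object>> isTombstone = v -> v == null;

        Map<String, Object> payload = new HashMap<>();
        payload.put("id", 1);

        // negate = true: the transform runs for normal records...
        Map<String, Object> hoisted = applyGated(payload, isTombstone, true, PredicateGateSketch::hoist);
        // ...and is skipped for tombstones, which stay null.
        Map<String, Object> tombstone = applyGated(null, isTombstone, true, PredicateGateSketch::hoist);

        System.out.println(hoisted);
        System.out.println(tombstone);
    }
}
```

Without the gate, the stand-in `hoist` would turn a tombstone into a non-null map, so downstream it would no longer look like a tombstone.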

Digging in the source, I find that InsertField handles the case where key or value is null:

https://github.com/a0x8o/kafka/blob/f8237749f6ad34c09154f807e53273be64e1261e/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/InsertField.java#L130

      if (staticField != null && staticValue == null) {
          throw new ConfigException(ConfigName.STATIC_VALUE, null, "No value specified for static field: " + staticField);
      }

      schemaUpdateCache = new SynchronizedCache<>(new LRUCache<>(16));
  }

  @Override
  public R apply(R record) {
      if (operatingValue(record) == null) {
          return record;
      } else if (operatingSchema(record) == null) {
          return applySchemaless(record);
      } else {
          return applyWithSchema(record);
      }
  }

  private R applySchemaless(R record) {
      final Map<String, Object> value = requireMap(operatingValue(record), PURPOSE);
^ Thanks to this, there’s no need to add a predicate to skip InsertField$Value when value is null.

It would help if the docs listed how the individual SMTs behave when dealing with a null key/value. I don’t find any mention of that here:

Of course we can always find this out by trial and error or by studying the source code. But if Confluent were to make a best practice of describing how its SMTs handle a null key/value, that would have two benefits:

Save developers time when working with Confluent’s official list of SMTs
Inspire developers who write their own SMTs to likewise document how they handle a null key/value

Perhaps a standard way of dealing with nulls (“no-op if key/value is null”) could be promoted, and SMT authors would only need to document their behavior when it differs.
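As a rough illustration of what such a "no-op if key/value is null" convention could look like, here is a small sketch that wraps any value transform so a null input is returned unchanged. It deliberately avoids the Kafka Connect API. `NullSafeSketch`, `noOpOnNull` and `upperCaseKeys` are hypothetical names invented for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Stand-in sketch: plain Map values instead of Kafka Connect records.
final class NullSafeSketch {

    // Wrap a transform so that a null value (a tombstone) is returned as-is
    // and the wrapped transform never sees it.
    static <V> UnaryOperator<V> noOpOnNull(UnaryOperator<V> transform) {
        return value -> value == null ? null : transform.apply(value);
    }

    // Hypothetical transform written without null in mind:
    // calling it directly with null throws a NullPointerException.
    static Map<String, Object> upperCaseKeys(Map<String, Object> value) {
        Map<String, Object> out = new TreeMap<>();
        value.forEach((k, v) -> out.put(k.toUpperCase(), v));
        return out;
    }

    public static void main(String[] args) {
        UnaryOperator<Map<String, Object>> safe = noOpOnNull(NullSafeSketch::upperCaseKeys);

        Map<String, Object> payload = new HashMap<>();
        payload.put("id", 7);

        System.out.println(safe.apply(payload)); // transformed as usual
        System.out.println(safe.apply(null));    // tombstone passes through untouched
    }
}
```

With a default like this, an SMT author would only need to document the cases where a null key or value is handled differently.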

rmoff · 23 September 2021 13:53

Hey @Ben, thanks for taking time to write up the issue.

It’d be a good idea to log it with the authors of the relevant SMTs though, so raise a KAFKA JIRA for Single Message Transforms that ship in Apache Kafka, or an issue on the relevant GitHub repo, e.g. jcustenborder/kafka-connect-transform-common.

Ben · 23 September 2021 14:07

Thanks, I’ve filed a KAFKA JIRA ticket for it:
https://issues.apache.org/jira/browse/KAFKA-13320
