Is anyone aware of an existing Single Message Transform plugin for Connect which can be used to trim trailing spaces from character/string fields? I’d love to be able to trim the values in the source connector, but I’m not having any luck finding a plugin that already exists for this, and I’m too lacking in Java development chops to feel confident writing my own.
Background: I’m using the Debezium SQL Server source connector producing CDC events from several tables in a 3rd-party vendor’s database. I then sink those events to PostgreSQL using the JDBC Sink Connector.
Problem: The source SQL Server database uses the CHAR data type liberally, instead of VARCHAR. As a result, when the Debezium connector reads CDC data, the values from CHAR columns are padded with whitespace to their max width. This space padding thus propagates to the PostgreSQL sink, and makes it challenging for users to query the data b/c they need to account for the padding.
i.e., from the PostgreSQL sink, a user can’t simply do:
SELECT * FROM table WHERE foo = 'bar';
…they need to know that foo has a max length of 10 characters and instead do:
SELECT * FROM table WHERE foo = 'bar '; -- imagine there's 7 spaces there
or, more problematically:
SELECT * FROM table WHERE foo LIKE 'bar%';