We have data replication solution using Kafka Connect. Data is being read using Debezium SqlServerConnector into multiple topics, and then written to PostgreSQL using JdbcSinkConnector. Each topic/table has dedicated sink.
Now we are having error on one of the sinks:
org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00
One of replicated records has some non-printable characters in text column. Message is placed in Kafka topic successfully, but then it fails when sink connector is writing into target table.
Target tables are created automatically, we are using “auto.create”: “true”
Source field (SQL Server): NVARCHAR(400)
Target field (PostgreSQL): TEXT
Now we need to fix it. I see two options:
- (preffered) fin a way to write data as is, including non-printable characters
- remove non-printable characters, either in source connector, or in sink.
I was looking for some transformation that we could use, but no luck so far.
What output format (
value.converter) are you using in the
JdbcSinkConnector configs (or worker if not specified in the connector config)? For this kind of error, I’d recommend starting with these and ensuring that source and sink are in agreement.
E.g., to use Avro for both key and value, both sides would use the same converters (along with any auth settings needed for Schema Registry):
"key.converter.schema.registry.url": "<SR endpoint>",
"value.converter.schema.registry.url": "<SR endpoint>",
Hope this helps!
Thanks @dtroiano ,
We are using
io.confluent.connect.avro.AvroConverter for both key and value, on both source and sink.
It looks to me that either Postgres is unable to accept 0x00 character, or JdbcSinkConnector is unable to produce correct INSERT statement with such value.
BTW, I have found an open issue in JdbcSinkConnector that might be related: 0x00 never supported in Postgres, automatic stripping · Issue #1216 · confluentinc/kafka-connect-jdbc · GitHub