Schema IDs got changed when doing dump and import of _schemas data

shesh11 · 27 March 2024 12:29

We migrated Schema Registry from one EKS cluster to another. Before the migration, we dumped the “_schemas” topic data to a text file, like this:

kafka-console-consumer \
  --from-beginning \
  --topic _schemas \
  --bootstrap-server source-broker:9092 \
  --property print.key=true \
  --property key.separator=### > schemas_dump.txt

We wrote a Python script that reads the text file “schemas_dump.txt” line by line and sends a POST request to the new Schema Registry URL for each record.

The POST request looks something like this:

curl --location 'http://SCHEMA_REGISTRY_URL_elb.amazonaws.com/subjects/example.source.abandon_cart-value/versions/' \
--header 'Content-Type: application/vnd.schemaregistry.v1+json' \
--data '{"schema":"{\"type\":\"record\",\"name\":\"FlatEvent\",\"namespace\":\"com.example.map.cartabandon.model\",\"fields\":[{\"name\":\"eventType\",\"type\":\"string\"},{\"name\":\"eventGuid\",\"type\":{\"type\":\"string\",\"logicalType\":\"uuid\"}},{\"name\":\"deviceId\",\"type\":\"string\"},{\"name\":\"customerId\",\"type\":\"string\"},{\"name\":\"dataTime\",\"type\":\"long\"},{\"name\":\"deviceDateTime\",\"type\":\"long\"},{\"name\":\"dateTime\",\"type\":\"long\"},{\"name\":\"itemId\",\"type\":\"long\"},{\"name\":\"cartId\",\"type\":\"long\"},{\"name\":\"productId\",\"type\":\"long\"},{\"name\":\"qty\",\"type\":\"int\"},{\"name\":\"siteId\",\"type\":[\"null\",\"int\"]}]}","deleted":false}'

After the Python script completed, the schema IDs in the new Schema Registry were not the same as the old schema IDs. Please help me understand how the schema IDs got changed. It will be highly appreciated.

dtroiano · 27 March 2024 13:29

I don’t see a schema ID included in the example request. Take a look at the steps here and note the version and id fields in step 3.

shesh11 · 27 March 2024 16:11

Hi @dtroiano - Thanks for the reply. We passed the “subject name” (example.source.abandon_cart-value) thinking it would automatically pick up the same schema IDs from the text file (dump). So you are saying that we had to explicitly include the schema ID key/value in the payload of the POST request? Did our curl command without the schema ID result in the change in schema IDs?

Thanks once again.

dtroiano · 27 March 2024 17:47

Yes, specify it explicitly. You might get the same IDs if you don’t specify it (I assume they start at 1?) but it’s safer to be explicit about it as in the docs.

I forgot to mention but take some caution consuming the _schemas topic directly. I think that your approach will work as long as you are only considering records with "keytype":"SCHEMA" and you haven’t deleted any subjects. But, e.g., if you’ve deleted subjects then IDs can get reused. The Python script could handle this case but it adds some complexity.
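A minimal Python sketch of the filtering and explicit-ID idea described above, not a definitive implementation. It assumes the dump format produced by the console-consumer command in the question (key and value joined by `###`), and it assumes each `SCHEMA` record carries `subject`, `version`, `id` and `schema` fields in its value, with optional `schemaType` and `references`. The function names are hypothetical. It skips tombstones and records marked `"deleted":true`, and it does not attempt to interpret `DELETE_SUBJECT` records, so it does not cover the ID-reuse case.

```python
import json

SEPARATOR = "###"  # key.separator used when the dump was taken


def parse_dump_line(line):
    """Split one dump line into (key, value); value is None for a tombstone."""
    key_text, _, value_text = line.rstrip("\n").partition(SEPARATOR)
    key = json.loads(key_text)
    # The console consumer prints the literal text "null" for a null value.
    value = None if value_text in ("", "null") else json.loads(value_text)
    return key, value


def build_requests(lines):
    """Return {(subject, version): payload}, keeping the last live record per slot."""
    latest = {}
    for line in lines:
        if not line.strip():
            continue
        key, value = parse_dump_line(line)
        if key.get("keytype") != "SCHEMA":
            continue  # ignore CONFIG, NOOP and other non-schema records
        slot = (key["subject"], key["version"])
        if value is None or value.get("deleted"):
            latest.pop(slot, None)  # simplification: drop deleted versions
            continue
        # Carry the original id and version explicitly in the request body.
        payload = {
            "schema": value["schema"],
            "id": value["id"],
            "version": value["version"],
        }
        for optional in ("schemaType", "references"):
            if value.get(optional):
                payload[optional] = value[optional]
        latest[slot] = payload
    return latest
```

Each payload would then be sent as the body of the POST to `/subjects/<subject>/versions` on the new Schema Registry, for example with `build_requests(open("schemas_dump.txt"))`. As far as I know, the target registry has to be in IMPORT mode to accept an explicit `id`, which is worth confirming against the migration docs before running anything like this.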

shesh11 · 27 March 2024 17:57

@dtroiano Thank you

