Schema IDs changed after dumping and importing _schemas data

We migrated Schema Registry from one EKS cluster to another. Before the migration, we dumped the “_schemas” topic data to a text file, like this:

kafka-console-consumer \
  --from-beginning \
  --topic _schemas \
  --bootstrap-server source-broker:9092 \
  --property print.key=true \
  --property key.separator=### > schemas_dump.txt

We wrote a Python script that reads the text file “schemas_dump.txt” line by line and sends a POST request to the new Schema Registry URL to load the data.
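For reference, the parsing half of such a script might look like the sketch below. It assumes each dump line has the form key, then the `###` separator from the dump command, then value, with both sides being JSON. The function name and the sample record (including the ID 7) are made up for illustration.

```python
import json

SEPARATOR = "###"  # must match --property key.separator in the dump command


def parse_dump_line(line):
    """Split one dump line into (key, value); value is None for a null record."""
    raw_key, raw_value = line.rstrip("\n").split(SEPARATOR, 1)
    key = json.loads(raw_key)
    value = json.loads(raw_value)  # the literal text "null" becomes None
    return key, value


# Made-up sample line in the shape of a _schemas record.
sample = (
    '{"keytype":"SCHEMA","subject":"example.source.abandon_cart-value","version":1,"magic":1}'
    "###"
    '{"subject":"example.source.abandon_cart-value","version":1,"id":7,'
    '"schema":"\\"string\\"","deleted":false}'
)
key, value = parse_dump_line(sample)
print(key["keytype"], value["id"], value["version"])
```

The value side already carries the original `id` and `version`, so they are available to the script without any extra lookup.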

The POST request looks something like this:

curl --location 'http://SCHEMA_REGISTRY_URL_elb.amazonaws.com/subjects/example.source.abandon_cart-value/versions/' \
--header 'Content-Type: application/vnd.schemaregistry.v1+json' \
--data '{"schema":"{\"type\":\"record\",\"name\":\"FlatEvent\",\"namespace\":\"com.example.map.cartabandon.model\",\"fields\":[{\"name\":\"eventType\",\"type\":\"string\"},{\"name\":\"eventGuid\",\"type\":{\"type\":\"string\",\"logicalType\":\"uuid\"}},{\"name\":\"deviceId\",\"type\":\"string\"},{\"name\":\"customerId\",\"type\":\"string\"},{\"name\":\"dataTime\",\"type\":\"long\"},{\"name\":\"deviceDateTime\",\"type\":\"long\"},{\"name\":\"dateTime\",\"type\":\"long\"},{\"name\":\"itemId\",\"type\":\"long\"},{\"name\":\"cartId\",\"type\":\"long\"},{\"name\":\"productId\",\"type\":\"long\"},{\"name\":\"qty\",\"type\":\"int\"},{\"name\":\"siteId\",\"type\":[\"null\",\"int\"]}]}","deleted":false}'

After the Python script finished, the schema IDs in the new Schema Registry were different from the old ones.

Please help me understand how the schema IDs got changed. It would be highly appreciated.

I don’t see a schema ID included in the example request. Take a look at the steps here and note the version and id fields in step 3.

Hi @dtroiano - Thanks for the reply. We passed the “subject name” (example.source.abandon_cart-value), thinking the registry would automatically pick up the same schema IDs from the text file (dump). So you are saying that we had to explicitly include the schema ID as a key/value pair in the POST payload? Did our curl command, sent without a schema ID, result in the schema IDs changing?

Thanks once again.

Yes, specify it explicitly. You might get the same IDs if you don’t specify it (I assume they start at 1?) but it’s safer to be explicit about it as in the docs.
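As a rough sketch of what "explicit" could look like in the Python script: the request body carries `id` and `version` alongside `schema`. The helper names, the `http://localhost:8081` base URL, and the sample record are assumptions for illustration. As far as I understand the migration docs, the target registry also has to be put into IMPORT mode (via `PUT /mode`) before it accepts a caller-supplied ID, so treat this as a starting point rather than a complete migration.

```python
import json
import urllib.parse
import urllib.request

CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"


def build_register_request(base_url, value):
    """Return (url, body) for registering one dumped schema with its original ID."""
    payload = {
        "schema": value["schema"],
        "version": value["version"],  # original version from the dump
        "id": value["id"],            # original schema ID from the dump
    }
    # Non-Avro schemas carry extra fields in the dump; pass them through if present.
    for optional in ("schemaType", "references"):
        if optional in value:
            payload[optional] = value[optional]
    subject = urllib.parse.quote(value["subject"], safe="")
    url = f"{base_url.rstrip('/')}/subjects/{subject}/versions"
    return url, json.dumps(payload).encode("utf-8")


def post(url, body):
    """Send the request; not called here because it needs a live registry."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": CONTENT_TYPE}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Made-up record in the shape of a _schemas value.
dumped = {
    "subject": "example.source.abandon_cart-value",
    "version": 1,
    "id": 7,
    "schema": '"string"',
    "deleted": False,
}
url, body = build_register_request("http://localhost:8081", dumped)
print(url)
print(body.decode("utf-8"))
```

Building the request separately from sending it makes it easy to print and inspect every payload before anything is written to the new registry.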

I forgot to mention but take some caution consuming the _schemas topic directly. I think that your approach will work as long as you are only considering records with "keytype":"SCHEMA" and you haven’t deleted any subjects. But, e.g., if you’ve deleted subjects then IDs can get reused. The Python script could handle this case but it adds some complexity.
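One way the script could handle this, sketched under some assumptions: keep only `"keytype":"SCHEMA"` records, let a later record for the same subject and version replace an earlier one (mirroring how a compacted topic behaves), drop entries whose value is `null`, and then flag any ID that maps to more than one schema text. The function names and sample records are hypothetical, and the `###` separator is taken from the dump command above.

```python
import json


def latest_schema_records(lines, separator="###"):
    """Return {(subject, version): value} using the last SCHEMA record per key."""
    latest = {}
    for line in lines:
        line = line.rstrip("\n")
        if separator not in line:
            continue
        raw_key, raw_value = line.split(separator, 1)
        key = json.loads(raw_key)
        if not isinstance(key, dict) or key.get("keytype") != "SCHEMA":
            continue  # skip CONFIG, NOOP and other record types
        value = json.loads(raw_value)
        slot = (key["subject"], key["version"])
        if value is None:
            latest.pop(slot, None)  # null value: the record was removed
        else:
            latest[slot] = value
    return latest


def find_id_conflicts(records):
    """Return {id: set of schema texts} for IDs that map to more than one schema."""
    by_id = {}
    for value in records.values():
        by_id.setdefault(value["id"], set()).add(value["schema"])
    return {i: texts for i, texts in by_id.items() if len(texts) > 1}


def make_line(key, value):
    """Build a made-up dump line for the demonstration below."""
    return json.dumps(key) + "###" + json.dumps(value)


sample_lines = [
    make_line({"keytype": "SCHEMA", "subject": "a-value", "version": 1, "magic": 1},
              {"subject": "a-value", "version": 1, "id": 1, "schema": '"string"', "deleted": False}),
    make_line({"keytype": "NOOP", "magic": 0}, None),
    make_line({"keytype": "SCHEMA", "subject": "b-value", "version": 1, "magic": 1},
              {"subject": "b-value", "version": 1, "id": 1, "schema": '"int"', "deleted": False}),
]
records = latest_schema_records(sample_lines)
conflicts = find_id_conflicts(records)
print(len(records), sorted(conflicts))
```

The same ID appearing under several subjects with identical schema text is not reported, since that is ordinary sharing. Whether to import records marked `"deleted":true` is left to the caller.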

@dtroiano Thank you 👍
