Global state store having uneven disk usage across pods

We are using a RocksDB-based global KTable state store. Our stateful service runs 4 pods, and we have observed that the 4 pods have very different disk usage.
When the pods start they all begin at a few MB, but over a few hours one pod may reach 20 GB of disk usage while another pod may have only 4 GB.
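For context, the global table is defined roughly like this (a simplified sketch; the topic name, store name, and key/value types are illustrative):

    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.GlobalKTable;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.state.KeyValueStore;

    StreamsBuilder builder = new StreamsBuilder();

    // Every instance (pod) materializes the full topic into a local
    // RocksDB-backed store named "global-store".
    GlobalKTable<String, String> globalTable = builder.globalTable(
            "global-topic",
            Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("global-store"));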
We also wrote a Punctuator (sketch below) to print the actual number of keys every 10 minutes, and all threads across the pods report the same number of keys.
Since we write to the global topic directly and all pods read from it, shouldn't all pods have nearly equal disk usage? :thinking:
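The punctuator looks roughly like this (a simplified sketch reusing the illustrative store name "global-store" from above; the real code differs in details):

    import org.apache.kafka.streams.processor.PunctuationType;
    import org.apache.kafka.streams.processor.api.Processor;
    import org.apache.kafka.streams.processor.api.ProcessorContext;
    import org.apache.kafka.streams.processor.api.Record;
    import org.apache.kafka.streams.state.KeyValueIterator;
    import org.apache.kafka.streams.state.KeyValueStore;

    import java.time.Duration;

    public class KeyCountProcessor implements Processor<String, String, Void, Void> {

        private KeyValueStore<String, String> globalStore;

        @Override
        public void init(ProcessorContext<Void, Void> context) {
            globalStore = context.getStateStore("global-store");
            // Every 10 minutes, count the keys by iterating the whole store.
            context.schedule(Duration.ofMinutes(10), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
                long count = 0;
                try (KeyValueIterator<String, String> iter = globalStore.all()) {
                    while (iter.hasNext()) {
                        iter.next();
                        count++;
                    }
                }
                System.out.println(timestamp + ": keys in global store = " + count);
            });
        }

        @Override
        public void process(Record<String, String> record) {
            // no per-record work needed for the key counting
        }
    }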

We also took a dump of all SST files from the pods and parsed them using org.rocksdb.SstFileReader (sketch below); it confirmed that the number of unique keys in both pods is the same. What differed was the number of SST files, the number of records/entries in them, and the overall size of all the SST files.
Also, once we restart the pods, the disk usage drops to nearly zero and starts building up again, but the number of keys reported is the same as before the restart.
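The SST files were parsed roughly like this (a simplified sketch; it expects the path to a store's RocksDB directory copied off a pod, counts entries per .sst file, and dedupes keys across files):

    import org.rocksdb.Options;
    import org.rocksdb.ReadOptions;
    import org.rocksdb.RocksDB;
    import org.rocksdb.SstFileReader;
    import org.rocksdb.SstFileReaderIterator;

    import java.io.File;
    import java.nio.ByteBuffer;
    import java.util.HashSet;
    import java.util.Set;

    public class SstStats {
        public static void main(String[] args) throws Exception {
            RocksDB.loadLibrary();
            File storeDir = new File(args[0]); // RocksDB directory of the state store
            File[] sstFiles = storeDir.listFiles((dir, name) -> name.endsWith(".sst"));
            if (sstFiles == null) {
                throw new IllegalArgumentException("Not a directory: " + storeDir);
            }

            Set<ByteBuffer> uniqueKeys = new HashSet<>();
            long totalEntries = 0;
            long totalBytes = 0;

            try (Options options = new Options();
                 ReadOptions readOptions = new ReadOptions()) {
                for (File sst : sstFiles) {
                    totalBytes += sst.length();
                    try (SstFileReader reader = new SstFileReader(options)) {
                        reader.open(sst.getAbsolutePath());
                        try (SstFileReaderIterator it = reader.newIterator(readOptions)) {
                            for (it.seekToFirst(); it.isValid(); it.next()) {
                                totalEntries++;                            // every entry, duplicates included
                                uniqueKeys.add(ByteBuffer.wrap(it.key())); // dedupes keys across all files
                            }
                        }
                    }
                }
            }

            System.out.printf("sst files=%d, entries=%d, unique keys=%d, bytes=%d%n",
                    sstFiles.length, totalEntries, uniqueKeys.size(), totalBytes);
        }
    }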

Any idea what might be going on?

We were able to fix the issue. In one place in the code we were calling:

    globalStore.all().forEachRemaining(e -> {
        // process e.key / e.value
    });
It looks like .all() returns an iterator whose handle we lost in the lambda call, so it was never closed. An open iterator keeps obsolete SST files pinned, and once the iterator was closed the old stale SST files were cleaned up as well. This link proved useful: Delete Stale Files · facebook/rocksdb Wiki · GitHub
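For reference, the fix was to keep a handle on the org.apache.kafka.streams.state.KeyValueIterator returned by .all() and close it, e.g. with try-with-resources (sketch; key/value types are illustrative):

    // Closing the iterator lets RocksDB release the pinned files and
    // delete the obsolete SST files.
    try (KeyValueIterator<String, String> iter = globalStore.all()) {
        iter.forEachRemaining(e -> {
            // process e.key / e.value as before
        });
    }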
