2025.01.13 - Trying to understand the “purple lines” of the Rolling Cache Index Area
In the big picture it was just a tiny purple line, with one pixel summarizing the content of 9 KB of data. So the obvious questions have always been:
- What does the index area really look like, if painted at a resolution of one byte per pixel?
- How do the changes (deltas) between two snapshots look (and feel)?
- … and what could such paintings tell us?
- Are there 72 or 80 bytes per index item?
To recap, here are some observations and assumptions that I made and published during previous tests about the “high level” structure of the RollingCache.ccc file, based on Process Monitor file event recordings:
- The index area
- … that tiny purple line … is starting at “Offset: 80”
- … but has some additional TOC (table of contents) data from “Offset: 0”.
- Each index item seems to have a size of 72 or 80 bytes
- … due to a suspected 64 bit alignment.
- The blob area
- … seems to start after the index … at “Offset: 536,877,473” (just above 512 MB).
- Each blob item seems to consist of three parts:
- A kind of TOC
- … with 28 bytes.
- Some kind of HEADER
- … with around 1 KB.
- The blob (chunk) BODY
- … with sizes from 32 bytes to 40 MB or more.
- The write operations seem to take place in 4 MB chunks.
Before showing the actual delta content paintings, I want to list what I would expect to see in an index item of a cache with a Least Recently Used (LRU) replacement policy:
- Must have:
- A marker for the least recent usage.
- A pointer (offset) to the beginning of the actual blob item.
- The length information for the blob item.
- Keeping that only in the blob item TOC is possible
- … but it would be very bad for performance, as it would require multiple read operations.
- Besides that, the recorded file access events show that the sim “knows” the length of each blob item
- … without reading the blob item TOC first.
… and since the above would only justify 3 * 4 bytes … and not 72 or more … there should be some additional information:
- Nice to have:
- Some form of checksum … for the blob item?
- Like with the length … duplicating that in the index item would make sense for performance reasons.
- Some form of checksum … for the index item?
- To detect bit rot and data corruption in big files.
- Modern file systems have used that in their “index” structures for a long time.
- This would be a very good idea, because a corrupt index needs to be detected (and deleted)
- … instead of reading and feeding corrupt data into the sim and the GPU.
- Due to the benefits of 64 bit alignment on 64 bit CPUs
- … I would expect most information to use and be aligned with 64 bits.
- That also allows scaling the size of the cache file into the terabyte range.
The following test runs have been used to paint the delta content paintings:
- Test 1
- Was performed at 2024.12.15-12.10.
- It created and zero-filled the RC, and then inserted the first (essential) data.
- It thereby created the “iconic” blue text area at the beginning of the blob area
- … which provided (asset index?) blob items that have been reused (kept alive) in all subsequent test runs.
- Test 2
- Was performed on 2024.12.15-15.17
- … 3 hours … or 10,800+ seconds later … so very shortly after Test 1.
- Due to the almost empty cache no data from Test 1 had to be deleted
- … but a lot of (early) data must have been reused.
- Test G
- Was performed on 2024.12.22-17.35
- … 7 days … or 168+ hours … or 604,800+ seconds … or 604,800,000,000+ microseconds … after Test 1
- Test H
- Was performed on 2024.12.23-07.58
- … shortly after Test G.
- Here 30 GB of new data have wiped out all old data from the RC.
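The time deltas between the test runs matter for the timestamp discussion further down, so here is a minimal sketch that computes them from the timestamps listed above. The helper name `seconds_since` and the format string are my own choices for illustration.

```python
from datetime import datetime

# Notation used for the test runs above: year.month.day-hour.minute
FMT = "%Y.%m.%d-%H.%M"

runs = {
    "Test 1": "2024.12.15-12.10",
    "Test 2": "2024.12.15-15.17",
    "Test G": "2024.12.22-17.35",
    "Test H": "2024.12.23-07.58",
}

def seconds_since(base: str, later: str) -> int:
    """Seconds elapsed between two timestamps in the notation above."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(base, FMT)
    return int(delta.total_seconds())

for name, stamp in runs.items():
    elapsed = seconds_since(runs["Test 1"], stamp)
    print(f"{name}: {elapsed:,} s after Test 1")
```

The “3 hours” and “7 days” in the list are therefore lower bounds, which is what the “+” signs indicate.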
The first painting compares Test 1 to Test 2. The colors are as in previous delta content paintings: a “green” pixel marks identical data, a “red” pixel indicates differences, and “white” represents shared “zeros”.
I picked 144 pixels per row for those paintings. This makes it easy to answer the “72 vs 80 bytes” question.
- The obvious column structure of the images tells us right away,
- … that each index item uses 72 bytes.
- There are a lot of shared “zeros” in Test 1 and Test 2
- … so maybe later tests might indicate their usage.
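The painting rules and the “144 pixels per row” trick can be sketched as follows. This is not the tool that produced the paintings, only a minimal illustration on synthetic data. The helper names and the layout inside `fake_index` (8 constant bytes, 4 changing bytes, zeros) are hypothetical.

```python
def classify(a: bytes, b: bytes) -> list[str]:
    """One 'pixel' per byte: red = different, white = shared zero, green = identical."""
    if len(a) != len(b):
        raise ValueError("snapshots must have the same length")
    pixels = []
    for x, y in zip(a, b):
        if x != y:
            pixels.append("red")
        elif x == 0:
            pixels.append("white")
        else:
            pixels.append("green")
    return pixels

def columns_are_uniform(pixels: list[str], width: int) -> bool:
    """True if every pixel column shows a single color over all full rows."""
    full = len(pixels) - len(pixels) % width  # ignore a trailing partial row
    return all(
        len({pixels[i] for i in range(col, full, width)}) == 1
        for col in range(width)
    )

def fake_index(item_size: int, count: int, salt: int) -> bytes:
    """Synthetic index: 8 constant bytes, 4 bytes depending on `salt`, rest zero."""
    item = bytes([0xAA] * 8) + bytes([salt] * 4) + bytes(item_size - 12)
    return item * count

# Two "snapshots" each, once with 72 byte items and once with 80 byte items.
pixels_72 = classify(fake_index(72, 20, 1), fake_index(72, 20, 2))
pixels_80 = classify(fake_index(80, 18, 1), fake_index(80, 18, 2))

print(columns_are_uniform(pixels_72, 144))  # 144 = 2 * 72, columns line up
print(columns_are_uniform(pixels_80, 144))  # 80 byte items drift across the rows
```

With a row width of 144, only an item size that divides 144 produces straight vertical columns. An 80 byte item would show up as a slanted pattern instead.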
Beware … in the following text the word “column” means “column within one of the index item columns”, as there are two index items per pixel row.
The differences (red) are the interesting parts:
- Two red columns indicate updates to index items
- … where the blob items did not change.
- So the pointer (offset) to the beginning of the actual blob item must be in one of the green columns.
- One red column … is 4 bytes long.
- The second red column … is 2 bytes long.
Where and what is the LRU marker?
There are two common techniques for a least recent usage marker:
- A high resolution timestamp.
- A constantly growing counter
- … sometimes in the form of vector clocks.
Since the word “recent” is a concept of time, one would expect option (1), the timestamp, to be the most likely candidate. A 64 bit integer can easily hold a timestamp in microseconds or nanoseconds … and filesystems and distributed databases do use such timestamps.
The time resolution will restrict the speed at which new information can enter the cache, and so seconds can be ruled out, and even milliseconds feel very very “risky”.
A time delta of 3 hours … or 10,800 seconds … or 10,800,000 (0xA4CB80) milliseconds would affect at least 3 bytes of data. Once we go to the more “safe” microseconds we should already see at least 5 bytes changing constantly. But the largest red column (so far) is only 4 bytes.
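The byte counts above can be checked with a few lines of arithmetic. This is a minimal sketch, and `bytes_needed` is a helper name made up for this illustration. It computes how many bytes are required to hold the delta itself, which is a lower bound for the number of bytes that would have to differ between the two snapshots.

```python
def bytes_needed(value: int) -> int:
    """Smallest number of bytes that can hold a non-negative integer."""
    return max(1, (value.bit_length() + 7) // 8)

three_hours = 3 * 60 * 60  # 10,800 seconds

units = [
    ("seconds", 1),
    ("milliseconds", 1_000),
    ("microseconds", 1_000_000),
    ("nanoseconds", 1_000_000_000),
]

for unit, factor in units:
    delta = three_hours * factor
    print(f"{unit:>12}: {delta:>18,} = {hex(delta):>14} -> {bytes_needed(delta)} bytes")
```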
So the timestamp hypothesis seems highly unlikely, because to the right of the 4 red bytes we can see many white zeros. But a 64 bit timestamp should at least have some common green bytes, encoding the month and year parts of a timestamp. It also seems highly unlikely that the sim would, without need, choose some custom time epoch.
So either the FS2024 team decided to use a very uncommon and risky (short) timestamp … or they decided to go with option (2), a constantly growing counter. Since the sim can (and as far as I can see in the file events … does) serialize all access to the cache, using a constantly growing counter would be a cheap, robust, and very good solution.
Option (2) would easily fit into 2 bytes at this early stage in the life of the cache. After all, the cache has not seen that many items yet.
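To illustrate why a constantly growing counter is enough for an LRU policy, here is a minimal in-memory sketch. It says nothing about how the sim actually implements this. The class name `CounterLru` and its methods are hypothetical.

```python
class CounterLru:
    """Tiny LRU cache that uses a growing counter instead of a timestamp."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.clock = 0    # incremented on every access, never reset
        self.items = {}   # key -> [last_used_counter, value]

    def _tick(self) -> int:
        self.clock += 1
        return self.clock

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        entry[0] = self._tick()  # mark as most recently used
        return entry[1]

    def put(self, key, value) -> None:
        if key not in self.items and len(self.items) >= self.capacity:
            # Evict the entry with the smallest counter (least recently used).
            victim = min(self.items, key=lambda k: self.items[k][0])
            del self.items[victim]
        self.items[key] = [self._tick(), value]


cache = CounterLru(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now more recent than "b"
cache.put("c", 3)  # evicts "b"
print(sorted(cache.items))
```

As long as all access is serialized, the counter gives a strict ordering without any clock. A counter below 65,536 also fits into 2 bytes, which matches the picture of a young cache.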
So where is the LRU marker?
The next painting compares Test 1 to Test G, which was the last test that still contained the “iconic” shared blue text area.
After a heavy usage of the cache over a period of 7 days …
- We still basically see the same picture
- … which makes sense, as this early data is still in the cache, unchanged.
- The main difference is … the 2 byte red column
- … has turned into a 3 byte red column.
- This would fit the logic of a constantly growing counter.
- As the 4 byte red column is still 4 bytes
- … but it is always changing everywhere
- … it is highly likely that we are looking at a 32 bit checksum (hash) of the index item.
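The checksum argument can be illustrated with CRC-32 from the Python standard library. CRC-32 is only a stand-in here: which algorithm the sim uses is unknown, and the 68 byte payload with a counter in its first 8 bytes is a hypothetical layout for this sketch.

```python
import struct
import zlib

def item_checksum(payload: bytes) -> bytes:
    """4 byte little-endian CRC-32 of the payload (stand-in for the unknown hash)."""
    return struct.pack("<I", zlib.crc32(payload))

# Hypothetical item payload: an 8 byte counter followed by 60 unchanged bytes.
payload_old = struct.pack("<Q", 1000) + bytes(60)
payload_new = struct.pack("<Q", 1001) + bytes(60)

old_sum = item_checksum(payload_old)
new_sum = item_checksum(payload_new)
changed = sum(a != b for a, b in zip(old_sum, new_sum))

print(old_sum.hex(" "), "->", new_sum.hex(" "))
print("checksum bytes changed:", changed, "of 4")
```

A small change in one field changes the checksum as a whole, which is why a checksum column would look “red everywhere” whenever anything else in the item is updated.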
So what are the four (or more) green columns then? They most likely contain the following essential information:
- A pointer (offset) to the beginning of the actual blob item.
- This most likely is the “fixed” 4 byte column, as early data is written to roughly the same offset.
- 512 MB … is 536,870,912 … or 0x2000_0000 … and the observed Offset 536,877,473 lies just above that.
- That would also fit the white “zero” pattern at the top of those green columns.
- The length information for the blob item.
- This most likely is the 2 or 3 byte data, because file length is not always the same.
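To see why an offset near 512 MB would produce a fixed byte next to zeros, here is a small sketch that encodes such offsets as 64 bit integers. Little-endian byte order is an assumption (it is the native order on x86-64), and whether the index really stores the offset this way is not confirmed by the paintings.

```python
import struct

MIB_512 = 512 * 1024 * 1024    # exactly 0x2000_0000
observed_start = 536_877_473   # start of the blob area as recorded above

for label, offset in [("512 MiB", MIB_512), ("observed", observed_start)]:
    raw = struct.pack("<Q", offset)  # assumed: 64 bit little-endian
    print(f"{label:>9}: {offset:>11,} = {hex(offset):>10} -> {raw.hex(' ')}")
```

For offsets in this region, the fourth byte stays at 0x20 and the upper four bytes stay zero, while only the lower bytes vary from item to item.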
Now a cache index needs to know “what” is inside each blob item. On launch, FS2024 reports that it builds a virtual file system (VFS). This takes a long time, because a couple of million items need to get placed inside a tree … most likely based on a unified file (URL) path notation. This requires a lot of memory.
An index needs to be more efficient. And so I would consider it likely that one of the remaining green columns contains …
- Some form of checksum … for the blob item.
That still leaves (at least) 2 green columns in the realm of mystery.
What happened in the white zero areas?
The next painting compares Test 1 to Test H, after the oldest data in the RC has been wiped out.
- This mainly confirms that when the blob item of an index item changes
- … then everything turns red (is different).
- So it is highly likely that the index item contains some more “nice to have” information:
- Some form of checksum … for the blob item.
- That blob checksum can only account for 4 or 8 bytes … but not for all changed bytes.
- I would assume 8 bytes,
- … because the `hash` values inside the `layout.json` files stored on the regular filesystem do not fit into 32 bit.
However, some interesting questions remain:
- There is now actual (red) data in the white zero parts.
- Maybe this is related to …
- data deletions?
- larger blob items than before?
- whatever … I have no clue.
To summarize the observations presented above:
- The index area uses 72 Bytes for each index item.
- Each index item is highly likely to contain:
- A constantly growing counter as the marker for the least recent usage.
- This is a very good choice for this purpose.
- A pointer (offset) to the beginning of the actual blob item.
- The length information for the blob item.
- Some form of checksum … for the index item.
- Most likely a 32 bit checksum.
- Some form of checksum … for the blob item.
- Most likely a 64 bit checksum … like the `hash` in `layout.json` files.
- Each index item also contains … some additional information
- … where the purpose cannot be deduced from delta content paintings.
- The usage of checksums for the index items as well as the blob items is a very good decision.
- This makes storing files in the `RollingCache.ccc` more robust than storing them in the normal filesystem.
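The column observations summarized above can also be expressed without a painting. The following minimal sketch reports which byte positions inside a 72 byte index item differ in at least one item between two snapshots. It assumes that the caller has already cut the index area out of both files (starting at “Offset: 80”), and `changed_positions` is a hypothetical helper name.

```python
def changed_positions(old: bytes, new: bytes, item_size: int = 72) -> list[int]:
    """Byte positions within an index item that differ in at least one item."""
    if len(old) != len(new):
        raise ValueError("snapshots must have the same length")
    usable = len(old) - len(old) % item_size  # ignore a trailing partial item
    changed = set()
    for i in range(usable):
        if old[i] != new[i]:
            changed.add(i % item_size)
    return sorted(changed)


# Synthetic example: three items, with changes at positions 8 and 9 only.
old = bytes(72 * 3)
new = bytearray(old)
new[8] = 1         # item 0, position 8
new[72 + 9] = 2    # item 1, position 9
new[144 + 8] = 3   # item 2, position 8
print(changed_positions(old, bytes(new)))
```

Positions that never show up in the result across many snapshot pairs correspond to the green and white columns of the paintings.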
The above will become more important once I try to outline (and figure out) what a Brave New World caching concept should introduce to the existing system.





