RollingCache.ccc performance debugging and tuning … How?

nenenui · January 8, 2025, 11:51pm

2025.01.08 - v1.2.8.0 - Rolling Cache read-to-write ratio

What is a clear sign of a good cache design? I already hinted at this a few times before:

A high cache “hit to miss” ratio … or
- … a high read access, with a very low write access.

When I gave my opinion about the CDN caches in FS2024, I claimed:

A good CDN cache can reach a hit ratio of more than … 1 Million (hits to 1 miss).
FS2024 seems to reach CDN hit ratios of around … 30 (hits to 1 miss).

In this analysis I want to present information that helps to better answer this question:

How good is the RollingCache.ccc file in reducing network access?
- This basically comes down to the “read vs write” ratio in real usage.

The following table contains the relevant metrics I collected with the Process Monitor event recordings.

Beware … As I noticed and outlined in a previous analysis, those numbers are not “byte perfect”, as the Process Monitor event recordings are not able to capture 100% of all events. But the numbers are still “pretty close to reality”.

The meaning of the table columns is:

Test
- The identifier which I used in the tests that have been published in previous posts.
- The identifier prefix …
  - z … includes the initial RC zero-fill write activities.
  - i … is the initial sim start after the fresh installation of the Aviator Edition.
  - s … tests which only performed a simple sim start … double-click to main menu.
  - c … simple aircraft configuration and UI test … no flights.
  - d … short drone camera (free) flights.
  - f … “short” free flight.
  - t … dedicated TIN landscape test flights … with the drone or an aircraft.
  - l … “long” free flights with a duration of multiple hours.
R:W ratio
- The “read MB” divided by “write MB”.
RC MB read
- The MB which have been read from the RollingCache.ccc.
RC MB write
- The MB which have been written to the RollingCache.ccc.
IP MB read
- The MB which have been read from the servers over the TCP-IPv4/6 network connections.

Test	R:W ratio	RC MB read	RC MB write	IP MB read
z 1	n.a.	488	9,323	2,338
i 1	0.7	488	666	2,338

s A	258.4	1,292	5	47
s I	0.7	414	571	650
s J	0.5	385	704	1,112

c 2	0.9	2,793	2,924	3,353
c 3	3.1	4,291	1,391	1,602
c 4	1.6	2,911	1,853	2,193
c 5	1.9	4,920	2,460	3,340
c 6	1.5	5,029	3,318	4,085
c 7	2.3	5,307	2,320	2,846
c 8	0.9	6,113	6,553	7,685

d 9	1.3	7,888	6,242	6,750
d C	1.5	9,323	6,170	6,728
d D	1.3	12,331	9,673	12,430
d E	1.0	9,667	9,689	12,042
d F	1.0	10,700	10,931	12,676
d G	1.6	19,924	12,069	12,766

f B	3.4	21,738	6,329	5,086
f N	2.0	15,254	7,472	12,893

t H	1.0	31,693	30,718	31,015
t L	1.0	20,631	20,101	24,471
t M	7.8	17,382	2,213	3,558
t O	1.6	10,386	6,526	6,774

l K	1.0	20,631	20,101	24,471
l P	0.9	21,106	24,308	25,643
l Q	0.4	29,754	68,357	38,068
l R	0.6	13,379	23,124	13,314
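As a small worked example of the “R:W ratio” column: the sketch below recomputes it for three rows of the table. The helper name `rw_ratio` is hypothetical, and the inputs are the rounded MB values from the table, so results can differ slightly from ratios computed on raw byte counts.

```python
def rw_ratio(read_mb, write_mb):
    """Return "RC MB read" divided by "RC MB write", or None if nothing was written."""
    if write_mb == 0:
        return None
    return read_mb / write_mb

# (RC MB read, RC MB write) copied from the table above.
rows = {
    "s A": (1292, 5),
    "f B": (21738, 6329),
    "l Q": (29754, 68357),
}

ratios = {test: rw_ratio(read, write) for test, (read, write) in rows.items()}
for test, ratio in ratios.items():
    print(f"{test}: {ratio:.1f}")
```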

What do these numbers say? I want to be careful here, because …

As mentioned above … the numbers are “wrong” (always too low)
- … due to the way Process Monitor collects this data.
  - Perhaps the single threaded recording process at some point can not keep up with the multithreaded actions of the system.
  - Sadly the documentation is not fully clear about this … but I definitely see that events are missing in the Test 1 recording.
The “R:W ratio” is not the same metric as a cache “hit:miss ratio”!
- hit:miss … is usually defined on a per “item request” and not per “byte read” level.
- Additionally the blob items in FS2024 are of (greatly) different size.
  - They clearly are not based on a fixed block size.
Furthermore my tests are obviously not representative of the “average FS2024 usage”.
- Each test tried to trigger a certain “response” and most have not been normal flights.
- But these test data are all I have.
Also, the RC is not the only cache in the local FS2024 (process).
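To illustrate why a per-request “hit:miss ratio” and a per-byte “R:W ratio” can diverge, here is a toy replay. Everything in it is invented for illustration: the item names, the sizes, the request trace, and the assumption that every miss is written to the cache once and every request is then read from the cache. It is not a model of the real RC.

```python
def replay(trace, sizes):
    """Count per-request hits/misses and per-MB reads/writes for a toy cache."""
    cache = set()
    hits = misses = read_mb = written_mb = 0
    for item in trace:
        if item in cache:
            hits += 1
        else:
            misses += 1
            cache.add(item)
            written_mb += sizes[item]  # assumed: a download is written to the cache once
        read_mb += sizes[item]         # assumed: every request is served by a cache read
    return hits, misses, read_mb, written_mb

# Invented item sizes in MB: one tiny index file and two large blobs.
sizes = {"index.json": 1, "aircraft.bin": 200, "tile.bin": 50}
trace = ["index.json"] * 9 + ["aircraft.bin", "tile.bin"]

hits, misses, read_b, written_b = replay(trace, sizes)
print(f"hit:miss = {hits}:{misses}")
print(f"R:W      = {read_b / written_b:.2f}")
```

The small item is requested often and dominates the hit count, while the large items dominate the byte counts, so the two metrics tell different stories for the same trace.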

Beware … the following does contain some highly speculative “claims” which I marked with … “?? Idea”

After all these notes of caution, I want to give my comments on the numbers based on each of the test categories, as they should show (somewhat) comparable results.

One common RC file

Test 1 to P
- All are using the 16 GB RC file that was initially created during Test 1.
Test 1 to G
- All share the “blue” text (scenery index?) area at the beginning of the blob item section,
  - … which was created (cached) in Test 1.
Test 8 to P
- 16 GB RC is now 100% full.
  - A gzip compression has no significant benefit
  - … and there are no “zero” blocks visible at the end of the file anymore.
- Fragmentation is starting, as white “zero” lines show up all over the blob item section.
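The “is the file full?” check mentioned above (compressibility and remaining “zero” blocks) could be approximated with a sketch like the following. The function name `scan`, the 1 MB scan unit, and the demo file are assumptions for illustration. It uses `zlib` (the DEFLATE algorithm that gzip is built on) rather than the gzip tool itself.

```python
import os
import tempfile
import zlib

def scan(path, block=1024 * 1024):
    """Return (total_blocks, all_zero_blocks, compressed_size / raw_size)."""
    total = zero = raw = packed = 0
    comp = zlib.compressobj(6)
    with open(path, "rb") as f:
        while chunk := f.read(block):
            total += 1
            raw += len(chunk)
            if chunk.count(0) == len(chunk):  # block contains only zero bytes
                zero += 1
            packed += len(comp.compress(chunk))
        packed += len(comp.flush())
    return total, zero, (packed / raw if raw else 0.0)

# Demo on a small synthetic file: 3 random blocks followed by 1 zero block.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.bin")
    with open(demo, "wb") as f:
        f.write(os.urandom(3 * 4096) + bytes(4096))
    total, zero, ratio = scan(demo, block=4096)
    print(total, zero, round(ratio, 2))
```

A ratio close to 1.0 together with no zero blocks at the end would match the “100% full” observation. Scattered zero blocks inside the file would match the fragmentation remark.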

Category z and i … zero-fill and initial sim start

Test 1
- … shows a “R:W ratio” of 0.7 (if the “zero-fill” effect is removed).
- This was the only test related to the installation process of FS2024.
- “RC MB read” is very low … because the cache was just created from zero.
- “RC MB write” is very large … mainly because of the initial “zero-fill”.
  - Only 8,657 initial “zero-fill” events, each with a 1 MB write, have been recorded.
    - That leaves around 666 MB of “RC MB write” for the non-zero-fill.
    - This number seems plausible, as it very closely matches the actual new bytes inside the RC file.
- The 488 MB “RC MB read” and the actual 666 MB of “RC MB write” are very close.
  - This again suggests a Read after Write usage pattern.
    - A network download first copies (writes) all received data to the RC.
    - Once all data is in the RC the sim will read it for further processing, e.g.
      - building the virtual file system (VFS) tree,
      - reading JSON index files to decide which other files to download, …
- “IP MB read” is far bigger than the final content in the RC
  - … which contains ca. 664 MB of blob items plus 0.9 MB index items.
  - ?? Idea 1
    - Around 2.3 GB of data have been written to the filesystem … and not the RC.
      - So that would give a good (perfect!) match for the size of “IP MB read”
    - But where did the 664 MB of RC come from then?
    - ?? Idea 2
      - The initial process downloads a lot of “blue” index (JSON, XML) text assets.
        
        This sort of data allows very high compression over HTTPS
        
        … but the data in the RC and the filesystem seems uncompressed (JSON, XML, etc.)
  - ?? Idea 3
    - However, in combination the above suggests that network downloads are first cached in memory
      - … and data for the regular filesystem is not buffered in the RC first.
      - Otherwise there are 2.3 GB of filesystem write
        
        … that could not be accounted for with matching writes inside the RC events.
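The Test 1 arithmetic from the list above, written out as a short sketch. The variable names are hypothetical. The numbers are the recorded values quoted in this post, with each zero-fill event counted as a 1 MB write.

```python
rc_write_total_mb = 9323   # "RC MB write" of row "z 1"
zero_fill_events = 8657    # recorded zero-fill events
zero_fill_event_mb = 1     # each event wrote 1 MB
rc_read_mb = 488           # "RC MB read" of Test 1

# Writes that are left once the initial zero-fill is removed.
non_zero_fill_write_mb = rc_write_total_mb - zero_fill_events * zero_fill_event_mb

# R:W ratio of row "i 1", based on the non-zero-fill writes only.
rw = rc_read_mb / non_zero_fill_write_mb

print(non_zero_fill_write_mb, round(rw, 1))
```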

This is the content classification of Test 1 … with the fresh (empty) RC and the “iconic” blue area, that I already showed in other posts:

Category s … a simple sim start

Test A
- … shows an extremely high R:W ratio of around 258.
- This is a “perfect” start where all relevant data was already in the RC.
- The visible downloads are related to the (unnecessary) rewrites of index files outside the RC.
Test I and J
- … show a low R:W ratio of 0.7 and 0.5.
- The previous Test H wiped out all of the important “sim start essential” data (1.3 GB).
- Due to slow servers the necessary files could not be all downloaded during Test I
  - … and the missing downloads then also ruined the “R:W ratio” of the next Test J.
- To be clear … the sim launches without that data into the main menu
  - … but in the background it still does (try to) download the “sim start essential” data.
- Those very low “R:W ratio” numbers are caused by the Least Recently Used (LRU) cache replacement policy.
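The effect described for Test A, H and I can be reproduced with a tiny LRU simulation. This is only a sketch under strong assumptions: all items have the same size, the capacity of 100 items and the key names are invented, and the real RC may implement its replacement policy differently in detail.

```python
from collections import OrderedDict

class LruCache:
    """Minimal LRU cache that only tracks keys and hit/miss counters."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)         # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1                    # would be a download plus an RC write
            self.items[key] = True
            if len(self.items) > self.capacity:
                self.items.popitem(last=False)  # evict the least recently used key

cache = LruCache(capacity=100)
essentials = [f"start-{i}" for i in range(20)]

for key in essentials:      # first sim start: everything is downloaded
    cache.get(key)
for key in essentials:      # second start: all hits (similar to Test A)
    cache.get(key)
for i in range(500):        # landscape flight without revisits (similar to Test H)
    cache.get(f"tile-{i}")

misses_before = cache.misses
for key in essentials:      # next start: essentials were evicted (similar to Test I)
    cache.get(key)
restart_misses = cache.misses - misses_before
print(restart_misses)
```

In this toy setup the one-off landscape tiles push every “sim start essential” key out of the cache, so the next start has to fetch all of them again.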

Category c … aircraft configuration

Test 2 to 8
- Those tests were designed to find out if the cache uses a trivial FIFO cache replacement policy.
  - No … it does not, as it turned out.
- All tests tried to trigger and allow high (aircraft model) content reuse.
- So “R:W ratio” numbers above 1 seem easy to explain.
  - I intentionally looked at 1 or 2 aircraft from previous tests
    - … and revisited one (of those) aircraft more than once during each test.
- The higher IP download numbers in all (c) tests seem related to the sound files, which get stored outside the RC.
Test 2
- … only shows a low “R:W ratio” of 0.9.
- Being the first test, I only revisited one aircraft
- ?? Idea 4
  - The poor ratio could originate from the fact that some aircraft features are downloaded and cached, which are not needed (read) during the aircraft config menu.
    - Those features might be other LOD levels etc, which would be needed after clicking “Start Flight”.
Tests 3 to 7
- … show a high “R:W ratio” of 1.5 to 3.1.
- In those tests I revisited 3 or 4 aircraft … and that is reflected in the R:W ratios.
Test 8
- … again only shows a low “R:W ratio” of around 0.9.
- ?? Idea 5
  - In this test I looked at twice as many aircraft as in previous tests.
    - This is reflected in the IP download numbers.
  - Additionally I only revisited one aircraft.
  - So the “R:W ratio” seems plausible and I would explain it as in Test 2 and Idea 4.
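The idea behind the FIFO check in this category, revisiting an old item before the cache overflows, can be sketched as follows. The function `run`, the capacity of 3 and the keys A to D are invented for illustration. This shows the general technique, not the actual test procedure.

```python
from collections import OrderedDict

def run(trace, capacity, policy):
    """Replay a trace and return the set of keys left in the cache."""
    cache = OrderedDict()
    for key in trace:
        if key in cache:
            if policy == "lru":
                cache.move_to_end(key)     # LRU refreshes an item on reuse
            # FIFO ignores reuse: the insertion order stays unchanged
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the oldest entry
    return set(cache)

# "A" is revisited just before "D" forces an eviction.
trace = ["A", "B", "C", "A", "D"]
print(sorted(run(trace, 3, "fifo")))
print(sorted(run(trace, 3, "lru")))
```

Under FIFO the revisited item "A" is evicted anyway, while under LRU it survives and "B" is dropped instead. Observing which item has to be downloaded again is what separates the two policies.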

Category f … “short” free flight.

Test B
- … shows a high “R:W ratio” of 3.4 … which is plausible.
- Flying around the small island of Maui for around 1 hour
  - … with the drone and a previously used (cached) aircraft.
  - Many places have been visited more than once.
Test N
- … shows a “R:W ratio” of 2.0 … which is plausible.
- This 15 minute flight at KIKK was mainly a check of static ground aircraft
  - … using the drone camera a lot while inspecting the same aircraft over and over again.
- I have no idea how to explain the high IP download
  - … as there have not been any significant writes outside the RC.

Category t … TIN landscape test flights

Test H and L
- … show a “R:W ratio” of 1.0 to 1.6 … and that is plausible.
- The tests took place in a region that was not yet cached
  - … and tried to download a large amount of TIN landscape without revisiting the same place again.
Test M
- … shows a “R:W ratio” of 7.8 … and that is plausible too.
- Here I revisited LA after a previous test in the same region
  - … and then I checked the same place over and over again
  - … hoping to trigger some “TIN healing” (which did not happen).
Test O
- … shows a “R:W ratio” of 1.6 … and that is also plausible.
- I did revisit the KLAX airport and the LA landscape.
- Some data (ground vehicles? static aircraft?) were already in the cache.

Category l … “long” multi-hour free flights

Test K
- … shows a “R:W ratio” of 1.0.
- This was a 3.5 hour flight, at 70 ktas and 3,000 ft … from KORD (US) to KIKK (US).
- Sometimes the servers seemed stressed
  - … e.g. downloads of static ground aircraft took veeeeery long.
- A lot of data originated in the TIN landscape of Chicago.
Test P
- … shows a “R:W ratio” of 0.9.
- This was a 4 hour flight, at 70 ktas and different altitudes … from KIKK (US) to KORD (US).
- I explained all the details of this flight in the posting:

Test Q
- … shows a “R:W ratio” of 0.4.
- This was an 11.5 hour flight, at Mach 0.98 and 20,000 ft … from EDDM (DE) to VVTS (VN).
- This test used live weather and live air traffic.
- Sometimes the servers seemed “gone”
  - … e.g. ground textures were basically missing for hours.
- “IP MB read” matches “RC MB read” … but “RC MB write” is almost twice as high.
- ?? Idea 6
  - The very poor “R:W ratio” might be related to a lot of cache reorg and “zero-fill” block cleanup.
    - I do consider such “zero-fill” unnecessary for a “game”, and I will return to this topic in a future post.
Test R
- … shows a “R:W ratio” of 0.6.
- This was a 7 hour flight, at 150 ktas and 15,000 ft … from FACT (ZA) to FNBL (AO).
- This test used live weather and live air traffic.
- Sometimes the servers seemed “busy”
  - … e.g. ground textures with too much blur (“wrong” LOD).
- “IP MB read” matches “RC MB read” … but “RC MB write” is almost twice as high.
  - I would again explain this with Idea 6.
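To put a number on “almost twice as high” for Test Q and R, the sketch below divides “RC MB write” by “IP MB read”. The dictionary layout and names are hypothetical. The values are the rounded MB figures from the table, and the result says nothing about the cause (Idea 6 remains speculation).

```python
# Values copied from the rows "l Q" and "l R" of the table (MB).
tests = {
    "l Q": {"rc_read": 29754, "rc_write": 68357, "ip_read": 38068},
    "l R": {"rc_read": 13379, "rc_write": 23124, "ip_read": 13314},
}

# How many MB were written to the RC per MB received over the network.
factors = {name: m["rc_write"] / m["ip_read"] for name, m in tests.items()}

for name, factor in factors.items():
    print(f"{name}: RC write / IP read = {factor:.1f}")
```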

Just to leave one example of the Process Monitor recordings. This is for Test R and it shows the network and file access results:

Since this was (again) such a text heavy post, I want to leave at least one pretty picture from Test R and the biome around FNBL in Angola.

To me the picture above is both:

The magic of FS2024 and the reason why I will never go back to FS2020.
The challenge for a streaming-caching system design, that is not up to the task yet.
- Close to the ground the landscape data volume goes up dramatically
  - … and today FS2024 usually can not keep up (reliably).

To summarize the findings from above:

It is highly likely that network downloads are initially cached in memory … and not inside the RC.
- Only data intended for the RC is then written to the RC.
- Data intended for the regular filesystem is not buffered in the RC first.
The observed Read:Write ratio numbers of the RC fit the different test characteristics
- … and the Least Recently Used (LRU) cache replacement policy.
The RC remembers too much “useless” data and forgets too much “important” data.
- Landscape data is usually (naturally) rarely reused
  - … which results in “R:W ratio” numbers of 1.0 or below.
- Other categories of data (airports, aircraft, etc.) … can result in higher “R:W ratio” numbers
  - … but an RC with a small size and an LRU policy will struggle to keep them in the RC.
- Especially the FS2024 speciality, flying low and enjoying the new ground details and biomes,
  - … will result in landscape wiping out important data that should stay in the RC.
Overall real use “R:W ratio” numbers for the RC are barely above 1.0.
- The RC mostly acts like a buffer (queue) and only rarely like a real cache.
The current RC design does not seem to take away significant load from the API, CDN or Origin servers.