RollingCache.ccc performance debugging and tuning … How?

2025.01.08 - v1.2.8.0 - Rolling Cache read-to-write ratio

What is a clear sign of a good cache design? I already hinted at this a few times before:

  • A high cache “hit to miss” ratio … or
    • … a high read access, with a very low write access.

When I gave my opinion about the CDN caches in FS2024, I claimed:

  • A good CDN cache can reach a hit ratio of more than … 1 Million (hits to 1 miss).
  • FS2024 seems to reach CDN hit ratios of around … 30 (hits to 1 miss).

In this analysis I want to present information that helps to answer this question:

  • How good is the RollingCache.ccc file in reducing network access?
    • This basically comes down to the “read vs write” ratio in real usage.

The following table contains the relevant metrics I collected with the Process Monitor event recordings.

Beware … As I noticed and outlined in a previous analysis, those numbers are not “byte perfect”, as the Process Monitor event recordings are not able to capture 100% of all events. But the numbers are still “pretty close to reality”.

The meaning of the table columns is:

  • Test
    • The identifier which I used in the tests that have been published in previous posts.
    • The identifier prefix …
      • z … includes the initial RC zero-fill write activities.
      • i … is the initial sim start after the fresh installation of the Aviator Edition.
      • s … tests which only performed a simple sim start … double-click to main menu.
      • c … simple aircraft configuration and UI test … no flights.
      • d … short drone camera (free) flights.
      • f … “short” free flight.
      • t … dedicated TIN landscape test flights … with the drone or an aircraft.
      • l … “long” free flights with a duration of multiple hours.
  • R:W ratio
    • The “read MB” divided by “write MB”.
  • RC MB read
    • The MB which have been read from the RollingCache.ccc.
  • RC MB write
    • The MB which have been written to the RollingCache.ccc.
  • IP MB read
    • The MB which have been read from the servers over the TCP-IPv4/6 network connections.
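
| Test | R:W ratio | RC MB read | RC MB write | IP MB read |
|------|-----------|------------|-------------|------------|
| z 1 | n.a. | 488 | 9,323 | 2,338 |
| i 1 | 0.7 | 488 | 666 | 2,338 |
| s A | 258.4 | 1,292 | 5 | 47 |
| s I | 0.7 | 414 | 571 | 650 |
| s J | 0.5 | 385 | 704 | 1,112 |
| c 2 | 0.9 | 2,793 | 2,924 | 3,353 |
| c 3 | 3.1 | 4,291 | 1,391 | 1,602 |
| c 4 | 1.6 | 2,911 | 1,853 | 2,193 |
| c 5 | 1.9 | 4,920 | 2,460 | 3,340 |
| c 6 | 1.5 | 5,029 | 3,318 | 4,085 |
| c 7 | 2.3 | 5,307 | 2,320 | 2,846 |
| c 8 | 0.9 | 6,113 | 6,553 | 7,685 |
| d 9 | 1.3 | 7,888 | 6,242 | 6,750 |
| d C | 1.5 | 9,323 | 6,170 | 6,728 |
| d D | 1.3 | 12,331 | 9,673 | 12,430 |
| d E | 1.0 | 9,667 | 9,689 | 12,042 |
| d F | 1.0 | 10,700 | 10,931 | 12,676 |
| d G | 1.6 | 19,924 | 12,069 | 12,766 |
| f B | 3.4 | 21,738 | 6,329 | 5,086 |
| f N | 2.0 | 15,254 | 7,472 | 12,893 |
| t H | 1.0 | 31,693 | 30,718 | 31,015 |
| t L | 1.0 | 20,631 | 20,101 | 24,471 |
| t M | 7.8 | 17,382 | 2,213 | 3,558 |
| t O | 1.6 | 10,386 | 6,526 | 6,774 |
| l K | 1.0 | 20,631 | 20,101 | 24,471 |
| l P | 0.9 | 21,106 | 24,308 | 25,643 |
| l Q | 0.4 | 29,754 | 68,357 | 38,068 |
| l R | 0.6 | 13,379 | 23,124 | 13,314 |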
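
As a quick cross-check, the “R:W ratio” column can be recomputed from the two RC columns. This is only a minimal sketch: the function name `rw_ratio` is my own, and since the MB values in the table are already rounded, a recomputed ratio can differ from the table in the last digit.

```python
from typing import Optional

def rw_ratio(read_mb: float, write_mb: float) -> Optional[float]:
    """Return "read MB" divided by "write MB", or None when nothing was written."""
    if write_mb == 0:
        return None
    return read_mb / write_mb

# A few (RC MB read, RC MB write) pairs copied from the table above.
samples = {
    "s A": (1292, 5),
    "i 1": (488, 666),
    "l Q": (29754, 68357),
}

for test, (read_mb, write_mb) in samples.items():
    print(f"{test}: {rw_ratio(read_mb, write_mb):.2f}")
```

A ratio far above 1 means the RC content was read many times per written byte; a ratio below 1 means more data went into the RC than ever came back out during that test.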

What do these numbers say? I want to be careful here, because …

  • As mentioned above … the numbers are “wrong” (always too low)
    • … due to the way Process Monitor collects this data.
      • Perhaps the single-threaded recording process at some point cannot keep up with the multithreaded actions of the system.
      • Sadly the documentation is not fully clear about this … but I definitely see that events are missing in the Test 1 recording.
  • The “R:W ratio” is not the same metric as a cache “hit:miss ratio”!
    • hit:miss … is usually defined on a per “item request” and not per “byte read” level.
    • Additionally the blob items in FS2024 are of (greatly) different size.
      • They clearly are not based on a fixed block size.
  • Furthermore my tests are obviously not representative of the “average FS2024 usage”.
    • Each test tried to trigger a certain “response” and most have not been normal flights.
    • But these test data are all I have.
  • Also, the RC is not the only cache in the local FS2024 (process).
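
To illustrate why a per-item “hit:miss” ratio and a per-byte “R:W ratio” can tell very different stories when item sizes vary, here is a small sketch. The trace, the item names and the sizes are invented, and the model assumes that every request is read from the cache file after a miss has written it (the “read after write” pattern). It is not a model of the real RC.

```python
# Hypothetical request trace: (item name, size in MB). All values are invented.
trace = [
    ("index.json", 1),
    ("terrain_tile", 200),
    ("index.json", 1),
    ("index.json", 1),
    ("index.json", 1),
]

def cache_stats(trace):
    """Count item-level hits/misses and byte-level reads/writes for one trace."""
    cached = set()
    hits = misses = 0
    read_mb = write_mb = 0
    for name, size_mb in trace:
        if name in cached:
            hits += 1
        else:
            misses += 1
            cached.add(name)
            write_mb += size_mb  # a miss is downloaded and written once
        read_mb += size_mb       # assumption: every request is read from the cache
    return hits, misses, read_mb, write_mb

hits, misses, read_mb, write_mb = cache_stats(trace)
print(f"hit:miss = {hits}:{misses}")
print(f"R:W      = {read_mb / write_mb:.2f}")
```

In this toy trace the small index file is hit three times, so the item-level ratio looks decent, while the single large tile dominates the byte counts and keeps the “R:W ratio” close to 1.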

Beware … the following does contain some highly speculative “claims” which I marked with … “?? Idea”

After all these notes of caution, I want to give my comments on the numbers based on each of the test categories, as they should show (somewhat) comparable results.

One common RC file

  • Test 1 to P
    • All are using the 16 GB RC file that was initially created during Test 1.
  • Test 1 to G
    • All share the “blue” text (scenery index?) area at the beginning of the blob item section,
      • … which was created (cached) in Test 1.
  • Test 8 to P
    • 16 GB RC is now 100% full.
      • Compressing the file with gzip no longer gives a significant size reduction
      • … and there are no “zero” blocks visible at the end of the file anymore.
    • Fragmentation is starting, as white “zero” lines show up all over the blob item section.

Category z and i … zero-fill and initial sim start

  • Test 1
    • … shows a “R:W ratio” of 0.7 (if the “zero-fill” effect is removed).
    • This was the only test related to the installation process of FS2024.
    • “RC MB read” is very low … because the cache was just created from zero.
    • “RC MB write” is very large … mainly because of the initial “zero-fill”.
      • Only 8,657 initial “zero-fill” events, each with a 1 MB write, have been recorded.
        • That leaves around 666 MB of “RC MB write” for the non-zero-fill.
        • This number seems plausible, as it matches very closely the actual new bytes inside the RC file.
    • The 488 MB “RC MB read” and the actual 666 MB of “RC MB write” are very close.
      • This again suggests a “read after write” usage pattern.
        • A network download first copies (writes) all received data to the RC.
        • Once all data is in the RC the sim will read it for further processing, e.g.
          • building the virtual file system (VFS) tree,
          • reading JSON index files to decide which other files to download, …
    • “IP MB read” is far bigger than the final content in the RC
      • … which contains ca. 664 MB of blob items plus 0.9 MB index items.
      • ?? Idea 1
        • Around 2.3 GB of data have been written to the filesystem … and not the RC.
          • So that would give a good (perfect!) match for the size of “IP MB read”
        • But where did the 664 MB of RC come from then?
        • ?? Idea 2
          • The initial process downloads a lot of “blue” index (JSON, XML) text assets.
            • This sort of data allows very high compression over HTTPS
            • … but the data in the RC and the filesystem seems uncompressed (JSON, XML, etc.)
      • ?? Idea 3
        • However, in combination the above suggests that network downloads are first cached in memory
          • … and data for the regular filesystem is not buffered in the RC first.
          • Otherwise there are 2.3 GB of filesystem write
            • … that could not be accounted for with matching writes inside the RC events.
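
The zero-fill correction for Test 1 is simple arithmetic on the numbers quoted above. A short worked sketch (the variable names are mine):

```python
total_write_mb = 9_323      # "RC MB write" of Test 1, including zero-fill
zero_fill_events = 8_657    # recorded zero-fill write events
mb_per_event = 1            # each zero-fill event wrote 1 MB
rc_read_mb = 488            # "RC MB read" of Test 1

zero_fill_mb = zero_fill_events * mb_per_event
payload_write_mb = total_write_mb - zero_fill_mb   # writes that carry real content
ratio = rc_read_mb / payload_write_mb

print(payload_write_mb)   # 666
print(round(ratio, 1))    # 0.7
```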

This is the content classification of Test 1 … with the fresh (empty) RC and the “iconic” blue area, that I already showed in other posts:

Category s … a simple sim start

  • Test A
    • … shows an extremely high R:W ratio of around 258.
    • This is a “perfect” start where all relevant data was already in the RC.
    • The visible downloads are related to the (unnecessary) rewrites of index files outside the RC.
  • Test I and J
    • … show low R:W ratios of 0.7 and 0.5.
    • The previous Test H wiped out all of the important (1.3 GB) “sim start essential” data.
    • Due to slow servers the necessary files could not be all downloaded during Test I
      • … and the missing downloads then also ruined the “R:W ratio” of the next Test J.
    • To be clear … the sim launches without that data into the main menu
      • … but in the background it still does (try to) download the “sim start essential” data.
    • Those very low “R:W ratio” numbers are caused by the Least Recently Used (LRU) cache replacement policy.
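
The effect can be sketched with a toy LRU cache that has a byte budget. This is not the actual RC implementation: the class `ByteLRU`, the item names and the tile sizes are assumptions for illustration. Only the 16 GB size and the 1.3 GB of “sim start essential” data are taken from the tests.

```python
from collections import OrderedDict

class ByteLRU:
    """Toy LRU cache with a size budget in MB (illustration only)."""

    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.used_mb = 0
        self.items = OrderedDict()  # name -> size in MB, least recently used first

    def access(self, name, size_mb):
        """Return True on a hit. On a miss, evict old items and insert the new one."""
        if name in self.items:
            self.items.move_to_end(name)  # a hit makes the item "recently used"
            return True
        if size_mb > self.capacity_mb:
            raise ValueError("item is larger than the whole cache")
        while self.used_mb + size_mb > self.capacity_mb:
            _, evicted_mb = self.items.popitem(last=False)  # drop the oldest item
            self.used_mb -= evicted_mb
        self.items[name] = size_mb
        self.used_mb += size_mb
        return False

cache = ByteLRU(capacity_mb=16_000)
cache.access("sim_start_essentials", 1_300)
for i in range(16):                         # one long flight over new landscape
    cache.access(f"tin_tile_{i}", 1_000)    # invented tile names and sizes
print("sim_start_essentials" in cache.items)  # False
```

The start data is used on every launch, but during a multi-hour session it is never touched again. For a plain LRU policy that makes it the oldest entry, so it is the first to go once the landscape stream fills the file.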

Category c … aircraft configuration

  • Test 2 to 8
    • Those tests were designed to try to find out if the cache uses a trivial FIFO cache replacement policy.
      • No … it does not, as it turned out.
    • All tests tried to trigger and allow high (aircraft model) content reuse.
    • So “R:W ratio” numbers above 1 seem easy to explain.
      • I did intentionally look at 1 or 2 aircraft from previous tests
        • … and revisited one (of those) aircraft more than once during each test.
    • The higher IP download numbers in all (c) tests seem related to the sound files, which get stored outside the RC.
  • Test 2
    • … only shows a low “R:W ratio” of 0.9.
    • Being the first test, I only revisited one aircraft
    • ?? Idea 4
      • The poor ratio could originate from the fact that some aircraft features are downloaded and cached, which are not needed (read) during the aircraft config menu.
        • Those features might be other LOD levels etc, which would be needed after clicking “Start Flight”.
  • Tests 3 to 7
    • … show a high “R:W ratio” of 1.5 to 3.1.
    • In those tests I revisited 3 or 4 aircraft … and that is reflected in R:W ratios.
  • Test 8
    • … again only shows a low “R:W ratio” of around 0.9.
    • ?? Idea 5
      • In this test I looked at twice as many aircraft as in previous tests.
        • This is reflected in the IP download numbers.
      • Additionally I only revisited one aircraft.
      • So the “R:W ratio” seems plausible and I would explain it like in Test 2 and Idea 4.
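
The idea behind the revisit tests can be shown with a minimal sketch: replay the same access trace against a FIFO and an LRU policy and compare what survives. The function `surviving_items`, the trace and the item-count capacity are hypothetical simplifications; the real RC works on blob items of very different sizes.

```python
from collections import OrderedDict

def surviving_items(trace, capacity, policy):
    """Replay a trace of item names and return the names left in the cache.

    policy "fifo" ignores re-reads; policy "lru" refreshes an item on every hit.
    """
    cache = OrderedDict()
    for name in trace:
        if name in cache:
            if policy == "lru":
                cache.move_to_end(name)
            continue
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the entry at the front
        cache[name] = True
    return list(cache)

# Look at aircraft A, B, C, revisit A, then load a new aircraft D.
trace = ["A", "B", "C", "A", "D"]
print(surviving_items(trace, 3, "fifo"))  # ['B', 'C', 'D']
print(surviving_items(trace, 3, "lru"))   # ['C', 'A', 'D']
```

With FIFO the revisited aircraft A is evicted first despite the revisit, while with LRU it survives. That difference is what revisiting aircraft from earlier tests can expose.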

Category f … “short” free flight.

  • Test B
    • … shows a high “R:W ratio” of 3.4 … which is plausible.
    • Flying around the small island of Maui for around 1 hour
      • … with the drone and a previously used (cached) aircraft.
      • Many places have been visited more than once.
  • Test N
    • … shows a “R:W ratio” of 2.0 … which is plausible.
    • This 15 minute flight at KIKK was mainly a check of static ground aircraft
      • … using the drone camera a lot while inspecting the same aircraft over and over again.
    • I have no idea how to explain the high IP download
      • … as there have not been any significant writes outside the RC.

Category t … TIN landscape test flights

  • Test H and L
    • … show a “R:W ratio” of 1.0 to 1.6 … and that is plausible.
    • The tests took place in a region that was not yet cached
      • … and tried to download a large amount of TIN landscape without revisiting the same place again.
  • Test M
    • … shows a “R:W ratio” of 7.8 … and that is plausible too.
    • Here I revisited LA after a previous test in the same region
      • … and then I checked the same place over and over again
      • … hoping to trigger some “TIN healing” (which did not happen).
  • Test O
    • … shows a “R:W ratio” of 1.6 … and that is also plausible.
    • I did revisit the KLAX airport and the LA landscape.
    • Some data (ground vehicles? static aircraft?) have been already in the cache.

Category l … “long” multi-hour free flights

  • Test K

    • … shows a “R:W ratio” of 1.0.
    • This was a 3.5 hour flight, at 70 ktas and 3,000 ft … from KORD (US) to KIKK (US).
    • Sometimes the servers seemed stressed
      • … e.g. downloads of static ground aircraft took veeeeery long.
    • A lot of data originated in the TIN landscape of Chicago.
  • Test P

    • … shows a “R:W ratio” of 0.9.
    • This was a 4 hour flight, at 70 ktas and different altitudes … from KIKK (US) to KORD (US).
    • I explained all the details of this flight in the posting:
  • Test Q
    • … shows a “R:W ratio” of 0.4.
    • This was an 11.5 hour flight, at Mach 0.98 and 20,000 ft … from EDDM (DE) to VVTS (VN).
    • This test used live weather and live air traffic.
    • Sometimes the servers seemed “gone”
      • … e.g. ground textures were basically missing for hours.
    • “IP MB read” matches “RC MB read” … but “RC MB write” is almost twice as high.
    • ?? Idea 6
      • The very poor “R:W ratio” might be related to a lot of cache reorg and “zero-fill” block cleanup.
        • I do consider such “zero-fill” unnecessary for a “game”, and I will return to this topic in a future post.
  • Test R
    • … shows a “R:W ratio” of 0.6.
    • This was a 7 hour flight, at 150 ktas and 15,000 ft … from FACT (ZA) to FNBL (AO).
    • This test used live weather and live air traffic.
    • Sometimes the servers seemed “busy”
      • … e.g. ground textures with too much blur (“wrong” LOD).
    • “IP MB read” matches “RC MB read” … but “RC MB write” is almost twice as high.
      • I would again explain this with Idea 6.
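
To put a number on “almost twice as high”, the RC writes can be divided by the network downloads for both tests. The values come from the table rows for Test Q and Test R; the helper name `write_per_download` is my own.

```python
# (RC MB write, IP MB read) copied from the table rows "l Q" and "l R".
long_flights = {"Q": (68_357, 38_068), "R": (23_124, 13_314)}

def write_per_download(rc_write_mb, ip_read_mb):
    """MB written to the RC per MB received over the network."""
    return rc_write_mb / ip_read_mb

for test, (rc_write_mb, ip_read_mb) in long_flights.items():
    print(f"Test {test}: {write_per_download(rc_write_mb, ip_read_mb):.2f}")
```

Both tests land at roughly 1.7 to 1.8 MB written per downloaded MB. If Idea 6 is right, the extra writes above 1.0 would be the reorg and “zero-fill” cleanup, but the recordings alone cannot prove that.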

Just to leave one example of the Process Monitor recordings. This is for Test R and it shows the network and file access results:

Since this was (again) such a text heavy post, I want to leave at least one pretty picture from Test R and the biome around FNBL in Angola.

To me the picture above is both:

  • The magic of FS2024 and the reason why I will never go back to FS2020.
  • The challenge for a streaming-caching system design, that is not up to the task yet.
    • Close to the ground the landscape data volume goes up dramatically
      • … and today FS2024 usually cannot keep up (reliably).

To summarize the findings from above:

  • It is highly likely that network downloads are initially cached in memory … and not inside the RC.
    • Only data intended for the RC is then written to the RC.
    • Data intended for the regular filesystem is not buffered in the RC first.
  • The observed Read:Write ratio numbers of the RC fit the different test characteristics
    • … and the Least Recently Used (LRU) cache replacement policy.
  • The RC remembers too much “useless” data and forgets too much “important” data.
    • Landscape data is usually (naturally) rarely reused
      • … which results in “R:W ratio” numbers of 1.0 or below.
    • Other categories of data (airports, aircraft, etc.) … can result in higher “R:W ratio” numbers
      • … but an RC with a small size and an LRU policy will struggle to keep them in the RC.
    • Especially the FS2024 speciality, flying low and enjoying the new ground details and biomes,
      • … will result in landscape wiping out important data that should stay in the RC.
  • Overall, the real-use “R:W ratio” numbers for the RC are barely above 1.0.
    • The RC mostly acts like a buffer (queue) and only rarely like a real cache.
  • The current RC design does not seem to take away significant load from the API, CDN or Origin servers.