RollingCache.ccc performance debugging and tuning … How?

What you @BegottenPoet228 are describing seems to fit the “full index area dump” issue very well (which I described in detail in an older post).

The index write happens at different time patterns … and so it does not seem to be the cause of a continuous FPS stutter … but more of stutters with a sporadic nature (like you say).

1 Like

I just went back and reread this fascinating thread. Your comment triggered a thought (scary, I know…) When I built my new computer back in October I installed a debloated version of Win11 Pro. So I don’t have Copilot installed. And I think I eliminated some (but of course not all) telemetry. Perhaps that’s one reason I’ve been quite happy with the overall speed/smoothness of the sim.

I also encountered a very brief pause during the ongoing SU1 beta – it seemed to me that MSFS was loading the tiles. However, I might be wrong.

Where can I get one of those?

I’ve been doing some very basic testing, using CapFrameX to look at FPS and latency over time (5-minute flight). I don’t know how that tool measures ‘stuttering’ or even whether its measurement has any relevance to how the RC is functioning. In my testing I’ve used a simple aircraft (Bonanza G36 Improvement Project) that’s kept in the Community folder (actually the Addons Linker folder - only a link to it is in Community.)

7950X3D, 3090 Ti, 64GB (2x32GB) DDR5-6400/CL32
1 Gb fiber ISP, WiFi 6 router. Speedtest generally shows around 400 Mb/s.

Clear weather preset, Live and AI Traffic OFF in all tests.

I fly the same route over rural Illinois and NYC, changing one variable in my computer configuration at a time. The things I’ve been testing are:

  • vCore vs. non-vCore vs. all-core performance. I use Process Lasso to set flightsimulator2024.exe core affinity. Most OS processes are set to Idle Priority, FS2024 to High Priority. When testing vCore-only performance, FS2024.exe was the only thing assigned to those cores.
  • Hyperthreading On/Off
  • Windows Game Mode On/Off
  • HAGS On/Off
  • Power Plan (Balanced, High, Ultimate, BitSum High.)
  • ISLC enabled/disabled (replacing HPET.)
  • BitSum Core Park app (parking enabled/disabled.)

I’ve determined that I get the best FPS and lowest latency with the following settings: vCores-only, Hyperthreading ON, Game Mode OFF, HAGS OFF, Ultimate Power Plan, ISLC enabled / HPET disabled, Bitsum Core Parking set to prevent parking. 4K, mostly Ultra settings using DLSS 4 (Quality) and Preset J.
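For anyone who prefers scripting this instead of using Process Lasso, here is a minimal sketch of the same affinity/priority idea in Python with psutil. The assumption that the V-cache CCD maps to logical CPUs 0-15 is mine and varies per board/BIOS, so verify your own topology first (and run it elevated).

```python
# Minimal sketch (not Process Lasso): pin FS2024 to the assumed V-cache CCD.
# Assumption: on a 7950X3D with SMT on, logical CPUs 0-15 are the V-cache CCD.
import psutil

VCACHE_CPUS = list(range(16))  # assumed V-cache logical CPUs - verify on your system

for proc in psutil.process_iter(["name"]):
    if (proc.info["name"] or "").lower() == "flightsimulator2024.exe":
        proc.cpu_affinity(VCACHE_CPUS)          # restrict the sim to those cores
        proc.nice(psutil.HIGH_PRIORITY_CLASS)   # Windows-only constant, mirrors "High Priority"
        print(f"Pinned PID {proc.pid} to CPUs {VCACHE_CPUS}")
```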

All the above leads me to a question:

CapFrameX also provides a measure of what it calls ‘smoothness’ as well as ‘stuttering’ and frametime variance. The screenshot below shows those graphs.

The question is: What flight test should I run to best analyze those results as they relate to the Rolling Cache? In particular, the size of the RC (currently 100 GB). I can set CapFrameX to record for much longer than 5 minutes (theoretically unlimited) and I have plenty of drive space to store longer tests.

  • Should I fly a drone cam around a 737 Max on the ground at KJFK?
  • Fly the Bonanza over KJFK, KLAX, or a custom airport (I own several but haven’t been using them.)
  • Fly a more complex aircraft over Paris? Then repeat the flight?

I’d like to see how to keep stutters (and interframe times) as low as possible, of course. RC is the way, though I understand that “It is what it is” except for being able to adjust the size. I haven’t paid much attention to LOD levels, although your tests were quite illuminating. Your testing shows that Asobo has a mandate: Optimize the Rolling Cache. I want to do whatever I can do to optimize performance/quality on the Edge. Right now I think I have my system set up in the best possible configuration. But of course that’s subject to change.

Bonanza flight over NYC @ 1500 AGL showing ‘smoothness’ and ‘stuttering.’ Filename shows some of the relevant settings.

Flight over Illinois @ 2500 AGL with mostly the same settings. I was testing PBO Boost settings at the time.
Graph shows frametime variances, which I think is a very relevant metric.

I followed the procedure in this video. I didn’t follow it exactly, but he gives you a good framework for how to do it.

2 Likes

Mine was most definitely downloading from somewhere over ethernet as per task manager.

Yeah, I use 128 and I think that is OK.

1 Like

The drone cam is an excellent way to fill up the rolling cache. Spawn a Cessna on the runway.

Switch to the drone cam, pan 360° by holding the left/right key, and watch the downloading in Task Manager.

Press W to move the cam toward buildings. Gain height and look at the local city and scenery. Fly closer to cache more meshes.

Next session you will notice a significant reduction in streaming in Task Manager.

2 Likes

Rufus is another install tool that automates workarounds during installation.

I used it to upgrade my sim PC to Win11 yesterday. I get better frame rates and higher CPU usage with FS24 on Win11.

1 Like

Yes, but IIRC Rufus requires an IMG file (I could be wrong.) The tool I used lets you drop an XML onto a retail USB. Both methods will work.

Just went in specifically to look at my cache settings. Noticed it was set to 16G. Last time I looked it was 0. I changed it to 64G and they must have fixed it because the window came up but gave me a running percentage. It took about ten minutes(?) and that was it. Task Manager Ethernet monitor now shows drastically less traffic.

So was this broken until now?

Thanks for your work. Actually, what I don’t get is why Asobo can’t give us clear info about the optimal size. They should know exactly what cache structure they have, what search algorithm they use, and to what extent it performs with various file and cache sizes. Based on their feedback in various dev streams and the support tickets I raised, it seems they have no clue…

1 Like

2025.02.18 - Rolling Cache fragmentation, zero-fill and a 100% full cache

During my recap of the last dev stream I wrote that I wanted to revisit the topic of a “100% Full” cache. So here we go.

In that recap I made the following claims:

  • At some point every cache will reach its size limit and be “100% full”.
    • That is normal.
  • Actually one should expect that the RC is always at 100%
    • … and if not, then it is not a cache, but a waste of storage.

While I think that the above is obvious I still want to stress some related aspects:

  • The goal of a cache is to cache … and not to be empty
    • … but there is a tradeoff between the “cost of size” vs the “benefit of performance increase”.
  • A cache which on average reduces performance should be disabled.
  • To find data quickly one always needs an index
    • … and a lookup (read) in a smaller index is always (somewhat) faster than a lookup in a large index
      • … especially because an index also needs to be kept “up to date” (write)
        • … to match the actual content of the cache (blob items).
  • Bigger cache sizes always require a bigger cache index
    • … which means more overhead (see the back-of-envelope sketch after this list).
  • A cache tries to store a small part (e.g. GB) of the original data set (e.g. PB = 1 million GB)
    • … but the cache stores the data close to the point of usage
      • … so that the access is (a lot) faster.
  • Finding (defining) the proper size of a cache is more of an “art” … and somewhat less “science”
    • … as there are many uncertainties (tradeoffs).
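To put a rough number on that index overhead, here is a back-of-envelope sketch using figures that appear later in this post (512 MB index area, roughly 16 GB of blob area, ~30 KB average blob item). The bytes-per-entry figure is only my estimate; the real index layout is unknown.

```python
# Back-of-envelope index overhead (my estimate; the real index layout is unknown).
blob_area      = 16 * 1024**3   # ~16 GB blob area
avg_blob_size  = 30 * 1024      # ~30 KB average blob item (see the stats below)
index_area     = 512 * 1024**2  # 512 MB index area at the start of the file

blob_items     = blob_area // avg_blob_size
bytes_per_item = index_area / blob_items

print(f"~{blob_items:,} blob items -> ~{bytes_per_item:.0f} bytes of index area per item")
# A bigger cache holds more items, so every lookup and every index update has
# to walk and rewrite a proportionally larger index structure.
```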

The fact that any cache (storage) is limited in its size results in a totally obvious, but somewhat non-trivial, “problem”:

In order to store new data a cache always needs to delete old data.

For the implementation of a cache subsystem the resulting questions then are:

  • A) Which data to delete?
  • B) Which data is the least likely to be needed in the near future again?
  • C) Does the new data fit into the place(s) where the old data was?
  • D) How to delete the old data?
  • E) How to store the new data?

I covered (A) and (B) in a lot of detail in the past. FS2024 currently uses a cache subsystem with a Least Recently Used (LRU) deletion strategy, which does not fit the nature of FS2024. I will not repeat that analysis here.
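For readers who did not follow that earlier analysis, this is the textbook shape of an LRU eviction policy. A generic sketch only, not FS2024’s actual code:

```python
# Generic textbook LRU sketch (not FS2024's actual implementation).
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # touched -> most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used entry

cache = LRUCache(capacity=3)
for tile in ["KJFK", "KLAX", "LFPG", "KJFK", "EGLL"]:
    cache.put(tile, f"blob for {tile}")
print(list(cache.items))   # ['LFPG', 'KJFK', 'EGLL'] - KLAX was evicted first
```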

However, I want to provide some insight about (C) to (E).

(E) How to store the new data?

As I have shown in the cache content paintings, the storage (write) is straightforward (simplistic?). Here is the image from the fresh 16 GB RC of Test 1:

  • There are two regions:
    • 512 MB index area at the beginning
      • … with that tiny purple single pixel line at the top-left.
    • The blob area after that … with 15.5 GB (ca. 16,000 MB).
      • “Blue” are the cached text files and “purple” are the binary texture files, mesh files, etc.
  • The “white” regions show zero values in the (yet) unused areas.
    • On initial RC creation the zero-fill is important
      • … to ensure that the filesystem will really reserve the space for the cache file.
    • The “white” lines in the initial purple block most likely are “real” data with lots of zero bytes
      • … as it seems unlikely to me that any data deletion was performed in a basically empty cache file.

In Test 7 the cache is approaching “full”.

  • The purple area is (basically) a big purple block.
  • The white (unused) regions are (basically) only at the end.

What does that tell us?

New data is stored in a continuous “stream” of blob items … without “gaps”!

So, contrary to all existing filesystems, the cache is not using a “common (minimum) block size” to organize the storage inside the RollingCache.ccc file.
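To make that contrast concrete, here is a small sketch (my own illustration, not the real file format) of gap-free “stream” placement versus block-aligned placement:

```python
# Illustration only (not the real RollingCache.ccc format): gap-free stream
# placement vs. block-aligned placement.
BLOCK = 4096  # a typical filesystem-style block size

def append_stream(offset: int, size: int) -> int:
    """Gap-free append: the next blob starts exactly where the last one ended."""
    return offset + size

def append_blocks(offset: int, size: int) -> int:
    """Block-aligned append: the next blob starts at the next block boundary."""
    blocks = -(-size // BLOCK)                # ceiling division
    return offset + blocks * BLOCK

blob_sizes = [30_000, 9_500, 70_000]          # bytes (~30 KB average blob item)
stream_end = block_end = 0
for size in blob_sizes:
    stream_end = append_stream(stream_end, size)
    block_end  = append_blocks(block_end, size)

print(f"stream layout: {stream_end:,} bytes, block layout: {block_end:,} bytes")
# The stream layout wastes nothing on initial writes, but leaves arbitrarily
# sized holes once old blobs are deleted later.
```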

(D) How to delete the old data?

After some heavy usage we get to Test P which was “100% full” many times over.

  • There are lots of (tiny) white lines (pixels) sprinkled all over the purple blob area.
    • This indicates that (B) does not simply delete the oldest data written
      • … which was also part of the previous “LRU analysis”.

What do the white lines tell us?

Old data is deleted actively … by a zero-fill of the old data cache (memory) regions.

But why?

  • Basically, every piece of data is written twice into the RC:
    • First as a “zero-fill” (to “clean up” the cache region)
    • … and later with the “real” blob data.
  • No matter how fast the write operation is
    • … there is no speed difference between writing only “zero” bytes and writing any other byte pattern.
    • So the zero-fill reduces the speed of the cache writes by (basically) 50%.
  • In addition data needs to be deleted in the index area too.
    • So why not only delete it in the index area?
      • That would deliver a factor-2 write speed increase (see the sketch below).
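Here is a hedged sketch of the difference between the two deletion strategies as I read them from the paintings; the layout and index structure below are my assumptions, not the decoded format:

```python
# Assumed layout for illustration only (not the decoded RollingCache.ccc format).
blob_area = bytearray(b"\x01" * 1024)          # pretend blob region holding one item
index     = {"tile_123": (0, 1024)}            # key -> (offset, length)

def delete_with_zero_fill(key):
    """What the paintings suggest: the blob bytes are actively overwritten with zeros."""
    offset, length = index.pop(key)
    blob_area[offset:offset + length] = bytes(length)   # an extra write of `length` bytes

def delete_index_only(key):
    """The cheaper alternative: drop the index entry; the stale bytes simply
    stay in place until a new blob overwrites them anyway."""
    index.pop(key)

delete_with_zero_fill("tile_123")
# Zeroing now and writing the real blob later means every cached byte gets
# written twice, which is where the rough factor-2 overhead estimate comes from.
```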

(C) Does the new data fit into the place(s) where the old data was?

In combination, (A+D+E) define the nature of what is known as …

Memory fragmentation

It is not specific to the RC file, it is common for any kind of storage of “stuff” with different sizes. For example during the last dev stream Sebastian talked about memory fragmentation in the VRAM of the GPU.

Memory fragmentation …

  • is like gravity: it cannot be avoided,
    • … and one simply has to find a way to deal with it.
  • It always results in loss of really usable (precious?) memory.
  • It induces additional memory management overhead
    • … which will reduce performance (significantly).

In computer science there is a lot of research about how to limit the amount or impact of memory fragmentation. As always there are numerous tradeoffs.

In the case of FS2024:

  • (E) = New data is stored in a continuous “stream” of blob items … without “gaps”!
    • The major benefit is …
      • very efficient (initial) storage, as no bytes are “wasted”
        • … due to a mismatch between blob item size and cache storage block size.
    • A major drawback is …
      • (somewhat) inefficient storage, once fragmentation increases
        • … as it becomes harder to find (really) old data
          • … which (really) fits the size of the new data.
          • This again increases fragmentation.

As I already wrote, all (modern) storage solutions work with some kind of “data size aware” placement strategy. Only in the case of “write-once” archive files (or network data streams) is a continuous “stream” of blob items normal (e.g. in a ZIP file).

A cache is not a “write-once” archive file!
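As an illustration of what a “data size aware” placement strategy can look like, here is a generic size-class (bucketed free list) sketch. It is a textbook allocator idea, not a claim about how the RC is or should be implemented:

```python
# Generic size-class free-list sketch (illustration of "data size aware"
# placement; not the actual or proposed RC design).
from collections import defaultdict

SIZE_CLASSES = [8_192, 32_768, 131_072]       # round blobs up to a few fixed sizes
free_lists = defaultdict(list)                # size class -> free offsets

def size_class(n: int) -> int:
    return next(c for c in SIZE_CLASSES if n <= c)

def free(offset: int, length: int) -> None:
    free_lists[size_class(length)].append(offset)

def allocate(length: int, end_of_file: int) -> tuple[int, int]:
    """Reuse a freed slot of the matching size class, or append at the end."""
    cls = size_class(length)
    if free_lists[cls]:
        return free_lists[cls].pop(), end_of_file
    return end_of_file, end_of_file + cls

free(0, 30_000)                               # a ~30 KB blob was deleted at offset 0
offset, eof = allocate(29_000, end_of_file=16 * 1024**3)
print(offset)   # 0 -> the hole is reused instead of adding another fragment
```

The price is some wasted bytes per blob (rounding up to the class size); the benefit is that freed slots are trivially reusable, so fragmentation stays bounded.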

So let me put some numbers to those theoretical observations. The following shows the number of “white” regions in my 16 GB RC file over time.

  • Test
    • The identifier which I used in the tests that have been published in previous posts.
      • I have included a painting in this post for only 3 of those tests.
  • Zero Items
    • The (rough, rule of thumb) count of “white” regions with zero-fill.
  • Size (MB)
    • The size in MB which (I estimate, based on the 9 KB “pixel size”) the zero-fill regions add up to.
Test | Zero Items | Size (MB)
---- | ---------- | ---------
1    | 1          | 16,000
7    | 1          | 1,700
8    | 450        | 10
B    | 1,400      | 1
D    | 25,000     | 20
F    | 30,000     | 10
I    | 70,000     | 110
J    | 60,000     | 100
K    | 17,000     | 30
N    | 12,000     | 15
P    | 30,000     | 25
  • Once the RC has reached “100% full” … after Test 7
    • … the sum of the empty regions remains in a 10 to 100 MB range
    • … and the fragmentation never drops below 10,000 empty “zero-item” regions.
  • Just for context …
    • The average size of a blob item is around 30 KB.

The actual performance implications of all this for the FS2024 cache subsystem cannot be derived from my paintings. Real code profiling would be necessary. But a goose would dare to claim: fragmentation does not come for free.

Update 2025-02-19: Maybe I am totally misreading what I am seeing in the “white” pixels! Please check my next post below.

Fragmentation is real. But “zero-fill” overhead does not (seem to) exist.

I am leaving my mistakes in this post, for documentation purposes.

To summarize:

  • Data is stored in the blob area as a continuous “stream” of blob items … without “gaps”!
    • … and clearly not in a “block” strategy.
    • So initial writes into the RC file are fast and efficient … during the first days
      • … at the cost of higher memory management overhead, slower writes, etc.
        • … for the majority of the cache usage … during the next years.
  • Fragmentation of the RC blob area is increasing quickly once the RC is 100% full.
    • The management overhead of finding the “proper place” for new data increases over time.
      • This will have a negative impact on the cache performance.
  • Data in the RC is deleted with a “zero-fill” strategy.
    • But there is zero benefit to a zero-fill … only added overhead (50% performance loss).
    • FS2024 is not a banking system or a real flight control computer
      • … where excessive caution or safety requirements may justify the “zero-fill” overhead.
      • However, FS2024 would benefit from higher performance (e.g. lower latency).

I would recommend for a “Hotfix” release of the Rolling Cache:

  • Remove the “zero-fill” deletion overhead.
    • Simply delete old data by marking it as deleted in the index items.

Beware … you are entering “Brave New Cache” territory.

In addition to that I would also recommend:

  • Adopt new strategies (algorithms) which reduce cache fragmentation and reduce the resulting management overhead.
  • To avoid a full rewrite of the cache subsystem it could be investigated if …
    • a generational cache design, with multiple but smaller (e.g. 4 GB each) cache files, might achieve good results.

I will try to explain the “generational cache design” idea in a future post in more detail.
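Until that post exists, the following is only my guess at what a “generational” layout with several small cache files could mean, loosely borrowed from generational garbage collectors; the class and sizes below are hypothetical:

```python
# Hypothetical sketch of one possible "generational" layout (my interpretation,
# not the author's design): several small cache files instead of one huge one.
class GenerationalCache:
    def __init__(self, generations: int = 4, gen_size: int = 4 * 1024**3):
        self.gen_size = gen_size
        self.files = [dict() for _ in range(generations)]   # files[0] = youngest

    def put(self, key, blob: bytes) -> None:
        young = self.files[0]
        young[key] = blob
        if sum(len(b) for b in young.values()) > self.gen_size:
            self.rotate()

    def rotate(self) -> None:
        # Deletion becomes "throw away the oldest small file and start a new
        # young one": no per-blob zero-fill, and fragmentation cannot build up
        # inside one huge file for years.
        self.files.pop()
        self.files.insert(0, dict())

    def get(self, key):
        for gen in self.files:
            if key in gen:
                return gen[key]
        return None
```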

12 Likes

The zero-fill seems like serious overkill.

I wonder why that is employed? Is there a ready-made cache code “kit” that comes as an easy-to-implement setup and has the zero-fill by default? Is it configurable, and someone didn’t set a flag to disable it?

That is an interesting find, as are all of your findings.

Thank you for this!

3 Likes

Very good job and analysis!
It seems very clear.

Thank you for the positive feedback … BUT …

I think I need to add a very bold note to my post, as I am making claims which are not based on “hard” science or (very) solid evidence here. I am mainly looking at “fuzzy” paintings. Reading “tea leaves”.

Maybe I am totally misreading what I am seeing!

And in that case I would not do justice to the people who developed that code.

Just some “what if” examples …

  • What if there is a very strange correlation here?
  • What if the white areas are “real data”?
  • What if the white areas are “sea level” elevation data … for some ocean?

Here are more example pictures. This time from Test N … and I played with different “zero-percentage” values for the white pixels (remember, each pixel represents 9 KB of data).

The first requires 90% of all bytes in a block to be zeros:

The second requires 99% of all bytes in a block to be zeros:

There clearly is a visible difference. But then “deleted” data would not always align perfectly with my 9 KB painting block size.

Another reminder. When I started doing my paintings I discovered that even the fresh empty zero-fill RC file … does not really have only “zeros” in the “zero” area. That is why I am painting based on percentage thresholds.
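For anyone who wants to reproduce the basic idea of these paintings, here is a minimal sketch of how I read the procedure: scan the file in 9 KB blocks and classify each block by its percentage of zero bytes. The path and threshold are placeholders, not the actual tool:

```python
# Minimal sketch of the "painting" classification (placeholders, not the actual tool).
from pathlib import Path

CACHE_FILE = Path("RollingCache.ccc")   # adjust to your own install
BLOCK_SIZE = 9 * 1024                   # one "pixel" = 9 KB of file data
THRESHOLD  = 0.99                       # e.g. 99% zero bytes -> paint the pixel white

def classify_blocks(path: Path, block_size: int, threshold: float):
    with path.open("rb") as f:
        while block := f.read(block_size):
            zero_ratio = block.count(0) / len(block)
            yield "white" if zero_ratio >= threshold else "data"

counts = {"white": 0, "data": 0}
for label in classify_blocks(CACHE_FILE, BLOCK_SIZE, THRESHOLD):
    counts[label] += 1
print(counts)
```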

Also remember that I wrote in my previous post, that I do not think the white pixels in Test 1 indicate deletions.

  • Now what if that is true in general?
    • white line != deletion?
  • But then why are there no new white lines in e.g. Test 2 to 7?
    • Perhaps because I was not flying … but only loading aircraft data?

The longer I think about this … the more I think that I am missing something important when it comes to the “zero” ranges.

Hmm … I guess I will have to write a real decoder at some point. Hmm.

6 Likes

This post clearly has a lot of interest, and the Rolling Cache’s correct functionality is core to the sim. I really wish Asobo would respond to this thread with some information or raise it in the next Q&A session… Any moderators care to get this in front of Asobo?

5 Likes

I don’t think there ever was a REAL and FULL understanding of how all this worked in FS20. Personally, I think it’s all snake oil :slight_smile:

1 Like

Although unable to comprehend much in this thread other than the positive and negative conclusions, this computer-naive reader has become convinced that a great deal of the performance improvement in the sim over the next few years will come from gradual improvement in the efficiency of the RC. I expect that to happen, but my question is how much performance improvement might be achieved by changing the balance between data that’s permanently stored and data that’s streamed repeatedly. Almost all PC users can afford the disk space for more permanently stored data. Is it the Xbox that requires so much data to be cached at present?

1 Like