2025.01.03 - Ideas for a happy new Rolling Cache ecosystem
Beware, this post is of the very long-ish kind.
To continue the new year on a constructive note, I would like to lay out a number of ideas for how to improve the caching system of FS2024.
The fundamental tradeoff in computing is
- Waste memory or waste computation cycles?
In a trivial case like “multiplication” one could decide between those two strategies:
- Compute every possible multiplication once … and remember (cache) all results in some memory.
- Compute the individual result on demand … and then forget it.
While it may look silly in the context of “multiplication” the decision becomes harder once we get to problems like
- prime numbers
- a query result that has to scan 5 petabytes of data
- the medium LOD textures and 3D mesh of this or that aircraft
In real computing it is always a mix of both strategies.
Caching is like Garbage Collection
As memory is always (too?) limited, even in computers, we can not store (remember) everything that we would like to store. Ignoring the legal aspects, I clearly would love to have a full copy of the FS2024 globe for “offline flying”. But 5 petabytes is too much for a goose.
So the next level of constraints and questions that a system like FS2024 needs to answer is:
- A) How much cache memory do I need at least to have a useful system?
- B) Above what size will caching start to cause problems … to someone or something?
(A) is very hard to answer, as we all know, because “useful” means different things to different people at different times.
(B) seems to have been answered by the FS2024 team with … 16 GB … mainly due to Xbox limitations.
So we basically are looking at 5,000,000 GB of origin data from which we need to extract the 16 GB of relevant data. So the minimum FS2024 cache size is just 0.00032 % of the origin. Hmmm.
- C) How do I know which data is relevant?
- D) How do I decide which relevant data is now “garbage” … once my cache is full?
Housekeeping matters
In real-world computing the next question(s) will pop up automatically:
- E) How much effort (memory and computation) does it take to maintain the cache?
- Insert new data
- Decide what old data is garbage
- Remove old data
- F) At what point will my caching system (algorithm) become “the problem”.
- A cache should make a system faster … not slower.
Now we are entering the …
The Art of Caching
The ideas of caching are as old as the ideas of computing. Caching is both: science and art.
A lot of research has been done and published. Here are just three resources that can provide some background or can serve as examples for those readers who want to dive deeper:
- Cache replacement policies - Wikipedia
- The Wikipedia introduction to “Cache replacement policies”
- https://www.microsoft.com/en-us/research/wp-content/uploads/2016/12/ismm_paper.pdf
- Paper from 1998: “Using Generational Garbage Collection To Implement Cache-Conscious Data Placement”
- https://web.eecs.umich.edu/~mahlke/courses/583f21/lectures/Dec6/Group15_slides.pdf
- These slides from 2021 explain ideas for cache management with machine learning.
I am not “recommending” any of the above algorithms. I mainly want to show that caching research has been going on for decades and that there are many different strategies out there.
Least Recently Used (LRU)
In previous posts I tried to explain why I think that today the RollingCache.ccc file is governed by a simple Least Recently Used (LRU) cache replacement policy.
LRU dates back to at least the 1960s. Its key assumptions are:
- All units of data are equal … therefore
- the only (meta) information that can inform the caching algorithm is
- … the time of the most recent usage of each unit of data.
So it is based on a single piece of meta information.
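For readers who have never seen one, a textbook LRU cache fits into a few lines of Python. This is only a sketch of the general technique, not the actual FS2024 implementation (which I do not know). The class name, the capacity counted in "units" and the key names are my assumptions.

```python
from collections import OrderedDict


class LRUCache:
    """Textbook LRU: the only meta info is the order of most recent use."""

    def __init__(self, capacity: int):
        self.capacity = capacity      # max number of data units (assumption)
        self._items = OrderedDict()   # oldest first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


# Hypothetical usage: one index entry, then a stream of landscape tiles.
cache = LRUCache(capacity=3)
cache.put("scenery_index", "index bytes")
for tile in ["tile_a", "tile_b", "tile_c"]:
    cache.put(tile, "landscape bytes")

index_after_flight = cache.get("scenery_index")  # None: the index was evicted
```

Three tiles that may never be needed again were enough to push out an entry that is needed on every launch.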
LRU and other caching algorithms have usually been developed to make computer chips (CPUs) faster, by avoiding copying data the “long distance” from the main memory (RAM).
But even today's CPUs have:
- Data category specific … instruction and data caches
- Layered (generational) … L1 and L2 caches
- Predictive caching … due to predictive code execution
- … and Wikipedia will provide a lot more details
Does FS2024 really know less about the nature of its data than the underlying CPU?
Why FS2024 can have a happy new Rolling Cache ecosystem
So what meta information can FS2024 use to decide what to keep (cache) and what to throw away? I would propose the following “quick” high level list:
- G) The time of the most recent usage … obvious. The only meta info used today.
- H) Usage history
- I) The data category
- J) Pilot preferences
- K) Game mode
- L) Current flight plan
However, to be fair, today there already are “alternative” caching strategies …
- M) The asset is part of the predefined FS2024 software installation “minimum” permanent asset collection
- … and so it will be “streamed” during or after the first launch
- … and cached outside of the RC in the normal filesystem.
- N) A sound file for a streamed aircraft
- … will, on demand, be cached outside of the RC in the normal filesystem.
But let us get back to the meta info categories (H) to (L).
(H) Usage history
FS2024 already remembers the time of usage. But how often was this “unit of data” read from the cache?
- The more often the sim or the pilot needs this data
- … the more important it seems to be
- … and the higher the probability should be of future use.
With such a simple addition, the “blue text” index sections that are needed on every sim launch could be protected from destructive deletion.
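A minimal sketch of what "counting the reads" could look like, assuming a cache that stores a hit counter next to the time of last use. The class and key names are hypothetical. The eviction rule (fewest hits first, oldest use as tie-breaker) is one simple choice among many.

```python
import itertools


class FrequencyAwareCache:
    """Evicts the entry with the fewest hits; ties go to the oldest use."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._clock = itertools.count()  # logical time, grows with each access
        self._entries = {}               # key -> [value, hits, last_used]

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        entry[1] += 1                    # usage history: one more read
        entry[2] = next(self._clock)
        return entry[0]

    def put(self, key, value):
        if key not in self._entries and len(self._entries) >= self.capacity:
            victim = min(
                self._entries,
                key=lambda k: (self._entries[k][1], self._entries[k][2]),
            )
            del self._entries[victim]
        hits = self._entries[key][1] if key in self._entries else 0
        self._entries[key] = [value, hits, next(self._clock)]


# Hypothetical usage: the index is read on five launches, then tiles stream in.
cache = FrequencyAwareCache(capacity=3)
cache.put("scenery_index", "index bytes")
for _ in range(5):
    cache.get("scenery_index")
for tile in ["tile_a", "tile_b", "tile_c", "tile_d"]:
    cache.put(tile, "landscape bytes")

index_after_flight = cache.get("scenery_index")  # still cached
```

A pure hit counter has a known weakness: data that was popular long ago never leaves. A real implementation would probably let the counters decay over time.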
(I) The data category
Not all data is “equal” in FS2024. A very quick list might look like this:
- Scenery Index
- Aircraft
- Thumbnails and “Metadata”
- My (frequently or recently used) aircraft
- The LOD needed for aircraft and livery selection
- The LOD needed for being “Ready to fly”
- The LOD for sitting inside the cockpit
- All owned aircraft
- Static aircraft
- Airports
- My (frequently or recently visited) airports
- All owned airports
- “Metadata” like parking locations, runway info etc.
- Vehicles and “ground stuff”
- Airport ground vehicles
- Regular city trucks, busses and cars
- whatever
- People
- My Avatar
- My Career Instructor
- Airport people
- other people … for career mode
- Career Mode
- My Office
- People
- My Instructor
- Passengers
- Challenges
- The aircraft
- The landscape
- Marketplace
- Metainfo
- Thumbnails
- Showcase pictures
- Globe (landscape)
- textures
- LOD … required for the high level globe view
- LOD … common for free flight airport selection
- LOD … for low above ground flights
- Elevation or TIN data
- LODs … as for the textures?
- Live data
- Weather
- Air traffic
- Peer player aircraft
Each of the above categories will differ in their characteristics:
- small … vs. … large data units
- few … vs. … many data units
- immutable … vs. … frequently updated
- high reuse … vs. … no reuse likely
- everybody needs it … vs. … individual needs
- on every flight … vs. … on one or few flights
So why not have a RollingCache folder which contains a dedicated cache file for each of the above categories? (I know, I know … but that will be a topic of the next post).
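As a rough sketch of the idea, assuming each category gets its own budget and its own small LRU: the category names and budgets below are made up for illustration, and a real cache would count bytes instead of entries.

```python
from collections import OrderedDict


class PartitionedCache:
    """One independent LRU per data category, each with its own budget."""

    def __init__(self, budgets):
        self._budgets = dict(budgets)  # category -> max entries
        self._parts = {name: OrderedDict() for name in budgets}

    def put(self, category, key, value):
        part = self._parts[category]
        part[key] = value
        part.move_to_end(key)
        while len(part) > self._budgets[category]:
            part.popitem(last=False)   # eviction stays inside the category

    def get(self, category, key):
        part = self._parts[category]
        if key not in part:
            return None
        part.move_to_end(key)
        return part[key]


# Hypothetical usage: a long-haul flight floods the "globe" category only.
cache = PartitionedCache({"scenery_index": 2, "aircraft": 2, "globe": 3})
cache.put("scenery_index", "index", "index bytes")
cache.put("aircraft", "my_favorite", "aircraft bytes")
for i in range(100):
    cache.put("globe", f"tile_{i}", "landscape bytes")

index_after_flight = cache.get("scenery_index", "index")  # still cached
```

The landscape can churn as much as it likes. It can only evict other landscape.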
(J) Pilot preferences
If I “like” (buy?) an aircraft or airport, it indicates a higher probability of reuse. So it should not be considered “cache garbage” if there is still less liked content filling up the cache … like TIN landscape from about an hour ago, of places that I might never ever return to.
(K) Game mode
Data that I need for some kind of profession in career mode has a higher probability of reuse.
Data that I need for the latest challenges has a higher probability of reuse. I might want to retry it tomorrow, to get a better score. So why is all this long gone landscape from my current long-haul flight deleting the landscape and aircraft of my favorite challenges?
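Preferences (J) and game mode (K) could both feed into a simple "keep score". The sketch below assumes that each cache entry carries two extra flags. The field names and the bonus values (24 and 12 hours of "virtual recency") are pure assumptions of mine, chosen only to show the mechanism.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    key: str
    last_used: float                 # seconds on some common clock
    owned_or_liked: bool = False     # (J) pilot preference
    used_in_challenge: bool = False  # (K) game mode


def keep_score(entry: Entry, now: float) -> float:
    """Higher score means more worth keeping."""
    age_hours = (now - entry.last_used) / 3600
    score = -age_hours               # plain LRU part: older is worse
    if entry.owned_or_liked:
        score += 24                  # hypothetical bonus
    if entry.used_in_challenge:
        score += 12                  # hypothetical bonus
    return score


def pick_victim(entries, now):
    """Return the entry that should be evicted first."""
    return min(entries, key=lambda e: keep_score(e, now))


now = 10 * 3600.0
entries = [
    Entry("tin_tile", last_used=now - 1 * 3600),
    Entry("liked_aircraft", last_used=now - 5 * 3600, owned_or_liked=True),
    Entry("challenge_landscape", last_used=now - 8 * 3600, used_in_challenge=True),
]

victim = pick_victim(entries, now)                      # the anonymous TIN tile
lru_victim = min(entries, key=lambda e: e.last_used)    # what plain LRU would pick
```

Plain LRU would throw away the challenge landscape. With the two flags, the one-hour-old tile from a place I may never return to goes first.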
(L) Current flight plan
The FS2024 team has invested so much effort into the new flight planning website and its game integration.
Some pilots … ok, not a sightseeing goose like me … invest a lot of time in planning their flights. Why ruin such passion with broken landscape or missing airports or stutters on final approach?
Why does an active flight plan not trigger constant precaching of the destination airport, or the necessary landscape for the next X minutes of flight, or the distance X ahead?
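To illustrate that the "what lies ahead" part is not rocket science, here is a minimal sketch that turns a flight plan into an ordered list of tiles to precache. Everything in it is an assumption: the 1 x 1 degree tile scheme, the straight-line interpolation in latitude and longitude (a real planner would follow the great circle and the actual waypoints), and the rounded airport coordinates.

```python
import itertools
import math


def tiles_ahead(route, steps_per_leg=200):
    """Yield hypothetical 1x1 degree tile keys along the route, in flight
    order and without duplicates. Legs are interpolated linearly in
    latitude/longitude, which is only a rough approximation."""
    seen = set()
    for (lat1, lon1), (lat2, lon2) in zip(route, route[1:]):
        for i in range(steps_per_leg + 1):
            t = i / steps_per_leg
            lat = lat1 + (lat2 - lat1) * t
            lon = lon1 + (lon2 - lon1) * t
            tile = (math.floor(lat), math.floor(lon))
            if tile not in seen:
                seen.add(tile)
                yield tile


# Approximate coordinates of EDDM and VVTS (rounded, for illustration only).
route = [(48.35, 11.79), (10.82, 106.65)]

all_tiles = list(tiles_ahead(route))
next_five = list(itertools.islice(tiles_ahead(route), 5))  # precache these first
```

A cache that knows this list could fetch the next X tiles in the background and, just as importantly, refuse to evict the destination airport while the flight plan is active.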
Since this post is so text heavy, allow me to underline my case with two pictures from yesterday's “long-haul” flight, where I escorted flight VN32 from EDDM to VVTS.
Approaching Mount Ararat was a true pleasure …
… and looking at Pakistan (a country with very interesting landscape, especially in the north) was … well … not a pleasure.
There are so many possible (and easy) ways to make the caching strategy of FS2024 more efficient. As a result the stress on the servers and the stress on the (frustrated?) pilots will decrease. This should also reduce the stress on the FS2024 team.
Good caching is good … for everybody.
In my next post I will try to outline some even more concrete implementation ideas … a “Cache Hotfix” and maybe also a “Brave New Cache” idea.
To summarize my ideas and claims from above:
- Today's Least Recently Used (LRU) strategy does not fit the nature of FS2024.
- LRU was designed for random data with no meta info.
- FS2024 has and can collect a lot of meta information about the data:
- Usage history
- The data category
- Pilot preferences
- Game mode
- Current flight plan
- etc.
- Manual caching (a.k.a. “installing a package”) is a possible hot fix
- … but FS2024 should and could also provide way better on demand caching options … e.g.
- Data category specific caching
- Layered (generational) caching
- Predictive caching