VR Reverb G2 WMR Performance (Motion Reprojection CPU issues, overclocking)

mbucchia · February 12, 2022, 2:02am

I think you are reading the statistics incorrectly. I was just discussing this with @RomanDesign by PM, but I will explain it here too.

You can use the OpenXR “Display frame timing overlay” (see Developer Tools) in order to confirm everything I am about to say below.

The stats in the OpenXR overlay will show you 3 lines:

App Cpu/Gpu ← This is the workload of FS
Pre Cpu/Gpu ← This will always read 0 for now because the game does not submit depth buffers
Post Cpu/Gpu ← This is the workload of the OpenXR runtime, in this case the Motion Reprojection.

Based on the numbers @RomanDesign shared with me, it appears that App Cpu is basically just a little bit more than what the other overlay calls “RdrThread”, here 13.4ms. These values directly reflect the amount of CPU time used to submit the work to the GPU.

So if you look at your values here, we have RdrThread being 13.4ms, and OpenXR App Cpu was about 15ms. This number is the CPU time spent rendering a frame in the game. Now 15ms, that is technically low enough to drive up to 60 FPS. Yes, 60 FPS.

Now you look at the App Gpu number. This number matches the GPU time you also see in the other overlay. In your case, here, you see 27.4ms. This duration is low enough to drive ~37 FPS.

So very clearly, you look at those 2 numbers above:

CPU can drive 60 FPS
GPU can drive 37 FPS

You take the smallest one out of the 2. This is the GPU one. You are GPU-bound.
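The arithmetic above can be sketched in a few lines of Python. This is only an illustration using the numbers quoted in this post; the function names `max_fps` and `limiting_factor` are made up for the example and are not part of any overlay or OpenXR API.

```python
def max_fps(frame_time_ms: float) -> float:
    """Highest frame rate sustainable if every frame takes frame_time_ms."""
    return 1000.0 / frame_time_ms


def limiting_factor(cpu_ms: float, gpu_ms: float) -> str:
    """Return "CPU" or "GPU", whichever needs more time per frame."""
    return "CPU" if cpu_ms > gpu_ms else "GPU"


# Values taken from the post (hypothetical variable names).
app_cpu_ms = 15.0   # OpenXR "App Cpu"
app_gpu_ms = 27.4   # OpenXR "App Gpu"

print(f"CPU could drive about {max_fps(app_cpu_ms):.1f} FPS")
print(f"GPU could drive about {max_fps(app_gpu_ms):.1f} FPS")
print(f"Bound by: {limiting_factor(app_cpu_ms, app_gpu_ms)}")
```

Strictly speaking, 15ms works out to a little over 60 FPS, so the "60 FPS" figure is a conservative round number. The conclusion is the same either way: the GPU is the slower of the two.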

Q:

Nope. You can go look at the statistics I mentioned yourself. Post Cpu/Gpu, on my machine at least, is reading less than 1ms on CPU (so basically nothing) and 4-5ms on the GPU.

Q: OK so what the heck is this “Limited by MainThread”?

I cannot answer that with certainty, because only the developers can. But I can give you a very reasonable guess and explanation.

The time shown as App Cpu is the duration between the game calling the “beginning of the frame” and the game submitting the frame, i.e. the “finish of the frame”. This is the time the game is actively busy doing computational work and writing commands to the GPU.

There is, however, another step involved in this frame rendering process: the “waiting to begin the frame” step. The MainThread duration observed in the overlay likely corresponds to the duration between the “waiting to begin the frame” and the “finish of the frame”. Yes, you read that right. Read it again. This MainThread timer includes the “waiting” phase. During that phase, well, nothing happens. The game just waits. It does not use extra CPU.

This is also why the MainThread time is reading ~30ms in your screenshot: a frame period of ~30ms corresponds to roughly 30 FPS (the exact period at 30 FPS is 33.3ms, but there’s always a few milliseconds of overhead/gap here and there, so just ignore them).

Q: Now wait, why 30 FPS? Why not 37 FPS?

These questions are all answered by looking at the principle of reprojection. VR headsets are best driven at the same rate that their display supports. So for example with a 90Hz display, you want to send an image 90 times per second, and this timing must match when the display will start showing the image (the “scan out”). Why is that by the way? This is because it takes several milliseconds for a display to fully illuminate all the pixels making up the image. So when you want to reduce latency, you want to produce an image as close as possible to the beginning of scan out.

For reprojection to be able to match the display rate mentioned above, it is easiest to lock the app framerate to a rate that evenly divides the refresh rate. This is why for your 90Hz display, we will try to lock at 90 FPS, or 45 FPS, or 30 FPS, or 22.5 FPS. This way for each frame, we will do (respectively) no reprojection, 2x reprojection, 3x reprojection or 4x reprojection.
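To make the "why 30 and not 37" point concrete, here is a rough Python sketch of picking a lock rate. It assumes the runtime simply chooses the highest rate (refresh rate divided by 1, 2, 3 or 4) whose frame period still fits the slower of the CPU and GPU times. That selection rule is my simplification for illustration, not the actual runtime logic, and `lock_rates` and `pick_lock_rate` are hypothetical names.

```python
from __future__ import annotations


def lock_rates(refresh_hz: float, max_divisor: int = 4) -> list[float]:
    """Candidate app frame rates: the refresh rate divided by 1..max_divisor."""
    return [refresh_hz / n for n in range(1, max_divisor + 1)]


def pick_lock_rate(refresh_hz: float, cpu_ms: float, gpu_ms: float,
                   max_divisor: int = 4) -> tuple[float, int] | None:
    """Return (rate, divisor) for the fastest candidate rate that fits.

    Assumption: a rate "fits" when the slower of CPU and GPU finishes
    within one frame period at that rate. Returns None if nothing fits.
    """
    slowest_ms = max(cpu_ms, gpu_ms)
    for n in range(1, max_divisor + 1):
        rate = refresh_hz / n
        period_ms = 1000.0 / rate
        if slowest_ms <= period_ms:
            return rate, n
    return None


print(lock_rates(90))                  # candidate rates for a 90Hz display
print(pick_lock_rate(90, 15.0, 27.4))  # numbers from the post
```

With 27.4ms of GPU time, the 45 FPS slot (22.2ms per frame) is too tight, so the next candidate down is 30 FPS (33.3ms per frame) with 3x reprojection. 37 FPS is simply not one of the candidates.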

Q: And also why waiting, this sounds stupid?

In order to lock onto the frame rate chosen above, we must throttle down the game. This means doing a wait. There are 2 paradigms when implementing a wait: 1) the so-called “active wait” or “busy loop” and 2) the sleep.

Option 1), as its name indicates, actively uses the CPU to perform the wait. Literally, the CPU core will be executing instructions called NOP (No Operation) and checking whether the desired wait condition is over. When doing that, nothing else runs on your CPU core, and the wattage/temperature will go up, because a busy loop is effectively using 100% of your CPU core. Active wait is bad, and typically not used in high-performance applications.

Option 2) is what OpenXR implements. When the game is waiting for the next frame to lock onto, the game thread performing the wait goes to sleep. When it sleeps, two things may happen:

Another thread is executed, effectively allowing other tasks within the application or other programs on your system to run.
If no other thread is eligible, the CPU core will go into a so-called halt state. In the halt state, the CPU core will not do anything, and this effectively lowers wattage and heat. This is what you sometimes see referred to as the “idle thread”.
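If you want to see the difference between the two kinds of wait for yourself, here is a small Python experiment. It is only an analogy: a Python loop is not a literal stream of NOP instructions, and this says nothing about how the OpenXR runtime is written. It just compares how much CPU time the process burns while busy-waiting versus sleeping for the same wall-clock duration. The function names are made up for the example.

```python
import time


def busy_wait(duration_s: float) -> float:
    """Spin until the deadline passes; return CPU seconds consumed."""
    cpu_start = time.process_time()
    deadline = time.perf_counter() + duration_s
    while time.perf_counter() < deadline:
        pass  # keep checking the clock: the core stays fully busy
    return time.process_time() - cpu_start


def sleep_wait(duration_s: float) -> float:
    """Sleep until the deadline passes; return CPU seconds consumed."""
    cpu_start = time.process_time()
    time.sleep(duration_s)  # the thread is descheduled while it waits
    return time.process_time() - cpu_start


wait_s = 0.2
busy_cpu = busy_wait(wait_s)
sleep_cpu = sleep_wait(wait_s)
print(f"busy wait used {busy_cpu * 1000:.1f} ms of CPU time")
print(f"sleep used     {sleep_cpu * 1000:.1f} ms of CPU time")
```

On a typical machine the busy wait should report CPU time close to the full wait, while the sleep should report close to zero. Exact numbers depend on your system and timer resolution.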

In short, the gap between the RdrThread’s 15ms and the MainThread’s 30ms is 15ms of sleep, during which either the game will decide to do something else (like downloading something, or doing physics, or audio) or the system will just go idle.

In all cases, you are not CPU-bound. Wait is good. Wait means you have headroom.

Q: Wut?

Limited by MainThread, in the circumstances described above, does not mean limited by how powerful your CPU is.

Here, Limited by MainThread is a reflection of the OpenXR runtime throttling down the game to lock onto an optimal Motion Reprojection rate. This optimal rate is determined by looking at both the CPU and GPU load. In the case reviewed above, the bottleneck is the GPU.

To quickly find out if you are truly CPU Limited, take a look at RdrThread and GPU time. The higher value of the two indicates whether your CPU (RdrThread is higher) or GPU (GPU time is higher) is the limiting factor.