This post aims to clarify what Motion Reprojection (MR), also called SpaceWarp (SW), is and roughly how it works. The target audience is end users with little to no knowledge of VR principles.
This post takes great care not to disclose any internal intellectual property, and therefore will not present full details on the “how”, but rather focus on the “what” and the “when”. These details should help you understand how motion reprojection affects performance and how to better tune your experience for it.
Some of the details in this post are specific to the implementation of motion reprojection in the Windows Mixed Reality platform software. Things vary from vendor to vendor, but the general idea remains the same.
I will revisit the FAQ over time to add more Q&A as they come up in discussions on the thread.
DISCLAIMER: I wrote this quickly on a Sunday afternoon, so hopefully it doesn’t contain too many typos or inaccuracies.
Two forms of reprojection
Spatial reprojection: Given an image rendered for a specific camera pose, synthesize an image for a different camera pose. The source image is rendered with a predicted camera pose that estimates the user’s position at the time the image is to be displayed, and the target pose is the true value, or rather a more recent estimate, of the user’s position.
- Most references to just “reprojection” (as opposed to motion reprojection) refer to this type of reprojection.
- This is what Microsoft sometimes calls Depth Reprojection, in the case where the application provides depth information to improve the outcome of the process.
- This is what Oculus calls TW (TimeWarp), sometimes OTW (Orientation TimeWarp) or PTW (Positional TimeWarp) depending on the exact technique used.
- This is enabled at all times. This reprojection is done implicitly.
Here is an example of how spatial reprojection can be implemented by the VR software:
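To make the idea concrete, here is a minimal Python sketch of the depth-reprojection variant, applied to a single pixel. All names are illustrative assumptions and this is not the actual platform code; an orientation-only variant would simply apply a rotation delta instead of the full unproject/reproject below.

```python
import numpy as np

def reproject_pixel(uv, depth, view_render, proj, view_latest):
    """Map a pixel rendered with `view_render` to where it lands under `view_latest`.

    uv          : pixel position in normalized device coordinates (-1..1)
    depth       : depth value stored for that pixel
    view_render : 4x4 view matrix predicted at render time
    proj        : 4x4 projection matrix (assumed identical for both poses)
    view_latest : 4x4 view matrix from the most recent head tracking
    """
    # Reconstruct the clip-space position of the pixel...
    clip = np.array([uv[0], uv[1], depth, 1.0])
    # ...bring it back to world space using the pose it was rendered with...
    world = np.linalg.inv(proj @ view_render) @ clip
    world /= world[3]
    # ...and project it again using the most recent pose.
    new_clip = proj @ view_latest @ world
    new_clip /= new_clip[3]
    return new_clip[:2]   # new on-screen position of that pixel
```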
Temporal reprojection: Given a sequence of images generated at different times, generate an image for a specific time. The target time can either be within the range of the sequence of source images (backward reprojection) or in the future (forward reprojection).
- This is what we usually call motion reprojection.
- This process alone is not sufficient: we use a combination of spatial reprojection AND temporal reprojection when doing motion reprojection. Temporal reprojection does not replace spatial reprojection, it complements it.
- You typically elect to enable or disable motion reprojection. It is not automatic.
Here is an example of how temporal reprojection can be implemented (there will be more about this further below):
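As a purely illustrative sketch (the names and numbers below are assumptions, not the actual implementation), forward temporal reprojection of a single pixel boils down to extrapolating its past motion to a future display time:

```python
import numpy as np

def extrapolate_pixel(pos_prev, pos_curr, t_prev, t_curr, t_target):
    """Guess where a pixel will be at t_target from its positions in the two most recent frames."""
    velocity = (pos_curr - pos_prev) / (t_curr - t_prev)  # motion per millisecond
    return pos_curr + velocity * (t_target - t_curr)      # forward prediction ("guess")

# Example: a pixel moved from x=100 to x=104 between two frames 11.1ms apart;
# one frame period later we expect it near x=108.
print(extrapolate_pixel(np.array([100.0, 50.0]), np.array([104.0, 50.0]),
                        0.0, 11.1, 22.2))   # -> [108.  50.]
```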
As briefly mentioned above, there are two types of temporal reprojection: forward and backward.
In the case of VR, where latency is critical, we prefer forward reprojection: since the prediction goes in the same direction as the user’s actions (moving your head over time), we can achieve minimal latency and minimal error (the “guess”) by leveraging the most recent tracking information.
Some pros and cons of both techniques:
- Spatial reprojection: very low cost, good results when compensating for headset motion, but creates discontinuity in scene motion (eg: moving objects).
- Temporal reprojection: expensive, reduces discontinuity in scene motion, but can create tearing and bubbling artifacts.
The motion reprojection process in detail
The diagram below illustrates how motion reprojection happens while your application is running.
Some technical vocabulary needed for the diagram above:
- V-sync: this is the moment when the panel of the headset begins displaying (“scanning out”) the most recent image. It can be seen as the beginning of a new frame period.
- Frame submission: this is the action of passing a rendered image to the next step in the VR pipeline.
- Frame latching: this is the latest possible moment within the lifetime of a frame where we MUST have an image ready for the next step to begin.
- Late-Stage Reprojection (LSR): this is the always-on spatial reprojection process, that happens no matter what, even when motion reprojection is disabled.
First and foremost: motion reprojection is not a feature of the application. It is not done by the application; it is fully done by the VR software stack. In some designs (eg: WMR), motion reprojection runs inside the application process; in other designs, it runs in a separate “compositor” process.
The motion reprojection process is done asynchronously: this means that it runs in parallel to the rest of the application, and it does not block the application from performing CPU or GPU tasks. This asynchronous nature is what gives its name to Oculus’ ASW (Asynchronous SpaceWarp), but other vendors implement it the same way. The application is not aware of the motion reprojection happening, however it can be more or less disturbed by it (see more details on this further below).
The principle of motion reprojection is as follows:
- The application renders the next view of the scene (the next frame). Not every application can achieve the native refresh rate of the VR headset (eg: 90 Hz with WMR) at the best visual settings. This results in missing the frame latching deadline imposed by the VR software.
- In the background, the asynchronous motion reprojection process creates extrapolated frames (it “guesses” what the next frames should look like) for all frames that the application could not render on time. Eg: if an application can only achieve 30 FPS (or one third of 90 Hz), motion reprojection needs to extrapolate the 2 missing frames (see the sketch after this list).
- In reality, motion reprojection extrapolates all frames (so in this example, it creates all 3 frames to take the frame rate from 30 to 90 FPS). This is done to compensate for the delay introduced between the moment the application finished rendering a frame, and the moment that frame needs to be displayed.
- The background asynchronous motion reprojection then submits the extrapolated frame on time for latching by the compositor/headset driver. The extrapolation process MUST happen fast enough so that the frame submission will be on time for frame latching (the absolute latest point where we must have an image that we will display on the panel).
- The VR compositor or device driver performs spatial reprojection (aka Late-Stage Reprojection or LSR) using the most recent head tracking information, before beginning display of the image on the panel.
- At some point during the above, the application completes rendering its current frame, and that frame becomes the next one used by the asynchronous motion reprojection process.
- In order to avoid unnecessary computation by the application and to reduce the end-to-end latency, we make the application wait until the next v-sync event from the headset.
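The throttling arithmetic described in the list above can be summarized with a small sketch (illustrative only; the function name is made up and the real scheduler is more involved):

```python
import math

def throttle(native_hz, app_frame_ms):
    """Lock the app to an integer fraction of the native refresh rate and report
    how many displayed frames motion reprojection must synthesize per app frame."""
    native_ms = 1000.0 / native_hz                      # 11.1ms per frame at 90 Hz
    divisor = max(1, math.ceil(app_frame_ms / native_ms))
    locked_fps = native_hz / divisor                    # 90, 45, 30, 22.5, ...
    synthesized_per_app_frame = divisor                 # MR extrapolates every displayed frame
    return locked_fps, synthesized_per_app_frame

# An app needing 24ms per frame cannot hold 90 or 45 FPS, so it is locked to 30 FPS
# and motion reprojection synthesizes 3 displayed frames for each app frame.
print(throttle(90, 24.0))   # -> (30.0, 3)
```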
Let’s now take a closer look at the frame extrapolation process mentioned above (sometimes called frame synthesis). The diagram below shows what is inside the “MR frame” boxes in the previous diagram:
The motion reprojection process can be summarized in four major steps, plus one final spatial reprojection step (as described earlier: we use temporal reprojection in addition to the always-on spatial reprojection):
- First we spatially reproject the last two frames to the most recent tracking information we have for them. This is to ensure that we use the most recent projection of those frames for the next step.
- Then, we perform motion estimation. We use these two frames to compute motion vectors, which encode in which direction and by how much the pixels have moved between the two images. This is the slowest step of the motion reprojection process.
- We apply some post-processing to the motion vectors in order to clean them up (remove noise from incorrectly guessed motion) and to spatially correlate them (are neighboring motion vectors consistent with each other?) and/or temporally correlate them (are new motion vectors in this region of the screen consistent with past motion vectors in that same region?). A small sketch of this step and the next one follows below.
- Finally, we perform motion propagation: starting from the most recent image that we have, we use the motion vectors to displace pixels in the direction we think they are going.
We then submit the extrapolated frame to the VR compositor/driver, where it is eventually picked up by the Late-Stage Reprojection (LSR) and spatial reprojection is applied using the most recent tracking information available.
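As mentioned in the list above, here is a minimal sketch of what steps 3 and 4 could look like, using a simple median filter for the spatial correlation and a naive forward warp for the propagation. This is an illustrative approximation in Python, not the actual shaders used by the platform:

```python
import numpy as np

def smooth_motion_vectors(mv):
    """Step 3 (sketch): spatially correlate motion vectors with a 3x3 median filter,
    so an isolated, incorrectly guessed vector gets replaced by its neighbors."""
    h, w, _ = mv.shape
    padded = np.pad(mv, ((1, 1), (1, 1), (0, 0)), mode="edge")
    # Gather the 3x3 neighborhood of every vector and take the per-component median.
    neighborhood = np.stack([padded[dy:dy + h, dx:dx + w]
                             for dy in range(3) for dx in range(3)])
    return np.median(neighborhood, axis=0)

def propagate_motion(frame, mv, scale):
    """Step 4 (sketch): forward-warp `frame` by `mv * scale` pixels.
    frame: HxWx3 image, mv: HxWx2 per-pixel displacement, scale: how far into the
    future we extrapolate (eg: 0.5 for the midpoint between two app frames)."""
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    xd = np.clip(np.rint(xs + mv[..., 0] * scale), 0, w - 1).astype(int)
    yd = np.clip(np.rint(ys + mv[..., 1] * scale), 0, h - 1).astype(int)
    out = np.zeros_like(frame)
    out[yd, xd] = frame[ys, xs]   # naive scatter; the holes it leaves behind are where "bubbling" artifacts come from
    return out
```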
The overhead of motion reprojection
Looking back at the five stages described above, let’s briefly summarize the workload accomplished in each of them. Note that below, for the CPU, we do not list the negligible overhead associated with sending commands to the GPU.
- Spatial reprojection (step 1 and step 5):
  CPU: a few matrix multiplications (very light)
  GPU: running a full-quad shader (very light)
- Motion estimation (step 2):
  CPU: no overhead
  GPU: passing the images to the video encoder block (very slow, but does not affect rasterization (see below))
- Post-processing (step 3):
  CPU: no overhead
  GPU: invoking a series of shaders (very light, because the motion vectors input is very small)
- Motion propagation (step 4):
  CPU: no overhead
  GPU: running a full-quad shader (very light)
The diagram below shows the end-to-end workloads and how they are executed:
(NB: the size of the workloads is not at scale)
The overhead of motion reprojection on the application is only the lighter orange regions in the “App GPU” timeline: when certain steps of the motion reprojection happen, they preempt the GPU (meaning they take control of the GPU for their own work, before giving it back to the application). This preemption happens multiple times for each reprojected frame, and it effectively increases the frame time.
The overhead of motion reprojection on the CPU is negligible, since there is no big workload executed on the CPU, and the few workloads run on the CPU can be scheduled on other cores without interfering with the application workload.
Some real-life numbers:
- Setup is: No OpenXR Toolkit, RTX 2080 Ti, changed MSFS/OpenXR resolution to hit 45 FPS with MR. Using WMR overlay to read statistics.
- MR manages to get 45 FPS; app CPU/GPU read 15ms and 17ms respectively.
- Disabling MR, app CPU/GPU read 18ms and 15ms respectively, and we are running at 51 FPS.
This implies that the overhead of MR at 45 FPS was ~2ms on the GPU (17ms vs 15ms). The other overhead (aka post CPU/GPU) is either a) included in the new app GPU time (those preemptions/overlaps), b) irrelevant because done by the video encoder, or c) irrelevant because done during the app’s idle period.
I have no clue why the app CPU was higher without MR - probably because a higher frame rate in MSFS means more physics or something? But this still confirms that MR does not add any noticeable CPU overhead.
FAQ
Q: The Flight Simulator performance overlay tells me that I am “limited by mainthread” when using motion reprojection. Therefore, I assume that MR takes a lot of CPU?
A: The performance overlay is wrong. It does not understand that the frame rate is being throttled down to 45, 30 or 22.5 FPS. During frame throttling, the CPU is idle, waiting for the next app v-sync. The performance overlay counts this delay as CPU time, but it does not truly “consume” CPU or generate heat. It just sleeps. No CPU is being wasted.
Q: Why throttling?
A: If we did not throttle the application, it would render the next frame as soon as the previous one completed; however, that frame would not be used until the next async thread latching opportunity (making the “delay” shown above longer). This would add latency and also require motion reprojection to predict further into the future, which would lead to more uncertainty (error in the prediction) and therefore more artifacts.
We could short-circuit the async thread and use the next v-sync event instead, however we would still need to perform motion reprojection for this frame (which would still lead to artifacts in some cases). Plus, altering the temporal continuity in the motion estimation/motion propagation algorithm would effectively lead to stuttering and more artifacts, because the algorithm is better at predicting the future at a constant prediction rate.
So we would be working harder, but to achieve the same exact result, which isn’t worth it.
Q: Without motion reprojection, I can get 45 FPS, but when I enable it, I get throttled to 30 FPS while I expected 45 FPS. Why?
A: This is because of the preemption overhead described above. The spatial reprojection, post-processing, and motion propagation workloads will increase your frame time. You cannot assume that 45 FPS without motion reprojection will give you 45 FPS frame lock with motion reprojection. The motion reprojection will interleave work for each extrapolated frame which will increase the app GPU frame time by ~1-2ms. At 45 FPS (22.2ms/frame), this takes you to 41 FPS (22.2+2 = 24.2ms/frame), which is now insufficient to achieve 45 FPS lock. At 30 FPS, you might see twice the overhead (4ms/frame, since the asynchronous process happens twice). At 22.5 FPS, you might see three times the overhead (6ms).
You need to account for this overhead in your headroom. For example a quick test in MSFS shows me that if I tune my resolution to achieve 51 FPS (19.6ms/frame), I can achieve 45 FPS with motion reprojection, which makes sense since 19.6+2 = 21.6ms = 46 FPS. Anything lower than 51 FPS in my test resulted in 30 FPS lock. (NB: Of course doing it this way (with absolutely no headroom near 45 FPS) means that I will regularly drop from 45 to 30).
Because the motion reprojection “buckets” are pretty wide (eg: 30 → 45), you will see a large drop (15 FPS), which makes you think that the overhead is huge. But in reality, it is the “penalty” for missing your target frame rate that is huge, even if you miss it by only a few frames.
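For reference, the bucket arithmetic above can be written down explicitly. The ~2ms of MR overhead per app frame is the value measured earlier for the 45 FPS bucket; as explained above, the overhead grows at lower buckets, so this is a simplification:

```python
import math

def locked_fps(app_frame_ms, mr_overhead_ms, native_hz=90):
    total_ms = app_frame_ms + mr_overhead_ms            # effective frame time with MR enabled
    divisor = max(1, math.ceil(total_ms / (1000.0 / native_hz)))
    return native_hz / divisor                          # 90, 45, 30, 22.5, ...

print(locked_fps(22.2, 2.0))   # 45 FPS app -> 24.2ms/frame -> locked to 30 FPS
print(locked_fps(19.6, 2.0))   # 51 FPS app -> 21.6ms/frame -> locked to 45 FPS
```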
Q: I find motion reprojection is less efficient than no motion reprojection.
A: When you run your application with motion reprojection and fall into the 45 FPS budget, your application is rendering at 45 FPS. However, with motion reprojection extrapolating the missing frames, your effective frame rate is 90 FPS. If you think of efficiency as “how much work it takes to hit a given frame rate”, then here is an example with actual numbers I’ve just measured:
- Without MR, app CPU/GPU = 21/21ms, FPS is 45. Your frame efficiency is 2.1 frames per millisecond of CPU/GPU time.
- With MR, app CPU/GPU = 16/27ms, FPS is 90 after reprojection. Your frame efficiency is 5.6 frames per millisecond of CPU time and 3.3 frames per millisecond of GPU time.
With MR, you are 1.5x more GPU-efficient, and 2.6x more CPU-efficient.
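For completeness, the arithmetic behind these ratios:

```python
without_mr  = 45 / 21   # ≈ 2.1 frames per ms of CPU time (and of GPU time)
with_mr_cpu = 90 / 16   # ≈ 5.6 frames per ms of CPU time
with_mr_gpu = 90 / 27   # ≈ 3.3 frames per ms of GPU time
print(with_mr_cpu / without_mr)   # ≈ 2.6x more CPU-efficient
print(with_mr_gpu / without_mr)   # ≈ 1.5x more GPU-efficient
```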
Q: Why is motion reprojection bad on my AMD GPU?
A: This is because of the motion estimation phase of the algorithm, which relies on the video encoder block of the GPU. Pre-RDNA GPUs (like the RX 5000 series) do not support motion estimation on large blocks, so we must fall back to small blocks, which means we must downscale the input images dramatically (think 5x smaller than the headset’s resolution), and this loses a lot of detail. RDNA GPUs (RX 6000 series) support large blocks, however they are 2-3 times slower than Nvidia GPUs for motion estimation, which leads to missed latching in the LSR thread and results in image warping or unstable frame rates.
Q: What is the motion reprojection setting in Flight Simulator VR settings doing (“Depth and Motion”)?
A: Short answer: it does nothing today.
Long answer: it implements something called application-assisted motion reprojection, also called AppSW by Oculus. It requires your OpenXR runtime to implement a feature called XR_FB_space_warp, which as of today, no runtime implements (not even the Oculus one - the only implementation of AppSW today is for native Android apps running on the Quest 2). When/if this feature is implemented by vendors, it will allow MSFS to pass its own motion vectors (computed during rendering for TAA, DLSS, motion blur, etc.) to the motion reprojection process. This will make it possible to 1) remove the need for motion estimation (which is mostly done on the video encoder anyway, so it will not affect performance dramatically) and 2) use high-quality motion vectors for motion propagation (which can increase quality).
The “Depth”-only setting enables the game to pass depth information which is used on WMR for better spatial reprojection (when motion reprojection is off - this setting has no effect when motion reprojection is used). I can’t speak for other vendors, but I suspect they can also do better spatial reprojection with this setting. As for Oculus, I have not verified this information, but the claim is that depth information will also enable ASW 2.0 if you use motion reprojection.