This post aims to clarify what Motion Reprojection (MR), also called SpaceWarp (SW), is and roughly how it works. The target audience is end users with little to no knowledge of VR principles.
This post takes great care not to disclose any internal intellectual property, and therefore will not present full details on the “how”, but rather focus on the “what” and the “when”. These details should help you understand how motion reprojection affects performance and how to better tune your experience for it.
Some of the details in this post are specific to the implementation of motion reprojection in the Windows Mixed Reality platform software. Things vary from vendor to vendor, but the general idea remains the same.
I will revisit the FAQ over time to add more Q&A as they come up in discussions on the thread.
DISCLAIMER: I wrote this quickly on a Sunday afternoon, so hopefully it doesn’t contain too many typos or inaccuracies.
Two forms of reprojection
Spatial reprojection: Given an image rendered for a specific camera pose, synthesize an image for a different camera pose. The source image is rendered with a predicted camera pose that estimates the user’s position at the time the image is to be displayed, and the target pose is the true value, or rather a more recent estimate, of the user’s position.
- Most references to just “reprojection” (as opposed to motion reprojection) refer to this type of reprojection.
- This is what Microsoft sometimes calls Depth Reprojection, in the case where the application provides depth information to improve the outcome of the process.
- This is what Oculus calls TW (TimeWarp), sometimes OTW (Orientation TimeWarp) or PTW (Positional TimeWarp) depending on the exact technique used.
- This is enabled at all times. This reprojection is done implicitly.
Here is an example of how spatial reprojection can be implemented by the VR software:
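To make the idea concrete, here is a minimal Python sketch of the depth-reprojection variant, applied to a single pixel. All names are illustrative assumptions and this is not the actual platform code; an orientation-only variant would simply apply a rotation delta instead of the full unproject/reproject below.

```python
import numpy as np

def reproject_pixel(uv, depth, view_render, proj, view_latest):
    """Map a pixel rendered with `view_render` to where it lands under `view_latest`.

    uv          : pixel position in normalized device coordinates (-1..1)
    depth       : depth value stored for that pixel
    view_render : 4x4 view matrix predicted at render time
    proj        : 4x4 projection matrix (assumed identical for both poses)
    view_latest : 4x4 view matrix from the most recent head tracking
    """
    # Reconstruct the clip-space position of the pixel...
    clip = np.array([uv[0], uv[1], depth, 1.0])
    # ...bring it back to world space using the pose it was rendered with...
    world = np.linalg.inv(proj @ view_render) @ clip
    world /= world[3]
    # ...and project it again using the most recent pose.
    new_clip = proj @ view_latest @ world
    new_clip /= new_clip[3]
    return new_clip[:2]   # new on-screen position of that pixel
```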
Temporal reprojection: Given a sequence of images generated at different times, generate an image for a specific time. The target time can either be within the range of the sequence of source images (backward reprojection) or in the future (forward reprojection).
- This is what we usually call motion reprojection.
- This process alone is not sufficient: we use a combination of spatial reprojection AND temporal reprojection when doing motion reprojection. Temporal reprojection does not replace spatial reprojection, it complements it.
- You typically elect to enable or disable motion reprojection. It is not automatic.
Here is an example of how temporal reprojection can be implemented (there will be more about this further below):
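As a purely illustrative sketch (the names and numbers below are assumptions, not the actual implementation), forward temporal reprojection of a single pixel boils down to extrapolating its past motion to a future display time:

```python
import numpy as np

def extrapolate_pixel(pos_prev, pos_curr, t_prev, t_curr, t_target):
    """Guess where a pixel will be at t_target from its positions in the two most recent frames."""
    velocity = (pos_curr - pos_prev) / (t_curr - t_prev)  # motion per millisecond
    return pos_curr + velocity * (t_target - t_curr)      # forward prediction ("guess")

# Example: a pixel moved from x=100 to x=104 between two frames 11.1ms apart;
# one frame period later we expect it near x=108.
print(extrapolate_pixel(np.array([100.0, 50.0]), np.array([104.0, 50.0]),
                        0.0, 11.1, 22.2))   # -> [108.  50.]
```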
As briefly mentioned above, there are two types of temporal reprojection: forward and backward.
In the case of VR, where latency is critical, we prefer forward reprojection: since the prediction goes in the same direction as the user’s actions (moving your head over time), we can achieve minimal latency and minimal error (the “guess”) by leveraging the most recent tracking information.
Some pros and cons of both techniques:
- Spatial reprojection: very low cost, good results when compensating for headset motion, but creates discontinuity in scene motion (eg: moving objects).
- Temporal reprojection: expensive, reduces discontinuity in scene motion, but can create tearing and bubbling artifacts.
The motion reprojection process in detail
The diagram below illustrates how motion reprojection happens while your application is running.
Some technical vocabulary needed for the diagram above:
- V-sync: this is the moment when the panel of the headset begins displaying (“scanning out”) the most recent image. It can be seen as the beginning of a new frame period.
- Frame submission: this is the action of passing a rendered image to the next step in the VR pipeline.
- Frame latching: this is the latest possible moment within the lifetime of a frame where we MUST have an image ready for the next step to begin.
- Late-Stage Reprojection (LSR): this is the always-on spatial reprojection process, that happens no matter what, even when motion reprojection is disabled.
First and foremost: motion reprojection is not a feature of the application. It is not done by the application; it is fully done by the VR software stack. In some designs (eg: WMR), motion reprojection runs inside the application process; in other designs, it runs in a separate “compositor” process.
The motion reprojection process is done asynchronously: this means that it runs in parallel to the rest of the application, and it does not block the application from performing CPU or GPU tasks. This asynchronous nature is what gives its name to Oculus’ ASW (Asynchronous SpaceWarp), but other vendors implement it the same way. The application is not aware of the motion reprojection happening, however it can be more or less disturbed by it (see more details on this further below).
The principle of motion reprojection is as follows:
- The application renders the next view of the scene (the next frame). Not every application can achieve the native refresh rate of the VR headset (eg: 90 Hz with WMR) at the best visual settings. This results in missing the frame latching deadline imposed by the VR software.
- In the background, the asynchronous motion reprojection process creates extrapolated frames (it “guesses” what the next frames should look like) for all frames that the application could not render on time. Eg: if an application can only achieve 30 FPS (or one third of 90 Hz), motion reprojection needs to extrapolate the 2 missing frames (see the sketch after this list).
- In reality, motion reprojection extrapolates all frames (so in this example, it creates all 3 frames to take the frame rate from 30 to 90 FPS). This is done to compensate for the delay introduced between the moment the application finished rendering a frame, and the moment that frame needs to be displayed.
- The background asynchronous motion reprojection then submits the extrapolated frame on time for latching by the compositor/headset driver. The extrapolation process MUST happen fast enough so that the frame submission will be on time for frame latching (the absolute latest point where we must have an image that we will display on the panel).
- The VR compositor or device driver performs spatial reprojection (aka Late-Stage Reprojection or LSR) using the most recent head tracking information, before beginning display of the image on the panel.
- At some point during the above, the application completes rendering its current frame, and that frame becomes the next one used by the asynchronous motion reprojection process.
- In order to avoid unnecessary computation by the application and to reduce the end-to-end latency, we make the application wait until the next v-sync event from the headset.
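The throttling arithmetic described in the list above can be summarized with a small sketch (illustrative only; the function name is made up and the real scheduler is more involved):

```python
import math

def throttle(native_hz, app_frame_ms):
    """Lock the app to an integer fraction of the native refresh rate and report
    how many displayed frames motion reprojection must synthesize per app frame."""
    native_ms = 1000.0 / native_hz                      # 11.1ms per frame at 90 Hz
    divisor = max(1, math.ceil(app_frame_ms / native_ms))
    locked_fps = native_hz / divisor                    # 90, 45, 30, 22.5, ...
    synthesized_per_app_frame = divisor                 # MR extrapolates every displayed frame
    return locked_fps, synthesized_per_app_frame

# An app needing 24ms per frame cannot hold 90 or 45 FPS, so it is locked to 30 FPS
# and motion reprojection synthesizes 3 displayed frames for each app frame.
print(throttle(90, 24.0))   # -> (30.0, 3)
```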
Let’s now take a closer look at the frame extrapolation process mentioned above (sometimes called frame synthesis). The diagram below shows what is inside the “MR frame” boxes in the previous diagram:
The motion reprojection process can be summarized in four major steps, plus one final spatial reprojection step (as described earlier: we use temporal reprojection in addition to the always-on spatial reprojection):
- First we spatially reproject the last two frames to the most recent tracking information we have for them. This is to ensure that we use the most recent projection of those frames for the next step.
- Then, we perform motion estimation. We use these two frames to compute motion vectors, which encode in which direction and by how much the pixels have moved between the two images. This is the slowest step of the motion reprojection process.
- We apply some post-processing to the motion vectors in order to clean them up (remove noise from incorrectly guessed motion) and to spatially correlate them (are neighboring motion vectors consistent with each other?) and/or temporally correlate them (are new motion vectors in this region of the screen consistent with past motion vectors in that same region?). A small sketch of this step and the next one follows below.
- Finally, we perform motion propagation: starting from the most recent image that we have, we use the motion vectors to displace pixels in the direction we think they are going.
We then submit the extrapolated frame to the VR compositor/driver, where it is eventually picked up by the Late-Stage Reprojection (LSR) and spatial reprojection is applied using the most recent tracking information available.
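As mentioned in the list above, here is a minimal sketch of what steps 3 and 4 could look like, using a simple median filter for the spatial correlation and a naive forward warp for the propagation. This is an illustrative approximation in Python, not the actual shaders used by the platform:

```python
import numpy as np

def smooth_motion_vectors(mv):
    """Step 3 (sketch): spatially correlate motion vectors with a 3x3 median filter,
    so an isolated, incorrectly guessed vector gets replaced by its neighbors."""
    h, w, _ = mv.shape
    padded = np.pad(mv, ((1, 1), (1, 1), (0, 0)), mode="edge")
    # Gather the 3x3 neighborhood of every vector and take the per-component median.
    neighborhood = np.stack([padded[dy:dy + h, dx:dx + w]
                             for dy in range(3) for dx in range(3)])
    return np.median(neighborhood, axis=0)

def propagate_motion(frame, mv, scale):
    """Step 4 (sketch): forward-warp `frame` by `mv * scale` pixels.
    frame: HxWx3 image, mv: HxWx2 per-pixel displacement, scale: how far into the
    future we extrapolate (eg: 0.5 for the midpoint between two app frames)."""
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    xd = np.clip(np.rint(xs + mv[..., 0] * scale), 0, w - 1).astype(int)
    yd = np.clip(np.rint(ys + mv[..., 1] * scale), 0, h - 1).astype(int)
    out = np.zeros_like(frame)
    out[yd, xd] = frame[ys, xs]   # naive scatter; the holes it leaves behind are where "bubbling" artifacts come from
    return out
```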
The overhead of motion reprojection
Looking back at the five stages described above, let’s briefly summarize the workload accomplished in each of them. Note that below, for the CPU, we do not list the negligible overhead associated with sending commands to the GPU.
- Spatial reprojection (step 1 and step 5):
  CPU: a few matrix multiplications (very light)
  GPU: running a full-quad shader (very light)
- Motion estimation (step 2):
  CPU: no overhead
  GPU: passing the images to the video encoder block (very slow, but does not affect rasterization (see below))
- Post-processing (step 3):
  CPU: no overhead
  GPU: invoking a series of shaders (very light, because the motion vectors input is very small)
- Motion propagation (step 4):
  CPU: no overhead
  GPU: running a full-quad shader (very light)
The diagram below shows the end-to-end workloads and how they are executed:
(NB: the size of the workloads is not at scale)
The overhead of motion reprojection on the application is only the lighter orange regions in the “App GPU” timeline: when certain steps of the motion reprojection happen, they preempt the GPU (meaning they take control of the GPU for their own work, before giving it back to the application). This preemption happens multiple times for each reprojected frame, and it effectively increases the frame time.
The overhead of motion reprojection on the CPU is negligible, since there is no big workload executed on the CPU, and the few workloads run on the CPU can be scheduled on other cores without interfering with the application workload.
Some real-life numbers:
- Setup is: No OpenXR Toolkit, RTX 2080 Ti, changed MSFS/OpenXR resolution to hit 45 FPS with MR. Using WMR overlay to read statistics.
- MR manages to get 45 FPS; app CPU/GPU read 15ms and 17ms respectively.
- Disabling MR, app CPU/GPU read 18ms and 15ms respectively, and we are running at 51 FPS.
This implies that the overhead of MR at 45 FPS was ~2ms on the GPU (17ms vs 15ms). The other overhead (aka post CPU/GPU) is either a) included in the new app GPU time (those preemptions/overlaps), b) irrelevant because done by the video encoder, or c) irrelevant because done during the app’s idle period.
I have no clue why the app CPU was higher without MR - probably because a higher frame rate in MSFS means more physics or something? But this still confirms that MR does not add any noticeable CPU overhead.
FAQ
Q: The Flight Simulator performance overlay tells me that I am “limited by mainthread” when using motion reprojection. Therefore, I assume that MR takes a lot of CPU?
A: The performance overlay is wrong. It does not understand that the frame rate is being throttled down to 45, 30 or 22.5 FPS. During frame throttling, the CPU is idle, waiting for the next app v-sync. The performance overlay counts this delay as CPU time, but it does not truly “consume” CPU or generate heat. It just sleeps. No CPU is being wasted.
Q: Why throttling?
A: If we did not throttle the application, it would render the next frame as soon as the previous one completed; however, that frame would not be used until the next async thread latching opportunity (making the “delay” shown above longer). This would add latency and also require motion reprojection to predict further into the future, which would lead to more uncertainty (error in the prediction) and therefore more artifacts.
We could short-circuit the async thread and use the next v-sync event instead, however we would still need to perform motion reprojection for this frame (which would still lead to artifacts in some cases). Plus, altering the temporal continuity in the motion estimation/motion propagation algorithm would effectively lead to stuttering and more artifacts, because the algorithm is better at predicting the future at a constant prediction rate.
So we would be working harder, but to achieve the same exact result, which isn’t worth it.
Q: Without motion reprojection, I can get 45 FPS, but when I enable it, I get throttled to 30 FPS while I expected 45 FPS. Why?
A: This is because of the preemption overhead described above. The spatial reprojection, post-processing, and motion propagation workloads will increase your frame time. You cannot assume that 45 FPS without motion reprojection will give you 45 FPS frame lock with motion reprojection. The motion reprojection will interleave work for each extrapolated frame which will increase the app GPU frame time by ~1-2ms. At 45 FPS (22.2ms/frame), this takes you to 41 FPS (22.2+2 = 24.2ms/frame), which is now insufficient to achieve 45 FPS lock. At 30 FPS, you might see twice the overhead (4ms/frame, since the asynchronous process happens twice). At 22.5 FPS, you might see three times the overhead (6ms).
You need to account for this overhead in your headroom. For example a quick test in MSFS shows me that if I tune my resolution to achieve 51 FPS (19.6ms/frame), I can achieve 45 FPS with motion reprojection, which makes sense since 19.6+2 = 21.6ms = 46 FPS. Anything lower than 51 FPS in my test resulted in 30 FPS lock. (NB: Of course doing it this way (with absolutely no headroom near 45 FPS) means that I will regularly drop from 45 to 30).
Because the motion reprojection “buckets” are pretty wide (eg: 30 → 45), you will see a large drop (15 FPS), which makes you think that the overhead is huge. But in reality, it is the “penalty” for missing your target frame rate that is huge, even if you miss it by only a few frames.
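For reference, the bucket arithmetic above can be written down explicitly. The ~2ms of MR overhead per app frame is the value measured earlier for the 45 FPS bucket; as explained above, the overhead grows at lower buckets, so this is a simplification:

```python
import math

def locked_fps(app_frame_ms, mr_overhead_ms, native_hz=90):
    total_ms = app_frame_ms + mr_overhead_ms            # effective frame time with MR enabled
    divisor = max(1, math.ceil(total_ms / (1000.0 / native_hz)))
    return native_hz / divisor                          # 90, 45, 30, 22.5, ...

print(locked_fps(22.2, 2.0))   # 45 FPS app -> 24.2ms/frame -> locked to 30 FPS
print(locked_fps(19.6, 2.0))   # 51 FPS app -> 21.6ms/frame -> locked to 45 FPS
```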
Q: I find motion reprojection is less efficient than no motion reprojection.
A: When you run your application with motion reprojection and fall into the 45 FPS budget, your application is rendering at 45 FPS. However, with motion reprojection extrapolating the missing frames, your effective frame rate is 90 FPS. If you think of efficiency as “how much work it takes to hit a given frame rate”, then here is an example with actual numbers I’ve just measured:
- Without MR, app CPU/GPU = 21/21ms, FPS is 45. Your frame efficiency is 2.1 frames per millisecond of CPU/GPU time.
- With MR, app CPU/GPU = 16/27ms, FPS is 90 after reprojection. Your frame efficiency is 5.6 frames per millisecond of CPU time and 3.3 frames per millisecond of GPU time.
With MR, you are 1.5x more GPU-efficient, and 2.6x more CPU-efficient.
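For completeness, the arithmetic behind these ratios:

```python
without_mr  = 45 / 21   # ≈ 2.1 frames per ms of CPU time (and of GPU time)
with_mr_cpu = 90 / 16   # ≈ 5.6 frames per ms of CPU time
with_mr_gpu = 90 / 27   # ≈ 3.3 frames per ms of GPU time
print(with_mr_cpu / without_mr)   # ≈ 2.6x more CPU-efficient
print(with_mr_gpu / without_mr)   # ≈ 1.5x more GPU-efficient
```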
Q: Why is motion reprojection bad on my AMD GPU?
A: This is because of the motion estimation phase of the algorithm, which relies on the video encoder block of the GPU. Pre-RDNA GPUs (like the RX 5000 series) do not support motion estimation on large blocks, so we must fall back to small blocks, which means we must downscale the input images dramatically (think 5x smaller than the headset’s resolution), and this loses a lot of detail. RDNA GPUs (RX 6000 series) support large blocks, however they are 2-3 times slower than Nvidia GPUs for motion estimation, which leads to missed latching in the LSR thread and results in image warping or unstable frame rates.
Q: What is the motion reprojection setting in Flight Simulator VR settings doing (“Depth and Motion”)?
A: Short answer: it does nothing today.
Long answer: it implements something called application-assisted motion reprojection, also called AppSW by Oculus. It requires your OpenXR runtime to implement a feature called XR_FB_space_warp, which as of today, no runtime implements (not even the Oculus one - the only implementation of AppSW today is for native Android apps running on the Quest 2). When/if this feature is implemented by vendors, it will allow MSFS to pass its own motion vectors (computed during rendering for TAA, DLSS, motion blur, etc.) to the motion reprojection process. This will make it possible to 1) remove the need for motion estimation (which is mostly done on the video encoder anyway, so it will not affect performance dramatically) and 2) use high-quality motion vectors for motion propagation (which can increase quality).
The “Depth”-only setting enables the game to pass depth information which is used on WMR for better spatial reprojection (when motion reprojection is off - this setting has no effect when motion reprojection is used). I can’t speak for other vendors, but I suspect they can also do better spatial reprojection with this setting. As for Oculus, I have not verified this information, but the claim is that depth information will also enable ASW 2.0 if you use motion reprojection.