Modern CPU multithreading implementation

However I think much of this data gets assembled in vram. I’m no DX cuda programmer but can imagine the sequencing of downloaded content into mainthread must need at least some built in latency.