Complete System Crash - Help!

Howdy. I recently upgraded and rebuilt my system. I’ve been having a heck of a time getting back to a stability point - the sim keeps causing my whole system to shut down, and when I chase the shutdown back to Event Viewer, I get Kernel Power Event ID 41 as the only Critical line item.

Specs:
Ryzen 9 9900
MSI Tomahawk X870 MAG mobo
MSI RTX 3080 SUPRIM X 10GB
64GB (2x 32GB) DDR5 RAM with EXPO enabled
3x M.2 SSDs
NZXT Kraken Elite 360 AIO
850W Gold PSU (not sure specifics)

Steps to reproduce:

  1. Open MSFS from desktop icon
  2. Select any flight (any aircraft/airport starting place)
  3. Observe crash - sometimes before a flight even loads, sometimes randomly during a flight. Several hours ago, I had a crash after landing on a short flight; just a few minutes ago, I had a crash before I even got to hit Fly and load the flight.

I have NZXT Cam watching my temps, and right up until the crash, no component is seeing temps >80C, so I’m inclined to believe I’m not getting a thermal shutdown but something else. I see the system crash most often when using MSFS, so I am posting here first, but it happens several hours into an AIDA64 run with some regularity.

I found Kernel Power Event ID 41 | Microsoft Learn, which suggests checking the BugcheckCode and PowerButtonTimestamp values in the Event Viewer entry. I did so, and both are 0, which leads me to think that I may need to upgrade my PSU. Before I go that route, though, I wanted to see if anyone else has experienced this and been able to resolve it any way other than buying a new PSU.
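
For anyone following along, the check on those two values can be sketched in a few lines of Python. This is only a rough sketch: it assumes the event was copied out of Event Viewer's XML view, and the names `classify_event_41` and `SAMPLE` are made up for illustration. The sample event is hand-written, not taken from a real log.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML (Event Viewer > Details > XML View).
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def classify_event_41(event_xml: str) -> str:
    """Rough triage of a Kernel-Power 41 event from its EventData fields."""
    root = ET.fromstring(event_xml)
    data = {
        d.get("Name"): (d.text or "0").strip()
        for d in root.findall("./e:EventData/e:Data", NS)
    }
    # Assumption: both values appear as decimal integers in the XML view.
    bugcheck = int(data.get("BugcheckCode", "0"))
    power_button = int(data.get("PowerButtonTimestamp", "0"))
    if bugcheck != 0:
        return f"stop error (bugcheck 0x{bugcheck:X})"
    if power_button != 0:
        return "power button held down"
    return "no bugcheck, no button press: hard hang or sudden power loss"


# Hand-written sample event with both values at 0, as described above.
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><EventID>41</EventID></System>
  <EventData>
    <Data Name="BugcheckCode">0</Data>
    <Data Name="PowerButtonTimestamp">0</Data>
  </EventData>
</Event>"""

print(classify_event_41(SAMPLE))
```

Both values at 0 lands in the last branch, which narrows things to a hang or a power loss but does not by itself say which component is responsible.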

Hi,


Important Note: The following recommendations are based on a similar issue I experienced with my system, though not in MSFS but in Counter-Strike 2 (CS2). When I would load into a match, right after the map finished loading, the game would crash in a way that sounds similar to what you’re describing. I noticed this issue began right after I upgraded my RAM. Upon investigation, I realized the new RAM had a higher frequency than what my motherboard supported. After replacing the RAM with a version that matched my motherboard’s specifications, the problem was entirely resolved.


My System Configuration for Reference:

  • Processor: Intel Core i7-12700K
  • Motherboard: Asus ROG STRIX Z790-H GAMING WIFI
  • Memory (RAM): CORSAIR VENGEANCE DDR5 RAM 32GB (CMK32GX5M2B6400C32)
  • Graphics Card: NVIDIA GeForce RTX 3090 Ti 24GB
  • Power Supply: ASUS ROG THOR 1000W Platinum II

Recommended Troubleshooting Steps

1. Disable EXPO:

  • It is known that enabling EXPO can sometimes cause instability. Disable EXPO and run MSFS to see if the issue persists.

2. Lower RAM Frequency:

  • If your RAM is set to a high frequency (e.g., 5200 MHz), try reducing it to a lower frequency, such as 4800 MHz, and monitor system stability.

3. Check RAM Compatibility for AMD:

  • If the system was built from individual components, make sure that your RAM is certified for AMD processors and motherboard compatibility. Many motherboards have a QVL (Qualified Vendor List) of compatible RAM and supported frequencies. Verify if your RAM modules are listed there.

4. Monitor Critical Events in Event Viewer:

  • Continue monitoring the Event Viewer and check if the same errors appear after the suggested adjustments.

Power Supply Consideration

The 850W PSU might be at its limit for a high-performance system, especially when running intensive GPU or CPU tasks. A PSU of at least 1000W, such as the ASUS ROG THOR 1000W Platinum II, may provide better stability during peak loads.
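
One rough way to sanity-check PSU headroom is to add up estimated component draw and then allow for brief GPU power spikes. The sketch below is only that, a sketch: every wattage and the spike multiplier are placeholder assumptions, not measurements of the system in this thread, and `psu_headroom` is a made-up helper name.

```python
def psu_headroom(psu_watts, sustained_draw, gpu_key, transient_factor):
    """Return (sustained_margin, transient_margin) in watts.

    sustained_draw: dict of component name -> estimated sustained watts.
    gpu_key: which entry in sustained_draw is the GPU.
    transient_factor: assumed multiplier for short-lived GPU power spikes.
    """
    sustained = sum(sustained_draw.values())
    gpu_watts = sustained_draw[gpu_key]
    # Swap the GPU's sustained figure for its assumed short-lived peak.
    peak = sustained - gpu_watts + gpu_watts * transient_factor
    return psu_watts - sustained, psu_watts - peak


# Placeholder numbers for illustration only; substitute your own estimates.
draw = {"gpu": 320, "cpu": 160, "everything_else": 80}
sustained_margin, transient_margin = psu_headroom(850, draw, "gpu", 1.5)
print(f"sustained margin: {sustained_margin} W")
print(f"margin during an assumed GPU spike: {transient_margin} W")
```

A small or negative margin during spikes would point toward the PSU. A comfortable margin does not rule it out, since an aging or faulty unit can still trip under load.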


If the issue persists, please provide further observations so we can explore alternative solutions.

May we assume you updated your system BIOS and your chipset drivers?

Hi - yes, every driver is completely up to date, as is the BIOS. That was one of the first things I made sure of when things started crashing.

Just spitballing here…
You said you built the system, and that CPU temps don’t rise above 80°C.

That seems high, especially with a 360mm AIO.
Are you graphing the temps with an app like HWINFO or Aida64?
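
If your monitoring app can log to CSV, the last few samples before a shutdown are usually more telling than a glance at the live readout. Here is a minimal sketch of summarizing such a log. The column names and the sample data are hypothetical, since each tool uses its own headers and units, and `summarize_log` is a made-up helper.

```python
import csv
import io


def summarize_log(csv_text, last_n=5):
    """Per-column peak and mean of the final last_n samples from a sensor log."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    summary = {}
    for column in rows[0].keys():
        values = []
        for row in rows:
            try:
                values.append(float(row[column]))
            except (TypeError, ValueError):
                # Skip timestamps and other non-numeric cells.
                continue
        if values:
            tail = values[-last_n:]
            summary[column] = {
                "peak": max(values),
                "tail_mean": sum(tail) / len(tail),
            }
    return summary


# Hypothetical log; real tools use their own column names and units.
LOG = """time,cpu_temp_c,gpu_temp_c
12:00:01,62,58
12:00:02,71,60
12:00:03,80,61
12:00:04,76,63
"""

result = summarize_log(LOG, last_n=2)
for sensor, stats in result.items():
    print(sensor, stats)
```

If the tail readings are flat and well under the limits right up to the final sample, a thermal shutdown becomes less likely.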

I think 80 was the high point before the cooler kicked in, but yeah, I built the system. I have run an AIDA64 test and gotten several hours into it with sustained temps around 75. Eventually, AIDA causes this same crash, though.

I think I can still improve my cooling, and maybe I should. Right now, the radiator is mounted at the top of the case, with the fans sitting above the radiator (between it and the top of the case) and pushing air out. I could realize additional benefits by moving the fans below the radiator so they blow through it. Any other suggestions?

I have three 120mm Noctua fans up front pulling air into the case, the radiator on top (as discussed), and two Noctua 120s on the back pulling air out. I’m curious how big a difference moving the radiator fans will make.

I suspect that the other poster above your first post may be onto something, though. I have EXPO enabled on my RAM, and Windows sees a speed of 6000 MT/s. When I check the QVL for my board, I find that my RAM has an SPD speed of 5600, vs the 6000 Windows sees with EXPO enabled. I am not home to try it right now, but might this whole thing be caused by the RAM being clocked higher than the supported maximum?

Could definitely be the RAM speed.
I bought DDR5-6400 RAM on the QVL, and it ran perfectly at that speed using the XMP profile. Of course I decided I needed to tweak a little extra performance, so I bumped FCLK from the default 2000 to 2200. Big mistake. The computer wouldn’t boot, and I had to reflash the BIOS.

So yeah, RAM speed/timings are something of a tightrope.

As for the radiator… I think having the fans underneath and pushing air up and through the top-mounted radiator is better than having them pull the air out from the top. That’s how I have my 280mm radiator on a custom loop cooling my GPU. The CPU is air-cooled.

My 7950X3D runs 50-60°C when flying. Part of the reason is I’m undervolting all the cores.
My 3090 Ti runs 60-65°C on the liquid cooling. It runs at 90-95% load.

Just curious - why are you undervolting? Is it just to keep the temps down, or is it some performance-related reason?

My understanding:

To get the maximum sustained core frequency, the system balances core temp and core voltage, limiting each to maintain stability at higher clock speeds. Higher voltage can cause lowered clock speed as core temps rise.

Undervolting keeps temps down, allowing higher frequencies for a given load.
Maybe someone who knows more about this subject will chime in.

Okay, I got home, turned off EXPO, reset the CMOS, and returned everything to complete bone stock. Been running AIDA64 stress tests component by component for the last several hours. The CPU passed with flying colors, staying at about 70C for 2.5 hours. The system memory ran for 1.5 hours with no hiccups, and so far the GPU is about an hour into a test with no issues. Going to try a comprehensive test of everything at once here shortly and see if I get a crash. If not, back to MSFS to try that again.

Promising, so far.. the longest AIDA test I’d been able to run previous to this set was ~15 mins, after which the whole system shut itself off.

You basically got the gist of it. Every processor will be stable at slightly different voltages, even if they are the exact same model number. Manufacturers deliberately set a slightly higher voltage to ensure all processors are able to hit their advertised speeds, even though they often can run on less. The vast majority of people don’t get into this kind of tuning, so it makes sense on their end to go for across-the-board stability. To test each and every processor for its own stable voltages would be prohibitively time-consuming and expensive.

Undervolting is basically getting free performance and efficiency gains if you are willing to put in the effort to do so yourself.
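
To put a rough number on that: dynamic power in a CMOS chip scales approximately with voltage squared times frequency, so a small voltage drop gives a disproportionate power (and heat) saving. The sketch below is a back-of-the-envelope illustration, not a model of any particular CPU. It ignores leakage power, the 5% figure is an assumed example, and `relative_dynamic_power` is a made-up helper.

```python
def relative_dynamic_power(voltage_scale, frequency_scale=1.0):
    """Dynamic CMOS power scales roughly with V^2 * f (leakage ignored)."""
    return voltage_scale ** 2 * frequency_scale


# Assumed example: 5% lower core voltage at the same clock speed.
ratio = relative_dynamic_power(0.95)
print(f"relative dynamic power: {ratio:.4f}")
print(f"approximate saving: {(1 - ratio) * 100:.1f}%")
```

That saved thermal budget is what lets the boost algorithm hold higher clocks for longer, provided the chip stays stable at the lower voltage.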

Just tried the sim again. Got into a flight, and bam, shut down again. I did notice that even the menus were laggy, and the pan-down when you first load up a flight was very slow.. but I didn’t even make it into the cockpit before the crash.
At this point EXPO is off, the RAM is running at its base speed, I reset CMOS again, unplugged and replugged every single component, and it’s still crashing.
It runs through every benchmark I throw at it and survived several hours of stress test loading with AIDA, but something about MSFS pushes it past some limit and crashes everything.

I think the only two possibilities now are GPU and PSU?

Hi @stmad12,
Does this happen if you run MSFS in safe mode?
(once into the sim, go to your Windows Task manager and terminate msfs.exe. Restart the sim and you’ll get the safe mode option.)

Have you tried running your system with half of your RAM sticks? If it still crashes, try the other half.

I hope you got your issues resolved.

I thought I’d follow up with a real-world look at my undervolted CPU encoding a 4K H.265 video using Handbrake. It’s not MSFS, but it shows the undervolted cores highly loaded, running near their rated boost frequency, while maintaining temps below the 89°C thermal limit. I think the power curve algorithm is limiting the boost frequency a little based on the temp limit. If I had not undervolted the cores, they would reach that thermal limit much sooner, and those max core frequencies would be lower.

For reference, AMD lists the 7950X3D frequencies as: 4.2 GHz Base / 5.7 GHz Boost.