Discussion: April 1st, 2021 Development Update

Also, I keep reading people throwing around the “highly accurate and detailed aircraft” quote from the game description.

Are people really saying that all of the default aircraft are not in any way “highly accurate”?

Did the literature say “100% accurate”? I think not. Have Asobo not said that they are continuing to improve them over time? I think so.

Nowhere did it state that “every single button in every single aircraft will work exactly as it does in the real world”. The cockpits are all “highly accurate” in the way they look, and the functionality is coming from both Asobo and third parties.

I wonder how many people saying “this simulator is rubbish” have also posted constructive posts in the bugs/wishlist sections requesting these missing features?

1 Like

The point of such marketing statements is exactly that - sounding good while staying vague, so that no one can claim it to be wrong.
Everyone will have a different expectation of what “highly accurate” means.

I for one find the models and textures extremely accurate, the flight models vaguely accurate and the stock FMC/Garmin implementation pretty far from accurate.
And this even varies a LOT from aircraft to aircraft. So from my point of view, “highly accurate” overstates what is actually there in sum, but I know that such claims need to be taken with a grain of salt.

1 Like

Absolutely, you’re right - there is a lot of personal interpretation and expectation with anything like this, isn’t there?

I just can’t agree with the (few) people claiming “the aircraft are in no way highly accurate” and that Microsoft have “totally misled people”.

With the ongoing improvements, and the mods by groups like Working Title, it’s getting better across the board all the time.

I should be clear, there is a lot to be desired, and we should defo give feedback on what we want fixed and added. I just don’t think Microsoft / Asobo were sat there in 2019 saying “tehehe, let’s string them all along, they’ll never notice…”

I think they simply didn’t live up to the hype that the trailers and marketing promised.
We can all agree that none of us expected the weather to be as inaccurate as it was upon release.
We did not expect nav data or even entire airports (EDDS) to be missing.
Etc.

As for the aircraft, one could of course expect that so many different aircraft and their systems can’t all be as in-depth as one would hope.
But I personally was VERY disappointed in the bad shape the premium/deluxe aircraft were delivered in, especially since they (still) lack the option for mods to fix them. Had I known these facts, I would have bought the standard edition, and that means I was tricked by the marketing (even if I wouldn’t say I was lied to, the result is the same).

But then again, it’s not a big loss and if that contributed a little to improvements on the platform, I’m fine with it.

1 Like

Thanks for your detailed answer! A lot of assumptions are made in the forums about what kind of test methodology Asobo uses (or does not use). By now I would like to know how exactly they are testing. They have no obligation to tell us, but I’m curious about it nevertheless.

The sim is visually stunning. The aircraft are visually highly detailed. The 152 is great (I’ve never flown with a G1000 or flown an Airbus, so I can’t comment). It brought me back to simming after 10 years. It cost $60. I don’t expect perfection.

1 Like

Hi all, I thought I’d take a few minutes to give an updated response to my initial post that seemed to generate quite some conversation.

  1. I’m very happy that this is the “beginning” of using beta testers rather than a one-off first attempt; that is absolutely a step in the right direction.

  2. Despite @Steeler2340’s comments, I’m pretty aware of what the limitations of automated testing are, and how much time “sophisticated” tests can take to set up properly. But let’s not overestimate the work effort here either. I’m talking about tests to find performance issues and memory leaks. I really think it would be pretty easy to automate spawning aircraft at 1,000 random airports (a mix of very large, mid-size, and small / grass / bush strips), collect performance data under each of the predefined weather conditions, and determine whether performance has dramatically changed from prior releases (see the rough sketch after this list). Marry these up with a couple of hundred long-range autopilot flights and at least the most obvious performance issues and leaks would be found. Yes, this is going to require a group of computers with various hardware (assuming that, being Microsoft, you can’t simulate all of this in Azure?), but this type of testing should be the bare-bones minimum before ever putting out a release.

  3. More advanced automated testing should be continuously added.

  4. I still overall do NOT feel like this game is “for simmers”. Here’s one example - without completely mangling your input device characteristics (“tweaking” curves, killing extreme rudder, etc.) you can’t really even keep a plane on the centerline during takeoff. I’d forgotten how horrible ALL of the default planes (at least the GA planes I fly) are until I bought Just Flight’s Piper Arrow III. They recommend resetting all the axes to default - and you know what, the plane flies pretty well like that. Try that in any of the “out of the box” planes and you’ll end up in the trees, probably within a few hundred feet.
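To make point 2 concrete, here’s a rough Python sketch of the kind of sweep I have in mind. To be clear, spawn_at_airport(), measure_average_fps() and load_baseline() are placeholder names I made up for illustration - they are not real SDK calls - but the structure is the point:

```python
# Rough sketch of an automated performance-regression sweep.
# spawn_at_airport(), measure_average_fps() and load_baseline() are
# hypothetical placeholders for whatever hooks the real test harness provides.
import random
import statistics

AIRPORT_POOL = ["KJFK", "EGLL", "EDDS", "GRASS_STRIP_001"]   # mix of sizes
WEATHER_PRESETS = ["clear", "few_clouds", "rain", "snow", "storm"]
REGRESSION_THRESHOLD = 0.90  # flag anything below 90% of the baseline FPS


def run_sweep(num_spawns: int = 1000, seed: int = 42) -> list[dict]:
    """Spawn at random airports under each weather preset and record average FPS."""
    rng = random.Random(seed)  # fixed seed so the sweep is reproducible
    results = []
    for _ in range(num_spawns):
        airport = rng.choice(AIRPORT_POOL)
        for weather in WEATHER_PRESETS:
            spawn_at_airport(airport, weather)             # hypothetical hook
            fps_samples = measure_average_fps(seconds=60)  # hypothetical hook
            results.append({
                "airport": airport,
                "weather": weather,
                "avg_fps": statistics.mean(fps_samples),
            })
    return results


def find_regressions(results: list[dict]) -> list[dict]:
    """Compare this run against the stored numbers from the previous release."""
    baseline = load_baseline()  # hypothetical: {(airport, weather): avg_fps}
    return [
        r for r in results
        if r["avg_fps"] < REGRESSION_THRESHOLD * baseline[(r["airport"], r["weather"])]
    ]
```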

MSFS is a visual treat to be sure - but I’m (much) more interested in having properly simulated airplanes, and I feel like most of those issues aren’t being addressed (at all). Come on Asobo / Microsoft - clean up your airplane models enough so I can steer them on the runway…

Scott

2 Likes

You apparently completely misunderstood my point: automated tests are not “limited”.

Read again what I wrote about your suggested “fly around and track memory leaks” approach.

Beta tests are a revolution, so congrats to the dev team for such a positive approach towards a much better update experience.

1 Like

These are not tests. Let’s hope they already finished all the tests for W4 last month! These 300 beta participants are all customers. User reviews, or - if you want to give it importance - acceptance tests like these 300 users now help with, represent an end stage just before a major release. A beta is not meant to get bugs or memory leaks out; those should be out before users touch the product. Beta users only get a separate milestone version planned to be released a little earlier… and they’ll fly as they are used to flying. And they report their problems to Zendesk, beta section (or somewhere, dunno).

These beta reviews provide log information from a cross section of users (e.g. hardware and FPS), and they are good watchdogs for issues in a major release, e.g. the update installation procedure. Certain issues that pop up with these 300 users could make it vital to postpone the W4 update. I’m very pleased with Asobo making the effort of setting up / opening up server equipment and contacting all these people, in order to assure us a happy, fully updated W4 flight over the Netherlands.

2 Likes

Hi @Steeler2340, I really do feel like you have some useful information to share… but I’m completely confused. I’ve re-read your post. Somehow you switched from “automated testing” to “integration testing”. Those two are not the same thing, so I’m having issues trying to follow your train of thought.

And I guess I’d still ask: if the goal of this beta is to test performance (which MS/Asobo have clearly stated in the post about the beta), I fail to understand why you think a simple automated test - spawning aircraft at certain airports / locations, under different lighting and weather conditions, and measuring performance before and after the beta software (across multiple computers with differing hardware) - is a bad idea. To me, this is bare-bones automated testing that should be done for every release.

P.S. And if you understand the purpose of the beta to be testing the server infrastructure… I’m certain you are also aware there are tons of software suites out there that let you record player activity and “play it back” to stress-test servers. I don’t feel this is the purpose of the beta - only 300 users are hardly going to stress the servers (one would hope) - but this is yet another type of automated testing that should be done prior to releasing to beta testers.
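Just to illustrate the “record and play it back” idea with a rough Python sketch (the staging URL and the recorded-session file format are invented for illustration; real load-testing suites do this far more thoroughly):

```python
# Minimal sketch of replaying recorded client requests against a test server.
# The server URL and the "recorded_session.jsonl" file format are made up.
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TEST_SERVER = "https://staging.example.com"  # hypothetical staging endpoint


def replay_request(entry: dict) -> float:
    """Replay one recorded request and return its response time in seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(TEST_SERVER + entry["path"], timeout=30) as resp:
        resp.read()
    return time.monotonic() - start


def replay_session(path: str = "recorded_session.jsonl", concurrency: int = 300) -> None:
    """Replay a whole recorded session with N simulated concurrent players."""
    with open(path) as f:
        entries = [json.loads(line) for line in f]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        timings = list(pool.map(replay_request, entries))
    print(f"{len(timings)} requests, worst response time {max(timings):.2f}s")


if __name__ == "__main__":
    replay_session()
```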

Thanks,
Scott

Good morning, @SPowell42

First of all: your scenario of “flying around (spawning at different locations, with different characteristics (photogrammetry, high geometry count, high count of AI traffic, …)) and detecting performance issues (and/or memory leaks)” is of course doable. Automated tests are not “limited” in that respect (but let’s not dive into Turing completeness and the like ;))

And yes, I was using the term “integration test”, fully aware that I was opening a can of worms here: there are indeed multiple levels at which automated tests can be applied, starting from unit tests, integration tests, service tests, UI tests, … and depending on who you ask the same tests are sometimes called differently - but typically the following “test pyramid” is a generally accepted “test model”: The Practical Test Pyramid

But again, this is a flight simulator forum and I did not want to dive too deep into automated testing, but you are completely correct: I should have used the term automated test, let’s call it a “family name” for all the different tests that exist.

My actual point is: while your test scenario seems absolutely reasonable, in practice it is of little help. Why? Let’s focus for a moment on when you are going to run this test. Being “automated”, it should be run “as often as possible”, ideally before a given code change is integrated into the “main branch” (or any “stable branch”).

Now here is problem number one: if your automated test runs for hours, it is completely impractical to delay every code change for the same amount of time. What if the test fails? The programmer needs to detect the problem, fix it, check in again… and wait another couple of hours. But more often than not you also get “false rejects”, because some test simply crashed (yes, welcome to reality™ - tests themselves are also “just code” and have bugs etc.), the integration server was rebooted, or whatever else Murphy’s Law dictates just before the “go” for release…

So the “solution” to problem number one is to run the automated tests (the long running ones) “overnight”, aka “asynchronously”. But what if tests have failed the next morning? Which of the dozens of changes that were checked in the day before (or even the day before that, in case the test was not running the previous night) is responsible? And the finger pointing game starts… sure: with (only) a dozen or so changes it is reasonable to expect to filter out the responsible change (not all changes are going to modify “shader code” or “geometry construction”).
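Even if you automate that filtering, e.g. with a bisection over yesterday’s commits - a rough sketch below, where run_overnight_perf_test() is a made-up stand-in for the hours-long test - every single probe still costs you those hours again:

```python
# Sketch of bisecting yesterday's commits to find the one that broke the
# nightly performance test. run_overnight_perf_test() is a hypothetical
# stand-in - and that is exactly the problem: log2(N) probes, hours each.
def find_first_bad_commit(commits: list[str]) -> str:
    """Binary-search chronologically ordered commits for the first bad one.

    Assumes the build *before* commits[0] passed last night's test and the
    build at commits[-1] fails it now.
    """
    lo, hi = -1, len(commits) - 1   # commits[lo] (virtually) good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if run_overnight_perf_test(commits[mid]):   # hypothetical, takes hours
            lo = mid    # still good: the regression came later
        else:
            hi = mid    # already bad: the regression is here or earlier
    return commits[hi]
```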

But what if a designer decided that “8K resolution textures simply look better”? And now due to limited VRAM the game engine is constantly swapping texture data from RAM to VRAM, creating a high load on the memory bus of the GPU? My point is: sometimes a silly change which has been done “far, far away from the problem hot spot” might be responsible.

And now you still need to go over all those dozens of changes from yesterday because a given performance test failed (and again, what if the performance test simply failed because Windows 10 decided to run an automated test (*) in the background? Sure, you might disable automatic updates on test machines, but that is just one example of the dozens of operating system processes that you may not be able to control 100%…).

(*) UPDATE: Hi hi, I was so into testing… of course I meant “update” here. But I leave my typo uncorrected here, for the laugh of it (and who knows what Windows 10 is really doing in the background, after all ;))

Which leads us to the bigger problem number 2: let’s assume you made a “test flight” between some well-defined locations. And now the test claims that “the average frames per second is 20% lower than it should be”. Now what? Is it constantly lower? At one given “hot spot”? And most importantly: why? Is it the brand new 8K livery texture the designer team integrated? The new LOD algorithm which has a problem with certain geometry data? The new shader code which renders depth of field effects? The new autopilot code? The new… you get the idea!

Such a “fly for one hour” test would tell you almost nothing about the root cause! And whenever you would fix a certain aspect (“maybe it was the 8K texture after all. Let’s try…”) you would have to run the same test for hours again! Just to figure out “no, it wasn’t the 8K texture. Let’s try disabling the autopilot this time…”.

In other words: an automated test shall:

  • be quickly reproducible (ideally in the order of milliseconds for unit tests, seconds for integration tests…)
  • test a well-defined aspect (= functionality, module, algorithm, …) where…
  • … the test input is as limited as possible and…
  • … the expected outcome is well defined

Which leads us again to the aforementioned test pyramid: unit tests test a very specific functionality, with a granularity as low as “per method (function)”. They usually run very quickly, and so you want to have a lot of unit tests. E.g. you would test the function which returns “triangles, based on surface points and their distance to the camera” (the “LOD algorithm”).
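A made-up example of what such a unit test could look like (select_lod_level() is a toy function I invented for illustration, not anything from the actual code base):

```python
# Toy unit test for a hypothetical LOD selection function: given the distance
# to the camera, it returns the mesh detail level to use (0 = full detail).
def select_lod_level(distance_m: float) -> int:
    """Toy implementation: 0 = full detail, higher numbers = coarser meshes."""
    thresholds = [500.0, 2000.0, 8000.0]
    return sum(1 for t in thresholds if distance_m > t)


def test_select_lod_level_boundaries():
    assert select_lod_level(100.0) == 0      # close to the camera: full detail
    assert select_lod_level(1000.0) == 1     # mid range: first reduction
    assert select_lod_level(20000.0) == 3    # far away: coarsest mesh
    # Runs in microseconds, with one well-defined input and output - so when it
    # fails, you immediately know which function broke.
```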

An “integration test” would perhaps simply test whether a given glass display would be rendered correctly etc.

But in any case: every automated test needs to have a very specific test data input and a well-defined output, so that one can quickly reproduce the test and immediately know “which system is affected”. E.g. when the “LOD algorithm” fails, you know that it can’t be because of the 8K textures, because the livery textures are not an input to the LOD algorithm (and the LOD algorithm can’t even access the livery textures etc.).

That’s what I meant by saying that “flying around for hours and testing performance regressions is not very practical”.

BUT: There is one more test category, which is called “smoke test” (that term really comes from plumbing): there you do not test a specific functionality, but rather “the entire system” (or “the overall behaviour” etc.). So I agree that your test scenario could certainly fall into that category…

2 Likes

The term “fit for purpose” comes to mind. There are many ways this game is not fit for purpose, and the creators have a duty to make it right. The complexity of that task is not the customer’s responsibility.

It fits my purpose. Now what? See what I am getting at? :wink:

Not even a little bit my good fellow.

Couldn’t have said it better - you are bang on the money with this and your whole comment. Ignorance, entitlement and arrogance are the three words that spring to mind when I think of the flight sim community as exemplified on this forum.

2 Likes

I think he means that “purpose” (specifically being fit for) is in the eye of the beholder/user/purchaser etc.

But in this case it’s not. This is a flight simulator and it simulates flight and the systems associated with flying. If the game states that a given plane can fly an ILS approach, and upon activation of said feature the plane turns around and flies in the opposite direction, then, well, you have your answer. It’s not subjective; it is demonstrably broken. Asobo have burned far too much good will with stupid mistakes like this, all of which could have been avoided with a more robust testing procedure and a more nimble update procedure.

1 Like

And where exactly do you have a written and signed contract with Asobo listing the promised features? Remember: you purchased a game, for entertainment purposes only. And even if you are not entertained, in most cases you won’t get your money back (except within a certain window of time, perhaps).

Hint: Marketing / advertisement is not a contract / obligation :wink:

1 Like

Relevant consumer law is, however, and there are many such laws covering the sale of goods, which is why Steam and other stores have refund policies. This has diverted into an absurd tangent and it’s done now. The strawmen have all gone home.