New Video about Azure AI Speech

Just another one of their videos where you can barely understand the speaker as he talks about something that, by the time it is implemented, we will have moved on from.


Very good production in the video.
The quality of Azure AI Speech is very good.
The quality of the ATC itself (the very purpose of Azure AI Speech in MSFS) is questionable at best.

I may be overreacting, but MS may be using MSFS as a vehicle to appeal to a wider audience. This is first and foremost a flight sim, and then everything else. Any improvements are welcome of course.

Phraseology and instructions aside, the speech itself is really good in MSFS. Like shockingly realistic (most of the time). It just sucks to have so few voices and no real variation in accents.

Hopefully this means we can expect a new palette of voices in the future, particularly more localized ones with correct accents.


Understood every single word they said, you just have to listen. There are closed captions if you need them.


What I really want is an option to select the speed of the speech synthesis. The ATC is a bit slow.

I know that the technology supports it. It’s just a matter of implementation.
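
For anyone curious what "the technology supports it" could look like: SSML, the markup that Azure AI Speech and most other TTS engines accept, has a `prosody` element with a `rate` attribute. Here is a rough sketch that only builds the SSML string. The function name `atc_ssml`, the default voice name, and the +20% figure are my own placeholders, and I have no idea how MSFS actually talks to Azure internally.

```python
from xml.sax.saxutils import escape


def atc_ssml(text: str, rate_percent: int = 20,
             voice: str = "en-US-GuyNeural", lang: str = "en-US") -> str:
    """Wrap text in SSML that changes the speaking rate.

    rate_percent is relative: +20 means 20% faster, -10 means 10% slower.
    The voice name is only a placeholder for illustration.
    """
    rate = f"{rate_percent:+d}%"  # always carries an explicit sign, e.g. "+20%"
    return (
        f'<speak version="1.0" '
        f'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="{lang}">'
        f'<voice name="{voice}">'
        f'<prosody rate="{rate}">{escape(text)}</prosody>'
        "</voice></speak>"
    )


if __name__ == "__main__":
    print(atc_ssml("Cessna one five two, cleared to land runway two seven."))
```

A speed slider in the sim options would then only need to feed a number into something like `rate_percent`.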


Hope so. Flying in NZ and hearing an American accent? Nope, that won’t do lol


Indeed. It would be nice to have the option of adjusting the AI voice parameters a bit. I do find them a bit slow at times.

I feel you. It feels odd coming into an airport in Asia and having ATC talk to me in perfect American English. lol

That said, I used to play an MMO and one of our guys was Welsh (and coincidentally, an ATC as well). I could barely understand a word he was saying much of the time. With localized accents, ATC may be much harder to understand. Although it will be more realistic.

If you find a Welsh accent tough to understand, try Scouse or, even better, Scottish LOL (no offense to them, and I found it a really entertaining and edutaining challenge to learn to understand them).

If they can get ATC to stop saying ‘Cessna one hundred and fifty two’ or ‘Solo one hundred and three’ etc… that would be great.

Also, in FSX there was a 3rd party program that let you set a faster cadence for the ATC as well as shorten aircraft ID callouts. Just that alone made it much better than default.

100% agree. Global accents would be such a boon to this sim - it would really help with the immersion.

That was EditVoicepack. You could adjust the speed of the ATC voices.
It was especially good if you had Ultimate Traffic installed as well, as the airwaves were constantly active with other AI traffic talking and receiving ATC instructions.
Sometimes you had to be patient and persistent to get your instructions across.
In effect the ATC immersion was so much better, plus you could choose QNH or inches of mercury and get vectors to runways for ILS or visual approaches.


lol - that was it - and I remember a few times having to orbit / make wider base turns near KSAN until I could get a chance to confirm clearance to land…
I think I ended up with commercial traffic reduced to 15% and GA at 40% so I could get a word in.


EditVoicepack would also change the phraseology to conform to ICAO or FAA standards. It was not perfect of course, but I really thought that the MSFS ATC would be at least the level of EVP.

But it’s not, and on top of that, the architecture of the ATC seems closed to third parties. I know they want to make a showcase for Azure. But the functionality is so limited.


That was my expectation too - that the ATC would be scalable (tempo / localized accents / regional or national standards) and just way better right out of the box…
In fact, my first thought on experiencing MSFS ATC was that it was essentially a straight port of FSX ATC. There is a long, long way to go before it can match the visual attributes of this sim. Same with so many other key aspects of MSFS.


Yes, it did have that too, along with the other ATC functions you mentioned.

Actually yes, the other 3rd party software, Ultimate Traffic, would have you flying holding patterns if you had your traffic sliders up too far, but it was really immersive and the traffic had the proper liveries for the part of the world you were flying in.

In the air and at the airports it was truly immersive, built on real world airline flight plans.

Just for information, I have no affiliation with these developers, but they were very good. I just wish we had those 3rd party programs available for MSFS.

I think GSX are working on passengers and ground operations. However, I think the ground crews in MSFS are pretty good; they just need the passenger bus to ferry passengers to and from the terminal if there aren’t jetways available.

I also have a faint memory of EditVoicepack knowing the transition altitudes of different regions, but that might have been another third party mod. Either way, it was possible to change the TA from the FAA standard of 18,000 ft. Maybe that was UT? It’s been so long since I actually played FSX.

It does show that, apart from the actual pronunciation, the flexibility and functionality back then were largely on par with the current MSFS ATC, and in parts even better.

It’s not my intention to spoil the mood of this thread at all, but to point to areas that Asobo could still improve upon. The technical feat of having cloud-based speech synthesis integrated is one thing, but consistency and functionality will, I hope, grow with future patches.

Sometimes one of the available voices manages to pronounce a callsign correctly and another voice will not. Sometimes even the same voice will get the pronunciation right once and the next time just spell out every letter. It’s a little hard to fathom. Probably has to do with how the data is stored/transmitted between the sim and Azure?
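
Pure speculation on my part, but if the raw callsign (say "BAW123") is sent as-is, each voice is left to guess how to read it. SSML has a `sub` element whose `alias` attribute tells the engine exactly what to say instead. A small sketch of that idea: the `TELEPHONY` table is a tiny illustrative subset, `callsign_sub` is a hypothetical helper, and it assumes callsigns are a three-letter airline code followed by the flight number.

```python
from xml.sax.saxutils import escape, quoteattr

# Illustrative subset of airline codes and their radio telephony names.
TELEPHONY = {"BAW": "Speedbird", "DLH": "Lufthansa"}


def callsign_sub(callsign: str) -> str:
    """Wrap a callsign in an SSML <sub> with an explicit spoken alias.

    Assumes a three-letter airline code followed by the flight number.
    Unknown codes are returned unchanged (XML-escaped).
    """
    prefix, number = callsign[:3], callsign[3:]
    name = TELEPHONY.get(prefix)
    if name is None:
        return escape(callsign)
    # Separate the characters of the flight number so it is not read as one number.
    alias = f"{name} {' '.join(number)}"
    return f"<sub alias={quoteattr(alias)}>{escape(callsign)}</sub>"


if __name__ == "__main__":
    print(callsign_sub("BAW123"))
```

With the spoken form spelled out like this, every voice would at least be handed the same text to read.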

I totally agree with everything you said in your post. Let’s hope that these areas get addressed in future updates.

The ground crew is quite good. I just want an intuitive way to control them manually.

Bang for the buck, we have a good deal of overall functionality in the package of MSFS. It just needs some polish and enhancements.


Nice video. From a TTS developer perspective - I was involved in the distant past, ca. 25 years ago - I find it astonishing how this technology has advanced. These are understandable voices!! At the same time, I wonder why none of the synthetic voices in the video feature an Asian accent, while all of the human speakers in the video do have a heavy Asian accent… If you really want to be realistic, Japanese air traffic controllers speak Japanese-accented English… Dutch controllers speak Dutch-accented English… French controllers speak French-accented English for ATC… Now THAT would be the big challenge!! Have really different voices for different locations…

