ATC and Natural Language Processing - Speech-to-text - Speech synthesis - Voice Recognition

I can’t help but be a little disappointed with the functionality of the in-game ATC at present. I recognise that it’s perhaps not the simplest thing to implement, but I’m surprised to see they’ve used a lot of the legacy code for this area.

Having seen and used a lot of the Azure tech stack and what it can do, I feel it’s maybe a bit of a missed opportunity to showcase some of the language AI features - like being able to speak back to the ATC with a mic and have it run through Natural Language Processing rather than simply acknowledge with the keyboard etc.

Appreciate that for the ultimate in ‘realism’ there will always be VATSIM et al, and an AI is never going to replicate that feeling, but I think there’s definitely a space in the middle for simmers who want the higher level of realism that ATC provides, without committing to full-blown lifelike procedures and associated consequences.

I know it’s a base platform and I’m sure they will have aspirations/plans to build on it - has anyone seen any quotes etc. from Asobo on this area? Would be interested to see.

This would be a cool feature

At some point, it would enhance the realism of the game to be able to speak on frequency with air traffic control or on an uncontrolled CTAF using a gaming headset or real aviation headset, if you have the adapter.
Ideally, ATC would be able to perform some basic AI tasks, such as:

  • Interpreting some degree of loose phraseology as a human controller would (People can debate whether it would allow slang like “in the box” and “on the fish finder”, but my opinion is that if a real-life controller would accept (read: tolerate) it, it should be okay for use.)
  • Understanding variations of words/phrases (such as “Cessna 316TS” vs “Skyhawk 316TS” vs “6TS”, or “point” vs “decimal”, or “contact ground on 121.9” vs “contact ground .9”)
  • Understanding phrases that are moved around (e.g. “Climb and maintain 8000, turn right heading 210” versus “Turn right heading 210, climb and maintain 8000”)
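
As a rough illustration of the three points above, here is a minimal Python sketch of how spoken variants might be folded together before matching. Everything in it is an assumption made for illustration: the synonym table, the function names and the comma-based clause splitting are hypothetical and are not taken from MSFS or any existing ATC add-on.

```python
import re

# Hypothetical table of spoken variants folded into one canonical word.
SYNONYMS = {"decimal": "point", "skyhawk": "cessna"}

def normalize(utterance: str) -> str:
    """Lower-case, drop punctuation (except dots inside numbers), fold synonyms."""
    words = (w.rstrip(".") for w in re.findall(r"[a-z0-9.]+", utterance.lower()))
    return " ".join(SYNONYMS.get(w, w) for w in words if w)

def callsign_matches(spoken: str, full_tail: str) -> bool:
    """Accept the full tail number or its last three characters ("6TS")."""
    spoken, full_tail = spoken.upper(), full_tail.upper()
    return spoken == full_tail or spoken == full_tail[-3:]

def frequency_matches(spoken: str, assigned: str) -> bool:
    """Accept the full frequency or a ".9"-style shorthand of its tail."""
    return spoken == assigned or (spoken.startswith(".") and assigned.endswith(spoken))

def instruction_set(utterance: str) -> frozenset:
    """Split on commas so the order of the clauses does not matter."""
    return frozenset(normalize(part) for part in utterance.split(",") if part.strip())
```

With this sketch, the two orderings of the “climb and maintain 8000” / “turn right heading 210” instruction produce the same set, “6TS” matches “316TS”, and “.9” matches “121.9”. A real system would need far more than a lookup table, but it shows the kind of tolerance being asked for.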

Note: Text-based ATC right now (October 2020) needs a lot of work to be more realistic, so I’m not advocating for this right now, but it would go a long way to adding to the realism of the sim.
Also note: Yes, there are services like VATSIM, but there are a lot of rules for those services. This would be a nice way of not worrying about all of the rules. Or it could be a stepping stone to using services like VATSIM.

Edit: I agree with @Neo4316’s post, below, about information privacy when it comes to processing voice in the cloud:
https://forums.flightsimulator.com/t/atc-communication-using-voice-recognition/302025/26
I do hope that there would be an option for offline voice processing.

Just so you know, Pilot2ATC is pretty good. It works with MSFS and other sims. Voice input and output, plus fairly realistic ATC possibilities - not so much of a ‘branching tree’ structure, so you can ask for what you need when you need it, at least most of the time. It also has flight planning and you can add background chatter for a bit more immersion. The UI has quite a steep learning curve, but it’s a good deal more powerful than we’re likely to see in MSFS for some time to come.

Pilot2ATC is essential in my setup. Sometimes you don’t want to fly online, or you may know you’ll be pausing the sim.
It assigns SIDs/STARs, and you can request different runways or arrivals, vectors to final, alternate airports. It’s pretty in-depth, enough to have a learning curve to it.
I plan the flight in it; then you can export the files to all simulators, including MSFS, and import them in the sim.
It really adds an amazing layer to flight sim. And I don’t get paid to say this lol. But I can’t imagine flying without it.

When I try to export a plan for MSFS from P2A, it never comes out right. How do you get the two to work together so that both are working with the same flight plan?

I just use Pilot2ATC’s export and choose the FSX format.
In the MSFS planning screen it doesn’t show the waypoints correctly, but when I actually load the flight, the GPS or FMC loads all the waypoints correctly. Some kind of bug with the display of it, I guess.

Thanks. I’ll try that.

Thanks, guys. I saw you guys say it has a pretty tough learning curve. Is that because it’s not intuitive, or because there are tons of features?
Do you think it could be done better in the sim if Asobo were to implement this one day? (With the understanding that it might take several releases before that happens.)

Tons of features, mostly, and because it has to integrate several different things to be a functional system. It’s designed to look a bit like a glass cockpit screen, with a moving map in the middle, a flight plan to the right, etc. You can talk to ATC by voice, or you can select from a list of things to “say” that’s way larger than MSFS has. Plus there are several configuration screens. So there’s quite a lot to get to grips with before you get to say your first “Cessna 1234 request taxi, departure to the east”, but it’s not that hard, really, just a bit overwhelming when you see all the stuff in the UI.

I think Asobo could implement the voice ATC part by itself, given that they already have flight planning and a moving map, so in that sense a built-in system would be better than Pilot2Atc. But it’s no small feat to make a free-style and flexible voice-based system. They could definitely just add voice recognition to their present ‘menu’-based system, but it would feel a bit lame, I think. Once you add voice, people will expect to be able to say what they want to, not just recite one of two fixed options. That alone would still be a good feature, though, for those who just want a sense of immersion but don’t want the complexities of real-life ATC.

I heard that Microsoft acquired an exclusive license to the amazing general AI technology “GPT-3” some time ago. This got me thinking that a great use of this technology would be real-time voice interaction with ATC, or even your co-pilot. If Asobo or MS is listening, please consider this, as this game would provide an incredible test-bed to advance this kind of general artificial intelligence.

Actually, this doesn’t even need to come from Microsoft.

I was working on a project back in the FSX days where my program would take a screenshot of the ATC window, then use OCR (optical character recognition) to read the history and the options available. It would then listen to the user, decide whether what was said was complete and properly constructed, and, if so, pick the option in the ATC system that corresponded with what was said.
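
The last step described above, picking the menu option that corresponds to what was said, could be sketched roughly as follows. This is not the original project’s code: it assumes the OCR and speech-recognition stages have already produced plain text, and the function name `pick_option` and the 0.6 threshold are hypothetical choices. It uses only `difflib` from the Python standard library.

```python
from difflib import SequenceMatcher
from typing import List, Optional

def pick_option(heard: str, options: List[str], threshold: float = 0.6) -> Optional[int]:
    """Return the index of the menu option most similar to the heard text.

    Returns None when nothing is similar enough, so the caller can ask the
    user to repeat instead of sending a wrong response.
    """
    best_index, best_score = None, 0.0
    for index, option in enumerate(options):
        # ratio() is a similarity score between 0.0 and 1.0.
        score = SequenceMatcher(None, heard.lower(), option.lower()).ratio()
        if score > best_score:
            best_index, best_score = index, score
    return best_index if best_score >= threshold else None
```

Plain string similarity is a crude stand-in for real intent matching, but it shows where a threshold helps: a low score means “don’t press anything” rather than “press the closest thing”.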

Two issues killed the project. #1, OCR is very CPU-intensive, and with FSX and the hardware of the era, doing it on the same PC was just not going to be viable in real time with good FPS and a quick response from my program. #2, in busy traffic scenarios, there was no way to stop the default AI from transmitting while the user was speaking, so the menu options were sometimes no longer available by the time the program decided to send the selected response.

To fix this, I needed two things that, 16 years later, have still yet to come.

#1 The ability to tell the AI ATC engine that the user is transmitting, so that it pauses any transmissions from ATC or traffic, as well as the ability to clear that flag when the user has stopped transmitting.

#2 A series of variables that would list the chat history from ATC as well as the menu options available.

#2 would eliminate OCR as a necessity and improve responsiveness to the point where it would be a great experience (at least it was 7/10 while developing, when the ATC wasn’t blocking my transmissions).
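
To make the two requests concrete, here is a toy Python model of a frequency with a “user is transmitting” flag and readable history/menu state. It is purely a sketch of the behaviour being asked for: the class, its fields and its methods are all hypothetical, and nothing like this is exposed by the sim as far as this thread knows.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Frequency:
    """Toy model of one radio frequency; every name here is hypothetical."""
    user_transmitting: bool = False
    history: list = field(default_factory=list)       # messages already broadcast
    menu_options: list = field(default_factory=list)  # responses currently on offer
    _held: deque = field(default_factory=deque)       # messages waiting for a clear channel

    def transmit(self, message: str) -> None:
        """ATC or AI traffic call: broadcast now, or hold while the user talks."""
        if self.user_transmitting:
            self._held.append(message)
        else:
            self.history.append(message)

    def set_user_transmitting(self, active: bool) -> None:
        """Set or clear the flag; clearing it releases held messages in order."""
        self.user_transmitting = active
        while not active and self._held:
            self.history.append(self._held.popleft())
```

An add-on would set the flag when the push-to-talk key goes down, read `history` and `menu_options` directly instead of using OCR, and clear the flag when the key is released.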

Sounds like a very cool project. Shame you couldn’t get it progressed. What I was keen on is having a flexible conversation with an A.I., where it didn’t matter if I used a prescribed statement. I want the A.I. to disambiguate meaning by having prior knowledge as a reference point, enabling me to characterise a request or response in a more natural and spontaneous manner. I’ve heard that GPT-3 comes very close to achieving that. You should also be able to banter about things too.

If Microsoft worked with their GPT-3, since they have it now, it could be a real showpiece for them to have a really nice ATC built with it here. 🛩️ 😸

The idea of my project was that it would parse the ATC directives into callsign, clearance type, heading/route, altitude, frequency and squawk. Based on what was received, you’d have to make a call that covered all of the pieces, but you could do it in any order with whatever phrases you wanted, within reason.

A made-up ATC clearance:
Cessna 49C, cleared to KIAH as filed. On Departure Climb and maintain 3000. Departure will be 119.35. Squawk 1123.

to Cessna 49C - that’s me, so start gathering what needs a readback
cleared to KIAH - “cleared” pertinent; “to IAH”, “to KIAH”, “to Houston Intercontinental” could all be used
route as filed - pertinent
climb and maintain 3000 - pertinent
departure 119.35 - pertinent
1123 - pertinent

So, you could read back
“Cessna 49C cleared as filed. On departure climb to 3000. Departure on 119.35. 1123 in the box.” and it would choose the “acknowledge clearance” ATC option.

It would ignore the extra bits that aren’t pertinent and just make sure that “cleared as filed”, 3000, departure 119.35 and 1123 are all in your readback. If you’re missing something, I’d like it to tell you “readback incorrect” and re-issue the instruction, which would need an option in the ATC interface backend to trigger that result.
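
The readback check described above could look something like this minimal Python sketch. It is not the original project’s code: the function names and regular expressions are assumptions written for this one example clearance, and the callsign check is left out for brevity.

```python
import re

def required_items(clearance: str) -> dict:
    """Pull the pertinent readback items out of a clearance string."""
    text = clearance.lower()
    items = {}
    if "as filed" in text:
        items["route"] = "as filed"
    # Hypothetical patterns, tuned only for clearances shaped like the example.
    patterns = {
        "altitude": r"maintain (\d+)",
        "frequency": r"\b(\d{3}\.\d{1,3})\b",
        "squawk": r"squawk (\d{4})",
    }
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            items[name] = match.group(1)
    return items

def missing_from_readback(clearance: str, readback: str) -> list:
    """List the pertinent items absent from the readback, ignoring word order."""
    text = readback.lower()
    numbers = set(re.findall(r"\d+(?:\.\d+)?", text))
    missing = []
    for name, value in required_items(clearance).items():
        found = value in text if name == "route" else value in numbers
        if not found:
            missing.append(name)
    return missing
```

Run against the example clearance and readback above, `missing_from_readback` returns an empty list, which is when the program would press “acknowledge clearance”. Drop the squawk from the readback and it returns `["squawk"]`, which is where a “readback incorrect” option would be needed.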

Come on Asobo…do me a solid and let me have at it…let me fill in that blank that so many of us want…the ATC is decent enough (when not telling you to maintain cruise after top-of-descent or even higher) and all you want to do is control it accurately with your voice.

Options exposed in SimConnect
Chat history exposed in SimConnect (could even be limited to player aircraft to save space)
Flag to indicate user is transmitting
Flag to indicate user readback incorrect
Reissue clearance/instruction based on readback correct flag

Gimme this help and let’s enjoy default ATC once again…

Pilot2ATC does look good, but I want the “next-gen” … a conversation with an A.I. tower and maybe also a co-pilot. It’s a fairly limited domain of knowledge when compared to the set of knowledge required for general intelligence, so I reckon it would make a great case for GPT-3. Don’t care how long it takes … impress us all with an amazing “first-of-its-kind” chat with A.I. agents in a game. If they decide to do something that is just antecedent / consequent based flow … I will be disappointed, but it would be a nice feature.

Pilot2ATC is a good program. But it doesn’t interact with AI. There is a request for AI control: AI TRAFFIC control in SDK

This technology is getting fairly stable now, and there are several models that could be adapted to MSFS. It would certainly be a realism improvement to be able to just repeat back to ATC the instructions, or to query them (“say again”).

Take a look at Pilot2ATC

VoiceAttack $10