It seems Microsoft have the tech now to make this pretty easy to do, so please let us do it!
“Since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker. To mitigate such risks, it is possible to build a detection model to discriminate whether an audio clip was synthesized by VALL-E. We will also put Microsoft AI Principles into practice when further developing the models.”
The Skynet Terminator used this technology to sound like Sarah Connor when calling John Connor on the phone, but the tables quickly turned when Cameron used the same technology to sound like John Connor to flush out the Terminator.
Seems to me there is a simpler solution. I fly IL2 Great Battles online on a server called Combat Box, where they have implemented SRS and related bots to simulate airfield tower and combat air control in an extraordinary manner. I can speak directly to these bots to request directions to targets and airfields, get basic flight following, request callsigns, communicate with other live pilots, and a whole lot more. Responses come in natural voices, male and female, and the bots interact very well for the most part, as long as one’s diction is clear. I believe DCS also uses a similar setup. This really adds to the immersion and has an exceptionally small footprint on your system. In MSFS 2020 this seems to be nonexistent unless you pay for some 3rd party app that may or may not play well from your community folder and is less intuitive than SRS. Maybe I am oversimplifying the complexity of implementing something like this in MSFS, but it sure would help make the sim more immersive.