Meta teams up with University of Texas at Austin researchers.
Meta is collaborating with a group of researchers from the University of Texas at Austin (UT Austin) to bring realistic audio to the metaverse.

According to Kristen Grauman, Research Director at Meta AI, augmented and virtual reality (AR and VR, respectively) are about more than just visuals. Audio is crucial to bringing a world to life. As Grauman puts it, “audio is formed by the environment that [it] is in.” The geometry of a room, what’s in that room, and how far someone is from a source all affect how sound behaves.
Meta’s aim is to utilise AR glasses to record both audio and video in one place, then transform and clean the recording with a set of three AI models so that, when you play it back at home, it feels like it’s happening right in front of you. The models take into account the room you’re in so that the sound matches your environment.
Judging by these projects, Meta appears to be working towards AR glasses, while its VR headset strategy includes simulating the sights and sounds of a place, such as a concert, so that you feel as if you are there in person.
We asked Meta how people will hear this enhanced audio. Will they need headphones, or will it come through the headset itself? We did not receive a response.
We also asked Meta how developers could get hold of these AI models. They have been made open source so that third-party developers can work with the technology, but Meta didn’t provide any further details.
Transformed by AI
The question is how Meta can record audio on a pair of AR glasses and have it reflect a different scene.
The first is AViTAR, a “Visual Acoustic Matching” model.
This is the AI that adapts audio to a new environment. Meta gives the scenario of a mother using AR glasses to record her child’s dance recital in an auditorium.
According to one of the researchers, the mother can then play the clip back at home through the same glasses, and the AI will transform the sound: it scans her surroundings, accounts for any obstacles in the room, and makes the recital sound as if it were happening directly in front of her. The researcher says the audio will come from the glasses themselves.
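For a rough sense of what “matching” audio to a room means in signal terms, the classical baseline is to convolve a dry recording with a room impulse response (RIR), which imprints that room’s reverberation onto the sound. AViTAR learns an equivalent transformation from video frames rather than a measured RIR; the sketch below is only a minimal, self-contained illustration of the convolution idea using synthetic signals, not Meta’s model or API.

```python
# Minimal sketch: convolving "dry" audio with a room impulse response (RIR)
# makes it sound as if it were recorded in that room. This is a classical
# analogue of acoustic matching, not AViTAR itself.
import numpy as np
from scipy.signal import fftconvolve

SAMPLE_RATE = 16_000  # Hz

# Synthetic "dry" recording: a short 440 Hz tone stands in for the recital clip.
t = np.arange(0, 1.0, 1 / SAMPLE_RATE)
dry_audio = 0.5 * np.sin(2 * np.pi * 440 * t)

# Synthetic RIR: exponentially decaying noise approximates an auditorium's reverb tail.
rir_t = np.arange(0, 0.6, 1 / SAMPLE_RATE)  # 0.6 s of reverberation
rng = np.random.default_rng(0)
rir = rng.standard_normal(rir_t.size) * np.exp(-6.0 * rir_t)
rir /= np.abs(rir).sum()  # normalise so the reverberant output stays at a similar level

# Convolving the dry signal with the RIR "places" it in the simulated room.
matched_audio = fftconvolve(dry_audio, rir, mode="full")
print(f"dry: {dry_audio.shape[0]} samples, matched: {matched_audio.shape[0]} samples")
```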

The second model, Visually-Informed Dereverberation, cleans up audio by stripping distracting reverb out of a clip. The example given is recording a violin concert at a train station, bringing it home, and having the AI clean up the clip so that you hear only the music.
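To give a sense of what dereverberation involves, a simple classical approach works in the time-frequency domain: estimate the late reverberation as a delayed, scaled copy of the spectrogram and subtract it. Meta’s model instead learns to do this with help from video of the space; the function below uses an illustrative name of our own and is only a minimal spectral-subtraction sketch, not part of any Meta release.

```python
# Minimal dereverberation sketch via spectral subtraction: treat late reverb as
# a delayed, scaled copy of the magnitude spectrogram and subtract it.
import numpy as np
from scipy.signal import stft, istft

def dereverberate(audio, sample_rate, delay_frames=6, reverb_weight=0.4):
    """Suppress late reverberation with a simple spectral-subtraction rule."""
    _, _, spec = stft(audio, fs=sample_rate, nperseg=512)
    magnitude, phase = np.abs(spec), np.angle(spec)

    # Late reverb is roughly a delayed, scaled version of earlier magnitudes.
    late_reverb = np.zeros_like(magnitude)
    late_reverb[:, delay_frames:] = reverb_weight * magnitude[:, :-delay_frames]

    # Subtract the estimate, keeping a small spectral floor to limit artefacts,
    # then resynthesise with the original phase.
    cleaned = np.maximum(magnitude - late_reverb, 0.05 * magnitude)
    _, cleaned_audio = istft(cleaned * np.exp(1j * phase), fs=sample_rate, nperseg=512)
    return cleaned_audio
```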
The final model, VisualVoice, separates voices from other noises using a combination of visual and acoustic cues. Picture filming a video of two people arguing: this AI can isolate one voice so you can understand it while muting everything else. According to Meta, the visual cues are vital because the AI needs to see who is speaking in order to pick up certain nuances.
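Under the hood, speech-separation systems of this kind typically predict a time-frequency mask for the target speaker and apply it to the mixed audio; VisualVoice’s twist is predicting that mask with help from lip motion and facial appearance. The sketch below uses an oracle mask computed from known synthetic sources purely to show the mechanics — in the real model the mask would be predicted from the mixture plus video, and the function name here is illustrative only.

```python
# Minimal sketch of time-frequency masking for source separation.
import numpy as np
from scipy.signal import stft, istft

SAMPLE_RATE = 16_000

def separate_with_mask(mixture, mask, sample_rate=SAMPLE_RATE):
    """Apply a [0, 1] time-frequency mask to a mixture and resynthesise audio."""
    _, _, spec = stft(mixture, fs=sample_rate, nperseg=512)
    _, separated = istft(spec * mask, fs=sample_rate, nperseg=512)
    return separated

# Two synthetic "voices" (different tones) mixed together.
t = np.arange(0, 1.0, 1 / SAMPLE_RATE)
voice_a = np.sin(2 * np.pi * 220 * t)
voice_b = np.sin(2 * np.pi * 660 * t)
mixture = voice_a + voice_b

# Oracle ratio mask for voice_a; a trained model would predict this from
# the mixture and video of the target speaker's face.
_, _, spec_a = stft(voice_a, fs=SAMPLE_RATE, nperseg=512)
_, _, spec_mix = stft(mixture, fs=SAMPLE_RATE, nperseg=512)
mask_a = np.abs(spec_a) / (np.abs(spec_mix) + 1e-8)

isolated_a = separate_with_mask(mixture, mask_a)
print(f"mixture: {mixture.shape[0]} samples, isolated voice: {isolated_a.shape[0]} samples")
```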
On the visual side, Meta says it plans to incorporate video and other cues to further improve AI-driven audio. Because this technology is still in its early stages, it’s unclear when, or if, Meta will bring these AIs to a Quest headset near you.
If you’re considering purchasing an Oculus Quest 2, be sure to read our most recent review. We like it, so there’s that.