The issue
Creation of immersive 3D sound over headphones is experiencing renewed interest. This can partly be attributed to technical advances in ultra-high-definition video displays and interactive virtual reality (VR) headsets, but there is also a surge in production support for creating and consuming 360-degree visual and audio content aimed at consumers.
Binaural surround sound delivered over headphones commonly accompanies such immersive displays. It therefore faces the challenge of forming realistic, or hyper-real, sound fields that are experienced with good externalisation, localisation and sound quality. This research project looks at enabling binaural reproduction for mass audiences without compromising sound quality.
The research
3D audio can be created by filtering recorded sounds with special filters known as head-related transfer functions (HRTFs). These describe the interaction between the head, torso and ears of a listener and a sound source at a given angle relative to the head. HRTFs are typically measured using probe microphones in a subject's ears or from binaural dummy head microphones.
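The filtering step described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the function name `binauralise` is hypothetical, and the two impulse responses are toy placeholders (a single delayed tap per ear), not measured HRTF data. A real renderer would load a measured left/right head-related impulse response pair for the desired source angle.

```python
import numpy as np

def binauralise(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right impulse response pair.

    Returns an array of shape (len(mono) + len(hrir) - 1, 2): left, right.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)

# Toy placeholder responses (ASSUMPTION, not measured data): a source on the
# listener's left reaches the left ear earlier and louder than the right ear.
hrir_left = np.zeros(64)
hrir_left[2] = 1.0
hrir_right = np.zeros(64)
hrir_right[30] = 0.5

# A unit impulse as the "recorded sound", so the output shows the filters directly.
mono = np.zeros(256)
mono[0] = 1.0

out = binauralise(mono, hrir_left, hrir_right)
```

With measured responses in place of the placeholders, the two output columns would be the headphone feeds for a source at the angle where that pair was measured.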
Our approach is to treat the measured HRTFs as virtual loudspeakers equally spaced around the head. Ambisonics, a 3D audio rendering technique, is then used to create the signals that feed the virtual loudspeakers, producing the immersive soundscape. Ambisonics has the advantage that the entire sound scene can be rotated to counter head movements. If a listener moves their head in a headphone-based, motion-tracked VR experience, the sound will remain stable in 3D space, as it does in real life.
This work was initiated as part of the Engineering and Physical Sciences Research Council (EPSRC) funded Spatial Audio For Domestic Interactive Entertainment Project (SADIE). The project looked at improving immersive sound rendering in the home. Measured datasets of binaural filters were generated to support this work, and the team at York has used this data to create new algorithms that improve VR audio rendering.
How best to compress 3D audio sound scenes for streaming services such as YouTube 360 is also important. York’s AudioLab have evaluated the optimal bit rates and encoding conditions for binaural-based ambisonics to be delivered through Google.
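The virtual loudspeaker idea can be illustrated with a deliberately simplified sketch. Everything here is an assumption made for illustration rather than the project's actual algorithm: the scene is horizontal-only first-order Ambisonics with channels [W, X, Y], the decoder is a basic projection onto eight equally spaced loudspeakers, azimuth is measured anticlockwise from the front, and `toy_hrir_pair` returns placeholder level-difference-only responses instead of measured SADIE filters. All function names are hypothetical.

```python
import numpy as np

def encode_foa_horizontal(signal, azimuth):
    """Encode a mono signal into horizontal first-order channels [W, X, Y]."""
    return np.stack([signal, signal * np.cos(azimuth), signal * np.sin(azimuth)])

def rotate_yaw(scene, angle):
    """Rotate the whole sound scene about the vertical axis by `angle` radians."""
    w, x, y = scene
    c, s = np.cos(angle), np.sin(angle)
    return np.stack([w, c * x - s * y, s * x + c * y])

def decode_to_speakers(scene, speaker_azimuths):
    """Basic projection decode to the virtual loudspeaker feeds."""
    w, x, y = scene
    n = len(speaker_azimuths)
    return np.stack([(w + x * np.cos(a) + y * np.sin(a)) / n
                     for a in speaker_azimuths])

def toy_hrir_pair(azimuth, length=32):
    """Placeholder single-tap responses (ASSUMPTION, not measured HRTFs)."""
    left = np.zeros(length)
    right = np.zeros(length)
    left[0] = 0.5 * (1.0 + 0.8 * np.sin(azimuth))   # louder for sources on the left
    right[0] = 0.5 * (1.0 - 0.8 * np.sin(azimuth))  # louder for sources on the right
    return left, right

def render_binaural(signal, source_azimuth, head_yaw, n_speakers=8):
    """Encode, counter-rotate for head yaw, decode, then filter each feed."""
    speaker_azimuths = 2 * np.pi * np.arange(n_speakers) / n_speakers
    scene = encode_foa_horizontal(signal, source_azimuth)
    scene = rotate_yaw(scene, -head_yaw)  # rotate the scene against the head turn
    feeds = decode_to_speakers(scene, speaker_azimuths)
    left = right = 0.0
    for feed, az in zip(feeds, speaker_azimuths):
        h_left, h_right = toy_hrir_pair(az)
        left = left + np.convolve(feed, h_left)
        right = right + np.convolve(feed, h_right)
    return np.stack([left, right], axis=1)

rng = np.random.default_rng(0)
noise = rng.standard_normal(1024)
# Source fixed at 90 degrees to the left of the room's front direction.
facing_front = render_binaural(noise, np.pi / 2, head_yaw=0.0)
facing_source = render_binaural(noise, np.pi / 2, head_yaw=np.pi / 2)
```

In this sketch the left channel is louder while the listener faces the front, and the two channels become equal once the listener turns to face the source, which is the stabilising behaviour described above. Only the small Ambisonic scene is rotated; the loudspeaker positions and their filters stay fixed.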
The outcome
The SADIE binaural filters have been integrated into Google’s VR pipeline, influencing the audio rendering at all stages of VR content creation. They are at the heart of Google Resonance Audio, a cross-platform software development kit (SDK) for Android and iOS that allows for VR content creation with Unity, Wwise and FMOD, as well as delivery of immersive experiences through the web.
To give a sense of the impact to date, Google have shipped more than 10 million Cardboard VR headsets. In the first 19 months of the Cardboard platform, more than 1,000 applications were developed, with 25 million installations made worldwide. In May 2016, Google followed this up with the Daydream VR platform, whose VR mode included apps such as YouTube 360.
The AudioLab continues to work with Google to improve spatial audio quality through the browser, specifically looking at the optimal bit rates and compression settings for streamed spatial audio that preserve spatial fidelity and timbre.