Spatial Audio filters for the Google YouTube 360/VR pipelne.
Measurements from the SADIE Binaural database have now been integrated into the Google YouTube 360/VR pipeline. Google VR Audio enables Ambisonic spatial audio playback in Google VR SDK, YouTube 360/VR, Jump Inspector and Omnitone. The dataset integrated into the Google pipeline is the measured KU100 set taken from here. The following is an explanation of the use of the HRTFs from the Google Spatial Audio Resources page on Git hub
This is a set of binaural measurements (Head Related Impulse Responses or HRIRs) taken from a cube loudspeaker configuration. This configuration is used in the Google VR Audio first-order ambisonic binaural decoder.
This set has been derived from the SADIE binaural measurements and is provided as individual time-domain FIR filters.
subject: 002 (KU100 binaural head)
sampling rate: 48kHz
bit depth: 16 bit
length: 256 samples
fade-in: none
fade-out: half-hann (16 samples)
symmetric: yes (left hemisphere only)
applied gain: 0dB (none)
Also provided is a preset configuration file for the ambiX binaural decoder plugin.
Note: Using these filters directly (or via the ambiX preset) with your ambisonic mix will not result in an output equivalent to Google VR Audio unless you use phase-matched shelf-filters, as recommended by A. J. Heller, R. Lee and E. M. Benjamin in their paper "Is My Decoder Ambisonic?". To solve that problem, we also provide a set called symmetric ambisonic binaural decoder which contains shelf-filtered, symmetric spherical harmonic HRIRs which can be directly applied to an ambisonic mix via a simple filter operation (please see below).
This set represents spherical harmonic HRIR filters used for first-order ambisonic binaural decode in Google VR Audio. To produce the binaural output, these filters should be applied to an ambisonic mix directly and the output routed to the stereo L & R channels as shown in the following diagram:
The filtering operation (* symbol) can be done using freely available tools, for example, free multichannel convolver plugins like LAConvolver (mac) or Freeverb3 (win). The dotted red line with the Ø symbol denotes 180° phase inversion.
This is a set of filters which compensate for the change of HRTFs in the binaural ambisonic decoder. If the ambisonic audio has been mixed against our previous THRIVE HRTFs, applying these filters to the ambiX tracks will minimize the timbral changes after the switch to SADIE KU100 HRTFs. For example, the below diagram shows a frequency response of a decoded sound fields containing a single, broadband sound source in front of the listener. If no compensation is used, switching the binaural decoder from THRIVE to SADIE KU100 will result in spectral changes:
After applying the compensation filters, frequency response differences will be minimized:
The filtering operation (* symbol) can be done using freely available tools, for example, free multichannel convolver plugins like LAConvolver (mac) or Freeverb3 (win).
Binaural decoding of ambisonic spatial audio relies on Head Related Transfer Functions (or their time-domain equivalents - HRIRs) which are unique to each individual. However, our studies showed that some datasets perform better on average than other datasets, in the case when an individualized set is not available.
We tested a large number of available HRTF datasets in a mobile VR application and using a variety of different sound samples. A panel of 53 experienced listeners (20 of whom were expert assessors) compared the overall subjective audio quality of the interactive binaural renders. We found that SADIE KU100 set performed best for majority of the participants:
When creating content for YouTube 360/VR it is recommended that you monitor your ambisonic audio using the provided symmetric ambisonic binaural decoder and/or preview the final mix either directly on YouTube or in the Jump Inspector to control the timbre (coloration) and loudness of your mix (please see below).
HRTFs add specific coloration to the decoded binaural output signal. However, this coloration is dependent on the HRTF set used (due to individual nature of HRTFs). For example, here is an example HRTF frequency response of the THRIVE set and a similar response of the SADIE KU100 set:
If you already have produced an ambisonic soundtrack using our THRIVE HRTF set for monitoring, you can simply correct the frequency response of your ambiX tracks to match the frequency response of the new SADIE KU100 HRTF set. This can be done by applying the filters we share in the /ambisonic correction filters directory.
Due to HRTF processing, the output amplitude of your Ambisonic track may differ depending which HRTF set you use. That is why it is important to monitor the binaural output when working on an ambisonic soundtrack using the binaural decoder which is going to be used to render your ambisonic content. Our binaural decoder's loudness should be the same as the loudness of a standard M-S stereo decoder.
For example, the table below shows expected loudness (according to the EBU R128 standard) and absolute amplitude peak values when a 0.5 normalized pink noise bursts signal is played at 8 different spatial locations around the listener.
Decoder | Loudness [dB LUFS] | Absolute Peak Amplitude |
---|---|---|
M-S Stereo | -10.59 | 0.5000 |
THRIVE | -10.99 | 0.9478 |
SADIE KU100 | -11.20 | 0.9642 |
Please note, although the loudness is matched between different decoders and the stereo decoder, the binaural output may result in the peak amplitude exceeding 0dB FS.
Some third party tools like BlueRippleSound O3A View, Noisemakers AmbiHead and SoundParticles (coming soon) implement their own binaural preview tools using the HRTFs from this repository which match those used by YouTube 360/VR.