Could really do with some optimizations to the mixing gain calculations. For
ambisonic targets, the coefficients will only have 1 non-0 entry for each
output, so the double loop in unnecessarily wasteful. Similarly, most uses
won't need a full height encoding either, so a horizontal-only or mixed-order
target could reduce the number of channels.
This uses a virtual B-Format buffer for mixing, and then uses a dual-band
decoder for improved positional quality. This currently only works with first-
order output since first-order input (from the AL_EXT_BFROMAT extension) would
not sound correct when fed through a second- or third-order decoder.
This also does not currently implement near-field compensation since near-field
rendering effects are not implemented.
There were phase issues caused by applying HRTF directly to the B-Format
channels, since the HRIR delays were all averaged which removed the inter-aural
time-delay, which in turn removed significant spatial information.
This mixes to a 4-channel first-order ambisonics buffer. With ACN ordering and
N3D scaling, this makes it easy to remain compatible with effects that only
care about mono input since channel 0 is an unattenuated mono signal.
Note that it still uses FuMa scalings internally. Coefficients loaded from
config files specify if they're FuMa (in both ordering and scaling) or N3D,
and will get reordered or rescaled as needed.
It seems a simple scaling on the coefficients will allow first-order content to
work with second- and third-order coefficients, although obviously not with any
improved locality. That may be something to look into for the future, but this
is good enough for now.
It's possible to calculate HRTF coefficients for full third-order ambisonics
now, but it's still not possible to use them here without upmixing first-order
content.
This adds the ability to directly decode B-Format with HRTF, though only first-
order (WXYZ) for now. Second- and third-order would be easilly doable, however
we'd need to be able to up-mix first-order content (from the BFORMAT2D and
BFORMAT3D buffer formats) since it would be inappropriate to decode lower-order
content with a higher-order decoder.
With HRTF mixing, certain things are mixed to virtual channels to be filtered
with HRTF later. This allows for using an 8-channel cube instead of a 14-
channel cube+diamond.
The sound localization with virtual channel mixing was just too poor, so while
it's more costly to do per-source HRTF mixing, it's unavoidable if you want
good localization.
This is only partially reverted because having the virtual channel is still
beneficial, particularly with B-Format rendering and effect mixing which
otherwise skip HRTF processing. As before, the number of virtual channels can
potentially be customized, specifying more or less channels depending on the
system's needs.
This new method mixes sources normally into a 14-channel buffer with the
channels placed all around the listener. HRTF is then applied to the channels
given their positions and written to a 2-channel buffer, which gets written out
to the device.
This method has the benefit that HRTF processing becomes more scalable. The
costly HRTF filters are applied to the 14-channel buffer after the mix is done,
turning it into a post-process with a fixed overhead. Mixing sources is done
with normal non-HRTF methods, so increasing the number of playing sources only
incurs normal mixing costs.
Another benefit is that it improves B-Format playback since the soundfield gets
mixed into speakers covering all three dimensions, which then get filtered
based on their locations.
The main downside to this is that the spatial resolution of the HRTF dataset
does not play a big role anymore. However, the hope is that with ambisonics-
based panning, the perceptual position of panned sounds will still be good. It
is also an option to increase the number of virtual channels for systems that
can handle it, or maybe even decrease it for weaker systems.