I don't like this, but it's currently necessary. The problem is that the
ambisonics-based panning does not maintain consistent energy output, which
causes sounds mapped directly to an output channel to be louder compared to
when being panned. The inconcistent energy output is partly by design, as it's
trying to render a full 3D sound field and at least attempts to correct for
imbalanced speaker layouts.
Although it is more correct for preserving the apparent volume, the ambisonics-
based panning does not work on the same power scale, making it louder by
comparison.
For mono sources, third-order ambisonics is utilized to generate panning gains.
The general idea is that a panned mono sound can be encoded into b-format
ambisonics as:
w[i] = sample[i] * 0.7071;
x[i] = sample[i] * dir[0];
y[i] = sample[i] * dir[1];
...
and subsequently rendered using:
output[chan][i] = w[i] * w_coeffs[chan] +
x[i] * x_coeffs[chan] +
y[i] * y_coeffs[chan] +
...;
By reordering the math, channel gains can be generated by doing:
gain[chan] = 0.7071 * w_coeffs[chan] +
dir[0] * x_coeffs[chan] +
dir[1] * y_coeffs[chan] +
...;
which then get applied as normal:
output[chan][i] = sample[i] * gain[chan];
One of the reasons to use ambisonics for panning is that it provides arguably
better reproduction for sounds emanating from between two speakers. As well,
this makes it easier to pan in all 3 dimensions, with for instance a "3D7.1" or
8-channel cube speaker configuration by simply providing the necessary
coefficients (this will need some work since some methods still use angle-based
panpot, particularly multi-channel sources).
Unfortunately, the math to reliably generate the coefficients for a given
speaker configuration is too costly to do at run-time. They have to be pre-
generated based on a pre-specified speaker arangement, which means the config
options for tweaking speaker angles are no longer supportable. Eventually I
hope to provide config options for custom coefficients, which can either be
generated and written in manually, or via alsoft-config from user-specified
speaker positions.
The current default set of coefficients were generated using the MATLAB scripts
(compatible with GNU Octave) from the excellent Ambisonic Decoder Toolbox, at
https://bitbucket.org/ambidecodertoolbox/adt/
At 0 distance from the listener, the sound is omni-directional. As the source
and listener become 'radius' units apart, the sound becomes more directional.
With HRTF, an omni-directional sound is handled using 0-delay, pass-through
filter coefficients, which is blended with the real delay and coefficients as
needed to become more directional.
Currently the only way SSE 4.1 is detected is by using __get_cpuid, i.e. with
GCC. Windows' IsProcessorFeaturePresent does not report SSE4.1 capabilities.
They are still there for auxiliary sends. However, they should go away soon
enough too, and then we won't have to mess around with calculating extra
"predictive" samples in the mixer.
This fades the dry mixing gains using a logarithmic curve, which should produce
a smoother transition than a linear one. It functions similarly to a linear
fade except that
step = (target - current) / numsteps;
...
gain += step;
becomes
step = powf(target / current, 1.0f / numsteps);
...
gain *= step;
where 'target' and 'current' are clamped to a lower bound that is greater than
0 (which makes no sense on a logarithmic scale).
Consequently, the non-HRTF direct mixers do not do not feed into the click
removal and pending click buffers, as this per-sample fading would do an
adequate job of stopping clicks and pops caused by extreme gain changes. These
buffers should be removed shortly.