Turns out the C version of the cubic resampler is just slightly faster than
even the SSE3 version of the FIR4 resampler. This is likely because the cubic
resampler avoids the 64KB random-access lookup table and the unaligned loads
that FIR4 relies on, both of which offset the gains from SSE.
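
As a rough illustration of why a table-free cubic can be cheap, here is a
minimal sketch of a 4-point cubic resampler that computes its weights directly
from the fractional position. The Catmull-Rom weights, the 12-bit fixed-point
fraction, and the names cubic_interp and resample_cubic are assumptions made
for this example, not the library's actual code.

```c
#include <stddef.h>

#define FRAC_BITS 12                 /* assumed fixed-point precision */
#define FRAC_ONE  (1u << FRAC_BITS)
#define FRAC_MASK (FRAC_ONE - 1u)

/* Hypothetical 4-point cubic (Catmull-Rom) interpolation between v1 and v2.
 * mu is the fractional position in [0, 1). No lookup table is involved. */
static float cubic_interp(float v0, float v1, float v2, float v3, float mu)
{
    float mu2 = mu * mu;
    float mu3 = mu2 * mu;
    float a0 = -0.5f*mu3 +      mu2 - 0.5f*mu;
    float a1 =  1.5f*mu3 - 2.5f*mu2 + 1.0f;
    float a2 = -1.5f*mu3 + 2.0f*mu2 + 0.5f*mu;
    float a3 =  0.5f*mu3 - 0.5f*mu2;
    return v0*a0 + v1*a1 + v2*a2 + v3*a3;
}

/* Writes count output samples. Output i interpolates between src[pos+1] and
 * src[pos+2], so src needs one sample of history and two of look-ahead. */
static void resample_cubic(const float *src, unsigned frac, unsigned step,
                           float *dst, size_t count)
{
    size_t pos = 0;
    for (size_t i = 0; i < count; i++) {
        float mu = (float)frac * (1.0f / FRAC_ONE);
        dst[i] = cubic_interp(src[pos], src[pos+1], src[pos+2], src[pos+3], mu);
        frac += step;                 /* advance the fixed-point position */
        pos  += frac >> FRAC_BITS;
        frac &= FRAC_MASK;
    }
}
```

The inner loop reads four adjacent samples and does a handful of multiplies,
with no table access whose address depends on the fraction.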
This is to allow buffer layering: multiple buffers of the same format and
sample rate that are mixed together prior to resampling, filtering, and
panning. This will allow composing sounds from individual components that can
be swapped around on different invocations (e.g. layer SoundA and SoundB on one
instance and SoundA and SoundC on a different instance for a slightly different
sound, then just SoundA for a third instance, and so on). The longest buffer
within the list item determines the length of the list item.
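
A minimal sketch of the layering idea, assuming float samples that are already
in the same format and sample rate. The Layer struct and the functions
layered_length and mix_layers are hypothetical names for illustration only.

```c
#include <stddef.h>

/* Hypothetical description of one layer within a buffer list item. */
typedef struct {
    const float *samples;
    size_t length;      /* in samples */
} Layer;

/* The longest layer determines the length of the whole list item. */
static size_t layered_length(const Layer *layers, size_t count)
{
    size_t longest = 0;
    for (size_t i = 0; i < count; i++) {
        if (layers[i].length > longest)
            longest = layers[i].length;
    }
    return longest;
}

/* Sums all layers into dst before any resampling, filtering, or panning.
 * Shorter layers simply contribute nothing past their own end. */
static void mix_layers(const Layer *layers, size_t count,
                       float *dst, size_t dst_len)
{
    for (size_t i = 0; i < dst_len; i++)
        dst[i] = 0.0f;
    for (size_t l = 0; l < count; l++) {
        size_t n = layers[l].length < dst_len ? layers[l].length : dst_len;
        for (size_t i = 0; i < n; i++)
            dst[i] += layers[l].samples[i];
    }
}
```

Layering SoundA with SoundB on one instance and SoundA with SoundC on another
would then only differ in the Layer array passed in.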
More work needs to be done to fully support it, namely the ability to specify
multiple buffers to layer for static and streaming sources. Also the behavior
of loop points for layered static sources should be worked out. Should also
consider allowing each layer to have a sample offset.
This improves the transition width, allowing more of the higher frequencies to
remain audible. It would be preferable to have an upper limit of 32 points
instead of 48, to reduce the overall table size and the CPU cost for
down-sampling.
This is a bit more efficient than calling the normal HRTF mixing function
twice, and helps solve the problem of the values generated from convolution not
being consistent with the new HRIR.
This greatly improves HRTF performance since the dual-mix only applies to the
64-sample coefficient transition. So rather than doubling the full mix, it only
doubles 64 samples out of the full mix.
This is intended to do conversions for interleaved samples, and supports
changing from one DevFmtType to another as well as resampling. It does not
handle remixing channels.
The mixer is more optimized to use the resampling functions directly. However,
this should prove useful for recording with certain backends that won't do the
conversion themselves.
This should cut down on unnecessary quantization noise (however minor) for 8-
and 16-bit samples. Unfortunately a power-of-2 multiple can't be used as easily
for converting float samples to integer, due to integer types having a non-
power-of-2 maximum amplitude (it'd require more per-sample clamping).
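
To make the asymmetry concrete, here is a small sketch assuming signed 16-bit
samples. The function names are hypothetical. Integer to float can scale by an
exact power of 2, while float to integer with the same scale has to clamp
because +1.0 would map to 32768, one past the int16_t maximum.

```c
#include <stdint.h>

/* Integer to float: 1/32768 is a power of 2, so the scale only changes the
 * exponent and introduces no rounding error. */
static float s16_to_float(int16_t v)
{
    return (float)v * (1.0f / 32768.0f);
}

/* Float to integer with the same power-of-2 scale. The clamp is needed on
 * every sample because the positive maximum is 32767, not 32768. */
static int16_t float_to_s16(float v)
{
    float scaled = v * 32768.0f;
    if (scaled > 32767.0f)
        scaled = 32767.0f;
    if (scaled < -32768.0f)
        scaled = -32768.0f;
    return (int16_t)scaled;   /* truncates toward zero */
}
```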
This improves fading between HRIRs as sources pan around. In particular, it
improves the issue with individual coefficients having various rounding errors
in the stepping values, as well as issues with interpolating delay values.
It does this by doing two mixing passes for each source: first using the last
coefficients, fading to silence, and then again using the new coefficients,
fading from silence. When added together, the two passes create a linear fade
from one to the other. Additionally, the gain is applied separately so the
individual
coefficients don't step with rounding errors. Although this does increase CPU
cost since it's doing two mixes per source, each mix is a bit cheaper now since
the stepping is simplified to a single gain value, and the overall quality is
improved.
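
The two-pass fade can be sketched as follows. This is a simplified mono
illustration with a tiny impulse response, not the actual HRTF mixer. IR_LEN,
mix_fir_faded, and crossfade_mix are hypothetical names, and samples before
the start of src are treated as zero.

```c
#include <stddef.h>

#define IR_LEN 4   /* assumed tiny impulse response length, for illustration */

/* Convolves src with a fixed coefficient set and accumulates into dst,
 * scaling by one gain value that is stepped per sample. The coefficients
 * themselves never change during the pass. */
static void mix_fir_faded(const float *src, size_t count,
                          const float coeffs[IR_LEN],
                          float gain, float gain_step, float *dst)
{
    for (size_t i = 0; i < count; i++) {
        float acc = 0.0f;
        for (size_t j = 0; j < IR_LEN && j <= i; j++)
            acc += src[i - j] * coeffs[j];
        dst[i] += acc * gain;
        gain += gain_step;
    }
}

/* Pass 1 fades the old coefficients to silence, pass 2 fades the new ones in
 * from silence. The two gains always sum to 1, so the result is a linear
 * fade from the old response to the new one. */
static void crossfade_mix(const float *src, size_t count,
                          const float old_coeffs[IR_LEN],
                          const float new_coeffs[IR_LEN], float *dst)
{
    float step = 1.0f / (float)count;   /* only used when count > 0 */
    mix_fir_faded(src, count, old_coeffs, 1.0f, -step, dst);
    mix_fir_faded(src, count, new_coeffs, 0.0f,  step, dst);
}
```

Because only the scalar gain steps, each pass avoids per-coefficient stepping
values and the rounding errors that come with them.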