Turns out the C version of the cubic resampler is just slightly faster than
even the SSE3 version of the FIR4 resampler. This is likely due to not using a
64KB random-access lookup table along with unaligned loads, both offseting the
gains from SSE.
This will be to allow buffer layering, multiple buffers of the same format and
sample rate that are mixed together prior to resampling, filtering, and
panning. This will allow composing sounds from individual components that can
be swapped around on different invocations (e.g. layer SoundA and SoundB on one
instance and SoundA and SoundC on a different instance for a slightly different
sound, then just SoundA for a third instance, and so on). The longest buffer
within the list item determines the length of the list item.
More work needs to be done to fully support it, namely the ability to specity
multiple buffers to layer for static and streaming sources. Also the behavior
of loop points for layered static sources should be worked out. Should also
consider allowing each layer to have a sample offset.
The context state properties are less likely to change compared to the listener
state, and future changes may prefer more infrequent updates to the context
state.
Note that this puts the MetersPerUnit in as a context state, even though it's
handled through the listener functions. Considering the infrequency that it's
updated at (generally set just once for the context's lifetime), it makes more
sense to put it there than with the more frequently updated listener
properties. The aforementioned future changes would also prefer MetersPerUnit
to not be updated unnecessarily.
This improves the transition width, allowing more of the higher frequencies
remain audible. It would be preferrable to have an upper limit of 32 points
instead of 48, to reduce the overall table size and the CPU cost for down-
sampling.
Rather than storing individual pointers to filter, scale delta, phase delta,
and scale phase delta entries, per phase index, the new table layout makes it
trivial to access the per-phase filter and delta entries given the base offset
and coefficient count.
The old layout separated filters, scale deltas, phase deltas, and scale phase
deltas into separate segments that each contained a numbers of scale and phase
entries, Since processing a sample needed a filter and one of each delta entry
relating to a particular scale and phase, the memory needed would be spread
across the whole table. And since subsequent samples would use a different
phase, it would jump around the table a whole lot as well.
The new layout packs the data in a way more consistent with its use. The
filters, scale deltas, phase deltas, and scale phase deltas are interleaved,
such that for a particular scale and phase, the filter and delta entries used
are contiguous. And the phase entries for a particular scale are kept together,
so the ~500 to ~1000 samples processed per source update stay within the same
3KB to 6KB area of the 70+KB table, which is much more cache friendly.
This improves a stereo (front-left + front-right) sound "image" by generating a
front-center channel signal. Done correctly, it helps reduce the comb effects
and phase errors associated with using only two speakers to simulate center
sounds.
Note that it shouldn't be used if the front-center channel is already included
in the positional audio mix (the dialog effect is okay). In general, it may
actually be better to exclude the front-center channel from the positional
audio mix and use this to generate front-center output.