Semaphores allow for semi-persistent signals, compared to a condition variable
which requires a mutex for proper detection. A semaphore can be 'post'ed after
writing some data on one thread, and another thread will be able to recognize
it quickly even if the post occured in between checking for data and waiting.
This more correctly fixes a race condition with events since the mixer
shouldn't be using mutexes, and arbitrary wake-ups just to make sure an event
wasn't missed was quite inefficient.
They're now decompressed on the fly in the mixer. This is not a significant
performance issue given that it only needs a 512-byte lookup table, and the
buffer stores half as much data (it may actually be faster, requiring less
overall memory).
Turns out the C version of the cubic resampler is just slightly faster than
even the SSE3 version of the FIR4 resampler. This is likely due to not using a
64KB random-access lookup table along with unaligned loads, both offseting the
gains from SSE.
This will be to allow buffer layering, multiple buffers of the same format and
sample rate that are mixed together prior to resampling, filtering, and
panning. This will allow composing sounds from individual components that can
be swapped around on different invocations (e.g. layer SoundA and SoundB on one
instance and SoundA and SoundC on a different instance for a slightly different
sound, then just SoundA for a third instance, and so on). The longest buffer
within the list item determines the length of the list item.
More work needs to be done to fully support it, namely the ability to specity
multiple buffers to layer for static and streaming sources. Also the behavior
of loop points for layered static sources should be worked out. Should also
consider allowing each layer to have a sample offset.
This improves the transition width, allowing more of the higher frequencies
remain audible. It would be preferrable to have an upper limit of 32 points
instead of 48, to reduce the overall table size and the CPU cost for down-
sampling.
This is a bit more efficient than calling the normal HRTF mixing function
twice, and helps solve the problem of the values generated from convolution not
being consistent with the new HRIR.
This greatly improves HRTF performance since the dual-mix only applies to the
64-sample coefficient transition. So rather than doubling the full mix, it only
doubles 64 samples out of the full mix.
This is intended to do conversions for interleaved samples, and supports
changing from one DevFmtType to another as well as resampling. It does not
handle remixing channels.
The mixer is more optimized to use the resampling functions directly. However,
this should prove useful for recording with certain backends that won't do the
conversion themselves.