At least SSE and ARM have opcodes that handle float-to-int conversions well
enough. Also, Clang doesn't inline lrintf, incurring function call overhead for
what should be a single opcode.
For some reason, the { if(!x)__builtin_unreachable(); } construct does not
provide the same optimization opportunity for Clang (even though the condition
being false would trigger undefined behavior by reaching unreachable code, it
still performs checks and such for the condition potentially being false).
Using __builtin_assume seems to work better.
Semaphores allow for semi-persistent signals, compared to a condition variable
which requires a mutex for proper detection. A semaphore can be 'post'ed after
writing some data on one thread, and another thread will be able to recognize
it quickly even if the post occured in between checking for data and waiting.
This more correctly fixes a race condition with events since the mixer
shouldn't be using mutexes, and arbitrary wake-ups just to make sure an event
wasn't missed was quite inefficient.
To avoid having unknown user code running in the mixer thread that could
significantly delay the mixed output, a lockless ringbuffer is used for the
mixer to provide events that a secondary thread will pop off and process.
Rather than each buffer being individually allocated with a generated 'thunk'
ID that's used with a uint:ptr map, buffers are allocated in arrays of 64
within a vector. Each group of 64 has an associated 64-bit mask indicating
which are free to use, and the buffer ID is comprised of the two array indices
which directly locate the buffer (no searching, binary or otherwise).
Currently no buffers are actually deallocated after being allocated, though
they are reused. So an app that creates a ton of buffers once, then deletes
them all and uses only a couple from then on, will have a bit of waste, while
an app that's more consistent with the number of used buffers won't be a
problem. This can be improved by removing elements of the containing vector
that contain all-free buffers while there are plenty of other free buffers.
Also, this method can easily be applied to other resources, like sources.