At least SSE and ARM have opcodes that handle float-to-int conversions well
enough. Also, Clang doesn't inline lrintf, incurring function call overhead for
what should be a single opcode.
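As a rough illustration of the idea, the sketch below rounds a float to an int
with a single SSE instruction where available, and falls back to lrintf
elsewhere. The helper name fast_f2i and the HAVE_SSE_F2I macro are hypothetical
and not taken from the actual change; an ARM-specific path is not shown.

```c
#include <math.h>

#if defined(__SSE__) || defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE_F2I 1 /* hypothetical feature macro for this sketch */
#endif

/* Hypothetical helper: convert a float to int using the current rounding
 * mode (round-to-nearest-even unless the program changed it). */
static inline int fast_f2i(float f)
{
#ifdef HAVE_SSE_F2I
    /* Compiles to cvtss2si, a single SSE instruction. */
    return _mm_cvtss_si32(_mm_set_ss(f));
#else
    /* Portable fallback; may end up as a real function call. */
    return (int)lrintf(f);
#endif
}
```

Both paths honor the current rounding mode, so under the default mode 2.5f
converts to 2 and 3.5f converts to 4.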
For some reason, the { if(!x)__builtin_unreachable(); } construct does not
provide the same optimization opportunity for Clang. Even though the condition
being false would trigger undefined behavior by reaching unreachable code,
Clang still emits checks for the condition potentially being false. Using
__builtin_assume seems to work better.
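One way to wrap this is a macro that prefers __builtin_assume when the
compiler offers it and otherwise falls back to the unreachable construct. This
is only a sketch: the ASSUME macro name and the sum_samples function are
hypothetical, chosen for illustration.

```c
#include <stddef.h>

#ifndef __has_builtin
#define __has_builtin(x) 0 /* compilers without __has_builtin */
#endif

/* Hypothetical ASSUME macro: tells the optimizer that x is always true. */
#if __has_builtin(__builtin_assume)
#define ASSUME(x) __builtin_assume(x)
#elif defined(__GNUC__)
#define ASSUME(x) do { if(!(x)) __builtin_unreachable(); } while(0)
#else
#define ASSUME(x) ((void)0)
#endif

/* Example use: the caller guarantees count > 0, so the compiler may skip
 * the empty-loop check. Passing count == 0 is undefined behavior. */
static float sum_samples(const float *samples, size_t count)
{
    ASSUME(count > 0);
    float sum = 0.0f;
    for(size_t i = 0; i < count; i++)
        sum += samples[i];
    return sum;
}
```

The expression given to ASSUME should be free of side effects, since it may
not be evaluated at all.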
Semaphores allow for semi-persistent signals, compared to a condition variable
which requires a mutex for proper detection. A semaphore can be 'post'ed after
writing some data on one thread, and another thread will be able to recognize
it quickly even if the post occurred in between checking for data and waiting.
This more correctly fixes a race condition with events, since the mixer
shouldn't be using mutexes, and arbitrary wake-ups just to make sure an event
wasn't missed were quite inefficient.
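The persistence of a post can be sketched with POSIX semaphores, assuming a
platform that supports unnamed semaphores via sem_init. The event_signal_*
names are hypothetical wrappers for illustration, not the library's own.

```c
#include <errno.h>
#include <semaphore.h>

/* Hypothetical wrappers around a POSIX unnamed semaphore. */
static int event_signal_init(sem_t *sem)
{
    /* Not shared between processes, initial count of 0. */
    return sem_init(sem, 0, 0);
}

/* Producer side: call after the data has been written. The post is counted,
 * so it is not lost if nobody is waiting yet. */
static void event_signal_post(sem_t *sem)
{
    sem_post(sem);
}

/* Consumer side: blocks until a post is available, retrying if the wait
 * is interrupted by a signal handler. */
static void event_signal_wait(sem_t *sem)
{
    while(sem_wait(sem) == -1 && errno == EINTR)
    { }
}

/* Non-blocking check: returns 1 if a post was consumed, 0 otherwise. */
static int event_signal_try_wait(sem_t *sem)
{
    return sem_trywait(sem) == 0;
}
```

If the post lands after the consumer checked for data but before it waits,
event_signal_wait returns immediately instead of sleeping until some later
wake-up.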
To avoid having unknown user code running in the mixer thread that could
significantly delay the mixed output, a lockless ringbuffer is used for the
mixer to provide events that a secondary thread will pop off and process.
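A minimal sketch of such a single-producer, single-consumer ring buffer using
C11 atomics follows. The Event layout, the EVENT_RING_SIZE constant, and the
event_ring_* names are assumptions for illustration and do not reflect the
actual ring buffer used.

```c
#include <stdatomic.h>
#include <stddef.h>

#define EVENT_RING_SIZE 64 /* hypothetical capacity; must be a power of two */

typedef struct Event { int type; int param; } Event; /* hypothetical layout */

typedef struct EventRing {
    Event slots[EVENT_RING_SIZE];
    atomic_size_t write_pos; /* only advanced by the producer (mixer) */
    atomic_size_t read_pos;  /* only advanced by the consumer thread */
} EventRing;

static void event_ring_init(EventRing *ring)
{
    atomic_init(&ring->write_pos, 0);
    atomic_init(&ring->read_pos, 0);
}

/* Producer: returns 1 on success, 0 if the ring is full. Never blocks. */
static int event_ring_push(EventRing *ring, Event evt)
{
    size_t w = atomic_load_explicit(&ring->write_pos, memory_order_relaxed);
    size_t r = atomic_load_explicit(&ring->read_pos, memory_order_acquire);
    if(w - r == EVENT_RING_SIZE)
        return 0;
    ring->slots[w & (EVENT_RING_SIZE-1)] = evt;
    /* Release so the consumer sees the slot contents before the new index. */
    atomic_store_explicit(&ring->write_pos, w+1, memory_order_release);
    return 1;
}

/* Consumer: returns 1 and fills *out if an event was available, else 0. */
static int event_ring_pop(EventRing *ring, Event *out)
{
    size_t r = atomic_load_explicit(&ring->read_pos, memory_order_relaxed);
    size_t w = atomic_load_explicit(&ring->write_pos, memory_order_acquire);
    if(r == w)
        return 0;
    *out = ring->slots[r & (EVENT_RING_SIZE-1)];
    /* Release so the producer may reuse the slot only after it was read. */
    atomic_store_explicit(&ring->read_pos, r+1, memory_order_release);
    return 1;
}
```

This is only safe with exactly one pushing thread and one popping thread. When
the ring is full the event is dropped rather than making the mixer wait.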
Rather than each buffer being individually allocated with a generated 'thunk'
ID that's used with a uint:ptr map, buffers are allocated in arrays of 64
within a vector. Each group of 64 has an associated 64-bit mask indicating
which are free to use, and the buffer ID is composed of the two array indices
which directly locate the buffer (no searching, binary or otherwise).
Currently no buffers are actually deallocated after being allocated, though
they are reused. So an app that creates a ton of buffers once, then deletes
them all and uses only a couple from then on, will have a bit of waste, while
an app that's more consistent with the number of used buffers won't be a
problem. This can be improved by removing elements of the containing vector
that contain all-free buffers while there are plenty of other free buffers.
Also, this method can easily be applied to other resources, like sources.
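The allocation scheme can be sketched as follows. The struct layouts, the
function names, and the exact ID encoding (group index shifted left by 6, OR'd
with the slot index, plus 1 so that 0 stays an invalid ID) are assumptions made
for this example, not a copy of the real code.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct Buffer { int placeholder; } Buffer; /* hypothetical contents */

typedef struct BufferSubList {
    uint64_t free_mask; /* bit n set means slot n is free */
    Buffer *buffers;    /* array of 64 buffers */
} BufferSubList;

typedef struct BufferList {
    BufferSubList *lists;
    size_t count;
} BufferList;

/* Index of the lowest set bit; mask must be non-zero. */
static unsigned lowest_set_bit(uint64_t mask)
{
    unsigned idx = 0;
    while(!(mask & 1)) { mask >>= 1; ++idx; }
    return idx;
}

/* Returns a new ID, or 0 on allocation failure. */
static uint32_t buffer_list_alloc(BufferList *list)
{
    size_t lidx = 0;
    while(lidx < list->count && list->lists[lidx].free_mask == 0)
        ++lidx;
    if(lidx == list->count)
    {
        /* No free slot anywhere: append a new group of 64. */
        BufferSubList *grown = realloc(list->lists,
                                       (list->count+1) * sizeof(*grown));
        if(!grown) return 0;
        list->lists = grown;
        grown[lidx].buffers = calloc(64, sizeof(Buffer));
        if(!grown[lidx].buffers) return 0;
        grown[lidx].free_mask = ~(uint64_t)0;
        list->count++;
    }
    unsigned slidx = lowest_set_bit(list->lists[lidx].free_mask);
    list->lists[lidx].free_mask &= ~((uint64_t)1 << slidx);
    return (uint32_t)((lidx << 6) | slidx) + 1;
}

/* Direct lookup from the two indices in the ID; NULL if invalid or free. */
static Buffer *buffer_list_lookup(BufferList *list, uint32_t id)
{
    size_t lidx = (size_t)(id - 1) >> 6;
    unsigned slidx = (id - 1) & 63;
    if(id == 0 || lidx >= list->count) return NULL;
    if(list->lists[lidx].free_mask & ((uint64_t)1 << slidx)) return NULL;
    return &list->lists[lidx].buffers[slidx];
}

/* Marks the slot free for reuse; the memory itself is kept. */
static void buffer_list_free(BufferList *list, uint32_t id)
{
    if(buffer_list_lookup(list, id))
        list->lists[(id - 1) >> 6].free_mask |= (uint64_t)1 << ((id - 1) & 63);
}
```

Lookup is two array indexings plus a mask test, and a freed slot is handed out
again by the next allocation, matching the reuse behavior described above.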
Rather than hackily combining bit flags with the format, alBufferStorageSOFT
takes the flags separately, which increases the number of potential flags.
alBufferData now behaves as if calling alBufferStorageSOFT with a flags value
of 0.
Requires having the same format as the last call to alBufferData. Also only
makes sense when given a NULL data pointer, as otherwise the internal data will
be overwritten anyway.
Requires the MAP_READ_BIT or MAP_WRITE_BIT flags to be OR'd with the format
upon a call to alBufferData, to enable mappable storage for the given access
types. This will fail if the format requires internal conversion and doesn't
resemble the original input data, so the app can be guaranteed the size, type,
and layout of the original data is the same as what's in storage.
Then alMapBufferSOFT may be called with appropriate bit flags to get a readable
and/or writable pointer to the buffer's sample storage. alUnmapBufferSOFT must
be called when access is finished. It is currently invalid to map a buffer that
is attached to a source, or to attach a currently mapped buffer to a source.
This restriction may be eased in the future, at least to allow read-only access
while in use (perhaps also to allow writing, if coherency can be achieved).
Currently the access flags occupy the upper 8 bits of a 32-bit bitfield to
avoid clashing with format enum values, which don't use more than 16 or 17
bits. This means any future formats are limited to 24-bit enum values, and also
means only 8 flags are possible when declaring storage. The alternative would
be to add a new function (alBufferStorage?) with a separate flags parameter.
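The packing described here can be sketched as below. All macro names and the
specific bit positions chosen for the read and write flags are hypothetical;
only the split into an upper 8-bit flags field and a lower 24-bit format field
comes from the description above.

```c
#include <stdint.h>

/* Hypothetical layout: upper 8 bits hold access flags, lower 24 the format. */
#define STORAGE_FLAGS_MASK    0xFF000000u
#define STORAGE_FORMAT_MASK   0x00FFFFFFu
#define EXAMPLE_MAP_READ_BIT  (1u << 24) /* hypothetical value */
#define EXAMPLE_MAP_WRITE_BIT (1u << 25) /* hypothetical value */

/* Extract the format enum from a combined format-and-flags value. */
static uint32_t storage_get_format(uint32_t combined)
{
    return combined & STORAGE_FORMAT_MASK;
}

/* Extract the access flags from a combined format-and-flags value. */
static uint32_t storage_get_flags(uint32_t combined)
{
    return combined & STORAGE_FLAGS_MASK;
}
```

For example, 0x1103 (the value of AL_FORMAT_STEREO16) OR'd with both example
bits splits back into the format 0x1103 and the two flags. With this layout
only eight distinct flags fit, which is the limitation noted above.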