With the vector all-pass applied in a self-contained function, the individual
steps of the early and late reverb stages can be optimized with tighter loops.
This allows more data to be held locally, resulting in less thrashing from
reloading the same values multiple times.
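A minimal sketch of a block-based all-pass pass, assuming four lines stored
line-major and a power-of-two ring buffer per line. The names (`VecAllpass`,
`vecallpass_process`, `NUM_LINES`, `AP_MAX_DELAY`, `BLOCK_SIZE`) are
hypothetical, and each line gets a plain Schroeder all-pass; the real vector
all-pass also mixes the lines together, which is omitted here. Processing one
line at a time over the whole block lets the coefficient, delay, and buffer
pointer stay in locals for the inner loop.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_LINES    4
#define AP_MAX_DELAY 64   /* hypothetical; must be a power of two */
#define BLOCK_SIZE   128  /* hypothetical maximum samples per call */

typedef struct {
    float buf[NUM_LINES][AP_MAX_DELAY]; /* per-line ring buffers */
    size_t delay[NUM_LINES];            /* per-line delay, 1..AP_MAX_DELAY */
    size_t pos;                         /* shared write position */
    float coeff;                        /* all-pass coefficient g, |g| < 1 */
} VecAllpass;

/* Runs a Schroeder all-pass, y[n] = -g*w[n] + w[n-D] with
 * w[n] = x[n] + g*w[n-D], over `count` samples of every line in place. */
void vecallpass_process(VecAllpass *ap, float samples[NUM_LINES][BLOCK_SIZE], size_t count)
{
    const size_t mask = AP_MAX_DELAY - 1;
    const float g = ap->coeff;
    assert(count <= BLOCK_SIZE);

    for (size_t c = 0; c < NUM_LINES; c++) {
        /* Hoist everything the inner loop needs into locals. */
        float *buf = ap->buf[c];
        float *io = samples[c];
        const size_t delay = ap->delay[c];
        size_t pos = ap->pos;
        assert(delay >= 1 && delay <= AP_MAX_DELAY);

        for (size_t i = 0; i < count; i++) {
            const float wd = buf[(pos - delay) & mask]; /* w[n-D] */
            const float w = io[i] + g * wd;             /* w[n] */
            buf[pos & mask] = w;
            io[i] = wd - g * w;                         /* y[n] */
            pos++;
        }
    }
    ap->pos = (ap->pos + count) & mask;
}
```

With every delay set to 1 and `g = 0.5`, an impulse on line 0 comes out as
-0.5, 0.75, 0.375, ..., whose squared values sum to 1, as expected of an
all-pass.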
There is room for further improvement, depending on whether the early delay
lines and all-pass delay lines are long enough to allow bulk reads.
The late reverb line lengths are long enough to ensure a single process loop
won't read samples it wrote in the same call, so we can safely read all the
samples we need from the feedback buffer up front, then filter them more
efficiently.
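Because the line is at least as long as one update, the read-then-filter order
might be sketched as follows. It assumes a single mono feedback line with a
power-of-two length, a one-pole lowpass standing in for the feedback damping,
and hypothetical names (`DelayLine`, `feedback_process`, `LINE_LEN`,
`MAX_UPDATE`). The `assert` documents the condition the paragraph relies on:
the read delay is at least the block length, so nothing read in this call is
also written in this call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE_LEN   1024 /* hypothetical; must be a power of two */
#define MAX_UPDATE 256  /* hypothetical maximum samples per call */

typedef struct {
    float buf[LINE_LEN];
    size_t pos; /* next write position */
} DelayLine;

/* Bulk-copies `count` samples written `delay` samples ago into dst,
 * splitting the copy where the ring buffer wraps. */
static void delay_read_block(const DelayLine *line, size_t delay, float *dst, size_t count)
{
    size_t src = (line->pos - delay) & (LINE_LEN - 1);
    size_t done = 0;
    while (done < count) {
        size_t todo = LINE_LEN - src;
        if (todo > count - done) todo = count - done;
        memcpy(dst + done, line->buf + src, todo * sizeof(float));
        done += todo;
        src = (src + todo) & (LINE_LEN - 1);
    }
}

static void delay_write_block(DelayLine *line, const float *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        line->buf[(line->pos + i) & (LINE_LEN - 1)] = src[i];
    line->pos = (line->pos + count) & (LINE_LEN - 1);
}

/* One feedback pass: read everything up front, filter it in one tight loop,
 * then write input plus decayed feedback back into the line. */
void feedback_process(DelayLine *line, size_t delay, float decay,
                      float lpCoeff, float *lpState,
                      const float *in, float *out, size_t count)
{
    float tmp[MAX_UPDATE];
    assert(count <= MAX_UPDATE && count <= delay && delay <= LINE_LEN);

    delay_read_block(line, delay, tmp, count);

    float z = *lpState; /* one-pole lowpass state held in a local */
    for (size_t i = 0; i < count; i++) {
        z += lpCoeff * (tmp[i] - z);
        tmp[i] = z;
    }
    *lpState = z;

    for (size_t i = 0; i < count; i++) {
        out[i] = tmp[i];
        tmp[i] = in[i] + tmp[i] * decay;
    }
    delay_write_block(line, tmp, count);
}
```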
Instead of generating both the early and late reverb samples first and then
mixing them both to output, this now generates and mixes the early reflections,
then generates and mixes the late reverb. There's no reason to hold both at the
same time, so this reduces the amount of temporary storage needed.
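A minimal sketch of that ordering with a single scratch buffer shared by both
stages. `early_stage` and `late_stage` are hypothetical placeholders that only
scale the input, standing in for the real early reflection and late reverb
processing; the point is that `tmp` is free to reuse as soon as the early
output has been mixed, so only one `NUM_LINES` x `MAX_UPDATE` buffer is needed
instead of two.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_LINES  4
#define MAX_UPDATE 256 /* hypothetical maximum samples per call */

/* Placeholder stages: each fills NUM_LINES lines of output from the input. */
static void early_stage(const float *in, float tmp[NUM_LINES][MAX_UPDATE], size_t count)
{
    for (size_t c = 0; c < NUM_LINES; c++)
        for (size_t i = 0; i < count; i++)
            tmp[c][i] = in[i] * 0.5f;
}

static void late_stage(const float *in, float tmp[NUM_LINES][MAX_UPDATE], size_t count)
{
    for (size_t c = 0; c < NUM_LINES; c++)
        for (size_t i = 0; i < count; i++)
            tmp[c][i] = in[i] * 0.25f;
}

/* Adds each line, scaled by its gain, into the output. */
static void mix_lines(float tmp[NUM_LINES][MAX_UPDATE], const float gains[NUM_LINES],
                      float *out, size_t count)
{
    for (size_t c = 0; c < NUM_LINES; c++)
        for (size_t i = 0; i < count; i++)
            out[i] += tmp[c][i] * gains[c];
}

void reverb_process(const float *in, float *out, size_t count,
                    const float earlyGains[NUM_LINES], const float lateGains[NUM_LINES])
{
    float tmp[NUM_LINES][MAX_UPDATE]; /* the only scratch buffer */
    assert(count <= MAX_UPDATE);

    early_stage(in, tmp, count);
    mix_lines(tmp, earlyGains, out, count); /* early output no longer needs tmp */

    late_stage(in, tmp, count);             /* so the late stage can overwrite it */
    mix_lines(tmp, lateGains, out, count);
}
```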
Otherwise, using the device's maximum frequency will cause the weighting
factors to shift for different sample rates, irrespective of the content being
processed. 20kHz is the maximum allowed reference frequency, and it is also
roughly the upper limit of human hearing.
Because density/late line length changes start affecting late reverb output
right away, with samples that are still going through feedback decay and not
just new input samples, it makes more sense to correct for them on output
instead of input. This has the additional benefit of working with the output
mixer's gain fading, avoiding discontinuities from significant density gain
changes.
Now it only accounts for the representable frequency range (0...0.5
normalized, or 0...pi radians, instead of 0...tau). Previously, the bulk of the
weighting was given to the HF decay (nearly 90%, given a 44.1kHz sample rate
and the default 5kHz reference), with the low- and mid-frequency decays
splitting the remaining 10%. Now it's closer to 75%, matching the fraction of
the representable frequency range above the reference.
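A sketch of the weighting described above, assuming each band's weight is the
fraction of the frequency range it covers, with the low band running from 0 to
the LF reference and the high band from the HF reference to the top of the
range. `BandWeights`, `calc_band_weights`, and `weighted_decay` are
hypothetical names. Passing `top = 1.0` reproduces the old 0...tau behavior and
`top = 0.5` the new one; with a 44.1kHz sample rate, the 5kHz HF reference, and
the EFX default LF reference of 250Hz, the HF weight comes out around 0.89 and
0.77 respectively.

```c
#include <assert.h>

typedef struct {
    float lf, mf, hf; /* weights for the low, mid, and high decay times */
} BandWeights;

/* Weights each band by the share of [0, top] it covers, in normalized
 * frequency (cycles per sample). top = 0.5 spans only the representable
 * range (0...pi radians); top = 1.0 spans the full 0...tau circle. */
BandWeights calc_band_weights(float lfRef, float hfRef, float sampleRate, float top)
{
    const float lfNorm = lfRef / sampleRate;
    const float hfNorm = hfRef / sampleRate;
    assert(lfNorm >= 0.0f && lfNorm <= hfNorm && hfNorm <= top);

    BandWeights w;
    w.lf = lfNorm / top;         /* 0 .. LF reference */
    w.hf = (top - hfNorm) / top; /* HF reference .. top */
    w.mf = 1.0f - w.lf - w.hf;   /* everything in between */
    return w;
}

/* Combines the three band decay times into one weighted average. */
float weighted_decay(BandWeights w, float lfDecay, float mfDecay, float hfDecay)
{
    return w.lf * lfDecay + w.mf * mfDecay + w.hf * hfDecay;
}
```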
This could probably be improved further, since human hearing is less sensitive
to higher frequencies, but that is much more complicated.
This is not the output compressor/limiter, but the EFX effect. Consequently, it
simply compresses the dynamic range around 1.0 (boosting samples below it by up
to double, and reducing samples above it by as much as half). It is not
intended to prevent clipping on the output, but to reduce the range between
quiet and loud sounds.
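A minimal sketch of that behavior, assuming a peak envelope follower whose
value is clamped to [0.5, 2.0] and a gain of 1/envelope, so quiet passages get
at most 2x gain and loud ones at least 0.5x. `Compressor`,
`compressor_process`, and the attack/release multipliers are hypothetical; real
attack and release rates would be derived from the sample rate.

```c
#include <stddef.h>

#define ENV_MIN 0.5f /* gain never exceeds 1/ENV_MIN = 2 */
#define ENV_MAX 2.0f /* gain never drops below 1/ENV_MAX = 0.5 */

typedef struct {
    float env;         /* amplitude envelope, kept within [ENV_MIN, ENV_MAX] */
    float attackMult;  /* > 1: how fast the envelope may rise per sample */
    float releaseMult; /* < 1: how fast the envelope may fall per sample */
} Compressor;

static float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

void compressor_process(Compressor *c, float *samples, size_t count)
{
    float env = c->env;
    for (size_t i = 0; i < count; i++) {
        const float amp = samples[i] < 0.0f ? -samples[i] : samples[i];
        /* Move the envelope toward the current amplitude, rate-limited. */
        if (amp > env) {
            env *= c->attackMult;
            if (env > amp) env = amp;
        } else if (amp < env) {
            env *= c->releaseMult;
            if (env < amp) env = amp;
        }
        env = clampf(env, ENV_MIN, ENV_MAX);
        /* Quiet input (env < 1) is boosted, loud input (env > 1) is cut. */
        samples[i] *= 1.0f / env;
    }
    c->env = env;
}
```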
Two new CMake options are available for 32-bit targets that accept -msse:
ALSOFT_ENABLE_SSE_CODEGEN and ALSOFT_ENABLE_SSE2_CODEGEN, which default to
TRUE. This should not affect MSVC, which already defaults to SSE2 codegen.
Stopping the ALSA device to drain it puts it into the setup state, which
requires re-preparing it before playback can start again. Preparing it before
the first start seems to cause no harm, so just always do it before starting.
In 'alcCaptureCloseDevice', check whether the capture device is running and
stop it if necessary.
This fixes the case where the device data is deallocated while a background
thread is still running (Issue #199).
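A sketch of the close-path fix using POSIX threads, assuming a capture device
whose background thread polls a `running` flag. `CaptureDevice`,
`capture_open`, `capture_start`, `capture_stop`, and `capture_close` are
hypothetical stand-ins, not the OpenAL Soft internals; the point is the
ordering in `capture_close`, which stops and joins the thread before freeing
the memory that thread reads.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    pthread_t thread;
    atomic_bool running;  /* tells the worker to keep going */
    bool thread_started;  /* whether `thread` still needs joining */
    atomic_int frames;    /* stand-in for captured audio data */
} CaptureDevice;

static void *capture_thread(void *arg)
{
    CaptureDevice *dev = arg;
    while (atomic_load(&dev->running)) {
        atomic_fetch_add(&dev->frames, 1); /* placeholder for reading audio */
        sched_yield();
    }
    return NULL;
}

CaptureDevice *capture_open(void)
{
    CaptureDevice *dev = calloc(1, sizeof *dev);
    if (!dev) return NULL;
    atomic_init(&dev->running, false);
    atomic_init(&dev->frames, 0);
    return dev;
}

int capture_start(CaptureDevice *dev)
{
    if (dev->thread_started) return 0;
    atomic_store(&dev->running, true);
    if (pthread_create(&dev->thread, NULL, capture_thread, dev) != 0) {
        atomic_store(&dev->running, false);
        return -1;
    }
    dev->thread_started = true;
    return 0;
}

void capture_stop(CaptureDevice *dev)
{
    if (!dev->thread_started) return;
    atomic_store(&dev->running, false);
    pthread_join(dev->thread, NULL); /* wait until the thread is done with dev */
    dev->thread_started = false;
}

void capture_close(CaptureDevice *dev)
{
    capture_stop(dev); /* stop first so the thread never touches freed memory */
    free(dev);
}
```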