Classic B-Format uses scaling factors W=1, X=sqrt(2), Y=sqrt(2), and Z=sqrt(2),
which is +3dB louder than FuMa. The base factors are designed assuming classic
scaling, so encoding a 0dBFS FuMa signal without accounting for this would
result in the UHJ signal peaking at about -3dBFS. Similarly, decoding UHJ to
FuMa B-Format would be +3dB louder than intended.
So encoding needs to implicitly boost the signal by 3dB, and decoding needs to
attenuate it by 3dB.
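The 3dB figure follows from the sqrt(2) factor, since 20*log10(sqrt(2)) is about 3.01dB. Below is a minimal sketch of the compensation. It assumes every first-order channel differs by the same uniform sqrt(2) factor between FuMa and classic scaling, as stated above. The function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>

// Assumption: classic B-Format = FuMa * sqrt(2) on W, X, Y, and Z alike.
constexpr double kFuMaToClassic = 1.4142135623730951; // sqrt(2)

// Gain in decibels for a linear amplitude factor.
double gain_to_db(double gain) { return 20.0 * std::log10(gain); }

// Before UHJ encoding: boost FuMa samples to classic scaling (+3dB).
void fuma_to_classic(float *samples, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        samples[i] = static_cast<float>(samples[i] * kFuMaToClassic);
}

// After UHJ decoding: attenuate classic-scaled samples back to FuMa (-3dB).
void classic_to_fuma(float *samples, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        samples[i] = static_cast<float>(samples[i] / kFuMaToClassic);
}
```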
For HOA signals, the number of responses used with slightly varying delays
causes noticeable attenuation in the higher frequencies because of destructive
phase interference. This is not a result of minimum phase alignment (attempts
to compensate for minimum phase had negligible results), nor does it affect
first-order signals (which have only 4 unique responses on each side).
This alternate alignment is only used when doing second-order rendering for
HRTF output, which is not the default with HRTF. It's likely not ideal,
but it's necessary to prevent second-order rendering with HRTF from sounding
muffled.
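The high-frequency loss from destructive phase interference can be seen with a toy model. The sketch sums copies of one response, each with a slightly different delay in samples, and reports the normalized magnitude at a given frequency. It is only an illustration of the effect. It is not how OpenAL Soft mixes HRTF responses, and the function name is hypothetical.

```cpp
#include <cmath>
#include <complex>
#include <vector>

// Average of unit-gain copies delayed by `delays` samples, evaluated at the
// normalized angular frequency `omega` (radians/sample, 0..pi). A result
// below 1 means the copies partially cancel at that frequency.
double summed_magnitude(const std::vector<double> &delays, double omega) {
    std::complex<double> sum{0.0, 0.0};
    for (double d : delays)
        sum += std::polar(1.0, -omega * d); // e^{-j*omega*d}
    return std::abs(sum) / static_cast<double>(delays.size());
}
```

When two copies are one sample apart, they pass DC unchanged and cancel completely at Nyquist. Smaller delay differences push the null above Nyquist, but they still roll off the top of the spectrum.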
For 2-channel UHJ, two decoding equations are provided in the original paper.
The alternative one is most often referenced for 2-channel UHJ decoding, but
the original/general one can also be used by assuming T is fully attenuated
(which the format allows for, as T can be variably attenuated by a factor
between 0 and 1 to deal with an imperfect transmission medium).
Neither method can be perfect for 2-channel UHJ, since it's irrevocably lossy
relative to the original source, but my subjective testing indicates the
general equation produces less audibly errant results.
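Here is a sketch of the general equation with T fully attenuated, evaluated for a single frequency bin. On one complex phasor, the +90 degree phase shift j reduces to a multiplication by i. A real decoder would need a wideband phase-shift (Hilbert) filter instead. The coefficients below are the ones commonly quoted for Gerzon's general UHJ decode. They also assume an encoder that produced Left = (S + D)/2 and Right = (S - D)/2. Treat both as assumptions to check against the paper. The type and function names are hypothetical.

```cpp
#include <complex>

using cplx = std::complex<double>;

struct BFormatBin { cplx w, x, y; };

// General UHJ decode for one bin, with T = 0 for 2-channel input.
// Coefficients are assumed from the commonly cited general equations.
BFormatBin decode_uhj2_bin(cplx left, cplx right) {
    const cplx j{0.0, 1.0};      // +90 degree phase shift on a phasor
    const cplx s = left + right; // S = Left + Right
    const cplx d = left - right; // D = Left - Right
    const cplx t{0.0, 0.0};      // T fully attenuated (allowed by the format)

    const cplx dt = 0.828331 * d + 0.767820 * t;
    BFormatBin out;
    out.w = 0.981532 * s + 0.197484 * j * dt;
    out.x = 0.418496 * s - j * dt;
    out.y = 0.795968 * d - 0.676392 * t + j * (0.186633 * s);
    return out;
}
```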
There are no known file formats intended to support 3- and 4-channel UHJ, but
it is possible to store them in various audio files when a player/decoder is
aware of what it's dealing with. So there's no reason not to have it as an
option.
Currently, only 2-channel UHJ is supported, and the produced .amb files
shouldn't be played as normal B-Format (decoded 2-channel UHJ needs to use
different shelf filters).
libsndfile apparently has issues reading floating-point wave files as 16-bit
samples (it produces silence). Even with other file formats, reading float
samples as integer samples has no over/underflow protection, so reading float
samples is better for those formats too.
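If integer samples are still needed after reading floats, the conversion can be done by hand so that out-of-range values saturate instead of wrapping. This is a minimal sketch. The 32767 scale factor is an assumption, since some code uses 32768 instead. The function name is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Float (nominally -1..1) to 16-bit with explicit clamping and NaN handling.
void float_to_s16(const float *in, std::int16_t *out, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        float v = in[i] * 32767.0f;
        if (std::isnan(v)) v = 0.0f;            // treat NaN as silence
        else if (v > 32767.0f) v = 32767.0f;    // clamp overflow
        else if (v < -32768.0f) v = -32768.0f;  // clamp underflow
        out[i] = static_cast<std::int16_t>(std::lrint(v));
    }
}
```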
For real this time. The non-all-passed signal needs a one-sample delay over the
all-passed signal. Because of the way the all-pass FIR filter is structured,
it wouldn't otherwise use the last buffered sample, allowing it to be shifted
forward in time by one sample.
Also, remove a couple of unnecessary buffers.
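The one-sample delay on the non-all-passed signal can be kept as a single carried-over sample between blocks. The sketch below shows only that delay. The all-pass FIR path is omitted, and the one-sample amount comes from the description above rather than from any particular filter design. The class name is hypothetical.

```cpp
#include <cstddef>

// Delays a signal by one sample across block boundaries. Safe in-place.
class OneSampleDelay {
public:
    void process(const float *in, float *out, std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) {
            const float cur = in[i]; // read first in case in == out
            out[i] = mLast;          // emit the previous input sample
            mLast = cur;             // hold the current one for later
        }
    }
private:
    float mLast{0.0f};
};
```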
This uses a bit more memory (each voice needs to hold buffers for the
deinterleaved samples of each channel, instead of just one buffer for the
current channel being mixed on the device), but it will allow for handling
formats that need or prefer their channels decoded together.
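This sketch shows the per-channel storage described above. Interleaved frames are split so that every channel's samples are available at once, which is what a format that decodes its channels together needs. It allocates on every call for simplicity. A real mixer would presumably reuse preallocated per-voice buffers. The function name is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Split interleaved frames into one buffer per channel.
std::vector<std::vector<float>> deinterleave(const float *interleaved,
                                             std::size_t frames,
                                             std::size_t channels) {
    std::vector<std::vector<float>> out(channels, std::vector<float>(frames));
    for (std::size_t f = 0; f < frames; ++f)
        for (std::size_t c = 0; c < channels; ++c)
            out[c][f] = interleaved[f * channels + c];
    return out;
}
```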
NULL devices are still checked, but invalid non-NULL device handles will invoke
undefined behavior, as will attempting to close the device while the function
is being executed (modifying the device state while the function is being
called was inadvertently already UB, and will now remain so).
This change is solely so alcRenderSamplesSOFT can be used in a buffer callback
and in other places that require real-time-safe functions. The verification
requires locking to access the device list, which isn't allowed in a real-time
callback.
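The trade-off above can be sketched with a hypothetical entry point. It still rejects NULL, but it does not search a global device list, because that search would need a lock. The `Device` type and `render_samples` function are illustrative stand-ins. They are not OpenAL Soft's `ALCdevice` or the real `alcRenderSamplesSOFT`.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical device type, for illustration only.
struct Device { std::size_t channels; };

// Real-time-safe variant: NULL is rejected, but no locked device-list lookup
// is done. An invalid non-NULL handle is undefined behavior, as is closing
// the device while this call is running.
bool render_samples(Device *device, float *buffer, std::size_t frames) {
    if (!device)
        return false;
    // Stand-in for mixing: write silence for every channel of every frame.
    std::memset(buffer, 0, frames * device->channels * sizeof(float));
    return true;
}
```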
It will not be called while the device is running. If the first call succeeds,
a subsequent call that happens to fail must leave the existing device state as
it was so it can be resumed.
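One way to meet the requirement that a failed call leaves the existing state as it was is to open the replacement first and swap it in only on success. This is a sketch under that assumption. `Backend`, `open_backend`, and `reopen` are hypothetical names, and an empty name simulates an open failure.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical backend type, for illustration only.
struct Backend { std::string name; };

// Stand-in for opening a device; an empty name simulates failure.
std::unique_ptr<Backend> open_backend(const std::string &name) {
    if (name.empty()) return nullptr;
    return std::make_unique<Backend>(Backend{name});
}

// Open the new backend first; only replace the current one on success.
bool reopen(std::unique_ptr<Backend> &current, const std::string &name) {
    auto fresh = open_backend(name);
    if (!fresh) return false;   // current is left untouched
    current = std::move(fresh); // old backend is released here
    return true;
}
```

With this ordering, the old device is still open while the new one is being opened. If the real implementation works the same way, that would explain why a device that can only be opened once fails to reopen.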
This is a rough first pass. It will fail when trying to reopen the same device
if that device can only be opened once (for instance, with direct hardware
access on hardware that doesn't do its own mixing). Some backends won't
guarantee the new device is usable until the reset() or start() call.
This is mostly for the SampleConverter, used by some capture backends. When
recording at really low rates, like 5512Hz, with a device capturing at a higher
rate like 44100Hz or 48000Hz, the conversion hits the filter's downscaling
limit and produces pure silence.
In such cases, it's better to just accept some aliasing noise so that the app
still gets some recognizable audio. The alternative would be to scale the
desired rate by 2x, 3x, etc. until it's above the bsinc limit, then take every
2nd, 3rd, etc. sample of the result, as if by an extra, simpler resampler pass.
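Here is a sketch of the alternative described above. It finds the smallest integer k such that the desired rate times k stays within the resampler's downscaling limit, and then decimates by k. The limit is passed in as a hypothetical `min_num/min_den` ratio, not bsinc's actual value. Both function names are hypothetical too.

```cpp
#include <cstddef>
#include <vector>

// Smallest k >= 1 with (dst_rate*k)/src_rate >= min_num/min_den, done in
// integer arithmetic. The limit ratio is an assumption, not bsinc's real one.
unsigned pick_multiplier(unsigned src_rate, unsigned dst_rate,
                         unsigned min_num, unsigned min_den) {
    if (dst_rate == 0 || min_den == 0) return 1; // avoid an endless loop
    unsigned k = 1;
    while (static_cast<unsigned long long>(dst_rate) * k * min_den <
           static_cast<unsigned long long>(src_rate) * min_num)
        ++k;
    return k;
}

// The "extra simpler resampler pass": keep every k-th sample (k >= 1) with
// no filtering, so content above the final Nyquist frequency aliases.
std::vector<float> take_every_kth(const std::vector<float> &in, unsigned k) {
    std::vector<float> out;
    for (std::size_t i = 0; i < in.size(); i += k)
        out.push_back(in[i]);
    return out;
}
```

With an assumed limit of 1/4, recording at 5512Hz from a 48000Hz device gives k = 3. The audio would be resampled to 16536Hz, and then every third sample would be kept.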