(Note: This commit breaks libobs compilation. Skip if bisecting)
This variable is somewhat redundant. Volume is already known/accessible
to front-ends.
(Note: This commit breaks libobs compilation. Skip if bisecting)
Uses a callback and allows the caller to mix audio. Additionally,
allows the callback to return audio later, allowing it to buffer as much
as it needs.
The skipped frame count (dropped frames due to encoding being
overloaded) would erroneously include lagged frames (dropped frames due
to render stalls). This will make diagnosing actual issues a user might
be having a bit easier.
With certain devices (AVerMedia C985 and LGP), audio timestamps are
bad, and a 50ms threshold of audio data "smoothing" (making consecutive
audio packets seamless with one another) isn't enough to handle bad
consecutive timestamp values. After testing, 70ms sufficiently solves
the issue.
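A minimal sketch of the smoothing idea, assuming nanosecond timestamps. The names smooth_timestamp, ts_diff and SMOOTH_THRESHOLD_NS are hypothetical and are not taken from libobs: if a packet's timestamp is within the threshold of where the previous packet ended, the packet is treated as seamless.

```c
#include <stdint.h>

/* Hypothetical constant: the 70 ms threshold expressed in nanoseconds. */
#define SMOOTH_THRESHOLD_NS 70000000ULL

/* Absolute difference of two unsigned timestamps, without underflow. */
static uint64_t ts_diff(uint64_t a, uint64_t b)
{
	return (a > b) ? (a - b) : (b - a);
}

/*
 * 'expected_ts' is where the previous packet ended. If the incoming
 * packet is close enough, snap it to that position so consecutive
 * packets are seamless; otherwise accept its timestamp as a real jump.
 */
static uint64_t smooth_timestamp(uint64_t expected_ts, uint64_t packet_ts)
{
	if (ts_diff(expected_ts, packet_ts) < SMOOTH_THRESHOLD_NS)
		return expected_ts;
	return packet_ts;
}
```

With this sketch, a packet that is 60 ms off is snapped to the expected position, while one that is 80 ms off keeps its own timestamp.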
This improves logging for when audio data insertion is way out of bounds
or is getting cut off in the front due to a bad negative sync offset.
Instead of emitting a log message every time this happens for each
piece of data, it now logs only when the out-of-bounds or cutoff
condition starts and when it stops.
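The start/stop logging described above can be sketched as edge-triggered logging. This is an illustration only; struct oob_logger and report_insertion are hypothetical names, and the message text is made up.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical state kept between insertions. */
struct oob_logger {
	bool out_of_bounds;  /* condition seen on the previous insertion */
	int messages_logged; /* kept only so the behavior can be checked */
};

/* Log only when the condition changes, not on every piece of data. */
static void report_insertion(struct oob_logger *log, bool out_of_bounds)
{
	if (out_of_bounds == log->out_of_bounds)
		return; /* no transition, stay quiet */

	log->out_of_bounds = out_of_bounds;
	log->messages_logged++;
	fprintf(stderr, "audio insertion out of bounds %s\n",
		out_of_bounds ? "started" : "stopped");
}
```

Any number of consecutive out-of-bounds insertions produces one "started" message, and the first in-bounds insertion afterwards produces one "stopped" message.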
This fixes a case where an insertion of audio data would pass
valid_timestamp_range yet the insert position would cause a negative
integer position and thus an unsigned integer overflow.
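A sketch of the kind of guard this implies, assuming timestamps and positions share the same unit. The function insert_position is hypothetical: the subtraction is only performed once it is known that the result cannot be negative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Computes the insert position relative to the start of the buffer.
 * Returns false (and leaves *pos untouched) when the data would land
 * before the buffer start, where the unsigned subtraction would wrap
 * around to a huge value.
 */
static bool insert_position(uint64_t buffer_start_ts, uint64_t data_ts,
			    size_t *pos)
{
	if (data_ts < buffer_start_ts)
		return false;

	*pos = (size_t)(data_ts - buffer_start_ts);
	return true;
}
```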
The minimum and maximum color range values were not being set by the
video_format_get_parameters function when full range was in use. I
assume that's because the expected min/max values of full range are
{0.0, 0.0, 0.0} and {1.0, 1.0, 1.0}, so this change makes the function
set those values when full range is in use.
Steps to replicate the issue (Windows):
1.) Create a win-dshow device capture source
2.) Select a device and set it to a YUV color format to enable YUV color
conversion
3.) Select "Full Range" in the color range property.
4.) Restart OBS. The device will start up but will display green,
because the min/max range values were never set and are left at their
default value of 0.
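A minimal sketch of the fix, not the actual video_format_get_parameters code. The function get_range_min_max is hypothetical, and the partial-range values are the usual 8-bit video levels (16 to 235 for luma, 16 to 240 for chroma), which is an assumption about what the real function uses. The point is that the full-range branch writes its values explicitly instead of leaving the output at 0.

```c
#include <stdbool.h>
#include <string.h>

static void get_range_min_max(bool full_range, float range_min[3],
			      float range_max[3])
{
	/* Assumed partial-range levels, normalized to 0.0..1.0. */
	static const float partial_min[3] = {16.0f / 255.0f, 16.0f / 255.0f,
					     16.0f / 255.0f};
	static const float partial_max[3] = {235.0f / 255.0f, 240.0f / 255.0f,
					     240.0f / 255.0f};
	/* Full range: always set explicitly, never left at zero. */
	static const float full_min[3] = {0.0f, 0.0f, 0.0f};
	static const float full_max[3] = {1.0f, 1.0f, 1.0f};

	memcpy(range_min, full_range ? full_min : partial_min,
	       3 * sizeof(float));
	memcpy(range_max, full_range ? full_max : partial_max,
	       3 * sizeof(float));
}
```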
API changed:
--------------------------
void obs_output_set_audio_encoder(
                obs_output_t *output,
                obs_encoder_t *encoder);

obs_encoder_t *obs_output_get_audio_encoder(
                const obs_output_t *output);

obs_encoder_t *obs_audio_encoder_create(
                const char *id,
                const char *name,
                obs_data_t *settings);
Changed to:
--------------------------
/* 'idx' specifies the track index of the output */
void obs_output_set_audio_encoder(
                obs_output_t *output,
                obs_encoder_t *encoder,
                size_t idx);

/* 'idx' specifies the track index of the output */
obs_encoder_t *obs_output_get_audio_encoder(
                const obs_output_t *output,
                size_t idx);

/* 'mixer_idx' specifies the mixer index to capture audio from */
obs_encoder_t *obs_audio_encoder_create(
                const char *id,
                const char *name,
                obs_data_t *settings,
                size_t mixer_idx);
Overview
--------------------------
This feature allows multiple audio mixers to be used at a time. This
capability was added with surprisingly little extra overhead. Audio
will not be mixed unless it's assigned to a specific mixer, and mixers
will not mix unless they have an active mix connection.
Mostly this will be useful for being able to separate out specific audio
for recording versus streaming, but will also be useful for certain
streaming services that support multiple audio streams via RTMP.
I didn't want to use a variable number of mixers due to the desire to
reduce heap allocations, so currently I set the limit to 4 simultaneous
mixers; this number can be increased later if needed, but honestly I
feel like it's just the right number to use.
Sources:
Sources can now specify which audio mixers their audio is mixed to; this
can be a single mixer or multiple mixers at a time. The
obs_source_set_audio_mixers function sets the audio mixers that an audio
source applies to, with one bit per mixer. For example, 0xF would mean
that the source applies to all four mixers.
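To illustrate how a value like 0xF maps to mixers, here is a small sketch of testing one bit per mixer. MAX_MIXERS and source_uses_mixer are hypothetical names used for illustration; only the 0xF example and the limit of four come from the text above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the limit of 4 simultaneous mixers. */
#define MAX_MIXERS 4

/* True if bit 'mixer_idx' is set in the source's mixer mask. */
static bool source_uses_mixer(uint32_t mixers, size_t mixer_idx)
{
	return mixer_idx < MAX_MIXERS && (mixers & (1u << mixer_idx)) != 0;
}
```

Under this reading, a mask of 0x5 would route a source to the first and third mixers only.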
Audio Encoders:
Audio encoders now must specify which specific audio mixer they use when
they encode audio data.
Outputs:
Outputs that use encoders can now support multiple audio tracks at once
if they have the OBS_OUTPUT_MULTI_TRACK capability flag set. This is
mostly only useful for certain types of RTMP transmissions, though may
be useful for file formats that support multiple audio tracks as well
later on.
Previously, the design for the interaction between the encoder thread
and the graphics thread was that the encoder thread would signal to the
graphics thread when to start drawing each frame. The original idea
behind this was to prevent mutually cascading stalls of encoding or
graphics rendering (i.e., if rendering took too long, then encoding
would have to catch up, then rendering would have to catch up again, and
so on, cascading upon each other). The ultimate goal was to prevent
encoding from impacting graphics and vice versa.
However, eventually it was realized that there were some fundamental
flaws with this design.
1. Stray frame duplication. You could not guarantee that a frame would
render on time, so frames would sometimes be lost unintentionally if
there was any sort of minor hiccup or, I'm guessing, if the thread
took too long to be scheduled.
2. Frame timing in the rendering thread was less accurate. The only
place where frame timing was accurate was in the encoder thread, and
the graphics thread was at the whim of thread scheduling. On higher
end computers it was typically fine, but it was just generally not
guaranteed that a frame would be rendered when it was supposed to be
rendered.
So the solution (originally proposed by r1ch and paibox) is to instead
keep the encoding and graphics threads separate as usual, but instead of
the encoder thread controlling the graphics thread, the graphics thread
now controls the encoder thread. The encoder thread keeps a limited
cache of frames, then the graphics thread copies frames into the cache
and increments a semaphore to schedule the encoder thread to encode that
data.
In the cache, each frame has an encode counter. If the frame cache is
full (e.g., the encoder taking too long to return frames), it will not
cache a new frame, but instead will just increment the counter on the
last frame in the cache to schedule that frame to encode again, ensuring
that frames are on time and reducing CPU usage by lowering video
complexity. If the graphics thread takes too long to render a frame,
then it will add that frame with the count value set to the total number
of frames that were missed (actual legitimately duplicated frames).
Because the cache gives many frames of breathing room for the encoder to
encode frames, this design helps improve results especially when using
encoding presets that have higher complexity and CPU usage, minimizing
the risk of needlessly skipped or duplicated frames.
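The cache and its encode counter can be sketched as a small ring buffer. This is a single-threaded illustration under stated assumptions, not the video-io.c implementation: the semaphore and locking are left out, CACHE_SIZE is an arbitrary value, and the names cache_push and cache_pop are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE 4 /* arbitrary size chosen for the sketch */

struct cached_frame {
	uint64_t timestamp;
	int count; /* how many times this frame should be encoded */
};

struct frame_cache {
	struct cached_frame frames[CACHE_SIZE];
	size_t first; /* index of the oldest cached frame */
	size_t used;  /* number of frames currently cached */
};

/*
 * Graphics side. 'count' is 1 for an on-time frame, or larger when
 * rendering was late and frames were missed. Returns false when the
 * cache was full and the last frame's counter was raised instead.
 */
static bool cache_push(struct frame_cache *c, uint64_t ts, int count)
{
	if (c->used == CACHE_SIZE) {
		size_t last = (c->first + c->used - 1) % CACHE_SIZE;
		c->frames[last].count += count;
		return false;
	}

	size_t idx = (c->first + c->used) % CACHE_SIZE;
	c->frames[idx].timestamp = ts;
	c->frames[idx].count = count;
	c->used++;
	return true;
}

/*
 * Encoder side. Consumes one encode of the oldest frame and releases
 * the slot once its counter reaches zero. Returns false when empty.
 */
static bool cache_pop(struct frame_cache *c, uint64_t *ts)
{
	if (c->used == 0)
		return false;

	struct cached_frame *f = &c->frames[c->first];
	*ts = f->timestamp;
	if (--f->count == 0) {
		c->first = (c->first + 1) % CACHE_SIZE;
		c->used--;
	}
	return true;
}
```

In the real design each successful push would also post the semaphore once per scheduled encode, so the encoder thread wakes up exactly as many times as there are frames to encode.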
I also managed to sneak in what should be a bit of an optimization to
reduce copying of frame data, though how much of an optimization it
ultimately ends up being is debatable.
So to sum it up, this commit increases accuracy of frame timing,
completely removes stray frame duplication, gives better results for
higher complexity encoding presets, and potentially optimizes the frame
pipeline a tiny bit.
In video-io.c, video frames could skip, but the skipped frame's
timestamp would then repeat for the next frame, giving that frame a
non-monotonic timestamp, and then jump. This could slightly mess up
syncing when the frame is finally given to an output.
70 milliseconds is a bit too high for the default audio timestamp
smoothing threshold. The full range of error thus becomes 140
milliseconds, which is a bit more than necessary to worry about. For
the time being, I feel it may be worth it to try 50 milliseconds.
This fixes a minor flaw with the API where data always had to be mutable
to be usable by the API.
Functions that do not modify the fundamental underlying data of a
structure should be marked as constant, both for safety and to signify
that the parameter is input only and will not be modified by the
function using it.
Typedef pointers are unsafe. If you do:
typedef struct bla *bla_t;
then you cannot use it as a pointer to constant data, such as with
const bla_t, because the const applies to the pointer itself rather than
to the underlying data. I admit this was a fundamental mistake that
must be corrected.
All typedefs that were pointer types will now have their pointers
removed from the type itself, and the pointers will be used when they
are actually used as variables/parameters/returns instead.
This does not break ABI though, which is pretty nice.
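A small sketch of the difference. So that both styles can coexist in one file, the old pointer typedef is renamed bla_ptr_t here; that name and the two functions are hypothetical.

```c
struct bla {
	int value;
};

typedef struct bla *bla_ptr_t; /* old style: pointer hidden in the type */
typedef struct bla bla_t;      /* new style: pointer written at use site */

/*
 * 'const bla_ptr_t' means 'struct bla *const': only the pointer is
 * constant, so this write to the underlying data compiles fine.
 */
static void old_style(const bla_ptr_t b)
{
	b->value = 1;
}

/*
 * 'const bla_t *' is a pointer to constant data, so writing to
 * b->value here would be rejected by the compiler.
 */
static int new_style(const bla_t *b)
{
	return b->value;
}
```

Both forms are plain pointers at the ABI level, which is consistent with the change not breaking ABI.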
Audio that falls below the minimum expected timing (current time -
buffering time) is automatically removed. However, delayed audio was not
removed regardless of its delay. This puts a hard cap of 6 seconds past
the current time on the maximum delay audio can have. This will also
prevent the circular buffer from dynamically growing too large.
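A sketch of the resulting range check, assuming nanosecond timestamps. MAX_AUDIO_DELAY_NS and audio_timestamp_in_range are hypothetical names and do not reproduce the actual valid_timestamp_range code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: the 6 second cap in nanoseconds. */
#define MAX_AUDIO_DELAY_NS 6000000000ULL

/*
 * Accepts audio whose timestamp lies between (now - buffering time)
 * and (now + 6 seconds). Anything earlier is too old, anything later
 * would make the circular buffer grow too large.
 */
static bool audio_timestamp_in_range(uint64_t now, uint64_t buffering_ns,
				     uint64_t ts)
{
	uint64_t min_ts = (now > buffering_ns) ? (now - buffering_ns) : 0;
	uint64_t max_ts = now + MAX_AUDIO_DELAY_NS;

	return ts >= min_ts && ts <= max_ts;
}
```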
Doing timestamp smoothing in obs-source.c is good because timestamps can
typically operate on a different timebase. However, obs-source.c can
also change that timebase dynamically (such as with async video and
unexpected timestamp jumps), so to ensure that audio is seamless in the
output as well, perform timestamp smoothing in audio-io.c too, as an
extra precautionary measure.
This is sort of hard to explain: the scale_video_output function was
overwriting the current frame. If scaling was disabled, it would do
nothing and return success, and all would be well. If it was enabled,
it would call the scaler and then replace the contents of the 'data'
function parameter with the scaled frame data. The problem is that I
was passing video_output::cur_frame directly, which overwrote its
previous value with the scaled frame data. Then, if cur_frame was not
updated on time, it would end up trying to scale the previously scaled
image: it would call the video scaler with the same frame for both the
source and destination.
The simple fix was to use a local variable and pass that in as a
parameter to prevent this bug from occurring.
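The shape of the fix can be sketched as follows. The types and functions here (struct frame_data, struct output, scale_output, get_output_data) are simplified stand-ins and not the real video-io.c definitions; the scaler itself is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

struct frame_data {
	const unsigned char *data;
};

struct output {
	struct frame_data cur_frame;  /* must keep pointing at the source */
	unsigned char scaled_buf[4];  /* destination of the scaler */
	bool scale_enabled;
};

/* Replaces *frame with the scaled frame when scaling is enabled. */
static bool scale_output(struct output *out, struct frame_data *frame)
{
	if (!out->scale_enabled)
		return true;

	/* A real scaler would read frame->data and fill scaled_buf. */
	frame->data = out->scaled_buf;
	return true;
}

static const unsigned char *get_output_data(struct output *out)
{
	/* Local copy: the callee may overwrite it, cur_frame stays intact. */
	struct frame_data frame = out->cur_frame;

	if (!scale_output(out, &frame))
		return NULL;
	return frame.data;
}
```

Passing &out->cur_frame instead of the local copy would reproduce the bug: after the first call, cur_frame would point at the scaled buffer and the next call would scale from that buffer into itself.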
The bug here is that when conversion is active, the source video frame
is initialized with the destination height/width/format instead of the
source height/width/format.
This implements the 'frame skipping' mechanism to forcibly cause frames
to be duplicated in order to reduce encoder complexity so the encoder
can catch up to the video, otherwise it will continue to be
progressively behind and will cause a desync of the video.
Typically, if a user gets this issue, they should turn down their
settings. For the love of god do not tell them that 'frames are
skipping', just tell them that CPU usage is high, and that they should
consider turning down their settings.