There are devices like the GV-USB2 that produce frames with smmoth
timestamps at an uneven pace, which causes OBS to stutter because the
unbuffered path is designed to aggressively operate on the latest frame.
We can make the unbuffered path work by making two adjustments:
- Don't discard the current frame until it has elapsed.
- Don't skip frames in the queue until they have elapsed.
The buffered path still has problems with deinterlacing GV-USB2 output,
but the unbuffered path is better anyway.
Testing:
GV-USB2, Unbuffered: Stuttering is gone!
GV-USB2, Buffered: No regression (still broken).
SC-512N1-L/DVI, Unbuffered: No regression (still works).
SC-512N1-L/DVI, Buffered: No regression (still works).
It's a waste of GPU time to do two fullscreen passes to render final mix
previews. Use blend states to simulate the black background of
DrawBackdrop() for the following situations:
- Main preview window (Studio Mode off)
- Studio Mode: Program
This does not effect:
- Studio Mode: Preview (still uses DrawBackdrop)
- Fullscreen Projector (uses GPU clear to black)
- Windowed Projector (uses GPU clear to black)
intel GPA, SetStablePowerState, Intel HD Graphics 530, 1920x1080
Before:
DrawBackdrop: ~529 us
main texture: ~367 us (Cheaper than drawing a black quad?)
After:
[DrawBackdrop optimized away]
main texture: ~383 us
Add a separate shader for area upscaling to take advantage of bilinear
filtering. Iterating over texels is unnecessary in the upscale case
because a target pixel can only overlap 1 or 2 texels in X and Y
directions. When only overlapping one texel, adjust UVs to sample texel
center to avoid filtering.
Also add "base_dimension" uniform to avoid unnecessary division.
Intel HD Graphics 530, 644x478 -> 1323x1080: ~836 us -> ~232 us
Unlike get_properties, there is not reason to not call get_defaults if it is
given in addition to get_defaults2. Additonally this fixes the bug with
'init_encoder' which would only ever call get_defaults, resulting in broken
encoders if those used get_defaults2.
This implements pausing of outputs. To accomplish this, raw audio/video
data is halted to the encoders or raw output. Pausing is as precisely
timed as possible according to the timing of the obs_output_pause call,
and audio data will be spliced down to the exact audio sample in
accordance to that timing at the start/end marks.
Outputs that support this (outputs used for recording) can set the
OBS_OUTPUT_CAN_PAUSE capability flag.
If the audio subsystem was buffered to any extent, the audio of a raw
output would start off at a negative offset, requiring each raw output
to implement a "prepare_audio" function (as seen in the FFmpeg output)
in order to ensure proper synchronization with video. This did not
apply to encoded outputs because it was already being performed by the
obs-encoder code.
If an audio source does not provide enough data at a steady pace, the
timestamp update does not happen, and buffering increases until it
maxes out. To counteract this, update the timestamp anyway.
Another issue for decoupled audio sources is that timing is not
adjusted for divergence from system time. Making this adjustment is
better for timing stability.
5+ hours of stable audio without any buffering on my GV-USB2 where it
used to add 21ms every 5 mintues or so.
Fixes https://obsproject.com/mantis/view.php?id=1269
Fix ternary test to use BGRX render targets for YUV to RGB
conversions. The previous behavior may have been fine though since
the shaders fill the alpha channel with 1.0 anyway.
Code submissions have continually suffered from formatting
inconsistencies that constantly have to be addressed. Using
clang-format simplifies this by making code formatting more consistent,
and allows automation of the code formatting so that maintainers can
focus more on the code itself instead of code formatting.
The cache coherency of rasterization for full-screen passes is better
using an oversized triangle that is clipped rather than two triangles.
Traversal order of rasterization is GPU-specific, but will almost
certainly be better using an undivided primitive.
A smaller benefit is that quads along the diagonal are not evaluated
multiple times, but that's minor in comparison.
Redo format shaders to bypass vertex buffer, and input layout. Add
global shader bool "obs_glsl_compile" to make API-specific decisions,
i.e. handle upside-down UVs. gl_ortho is not needed for format
conversion because the vertex shader does not use ViewProj anymore.
This can be applied to more situations, but start small first.
Testbed full screen passes, Intel HD Graphics 530:
RGBA -> UYVX: 467 -> 439 us, ~6% savings
UYVX -> uv: 295 -> 239 us, ~19% savings
Similar to item_visible, this event fires whenever a scene item is
locked or unlocked. This allows the UI and libobs to remain in sync
regarding scene elements' statuses.
Switch for loop to do/while because we know the condition is always
true for the first loop.
Replace int math with float math to play nicely with more GPUs.
Add variables imagesize/targetsize to avoid redundant reciprocals.
Intel GPA results: 1166 -> 836 us
Remove three instances of unnecessary double-buffering. They are not
needed to avoid stalls, and cause increased memory traffic when
measured on Intel HD 530, presumably because texture data will remain
in cache if sampled immediately after write.
(Note: GPU timings from Intel GPA are volatile.)
NV12, 3 Draws:
RGBA -> UYVX: 628 us -> 543 us
UYVX -> Y: 522 us -> 507 us
UYVX -> UV: 315 us -> 187 us
Total, Duration: 1594 us -> 1153 us
Total, GTI Read Throughput: 25.2 MB -> 15.9 MB
Normally, paired encoders are unpaired when they stop. However, if the
pairing occurs before the encoders actually start, and the encoders
never actually end up starting, they are never unpaired, and that
pairing stays with them until the next time an output is started up
again. That in turn can cause an output that uses one of the encoders
but not the other to not function correctly, and neither properly
"start" nor stop because the data is queued continually in the
interleaved packet array.
For example, let's say there are two outputs, two video encoders, and
one audio encoder. This can be reproduced by using advanced output mode
and making the two outputs use separate video encoders while sharing
track 1's audio encoder. If you start up the stream output first and it
fails to fully connect for whatever reason (bad server, bad stream key,
etc), then you start up the recording output, the recording output will
appear to be running, but will not stop when you hit "stop recording".
It will stay perpetually on "stopping recording" and will get stuck that
way. This is because when the streaming output started, the streaming
output would initially pair video encoder A with audio encoder A before
the encoders actually fully started up (as the encoders do not fully
start up until a connection is successfully made), and when the
recording output starts up after that disconnection, audio encoder A
will wait for video encoder A rather than video encoder B because that
pairing was never actually cleared.
So, instead of pairing encoders when the output starts, wait until the
encoders themselves are being started and then pair the encoders at that
point in time. This ensures that the encoders start up and will clear
their pairing when no longer in use.
A previous refactoring to make DrawMatrix unnecessary has left behind
unreachable YUV conversions. Even if this code was somehow reachable,
DrawMatrix for YUV -> RGB doesn't exist anymore, so they would render
incorrectly anyway.
(This commit also modifies the UI, obs-ffmpeg, and obs-output modules)
Fixes a long-time regression where the program would lock up if an
encode call fails. Shuts down all outputs associated with the failing
encoder and displays an error message to the user.
Ideally, it would be best if a more detailed error could be displayed to
the user about the nature of the error, though the primary problem is
the encoder errors are typically not something the user would be able to
understand. The current message is a bit of a generic error message;
improvement is welcome.
Another suggestion is to try to have the encoder restart seamlessly,
though it would take a significant amount of work to be able to make it
do something like that properly, and it sort of assumes that encoder
failures are sporadic, which may not necessarily be the case with some
hardware encoders on some systems. It may be better just to use another
encoder in that case. For now, seamless restart is ruled out.
The issue with the current bilinear_lowres_scale effect is that it
samples adjacent texels, disregarding the texel-to-pixel ratio. If the
ratio is large, this can lead to aliasing. This change provides a fair
set of texture samples across the entire pixel.
The 8-sample pattern used here comes from Direct3D.
This commit adds a function to forcefully stop a transition, and to
increment/decrement the showing counter for a source with the MAIN_VIEW
type.
These functions are needed for the transition previews to work as
intended.
There are cases where alpha is multiplied unnecessarily. This change
attempts to use premultiplied alpha blending for composition.
To keep this change simple, The filter chain will continue to use
straight alpha. Otherwise, every source would need to modified to output
premultiplied, and every filter modified for premultiplied input.
"DrawAlphaDivide" shader techniques have been added to convert from
premultiplied alpha to straight alpha for final output. "DrawMatrix"
techniques ignore alpha, so they do not appear to need changing.
One remaining issue is that scale effects are set up here to use the
same shader logic for both scale filters (straight alpha - incorrectly),
and output composition (premultiplied alpha - correctly). A fix could be
made to add additional shaders for straight alpha, but the "real" fix
may be to eliminate the straight alpha path at some point.
For graphics, SrcBlendAlpha and DestBlendAlpha were both ONE, and could
combine together to form alpha values greater than one. This is not as
noticeable of a problem for UNORM targets because the channels are
clamped, but it will likely become a problem in more situations if FLOAT
targets are used.
This change switches DestBlendAlpha to INVSRCALPHA. The blending
behavior of stacked transparents is preserved without overflowing the
alpha channel.
obs-transitions: Use premultiplied alpha blend, and simplify shaders
because both inputs and outputs use premultiplied alpha now.
Fixes https://obsproject.com/mantis/view.php?id=1108