RGB to YUV converison was previously baked into every scale shader, but
this work has been moved to the YUV packing shaders. The scale shaders
now write RGBA instead. In the case where base and output resolutions
are identical, the render texture is forwarded directly to the YUV pack
step, skipping an entire fullscreen pass.
Intel GPA, SetStablePowerState, Intel HD Graphics 530, NV12
1920x1080, Before:
RGBA -> UYVX: ~321 us
UYVX -> Y: ~480 us
UYVX -> UV: ~127 us
1920x1080, After:
[forward render texture]
RGBA -> Y: ~487 us
RGBA -> UV: ~131 us
1920x1080 -> 1280x720, Before:
RGBA -> UYVX: ~268 us
UYVX -> Y: ~209 us
UYVX -> UV: ~57 us
1920x1080 -> 1280x720, After:
RGBA -> RGBA (rescale): ~268 us
RGBA -> Y: ~210 us
RGBA -> UV: ~58 us
There are devices like the GV-USB2 that produce frames with smmoth
timestamps at an uneven pace, which causes OBS to stutter because the
unbuffered path is designed to aggressively operate on the latest frame.
We can make the unbuffered path work by making two adjustments:
- Don't discard the current frame until it has elapsed.
- Don't skip frames in the queue until they have elapsed.
The buffered path still has problems with deinterlacing GV-USB2 output,
but the unbuffered path is better anyway.
Testing:
GV-USB2, Unbuffered: Stuttering is gone!
GV-USB2, Buffered: No regression (still broken).
SC-512N1-L/DVI, Unbuffered: No regression (still works).
SC-512N1-L/DVI, Buffered: No regression (still works).
It's a waste of GPU time to do two fullscreen passes to render final mix
previews. Use blend states to simulate the black background of
DrawBackdrop() for the following situations:
- Main preview window (Studio Mode off)
- Studio Mode: Program
This does not effect:
- Studio Mode: Preview (still uses DrawBackdrop)
- Fullscreen Projector (uses GPU clear to black)
- Windowed Projector (uses GPU clear to black)
intel GPA, SetStablePowerState, Intel HD Graphics 530, 1920x1080
Before:
DrawBackdrop: ~529 us
main texture: ~367 us (Cheaper than drawing a black quad?)
After:
[DrawBackdrop optimized away]
main texture: ~383 us
Add a separate shader for area upscaling to take advantage of bilinear
filtering. Iterating over texels is unnecessary in the upscale case
because a target pixel can only overlap 1 or 2 texels in X and Y
directions. When only overlapping one texel, adjust UVs to sample texel
center to avoid filtering.
Also add "base_dimension" uniform to avoid unnecessary division.
Intel HD Graphics 530, 644x478 -> 1323x1080: ~836 us -> ~232 us
Unlike get_properties, there is not reason to not call get_defaults if it is
given in addition to get_defaults2. Additonally this fixes the bug with
'init_encoder' which would only ever call get_defaults, resulting in broken
encoders if those used get_defaults2.
This implements pausing of outputs. To accomplish this, raw audio/video
data is halted to the encoders or raw output. Pausing is as precisely
timed as possible according to the timing of the obs_output_pause call,
and audio data will be spliced down to the exact audio sample in
accordance to that timing at the start/end marks.
Outputs that support this (outputs used for recording) can set the
OBS_OUTPUT_CAN_PAUSE capability flag.
If the audio subsystem was buffered to any extent, the audio of a raw
output would start off at a negative offset, requiring each raw output
to implement a "prepare_audio" function (as seen in the FFmpeg output)
in order to ensure proper synchronization with video. This did not
apply to encoded outputs because it was already being performed by the
obs-encoder code.
If an audio source does not provide enough data at a steady pace, the
timestamp update does not happen, and buffering increases until it
maxes out. To counteract this, update the timestamp anyway.
Another issue for decoupled audio sources is that timing is not
adjusted for divergence from system time. Making this adjustment is
better for timing stability.
5+ hours of stable audio without any buffering on my GV-USB2 where it
used to add 21ms every 5 mintues or so.
Fixes https://obsproject.com/mantis/view.php?id=1269
Fix ternary test to use BGRX render targets for YUV to RGB
conversions. The previous behavior may have been fine though since
the shaders fill the alpha channel with 1.0 anyway.
Code submissions have continually suffered from formatting
inconsistencies that constantly have to be addressed. Using
clang-format simplifies this by making code formatting more consistent,
and allows automation of the code formatting so that maintainers can
focus more on the code itself instead of code formatting.
In 'libobs/CMakeLists.txt', use '${CMAKE_INSTALL_LIBDIR}' instead of
'${CMAKE_INSTALL_PREFIX}/lib', as the latter results into 'libobs.pc'
being installed under '/lib' when '/lib64' would be more appropriate.
In 'libobs/libobs.pc.in', use '@CMAKE_INSTALL_FULL_LIBDIR@' for
'libdir', '@CMAKE_INSTALL_FULL_INCLUDEDIR@' for 'includedir',
and '@CMAKE_INSTALL_PREFIX@' for 'prefix'.
Gentoo-Bug: https://bugs.gentoo.org/644538
The cache coherency of rasterization for full-screen passes is better
using an oversized triangle that is clipped rather than two triangles.
Traversal order of rasterization is GPU-specific, but will almost
certainly be better using an undivided primitive.
A smaller benefit is that quads along the diagonal are not evaluated
multiple times, but that's minor in comparison.
Redo format shaders to bypass vertex buffer, and input layout. Add
global shader bool "obs_glsl_compile" to make API-specific decisions,
i.e. handle upside-down UVs. gl_ortho is not needed for format
conversion because the vertex shader does not use ViewProj anymore.
This can be applied to more situations, but start small first.
Testbed full screen passes, Intel HD Graphics 530:
RGBA -> UYVX: 467 -> 439 us, ~6% savings
UYVX -> uv: 295 -> 239 us, ~19% savings
Similar to item_visible, this event fires whenever a scene item is
locked or unlocked. This allows the UI and libobs to remain in sync
regarding scene elements' statuses.
Switch for loop to do/while because we know the condition is always
true for the first loop.
Replace int math with float math to play nicely with more GPUs.
Add variables imagesize/targetsize to avoid redundant reciprocals.
Intel GPA results: 1166 -> 836 us
Remove three instances of unnecessary double-buffering. They are not
needed to avoid stalls, and cause increased memory traffic when
measured on Intel HD 530, presumably because texture data will remain
in cache if sampled immediately after write.
(Note: GPU timings from Intel GPA are volatile.)
NV12, 3 Draws:
RGBA -> UYVX: 628 us -> 543 us
UYVX -> Y: 522 us -> 507 us
UYVX -> UV: 315 us -> 187 us
Total, Duration: 1594 us -> 1153 us
Total, GTI Read Throughput: 25.2 MB -> 15.9 MB
Normally, paired encoders are unpaired when they stop. However, if the
pairing occurs before the encoders actually start, and the encoders
never actually end up starting, they are never unpaired, and that
pairing stays with them until the next time an output is started up
again. That in turn can cause an output that uses one of the encoders
but not the other to not function correctly, and neither properly
"start" nor stop because the data is queued continually in the
interleaved packet array.
For example, let's say there are two outputs, two video encoders, and
one audio encoder. This can be reproduced by using advanced output mode
and making the two outputs use separate video encoders while sharing
track 1's audio encoder. If you start up the stream output first and it
fails to fully connect for whatever reason (bad server, bad stream key,
etc), then you start up the recording output, the recording output will
appear to be running, but will not stop when you hit "stop recording".
It will stay perpetually on "stopping recording" and will get stuck that
way. This is because when the streaming output started, the streaming
output would initially pair video encoder A with audio encoder A before
the encoders actually fully started up (as the encoders do not fully
start up until a connection is successfully made), and when the
recording output starts up after that disconnection, audio encoder A
will wait for video encoder A rather than video encoder B because that
pairing was never actually cleared.
So, instead of pairing encoders when the output starts, wait until the
encoders themselves are being started and then pair the encoders at that
point in time. This ensures that the encoders start up and will clear
their pairing when no longer in use.
A previous refactoring to make DrawMatrix unnecessary has left behind
unreachable YUV conversions. Even if this code was somehow reachable,
DrawMatrix for YUV -> RGB doesn't exist anymore, so they would render
incorrectly anyway.
(This commit also modifies the UI, obs-ffmpeg, and obs-output modules)
Fixes a long-time regression where the program would lock up if an
encode call fails. Shuts down all outputs associated with the failing
encoder and displays an error message to the user.
Ideally, it would be best if a more detailed error could be displayed to
the user about the nature of the error, though the primary problem is
the encoder errors are typically not something the user would be able to
understand. The current message is a bit of a generic error message;
improvement is welcome.
Another suggestion is to try to have the encoder restart seamlessly,
though it would take a significant amount of work to be able to make it
do something like that properly, and it sort of assumes that encoder
failures are sporadic, which may not necessarily be the case with some
hardware encoders on some systems. It may be better just to use another
encoder in that case. For now, seamless restart is ruled out.
The issue with the current bilinear_lowres_scale effect is that it
samples adjacent texels, disregarding the texel-to-pixel ratio. If the
ratio is large, this can lead to aliasing. This change provides a fair
set of texture samples across the entire pixel.
The 8-sample pattern used here comes from Direct3D.
This commit adds a function to forcefully stop a transition, and to
increment/decrement the showing counter for a source with the MAIN_VIEW
type.
These functions are needed for the transition previews to work as
intended.