The cache coherency of rasterization for full-screen passes is better
using an oversized triangle that is clipped rather than two triangles.
Traversal order of rasterization is GPU-specific, but will almost
certainly be better using an undivided primitive.
A smaller benefit is that quads along the diagonal are not evaluated
multiple times, but that's minor in comparison.
Redo format shaders to bypass vertex buffer, and input layout. Add
global shader bool "obs_glsl_compile" to make API-specific decisions,
i.e. handle upside-down UVs. gl_ortho is not needed for format
conversion because the vertex shader does not use ViewProj anymore.
This can be applied to more situations, but start small first.
Testbed full screen passes, Intel HD Graphics 530:
RGBA -> UYVX: 467 -> 439 us, ~6% savings
UYVX -> uv: 295 -> 239 us, ~19% savings
Similar to item_visible, this event fires whenever a scene item is
locked or unlocked. This allows the UI and libobs to remain in sync
regarding scene elements' statuses.
Switch for loop to do/while because we know the condition is always
true for the first loop.
Replace int math with float math to play nicely with more GPUs.
Add variables imagesize/targetsize to avoid redundant reciprocals.
Intel GPA results: 1166 -> 836 us
Remove three instances of unnecessary double-buffering. They are not
needed to avoid stalls, and cause increased memory traffic when
measured on Intel HD 530, presumably because texture data will remain
in cache if sampled immediately after write.
(Note: GPU timings from Intel GPA are volatile.)
NV12, 3 Draws:
RGBA -> UYVX: 628 us -> 543 us
UYVX -> Y: 522 us -> 507 us
UYVX -> UV: 315 us -> 187 us
Total, Duration: 1594 us -> 1153 us
Total, GTI Read Throughput: 25.2 MB -> 15.9 MB
Normally, paired encoders are unpaired when they stop. However, if the
pairing occurs before the encoders actually start, and the encoders
never actually end up starting, they are never unpaired, and that
pairing stays with them until the next time an output is started up
again. That in turn can cause an output that uses one of the encoders
but not the other to not function correctly, and neither properly
"start" nor stop because the data is queued continually in the
interleaved packet array.
For example, let's say there are two outputs, two video encoders, and
one audio encoder. This can be reproduced by using advanced output mode
and making the two outputs use separate video encoders while sharing
track 1's audio encoder. If you start up the stream output first and it
fails to fully connect for whatever reason (bad server, bad stream key,
etc), then you start up the recording output, the recording output will
appear to be running, but will not stop when you hit "stop recording".
It will stay perpetually on "stopping recording" and will get stuck that
way. This is because when the streaming output started, the streaming
output would initially pair video encoder A with audio encoder A before
the encoders actually fully started up (as the encoders do not fully
start up until a connection is successfully made), and when the
recording output starts up after that disconnection, audio encoder A
will wait for video encoder A rather than video encoder B because that
pairing was never actually cleared.
So, instead of pairing encoders when the output starts, wait until the
encoders themselves are being started and then pair the encoders at that
point in time. This ensures that the encoders start up and will clear
their pairing when no longer in use.
A previous refactoring to make DrawMatrix unnecessary has left behind
unreachable YUV conversions. Even if this code was somehow reachable,
DrawMatrix for YUV -> RGB doesn't exist anymore, so they would render
incorrectly anyway.
(This commit also modifies the UI, obs-ffmpeg, and obs-output modules)
Fixes a long-time regression where the program would lock up if an
encode call fails. Shuts down all outputs associated with the failing
encoder and displays an error message to the user.
Ideally, it would be best if a more detailed error could be displayed to
the user about the nature of the error, though the primary problem is
the encoder errors are typically not something the user would be able to
understand. The current message is a bit of a generic error message;
improvement is welcome.
Another suggestion is to try to have the encoder restart seamlessly,
though it would take a significant amount of work to be able to make it
do something like that properly, and it sort of assumes that encoder
failures are sporadic, which may not necessarily be the case with some
hardware encoders on some systems. It may be better just to use another
encoder in that case. For now, seamless restart is ruled out.
The issue with the current bilinear_lowres_scale effect is that it
samples adjacent texels, disregarding the texel-to-pixel ratio. If the
ratio is large, this can lead to aliasing. This change provides a fair
set of texture samples across the entire pixel.
The 8-sample pattern used here comes from Direct3D.
This commit adds a function to forcefully stop a transition, and to
increment/decrement the showing counter for a source with the MAIN_VIEW
type.
These functions are needed for the transition previews to work as
intended.
There are cases where alpha is multiplied unnecessarily. This change
attempts to use premultiplied alpha blending for composition.
To keep this change simple, The filter chain will continue to use
straight alpha. Otherwise, every source would need to modified to output
premultiplied, and every filter modified for premultiplied input.
"DrawAlphaDivide" shader techniques have been added to convert from
premultiplied alpha to straight alpha for final output. "DrawMatrix"
techniques ignore alpha, so they do not appear to need changing.
One remaining issue is that scale effects are set up here to use the
same shader logic for both scale filters (straight alpha - incorrectly),
and output composition (premultiplied alpha - correctly). A fix could be
made to add additional shaders for straight alpha, but the "real" fix
may be to eliminate the straight alpha path at some point.
For graphics, SrcBlendAlpha and DestBlendAlpha were both ONE, and could
combine together to form alpha values greater than one. This is not as
noticeable of a problem for UNORM targets because the channels are
clamped, but it will likely become a problem in more situations if FLOAT
targets are used.
This change switches DestBlendAlpha to INVSRCALPHA. The blending
behavior of stacked transparents is preserved without overflowing the
alpha channel.
obs-transitions: Use premultiplied alpha blend, and simplify shaders
because both inputs and outputs use premultiplied alpha now.
Fixes https://obsproject.com/mantis/view.php?id=1108
Ensures that functions loaded by `os_dlsym()` come only from the
specified library that was loaded with `os_dlopen()` rather than the set
of libraries loaded by the specified library.
libobs: Add support for limited to full color range conversions when
using RGB or Y800 formats, and move RGB converison for Y800 formats to
the GPU.
decklink: Stop hiding color space/range properties for RGB formats, and
remove "YUV" from "YUV Color Space" and "YUV Color Range".
win-dshow: Remove "YUV" from "YUV Color Space" and "YUV Color Range".
UI: Remove "YUV" from "YUV Color Space" and "YUV Color Range".
Fixes handling of the `obs_source_frame::full_range` member variable,
which is often set to false by default by many plugins even when using
RGB, which would cause RGB to be marked as "partial range". This change
is crucial for when partial range RBG support is implemented.
Adds `obs_source_frame2` structure that replaces the `full_range` member
variable with a `range` variable, which uses the `video_range_type` enum
to allow handling default range values. This member variable treats
VIDEO_RANGE_DEFAULT as full range if the format is RGB, and partial
range if the format is YUV.
Also adds `obs_source_output_video2` and `obs_source_preload_video2`
functions which use the `obs_source_frame2` structure instead of the
`obs_source_frame` structure.
When using the original `obs_source_frame`, `obs_source_output_video`,
and `obs_source_preload_video` functions, RGB will always be full range
by default for backward compatibility purposes.
This reverts commit d91bd327d7a8bb4597562fc26da4edb7b56874ff, which
broke alpha with sources, scenes, and filter, causing them all to become
opaque unintentionally.
The line drawing functions previously assumed the upper-left 3x3 for
box_transform only held scale. The matrix can also hold rotation, so
pass in scale separately.
Fixes https://obsproject.com/mantis/view.php?id=1442
Property groups allow multiple properties to be combined into a single
visual group and thus can be used to reduce visual clutter and improve
user experience by giving additional context to a property. The name of
a child of a group is not affected by the group and thus all property
names must still be unique.
A new parent field has been added to obs_properties_t to ease the
tracking of existing properties. That way properties inside a group will
also be able to verify that they have a unique name.
Currently several shaders need "DrawMatrix" techniques to support the
possibility that the input texture is a "YUV" format. Also, "DrawMatrix"
is overloaded for translation in both directions when it is written for
RGB to "YUV" only.
A cleaner solution is to handle "YUV" to RGB up-front as part of format
conversion, and ensure only RGB inputs reach the other shaders. This is
necessary to someday perform correct scale filtering without the cost of
redundant "YUV" conversions per texture tap.
A necessary prerequisite for this is to add conversion support for
VIDEO_FORMAT_I444, and that is now in place. There was already a hack in
place to cover VIDEO_FORMAT_Y800. All other "YUV" formats already have
conversion functions.
"DrawMatrix" has been removed from shaders that only supported "YUV" to
RGB conversions. It still exists in shaders that perform RGB to "YUV"
conversions, and the implementations have been sanitized accordingly.
Adds more log information for effects, such as the reformatted effect code, parameters, techniques and passes. This is useful for tracking down issues and what actually ends up being compiled in the end, without having to guess where the error is actually at, as what is compiled and what was originally there usually differ by a lot.
This information is only logged if libobs is compiled in debug and _DEBUG_SHADERS is defined and not compiled if either of those are not the case.
Add D3D/GL debug markers to make RenderDoc captures easier to tranverse.
Also add obs_source_get_name_no_null() to avoid boilerplate for safe
string formatting.
Closesobsproject/obs-studio#1799