106 Commits

Author SHA1 Message Date
Richard Stanway
82b5a39ea4 libobs: Mark raw_active and gpu_encoder_active as volatile
These were operated on by atomic functions but were not marked as
volatile or loaded with os_atomic_load_long, potentially introducing
subtle race conditions. Detected by ThreadSanitizer.
2022-01-18 03:49:20 -08:00
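The kind of race described in the commit above can be sketched with C11 atomics. This is an illustrative stand-in only: it uses <stdatomic.h> rather than the libobs os_atomic_* helpers, and the struct and function names are hypothetical. The point is that every read of a shared counter goes through an atomic load instead of a plain field access.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shared counter such as raw_active. */
struct video_state {
	atomic_long raw_active;
};

static void video_state_init(struct video_state *s)
{
	atomic_init(&s->raw_active, 0);
}

/* Atomically increments the counter and returns the new value. */
static long video_state_activate(struct video_state *s)
{
	return atomic_fetch_add(&s->raw_active, 1) + 1;
}

/* Atomically decrements the counter and returns the new value. */
static long video_state_deactivate(struct video_state *s)
{
	long previous = atomic_fetch_sub(&s->raw_active, 1);
	assert(previous > 0); /* must not deactivate more often than activate */
	return previous - 1;
}

/* Reads the counter with an atomic load rather than a plain read. */
static bool video_state_is_active(struct video_state *s)
{
	return atomic_load(&s->raw_active) > 0;
}
```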
wangshaohui
bb59dfd060 libobs: Move position for calling execute_graphics_tasks
The new order is as below:
[message pump]
output_frame
render_displays
execute_graphics_tasks
2021-12-31 17:34:20 -08:00
jp9000
8b3416c1e7 libobs: Implement deferred destruction of sources
(This also modifies the UI)

The purpose of deferring destruction of sources is to ensure that:
1.) Hard locks from enumeration cannot occur with source destruction.
  For example, if the browser source is destroyed while in the graphics
  thread, the browser thread would wait for the graphics thread, but the
  graphics thread would still be waiting for the browser thread, causing
  a hard lock.
2.) When destroys occur during source enumeration, the integrity of
  the context's next pointer in the linked list can no longer be
  compromised.
3.) Source releases are fully asynchronous rather than having the risk
  of stalling the calling thread
4.) We can wait for source destruction when switching scene collections
  or when shutting down rather than hoping for threads to be finished
  with sources.

This introduces a new requirement when cleaning up scene/source data:
the obs_wait_for_destroy_queue() function. It is highly recommended that
this function be called after cleaning up sources. It returns true if
at least one source was destroyed; otherwise it returns false.
Forks are highly advised to call this function manually on source
cleanup -- preferably in a loop, in conjunction with processing
outstanding OBS signals and UI events.
2021-12-19 11:53:19 -08:00
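The recommended cleanup loop from the commit above might look roughly like the sketch below. It is not the libobs or UI code. The hooks struct, the fake queue and every function name here are hypothetical, and the real obs_wait_for_destroy_queue() takes no arguments. The callbacks stand in for processing outstanding signals and UI events and for waiting on the destroy queue, so that the loop can run without libobs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hooks standing in for event processing and for
 * obs_wait_for_destroy_queue(). */
struct cleanup_hooks {
	void (*process_pending_events)(void *ctx);
	bool (*wait_for_destroy_queue)(void *ctx);
};

/* Processes events, then waits on the destroy queue, repeating for as
 * long as the wait reports that something was destroyed. Returns the
 * number of passes made. */
static int drain_destroy_queue(const struct cleanup_hooks *hooks, void *ctx)
{
	int passes = 0;

	assert(hooks && hooks->process_pending_events &&
	       hooks->wait_for_destroy_queue);
	do {
		hooks->process_pending_events(ctx);
		passes++;
	} while (hooks->wait_for_destroy_queue(ctx));

	return passes;
}

/* Fake queue used only to exercise the loop. */
struct fake_queue {
	int queued_sources;
	int events_processed;
};

static void fake_process_events(void *ctx)
{
	struct fake_queue *q = ctx;
	q->events_processed++;
}

/* Destroys one queued source per call; true if one was destroyed. */
static bool fake_wait(void *ctx)
{
	struct fake_queue *q = ctx;
	if (q->queued_sources > 0) {
		q->queued_sources--;
		return true;
	}
	return false;
}
```

Looping matters because destroying one source can release others, which only reach the queue after pending events have been processed.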
jp9000
408ce92146 Revert "libobs: Do not release while traversing sources for tick"
This reverts commit 080090c40e81bea82754ffccbda2f625eed64e06.
2021-12-19 11:25:57 -08:00
jp9000
080090c40e libobs: Do not release while traversing sources for tick
obs_source_release should not be called while iterating through the
global sources linked list, otherwise the linked list will be
compromised. Annoying.

Basically the same fix as obsproject/obs-studio#5600, but should be
slightly more optimal and a bit more explicit.
2021-12-15 11:57:52 -08:00
jp9000
b2c09d3523 libobs: Fix potentially unsafe linked list traversal
Fixes an issue pointed out in obsproject/obs-browser#333 where a source
may destroy the next source in obs_source_video_tick(), thus
invalidating the next source in the linked list. Get the next source in
the list *after* calling obs_source_video_tick() rather than before.

Closes obsproject/obs-studio#5600
2021-12-14 10:34:53 -08:00
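The traversal fix described in the commit above can be illustrated with a small singly linked list. This is a minimal sketch with hypothetical types and names, not the obs-video.c code: a node's tick may unlink and free the node that follows it, so the loop reads the next pointer only after the tick returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list node; a tick may destroy the node that follows it. */
struct node {
	struct node *next;
	int id;
	int destroy_next_on_tick;
};

static struct node *push_front(struct node *head, int id, int destroy_next)
{
	struct node *n = malloc(sizeof(*n));
	if (!n)
		abort();
	n->next = head;
	n->id = id;
	n->destroy_next_on_tick = destroy_next;
	return n;
}

/* Ticks one node, optionally unlinking and freeing its successor. */
static void tick(struct node *n)
{
	assert(n);
	if (n->destroy_next_on_tick && n->next) {
		struct node *victim = n->next;
		n->next = victim->next;
		free(victim);
	}
}

/* Ticks every node and returns how many were ticked. */
static int tick_all(struct node *head)
{
	int ticked = 0;
	struct node *cur = head;

	while (cur) {
		tick(cur);
		ticked++;
		/* Read next only after the tick: a pointer cached before the
		 * tick could refer to a node that was just freed. */
		cur = cur->next;
	}
	return ticked;
}

static void free_list(struct node *head)
{
	while (head) {
		struct node *next = head->next;
		free(head);
		head = next;
	}
}
```

This ordering only guards against the successor being destroyed. It does not help if a tick destroys the current node, which is the case the later deferred-destruction commit addresses.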
jpark37
71bd5860ce libobs: Final downsample with SRGB formats
2021-01-21 07:42:57 -08:00
jpark37
6aa50b3ef1 libobs: Use autoreleasepool for graphics thread
Apparently necessary to clean up macOS leaks.
2020-06-08 13:16:35 -07:00
jpark37
5734ab7a9b libobs: WinRT and dispatcher init on graphics thread
Suspected necessary for WGC stability.
2020-05-21 09:25:54 -07:00
jp9000
8de20ab3be libobs: Add task scheduling features
(This commit also modifies the UI)

Adds the ability to schedule tasks for certain threads
2020-03-14 10:54:37 -07:00
jpark37
3703581472 libobs: Pump graphics loop one final time for cleanup
2020-02-23 19:43:10 -08:00
jpark37
82cdc6e8c6 libobs: Pump Win32 messages on the graphics thread
Necessary for upcoming Windows Graphics Capture support.
2020-02-22 21:02:33 -08:00
jpark37
ade65df2aa libobs: Add gs_begin_frame for duplicators
We really shouldn't be resetting duplicator state as part of gs_flush.
gs_begin_scene is not ideal because it is called twice per frame, and
only after duplicators have been ticked. Even though it makes no
user-facing difference, it makes more logical sense to reset at the top
of the frame than the bottom.
2019-10-10 21:06:01 -07:00
jpark37
42bf026a49 libobs: Fix video warnings
2019-08-30 22:13:03 -07:00
jpark37
0ea820b277 libobs: UI: Add Area scaling for downscale output
Now that Lanczos downscale blurring has been removed, the Area shader
can attempt to fill the void.
2019-08-14 22:33:52 -07:00
jpark37
bdd8d64053 libobs: Separate textures for YUV input
The shaders to unpack YUV information from the same texture were rather
complicated. Breaking them up into separate textures makes the shaders
much simpler, and we can remove the PRECISION_OFFSET hack.

Performance also gets a nice boost on Intel for planar textures.

Intel GPA, SetStablePowerState, Intel HD Graphics 530, 1920x1080

UYVY: 473 us -> 457 us
YUY2: 492 us -> 422 us
YVYU: 491 us -> 441 us
I420: 1637 us -> 505 us
I422: 1644 us -> 482 us
I444: 1653 us -> 504 us
NV12: 1656 us -> 369 us
Y800 (limited): 270 us -> 277 us
Y800 (full): 263 us -> 289 us
RGB (limited): 341 us -> 411 us
BGR3 (limited): 512 us -> 509 us
BGR3 (full): 527 us -> 534 us
2019-08-09 21:14:29 -07:00
jpark37
9aacc99b3e libobs: Separate textures for YUV output, fix chroma
The shaders to pack YUV information into the same texture were rather
complicated and suffering precision issues. Breaking them up into
separate textures makes the shaders much simpler and avoids having to
compute large integer offsets. Unfortunately, the code to handle
multiple textures is not as pleasant, but at least the NV12 rendering
path is no longer separate.

In addition, write chroma samples to "standard" offsets. For I444,
there's no difference, but I420/NV12 formats now have chroma shifted to
the left as 4:2:0 is shown in the H.264 specification.

Intel GPA, SetStablePowerState, Intel HD Graphics 530

Expect speed increase:
I420: 844 us -> 493 us (254 us + 190 us + 274 us)
I444: 837 us -> 747 us (258 us + 276 us + 272 us)
NV12: 450 us -> 368 us (319 us + 168 us)

Expect no change:
NV12 (HW): 580 us (481 us + 166 us) -> 588 us (468 us + 247 us)
RGB: 359 us -> 387 us

Fixes https://obsproject.com/mantis/view.php?id=624
Fixes https://obsproject.com/mantis/view.php?id=1512
2019-07-26 23:21:41 -07:00
jpark37
2656bf0a90 libobs: Rework RGB to YUV conversion
RGB to YUV conversion was previously baked into every scale shader, but
this work has been moved to the YUV packing shaders. The scale shaders
now write RGBA instead. In the case where base and output resolutions
are identical, the render texture is forwarded directly to the YUV pack
step, skipping an entire fullscreen pass.

Intel GPA, SetStablePowerState, Intel HD Graphics 530, NV12

1920x1080, Before:
RGBA -> UYVX: ~321 us
UYVX -> Y: ~480 us
UYVX -> UV: ~127 us

1920x1080, After:
[forward render texture]
RGBA -> Y: ~487 us
RGBA -> UV: ~131 us

1920x1080 -> 1280x720, Before:
RGBA -> UYVX: ~268 us
UYVX -> Y: ~209 us
UYVX -> UV: ~57 us

1920x1080 -> 1280x720, After:
RGBA -> RGBA (rescale): ~268 us
RGBA -> Y: ~210 us
RGBA -> UV: ~58 us
2019-07-22 01:12:35 -07:00
jpark37
e5b004fd48 libobs: Remove YUV transformation on CPU
This code path does not appear to be used. Breakpoint-inspected all four
output formats I420/I444/NV12/RGB, and they are all behaving as they
should.
2019-07-22 01:12:01 -07:00
jpark37
85cc7c84bc libobs: obs-filters: Area upscale shader
Add a separate shader for area upscaling to take advantage of bilinear
filtering. Iterating over texels is unnecessary in the upscale case
because a target pixel can only overlap 1 or 2 texels in X and Y
directions. When only overlapping one texel, adjust UVs to sample texel
center to avoid filtering.

Also add "base_dimension" uniform to avoid unnecessary division.

Intel HD Graphics 530, 644x478 -> 1323x1080: ~836 us -> ~232 us
2019-07-17 21:11:18 -07:00
jp9000
70ecbcd5d4 libobs: Add obs_get_frame_interval_ns
Returns the current video frame interval between frames, in nanoseconds.
2019-07-07 16:38:21 -07:00
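As a worked example of what a frame interval in nanoseconds means, the sketch below derives it from a rational frame rate. The function name and signature are hypothetical and this is not the libobs implementation. It assumes the frame rate is given as a numerator and denominator, as is common for rates such as 30000/1001.

```c
#include <assert.h>
#include <stdint.h>

/* Interval between frames in nanoseconds for a frame rate of
 * fps_num / fps_den frames per second, truncated toward zero.
 * The product fits in 64 bits for any 32-bit denominator. */
static uint64_t frame_interval_ns(uint32_t fps_num, uint32_t fps_den)
{
	assert(fps_num != 0);
	return (UINT64_C(1000000000) * fps_den) / fps_num;
}
```

For 60 fps this gives 16666666 ns, and for 30000/1001 (NTSC 29.97) it gives 33366666 ns.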
jp9000
f53df7da64 clang-format: Apply formatting
Code submissions have continually suffered from formatting
inconsistencies that constantly have to be addressed.  Using
clang-format simplifies this by making code formatting more consistent,
and allows automation of the code formatting so that maintainers can
focus more on the code itself instead of code formatting.
2019-06-23 23:49:10 -07:00
James Park
aa22b61e3e libobs: Full-screen triangle format conversions
The cache coherency of rasterization for full-screen passes is better
using an oversized triangle that is clipped rather than two triangles.
Traversal order of rasterization is GPU-specific, but will almost
certainly be better using an undivided primitive.

A smaller benefit is that quads along the diagonal are not evaluated
multiple times, but that's minor in comparison.

Redo format shaders to bypass vertex buffer, and input layout. Add
global shader bool "obs_glsl_compile" to make API-specific decisions,
i.e. handle upside-down UVs. gl_ortho is not needed for format
conversion because the vertex shader does not use ViewProj anymore.

This can be applied to more situations, but start small first.

Testbed full screen passes, Intel HD Graphics 530:
RGBA -> UYVX: 467 -> 439 us, ~6% savings
UYVX -> uv: 295 -> 239 us, ~19% savings
2019-06-18 22:29:07 -07:00
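The oversized-triangle technique from the commit above is commonly implemented by deriving clip-space positions from the vertex index, which is why no vertex buffer or input layout is needed. The sketch below shows one widely used formulation in plain C. It is not the OBS shader code, and the type and function names are hypothetical.

```c
#include <assert.h>

struct vec2 {
	float x, y;
};

struct fs_vertex {
	struct vec2 pos; /* clip-space position */
	struct vec2 uv;  /* texture coordinate, before any API-specific flip */
};

/* Returns vertex `id` (0, 1 or 2) of a triangle with corners at
 * (-1,-1), (3,-1) and (-1,3). It covers the whole [-1,1] x [-1,1]
 * viewport, and the parts outside the viewport are clipped. */
static struct fs_vertex fullscreen_triangle_vertex(unsigned int id)
{
	struct fs_vertex v;

	assert(id < 3);
	v.uv.x = (float)((id << 1) & 2u); /* 0, 2, 0 */
	v.uv.y = (float)(id & 2u);        /* 0, 0, 2 */
	v.pos.x = v.uv.x * 2.0f - 1.0f;   /* -1, 3, -1 */
	v.pos.y = v.uv.y * 2.0f - 1.0f;   /* -1, -1, 3 */
	return v;
}
```

Inside the viewport the interpolated UV stays within [0, 1]. Whether V has to be flipped depends on the graphics API, which is the kind of decision the commit delegates to the "obs_glsl_compile" flag.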
James Park
8d6ed988e6 libobs: Remove unnecessary frame pipelining
Remove three instances of unnecessary double-buffering. They are not
needed to avoid stalls, and cause increased memory traffic when
measured on Intel HD 530, presumably because texture data will remain
in cache if sampled immediately after write.

(Note: GPU timings from Intel GPA are volatile.)

NV12, 3 Draws:
RGBA -> UYVX: 628 us -> 543 us
UYVX -> Y: 522 us -> 507 us
UYVX -> UV: 315 us -> 187 us
Total, Duration: 1594 us -> 1153 us
Total, GTI Read Throughput: 25.2 MB -> 15.9 MB
2019-05-24 01:03:21 -07:00
James Park
ba21fb947e libobs: Fix various alpha issues
There are cases where alpha is multiplied unnecessarily. This change
attempts to use premultiplied alpha blending for composition.

To keep this change simple, the filter chain will continue to use
straight alpha. Otherwise, every source would need to be modified to
output premultiplied, and every filter modified for premultiplied input.

"DrawAlphaDivide" shader techniques have been added to convert from
premultiplied alpha to straight alpha for final output. "DrawMatrix"
techniques ignore alpha, so they do not appear to need changing.

One remaining issue is that scale effects are set up here to use the
same shader logic for both scale filters (straight alpha - incorrectly),
and output composition (premultiplied alpha - correctly). A fix could be
made to add additional shaders for straight alpha, but the "real" fix
may be to eliminate the straight alpha path at some point.

For graphics, SrcBlendAlpha and DestBlendAlpha were both ONE, and could
combine together to form alpha values greater than one. This is not as
noticeable of a problem for UNORM targets because the channels are
clamped, but it will likely become a problem in more situations if FLOAT
targets are used.

This change switches DestBlendAlpha to INVSRCALPHA. The blending
behavior of stacked transparents is preserved without overflowing the
alpha channel.

obs-transitions: Use premultiplied alpha blend, and simplify shaders
because both inputs and outputs use premultiplied alpha now.

Fixes https://obsproject.com/mantis/view.php?id=1108
2019-05-08 20:26:52 -07:00
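The alpha overflow described in the commit above follows from the blend equation for the alpha channel, out = src * SrcBlendAlpha + dst * DestBlendAlpha. The sketch below compares the two factor choices in plain C. The function names are hypothetical and the code only reproduces the arithmetic, not any graphics API call.

```c
#include <assert.h>

/* SrcBlendAlpha = ONE, DestBlendAlpha = ONE: the sum can exceed 1. */
static float blend_alpha_one_one(float src, float dst)
{
	assert(src >= 0.0f && src <= 1.0f && dst >= 0.0f && dst <= 1.0f);
	return src + dst;
}

/* SrcBlendAlpha = ONE, DestBlendAlpha = INVSRCALPHA: the result stays
 * within [0, 1] for inputs in [0, 1]. */
static float blend_alpha_one_invsrc(float src, float dst)
{
	assert(src >= 0.0f && src <= 1.0f && dst >= 0.0f && dst <= 1.0f);
	return src + dst * (1.0f - src);
}
```

Stacking two layers with alpha 0.75 gives 1.5 with ONE/ONE, which an UNORM target clamps to 1 but a FLOAT target would store as-is. With ONE/INVSRCALPHA the same stack gives 0.9375.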
jp9000
f109d1c2bf Revert "libobs: libobs-d3d11: obs-filters: No excess alpha"
This reverts commit d91bd327d7a8bb4597562fc26da4edb7b56874ff, which
broke alpha with sources, scenes, and filters, causing them all to become
opaque unintentionally.
2019-04-25 08:36:41 -07:00
Jim
9480cc4fd2 Merge pull request #1675 from admshao/clear-linux-compiling-warnings
Clear linux compiling warnings
2019-04-14 04:21:21 -07:00
Jim
f1399b6d18 Merge pull request #1765 from jpark37/blend-alpha
libobs: libobs-d3d11: Fix blend alpha overflow
2019-04-14 00:22:20 -07:00
James Park
21f4dd63d4 libobs: UI: Use graphics debug markers
Add D3D/GL debug markers to make RenderDoc captures easier to traverse.

Also add obs_source_get_name_no_null() to avoid boilerplate for safe
string formatting.

Closes obsproject/obs-studio#1799
2019-04-08 02:05:37 -07:00
James Park
d91bd327d7 libobs: libobs-d3d11: obs-filters: No excess alpha
Currently SrcBlendAlpha and DestBlendAlpha are both ONE, and can
combine together to form two. This is not a noticeable problem for
UNORM targets because the channels are clamped, but it will likely
become a problem if FLOAT targets are more widely used.

This change switches DestBlendAlpha to INVSRCALPHA, and starts
backgrounds as opaque black instead of transparent black. The blending
behavior of stacked transparents is preserved without overflowing the
alpha channel.
2019-04-07 18:16:56 -07:00
Shaolin
721302bf00 libobs: Clear all compiler warnings
2019-03-29 06:29:04 -03:00
jp9000
2f90bcf684 libobs: Fix frame not being cleared
Fixes the remaining case where a frame from the previous
recording/stream could show up at the beginning of the next
recording/stream on the same running session when using the new version
of NVENC.  Textures are being converted for both raw and texture-based
encoders, so this variable which determines whether a texture is ready
and has been converted should be cleared in both cases.
2019-03-12 12:54:47 -07:00
jp9000
cd3d64215e libobs: Fix first frame when output restarted
When all outputs stop and an output later starts back up again, the
last frame or two of data from the previous output session would end up
as the first frame or two of the following output.  This was because
certain rendering variables were not being properly cleared when a new
output starts back up.
2019-03-05 19:20:18 -08:00
jp9000
7f01fee8c2 libobs: Always query shared texture handle for encoding
Always query the texture's shared handle in case the texture had to be
rebuilt from a driver crash.
2019-03-04 04:54:25 -08:00
jp9000
573197af5b libobs: Fix race conditions
Uses obs_source_get_ref on the sources enumerated in the tick_sources
function in obs-video.c to ensure a reference has been incremented
before calling that source's video_tick, and replaces an
obs_source_addref with obs_source_get_ref in the push_audio_tree
function in obs-audio.c to ensure that it cannot increment a source that
has already decremented its reference to 0.
2019-02-12 19:23:24 -08:00
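The difference between an unconditional addref and a "get ref" that can fail, as described in the commit above, can be sketched with a compare-and-swap loop. This is a generic pattern written with C11 atomics under the assumption that a count of 0 means the object is already being destroyed. The names are hypothetical, and it does not reproduce the libobs reference-counting code or its internal conventions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object; refs == 0 means dead. */
struct refcounted {
	atomic_long refs;
};

/* Takes a reference only if the object is still alive. Unlike a plain
 * increment, this can never revive a count that has reached 0. */
static bool try_get_ref(struct refcounted *obj)
{
	long cur;

	assert(obj);
	cur = atomic_load(&obj->refs);
	while (cur > 0) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&obj->refs, &cur, cur + 1))
			return true;
	}
	return false;
}

/* Drops a reference; returns true if this was the last one. */
static bool release_ref(struct refcounted *obj)
{
	assert(obj);
	return atomic_fetch_sub(&obj->refs, 1) == 1;
}
```

A caller that gets false from try_get_ref() skips the object instead of ticking or mixing a source that another thread is tearing down.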
jp9000
d416f781fd libobs: Fix crash starting raw encoder before gpu encoder
Fixes a crash when starting a raw encoder before a GPU encoder.
2019-02-10 22:22:31 -08:00
jp9000
93ba6e7128 libobs: Add texture-based encoding support
Allows the ability to encode by passing NV12 textures.  This uses a
separate thread for texture-based encoders with a small queue of
textures.  An output texture with a keyed mutex shared texture is locked
between OBS and each encoder.  A new encoder callback and capability
flag is used to encode with textures.
2019-02-07 17:00:46 -08:00
jp9000
28d0cc8b97 libobs: Use NV12 textures when available
2019-02-07 17:00:46 -08:00
Colin Edwards
19bc92d267 Decklink: Keyer support
2019-01-04 17:34:00 -06:00
jp9000
45b5291530 libobs: Deactivate unnecessary GPU ops when not encoding
Reduces GPU usage when encoding is not active.  Does not perform color
conversion, frame staging, or frame downloading unless encoding is
explicitly active.
2018-04-23 08:14:18 -07:00
jp9000
0ffc9bbf05 libobs: Add video tick callback functions
Allows the ability to have a callback invoked every time video ticks.
Particularly useful for scripting.
2018-01-03 17:03:57 -08:00
jp9000
7f6cf97bd7 libobs: Add obs_render_main_texture
(Note: This commit also modifies UI and test)

This makes it so that main preview panes are rendered with the main
output texture rather than re-rendering the main view.  The view will
render all objects again, whereas the output texture will be a single
texture render of the same exact thing.

Also fixes some abnormal artifacting when scaling the main preview pane.
2018-01-01 18:52:47 -08:00
jp9000
2c58185af3 libobs: Rename obs_video_thread to obs_graphics_thread
This is to prevent confusion with video_thread in
libobs/media-io/video-io.c, which is used exclusively for video
encoding/output.  Also prevents confusion in the profiler log data.
2017-10-28 00:22:03 -07:00
jp9000
3ea23320b8 libobs: Initialize randomization seed in video thread
Ensures that any rand() calls in the video thread will have a unique
seed to start from.
2017-10-03 18:48:56 -07:00
Palana
9ce9c35b0d libobs: Fix texture_ready feedback for CPU conversion path
2017-09-13 16:39:27 +02:00
jp9000
cb9a478821 libobs: Add function to get average render time
Useful for real-time rendering statistics
2017-05-13 01:21:16 -07:00
jp9000
ad57aa1520 libobs: Add function to allow custom output drawing
Optionally allows drawing directly to the primary output instead of
having to use a source to draw.
2017-05-06 11:29:29 -07:00
jp9000
95ce556051 libobs: Add obs_get_active_fps function
Allows getting the current active framerate that the core is rendering
with.  This takes into account any rendering lag or stalls that may be
occurring.
2016-08-22 12:05:57 -07:00
jp9000
d49833830c libobs: Add ability to use scale filters on scene items
Allows the ability to use scale filters such as point, bicubic, lanczos
on specific scene items, disabled by default.  When using one of the
latter two options, if the item's scale is under half of the source's
original size, it uses the bilinear low resolution downscale shader
instead.
2016-06-29 08:00:54 -07:00
jp9000
a5c9350be5 libobs: Remove "presentation volume" and "base volume" (skip)
(Note: This commit breaks libobs compilation.  Skip if bisecting)

These variables are considered obsolete and will no longer be needed.
2016-01-26 11:49:32 -08:00