85 Commits

Author SHA1 Message Date
jp9000
f53df7da64 clang-format: Apply formatting
Code submissions have continually suffered from formatting
inconsistencies that constantly have to be addressed.  Using
clang-format simplifies this by making code formatting more consistent,
and allows automation of the code formatting so that maintainers can
focus more on the code itself instead of code formatting.
2019-06-23 23:49:10 -07:00
James Park
aa22b61e3e libobs: Full-screen triangle format conversions
The cache coherency of rasterization for full-screen passes is better
using an oversized triangle that is clipped rather than two triangles.
Traversal order of rasterization is GPU-specific, but will almost
certainly be better using an undivided primitive.
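
To illustrate the oversized-triangle idea, the C sketch below derives all three vertices from the vertex index alone. This is the common trick that lets a full-screen pass run without a vertex buffer or input layout. It is a hypothetical sketch: the struct and function names are not from libobs, whose version lives in its shader effect files, and it ignores API-specific UV orientation.

```c
/* Hypothetical full-screen vertex: clip-space position plus texture coordinate. */
struct fs_vertex {
	float x, y; /* clip-space position */
	float u, v; /* texture coordinate */
};

/* Build vertex `id` (0, 1 or 2) of a triangle that covers the whole screen. */
static struct fs_vertex fullscreen_triangle_vertex(unsigned int id)
{
	struct fs_vertex out;

	/* id 0 -> (0,0), id 1 -> (2,0), id 2 -> (0,2) */
	out.u = (float)((id << 1) & 2);
	out.v = (float)(id & 2);

	/* Map UV [0,2] to clip space [-1,3]; everything past [-1,1] is clipped. */
	out.x = out.u * 2.0f - 1.0f;
	out.y = out.v * 2.0f - 1.0f;
	return out;
}
```

The screen corner (1, 1) lies exactly on the hypotenuse x + y = 2, so the single clipped triangle covers the whole [-1, 1] square with no shared diagonal edge.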

A smaller benefit is that quads along the diagonal are not evaluated
multiple times, but that's minor in comparison.

Redo format shaders to bypass the vertex buffer and input layout. Add
global shader bool "obs_glsl_compile" to make API-specific decisions,
e.g. handling upside-down UVs. gl_ortho is not needed for format
conversion because the vertex shader no longer uses ViewProj.

This can be applied to more situations, but start small first.

Testbed full screen passes, Intel HD Graphics 530:
RGBA -> UYVX: 467 -> 439 us, ~6% savings
UYVX -> uv: 295 -> 239 us, ~19% savings
2019-06-18 22:29:07 -07:00
James Park
8d6ed988e6 libobs: Remove unnecessary frame pipelining
Remove three instances of unnecessary double-buffering. They are not
needed to avoid stalls, and cause increased memory traffic when
measured on Intel HD 530, presumably because texture data will remain
in cache if sampled immediately after write.

(Note: GPU timings from Intel GPA are volatile.)

NV12, 3 Draws:
RGBA -> UYVX: 628 us -> 543 us
UYVX -> Y: 522 us -> 507 us
UYVX -> UV: 315 us -> 187 us
Total, Duration: 1594 us -> 1153 us
Total, GTI Read Throughput: 25.2 MB -> 15.9 MB
2019-05-24 01:03:21 -07:00
James Park
ba21fb947e libobs: Fix various alpha issues
There are cases where alpha is multiplied unnecessarily. This change
attempts to use premultiplied alpha blending for composition.

To keep this change simple, the filter chain will continue to use
straight alpha. Otherwise, every source would need to be modified to
output premultiplied alpha, and every filter modified to accept
premultiplied input.

"DrawAlphaDivide" shader techniques have been added to convert from
premultiplied alpha to straight alpha for final output. "DrawMatrix"
techniques ignore alpha, so they do not appear to need changing.
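
Per pixel, a "DrawAlphaDivide"-style conversion amounts to dividing color by alpha. The following is a minimal C sketch, assuming float RGBA values in [0, 1]. The struct and function names are hypothetical, and the zero-alpha guard reflects an assumption about how fully transparent pixels should be handled.

```c
/* Hypothetical float RGBA pixel, channels in [0, 1]. */
struct rgba {
	float r, g, b, a;
};

/* Straight -> premultiplied: scale color by coverage. */
static struct rgba premultiply(struct rgba c)
{
	struct rgba out = {c.r * c.a, c.g * c.a, c.b * c.a, c.a};
	return out;
}

/* Premultiplied -> straight, as a final-output alpha divide would do. */
static struct rgba alpha_divide(struct rgba c)
{
	/* Fully transparent pixels carry no color; avoid dividing by zero. */
	struct rgba out = {0.0f, 0.0f, 0.0f, c.a};
	if (c.a > 0.0f) {
		out.r = c.r / c.a;
		out.g = c.g / c.a;
		out.b = c.b / c.a;
	}
	return out;
}
```

Round-tripping a straight-alpha color through premultiply() and alpha_divide() returns the original color whenever alpha is non-zero.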

One remaining issue is that scale effects are set up here to use the
same shader logic for both scale filters (straight alpha - incorrectly),
and output composition (premultiplied alpha - correctly). A fix could be
made to add additional shaders for straight alpha, but the "real" fix
may be to eliminate the straight alpha path at some point.

For graphics, SrcBlendAlpha and DestBlendAlpha were both ONE, and could
combine to form alpha values greater than one. This is less noticeable
for UNORM targets because the channels are clamped, but it will likely
become a problem in more situations if FLOAT targets are used.

This change switches DestBlendAlpha to INVSRCALPHA. The blending
behavior of stacked transparents is preserved without overflowing the
alpha channel.
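
The alpha-channel blend is out = src * SrcBlendAlpha + dst * DestBlendAlpha. The C sketch below compares DestBlendAlpha = ONE with DestBlendAlpha = INVSRCALPHA, keeping SrcBlendAlpha = ONE as described above, by stacking identical semi-transparent layers onto a transparent background. The function names are hypothetical; only the arithmetic mirrors the blend states.

```c
/* SrcBlendAlpha = ONE, DestBlendAlpha = ONE: alpha simply accumulates. */
static float blend_alpha_one_one(float src_a, float dst_a)
{
	return src_a * 1.0f + dst_a * 1.0f;
}

/* SrcBlendAlpha = ONE, DestBlendAlpha = INVSRCALPHA: stays within [0, 1]. */
static float blend_alpha_one_invsrc(float src_a, float dst_a)
{
	return src_a * 1.0f + dst_a * (1.0f - src_a);
}

/* Composite `n` layers of the same alpha onto a transparent background. */
static float stack_alpha(float layer_a, int n, float (*blend)(float, float))
{
	float dst = 0.0f;
	for (int i = 0; i < n; i++)
		dst = blend(layer_a, dst);
	return dst;
}
```

With three layers at 0.5 alpha, ONE/ONE overflows to 1.5, while ONE/INVSRCALPHA gives 0.5, 0.75, then 0.875, converging toward 1 without exceeding it.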

obs-transitions: Use premultiplied alpha blend, and simplify shaders
because both inputs and outputs use premultiplied alpha now.

Fixes https://obsproject.com/mantis/view.php?id=1108
2019-05-08 20:26:52 -07:00
jp9000
f109d1c2bf Revert "libobs: libobs-d3d11: obs-filters: No excess alpha"
This reverts commit d91bd327d7a8bb4597562fc26da4edb7b56874ff, which
broke alpha with sources, scenes, and filters, unintentionally causing
them all to become opaque.
2019-04-25 08:36:41 -07:00
Jim
9480cc4fd2 Merge pull request #1675 from admshao/clear-linux-compiling-warnings
Clear linux compiling warnings
2019-04-14 04:21:21 -07:00
Jim
f1399b6d18 Merge pull request #1765 from jpark37/blend-alpha
libobs: libobs-d3d11: Fix blend alpha overflow
2019-04-14 00:22:20 -07:00
James Park
21f4dd63d4 libobs: UI: Use graphics debug markers
Add D3D/GL debug markers to make RenderDoc captures easier to traverse.

Also add obs_source_get_name_no_null() to avoid boilerplate for safe
string formatting.

Closes obsproject/obs-studio#1799
2019-04-08 02:05:37 -07:00
James Park
d91bd327d7 libobs: libobs-d3d11: obs-filters: No excess alpha
Currently SrcBlendAlpha and DestBlendAlpha are both ONE, and can
combine to form alpha values of up to two. This is not a noticeable problem for
UNORM targets because the channels are clamped, but it will likely
become a problem if FLOAT targets are more widely used.

This change switches DestBlendAlpha to INVSRCALPHA, and starts
backgrounds as opaque black instead of transparent black. The blending
behavior of stacked transparents is preserved without overflowing the
alpha channel.
2019-04-07 18:16:56 -07:00
Shaolin
721302bf00 libobs: Clear all compiler warnings 2019-03-29 06:29:04 -03:00
jp9000
2f90bcf684 libobs: Fix frame not being cleared
Fixes the remaining case where a frame from the previous
recording/stream could show up at the beginning of the next
recording/stream on the same running session when using the new version
of NVENC.  Textures are being converted for both raw and texture-based
encoders, so this variable which determines whether a texture is ready
and has been converted should be cleared in both cases.
2019-03-12 12:54:47 -07:00
jp9000
cd3d64215e libobs: Fix first frame when output restarted
When all outputs stop, and an output starts back up again at a later
point, the last frame or two from the previous output session would end
up as the first frame or two of the following output.  This was because
certain rendering variables were not being
properly cleared when a new output starts back up.
2019-03-05 19:20:18 -08:00
jp9000
7f01fee8c2 libobs: Always query shared texture handle for encoding
Always query the texture's shared handle in case the texture had to be
rebuilt after a driver crash.
2019-03-04 04:54:25 -08:00
jp9000
573197af5b libobs: Fix race conditions
Uses obs_source_get_ref on the sources enumerated in the tick_sources
function in obs-video.c to ensure a reference has been incremented
before calling that source's video_tick, and replaces an
obs_source_addref with obs_source_get_ref in the push_audio_tree
function in obs-audio.c to ensure that it cannot increment the reference
count of a source whose count has already dropped to 0.
2019-02-12 19:23:24 -08:00
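
The difference between an addref-style and a get_ref-style call can be sketched with C11 atomics. The get_ref pattern is a compare-and-swap loop that refuses to resurrect an object whose count has already reached zero. This is a hypothetical illustration (struct refobj and both functions are not libobs names), and the exact counter convention in libobs may differ.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object with an atomic reference count. */
struct refobj {
	atomic_long refs;
};

/* addref-style: unconditionally increments, even if the count already hit 0. */
static void refobj_addref(struct refobj *obj)
{
	atomic_fetch_add(&obj->refs, 1);
}

/* get_ref-style: only takes a reference while the object is still alive. */
static bool refobj_try_get_ref(struct refobj *obj)
{
	long cur = atomic_load(&obj->refs);

	while (cur > 0) {
		/* On failure, cur is reloaded with the current count; retry. */
		if (atomic_compare_exchange_weak(&obj->refs, &cur, cur + 1))
			return true;
	}
	return false; /* object is already being destroyed */
}
```

A thread racing with the final release either observes a live count and safely takes a reference, or observes zero and backs off. The unconditional increment has no such check.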
jp9000
d416f781fd libobs: Fix crash starting raw encoder before gpu encoder
Fixes a crash when starting a raw encoder before a GPU encoder.
2019-02-10 22:22:31 -08:00
jp9000
93ba6e7128 libobs: Add texture-based encoding support
Adds the ability to encode by passing NV12 textures.  This uses a
separate thread for texture-based encoders with a small queue of
textures.  An output texture with a keyed mutex shared texture is locked
between OBS and each encoder.  A new encoder callback and capability
flag are used to encode with textures.
2019-02-07 17:00:46 -08:00
jp9000
28d0cc8b97 libobs: Use NV12 textures when available 2019-02-07 17:00:46 -08:00
Colin Edwards
19bc92d267 Decklink: Keyer support 2019-01-04 17:34:00 -06:00
jp9000
45b5291530 libobs: Deactivate unnecessary GPU ops when not encoding
Reduces GPU usage when encoding is not active.  Does not perform color
conversion, frame staging, or frame downloading unless encoding is
explicitly active.
2018-04-23 08:14:18 -07:00
jp9000
0ffc9bbf05 libobs: Add video tick callback functions
Allows a callback to be invoked every time video ticks.
Particularly useful for scripting.
2018-01-03 17:03:57 -08:00
jp9000
7f6cf97bd7 libobs: Add obs_render_main_texture
(Note: This commit also modifies UI and test)

This makes it so that main preview panes are rendered with the main
output texture rather than re-rendering the main view.  The view will
render all objects again, whereas the output texture will be a single
texture render of the same exact thing.

Also fixes some abnormal artifacting when scaling the main preview pane.
2018-01-01 18:52:47 -08:00
jp9000
2c58185af3 libobs: Rename obs_video_thread to obs_graphics_thread
This is to prevent confusion with video_thread in
libobs/media-io/video-io.c, which is used exclusively for video
encoding/output.  Also prevents confusion in the profiler log data.
2017-10-28 00:22:03 -07:00
jp9000
3ea23320b8 libobs: Initialize randomization seed in video thread
Ensures that any rand() calls in the video thread will have a unique
seed to start from.
2017-10-03 18:48:56 -07:00
Palana
9ce9c35b0d libobs: Fix texture_ready feedback for CPU conversion path 2017-09-13 16:39:27 +02:00
jp9000
cb9a478821 libobs: Add function to get average render time
Useful for real-time rendering statistics
2017-05-13 01:21:16 -07:00
jp9000
ad57aa1520 libobs: Add function to allow custom output drawing
Optionally allows drawing directly to the primary output instead of
having to use a source to draw.
2017-05-06 11:29:29 -07:00
jp9000
95ce556051 libobs: Add obs_get_active_fps function
Allows getting the current active framerate that the core is rendering
with.  This takes into account any rendering lag or stalls that may be
occurring.
2016-08-22 12:05:57 -07:00
jp9000
d49833830c libobs: Add ability to use scale filters on scene items
Adds the ability to use scale filters such as point, bicubic, and
lanczos on specific scene items, disabled by default.  When using one of the
latter two options, if the item's scale is under half of the source's
original size, it uses the bilinear low resolution downscale shader
instead.
2016-06-29 08:00:54 -07:00
jp9000
a5c9350be5 libobs: Remove "presentation volume" and "base volume" (skip)
(Note: This commit breaks libobs compilation.  Skip if bisecting)

These variables are considered obsolete and will no longer be needed.
2016-01-26 11:49:32 -08:00
jp9000
726163aa29 libobs: Report lost frame count due to rendering lag
This has been missing for a bit too long, and should make it
easier/faster to diagnose issues users might be having.
2016-01-25 17:29:09 -08:00
jp9000
a702d88c25 libobs: Don't track active transitions
This was originally used for calculating audio volume if transitions
were active, but transitions won't work that way so tracking the active
transitions is no longer needed.
2015-12-22 06:18:19 -08:00
jp9000
1bcbaf8e75 libobs: Use byte sequence for non-breaking spaces
Use explicit UTF-8 byte sequence for the "no-break space" character.

Prevents issues with certain editors, and fixes the following compiler
warning on Visual C++:

warning C4819: The file contains a character that cannot be represented
in the current code page (X). Save the file in Unicode format to prevent
data loss
2015-10-15 01:31:07 -07:00
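
Writing the no-break space as explicit bytes keeps the source file pure ASCII. U+00A0 encodes in UTF-8 as 0xC2 0xA0. The macro and label below are hypothetical names used only to show the pattern.

```c
#include <string.h>

/* U+00A0 NO-BREAK SPACE as its explicit UTF-8 byte sequence (0xC2 0xA0),
 * so the file contains no non-ASCII characters for editors or code pages
 * to misinterpret. */
#define NBSP "\xC2\xA0"

/* Adjacent string literals are concatenated after escape processing,
 * so the hex escape cannot swallow the following characters. */
static const char *example_label = "100" NBSP "%";
```

The resulting string is six bytes long: three digits, the two-byte no-break space, and the percent sign.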
Palana
cf6b75e067 libobs: Add profiler calls 2015-08-12 15:30:29 +02:00
Palana
7187c1b6d4 libobs: Move video_sleep call 2015-08-12 15:30:27 +02:00
jp9000
b89ea47b96 (API Change) libobs: Remove main window funcs/vars
(Non-compiling commit: windowless-context branch)

API Changed:
---------------------
Removed functions:
- obs_add_draw_callback
- obs_remove_draw_callback
- obs_resize
- obs_preview_set_enabled
- obs_preview_enabled

Removed member variables from struct obs_video_info:
- window_width
- window_height
- window

Summary:
---------------------
Changes the core libobs API to not be dependent upon a main window/view.
If you wish to draw to a window/view, use an obs_display object to
handle it.

This allows the use of libobs without requiring a window to be present
on the system.  This also prunes code that had to be needlessly
duplicated to handle the "main" window.
2015-08-05 01:07:09 -07:00
Anthony Catel
ffb3ca4595 libobs: Use one copy for RGBA output when possible
A minor optimization: in copy_rgbx_frame (used when libobs is set to
output RGBA frames instead of YUV frames), if the line sizes for the
source and destination match, just use a single memcpy call for all of
the data instead of multiple memcpy calls.
2015-07-24 10:42:44 -07:00
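
The single-memcpy optimization can be sketched as follows. The function name and parameters are hypothetical and do not reproduce the actual copy_rgbx_frame signature. When both line sizes match, the whole plane, including any row padding, is one contiguous block. Otherwise only the visible 4-byte pixels of each row are copied.

```c
#include <stdint.h>
#include <string.h>

/* Copy an RGBX image of `height` rows, `width` pixels each, between buffers
 * whose line sizes (bytes per row, including padding) may differ. */
static void copy_rgbx_rows(uint8_t *dst, uint32_t dst_linesize,
			   const uint8_t *src, uint32_t src_linesize,
			   uint32_t width, uint32_t height)
{
	if (dst_linesize == src_linesize) {
		/* Identical layout: one memcpy covers every row and its padding. */
		memcpy(dst, src, (size_t)src_linesize * height);
		return;
	}

	/* Different layouts: copy only the pixel bytes of each row. */
	for (uint32_t y = 0; y < height; y++)
		memcpy(dst + (size_t)y * dst_linesize,
		       src + (size_t)y * src_linesize, (size_t)width * 4);
}
```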
jp9000
51dd204c6f libobs: Save clamped video time
The "clamped" video time is the latest time that is a whole number of
frame intervals after the previous frame's timestamp without exceeding
the current system time.  For example, if the last frame system timestamp was 1600 and
the new frame is 2500, but the frame interval is 800, then the
"clamped" video time is 2400.

This clamped value is useful to get the relative system time without any
jitter.
2015-06-04 18:04:23 -07:00
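
The clamping can be expressed as integer arithmetic. The helper below is a hypothetical sketch, not the libobs function. It advances from the previous frame time by whole frame intervals only, which reproduces the worked example from the commit message (1600, 2500, 800 gives 2400).

```c
#include <stdint.h>

/* Advance from the previous frame time by whole frame intervals so the
 * result never exceeds `now` and carries no scheduling jitter. */
static uint64_t clamp_video_time(uint64_t last_time, uint64_t now,
				 uint64_t interval)
{
	if (now <= last_time)
		return last_time;

	uint64_t frames = (now - last_time) / interval; /* whole frames elapsed */
	return last_time + frames * interval;
}
```

Because the result is anchored to the previous frame time, it is frame-aligned relative to that timestamp rather than to absolute zero. For example, a last time of 1700 with the same inputs yields 2500.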
Palana
a563fbc05b libobs: Add weak reference type for obs_source 2015-05-07 01:57:14 +02:00
jp9000
908a165d62 Add planar YUV 4:4:4 format support
Adds the ability to natively output with planar YUV 4:4:4.
2015-04-17 20:16:40 -07:00
jp9000
a32f8a5d19 libobs: Fix RGB output
RGB output wasn't occurring because the frame simply wasn't being
copied.
2015-04-15 18:41:09 -07:00
jp9000
13fd6ff064 libobs: Use bilinear low res scale effect
The normal scaling methods cannot sample enough pixels to create an
accurate output image when the output size is under half the base size,
so use the bilinear low resolution scaling effect in that case instead
to ensure a more accurate low resolution image.
2015-04-10 07:27:25 -07:00
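
The selection rule can be sketched as a small function. All enum and function names here are hypothetical. The sketch also assumes that "under half the base size" means both width and height are below half, which the commit message does not spell out.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the user's scaling choice and the effects. */
enum scale_type { SCALE_BILINEAR, SCALE_BICUBIC, SCALE_LANCZOS };
enum scale_effect {
	EFFECT_BILINEAR,
	EFFECT_BICUBIC,
	EFFECT_LANCZOS,
	EFFECT_BILINEAR_LOWRES
};

/* Below half the base size the normal kernels cannot sample enough source
 * pixels, so fall back to the bilinear low resolution effect. */
static enum scale_effect choose_scale_effect(uint32_t base_w, uint32_t base_h,
					     uint32_t out_w, uint32_t out_h,
					     enum scale_type type)
{
	if (out_w < base_w / 2 && out_h < base_h / 2)
		return EFFECT_BILINEAR_LOWRES;

	switch (type) {
	case SCALE_BICUBIC:
		return EFFECT_BICUBIC;
	case SCALE_LANCZOS:
		return EFFECT_LANCZOS;
	case SCALE_BILINEAR:
	default:
		return EFFECT_BILINEAR;
	}
}
```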
jp9000
05fc9c5b78 libobs: Fix calculation copying aligned textures
Direct3D textures are usually aligned to a specific pitch, so their
internal width is often not equal to the expected output width.  This
means that if we want to use such a texture for our output, we must
de-align the texture while copying the texture data.

However, I unintentionally broke the calculation at some point with
RGBA textures: the width I was using was supposed to be multiplied by 4
(for RGBA), but I was still treating it as single-channel data.  So, if
the texture width was something like 1332, the source (Direct3D)
texture line size would be at or above 5328 (because it's RGBA), the
destination would be 1332 (YUV luma plane), and it would
unintentionally treat 3996 (5328 - 1332) bytes as unused alignment
data.  This fixes that miscalculation.
2015-01-14 14:57:27 -08:00
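
The miscalculation comes down to the bytes-per-pixel factor used when computing how much of each source row is padding. The following is a minimal sketch using a hypothetical helper and the numbers from the commit message.

```c
#include <stdint.h>

/* Bytes at the end of each source row that are alignment padding rather
 * than pixel data. `bytes_per_pixel` must match the texture format:
 * 4 for RGBA, 1 for a single-channel plane. */
static uint32_t row_padding(uint32_t src_pitch, uint32_t width,
			    uint32_t bytes_per_pixel)
{
	uint32_t row_bytes = width * bytes_per_pixel;
	return src_pitch > row_bytes ? src_pitch - row_bytes : 0;
}
```

With a 5328-byte pitch and a width of 1332, the correct RGBA factor yields no padding. Treating the data as single-channel instead yields 3996 bytes of "padding", which is exactly the bug described above.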
jp9000
b120f7cc80 libobs: Fix sync bug in new frame handling code
The return value of os_sleepto_ns is true if it waited to the specified
time, and false if the current time is past the specified time.  So it
basically returns true if it successfully waited.

I just didn't check the return value properly here, so it ended up
setting the frame count to 1 even when the target time was overshot,
ultimately causing sync issues.
2015-01-05 14:07:22 -08:00
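
The corrected logic can be sketched as follows. The function and its parameters are hypothetical, and `waited` stands for the boolean result of a sleep-until call such as os_sleepto_ns. Only an on-time wakeup means exactly one frame has elapsed. An overshoot must count every whole interval that passed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of frame intervals to account for after trying to sleep until
 * last_time + interval. */
static uint64_t frames_elapsed(bool waited, uint64_t last_time, uint64_t now,
			       uint64_t interval)
{
	if (waited)
		return 1; /* woke at the next frame boundary */

	/* Overshot: count all intervals that fully elapsed since the last frame
	 * instead of assuming 1 (the bug described above). */
	uint64_t count = (now - last_time) / interval;
	return count > 0 ? count : 1;
}
```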
jp9000
f93b2fe794 Set various thread names
Helps identify which threads are which when debugging
2015-01-03 02:37:20 -08:00
jp9000
11106c2fce libobs: Redesign/optimize frame encoding handling
Previously, the design for the interaction between the encoder thread
and the graphics thread was that the encoder thread would signal to the
graphics thread when to start drawing each frame.  The original idea
behind this was to prevent mutually cascading stalls of encoding or
graphics rendering (i.e., if rendering took too long, then encoding
would have to catch up, then rendering would have to catch up again, and
so on, cascading upon each other).  The ultimate goal was to prevent
encoding from impacting graphics and vice versa.

However, eventually it was realized that there were some fundamental
flaws with this design.

1. Stray frame duplication.  You could not guarantee that a frame would
   render on time, so sometimes frames would unintentionally be lost if
   there was any sort of minor hiccup or (presumably) if the thread took
   too long to be scheduled.

2. Frame timing in the rendering thread was less accurate.  The only
   place where frame timing was accurate was in the encoder thread, and
   the graphics thread was at the whim of thread scheduling.  On higher-end
   computers it was typically fine, but it was just generally not
   guaranteed that a frame would be rendered when it was supposed to be
   rendered.

So the solution (originally proposed by r1ch and paibox) is to instead
keep the encoding and graphics threads separate as usual, but instead of
the encoder thread controlling the graphics thread, the graphics thread
now controls the encoder thread.  The encoder thread keeps a limited
cache of frames, and the graphics thread copies frames into the cache
and increments a semaphore to schedule the encoder thread to encode that
data.

In the cache, each frame has an encode counter.  If the frame cache is
full (e.g., the encoder taking too long to return frames), it will not
cache a new frame, but instead will just increment the counter on the
last frame in the cache to schedule that frame to encode again, ensuring
that frames are on time and reducing CPU usage by lowering video
complexity.  If the graphics thread takes too long to render a frame,
then it will add that frame with the count value set to the total amount
of frames that were missed (actual legitimately duplicated frames).
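
The cache behavior can be sketched as a small ring buffer whose entries carry an encode count. This is a hypothetical, single-threaded illustration: the names and the cache depth are assumptions, and the semaphore and locking that a real producer/consumer pair needs are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SIZE 4 /* hypothetical cache depth */

/* A cached frame and how many times the encoder should emit it. */
struct cached_frame {
	int frame_id;
	int count;
};

struct frame_cache {
	struct cached_frame frames[CACHE_SIZE];
	size_t head; /* index of the oldest frame */
	size_t size; /* number of frames currently cached */
};

/* Graphics thread: `missed` is the number of frame intervals this frame
 * covers (1 when rendering was on time). */
static void cache_push(struct frame_cache *c, int frame_id, int missed)
{
	if (c->size == CACHE_SIZE) {
		/* Full: schedule the newest cached frame to encode again. */
		size_t last = (c->head + c->size - 1) % CACHE_SIZE;
		c->frames[last].count += missed;
		return;
	}

	size_t slot = (c->head + c->size) % CACHE_SIZE;
	c->frames[slot].frame_id = frame_id;
	c->frames[slot].count = missed;
	c->size++;
}

/* Encoder thread: take the oldest frame; returns false if empty. */
static bool cache_pop(struct frame_cache *c, struct cached_frame *out)
{
	if (c->size == 0)
		return false;

	*out = c->frames[c->head];
	c->head = (c->head + 1) % CACHE_SIZE;
	c->size--;
	return true;
}
```

A slow encoder therefore never blocks rendering. It only causes the newest cached frame to be emitted more than once.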

Because the cache gives many frames of breathing room for the encoder to
encode frames, this design helps improve results especially when using
encoding presets that have higher complexity and CPU usage, minimizing
the risk of needlessly skipped or duplicated frames.

I also managed to sneak in what should be a bit of an optimization to
reduce copying of frame data, though how much of an optimization it
ultimately ends up being is debatable.

So to sum it up, this commit increases accuracy of frame timing,
completely removes stray frame duplication, gives better results for
higher complexity encoding presets, and potentially optimizes the frame
pipeline a tiny bit.
2014-12-31 04:03:47 -08:00
jp9000
c431ac6aa5 libobs: Refactor source volume transition design
This changes the way source volume handles transitioning between being
active and inactive states.

The previous way that transitioning handled volume was that it set the
presentation volume of the source and all of its sub-sources to 0.0 if
the source was inactive, and 1.0 if active.  Transition sources would
then also set the presentation volume for sub-sources to whatever their
transitioning volume was.  However, the problem with this is that the
design didn't take into account whether the source or its sub-sources
were active anywhere else, so it would break if that ever happened, and
I didn't realize that when I was designing it.

So instead, this completely overhauls the design of handling
transitioning volume.  Each frame, it'll go through all sources and
check whether they're active or inactive and set the base volume
accordingly.  If transitions are currently active, it will actually walk
the active source tree and check whether the source is in a
transitioning state somewhere.

 - If the source is a sub-source of a transition, and it's not active
   outside of the transition, then the transition will control the
   volume of the source.

 - If the source is a sub-source of a transition, but it's also active
   outside of the transition, it'll defer to whichever is louder.

This also adds a new callback to the obs_source_info structure for
transition sources, get_transition_volume, which is called to get the
transitioning volume of a sub-source.
2014-12-28 01:51:43 -08:00
jp9000
c88220552f (API Change) libobs: Add bicubic/lanczos scaling
This adds bicubic and lanczos scaling capability to libobs to improve
scaling quality and sharpness when the output resolution has to be
scaled relative to the base resolution.  Bilinear is also available,
although bilinear has rather poor quality and causes scaling to appear
blurry.

If the output resolution is close to the base resolution, then bilinear
is used instead as an optimization, as there's no need to use these
shaders if scaling is not in use.

The Bicubic and Lanczos effects are also exposed via exported functions
so that plugin modules can use those shaders if desired.

The API change adds a variable 'scale_type' to the obs_video_info
structure that allows the user interface to choose what type of scaling
filter should be used.
2014-12-15 01:55:12 -08:00
jp9000
b07862286a (API Change) Add colorspace info to obs_video_info
This was an important change because we were originally using a
hard-coded 709/partial range color matrix for the output, which was
causing problems for people wanting to use different formats or color
spaces.  This will now automatically generate the color matrix depending
on the format, color space, and range, or use an identity matrix if the
video format is RGB instead of YUV.
2014-12-11 19:51:30 -08:00
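
To show why color space and range both matter, the C sketch below computes only the luma row of such a matrix. It uses the standard BT.601 and BT.709 luma coefficients and maps partial ("limited") range to codes 16..235 of 0..255. The enums and function are hypothetical and do not reflect the actual obs_video_info fields or libobs matrix code.

```c
/* Hypothetical selectors for the color space and range. */
enum colorspace { CS_601, CS_709 };
enum range { RANGE_PARTIAL, RANGE_FULL };

/* Luma from normalized R'G'B' in [0, 1], returned as a fraction of the full
 * 0..255 code range. */
static double rgb_to_luma(double r, double g, double b, enum colorspace cs,
			  enum range rng)
{
	/* BT.601: Kr = 0.299, Kb = 0.114; BT.709: Kr = 0.2126, Kb = 0.0722 */
	double kr = (cs == CS_709) ? 0.2126 : 0.299;
	double kb = (cs == CS_709) ? 0.0722 : 0.114;
	double y = kr * r + (1.0 - kr - kb) * g + kb * b;

	/* Partial range squeezes 0..1 into 16/255..235/255. */
	if (rng == RANGE_PARTIAL)
		y = (16.0 + 219.0 * y) / 255.0;
	return y;
}
```

Pure green, for instance, maps to about 0.587 under BT.601 but about 0.7152 under BT.709. A single hard-coded matrix is therefore wrong for at least one of them.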
jp9000
87ac9c91bc libobs: Add flush to video pipeline
On certain GPUs, if you don't flush and the window is minimized it can
endlessly accumulate memory due to what I'm assuming are driver design
flaws (though I can't know for sure).  The flush seems to prevent this
from happening, at least from my tests.  It would be nice if this
weren't necessary.
2014-12-07 23:15:13 -08:00
jp9000
d14dbbc540 Add timestamp circlebuf for video input/output
At the start of each render loop, it would get the timestamp, and then
assign that timestamp to whatever frame was downloaded.  However, the
frame that was downloaded was usually rendered a number of frames
earlier, so it would assign the wrong timestamp value to that frame.

This fixes that issue by storing the timestamps in a circular buffer.
2014-10-22 20:32:48 -07:00
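
The fix pairs each downloaded frame with the timestamp recorded when that frame was rendered. The following is a minimal FIFO sketch of that idea. The names and capacity are hypothetical, and the capacity only needs to exceed the GPU download delay in frames.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TS_CAPACITY 8 /* hypothetical; must exceed the download delay */

/* Timestamps pushed at render time, popped when the frame is downloaded. */
struct ts_queue {
	uint64_t buf[TS_CAPACITY];
	size_t head;
	size_t size;
};

static bool ts_push(struct ts_queue *q, uint64_t ts)
{
	if (q->size == TS_CAPACITY)
		return false;

	q->buf[(q->head + q->size) % TS_CAPACITY] = ts;
	q->size++;
	return true;
}

static bool ts_pop(struct ts_queue *q, uint64_t *ts)
{
	if (q->size == 0)
		return false;

	*ts = q->buf[q->head];
	q->head = (q->head + 1) % TS_CAPACITY;
	q->size--;
	return true;
}
```

If downloads lag rendering by two frames, the first downloaded frame receives the first render timestamp rather than the most recent one.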