Commit Graph

18 Commits (3fe44c6331d57f0edbc9a5fe60d52a4ecb98b835)

Author SHA1 Message Date
jpark37 3d6f5c8ad6 libobs: Add YUV alpha formats
This will allow YUV alpha formats to be converted to RGBA on the GPU.
2019-08-11 11:26:22 -07:00
jpark37 bdd8d64053 libobs: Separate textures for YUV input
The shaders to unpack YUV information from the same texture were rather
complicated. Breaking them up into separate textures makes the shaders
much simpler, and we can remove the PRECISION_OFFSET hack.

Performance also gets a nice boost on Intel for planar textures.

Intel GPA, SetStablePowerState, Intel HD Graphics 530, 1920x1080

UYVY: 473 us -> 457 us
YUY2: 492 us -> 422 us
YVYU: 491 us -> 441 us
I420: 1637 us -> 505 us
I422: 1644 us -> 482 us
I444: 1653 us -> 504 us
NV12: 1656 us -> 369 us
Y800 (limited): 270 us -> 277 us
Y800 (full): 263 us -> 289 us
RGB (limited): 341 us -> 411 us
BGR3 (limited): 512 us -> 509 us
BGR3 (full): 527 us -> 534 us
2019-08-09 21:14:29 -07:00
jpark37 9aacc99b3e libobs: Separate textures for YUV output, fix chroma
The shaders to pack YUV information into the same texture were rather
complicated and suffering precision issues. Breaking them up into
separate textures makes the shaders much simpler and avoids having to
compute large integer offsets. Unfortunately, the code to handle
multiple textures is not as pleasant, but at least the NV12 rendering
path is no longer separate.

In addition, write chroma samples to "standard" offsets. For I444,
there's no difference, but I420/NV12 formats now have chroma shifted to
the left as 4:2:0 is shown in the H.264 specification.

Intel GPA, SetStablePowerState, Intel HD Graphics 530

Expect speed incrase:
I420: 844 us -> 493 us (254 us + 190 us + 274 us)
I444: 837 us -> 747 us (258 us + 276 us + 272 us)
NV12: 450 us -> 368 us (319 us + 168 us)

Expect no change:
NV12 (HW): 580 (481 us + 166 us) us -> 588 us (468 us + 247 us)
RGB: 359 us -> 387 us

Fixes https://obsproject.com/mantis/view.php?id=624
Fixes https://obsproject.com/mantis/view.php?id=1512
2019-07-26 23:21:41 -07:00
James Park 37f663a789 libobs: obs-ffmpeg: win-dshow: Planar 4:2:2 video
This format has been seen when using FFmpeg MJPEG decompression.
2019-07-25 20:11:37 -07:00
jpark37 2656bf0a90 libobs: Rework RGB to YUV conversion
RGB to YUV converison was previously baked into every scale shader, but
this work has been moved to the YUV packing shaders. The scale shaders
now write RGBA instead. In the case where base and output resolutions
are identical, the render texture is forwarded directly to the YUV pack
step, skipping an entire fullscreen pass.

Intel GPA, SetStablePowerState, Intel HD Graphics 530, NV12

1920x1080, Before:
RGBA -> UYVX: ~321 us
UYVX -> Y: ~480 us
UYVX -> UV: ~127 us

1920x1080, After:
[forward render texture]
RGBA -> Y: ~487 us
RGBA -> UV: ~131 us

1920x1080 -> 1280x720, Before:
RGBA -> UYVX: ~268 us
UYVX -> Y: ~209 us
UYVX -> UV: ~57 us

1920x1080 -> 1280x720, After:
RGBA -> RGBA (rescale): ~268 us
RGBA -> Y: ~210 us
RGBA -> UV: ~58 us
2019-07-22 01:12:35 -07:00
James Park aa22b61e3e libobs: Full-screen triangle format conversions
The cache coherency of rasterization for full-screen passes is better
using an oversized triangle that is clipped rather than two triangles.
Traversal order of rasterization is GPU-specific, but will almost
certainly be better using an undivided primitive.

A smaller benefit is that quads along the diagonal are not evaluated
multiple times, but that's minor in comparison.

Redo format shaders to bypass vertex buffer, and input layout. Add
global shader bool "obs_glsl_compile" to make API-specific decisions,
i.e. handle upside-down UVs. gl_ortho is not needed for format
conversion because the vertex shader does not use ViewProj anymore.

This can be applied to more situations, but start small first.

Testbed full screen passes, Intel HD Graphics 530:
RGBA -> UYVX: 467 -> 439 us, ~6% savings
UYVX -> uv: 295 -> 239 us, ~19% savings
2019-06-18 22:29:07 -07:00
James Park 614025742b libobs: linux-v412: obs-ffmpeg: Add packed BGR3 video support
Someone mentioned this format preserves the most quality for a
particular capture card using V4L2.
2019-05-30 06:05:53 -07:00
James Park a86710ec5b libobs: Support limited color range for RGB/Y800 sources
libobs: Add support for limited to full color range conversions when
using RGB or Y800 formats, and move RGB converison for Y800 formats to
the GPU.

decklink: Stop hiding color space/range properties for RGB formats, and
remove "YUV" from "YUV Color Space" and "YUV Color Range".

win-dshow: Remove "YUV" from "YUV Color Space" and "YUV Color Range".

UI: Remove "YUV" from "YUV Color Space" and "YUV Color Range".
2019-04-25 15:13:05 -07:00
James Park 69c215345a libobs: Simplify YUV conversion
Currently several shaders need "DrawMatrix" techniques to support the
possibility that the input texture is a "YUV" format. Also, "DrawMatrix"
is overloaded for translation in both directions when it is written for
RGB to "YUV" only.

A cleaner solution is to handle "YUV" to RGB up-front as part of format
conversion, and ensure only RGB inputs reach the other shaders. This is
necessary to someday perform correct scale filtering without the cost of
redundant "YUV" conversions per texture tap.

A necessary prerequisite for this is to add conversion support for
VIDEO_FORMAT_I444, and that is now in place. There was already a hack in
place to cover VIDEO_FORMAT_Y800. All other "YUV" formats already have
conversion functions.

"DrawMatrix" has been removed from shaders that only supported "YUV" to
RGB conversions. It still exists in shaders that perform RGB to "YUV"
conversions, and the implementations have been sanitized accordingly.
2019-04-11 23:00:03 -07:00
jp9000 28d0cc8b97 libobs: Use NV12 textures when available 2019-02-07 17:00:46 -08:00
Palana db1da73647 libobs: Fix I420 shader for (width/2)%4 == 2 resolutions
For those resolutions the last two chroma samples of every other
line would be overwritten by the last chroma samples of the previous
line (depending on sampler used), producing artifacts on the left
edge of the resulting image (e.g. any color present on the right
edge of the image would "bleed" to every other line on
the left edge)
2017-09-13 16:39:36 +02:00
jp9000 f9b5da513a libobs: Fix tex.Load lookup (needs int3, not int2)
libobs' shader language is basically HLSL, and tex.Load uses an int3 for
2D textures, with texture mipmap index for the last component.  This bug
bypassed testing because the front-end automatically switches to OpenGL
if D3D11 initialization fails, and when converted to GLSL, works fine
because texelFetch only requires two components.  This also means
there's a bug in GLSL shader conversion code, because it's essentially
ignoring the third component when it shouldn't be.
2017-05-06 10:39:42 -07:00
jp9000 e7f754df97 libobs: Use tex.Load for reverse NV12/I420 funcs
Eventually, most things should be replaced with Load where applicable
(though in some cases sub-pixel sampling is desired).

This commit also fixes a bug where NV12 async sources wouldn't render
correctly.
2017-05-06 01:24:45 -07:00
jp9000 7bc8dc3471 libobs: Add Planar444 conversion to effect 2015-04-16 22:43:46 -07:00
jp9000 2fa37a1f2e libobs-opengl: Fix render targets being flipped
When render targets are used, they output to the render target inverted
due to the way that opengl works.  This fixes that issue by inverting
the projection matrix so that it renders the image upside down and
inverting the front face from counterclockwise to clockwise.
2015-03-22 18:38:45 -07:00
jp9000 817a724dea libobs: Add NV12_Reverse shader 2014-12-21 10:14:18 -08:00
jp9000 ca8a9fb5a7 libobs: Fix conversion shader D3D display bug
Just for a quick background: D3D's fmod intrinsic is very imprecise.
Naturally floating points aren't precise at all, and when the numbers
you're dealing with become very large, it can often be off by 0.1 or
more.

However, apparently 0.1 isn't enough of an offset to ensure a proper
value when using the fmod intrinsic and then flooring the value.  0.2
seems to fix the issue and make the image display properly.
2014-12-09 14:21:01 -08:00
BtbN 38c2fc87aa Move all data into the subdir it belongs to
Completely removes the build dir in favor of cmake based build layouting
2014-07-19 01:38:41 +02:00