84 Commits (0a218e06b7601c0e561db518b1b5878557cd1ebc)

Author SHA1 Message Date
Jim 29782cd594 libobs: And fix area scaling effect with RGBA 2022-06-04 19:54:52 -07:00
Jim a33a5d2151 libobs: Fix bilinear lowres RGBA as well 2022-06-04 19:03:20 -07:00
Jim 4a2a06b22f libobs: Fix RGBA format output not working
Due to a bug in shader parsing, it thinks that because the token
"multiplier" is used here, that the "multiplier" uniform is being used.

This is a workaround for the issue because fixing the parser is probably
going to be much more annoying than just working around the issue for
now.
2022-06-04 17:47:47 -07:00
jpark37 bacd4713da libobs: Clear low bits when writing P010
Don't want to rely on consumer to ignore those bits.
2022-06-04 01:00:21 -07:00
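
The two P010 commits in this log (clearing the low bits on write, ignoring them on read) can be illustrated with a minimal sketch. It assumes the usual P010 layout, where each 16-bit word carries its 10-bit sample in the upper bits and the lower six bits are unused. The function names are hypothetical and are not libobs API.

```python
def pack_p010(value10: int) -> int:
    """Store a 10-bit sample in the upper bits of a 16-bit word.

    The low six bits are always written as zero, so a consumer that does
    not mask them still sees a clean value.
    """
    if not 0 <= value10 <= 0x3FF:
        raise ValueError("expected a 10-bit value")
    return (value10 << 6) & 0xFFC0


def unpack_p010(word16: int) -> int:
    """Recover the 10-bit sample, ignoring whatever is in the low six bits."""
    return (word16 & 0xFFFF) >> 6
```

With both sides in place, a producer that leaves garbage in the low bits and a consumer that fails to mask them no longer depend on each other's behavior.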
jpark37 11da542a0d libobs: Add max_luminance to obs_source_frame
Used in situations where source luminance is greater than HDR nominal
peak setting to avoid clipping by applying BT.2408 maxRGB EETF.
2022-05-27 14:56:47 -07:00
jpark37 4304329d0c libobs: Ignore lower six bits for P010 sources 2022-05-17 02:46:41 -07:00
Norihiro Kamae dd43635e78 libobs: Fix reserved word in variable names
The use of the reserved name caused initialization failure on Linux.
2022-05-11 11:28:20 -04:00
jpark37 0f53dc28bb libobs: Reduce PQ shader math 2022-05-11 03:38:07 -07:00
jpark37 2a0d8d1c9c libobs: Add support for reading I420 PQ
Not normally a valid combination, but Xbox writes 8-bit HDR videos.
2022-05-08 14:12:41 -07:00
jpark37 ed835810b4 libobs: Use tabs in format_conversion.effect 2022-05-08 14:12:41 -07:00
Jim 952988d9ec
Merge pull request #6231 from mvji/prores_pix_fmt
Add support for GPU conversion of YUV422P10LE, YUV444P12LE, YUVA444P12LE
2022-05-04 02:01:27 -07:00
jpark37 a82bbb416f libobs: Fix NaNs when using EETF for HLG 2022-04-26 10:11:28 -07:00
jpark37 ed85307f7e libobs: Clean up color.effect a bit 2022-04-26 10:11:28 -07:00
mvji d3a8ef7128 libobs: Add support for YUV422P10LE, YUV444P12LE, YUVA444P12LE 2022-04-19 19:37:07 +02:00
jpark37 338608bd67 libobs,UI: Support HLG nominal peak level
HLG output uses MovieLabs-recommended procedure.

- If peak luminance is greater than 1000, use maxRGB EETF to 1000.
- Otherwise, don't tonemap.
- Then use normal HLG conversion procedure with gamma 1.2 (1000 nits).
2022-04-14 09:36:44 -07:00
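
The first two steps of the HLG procedure above can be sketched as follows. This is an illustration only: it assumes the EETF is the Hermite-spline knee from ITU-R BT.2390 applied in PQ space with a black level of zero, and that "maxRGB" means the curve is evaluated on the largest channel and all channels are scaled by the same ratio. All function names are hypothetical, and the final HLG conversion with gamma 1.2 is not shown.

```python
# PQ (SMPTE ST 2084) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_encode(nits):
    """Absolute luminance in nits -> PQ signal in [0, 1]."""
    yp = (max(nits, 0.0) / 10000.0) ** M1
    return ((C1 + C2 * yp) / (1.0 + C3 * yp)) ** M2


def pq_decode(signal):
    """PQ signal in [0, 1] -> absolute luminance in nits."""
    ep = max(signal, 0.0) ** (1.0 / M2)
    return 10000.0 * (max(ep - C1, 0.0) / (C2 - C3 * ep)) ** (1.0 / M1)


def bt2390_eetf(signal_pq, source_peak_pq, target_peak_pq):
    """Knee curve in PQ space: identity below the knee, Hermite spline above."""
    e1 = signal_pq / source_peak_pq
    max_lum = target_peak_pq / source_peak_pq
    ks = 1.5 * max_lum - 0.5  # knee start
    if e1 < ks:
        e2 = e1
    else:
        t = (e1 - ks) / (1.0 - ks)
        t2, t3 = t * t, t * t * t
        e2 = ((2 * t3 - 3 * t2 + 1) * ks
              + (t3 - 2 * t2 + t) * (1.0 - ks)
              + (-2 * t3 + 3 * t2) * max_lum)
    return e2 * source_peak_pq


def maxrgb_eetf(rgb_nits, source_peak_nits, target_peak_nits):
    """Tone-map the largest channel and scale all channels by the same ratio."""
    m = max(rgb_nits)
    if m <= 0.0:
        return tuple(rgb_nits)
    mapped = pq_decode(bt2390_eetf(pq_encode(m),
                                   pq_encode(source_peak_nits),
                                   pq_encode(target_peak_nits)))
    scale = mapped / m
    return tuple(c * scale for c in rgb_nits)


def tonemap_for_hlg(rgb_nits, source_peak_nits, hlg_peak_nits=1000.0):
    """Only tone-map when the source peak exceeds the HLG nominal peak."""
    if source_peak_nits <= hlg_peak_nits:
        return tuple(rgb_nits)
    return maxrgb_eetf(rgb_nits, source_peak_nits, hlg_peak_nits)
```

Scaling all channels by one ratio keeps the channel proportions intact, which is the usual motivation for a maxRGB approach over per-channel tone mapping.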
jpark37 08f50a7d22 libobs: Add more color handling to default/opaque 2022-04-13 06:23:11 -07:00
jpark37 06111d5b10 libobs: Add high-precision sRGB support 2022-04-08 17:19:23 -07:00
jpark37 0ed0f2cdb4 libobs: Add I010/P010 support, TRC enum 2022-04-03 00:01:25 -07:00
jpark37 c455aaedba libobs: Add color spaces to deinterlace shaders 2022-04-02 20:14:31 -07:00
jpark37 a8bc994f07 libobs: Add color spaces to scale shaders 2022-03-26 13:00:34 -07:00
jpark37 87ab39c412 libobs: Render main texture for active color space
Preview will draw SDR white luminance from settings (default 300 nits)
when displayed on an HDR monitor rather than CCCS 80 nits.
2022-03-23 22:35:27 -07:00
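
The difference between the two white levels mentioned above comes down to a single multiplier. A minimal sketch, assuming linear CCCS (scRGB) values in which 1.0 corresponds to 80 nits; the function names are hypothetical and are not libobs API.

```python
CCCS_REFERENCE_WHITE_NITS = 80.0  # in CCCS/scRGB, the value 1.0 means 80 nits


def sdr_white_multiplier(sdr_white_nits: float = 300.0) -> float:
    """Factor that lifts SDR white (1.0) to the configured luminance."""
    return sdr_white_nits / CCCS_REFERENCE_WHITE_NITS


def scale_sdr_to_cccs(rgb_linear, sdr_white_nits: float = 300.0):
    """Scale linear SDR color so white lands at `sdr_white_nits` on an HDR display."""
    m = sdr_white_multiplier(sdr_white_nits)
    return tuple(c * m for c in rgb_linear)
```

With the default of 300 nits, SDR white is drawn at 3.75 in CCCS units rather than 1.0.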
jpark37 4e6765a01b libobs: Remove DrawSrgbDecompressPremultiplied
Technique is no longer referenced, and doesn't seem useful.
2021-10-02 05:53:27 -07:00
jpark37 10cf411f99 libobs: DrawSrgbDecompress for default_rect.effect
Necessary for upcoming fix to browser source alpha.
2021-09-30 17:35:06 -07:00
jpark37 c9766d8e28 libobs: Add DrawSrgbDecompress default technique
Useful when the texture does not support SRGB conversion on load.
2021-07-11 08:26:05 -07:00
jpark37 05b507d900 libobs: DrawSrgbDecompressPremultiplied technique
Necessary for an upcoming fix to browser source.
2021-07-11 08:11:12 -07:00
jpark37 b70161bc67 libobs: Add DrawOpaque for rect effect
Needed by Syphon Client.
2021-07-04 10:35:54 -07:00
jpark37 a8f0a27a3a libobs: Remove DrawAlphaBlend technique 2021-05-11 13:12:39 -07:00
jpark37 de7cec3dee libobs: Add DrawAlphaBlend technique
Useful when fixed-function blend does not provide enough precision.
2021-05-05 09:43:34 -07:00
jpark37 840e2b3d43 libobs: Add DrawNonlinearAlpha technique
This allows OBS to mimic other programs that incorrectly use alpha.

Requires premultiplied blend state.
2021-04-24 17:18:42 -07:00
jpark37 9bdb16aa78 libobs: Fix Area shaders missing for RGB output
Area downscale setting currently only works with YUV outputs. This adds
the missing DrawAlphaDivide technique.
2019-08-31 00:25:24 -07:00
jpark37 af01e044a2 libobs: Fix Lanczos calculations
- Fix: Ensure (1, 1) coordinate gets clamped.
- Fix: Increase weight precision by premultiplying UV in VS.
- Cleanup: Group coordinates 012/345 instead of 024/135.
- Cleanup: Remove unnecessary branches.

NVIDIA RTX 2080 Ti, Intel GPA, SetStablePowerState

256x224 -> 1323x1080: 123 us -> 123 us
2019-08-25 10:00:23 -07:00
jpark37 3485c4cdac libobs: Simplify bicubic weight calculations
Also increase weight precision by premultiplying UV in VS.

Intel HD Graphics 530, Intel GPA, SetStablePowerState

256x224 -> 1323x1080: 1221 us -> 1020 us
2019-08-25 10:00:10 -07:00
jpark37 9f5d218e16 libobs: Remove unnecessary divides from Lanczos 2019-08-14 21:36:23 -07:00
jpark37 93f1ab789d libobs: Fix dark lines using Lanczos
When texel samples are not exactly on texel centers, weight calculations
will involve a divide by a number very close to zero, resulting in
precision issues. Restore normalization of weights to compensate.
2019-08-14 21:00:09 -07:00
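
The precision issue described above can be sketched in a few lines. This is an illustration in Python rather than the shader itself, assuming a three-lobe Lanczos kernel with six taps per axis; the near-zero guard threshold and the function names are hypothetical.

```python
import math


def lanczos3(x: float) -> float:
    """Three-lobe Lanczos kernel: sinc(x) * sinc(x / 3)."""
    if abs(x) < 1e-8:
        return 1.0  # limit at zero, avoids dividing by a tiny number
    if abs(x) >= 3.0:
        return 0.0
    px = math.pi * x
    return 3.0 * math.sin(px) * math.sin(px / 3.0) / (px * px)


def lanczos3_weights(frac: float):
    """Six tap weights for a sample `frac` texels past the third tap.

    The raw weights rarely sum to exactly 1, so they are normalized.
    Without this step the result can come out slightly darker or brighter
    than the source.
    """
    raw = [lanczos3(frac - (i - 2)) for i in range(6)]
    total = sum(raw)
    return [w / total for w in raw]
```

Normalizing costs one extra sum and divide per axis, but it guarantees that a flat input produces a flat output regardless of rounding in the individual weights.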
jpark37 3d6f5c8ad6 libobs: Add YUV alpha formats
This will allow YUV alpha formats to be converted to RGBA on the GPU.
2019-08-11 11:26:22 -07:00
Jim 31a902b3af
Merge pull request #2018 from jpark37/yuv-simplify2
libobs: Separate textures for YUV input
2019-08-10 22:41:28 -07:00
Jim ecfcb64056
Merge pull request #1994 from jpark37/faster-lanczos
libobs: Optimize lanczos shader, remove scaling
2019-08-10 03:02:26 -07:00
jpark37 bdd8d64053 libobs: Separate textures for YUV input
The shaders to unpack YUV information from the same texture were rather
complicated. Breaking them up into separate textures makes the shaders
much simpler, and we can remove the PRECISION_OFFSET hack.

Performance also gets a nice boost on Intel for planar textures.

Intel GPA, SetStablePowerState, Intel HD Graphics 530, 1920x1080

UYVY: 473 us -> 457 us
YUY2: 492 us -> 422 us
YVYU: 491 us -> 441 us
I420: 1637 us -> 505 us
I422: 1644 us -> 482 us
I444: 1653 us -> 504 us
NV12: 1656 us -> 369 us
Y800 (limited): 270 us -> 277 us
Y800 (full): 263 us -> 289 us
RGB (limited): 341 us -> 411 us
BGR3 (limited): 512 us -> 509 us
BGR3 (full): 527 us -> 534 us
2019-08-09 21:14:29 -07:00
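
To show what "separate textures" means for the planar formats listed above, here is a minimal sketch of the per-plane sizes. It assumes the standard chroma subsampling for each format and rounds odd dimensions up, which is an assumption rather than something this commit states. The function name and tuple layout are hypothetical.

```python
def yuv_plane_sizes(fmt: str, width: int, height: int):
    """Return (name, width, height, channels) for each plane of a YUV format."""
    half_w = (width + 1) // 2   # assumed: odd sizes round up
    half_h = (height + 1) // 2
    layouts = {
        "I420": [("Y", width, height, 1),
                 ("U", half_w, half_h, 1),
                 ("V", half_w, half_h, 1)],
        "I422": [("Y", width, height, 1),
                 ("U", half_w, height, 1),
                 ("V", half_w, height, 1)],
        "I444": [("Y", width, height, 1),
                 ("U", width, height, 1),
                 ("V", width, height, 1)],
        "NV12": [("Y", width, height, 1),
                 ("UV", half_w, half_h, 2)],  # interleaved chroma plane
    }
    return layouts[fmt]
```

With one texture per plane, a shader can sample each plane directly instead of computing offsets into a single packed texture.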
jpark37 9aacc99b3e libobs: Separate textures for YUV output, fix chroma
The shaders to pack YUV information into the same texture were rather
complicated and suffering precision issues. Breaking them up into
separate textures makes the shaders much simpler and avoids having to
compute large integer offsets. Unfortunately, the code to handle
multiple textures is not as pleasant, but at least the NV12 rendering
path is no longer separate.

In addition, write chroma samples to "standard" offsets. For I444,
there's no difference, but I420/NV12 formats now have chroma shifted to
the left as 4:2:0 is shown in the H.264 specification.

Intel GPA, SetStablePowerState, Intel HD Graphics 530

Expect speed increase:
I420: 844 us -> 493 us (254 us + 190 us + 274 us)
I444: 837 us -> 747 us (258 us + 276 us + 272 us)
NV12: 450 us -> 368 us (319 us + 168 us)

Expect no change:
NV12 (HW): 580 us (481 us + 166 us) -> 588 us (468 us + 247 us)
RGB: 359 us -> 387 us

Fixes https://obsproject.com/mantis/view.php?id=624
Fixes https://obsproject.com/mantis/view.php?id=1512
2019-07-26 23:21:41 -07:00
jpark37 f27ece50c9 libobs: Optimize lanczos shader, remove scaling
Use bilinear filtering to reduce 36 taps to 25 for the regular path.
This works because the middle weights are always between 0 and 1,
allowing texture coordinates to be placed strategically to sample
correct ratios. I'm not sure about the undistort path, so I've left that
alone.

Also remove scaling added in #526, after which weight normalization is
unnecessary. If we want to use or invent an algorithm with alternate
downscaling properties, that's fine, but I don't think we should change
Lanczos scaling to mean something it's not. The scale implementation was
also seen not working when applied directly to scene items because of
assumptions made about the projection matrix.

Intel GPA, SetStablePowerState, Intel HD Graphics 530, D3D11
644x478 -> 1323x1080: 3890 us -> 3401 us
1920x1080 -> 1280x720: 2555 us -> 2261 us
2019-07-26 20:45:33 -07:00
Jim 62c7e00d16
Merge pull request #1993 from jpark37/faster-bicubic
Optimize bicubic shader
2019-07-26 00:36:19 -07:00
jpark37 2721ac4a85 libobs: Optimize bicubic shader
Use bilinear filtering to reduce 16 taps to 9 for the regular path. This
works because the middle weights are always between 0 and 1, allowing
texture coordinates to be placed strategically to sample correct ratios.
I'm not sure about the undistort path, so I've left that alone.

Also remove weight normalization. I'm not seeing that make even a small
difference.

Intel HD Graphics 530, D3D11
644x478 -> 1323x1080: 1790 us -> 1279 us
1920x1080 -> 1280x720: 1301 us -> 918 us

References:
https://entropymine.com/imageworsener/bicubic/
http://vec3.ca/bicubic-filtering-in-fewer-taps/
http://developer.download.nvidia.com/books/HTML/gpugems/gpugems_ch24.html
2019-07-25 22:21:11 -07:00
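
The tap reduction described above relies on one identity: two adjacent taps with weights of the same sign can be replaced by a single bilinear fetch placed between them. A one-dimensional sketch follows, assuming Catmull-Rom weights (the log does not say which bicubic kernel the shader uses) and hypothetical function names.

```python
import math


def bilinear_fetch(texels, coord: float) -> float:
    """Linear interpolation between neighbors; integer coords hit texels exactly."""
    i = int(math.floor(coord))
    t = coord - i
    i1 = min(i + 1, len(texels) - 1)
    return texels[i] * (1.0 - t) + texels[i1] * t


def catmull_rom_weights(t: float):
    """Four tap weights for a sample `t` in [0, 1) past the second tap."""
    t2, t3 = t * t, t * t * t
    return (
        (-t3 + 2.0 * t2 - t) / 2.0,
        (3.0 * t3 - 5.0 * t2 + 2.0) / 2.0,
        (-3.0 * t3 + 4.0 * t2 + t) / 2.0,
        (t3 - t2) / 2.0,
    )


def bicubic_four_taps(texels, base: int, t: float) -> float:
    """Reference: one fetch per tap."""
    w = catmull_rom_weights(t)
    return sum(w[k] * texels[base - 1 + k] for k in range(4))


def bicubic_three_fetches(texels, base: int, t: float) -> float:
    """Same result with the two middle taps merged into one bilinear fetch."""
    w0, w1, w2, w3 = catmull_rom_weights(t)
    middle = w1 + w2                # both are in [0, 1], so the ratio is valid
    coord = base + w2 / middle      # fetch position between the middle texels
    return (w0 * texels[base - 1]
            + middle * bilinear_fetch(texels, coord)
            + w3 * texels[base + 2])
```

Applying the same merge on both axes gives 3 x 3 = 9 fetches instead of 4 x 4 = 16. For a six-tap Lanczos kernel the equivalent merge gives 5 x 5 = 25 instead of 6 x 6 = 36.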
James Park 37f663a789 libobs: obs-ffmpeg: win-dshow: Planar 4:2:2 video
This format has been seen when using FFmpeg MJPEG decompression.
2019-07-25 20:11:37 -07:00
jpark37 2656bf0a90 libobs: Rework RGB to YUV conversion
RGB to YUV conversion was previously baked into every scale shader, but
this work has been moved to the YUV packing shaders. The scale shaders
now write RGBA instead. In the case where base and output resolutions
are identical, the render texture is forwarded directly to the YUV pack
step, skipping an entire fullscreen pass.

Intel GPA, SetStablePowerState, Intel HD Graphics 530, NV12

1920x1080, Before:
RGBA -> UYVX: ~321 us
UYVX -> Y: ~480 us
UYVX -> UV: ~127 us

1920x1080, After:
[forward render texture]
RGBA -> Y: ~487 us
RGBA -> UV: ~131 us

1920x1080 -> 1280x720, Before:
RGBA -> UYVX: ~268 us
UYVX -> Y: ~209 us
UYVX -> UV: ~57 us

1920x1080 -> 1280x720, After:
RGBA -> RGBA (rescale): ~268 us
RGBA -> Y: ~210 us
RGBA -> UV: ~58 us
2019-07-22 01:12:35 -07:00
jpark37 85cc7c84bc libobs: obs-filters: Area upscale shader
Add a separate shader for area upscaling to take advantage of bilinear
filtering. Iterating over texels is unnecessary in the upscale case
because a target pixel can only overlap 1 or 2 texels in X and Y
directions. When only overlapping one texel, adjust UVs to sample texel
center to avoid filtering.

Also add "base_dimension" uniform to avoid unnecessary division.

Intel HD Graphics 530, 644x478 -> 1323x1080: ~836 us -> ~232 us
2019-07-17 21:11:18 -07:00
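
The upscale shortcut above can be checked in one dimension. The sketch below assumes texel `i` covers the interval `[i, i + 1)` with its center at `i + 0.5`, and that the target pixel footprint `[x0, x1]` is at most one texel wide. Function names are hypothetical, and the reference loop stands in for the texel iteration the commit removes.

```python
import math


def area_reference(texels, x0: float, x1: float) -> float:
    """Exact area average over [x0, x1] by iterating over every texel."""
    total = 0.0
    for i, value in enumerate(texels):
        overlap = max(0.0, min(x1, i + 1) - max(x0, i))
        total += value * overlap
    return total / (x1 - x0)


def area_upscale_coord(x0: float, x1: float) -> float:
    """Single sample coordinate that reproduces the area average."""
    boundary = math.floor(x1)
    if boundary <= x0:
        # Footprint lies inside one texel: sample its center, no filtering.
        return math.floor(x0) + 0.5
    # Footprint straddles one texel boundary: weight by covered length.
    right_weight = (x1 - boundary) / (x1 - x0)
    return boundary - 0.5 + right_weight


def bilinear_sample(texels, coord: float) -> float:
    """Bilinear fetch with texel centers at i + 0.5 and clamped edges."""
    u = coord - 0.5
    i = int(math.floor(u))
    t = u - i
    i0 = min(max(i, 0), len(texels) - 1)
    i1 = min(max(i + 1, 0), len(texels) - 1)
    return texels[i0] * (1.0 - t) + texels[i1] * t
```

The hardware filter does the weighted blend, so the shader needs one fetch per axis pair instead of a loop.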
James Park aa22b61e3e libobs: Full-screen triangle format conversions
The cache coherency of rasterization for full-screen passes is better
using an oversized triangle that is clipped rather than two triangles.
Traversal order of rasterization is GPU-specific, but will almost
certainly be better using an undivided primitive.

A smaller benefit is that quads along the diagonal are not evaluated
multiple times, but that's minor in comparison.

Redo format shaders to bypass vertex buffer, and input layout. Add
global shader bool "obs_glsl_compile" to make API-specific decisions,
i.e. handle upside-down UVs. gl_ortho is not needed for format
conversion because the vertex shader does not use ViewProj anymore.

This can be applied to more situations, but start small first.

Testbed full screen passes, Intel HD Graphics 530:
RGBA -> UYVX: 467 -> 439 us, ~6% savings
UYVX -> uv: 295 -> 239 us, ~19% savings
2019-06-18 22:29:07 -07:00
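
A common way to build such an oversized triangle without a vertex buffer is to derive positions from the vertex index alone. The sketch below shows that arithmetic in Python. It is a widely used construction and an assumption here, not necessarily the exact one in the format shaders, and it ignores the upside-down UV handling that the commit mentions for GLSL.

```python
def fullscreen_triangle_vertex(vertex_id: int):
    """Clip-space position and UV for vertex 0, 1, or 2 of one big triangle."""
    u = float((vertex_id << 1) & 2)  # 0, 2, 0
    v = float(vertex_id & 2)         # 0, 0, 2
    position = (u * 2.0 - 1.0, v * 2.0 - 1.0)
    return position, (u, v)


def covers(point) -> bool:
    """True if `point` lies inside the triangle (-1,-1), (3,-1), (-1,3)."""
    x, y = point
    return x >= -1.0 and y >= -1.0 and x + y <= 2.0
```

The triangle extends past the viewport and is clipped by the rasterizer; inside the visible region the interpolated UVs run from 0 to 1, the same as with a two-triangle quad.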
Jim ab70bff4b3
Merge pull request #1913 from jpark37/area-shader-optimization
libobs: Area-resampling shader optimizations
2019-06-17 20:40:25 -07:00
Jim fafda14963
Merge pull request #1906 from jpark37/bgr-three
libobs: linux-v4l2: obs-ffmpeg: Add packed BGR3 video support
2019-06-15 16:40:44 -07:00
Jim dd607b422f
Merge pull request #1881 from jpark37/lowres-fair-sampling
libobs: Improve low-resolution bilinear sampling
2019-06-15 16:03:02 -07:00
James Park 9f66b90d99 libobs: Area-resampling shader optimizations
Switch for loop to do/while because we know the condition is always
true for the first loop.

Replace int math with float math to play nicely with more GPUs.

Add variables imagesize/targetsize to avoid redundant reciprocals.

Intel GPA results: 1166 -> 836 us
2019-06-03 23:11:23 -07:00