obs-studio

Commit Graph

Author	SHA1	Message	Date
Jim	29782cd594	libobs: And fix area scaling effect with RGBA	2022-06-04 19:54:52 -07:00
Jim	a33a5d2151	libobs: Fix bilinear lowres RGBA as well	2022-06-04 19:03:20 -07:00
Jim	4a2a06b22f	libobs: Fix RGBA format output not working Due to a bug in shader parsing, it thinks that because the token "multiplier" is used here, that the "multiplier" uniform is being used. This is a workaround for the issue because fixing the parser is probably going to be much more annoying than just working around the issue for now.	2022-06-04 17:47:47 -07:00
jpark37	bacd4713da	libobs: Clear low bits when writing P010 Don't want to rely on consumer to ignore those bits.	2022-06-04 01:00:21 -07:00
jpark37	11da542a0d	libobs: Add max_luminance to obs_source_frame Used in situations where source luminance is greater than HDR nominal peak setting to avoid clipping by applying BT.2408 maxRGB EETF.	2022-05-27 14:56:47 -07:00
jpark37	4304329d0c	libobs: Ignore lower six bits for P010 sources	2022-05-17 02:46:41 -07:00
Norihiro Kamae	dd43635e78	libobs: Fix reserved word in variable names The use of the reserved name caused initialization failure on Linux.	2022-05-11 11:28:20 -04:00
jpark37	0f53dc28bb	libobs: Reduce PQ shader math	2022-05-11 03:38:07 -07:00
jpark37	2a0d8d1c9c	libobs: Add support for reading I420 PQ Not normally a valid combination, but Xbox writes 8-bit HDR videos.	2022-05-08 14:12:41 -07:00
jpark37	ed835810b4	libobs: Use tabs in format_conversion.effect	2022-05-08 14:12:41 -07:00
Jim	952988d9ec	Merge pull request #6231 from mvji/prores_pix_fmt Add support for GPU conversion of YUV422P10LE, YUV444P12LE, YUVA444P12LE	2022-05-04 02:01:27 -07:00
jpark37	a82bbb416f	libobs: Fix NaNs when using EETF for HLG	2022-04-26 10:11:28 -07:00
jpark37	ed85307f7e	libobs: Clean up color.effect a bit	2022-04-26 10:11:28 -07:00
mvji	d3a8ef7128	libobs: Add support for YUV422P10LE, YUV444P12LE, YUVA444P12LE	2022-04-19 19:37:07 +02:00
jpark37	338608bd67	libobs,UI: Support HLG nominal peak level HLG output uses MovieLabs-recommended procedure. - If peak luminance is greater than 1000, use maxRGB EETF to 1000. - Otherwise, don't tonemap. - Then use normal HLG conversion procedure with gamma 1.2 (1000 nits).	2022-04-14 09:36:44 -07:00
jpark37	08f50a7d22	libobs: Add more color handling to default/opaque	2022-04-13 06:23:11 -07:00
jpark37	06111d5b10	libobs: Add high-precision sRGB support	2022-04-08 17:19:23 -07:00
jpark37	0ed0f2cdb4	libobs: Add I010/P010 support, TRC enum	2022-04-03 00:01:25 -07:00
jpark37	c455aaedba	libobs: Add color spaces to deinterlace shaders	2022-04-02 20:14:31 -07:00
jpark37	a8bc994f07	libobs: Add color spaces to scale shaders	2022-03-26 13:00:34 -07:00
jpark37	87ab39c412	libobs: Render main texture for active color space Preview will draw SDR white luminance from settings (default 300 nits) when displayed on an HDR monitor rather than CCCS 80 nits.	2022-03-23 22:35:27 -07:00
jpark37	4e6765a01b	libobs: Remove DrawSrgbDecompressPremultiplied Technique is no longer referenced, and doesn't seem useful.	2021-10-02 05:53:27 -07:00
jpark37	10cf411f99	libobs: DrawSrgbDecompress for default_rect.effect Necessary for upcoming fix to browser source alpha.	2021-09-30 17:35:06 -07:00
jpark37	c9766d8e28	libobs: Add DrawSrgbDecompress default technique Useful when the texture does not support SRGB conversion on load.	2021-07-11 08:26:05 -07:00
jpark37	05b507d900	libobs: DrawSrgbDecompressPremultiplied technique Necessary for an upcoming fix to browser source.	2021-07-11 08:11:12 -07:00
jpark37	b70161bc67	libobs: Add DrawOpaque for rect effect Needed by Syphon Client.	2021-07-04 10:35:54 -07:00
jpark37	a8f0a27a3a	libobs: Remove DrawAlphaBlend technique	2021-05-11 13:12:39 -07:00
jpark37	de7cec3dee	libobs: Add DrawAlphaBlend technique Useful when fixed-function blend does not provide enough precision.	2021-05-05 09:43:34 -07:00
jpark37	840e2b3d43	libobs: Add DrawNonlinearAlpha technique This allows OBS to mimic other programs that incorrectly use alpha. Requires premultiplied blend state.	2021-04-24 17:18:42 -07:00
jpark37	9bdb16aa78	libobs: Fix Area shaders missing for RGB output Area downscale setting currently only works with YUV outputs. This adds the missing DrawAlphaDivide technique.	2019-08-31 00:25:24 -07:00
jpark37	af01e044a2	libobs: Fix Lanczos calculations - Fix: Ensure (1, 1) coordinate gets clamped. - Fix: Increase weight precision by premultiplying UV in VS. - Cleanup: Group coordinates 012/345 instead of 024/135. - Cleanup: Remove unnecessary branches. NVIDIA RTX 2080 Ti, Intel GPA, SetStablePowerState 256x224 -> 1323x1080: 123 us -> 123 us	2019-08-25 10:00:23 -07:00
jpark37	3485c4cdac	libobs: Simplify bicubic weight calculations Also increase weight precision by premultiplying UV in VS. Intel HD Graphics 530, Intel GPA, SetStablePowerState 256x224 -> 1323x1080: 1221 us -> 1020 us	2019-08-25 10:00:10 -07:00
jpark37	9f5d218e16	libobs: Remove unnecessary divides from Lanczos	2019-08-14 21:36:23 -07:00
jpark37	93f1ab789d	libobs: Fix dark lines using Lanczos When texel samples are not exactly on texel centers, weight calculations will involve a divide by a number very close to zero, resulting in precision issues. Restore normalization of weights to compensate.	2019-08-14 21:00:09 -07:00
jpark37	3d6f5c8ad6	libobs: Add YUV alpha formats This will allow YUV alpha formats to be converted to RGBA on the GPU.	2019-08-11 11:26:22 -07:00
Jim	31a902b3af	Merge pull request #2018 from jpark37/yuv-simplify2 libobs: Separate textures for YUV input	2019-08-10 22:41:28 -07:00
Jim	ecfcb64056	Merge pull request #1994 from jpark37/faster-lanczos libobs: Optimize lanczos shader, remove scaling	2019-08-10 03:02:26 -07:00
jpark37	bdd8d64053	libobs: Separate textures for YUV input The shaders to unpack YUV information from the same texture were rather complicated. Breaking them up into separate textures makes the shaders much simpler, and we can remove the PRECISION_OFFSET hack. Performance also gets a nice boost on Intel for planar textures. Intel GPA, SetStablePowerState, Intel HD Graphics 530, 1920x1080 UYVY: 473 us -> 457 us YUY2: 492 us -> 422 us YVYU: 491 us -> 441 us I420: 1637 us -> 505 us I422: 1644 us -> 482 us I444: 1653 us -> 504 us NV12: 1656 us -> 369 us Y800 (limited): 270 us -> 277 us Y800 (full): 263 us -> 289 us RGB (limited): 341 us -> 411 us BGR3 (limited): 512 us -> 509 us BGR3 (full): 527 us -> 534 us	2019-08-09 21:14:29 -07:00
jpark37	9aacc99b3e	libobs: Separate textures for YUV output, fix chroma The shaders to pack YUV information into the same texture were rather complicated and suffering precision issues. Breaking them up into separate textures makes the shaders much simpler and avoids having to compute large integer offsets. Unfortunately, the code to handle multiple textures is not as pleasant, but at least the NV12 rendering path is no longer separate. In addition, write chroma samples to "standard" offsets. For I444, there's no difference, but I420/NV12 formats now have chroma shifted to the left as 4:2:0 is shown in the H.264 specification. Intel GPA, SetStablePowerState, Intel HD Graphics 530 Expect speed increase: I420: 844 us -> 493 us (254 us + 190 us + 274 us) I444: 837 us -> 747 us (258 us + 276 us + 272 us) NV12: 450 us -> 368 us (319 us + 168 us) Expect no change: NV12 (HW): 580 us (481 us + 166 us) -> 588 us (468 us + 247 us) RGB: 359 us -> 387 us Fixes https://obsproject.com/mantis/view.php?id=624 Fixes https://obsproject.com/mantis/view.php?id=1512	2019-07-26 23:21:41 -07:00
jpark37	f27ece50c9	libobs: Optimize lanczos shader, remove scaling Use bilinear filtering to reduce 36 taps to 25 for the regular path. This works because the middle weights are always between 0 and 1, allowing texture coordinates to be placed strategically to sample correct ratios. I'm not sure about the undistort path, so I've left that alone. Also remove scaling added in #526, after which weight normalization is unnecessary. If we want to use or invent an algorithm with alternate downscaling properties, that's fine, but I don't think we should change Lanczos scaling to mean something it's not. The scale implementation was also seen not working when applied directly to scene items because of assumptions made about the projection matrix. Intel GPA, SetStablePowerState, Intel HD Graphics 530, D3D11 644x478 -> 1323x1080: 3890 us -> 3401 us 1920x1080 -> 1280x720: 2555 us -> 2261 us	2019-07-26 20:45:33 -07:00
Jim	62c7e00d16	Merge pull request #1993 from jpark37/faster-bicubic Optimize bicubic shader	2019-07-26 00:36:19 -07:00
jpark37	2721ac4a85	libobs: Optimize bicubic shader Use bilinear filtering to reduce 16 taps to 9 for the regular path. This works because the middle weights are always between 0 and 1, allowing texture coordinates to be placed strategically to sample correct ratios. I'm not sure about the undistort path, so I've left that alone. Also remove weight normalization. I'm not seeing that make even a small difference. Intel HD Graphics 530, D3D11 644x478 -> 1323x1080: 1790 us -> 1279 us 1920x1080 -> 1280x720: 1301 us -> 918 us References: https://entropymine.com/imageworsener/bicubic/ http://vec3.ca/bicubic-filtering-in-fewer-taps/ http://developer.download.nvidia.com/books/HTML/gpugems/gpugems_ch24.html	2019-07-25 22:21:11 -07:00
James Park	37f663a789	libobs: obs-ffmpeg: win-dshow: Planar 4:2:2 video This format has been seen when using FFmpeg MJPEG decompression.	2019-07-25 20:11:37 -07:00
jpark37	2656bf0a90	libobs: Rework RGB to YUV conversion RGB to YUV conversion was previously baked into every scale shader, but this work has been moved to the YUV packing shaders. The scale shaders now write RGBA instead. In the case where base and output resolutions are identical, the render texture is forwarded directly to the YUV pack step, skipping an entire fullscreen pass. Intel GPA, SetStablePowerState, Intel HD Graphics 530, NV12 1920x1080, Before: RGBA -> UYVX: ~321 us UYVX -> Y: ~480 us UYVX -> UV: ~127 us 1920x1080, After: [forward render texture] RGBA -> Y: ~487 us RGBA -> UV: ~131 us 1920x1080 -> 1280x720, Before: RGBA -> UYVX: ~268 us UYVX -> Y: ~209 us UYVX -> UV: ~57 us 1920x1080 -> 1280x720, After: RGBA -> RGBA (rescale): ~268 us RGBA -> Y: ~210 us RGBA -> UV: ~58 us	2019-07-22 01:12:35 -07:00
jpark37	85cc7c84bc	libobs: obs-filters: Area upscale shader Add a separate shader for area upscaling to take advantage of bilinear filtering. Iterating over texels is unnecessary in the upscale case because a target pixel can only overlap 1 or 2 texels in X and Y directions. When only overlapping one texel, adjust UVs to sample texel center to avoid filtering. Also add "base_dimension" uniform to avoid unnecessary division. Intel HD Graphics 530, 644x478 -> 1323x1080: ~836 us -> ~232 us	2019-07-17 21:11:18 -07:00
James Park	aa22b61e3e	libobs: Full-screen triangle format conversions The cache coherency of rasterization for full-screen passes is better using an oversized triangle that is clipped rather than two triangles. Traversal order of rasterization is GPU-specific, but will almost certainly be better using an undivided primitive. A smaller benefit is that quads along the diagonal are not evaluated multiple times, but that's minor in comparison. Redo format shaders to bypass vertex buffer, and input layout. Add global shader bool "obs_glsl_compile" to make API-specific decisions, i.e. handle upside-down UVs. gl_ortho is not needed for format conversion because the vertex shader does not use ViewProj anymore. This can be applied to more situations, but start small first. Testbed full screen passes, Intel HD Graphics 530: RGBA -> UYVX: 467 -> 439 us, ~6% savings UYVX -> uv: 295 -> 239 us, ~19% savings	2019-06-18 22:29:07 -07:00
Jim	ab70bff4b3	Merge pull request #1913 from jpark37/area-shader-optimization libobs: Area-resampling shader optimizations	2019-06-17 20:40:25 -07:00
Jim	fafda14963	Merge pull request #1906 from jpark37/bgr-three libobs: linux-v4l2: obs-ffmpeg: Add packed BGR3 video support	2019-06-15 16:40:44 -07:00
Jim	dd607b422f	Merge pull request #1881 from jpark37/lowres-fair-sampling libobs: Improve low-resolution bilinear sampling	2019-06-15 16:03:02 -07:00
James Park	9f66b90d99	libobs: Area-resampling shader optimizations Switch for loop to do/while because we know the condition is always true for the first loop. Replace int math with float math to play nicely with more GPUs. Add variables imagesize/targetsize to avoid redundant reciprocals. Intel GPA results: 1166 -> 836 us	2019-06-03 23:11:23 -07:00


84 Commits (0a218e06b7601c0e561db518b1b5878557cd1ebc)