obs-studio

Commit Graph

Author	SHA1	Message	Date
jpark37	9bdb16aa78	libobs: Fix Area shaders missing for RGB output Area downscale setting currently only works with YUV outputs. This adds the missing DrawAlphaDivide technique.	2019-08-31 00:25:24 -07:00
jpark37	af01e044a2	libobs: Fix Lanczos calculations - Fix: Ensure (1, 1) coordinate gets clamped. - Fix: Increase weight precision by premultiplying UV in VS. - Cleanup: Group coordinates 012/345 instead of 024/135. - Cleanup: Remove unnecessary branches. NVIDIA RTX 2080 Ti, Intel GPA, SetStablePowerState 256x224 -> 1323x1080: 123 us -> 123 us	2019-08-25 10:00:23 -07:00
jpark37	3485c4cdac	libobs: Simplify bicubic weight calculations Also increase weight precision by premultiplying UV in VS. Intel HD Graphics 530, Intel GPA, SetStablePowerState 256x224 -> 1323x1080: 1221 us -> 1020 us	2019-08-25 10:00:10 -07:00
jpark37	9f5d218e16	libobs: Remove unnecessary divides from Lanczos	2019-08-14 21:36:23 -07:00
jpark37	93f1ab789d	libobs: Fix dark lines using Lanczos When texel samples are not exactly on texel centers, weight calculations will involve a divide by a number very close to zero, resulting in precision issues. Restore normalization of weights to compensate.	2019-08-14 21:00:09 -07:00
jpark37	3d6f5c8ad6	libobs: Add YUV alpha formats This will allow YUV alpha formats to be converted to RGBA on the GPU.	2019-08-11 11:26:22 -07:00
Jim	31a902b3af	Merge pull request #2018 from jpark37/yuv-simplify2 libobs: Separate textures for YUV input	2019-08-10 22:41:28 -07:00
Jim	ecfcb64056	Merge pull request #1994 from jpark37/faster-lanczos libobs: Optimize lanczos shader, remove scaling	2019-08-10 03:02:26 -07:00
jpark37	bdd8d64053	libobs: Separate textures for YUV input The shaders to unpack YUV information from the same texture were rather complicated. Breaking them up into separate textures makes the shaders much simpler, and we can remove the PRECISION_OFFSET hack. Performance also gets a nice boost on Intel for planar textures. Intel GPA, SetStablePowerState, Intel HD Graphics 530, 1920x1080 UYVY: 473 us -> 457 us YUY2: 492 us -> 422 us YVYU: 491 us -> 441 us I420: 1637 us -> 505 us I422: 1644 us -> 482 us I444: 1653 us -> 504 us NV12: 1656 us -> 369 us Y800 (limited): 270 us -> 277 us Y800 (full): 263 us -> 289 us RGB (limited): 341 us -> 411 us BGR3 (limited): 512 us -> 509 us BGR3 (full): 527 us -> 534 us	2019-08-09 21:14:29 -07:00
jpark37	9aacc99b3e	libobs: Separate textures for YUV output, fix chroma The shaders to pack YUV information into the same texture were rather complicated and suffering precision issues. Breaking them up into separate textures makes the shaders much simpler and avoids having to compute large integer offsets. Unfortunately, the code to handle multiple textures is not as pleasant, but at least the NV12 rendering path is no longer separate. In addition, write chroma samples to "standard" offsets. For I444, there's no difference, but I420/NV12 formats now have chroma shifted to the left as 4:2:0 is shown in the H.264 specification. Intel GPA, SetStablePowerState, Intel HD Graphics 530 Expect speed increase: I420: 844 us -> 493 us (254 us + 190 us + 274 us) I444: 837 us -> 747 us (258 us + 276 us + 272 us) NV12: 450 us -> 368 us (319 us + 168 us) Expect no change: NV12 (HW): 580 (481 us + 166 us) us -> 588 us (468 us + 247 us) RGB: 359 us -> 387 us Fixes https://obsproject.com/mantis/view.php?id=624 Fixes https://obsproject.com/mantis/view.php?id=1512	2019-07-26 23:21:41 -07:00
jpark37	f27ece50c9	libobs: Optimize lanczos shader, remove scaling Use bilinear filtering to reduce 36 taps to 25 for the regular path. This works because the middle weights are always between 0 and 1, allowing texture coordinates to be placed strategically to sample correct ratios. I'm not sure about the undistort path, so I've left that alone. Also remove scaling added in #526, after which weight normalization is unnecessary. If we want to use or invent an algorithm with alternate downscaling properties, that's fine, but I don't think we should change Lanczos scaling to mean something it's not. The scale implementation was also seen not working when applied directly to scene items because of assumptions made about the projection matrix. Intel GPA, SetStablePowerState, Intel HD Graphics 530, D3D11 644x478 -> 1323x1080: 3890 us -> 3401 us 1920x1080 -> 1280x720: 2555 us -> 2261 us	2019-07-26 20:45:33 -07:00
Jim	62c7e00d16	Merge pull request #1993 from jpark37/faster-bicubic Optimize bicubic shader	2019-07-26 00:36:19 -07:00
jpark37	2721ac4a85	libobs: Optimize bicubic shader Use bilinear filtering to reduce 16 taps to 9 for the regular path. This works because the middle weights are always between 0 and 1, allowing texture coordinates to be placed strategically to sample correct ratios. I'm not sure about the undistort path, so I've left that alone. Also remove weight normalization. I'm not seeing that make even a small difference. Intel HD Graphics 530, D3D11 644x478 -> 1323x1080: 1790 us -> 1279 us 1920x1080 -> 1280x720: 1301 us -> 918 us References: https://entropymine.com/imageworsener/bicubic/ http://vec3.ca/bicubic-filtering-in-fewer-taps/ http://developer.download.nvidia.com/books/HTML/gpugems/gpugems_ch24.html	2019-07-25 22:21:11 -07:00
James Park	37f663a789	libobs: obs-ffmpeg: win-dshow: Planar 4:2:2 video This format has been seen when using FFmpeg MJPEG decompression.	2019-07-25 20:11:37 -07:00
jpark37	2656bf0a90	libobs: Rework RGB to YUV conversion RGB to YUV conversion was previously baked into every scale shader, but this work has been moved to the YUV packing shaders. The scale shaders now write RGBA instead. In the case where base and output resolutions are identical, the render texture is forwarded directly to the YUV pack step, skipping an entire fullscreen pass. Intel GPA, SetStablePowerState, Intel HD Graphics 530, NV12 1920x1080, Before: RGBA -> UYVX: ~321 us UYVX -> Y: ~480 us UYVX -> UV: ~127 us 1920x1080, After: [forward render texture] RGBA -> Y: ~487 us RGBA -> UV: ~131 us 1920x1080 -> 1280x720, Before: RGBA -> UYVX: ~268 us UYVX -> Y: ~209 us UYVX -> UV: ~57 us 1920x1080 -> 1280x720, After: RGBA -> RGBA (rescale): ~268 us RGBA -> Y: ~210 us RGBA -> UV: ~58 us	2019-07-22 01:12:35 -07:00
jpark37	85cc7c84bc	libobs: obs-filters: Area upscale shader Add a separate shader for area upscaling to take advantage of bilinear filtering. Iterating over texels is unnecessary in the upscale case because a target pixel can only overlap 1 or 2 texels in X and Y directions. When only overlapping one texel, adjust UVs to sample texel center to avoid filtering. Also add "base_dimension" uniform to avoid unnecessary division. Intel HD Graphics 530, 644x478 -> 1323x1080: ~836 us -> ~232 us	2019-07-17 21:11:18 -07:00
James Park	aa22b61e3e	libobs: Full-screen triangle format conversions The cache coherency of rasterization for full-screen passes is better using an oversized triangle that is clipped rather than two triangles. Traversal order of rasterization is GPU-specific, but will almost certainly be better using an undivided primitive. A smaller benefit is that quads along the diagonal are not evaluated multiple times, but that's minor in comparison. Redo format shaders to bypass vertex buffer, and input layout. Add global shader bool "obs_glsl_compile" to make API-specific decisions, i.e. handle upside-down UVs. gl_ortho is not needed for format conversion because the vertex shader does not use ViewProj anymore. This can be applied to more situations, but start small first. Testbed full screen passes, Intel HD Graphics 530: RGBA -> UYVX: 467 -> 439 us, ~6% savings UYVX -> uv: 295 -> 239 us, ~19% savings	2019-06-18 22:29:07 -07:00
Jim	ab70bff4b3	Merge pull request #1913 from jpark37/area-shader-optimization libobs: Area-resampling shader optimizations	2019-06-17 20:40:25 -07:00
Jim	fafda14963	Merge pull request #1906 from jpark37/bgr-three libobs: linux-v4l2: obs-ffmpeg: Add packed BGR3 video support	2019-06-15 16:40:44 -07:00
Jim	dd607b422f	Merge pull request #1881 from jpark37/lowres-fair-sampling libobs: Improve low-resolution bilinear sampling	2019-06-15 16:03:02 -07:00
James Park	9f66b90d99	libobs: Area-resampling shader optimizations Switch for loop to do/while because we know the condition is always true for the first loop. Replace int math with float math to play nicely with more GPUs. Add variables imagesize/targetsize to avoid redundant reciprocals. Intel GPA results: 1166 -> 836 us	2019-06-03 23:11:23 -07:00
James Park	614025742b	libobs: linux-v4l2: obs-ffmpeg: Add packed BGR3 video support Someone mentioned this format preserves the most quality for a particular capture card using V4L2.	2019-05-30 06:05:53 -07:00
James Park	0c5cb83bf4	libobs: Remove saturate from RGB -> YUV conversion Incoming texture is UNORM, so the value must already be saturated.	2019-05-18 22:10:42 -07:00
James Park	fede4fb784	libobs: Improve low-resolution bilinear sampling The issue with the current bilinear_lowres_scale effect is that it samples adjacent texels, disregarding the texel-to-pixel ratio. If the ratio is large, this can lead to aliasing. This change provides a fair set of texture samples across the entire pixel. The 8-sample pattern used here comes from Direct3D.	2019-05-13 23:54:14 -07:00
James Park	ba21fb947e	libobs: Fix various alpha issues There are cases where alpha is multiplied unnecessarily. This change attempts to use premultiplied alpha blending for composition. To keep this change simple, the filter chain will continue to use straight alpha. Otherwise, every source would need to be modified to output premultiplied, and every filter modified for premultiplied input. "DrawAlphaDivide" shader techniques have been added to convert from premultiplied alpha to straight alpha for final output. "DrawMatrix" techniques ignore alpha, so they do not appear to need changing. One remaining issue is that scale effects are set up here to use the same shader logic for both scale filters (straight alpha - incorrectly), and output composition (premultiplied alpha - correctly). A fix could be made to add additional shaders for straight alpha, but the "real" fix may be to eliminate the straight alpha path at some point. For graphics, SrcBlendAlpha and DestBlendAlpha were both ONE, and could combine together to form alpha values greater than one. This is not as noticeable of a problem for UNORM targets because the channels are clamped, but it will likely become a problem in more situations if FLOAT targets are used. This change switches DestBlendAlpha to INVSRCALPHA. The blending behavior of stacked transparents is preserved without overflowing the alpha channel. obs-transitions: Use premultiplied alpha blend, and simplify shaders because both inputs and outputs use premultiplied alpha now. Fixes https://obsproject.com/mantis/view.php?id=1108	2019-05-08 20:26:52 -07:00
James Park	a86710ec5b	libobs: Support limited color range for RGB/Y800 sources libobs: Add support for limited to full color range conversions when using RGB or Y800 formats, and move RGB conversion for Y800 formats to the GPU. decklink: Stop hiding color space/range properties for RGB formats, and remove "YUV" from "YUV Color Space" and "YUV Color Range". win-dshow: Remove "YUV" from "YUV Color Space" and "YUV Color Range". UI: Remove "YUV" from "YUV Color Space" and "YUV Color Range".	2019-04-25 15:13:05 -07:00
James Park	f66625bf1e	libobs: Fix shader for GLSL vec4 to vec3 truncation fix.	2019-04-14 14:15:48 -07:00
James Park	69c215345a	libobs: Simplify YUV conversion Currently several shaders need "DrawMatrix" techniques to support the possibility that the input texture is a "YUV" format. Also, "DrawMatrix" is overloaded for translation in both directions when it is written for RGB to "YUV" only. A cleaner solution is to handle "YUV" to RGB up-front as part of format conversion, and ensure only RGB inputs reach the other shaders. This is necessary to someday perform correct scale filtering without the cost of redundant "YUV" conversions per texture tap. A necessary prerequisite for this is to add conversion support for VIDEO_FORMAT_I444, and that is now in place. There was already a hack in place to cover VIDEO_FORMAT_Y800. All other "YUV" formats already have conversion functions. "DrawMatrix" has been removed from shaders that only supported "YUV" to RGB conversions. It still exists in shaders that perform RGB to "YUV" conversions, and the implementations have been sanitized accordingly.	2019-04-11 23:00:03 -07:00
James Park	c4819678c9	libobs: Fix and simplify Area scale filter It appears there's a projection flip that is applied in some situations, like the preview pane in studio mode, and the shader math fails when it's active causing the output color to be zero. This fixes the math for GLSL (with a tiny redundancy penalty to HLSL), and cleans up some unnecessary code along the way. Use abs() to avoid zero area in case the OpenGL projection flip is active. Also simplify the math, and remove the unnecessary sampler state.	2019-04-04 08:39:54 -07:00
James Park	746820e35a	libobs: Fix Area scale filter for GLSL Remove const qualifiers because they are syntax errors for GLSL when used like in C.	2019-03-23 13:16:50 -07:00
James Park	7d811499e0	Add "Area" scale filter This new scale filter computes pixels by weighing the coverage area of source pixels over the target pixel. This algorithm works well for both upsampling and downsampling, but was mainly designed to upscale high-quality low-resolution sources like RGB/HDMI retro consoles. I've heard of people using odd workarounds like scaling up to very high resolutions before scaling back down to preserve pixel sharpness. This algorithm directly addresses this use-case in a much more direct fashion. The Area scale filter does a better job of preserving the thickness of thin features than the Point filter. The Area scale filter does not look at source pixels that lie outside of the target pixel, leading to a much sharper image than Bilinear, Bicubic, and Lanczos filters. This filter should interpolate pixels in linear space, but OBS is not equipped to do that at the moment. libobs: Add GPU effect, and wire up scene serialization. obs-filters: Add Area as an option for scale_filter. UI: Add Area as an option for both scene items, and canvas downscaling.	2019-03-06 20:53:15 -08:00
VodBox	f095cb2d0e	UI: Add scene item canvas overflow to preview	2019-02-08 20:38:53 +13:00
jp9000	28d0cc8b97	libobs: Use NV12 textures when available	2019-02-07 17:00:46 -08:00
Palana	db1da73647	libobs: Fix I420 shader for (width/2)%4 == 2 resolutions For those resolutions the last two chroma samples of every other line would be overwritten by the last chroma samples of the previous line (depending on sampler used), producing artifacts on the left edge of the resulting image (e.g. any color present on the right edge of the image would "bleed" to every other line on the left edge)	2017-09-13 16:39:36 +02:00
jp9000	7c6c7bc4c0	libobs: Add random shader Strangely, to the "Solid" effect file.	2017-05-06 11:29:24 -07:00
jp9000	f9b5da513a	libobs: Fix tex.Load lookup (needs int3, not int2) libobs' shader language is basically HLSL, and tex.Load uses an int3 for 2D textures, with texture mipmap index for the last component. This bug bypassed testing because the front-end automatically switches to OpenGL if D3D11 initialization fails, and when converted to GLSL, works fine because texelFetch only requires two components. This also means there's a bug in GLSL shader conversion code, because it's essentially ignoring the third component when it shouldn't be.	2017-05-06 10:39:42 -07:00
jp9000	e7f754df97	libobs: Use tex.Load for reverse NV12/I420 funcs Eventually, most things should be replaced with Load where applicable (though in some cases sub-pixel sampling is desired). This commit also fixes a bug where NV12 async sources wouldn't render correctly.	2017-05-06 01:24:45 -07:00
Take Vos	ab3531caa9	libobs: Add optional ultrawide -> wide scaling techniques This algorithm reduces scaling distortion on the center of the image when scaling from ultrawide to wide. (Jim: edited effect files to prevent an impact in performance for standard scaling. Now effectively generates an extra pixel shader, and the extra code is only applied to the DrawUndistort technique, while the original Draw technique is unaffected due to the compiler automatically removing unused code branches via the hard-coded boolean value) From jp9000/obs-studio#762	2017-01-30 05:59:17 -08:00
jp9000	84ce1076f1	libobs: Fix field order of retro/linear 2x shaders The field orders of retro 2x and linear 2x deinterlace shaders were inverted. Note that yadif 2x does not act the same in this regard, its field ordering is correct due to how it operates.	2016-04-24 01:21:30 -07:00
jp9000	8a9f1bc7c1	libobs: Fix discard/retro deinterlace equations	2016-04-20 20:13:49 -07:00
jp9000	96d848f3d2	libobs: Add premultiplied alpha base effect	2016-03-26 21:41:49 -07:00
sam8641	a7ce53367c	libobs: Fix lanczos scaling quality issue Closes jp9000/obs-studio#526	2016-03-24 12:35:24 -07:00
jp9000	07c644c581	libobs: Add deinterlacing API functions Adds deinterlacing API functions. Both standard and 2x variants are supported. Deinterlacing is set via obs_source_set_deinterlace_mode and obs_source_set_deinterlace_field_order. This was implemented into the core itself because deinterlacing should happen before effect filters are processed, but after async filters are processed. If this were added as a filter, there is the possibility that a different filter is processed before deinterlacing, which could mess with the result. It was also a bit easier to implement this way due to the fact that deinterlacing may need to have access to the previous async frame. Effects were split into separate files to reduce load time (especially for yadif shaders which take a significant amount of time to compile).	2016-03-21 21:22:32 -07:00
jp9000	9e15e3d8fd	libobs: Remove need for DrawMatrix technique in effects (Note: This commit also modifies obs-filters and text-freetype2) This simplifies writing of effects. DrawMatrix is no longer necessary because there are no sources that require drawing with a color matrix other than async sources, and async sources are automatically processed and don't defer their initial render stage to filters.	2016-03-21 21:22:26 -07:00
jp9000	7bc8dc3471	libobs: Add Planar444 conversion to effect	2015-04-16 22:43:46 -07:00
jp9000	6e572d849f	libobs: Don't use 'output' as a keyword in shader The bilinear lowres scale effect was using 'output' for a variable, which is apparently a reserved keyword in GLSL on macs. This slipped by me due to the fact that this didn't occur with OpenGL on my windows machine.	2015-04-10 09:58:04 -07:00
jp9000	65517ea4cf	libobs: Add low resolution bilinear scale effect This effect preserves detail of images that are scaled below half size by sampling 9 pixels.	2015-04-10 07:27:24 -07:00
jp9000	9b238ef71e	libobs: Add obs_get_opaque_effect function This returns a common effect useful for rendering an image with the alpha channel overridden to 1.0.	2015-03-22 19:18:04 -07:00
jp9000	2fa37a1f2e	libobs-opengl: Fix render targets being flipped When render targets are used, they output to the render target inverted due to the way that opengl works. This fixes that issue by inverting the projection matrix so that it renders the image upside down and inverting the front face from counterclockwise to clockwise.	2015-03-22 18:38:45 -07:00
Palana	1a53c8ca66	Rename parameters to avoid GLSL keyword conflicts Refer to https://www.opengl.org/registry/doc/GLSLangSpec.4.10.6.clean.pdf for a list of current (reserved) keywords. In the future the shader compiler in libobs-opengl should probably take care of avoiding those name conflicts (bonus points for transparently remapping the names of effect parameters)	2015-01-08 01:42:22 +01:00
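
Note on commit ba21fb947e (alpha blend factors): the message says that with SrcBlendAlpha and DestBlendAlpha both ONE, alpha values can combine to exceed one, and that switching DestBlendAlpha to INVSRCALPHA avoids the overflow. A minimal Python sketch of that arithmetic follows. It is not taken from the OBS source; the function names are hypothetical and it only models the standard fixed-function blend equation for the alpha channel.

```python
# Illustrative model of the alpha-channel blend equation:
#   result = src_alpha * src_factor + dst_alpha * dst_factor
# Function names are hypothetical; this is not OBS code.

def blend_alpha(src_a: float, dst_a: float, dst_factor: float) -> float:
    """Blend alpha with the source factor fixed at ONE."""
    return src_a * 1.0 + dst_a * dst_factor


def alpha_one_one(src_a: float, dst_a: float) -> float:
    """SrcBlendAlpha = ONE, DestBlendAlpha = ONE (the old setup)."""
    return blend_alpha(src_a, dst_a, 1.0)


def alpha_one_invsrc(src_a: float, dst_a: float) -> float:
    """SrcBlendAlpha = ONE, DestBlendAlpha = INVSRCALPHA (the new setup)."""
    return blend_alpha(src_a, dst_a, 1.0 - src_a)


if __name__ == "__main__":
    # Two stacked layers, each 75% opaque.
    print(alpha_one_one(0.75, 0.75))     # exceeds 1.0 on an unclamped target
    print(alpha_one_invsrc(0.75, 0.75))  # stays within [0, 1]
```

For inputs in [0, 1], the second form equals 1 - (1 - src)(1 - dst), so it cannot exceed one, which is consistent with the commit's remark that FLOAT targets would otherwise expose the overflow that UNORM clamping hides.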


55 Commits (63d66c87e40904331568266b50e33233efce4fed)