This reverts commit 4da73445c3.
This is being reverted because apparently it causes flickering displays
for some people. Bad drivers or something? Not sure. Very annoying.
Complex external systems using the D3D11 device may need to perform
their own device loss handling; the upcoming Windows Graphics Capture
support is one example.
Because WINAPI (stdcall) was not specified as the calling convention on
the gdi32 export, calling it corrupted the stack and crashed the 32-bit
version of OBS.
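
A minimal sketch of the calling-convention issue, not the actual code:
the function pointer type used for a dynamically resolved export has to
carry WINAPI, otherwise caller and callee disagree about who cleans up
the stack on 32-bit x86. The names example_export_t, example_export and
call_export are hypothetical, and the signature is an assumption. In
real code the pointer would come from GetProcAddress rather than from a
local stand-in.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h> // WINAPI expands to __stdcall, which matters on 32-bit x86
#else
#define WINAPI // assumption: empty fallback so the sketch builds elsewhere
#endif

// Hypothetical signature. Without WINAPI here, a 32-bit build would
// treat the callee as cdecl and adjust the stack after the call, even
// though a stdcall callee has already popped its own arguments.
typedef long(WINAPI *example_export_t)(void *process, int priority);

// Local stand-in for the real export so the sketch is self-contained.
static long WINAPI example_export(void *process, int priority)
{
    (void)process;
    return priority >= 0 ? 0 : -1;
}

// Calls through the typed pointer; the convention is part of the type.
static long call_export(example_export_t fn, int priority)
{
    assert(fn != nullptr);
    return fn(nullptr, priority);
}
```

On 64-bit Windows the annotation is ignored, which would explain why
only the 32-bit build crashed.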
We really shouldn't be resetting duplicator state as part of gs_flush.
gs_begin_scene is not ideal because it is called twice per frame, and
only after duplicators have been ticked. Even though it makes no
user-facing difference, it makes more logical sense to reset at the top
of the frame than the bottom.
Not in love with STL, but let's at least use the semantically correct
collection. It's also a shame this is a global variable with gross
pre-main allocations, but attaching it to the device instance would
break the interface.
(This commit also modifies the UI)
This solves the issue where OBS would be deprioritized by Windows over
fullscreen games, causing OBS to lag out whereas the games would still
run fine.
Feature Level 9.3 appears to never have actually worked because shaders
are compiled as straight 4_0 instead of 4_0_level_9_3. That being the
case, baseline against 10_0 instead.
NV12 GPU copies to staging textures for CPU read take a ridiculously
long time on my integrated Intel GPU. Using R8/R8G8 instead seems to be
a huge speed-up.
Intel HD Graphics 530, D3D11 query timings, SetStablePowerState
NV12: ~3268 us (minimum of wild timings)
R8/R8G8: ~781 us (most frequently occurring timing)
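
For reference, a rough sketch of how an NV12 frame maps onto the two
replacement formats, assuming R8 stands in for the full-resolution luma
plane and R8G8 for the half-resolution interleaved chroma plane. The
struct and function names are hypothetical, dimensions are assumed to be
even, and row pitch or alignment padding is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct PlaneDesc {
    uint32_t width;
    uint32_t height;
    uint32_t bytes_per_texel;
};

struct Nv12Planes {
    PlaneDesc luma;   // one byte per texel, would be an R8 texture
    PlaneDesc chroma; // two bytes per texel (U, V), would be an R8G8 texture
};

// Splits an NV12 frame of the given size into its two planes.
static Nv12Planes split_nv12(uint32_t width, uint32_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    return {{width, height, 1}, {width / 2, height / 2, 2}};
}

// Tightly packed size of one plane in bytes (no pitch padding).
static size_t plane_bytes(const PlaneDesc &p)
{
    return static_cast<size_t>(p.width) * p.height * p.bytes_per_texel;
}
```

The two planes together hold the same 1.5 bytes per pixel as the NV12
frame, so the amount of data copied does not change, only the formats.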
This change only wraps the functionality. I have rough code to exercise
the query functionality, but that part is not really clean enough to
submit.
Code submissions keep arriving with formatting inconsistencies that
maintainers have to address by hand. clang-format makes code formatting
consistent and lets it be automated, so that maintainers can focus on
the code itself instead of its formatting.
The cache coherency of rasterization for full-screen passes is better
using an oversized triangle that is clipped rather than two triangles.
Traversal order of rasterization is GPU-specific, but will almost
certainly be better using an undivided primitive.
A smaller benefit is that the 2x2 pixel quads along the diagonal seam
are not shaded twice, but that's minor in comparison.
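
To make the geometry concrete, here is a small sketch that checks an
oversized triangle against the viewport. The clip-space coordinates
(-1,-1), (3,-1), (-1,3) are the usual convention and an assumption
here, and all names are hypothetical.

```cpp
#include <cassert>

struct Vec2 {
    float x, y;
};

// Assumed clip-space vertices of the single oversized triangle.
static const Vec2 kTriangle[3] = {{-1.0f, -1.0f}, {3.0f, -1.0f}, {-1.0f, 3.0f}};

// Signed area test: >= 0 when p is on or to the left of the edge a -> b.
static float edge(Vec2 a, Vec2 b, Vec2 p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// True when p lies inside or on the boundary of the triangle.
static bool covers(Vec2 p)
{
    return edge(kTriangle[0], kTriangle[1], p) >= 0.0f &&
           edge(kTriangle[1], kTriangle[2], p) >= 0.0f &&
           edge(kTriangle[2], kTriangle[0], p) >= 0.0f;
}

// Sanity check: every corner of the [-1, 1] viewport is covered.
static void check_viewport_covered()
{
    const Vec2 corners[4] = {{-1.0f, -1.0f}, {1.0f, -1.0f}, {-1.0f, 1.0f}, {1.0f, 1.0f}};
    for (const Vec2 &c : corners)
        assert(covers(c));
}
```

The part of the triangle outside the viewport is clipped, so the
visible area is rasterized as one primitive with no interior edge.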
Redo format shaders to bypass the vertex buffer and input layout. Add
global shader bool "obs_glsl_compile" to make API-specific decisions,
e.g. handling upside-down UVs. gl_ortho is not needed for format
conversion because the vertex shader does not use ViewProj anymore.
This can be applied to more situations, but start small first.
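
As an illustration only, here is a CPU-side C++ model of what a vertex
shader without a vertex buffer could compute from the vertex index
alone. The function name and the exact arithmetic are assumptions, not
the shader from this change. The glsl_compile parameter plays the role
of "obs_glsl_compile" and selects the V direction.

```cpp
#include <cassert>

struct FullscreenVertex {
    float x, y; // clip-space position
    float u, v; // texture coordinate
};

// Derives one of three vertices from its index (0, 1 or 2), so no
// vertex buffer or input layout is needed.
static FullscreenVertex fullscreen_vertex(unsigned id, bool glsl_compile)
{
    assert(id < 3);
    const float id_high = static_cast<float>(id >> 1);
    const float id_low = static_cast<float>(id & 1u);

    FullscreenVertex out;
    out.x = id_high * 4.0f - 1.0f; // -1, -1, 3
    out.y = id_low * 4.0f - 1.0f;  // -1, 3, -1
    out.u = id_high * 2.0f;        // 0, 0, 2
    // Assumed convention: V grows upward for GLSL and downward otherwise.
    out.v = glsl_compile ? id_low * 2.0f : 1.0f - id_low * 2.0f;
    return out;
}
```

Positions are already in clip space, which is why no ViewProj
transform is involved.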
Testbed full screen passes, Intel HD Graphics 530:
RGBA -> UYVX: 467 -> 439 us, ~6% savings
UYVX -> uv: 295 -> 239 us, ~19% savings
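
The percentages above are relative to the original timing. A tiny
helper (hypothetical name) showing the arithmetic:

```cpp
#include <cassert>
#include <cmath>

// Relative time saved going from before_us to after_us, in percent.
static double savings_percent(double before_us, double after_us)
{
    assert(before_us > 0.0);
    return (before_us - after_us) / before_us * 100.0;
}
```

For example, (467 - 439) / 467 is about 0.06 and (295 - 239) / 295 is
about 0.19.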