The normal scaling methods cannot sample enough pixels to create an
accurate output image when the output size is under half the base size,
so use the bilinear low resolution scaling effect in that case instead
to ensure a more accurate low resolution image.
Direct3D textures are usually aligned to a specific pitch, so their
internal width is often not equal to the expected output width; this
means that if we want to use it on our texture output, that we must
de-align the texture while copying the texture data.
However, I unintentionally messed up the calculation at some point with
RGBA textures, so the variable size I was supposed to be using was
supposed to be multiplied by 4 (for RGBA), while I was still expecting
single channel data. So, if the texture width was something like 1332,
the source (directx) texture line size would be somewhere at or above
5328 (because it's RGBA), then destination is at 1332 (YUV luma plane),
and it would unintentionally treat 3996 (or 5328 - 1332) bytes as the
unused alignment data. So this fixes that miscalculation.
The return value of os_sleepto_ns is true if it waited to the specified
time, and false if the current time is past the specified time. So it
basically returns true if it successfully waited.
I just didn't check the return value properly here, so it ended up just
setting the count of frames to 1 if overshot, ultimately causing sync
issues.
Previously, the design for the interaction between the encoder thread
and the graphics thread was that the encoder thread would signal to the
graphics thread when to start drawing each frame. The original idea
behind this was to prevent mutually cascading stalls of encoding or
graphics rendering (i.e., if rendering took too long, then encoding
would have to catch up, then rendering would have to catch up again, and
so on, cascading upon each other). The ultimate goal was to prevent
encoding from impacting graphics and vise versa.
However, eventually it was realized that there were some fundamental
flaws with this design.
1. Stray frame duplication. You could not guarantee that a frame would
render on time, so sometimes frames would unintentionally be lost if
there was any sort of minor hiccup or if the thread took too long to
be scheduled I'm guessing.
2. Frame timing in the rendering thread was less accurate. The only
place where frame timing was accurate was in the encoder thread, and
the graphics thread was at the whim of thread scheduling. On higher
end computers it was typically fine, but it was just generally not
guaranteed that a frame would be rendered when it was supposed to be
rendered.
So the solution (originally proposed by r1ch and paibox) is to instead
keep the encoding and graphics threads separate as usual, but instead of
the encoder thread controlling the graphics thread, the graphics thread
now controls the encoder thread. The encoder thread keeps a limited
cache of frames, then the graphics thread copies frames in to the cache
and increments a semaphore to schedule the encoder thread to encode that
data.
In the cache, each frame has an encode counter. If the frame cache is
full (e.g., the encoder taking too long to return frames), it will not
cache a new frame, but instead will just increment the counter on the
last frame in the cache to schedule that frame to encode again, ensuring
that frames are on time and reducing CPU usage by lowering video
complexity. If the graphics thread takes too long to render a frame,
then it will add that frame with the count value set to the total amount
of frames that were missed (actual legitimately duplicated frames).
Because the cache gives many frames of breathing room for the encoder to
encode frames, this design helps improve results especially when using
encoding presets that have higher complexity and CPU usage, minimizing
the risk of needlessly skipped or duplicated frames.
I also managed to sneak in what should be a bit of an optimization to
reduce copying of frame data, though how much of an optimization it
ultimately ends up being is debatable.
So to sum it up, this commit increases accuracy of frame timing,
completely removes stray frame duplication, gives better results for
higher complexity encoding presets, and potentially optimizes the frame
pipeline a tiny bit.
This changes the way source volume handles transitioning between being
active and inactive states.
The previous way that transitioning handled volume was that it set the
presentation volume of the source and all of its sub-sources to 0.0 if
the source was inactive, and 1.0 if active. Transition sources would
then also set the presentation volume for sub-sources to whatever their
transitioning volume was. However, the problem with this is that the
design didn't take in to account if the source or its sub-sources were
active anywhere else, so because of that it would break if that ever
happened, and I didn't realize that when I was designing it.
So instead, this completely overhauls the design of handling
transitioning volume. Each frame, it'll go through all sources and
check whether they're active or inactive and set the base volume
accordingly. If transitions are currently active, it will actually walk
the active source tree and check whether the source is in a
transitioning state somewhere.
- If the source is a sub-source of a transition, and it's not active
outside of the transition, then the transition will control the
volume of the source.
- If the source is a sub-source of a transition, but it's also active
outside of the transition, it'll defer to whichever is louder.
This also adds a new callback to the obs_source_info structure for
transition sources, get_transition_volume, which is called to get the
transitioning volume of a sub-source.
This adds bicubic and lanczos scaling capability to libobs to improve
scaling quality and sharpness when the output resolution has to be
scaled relative to the base resolution. Bilinear is also available,
although bilinear has rather poor quality and causes scaling to appear
blurry.
If the output resolution is close to the base resolution, then bilinear
is used instead as an optimization, as there's no need to use these
shaders if scaling is not in use.
The Bicubic and Lanczos effects are also exposed via exported function
to allow the ability to use those shaders in plugin modules if desired.
The API change adds a variable 'scale_type' to the obs_video_info
structure that allows the user interface to choose what type of scaling
filter should be used.
This was an important change because we were originally using an
hard-coded 709/partial range color matrix for the output, which was
causing problems for people wanting to use different formats or color
spaces. This will now automatically generate the color matrix depending
on the format, color space, and range, or use an identity matrix if the
video format is RGB instead of YUV.
On certain GPUs, if you don't flush and the window is minimized it can
endlessly accumulate memory due to what I'm assuming are driver design
flaws (though I can't know for sure). The flush seems to prevent this
from happening, at least from my tests. It would be nice if this
weren't necessary.
At the start of each render loop, it would get the timestamp, and then
it would then assign that timestamp to whatever frame was downloaded.
However, the frame that was downloaded was usually occurred a number of
frames ago, so it would assign the wrong timestamp value to that frame.
This fixes that issue by storing the timestamps in a circular buffer.
Typedef pointers are unsafe. If you do:
typedef struct bla *bla_t;
then you cannot use it as a constant, such as: const bla_t, because
that constant will be to the pointer itself rather than to the
underlying data. I admit this was a fundamental mistake that must
be corrected.
All typedefs that were pointer types will now have their pointers
removed from the type itself, and the pointers will be used when they
are actually used as variables/parameters/returns instead.
This does not break ABI though, which is pretty nice.
API Removed:
- graphics_t obs_graphics();
Replaced With:
- void obs_enter_graphics();
- void obs_leave_graphics();
Description:
obs_graphics() was somewhat of a pointless function. The only time
that it was ever necessary was to pass it as a parameter to
gs_entercontext() followed by a subsequent gs_leavecontext() call after
that. So, I felt that it made a bit more sense just to implement
obs_enter_graphics() and obs_leave_graphics() functions to do the exact
same thing without having to repeat that code. There's really no need
to ever "hold" the graphics pointer, though I suppose that could change
in the future so having a similar function come back isn't out of the
question.
Still, this at least reduces the amount of unnecessary repeated code for
the time being.
Similar to the shader functions, the effect parameter functions take
the effect as a parameter. However, the effect parameter is pretty
pointless, because the effect parameter.. parameter stores the effect
pointer interally.
Add API for streaming services. The services API simplifies the
creation of custom service features and user interface.
Custom streaming services later on will be able to do things such as:
- Be able to use service-specific APIs via modules, allowing a more
direct means of communicating with the service and requesting or
setting service-specific information
- Get URL/stream key via other means of authentication such as OAuth,
or be able to build custom URLs for services that require that sort
of thing.
- Query information (such as viewer count, chat, follower
notifications, and other information)
- Set channel information (such as current game, current channel title,
activating commercials)
Also, I reduce some repeated code that was used for all libobs objects.
This includes the name of the object, the private data, settings, as
well as the signal and procedure handlers.
I also switched to using linked lists for the global object lists,
rather than using an array of pointers (you could say it was..
pointless.) ..Anyway, the linked list info is also stored in the shared
context data structure.
Modify the obs_display API so that it always uses an orthographic
projection that is the size of the display, rather than OBS' base size.
Having it do an orthographic projection to OBS' base size was silly
because it meant that everything would be skewed if you wanted to draw
1:1 in the display. This deoes mean that the callbacks must handle
resizing the images, but it's worth it to ensure 1:1 draw sizes.
As for the preview widget, instead of making some funky widget within
widget that resizes, it's just going to be a widget within the entire
top layout. Also changed the preview padding color to gray.
LOG_ERROR should be used in places where though recoverable (or at least
something that can be handled safely), was unexpected, and may affect
the user/application.
LOG_WARNING should be used in places where it's not entirely unexpected,
is recoverable, and doesn't really affect the user/application.
- Add CoreAudio device input capture for mac audio capturing. The code
should cover just about everything for capturing mac input device
audio. Because of the way mac audio is designed, users may have no
choice but to obtain the open source soundflower software to capture
their mac's desktop audio. It may be necessary for us to distribute
it with the program as well.
- Hide event backend
- Use win32 events for windows
- Allow timed waits for events
- Fix a few warnings
Add a scaler interface (defaults to swscale), and if a separate output
wants to use a different scale or format than the default output format,
allow a scaler instance to be created automatically for that output,
which will then receive the new scaled output.
- Changed glMapBuffer to glMapBufferRange to allow invalidation. Using
just glMapBuffer alone was causing some unacceptable stalls.
- Changed dynamic buffers from GL_DYNAMIC_WRITE to GL_STREAM_WRITE
because I had misunderstood the OpenGL specification
- Added _OPENGL and _D3D11 builtin preprocessor macros to effects to
allow special processing if needed
- Added fmod support to shaders (NOTE: D3D and GL do not function
identically with negative numbers when using this. Positive numbers
however function identically)
- Created a planar conversion shader that converts from packed YUV to
planar 420 right on the GPU without any CPU processing. Reduces
required GPU download size to approximately 37.5% of its normal rate
as well. GPU usage down by 10 entire percentage points despite the
extra required pass.
There were a *lot* of warnings, managed to remove most of them.
Also, put warning flags before C_FLAGS and CXX_FLAGS, rather than after,
as -Wall -Wextra was overwriting flags that came before it.
Originally, the rendering system was designed to only display sources
and such, but I realized there would be a flaw; if you wanted to render
the main viewport in a custom way, or maybe even the entire application
as a graphics-based front end, you wouldn't have been able to do that.
Displays have now been separated in to viewports and displays. A
viewport is used to store and draw sources, a display is used to handle
draw callbacks. You can even use displays without using viewports to
draw custom render displays containing graphics calls if you wish, but
usually they would be used in combination with source viewports at
least.
This requires a tiny bit more work to create simple source displays, but
in the end its worth it for the added flexibility and options it brings.
The API used to be designed in such a way to where it would expect
exports for each individual source/output/encoder/etc. You would export
functions for each and it would automatically load those functions based
on a specific naming scheme from the module.
The idea behind this was that I wanted to limit the usage of structures
in the API so only functions could be used. It was an interesting idea
in theory, but this idea turned out to be flawed in a number of ways:
1.) Requiring exports to create sources/outputs/encoders/etc meant that
you could not create them by any other means, which meant that
things like faruton's .net plugin would become difficult.
2.) Export function declarations could not be checked, therefore if you
created a function with the wrong parameters and parameter types,
the compiler wouldn't know how to check for that.
3.) Required overly complex load functions in libobs just to handle it.
It makes much more sense to just have a load function that you call
manually. Complexity is the bane of all good programs.
4.) It required that you have functions of specific names, which looked
and felt somewhat unsightly.
So, to fix these issues, I replaced it with a more commonly used API
scheme, seen commonly in places like kernels and typical C libraries
with abstraction. You simply create a structure that contains the
callback definitions, and you pass it to a function to register that
definition (such as obs_register_source), which you call in the
obs_module_load of the module.
It will also automatically check the structure size and ensure that it
only loads the required values if the structure happened to add new
values in an API change.
The "main" source file for each module must include obs-module.h, and
must use OBS_DECLARE_MODULE() within that source file.
Also, started writing some doxygen documentation in to the main library
headers. Will add more detailed documentation as I go.
- Fill in the rest of the FFmpeg test output code for testing so it
actually properly outputs data.
- Improve the main video subsystem to be a bit more optimal and
automatically output I420 or NV12 if needed.
- Fix audio subsystem insertation and byte calculation. Now it will
seamlessly insert new audio data in to the audio stream based upon
its timestamp value. (Be extremely cautious when using floating
point calculations for important things like this, and always round
your values and check your values)
- Use 32 byte alignment in case of future optimizations and export a
function to get the current alignment.
- Make os_sleepto_ns return true if slept, false if the time has
already been passed before the call.
- Fix sinewave output so that it actually properly calculates a middle
C sinewave.
- Change the use of row_bytes to linesize (also makes it a bit more
consistent with FFmpeg's naming as well)
- Add planar audio support. FFmpeg and libav use planar audio for many
encoders, so it was somewhat necessary to add support in libobs
itself.
- Improve/adjust FFmpeg test output plugin. The exports were somewhat
messed up (making me rethink how exports should be done). Not yet
functional; it handles video properly, but it still does not handle
audio properly.
- Improve planar video code. The planar video code was not properly
accounting for row sizes for each plane. Specifying row sizes for
each plane has now been added. This will also make it more compatible
with FFmpeg/libav.
- Fixed a bug where callbacks wouldn't create properly in audio-io and
video-io code.
- Implement 'blogva' function to allow for va_list usage with libobs
logging.
- Implement texture scaling/conversion/downloading for the main view so
we can finally start getting data to output.
Also, redesign how it works a bit, it will now properly wait one full
frame for each step in the process: rendering the main texture,
scaling the main texture to an output texture, staging/downloading the
ouput texture, and then outputting that staged data. This way, the
GPU will have more than enough time to fully complete each step.
- Fix a bug with OpenGL plugin's texture staging function. Was using
glBindBuffer instead of what should have been used: glBindTexture.
- Change the naming scheme of the variables in default.effect. It's now
named with the idea of just "color matrix" in mind instead of "yuv
matrix", and instead of DrawRGBToYUV, it's now just DrawMatrix.
- I seem to have fixed ths issues with the main preview widget. It
seems you just need to set the right window attributes to stop it from
breaking. Though when opengl is enabled, there appears to be a weird
background glitch in the Qt stuff -- I'm not entirely sure what's
going on. Bug in Qt?
Also fixed the layout issues, and the widget now properly resizes and
centers in to its parent widget.
- Prevent the render loop from accessing data if the data isn't valid.
Because obs->data is freed before the graphics stuff, it can cause
the graphics to keep trying to query the obs->data.displays_mutex
after it had already been destroyed.
Scenes will now signal via their source when an item has been added
or removed from them.
"add" - Item added to the scene.
Parameters: "scene": Scene that the item was added to.
"item": Item that was added.
"remove" - Item removed from the scene.
Parameters: "scene": Scene that the item was removed from.
"item": Item that was removed.