libobs: Redesign/optimize frame encoding handling

Previously, the design for the interaction between the encoder thread
and the graphics thread was that the encoder thread would signal to the
graphics thread when to start drawing each frame.  The original idea
behind this was to prevent mutually cascading stalls of encoding or
graphics rendering (i.e., if rendering took too long, then encoding
would have to catch up, then rendering would have to catch up again, and
so on, cascading upon each other).  The ultimate goal was to prevent
encoding from impacting graphics and vise versa.

However, eventually it was realized that there were some fundamental
flaws with this design.

1. Stray frame duplication.  You could not guarantee that a frame would
   render on time, so sometimes frames would unintentionally be lost if
   there was any sort of minor hiccup or if the thread took too long to
   be scheduled I'm guessing.

2. Frame timing in the rendering thread was less accurate.  The only
   place where frame timing was accurate was in the encoder thread, and
   the graphics thread was at the whim of thread scheduling.  On higher
   end computers it was typically fine, but it was just generally not
   guaranteed that a frame would be rendered when it was supposed to be
   rendered.

So the solution (originally proposed by r1ch and paibox) is to instead
keep the encoding and graphics threads separate as usual, but instead of
the encoder thread controlling the graphics thread, the graphics thread
now controls the encoder thread.  The encoder thread keeps a limited
cache of frames, then the graphics thread copies frames in to the cache
and increments a semaphore to schedule the encoder thread to encode that
data.

In the cache, each frame has an encode counter.  If the frame cache is
full (e.g., the encoder taking too long to return frames), it will not
cache a new frame, but instead will just increment the counter on the
last frame in the cache to schedule that frame to encode again, ensuring
that frames are on time and reducing CPU usage by lowering video
complexity.  If the graphics thread takes too long to render a frame,
then it will add that frame with the count value set to the total amount
of frames that were missed (actual legitimately duplicated frames).

Because the cache gives many frames of breathing room for the encoder to
encode frames, this design helps improve results especially when using
encoding presets that have higher complexity and CPU usage, minimizing
the risk of needlessly skipped or duplicated frames.

I also managed to sneak in what should be a bit of an optimization to
reduce copying of frame data, though how much of an optimization it
ultimately ends up being is debatable.

So to sum it up, this commit increases accuracy of frame timing,
completely removes stray frame duplication, gives better results for
higher complexity encoding presets, and potentially optimizes the frame
pipeline a tiny bit.
This commit is contained in:
jp9000
2014-12-31 01:53:13 -08:00
parent 11dd7912ce
commit 11106c2fce
5 changed files with 216 additions and 160 deletions

View File

@@ -23,6 +23,8 @@
extern "C" {
#endif
struct video_frame;
/* Base video output component. Use this to create a video output track. */
struct video_output;
@@ -72,6 +74,7 @@ struct video_output_info {
uint32_t fps_den;
uint32_t width;
uint32_t height;
size_t cache_size;
enum video_colorspace colorspace;
enum video_range_type range;
@@ -137,11 +140,12 @@ EXPORT bool video_output_active(const video_t *video);
EXPORT const struct video_output_info *video_output_get_info(
const video_t *video);
EXPORT void video_output_swap_frame(video_t *video, struct video_data *frame);
EXPORT bool video_output_wait(video_t *video);
EXPORT bool video_output_lock_frame(video_t *video, struct video_frame *frame,
int count, uint64_t timestamp);
EXPORT void video_output_unlock_frame(video_t *video);
EXPORT uint64_t video_output_get_frame_time(const video_t *video);
EXPORT uint64_t video_output_get_time(const video_t *video);
EXPORT void video_output_stop(video_t *video);
EXPORT bool video_output_stopped(video_t *video);
EXPORT enum video_format video_output_get_format(const video_t *video);
EXPORT uint32_t video_output_get_width(const video_t *video);