libobs: Redesign/optimize frame encoding handling

Previously, the design for the interaction between the encoder thread and the graphics thread was that the encoder thread would signal to the graphics thread when to start drawing each frame. The original idea behind this was to prevent mutually cascading stalls of encoding or graphics rendering (i.e., if rendering took too long, then encoding would have to catch up, then rendering would have to catch up again, and so on, cascading upon each other). The ultimate goal was to prevent encoding from impacting graphics and vice versa.

However, eventually it was realized that there were some fundamental flaws with this design.

1. Stray frame duplication. You could not guarantee that a frame would render on time, so sometimes frames would unintentionally be lost if there was any sort of minor hiccup or, I'm guessing, if the thread took too long to be scheduled.

2. Frame timing in the rendering thread was less accurate. The only place where frame timing was accurate was in the encoder thread, and the graphics thread was at the whim of thread scheduling. On higher end computers it was typically fine, but it was just generally not guaranteed that a frame would be rendered when it was supposed to be rendered.

So the solution (originally proposed by r1ch and paibox) is to instead keep the encoding and graphics threads separate as usual, but instead of the encoder thread controlling the graphics thread, the graphics thread now controls the encoder thread. The encoder thread keeps a limited cache of frames, then the graphics thread copies frames into the cache and increments a semaphore to schedule the encoder thread to encode that data.

In the cache, each frame has an encode counter. If the frame cache is full (e.g., the encoder is taking too long to return frames), it will not cache a new frame, but instead will just increment the counter on the last frame in the cache to schedule that frame to encode again, ensuring that frames are on time and reducing CPU usage by lowering video complexity. If the graphics thread takes too long to render a frame, then it will add that frame with the count value set to the total number of frames that were missed (actual legitimately duplicated frames).

Because the cache gives many frames of breathing room for the encoder to encode frames, this design helps improve results especially when using encoding presets that have higher complexity and CPU usage, minimizing the risk of needlessly skipped or duplicated frames.

I also managed to sneak in what should be a bit of an optimization to reduce copying of frame data, though how much of an optimization it ultimately ends up being is debatable.

So to sum it up, this commit increases accuracy of frame timing, completely removes stray frame duplication, gives better results for higher complexity encoding presets, and potentially optimizes the frame pipeline a tiny bit.
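
The cache behavior described in this message can be modeled roughly as follows. This is a single-threaded sketch for illustration only, not the code from this commit: `struct frame_cache`, `cache_push`, and `cache_encode_one` are hypothetical names, locking is omitted, and the plain `pending` counter stands in for the semaphore. Only the cache size of 6 is taken from the diff (`vi->cache_size = 6`). When the cache is full, the sketch adds the whole incoming count to the newest frame, which is an assumption about how missed frames combine with a full cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE 6 /* same value as vi->cache_size in this commit */

/* Hypothetical cache entry: a frame plus how many times to encode it. */
struct cached_frame {
	uint64_t timestamp;
	int      count;
};

/* Hypothetical ring buffer of cached frames. */
struct frame_cache {
	struct cached_frame frames[CACHE_SIZE];
	size_t first;   /* index of the oldest cached frame */
	size_t used;    /* number of occupied slots */
	int    pending; /* outstanding encodes; stands in for the semaphore */
};

/*
 * Graphics side: called once per rendered frame. 'count' is 1 for an
 * on-time frame, or larger if the renderer missed frames. Returns true
 * if a new slot was used, false if the cache was full.
 */
static bool cache_push(struct frame_cache *c, uint64_t ts, int count)
{
	assert(count > 0);

	if (c->used == CACHE_SIZE) {
		/* Cache full: schedule the newest cached frame again
		 * instead of storing new frame data. */
		size_t last = (c->first + c->used - 1) % CACHE_SIZE;
		c->frames[last].count += count;
		c->pending += count;
		return false;
	}

	size_t slot = (c->first + c->used) % CACHE_SIZE;
	c->frames[slot].timestamp = ts;
	c->frames[slot].count     = count;
	c->used++;
	c->pending += count;
	return true;
}

/*
 * Encoder side: performs one scheduled encode of the oldest frame and
 * releases its slot once the counter reaches zero. Returns false if
 * there is nothing to encode.
 */
static bool cache_encode_one(struct frame_cache *c, uint64_t *ts_out)
{
	if (c->used == 0)
		return false;

	struct cached_frame *f = &c->frames[c->first];
	*ts_out = f->timestamp;
	c->pending--;

	if (--f->count == 0) {
		c->first = (c->first + 1) % CACHE_SIZE;
		c->used--;
	}
	return true;
}
```

In this model the graphics side never blocks: a slow encoder only causes the newest frame's counter to grow, and a slow renderer only produces a frame with a larger initial count.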
2014-12-31 01:53:13 -08:00
parent 11dd7912ce
commit 11106c2fce
5 changed files with 216 additions and 160 deletions
--- a/libobs/obs.c
+++ b/libobs/obs.c
@@ -51,6 +51,7 @@ static inline void make_video_info(struct video_output_info *vi,
 	vi->height  = ovi->output_height;
 	vi->range   = ovi->range;
 	vi->colorspace = ovi->colorspace;
+	vi->cache_size = 6;
 }

 #define PIXEL_SIZE 4
@@ -163,7 +164,6 @@ static bool obs_init_gpu_conversion(struct obs_video_info *ovi)
 static bool obs_init_textures(struct obs_video_info *ovi)
 {
 	struct obs_core_video *video = &obs->video;
-	bool yuv = format_is_yuv(ovi->output_format);
 	uint32_t output_height = video->gpu_conversion ?
 		video->conversion_height : ovi->output_height;
 	size_t i;
@@ -188,11 +188,6 @@ static bool obs_init_textures(struct obs_video_info *ovi)

 		if (!video->output_textures[i])
 			return false;
-
-		if (yuv)
-			obs_source_frame_init(&video->convert_frames[i],
-					ovi->output_format,
-					ovi->output_width,ovi->output_height);
 	}

 	return true;
@@ -383,7 +378,6 @@ static void obs_free_video(void)
 			gs_texture_destroy(video->render_textures[i]);
 			gs_texture_destroy(video->convert_textures[i]);
 			gs_texture_destroy(video->output_textures[i]);
-			obs_source_frame_free(&video->convert_frames[i]);

 			video->copy_surfaces[i]    = NULL;
 			video->render_textures[i]  = NULL;
@@ -393,7 +387,7 @@ static void obs_free_video(void)

 		gs_leave_context();

-		circlebuf_free(&video->timestamp_buffer);
+		circlebuf_free(&video->vframe_info_buffer);

 		memset(&video->textures_rendered, 0,
 				sizeof(video->textures_rendered));