When using multiple video encoders together with a single audio encoder,
the audio wouldn't be in sync.
The reason why this occurred is because the dts_usec variable of the
encoder packet (which is based on system time) would always be reset to
a value based upon the dts (which is not guaranteed to be based on
system time) in the apply_interleaved_packet_offset function. This
would then in turn cause it to miscalculate the starting audio/video
offsets, which are required to calculate sync.
So instead of calling that function unnecessarily, separate the check
for whether audio/video has been received in to a new function, and only
start applying the interleaved offsets after audio and video have
actually started up and the starting offsets have been calculated.