libobs: Start audio tracks before starting video tracks
When using multi-track audio, encoders cannot be paired like they can
when only using a single audio track with video, so it has to choose the
best point in the interleaved buffer as the "starting point", and if the
encoders start up at different times, it has to prune that data and wait
to start the output on the next video keyframe. When the audio encoders
started up, there was the case where the encoders would take some time
to load, and it would cause the pruning code to wait for the next
keyframe to ensure startup syncing.
Starting the audio encoders before starting the video encoder should
reduce the possibility of that happening in a multi-track scenario.