This is an alpha-quality implementation of a Metal renderer backend for OBS, exclusive to Apple Silicon Macs. It supports all default source types, filters, and transitions provided by OBS Studio.
The -emit-objc-header compile flag and @cdecl("<FUNCTION NAME>") decorators are used to expose desired functions to libobs.
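As a minimal sketch of this export mechanism, the following shows Swift functions given unmangled C symbol names so that C code can call them. The type `ExampleDevice` and the `example_device_*` names are hypothetical and are not actual backend entry points. Note that Swift 5 toolchains spell the attribute `@_cdecl`; this sketch assumes that spelling.

```swift
// Hypothetical Swift-side object that a C caller only sees as an opaque pointer.
final class ExampleDevice {
    var frameCount: Int32 = 0
}

// Creates the object and hands ownership to the C caller (+1 retain).
@_cdecl("example_device_create")
public func exampleDeviceCreate() -> UnsafeMutableRawPointer {
    return Unmanaged.passRetained(ExampleDevice()).toOpaque()
}

// Borrows the object without changing its retain count.
@_cdecl("example_device_present")
public func exampleDevicePresent(_ device: UnsafeMutableRawPointer) -> Int32 {
    let instance = Unmanaged<ExampleDevice>.fromOpaque(device).takeUnretainedValue()
    instance.frameCount += 1
    return instance.frameCount
}

// Balances the retain from example_device_create.
@_cdecl("example_device_destroy")
public func exampleDeviceDestroy(_ device: UnsafeMutableRawPointer) {
    Unmanaged<ExampleDevice>.fromOpaque(device).release()
}
```

Only C-representable types (raw pointers, fixed-width integers, and so on) can appear in such signatures, which is why the Swift object crosses the boundary as an opaque pointer.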
To manually render contents into a window using Metal, one has to use a CAMetalLayer that is set to be an NSView's backing layer. This layer can provide a CAMetalDrawable object which the compositor will use when it renders a new frame of the desktop. This drawable can provide a texture that OBS Studio can render into to generate output like the main preview.
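The setup described above can be sketched roughly as follows. This is an illustrative view class, not the backend's actual implementation: the name `ExamplePreviewView` is hypothetical, and the pixel format and `framebufferOnly` setting are assumptions chosen for the example.

```swift
#if canImport(AppKit) && canImport(Metal)
import AppKit
import Metal
import QuartzCore

/// Hypothetical view whose backing layer is a CAMetalLayer.
final class ExamplePreviewView: NSView {
    override init(frame frameRect: NSRect) {
        super.init(frame: frameRect)
        wantsLayer = true  // ask AppKit to create the backing layer
    }

    required init?(coder: NSCoder) {
        super.init(coder: coder)
        wantsLayer = true
    }

    // AppKit calls this to obtain the view's backing layer.
    override func makeBackingLayer() -> CALayer {
        let metalLayer = CAMetalLayer()
        metalLayer.device = MTLCreateSystemDefaultDevice()
        metalLayer.pixelFormat = .bgra8Unorm  // assumed format for this sketch
        // Assumption: the drawable's texture is written by a copy, not only
        // used as a render target, so it must not be framebuffer-only.
        metalLayer.framebufferOnly = false
        return metalLayer
    }
}
#endif
```

With such a view, `(view.layer as? CAMetalLayer)?.nextDrawable()` returns an optional drawable whose `texture` property is the render or copy destination.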
Because Metal is much more integrated with macOS than OpenGL and designed with energy efficiency in mind, a CAMetalLayer will never provide more drawables than necessary, which means that there can be at most 3 drawables "in flight". If all available drawables are in use (either by OBS Studio to render into or by the compositor to render the desktop output), a request for a new drawable will block until an old drawable expires and a new one has been generated.
This means that if OBS renders at a higher framerate than the operating system's compositor, it will exhaust this budget, and OBS Studio's renderer will stall until a new drawable is available. In effect, OBS Studio's maximum frame rate is limited to the operating system's screen refresh rate.
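The drawable budget can be illustrated with a toy model. This is not Metal code: `ExampleDrawablePool` and `stalledRequests` are hypothetical names, the pool simply counts drawables in flight, and the compositor is assumed to return at most one drawable per screen refresh. A failed `tryAcquire()` stands in for the point where a real request for a drawable would block.

```swift
/// Toy model of a layer's drawable pool (a counter, not a real CAMetalLayer).
struct ExampleDrawablePool {
    let maximumDrawableCount: Int
    private(set) var inFlight = 0

    init(maximumDrawableCount: Int = 3) {
        self.maximumDrawableCount = maximumDrawableCount
    }

    /// Returns false where a real drawable request would block.
    mutating func tryAcquire() -> Bool {
        guard inFlight < maximumDrawableCount else { return false }
        inFlight += 1
        return true
    }

    /// Models the compositor finishing with one drawable.
    mutating func release() {
        if inFlight > 0 { inFlight -= 1 }
    }
}

/// Counts how many drawable requests would have stalled.
func stalledRequests(rendersPerRefresh: Int, refreshes: Int) -> Int {
    var pool = ExampleDrawablePool()
    var stalls = 0
    for _ in 0..<refreshes {
        pool.release()  // assumption: one drawable is returned per refresh
        for _ in 0..<rendersPerRefresh {
            if !pool.tryAcquire() { stalls += 1 }
        }
    }
    return stalls
}
```

In this model a renderer that matches the refresh rate never stalls, while one that renders twice per refresh fills the pool within a few refreshes and then stalls once on every refresh after that.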
The current implementation avoids stalling OBS Studio's video render framerate, at the cost of possible framerate issues with the preview itself. OBS will always render a preview at its own framerate (which can be higher but also lower than the operating system's refresh rate), and a callback provided to macOS is used instead to copy (or "blit") this preview texture into a drawable that is kept around only as long as necessary to finish this copy operation.
This decouples the update of previews from the rendering of their contents, but obviously makes this blit operation dependent on a projector having finished rendering, as otherwise the callback might blit an incomplete preview or multi-view. It is this synchronization that can lead to slow and "choppy" frame rates if the refresh interval of the operating system and the interval at which OBS can finish rendering a preview are too misaligned.
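To make the effect of misaligned intervals concrete, here is a small timing model. It is a sketch under simplifying assumptions: all names are hypothetical, times are whole milliseconds, frame k is assumed to finish rendering at exactly (k + 1) render intervals, and the callback only ever blits the most recently completed frame.

```swift
/// Result of the toy simulation: how many display callbacks had a new
/// completed frame to blit, and how many had to show stale content.
struct ExampleBlitStats {
    var newFrames = 0
    var repeatedFrames = 0
}

func simulateBlits(renderIntervalMs: Int,
                   refreshIntervalMs: Int,
                   durationMs: Int) -> ExampleBlitStats {
    var stats = ExampleBlitStats()
    var lastBlitted = -1          // index of the last frame copied to a drawable
    var time = refreshIntervalMs  // the first callback fires after one refresh

    while time <= durationMs {
        // Index of the newest frame the renderer has completed by this time
        // (-1 while no frame has been completed yet).
        let latestCompleted = time / renderIntervalMs - 1
        if latestCompleted > lastBlitted {
            stats.newFrames += 1
            lastBlitted = latestCompleted
        } else {
            stats.repeatedFrames += 1  // nothing new to blit in this callback
        }
        time += refreshIntervalMs
    }
    return stats
}
```

When both intervals match, every callback finds a new frame. When they differ, callbacks alternate unevenly between fresh and stale content, which is the "choppy" cadence described above.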
Note: This is a known issue and work on a fix or better implementation of preview rendering is in progress. As the way CAMetalLayer works is the opposite of the way DXGISwapChains work, it requires a lot more resource management and housekeeping in the Metal backend to get right.
Compiled in Release configuration, the Metal renderer already has about the same CPU impact and render times as the OpenGL renderer on an M1 Mac, even though neither the Swift code nor the Metal code has been optimized in any way. The late generation (and switching) of pipeline states and buffers is costly, and the way OBS Studio's renderer operates puts a natural ceiling on the performance improvements the Metal renderer could achieve (as it does lots of small render operations with a lot of context switching between CPU and GPU).
In Debug mode the performance is a bit worse, but that's in part due to Xcode using the debug variant of the Metal framework, which allows inspection and reflection on all Metal types, including live previews of textures, buffers, debugging of shaders, and more.
Usually one would prefer to upload all data in big batches (preferably into a big MTLHeap object) and then pick and choose elements for each render pass to limit the switch between CPU and GPU, but this is not compatible with how OBS Studio's renderer works at this moment.
Note: All these observations are based on OBS Studio's own CPU and render time statistics, which are flawed because the clock speeds of the CPU and GPU are not taken into account.
Load calls because libobs shaders depend on the implicit conversion of a 32-bit float vector to integer values when passed to the texture's load command (read in MSL)
libobs shaders assume BGRX and only provide a float3 value in their pixel shaders. Transpiled Metal shaders instead return a float4 with a 1.0 alpha value
UInt32 values into a float4 in vertex data provided via the [[stage_in]] attribute to benefit from vertex fetch (where the pipeline itself is made aware of the buffer layout via a vertex descriptor and thus fetches the data from the buffer as needed) vs. the classic "vertex push" method
libobs to provide color buffer data - to fix this, the values are unpacked and converted into a float4 when the GPU buffers are created for a vertex buffer
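The unpacking step mentioned in the last item might look roughly like this on the CPU side. This is a sketch, not the backend's code: the function names are hypothetical, and the byte order (red in the lowest byte, alpha in the highest) as well as the normalization to the 0.0 to 1.0 range are assumptions made for the example.

```swift
/// Unpacks one packed 32-bit color into four normalized float components.
/// Assumption: red is the lowest byte, alpha the highest.
func unpackColor(_ packed: UInt32) -> SIMD4<Float> {
    let red = Float(packed & 0xFF)
    let green = Float((packed >> 8) & 0xFF)
    let blue = Float((packed >> 16) & 0xFF)
    let alpha = Float((packed >> 24) & 0xFF)
    // Scale each 8-bit channel into the 0.0...1.0 range.
    return SIMD4<Float>(red, green, blue, alpha) / 255.0
}

/// Converts a whole color array, as would be done once when the GPU
/// buffer for the vertex data is created.
func unpackColors(_ packed: [UInt32]) -> [SIMD4<Float>] {
    return packed.map(unpackColor)
}
```

Doing the conversion once at buffer creation keeps the vertex data in a layout that a vertex descriptor can describe directly, so the shader receives a ready-made float4.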