RenderingServerDefault::_draw wants to get all of the materials. The threaded
loader is still loading the materials:
1) The main thread locks canvas_singleton->shader.mutex via
RendererRD::Utilities::update_dirty_resources
-> RendererRD::MaterialStorage::_update_queued_materials
-> RendererCanvasRenderRD::CanvasMaterialData::update_parameters()
2) RendererCanvasRenderRD::CanvasMaterialData::update_parameters() wants
to get the current shader via ShaderRD::version_get_shader. This
eventually wants to wait on the load of that shader via
WorkerThreadPool::wait_for_group_task_completion in
ShaderRD::_compile_version_end
At this point the main thread is waiting for the threaded load to finish
loading the shader. Meanwhile in a thread not too far away the load is
progressing:
1) The loader is loading the tres in ResourceLoaderText::load which
eventually wants to ShaderMaterial::set_shader
2) We eventually end up in RendererCanvasRenderRD::CanvasShaderData::set_code
3) set_code wants to lock canvas_singleton->shader.mutex to protect
against concurrent compilation
At this point the main thread is waiting on the load to complete, but
holds "canvas_singleton->shader.mutex" via update_parameters(), and the
loader is waiting on "canvas_singleton->shader.mutex" in set_code.
Tragedy ensues etc.
To fix the problem we don't acquire the lock until after we're sure
we're done compiling the shader.
The same pattern exists in scene_shader_forward_clustered.cpp which
was created in the same commit as the code in the canvas singleton.
With this change the deadlock can no longer occur as set_code() can now run
concurrently with RenderingServerDefault::_draw() as the path where the
main thread waits on the workerthreadpool with
canvas_singleton->shader.mutex held can no longer occur.
Data integrity wise: previously you'd think that
canvas_singleton->shader.mutex would have protected against metadata
changes to uniforms, ubo_offsets, and texture_uniforms but set_code() used
to write to some of these (uniforms, ubo_size) without taking the lock with
no ill effects because the material isn't updated until the shader has loaded.
A similar pattern was found in scene_shader_forward_mobile.cpp and was
fixed in the same way.
On my machine, this reduces incremental compilation time after an edit of
`rendering_server.h` by 1s, and paves the way for more decoupling in
rendering code.
A number of headers in the codebase included `rendering_server.h` just for
some enum definitions. This means that any change to `rendering_server.h` or
one of its dependencies would trigger a massive incremental rebuild.
With this change, we decouple a number of classes from `rendering_server.h`,
greatly speeding up incremental rebuilds for that area.
On my machine, this reduces incremental compilation time after an edit of
`rendering_server.h` by 60s (from 2m57s).
This work is a heavily refactored and rewritten from TheForge's initial
code.
TheForge's original code had too many race conditions and was
fundamentally flawed as it was too easy to incur into those data races
by accident.
However they identified the proper places that needed changes, and the
idea was sound. I used their work as a blueprint to design this work.
This PR implements:
- Introduction of UMA buffers used by a few buffers
(most notably the ones filled by _fill_instance_data).
Ironically this change seems to positively affect PC more than it does
on Mobile.
Updates D3D12 Memory Allocator to get GPU_UPLOAD heap support.
Metal implementation by Stuart Carnie.
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: TheForge team