Report on pipeline and SPIR-V persistent cache implementation #6268
CLV-Iclucia started this conversation in Show and tell
I implemented persistent save/load APIs for Vulkan VkPipelineCache to enable disk serialization and reuse across application runs.
In addition, I designed and integrated a separate on-disk cache for compiled SPIR-V binaries, so that shaders are not recompiled on later runs and pipeline creation becomes much faster.
This feature is still far from complete, and many problems remain to be solved.
Technical Learning
I read the API documentation and guides for VkPipelineCache, and I read this blog post to learn the best practices for storing, loading and validating a pipeline cache. I also studied the NCNN source code to understand the whole process from compiling shaders to building the final compute pipeline, especially the existing cache mechanism that computes a key for each pipeline to avoid repeated creation during a single run.
Changes Introduced
int PipelineCache::load_pipeline_cache(const char* path): instructs the PipelineCache object to load a pipeline cache file from path. Returns 0 if loading succeeds and a nonzero value otherwise. If loading fails, PipelineCache falls back to an empty VkPipelineCache object to create pipelines.
int PipelineCache::save_pipeline_cache(const char* path): instructs the PipelineCache object to save its VkPipelineCache object as a file at path. Returns 0 if the file is saved successfully and a nonzero value otherwise.
void PipelineCache::set_shader_cache_dir(const char* dir): sets the SPIR-V code cache directory used by the PipelineCache object to dir. All SPIR-V code produced during pipeline creation is saved under dir. When compiling shaders, PipelineCache first looks for a cached file in this directory so that it can skip compilation. If no directory is specified, the default is $LOCALAPPDATA/ncnn/shadercache on Windows and $HOME/.ncnn/shadercache on other platforms. Returns a nonzero value on failure.
int PipelineCache::clear_shader_cache() const: clears the current SPIR-V code cache directory. Returns a nonzero value on failure.
VulkanDevice::create_pipeline: adds an argument of type VkPipelineCache* so that a VkPipeline can be created using a VkPipelineCache.
int VulkanDevice::create_empty_pipeline_cache(VkPipelineCache* vk_pipeline_cache): creates a VkPipelineCache object with empty data. Returns a nonzero value on failure.
int VulkanDevice::create_pipeline_cache_with_data(const void* initial_data, size_t data_size, VkPipelineCache* vk_pipeline_cache): creates a VkPipelineCache object from data_size bytes of initial data starting at initial_data. Returns a nonzero value on failure.
test_pipeline_cache: a simple test of the pipeline cache functionality.
Implementation details:
I use vkGetPipelineCacheData to get the pipeline cache data binary and combine it with a file header for validation. The header format is:
This design basically follows the practice in this blog post but adds version and reserved fields for possible future compatibility.
The SPIR-V cache file follows a similar design:
The header for this is:
The design for this will be explained later.
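Separately from the custom file header discussed above, the blob returned by vkGetPipelineCacheData itself starts with a header defined by the Vulkan specification (header length, header version, vendor ID, device ID, pipeline cache UUID). A loader can compare those fields against the current device before passing the blob to the driver. The following is only a sketch of that check under stated assumptions: `BlobHeaderV1`, `DeviceIdentity` and `blob_matches_device` are hypothetical names, not part of NCNN or of the changes described here, and the code assumes a little-endian host.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mirrors the version-one header that the Vulkan specification places at the
// start of the data returned by vkGetPipelineCacheData (32 bytes in total).
// The struct name is hypothetical; only the field layout comes from the spec.
struct BlobHeaderV1
{
    uint32_t header_size;    // length of this header in bytes
    uint32_t header_version; // 1 == VK_PIPELINE_CACHE_HEADER_VERSION_ONE
    uint32_t vendor_id;      // matches VkPhysicalDeviceProperties::vendorID
    uint32_t device_id;      // matches VkPhysicalDeviceProperties::deviceID
    uint8_t uuid[16];        // matches VkPhysicalDeviceProperties::pipelineCacheUUID
};
static_assert(sizeof(BlobHeaderV1) == 32, "unexpected padding in BlobHeaderV1");

// Hypothetical stand-in for the three device properties used in the check,
// so that the sketch compiles without the Vulkan headers.
struct DeviceIdentity
{
    uint32_t vendor_id;
    uint32_t device_id;
    uint8_t uuid[16];
};

// Returns true only if the blob is large enough and was produced for `dev`.
// Assumes a little-endian host, since the header is stored little-endian.
bool blob_matches_device(const void* data, size_t size, const DeviceIdentity& dev)
{
    if (data == nullptr || size < sizeof(BlobHeaderV1))
        return false;

    BlobHeaderV1 h;
    std::memcpy(&h, data, sizeof(h)); // avoids unaligned access

    if (h.header_size < sizeof(BlobHeaderV1) || h.header_size > size)
        return false;
    if (h.header_version != 1u)
        return false;
    if (h.vendor_id != dev.vendor_id || h.device_id != dev.device_id)
        return false;
    return std::memcmp(h.uuid, dev.uuid, sizeof(h.uuid)) == 0;
}
```

A mismatch here is not an error condition in itself; the natural fallback is the one already described for load_pipeline_cache, namely creating an empty VkPipelineCache.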
When compiling shader code, PipelineCache first computes a key from multiple options. It uses the key to search the internal in-memory cache (a std::map). If that lookup fails, it uses the decimal string of the key as the file name and searches for a cache file in the cache directory. If the file is found, it loads the code and stores it in the internal cache.
Usage examples
Problems and solutions
1. Bottleneck of pipeline creation
I found that simply using VkPipelineCache greatly accelerates pipeline creation (the vkCreateComputePipelines call), but the cost of this step seems insignificant compared with shader compilation. So I had to implement a SPIR-V shader cache to accelerate compilation as well.
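The SPIR-V lookup order described under Implementation details (in-memory map first, then a file named after the decimal key) can be sketched roughly as follows. This is an illustration, not the code from the changes: `SpirvLookup`, its methods, and the 64-bit key type are assumptions, and the sketch omits the file header and validation that the real cache files carry.

```cpp
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the two-level SPIR-V lookup; not the NCNN class.
class SpirvLookup
{
public:
    explicit SpirvLookup(const std::string& dir) : dir_(dir) {}

    // The decimal string of the key is used as the file name.
    std::string path_for(uint64_t key) const
    {
        return dir_ + "/" + std::to_string(static_cast<unsigned long long>(key));
    }

    // Writes the words to disk and remembers them in memory.
    bool store(uint64_t key, const std::vector<uint32_t>& words)
    {
        std::ofstream f(path_for(key).c_str(), std::ios::binary | std::ios::trunc);
        if (!f)
            return false;
        f.write(reinterpret_cast<const char*>(words.data()),
                static_cast<std::streamsize>(words.size() * sizeof(uint32_t)));
        f.close();
        if (!f)
            return false;
        memory_[key] = words;
        return true;
    }

    // Level 1: in-memory map. Level 2: file in the cache directory.
    // On a disk hit the code is promoted into the in-memory map.
    bool find(uint64_t key, std::vector<uint32_t>& out)
    {
        std::map<uint64_t, std::vector<uint32_t> >::const_iterator it = memory_.find(key);
        if (it != memory_.end())
        {
            out = it->second;
            return true;
        }

        std::ifstream f(path_for(key).c_str(), std::ios::binary);
        if (!f)
            return false; // miss: the caller compiles the shader

        std::vector<char> bytes((std::istreambuf_iterator<char>(f)),
                                std::istreambuf_iterator<char>());
        if (bytes.empty() || bytes.size() % sizeof(uint32_t) != 0)
            return false; // SPIR-V is a stream of 32-bit words

        std::vector<uint32_t> words(bytes.size() / sizeof(uint32_t));
        std::memcpy(words.data(), bytes.data(), bytes.size());
        memory_[key] = words;
        out = words;
        return true;
    }

private:
    std::string dir_;
    std::map<uint64_t, std::vector<uint32_t> > memory_;
};
```

With this shape, a second process pointed at the same directory gets a disk hit on its first lookup and in-memory hits afterwards, which is the cross-run behaviour the SPIR-V cache is meant to provide.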
2. Cross-platform file operations
The project currently uses C++11, which does not provide a unified API for filesystem operations such as renaming or removing files. As a result, platform-specific implementations were required inside pipelinecache.cpp.
For now, I implemented the platform-specific handling directly in pipelinecache.cpp to keep the changes minimal. A possible future improvement would be to abstract these operations into a dedicated cross-platform file utility module (similar to how some projects adopt a filesystem.h wrapper).
3. SPIR-V cache invalidation strategy
The content of compiled SPIR-V binaries can change due to multiple factors:
If these are not accounted for, stale SPIR-V caches may cause incorrect or incompatible pipelines.
I introduced an ncnn_version field in the SPIR-V cache header. This field should be updated whenever relevant changes are introduced (e.g., new GLSLang versions or internal NCNN changes affecting shader compilation). On load, the version is validated, and outdated caches are discarded. However, I believe that updating this field by hand is not the best practice. Perhaps having the build system update it automatically would be better.
4. Testing and API exposure
There is a tradeoff between providing flexible testing APIs for SPIR-V cache and keeping the public API surface minimal. Exposing too many low-level cache file operations complicates the API, while hiding them makes unit testing difficult.
The only thing I can do is use two hashes, but that is far from enough.
5. Security of SPIR-V cache files
If a SPIR-V cache file is maliciously altered so that both the content and the stored hash are changed consistently, the cache may load compromised shaders and use them to create incorrect pipelines.
The only mitigation I have is to use multiple hash codes, but that is far from enough.
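The version check from problem 3 and the hash check from this problem can be combined in one load-time validation step. The sketch below is illustrative only: the header layout, the magic value, the `pack`/`unpack` helpers and the choice of 32-bit FNV-1a are all assumptions, not the format used in the changes. FNV-1a is not a cryptographic hash, so this catches stale or accidentally corrupted files, not the deliberate tampering described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical header layout; the real field set may differ.
struct SpirvCacheHeader
{
    uint32_t magic;        // file-type tag
    uint32_t ncnn_version; // bumped when compiled output may change
    uint32_t payload_size; // SPIR-V size in bytes
    uint32_t payload_hash; // FNV-1a over the payload
};

const uint32_t kMagic = 0x53504356u; // arbitrary value chosen for this sketch

// 32-bit FNV-1a: detects accidental corruption, not malicious edits.
uint32_t fnv1a32(const unsigned char* p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++)
    {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

// Builds header + payload as one in-memory file image.
std::vector<unsigned char> pack(uint32_t version, const std::vector<unsigned char>& payload)
{
    SpirvCacheHeader h;
    h.magic = kMagic;
    h.ncnn_version = version;
    h.payload_size = static_cast<uint32_t>(payload.size());
    h.payload_hash = fnv1a32(payload.data(), payload.size());

    std::vector<unsigned char> file(sizeof(h) + payload.size());
    std::memcpy(file.data(), &h, sizeof(h));
    if (!payload.empty())
        std::memcpy(file.data() + sizeof(h), payload.data(), payload.size());
    return file;
}

// Returns false for truncated, foreign, stale, or corrupted files.
bool unpack(const std::vector<unsigned char>& file, uint32_t expected_version,
            std::vector<unsigned char>& payload)
{
    SpirvCacheHeader h;
    if (file.size() < sizeof(h))
        return false;
    std::memcpy(&h, file.data(), sizeof(h));

    if (h.magic != kMagic)
        return false;
    if (h.ncnn_version != expected_version)
        return false; // stale cache: discard and recompile
    if (h.payload_size != file.size() - sizeof(h))
        return false;

    const unsigned char* body = file.data() + sizeof(h);
    if (fnv1a32(body, h.payload_size) != h.payload_hash)
        return false;

    payload.assign(body, body + h.payload_size);
    return true;
}
```

Resisting an attacker who can rewrite both content and hash would need something outside the file itself, such as a keyed MAC or restrictive permissions on the cache directory, which is beyond this sketch.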
Performance
In a single-pipeline creation test on my PC (AMD Ryzen 7 5800H CPU, Nvidia RTX 3060 GPU), the time taken to create a pipeline drops from 90 ms to 0.4 ms when both caches are reused across runs (simulated by repeatedly creating and destroying the GPU).
Most of the improvement comes from the SPIR-V cache.