As a quick workaround, let's add a configuration option (in config.toml) to forcefully enable Direct mode for the cache. Then I'll work on improving the memory consumption of the cache (maybe we need something like a size-aware `sync.Pool` implementation).
FYI: specifying `metadata_store = "db"` in config.toml enables storing filesystem metadata on disk (since v0.10.0). This can reduce the memory consumption of containerd-stargz-grpc (#415 (comment)).
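For reference, that setting is a single top-level key in `/etc/containerd-stargz-grpc/config.toml` (only `metadata_store` is confirmed by this thread; the surrounding file contents will vary per deployment):

```toml
# Store filesystem metadata on disk instead of in memory (v0.10.0+).
metadata_store = "db"
```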
Hello,
I've recently run into an issue where the snapshotter runs out of memory when resolving a large layer. The image in question is the pgbench 14 image. Unfortunately, I'm unable to provide the exact image as it is in a private repository.

Config: `/etc/containerd-stargz-grpc/config.toml`
Available System Memory: ~16GB

The content of the stargz image is:
The issue only arises when prefetch is enabled (`noprefetch = false`).

The culprit seems to be the 833M layer (digest: `sha256:09c35e8efcc4ec49fad18bf1cdf237fad3a683de32e612094b4d896ea3bf98b1`).

The working theory is that this is caused by the buffers backing the memory-cached chunks being put into a buffer pool after the cache is written to the filesystem.
Ref: `stargz-snapshotter/fs/layer/layer.go`, line 191 (commit 481bc84)
We found that a large number of 4 MiB chunks (~2,900) were being created and then placed inside the buffer pool after being written out to the filesystem.
Our theory seems to be validated when we pass `cache.Direct()` to the `Cache` function in `stargz-snapshotter/fs/layer/layer.go`, line 437 (commit a6e2491).
Using `cache.Direct()`, memory usage doesn't spike (it remains consistent) and cumulative memory usage is reduced by half.

[Graph showing memory usage with and without memcaching (note: w/ memcaching is only shown until the snapshotter crashes)]
I would like to ask the containerd/snapshotter community what the ideal way to fix this would be. Thanks.