[Discussion] Linux perf tool profiling with stargz-snapshotter #349

Open
sumeetb25 opened this issue Jun 30, 2021 · 10 comments

@sumeetb25

sumeetb25 commented Jun 30, 2021

Hi All! I've been doing benchmarks to compare overall container execution times w/ and w/o estargz images. The benchmarking strategy is similar to https://github.com/Tintri/hello-bench/blob/master/hello.py. For the https://hub.docker.com/_/rabbitmq image the overall execution time is 9.921 seconds w/o estargz and 12.7 seconds w/ estargz. (The perf numbers w/ estargz look similar for prefetch enabled + background_fetch enabled and for prefetch enabled + background_fetch disabled.)

To dig further into this regression I was hoping to use the Linux perf tool. perf requires programs to be built with frame pointers enabled, and reading golang/go#15840 it seems like Go enables frame pointers by default.

I tried to record a perf trace by executing perf record -ag -F 99 -p $(pidof containerd-stargz-grpc). Unfortunately, perf report spits out hex addresses rather than a human-readable stack trace.

63.54%     0.00%  containerd-star  containerd-stargz-grpc  [.] 0x00000000002eb338
            |
            ---0x6eb338
               |          
               |--47.49%--0x6eb616
               |          |          
               |          |--27.68%--0x6df2ff
               |          |          0x703585
               |          |          |          
               |          |          |--11.31%--0x655c49
               |          |          |          0x654008
               |          |          |          |          
               |          |          |          |--7.73%--0xe6c94
               |          |          |          |          0xddddf
               |          |          |          |          0x87f4a
               |          |          |          |          entry_SYSCALL_64
               |          |          |          |          do_syscall_64
               |          |          |          |          ksys_pread64
               |          |          |          |          vfs_read
               |          |          |          |          new_sync_read
               |          |          |          |          xfs_file_read_iter
               |          |          |          |          xfs_file_buffered_aio_read
               |          |          |          |          generic_file_read_iter

This problem only seems to exist when profiling stargz-snapshotter. I tried profiling containerd and it prints out a human-readable stack trace. I'm reaching out for help on how I can go about getting human-readable stack traces for the stargz snapshotter.

(I'm new to Go, so my apologies for such a naive question.)

@ktock
Member

ktock commented Jun 30, 2021

We strip debug information from the binary by default. Does removing -s and -w from the following line in the Makefile solve it?

GO_LD_FLAGS=-ldflags '-s -w -X $(PKG)/version.Version=$(VERSION) -X $(PKG)/version.Revision=$(REVISION) $(GO_EXTRA_LDFLAGS)'
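
For reference, the same line with only the stripping flags removed would look something like this (everything else stays as-is):

GO_LD_FLAGS=-ldflags '-X $(PKG)/version.Version=$(VERSION) -X $(PKG)/version.Revision=$(REVISION) $(GO_EXTRA_LDFLAGS)'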

@sumeetb25
Author

We strip debug information from the binary by default. Does removing -s and -w from the following line in the Makefile solve it?

GO_LD_FLAGS=-ldflags '-s -w -X $(PKG)/version.Version=$(VERSION) -X $(PKG)/version.Revision=$(REVISION) $(GO_EXTRA_LDFLAGS)'

Removing -s and -w did the trick. Thank you!

@sumeetb25 sumeetb25 reopened this Jun 30, 2021
@sumeetb25
Author

I'll keep the issue open while I investigate the regression for the rabbitmq image.

@sumeetb25
Author

The overall execution time for the OCI rabbitmq image (pulled from Docker Hub) is 9.921 seconds, and it jumps to 12.7 seconds with an optimized estargz image. While investigating this regression I observed that launches with estargz images consistently show a ~2.5-second stall between two log events.

+ echo '  Starting broker...2021-07-01 22:45:58.991 [info] <0.272.0> '
+ echo '2021-07-01 22:45:59.265 [debug] <0.283.0> Lager installed handler lager_backend_throttle into lager_event^M'
+ echo '2021-07-01 22:45:59.265 [info] <0.272.0> Feature flags: list of feature flags found:^M'
+ echo '2021-07-01 22:45:59.265 [info] <0.272.0> Feature flags:   [ ] drop_unroutable_metric^M'
+ echo '2021-07-01 22:45:59.265 [info] <0.272.0> Feature flags:   [ ] empty_basic_get_metric^M'
+ echo '2021-07-01 22:45:59.265 [info] <0.272.0> Feature flags:   [ ] implicit_default_bindings^M'
+ echo '2021-07-01 22:45:59.265 [info] <0.272.0> Feature flags:   [ ] maintenance_mode_status^M'
+ echo '2021-07-01 22:45:59.265 [info] <0.272.0> Feature flags:   [ ] quorum_queue^M'
+ echo '2021-07-01 22:45:59.265 [info] <0.272.0> Feature flags:   [ ] user_limits^M'
+ echo '2021-07-01 22:45:59.265 [info] <0.272.0> Feature flags:   [ ] virtual_host_metadata^M'
+ echo '2021-07-01 22:45:59.265 [info] <0.272.0> Feature flags: feature flag states written to disk: yes^M'
+ echo '2021-07-01 22:46:01.857 [info] <0.272.0> Running boot step pre_boot defined by app rabbit^M'
+ echo '2021-07-01 22:46:01.857 [info] <0.272.0> Running boot step rabbit_core_metrics defined by app rabbit^M'

When launching containers with the vanilla OCI image, this stall is consistently around 1.1 seconds.

+ echo '  Starting broker...2021-07-01 13:54:32.617 [info] <0.272.0> '
+ echo '2021-07-01 13:54:32.805 [info] <0.272.0> Feature flags: list of feature flags found:^M'
+ echo '2021-07-01 13:54:32.805 [info] <0.272.0> Feature flags:   [ ] drop_unroutable_metric^M'
+ echo '2021-07-01 13:54:32.805 [info] <0.272.0> Feature flags:   [ ] empty_basic_get_metric^M'
+ echo '2021-07-01 13:54:32.805 [info] <0.272.0> Feature flags:   [ ] implicit_default_bindings^M'
+ echo '2021-07-01 13:54:32.805 [info] <0.272.0> Feature flags:   [ ] maintenance_mode_status^M'
+ echo '2021-07-01 13:54:32.805 [info] <0.272.0> Feature flags:   [ ] quorum_queue^M'
+ echo '2021-07-01 13:54:32.805 [info] <0.272.0> Feature flags:   [ ] user_limits^M'
+ echo '2021-07-01 13:54:32.805 [info] <0.272.0> Feature flags:   [ ] virtual_host_metadata^M'
+ echo '2021-07-01 13:54:32.805 [info] <0.272.0> Feature flags: feature flag states written to disk: yes^M'
+ echo '2021-07-01 13:54:32.974 [debug] <0.283.0> Lager installed handler lager_backend_throttle into lager_event^M'
+ echo '2021-07-01 13:54:33.992 [info] <0.272.0> Running boot step pre_boot defined by app rabbit^M'

Following are the investigations I've done so far:

  1. I verified no disk IO was taking longer than 5 ms by running fileslower.py while the container launch was in progress.
  2. Added the following instrumentation to measure on-demand file access latency:
func (f *file) Read(ctx context.Context, dest []byte, off int64) (fuse.ReadResult, syscall.Errno) {
        // Time how long the on-demand read takes.
        start := time.Now()

        n, err := f.ra.ReadAt(dest, off)
        if err != nil && err != io.EOF {
                f.n.s.report(fmt.Errorf("failed to read node: %v", err))
                return nil, syscall.EIO
        }

        duration := time.Since(start)

        // Log the per-read latency in nanoseconds along with the file name and sizes.
        log.G(ctx).Infof("read_file=%v read_file_len=%v read_file_data_served=%v read_file_latency=%v",
                f.n.name, len(dest), n, duration.Nanoseconds())

        return fuse.ReadResultData(dest[:n]), 0
}

Logs indicated no on-demand file access had a latency higher than 2 ms.

  3. I launched the snapshotter with GC tracing enabled and validated that GC was not triggered when we observed the stall in the container launch. Command to enable GC tracing: sudo GODEBUG=gctrace=1 ./stargz-snapshotter/out/containerd-stargz-grpc --log-level=debug >& logs &.

Following are the relevant sections of the snapshotter log:

time="2021-07-01T22:46:42.014738448Z" level=info msg="read_file=libc-2.31.so read_file_len=45056 read_file_data_served=45056 read_file_latency=79682 "
gc 28 @60.980s 0%: 0.16+3.5+0.027 ms clock, 15+0.17/53/20+2.6 ms cpu, 253->253->144 MB, 262 MB goal, 96 P
time="2021-07-01T22:46:42.541053558Z" level=debug msg="estargz open_file=etc/rabbitmq/enabled_plugins"

....

time="2021-07-01T22:48:42.277128643Z" level=debug msg="ReadAt file=&{name:usr/lib/x86_64-linux-gnu/ld-2.31.so digest:sha256:96493303ba8ba364a8da6b77fbb9f04d0f170cbecbc6bbacca616161bd0f0008 ra:0xc0006ad4d0 r:0xc000e2a000 cache:0xc0007421e0 gr:0xc0002d0150}\n"
gc 29 time="2021-07-01T22:48:42.277718539Z" level=info msg="read_file=ld-2.31.so read_file_len=102400 read_file_data_served=102400 read_file_latency=605507 "
 @181.237s 0%: 0.14+3.9+0.079 ms clock, 13+0.29/67/34+7.6 ms cpu, 183->184->118 MB, 288 MB goal, 96 P
time="2021-07-01T22:48:42.277968206Z" level=debug msg="estargz open_file=etc/ld.so.cache"
  4. Nothing unusual in the containerd logs.
  5. Ran the perf tool and nothing abnormal stands out. (If it helps, I'm happy to share the flame graph.)
  6. Disabled background fetch to ensure there is no contention between background fetch tasks and the container launch. The results with and without background fetch enabled are similar.
  7. Added instrumentation to measure the total time required to complete prefetch. Logs indicated it takes ~1.3 seconds to complete prefetch (a minimal sketch of this kind of timing wrapper is shown below).
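
Roughly, the measurement in (7) boils down to a start/stop timer around the prefetch call. A minimal, self-contained sketch of that pattern; prefetchLayer here is a hypothetical stand-in for the snapshotter's actual prefetch routine:

package main

import (
        "log"
        "time"
)

// prefetchLayer is a placeholder for the snapshotter's real prefetch work.
func prefetchLayer() error {
        time.Sleep(100 * time.Millisecond)
        return nil
}

func main() {
        // Record how long the prefetch takes end to end.
        start := time.Now()
        if err := prefetchLayer(); err != nil {
                log.Printf("prefetch failed: %v", err)
        }
        log.Printf("prefetch completed in %v", time.Since(start))
}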

It'd be super helpful if someone could give some ideas about possible next steps for investigating this regression.

@ktock
Member

ktock commented Jul 2, 2021

@sumeetb25 Could you describe the steps you used to create the optimized eStargz image?
Did you enable the optimization feature provided by ctr-remote? https://github.com/containerd/stargz-snapshotter/blob/v0.6.4/docs/ctr-remote.md

@sumeetb25
Author

Yeah, I had enabled the optimization feature. Following is the command I used to create the optimized eStargz image:

time sudo ./stargz-snapshotter/out/ctr-remote images optimize --oci $IMAGE $ESTARGZ_IMAGE

$IMAGE referred to the image pulled from Docker Hub by executing docker pull rabbitmq and then exporting it into containerd.

@sumeetb25
Author

@ktock Please let me know if you have any pointers on where I can start digging next.

@ktock
Member

ktock commented Jul 8, 2021

@sumeetb25 Some pointers you can look deeper into:

  • Which phase (pull, create, run) shows worse performance for estargz?
  • If pull takes long, how long does it take to complete Mount for all layers? And what is the bottleneck of each one?
  • If create takes long, how long does prefetch take? create waits for completion of prefetch, so it can take a long time if the image contains lots of files marked as prefetch targets.
  • If run takes long, how much worse is the performance of the stargz snapshotter filesystem compared to overlayfs? (In the benchmarking script, containerd uses the overlayfs snapshotter for non-estargz images.)

@sumeetb25
Author

@ktock Following are the image pull and run times for the rabbitmq image.

Pull time for OCI image = Time required to execute ctr i pull --user "$CR_USER:$CR_PASS" $IMAGE
Pull time for estargz optimized image = Time required to execute ./stargz-snapshotter/out/ctr-remote images rpull --user "$CR_USER:$CR_PASS" $IMAGE

Run time for OCI image = Time required to execute ctr run $IMAGE container-id and READY_LINE appearing on the console
Run time for estargz optimized image = Time required to execute ./stargz-snapshotter/out/ctr-remote run --snapshotter=stargz $IMAGE container-id and READY_LINE appearing on the console

READY_LINE="Server startup complete"

Image               Pull time   Run time
OCI                 3.9 sec     6.5 sec
estargz optimized   1.2 sec     11.5 sec
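
(As a side note, the "run time until READY_LINE" measurement can be approximated with a small wrapper like the one below. This is a hypothetical sketch, not the exact harness used; it reads the container's output from stdin, e.g. piped from the ctr-remote run command above.)

package main

import (
        "bufio"
        "log"
        "os"
        "strings"
        "time"
)

func main() {
        // Scan container output line by line and report the elapsed time
        // once READY_LINE shows up.
        const readyLine = "Server startup complete"
        start := time.Now()
        sc := bufio.NewScanner(os.Stdin)
        for sc.Scan() {
                if strings.Contains(sc.Text(), readyLine) {
                        log.Printf("READY_LINE seen after %v", time.Since(start))
                        return
                }
        }
        log.Printf("READY_LINE not seen (scanner error: %v)", sc.Err())
}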

One layer in the estargz optimized image took ~455 ms to complete the Mount operation; the other layers took negligible time to complete the Mount operation. I assume this is all as expected.

The conundrum is why the run time with the estargz optimized image is almost twice that of the OCI image.

Next I added some instrumentation to containerd, and here are my observations:

Task              Time taken (OCI image)   Time taken (estargz optimized image)   Code link
Create container  438 ms                   1.5 sec                                https://github.com/containerd/containerd/blob/main/cmd/ctr/commands/run/run.go#L161-L167
Create task       81 ms                    80 ms                                  https://github.com/containerd/containerd/blob/main/cmd/ctr/commands/run/run.go#L185
Start task        13 ms                    17.9 ms                                https://github.com/containerd/containerd/blob/main/cmd/ctr/commands/run/run.go#L215

Instrumentation in the stargz snapshotter suggested one layer in the estargz optimized image took ~1.3 seconds to complete the prefetch operation. This seems to explain the high latency for creating the container. Up to this point everything seems to work as expected.

The big difference between the OCI image and the estargz optimized image is that the first log line from the container appears ~6 seconds after the task is started.

+ echo 07/20/2021_00:17:14:019219657 'Creating new task pid=8201'
+ echo 07/20/2021_00:17:14:105041374 'Created new task pid=8201 duration=80506739'
+ echo 07/20/2021_00:17:14:106352186 'Waiting for new task pid=8201'
+ echo 07/20/2021_00:17:14:107899686 'Wait for new task completed pid=8201'
+ echo 07/20/2021_00:17:14:109199518 'Starting new task pid=8201'
+ echo 07/20/2021_00:17:14:123260705 'Started new task pid=8201 duration=17938450'
+ echo 07/20/2021_00:17:14:124609299 'Waiting for status result pid=8201'
+ echo 07/20/2021_00:17:20:900223216 'Configuring logger redirection'

"Configuring logger redirection" is the first log line from the container and as seen from the above trace it appears after ~6 seconds delay (This is roughly the difference between the runtimes of OCI and estargz image).

The same trace for an OCI image looks as follows:

+ echo 07/19/2021_16:06:33:963409654 'Creating new task pid=73886'
+ echo 07/19/2021_16:06:34:056134144 'Created new task pid=73886 duration=94511835'
+ echo 07/19/2021_16:06:34:057402312 'Waiting new task pid=73886'
+ echo 07/19/2021_16:06:34:058880356 'Wait for new task completed pid=73886'
+ echo 07/19/2021_16:06:34:060144344 'Starting new task pid=73886'
+ echo 07/19/2021_16:06:34:068169773 'Started new task pid=73886 duration=12125714'
+ echo 07/19/2021_16:06:34:069452348 'Waiting for status result pid=73886'
+ echo 07/19/2021_16:06:36:537389542 'Configuring logger redirection'

As seen from the trace above, there is hardly any wait time between starting the task and the container's log lines appearing on the console.

I'm exploring running the container under strace to get a better sense of why we see a delay of ~6 seconds with the estargz optimized image.

Please do let me know if you have any recommendations.

@rdpsin
Contributor

rdpsin commented Oct 6, 2021

Instrumentation added to go-fuse suggests the container is doing ~88600 filesystem operations before it starts up. Following is the breakdown by operation type:

<opcode_number> <opcode_name> (number_of_operations)
1 - lookup (4519)
3 - Getattr (2836)
5 - readlink (1589)
14 - open (2738)
15 - read (6518)
17 - statfs (20)
18 - release (2646)
22 - getxattr (729)
23 - listxattr (8)
25 - flush (10)
26 - init (10)
27 - opendir (1688)
29 - releasedir (1688)
34 - access (59772)
36 - interrupt (77)
40 - poll (10)
44 - readdirplus (3742)

Most (about 90%) of these are sequential operations. The latency of these operations adds up to ~6 seconds, which is roughly equal to the time delta between the OCI and estargz optimized images.
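
For context, here is a minimal sketch of the kind of per-opcode counter behind the numbers above. The recordOp hook is hypothetical and would need to be wired into go-fuse's request dispatch path:

package main

import (
        "fmt"
        "sync"
)

// opCounter tallies FUSE requests by opcode.
type opCounter struct {
        mu     sync.Mutex
        counts map[uint32]uint64
}

// recordOp increments the counter for a single FUSE opcode.
func (c *opCounter) recordOp(opcode uint32) {
        c.mu.Lock()
        defer c.mu.Unlock()
        if c.counts == nil {
                c.counts = make(map[uint32]uint64)
        }
        c.counts[opcode]++
}

// dump prints the per-opcode totals in "<opcode> (<count>)" form.
func (c *opCounter) dump() {
        c.mu.Lock()
        defer c.mu.Unlock()
        for op, n := range c.counts {
                fmt.Printf("%d (%d)\n", op, n)
        }
}

func main() {
        var c opCounter
        c.recordOp(34) // e.g. access
        c.recordOp(34)
        c.recordOp(1) // e.g. lookup
        c.dump()
}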
