Replies: 4 comments 7 replies
-
As you say, clearly there is some instability here - the 20% -ve slowdown
for L3 logging is particularly weird. I suspect it has to do with
timekeeping, e.g. NTP making the clock run slower.
I think we'll need to do multiple runs (8?) and take the median. I realise
that's painful, but I can't think of an alternative.
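The multiple-runs idea is cheap to post-process; a minimal sketch of the median step (a hypothetical helper, not the existing perf-report.py code):

```cpp
#include <algorithm>
#include <vector>

// Median of per-run throughput samples (e.g. K ops/sec from each of the
// 8 runs). Takes the vector by value because sorting mutates it.
double median(std::vector<double> runs) {
    std::sort(runs.begin(), runs.end());
    const size_t n = runs.size();
    return n % 2 ? runs[n / 2]
                 : (runs[n / 2 - 1] + runs[n / 2]) / 2.0;
}
```

The median is preferable to the mean here because a single NTP-skewed or thermally-throttled run lands in the tail and is simply ignored.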
…On Thu, 13 Jun 2024 at 02:53, Aditya P. Gurajada ***@***.***> wrote:
Hi, @gregthelaw --
I now have the test.sh and report generation productized under the 2
commits under PR #75.
I ran a perf-test on an AWS bare-metal instance with 72 cores.
Test configuration: 32 clients, 1 million messages/client; server: 1, 2, 4, 8
threads.
Here are the post-processed results.
I was expecting to see increasing degradation due to concurrent threads
logging, but the results don't reflect this behaviour.
Below each chunk of output I have annotated my observations. Comments
welcome:
*Performance comparison for NumClients=32, NumOps=32000000 (32 Million),
NumThreads=1*
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
| Run-Type | Server throughput | Srv:Drop | Client throughput | Cli:Drop | NumOps/thread |
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
| Baseline - No logging | ~295.85 K ops/sec | 0.00 % | ~9.79 K ops/sec | 0.00 % | 32 Million |
| L3-logging (no LOC) | ~355.17 K ops/sec | 20.05 % | ~11.83 K ops/sec | 20.85 % | 32 Million |
| L3-fast logging (no LOC) | ~296.56 K ops/sec | 0.24 % | ~9.73 K ops/sec | -0.58 % | 32 Million |
| L3-fprintf() logging (no LOC) | ~270.67 K ops/sec | -8.51 % | ~8.87 K ops/sec | -9.44 % | 32 Million |
| L3-write() logging (no LOC) | ~188.06 K ops/sec | -36.43 % | ~6.09 K ops/sec | -37.77 % | 32 Million |
| L3-logging default LOC | ~296.63 K ops/sec | 0.26 % | ~9.76 K ops/sec | -0.31 % | 32 Million |
| L3-logging LOC-ELF | ~297.14 K ops/sec | 0.44 % | ~9.77 K ops/sec | -0.19 % | 32 Million |
| spdlog-logging | ~284.75 K ops/sec | -3.75 % | ~9.36 K ops/sec | -4.36 % | 32 Million |
| spdlog-backtrace-logging | ~283.64 K ops/sec | -4.13 % | ~9.31 K ops/sec | -4.90 % | 32 Million |
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
- L3-logging appears 20% faster than baseline: this is totally unexpected
and unheard of. I attribute it to some machine instability [?]
- The remaining %-drop numbers are in line with what we've been seeing
elsewhere.
*Performance comparison for NumClients=32, NumOps=32000000 (32 Million),
NumThreads=2*
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
| Run-Type | Server throughput | Srv:Drop | Client throughput | Cli:Drop | NumOps/thread |
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
| Baseline - No logging | ~457.74 K ops/sec | 0.00 % | ~15.41 K ops/sec | 0.00 % | 16 Million |
| L3-logging (no LOC) | ~442.93 K ops/sec | -3.24 % | ~14.87 K ops/sec | -3.46 % | 16 Million |
| L3-fast logging (no LOC) | ~437.40 K ops/sec | -4.44 % | ~14.68 K ops/sec | -4.75 % | 16 Million |
| L3-fprintf() logging (no LOC) | ~451.28 K ops/sec | -1.41 % | ~15.31 K ops/sec | -0.62 % | 16 Million |
| L3-write() logging (no LOC) | ~238.83 K ops/sec | -47.82 % | ~7.75 K ops/sec | -49.70 % | 16 Million |
| L3-logging default LOC | ~439.79 K ops/sec | -3.92 % | ~14.76 K ops/sec | -4.20 % | 16 Million |
| L3-logging LOC-ELF | ~439.76 K ops/sec | -3.93 % | ~14.76 K ops/sec | -4.20 % | 16 Million |
| spdlog-logging | ~406.27 K ops/sec | -11.24 % | ~13.63 K ops/sec | -11.56 % | 16 Million |
| spdlog-backtrace-logging | ~457.64 K ops/sec | -0.02 % | ~15.63 K ops/sec | 1.43 % | 16 Million |
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
- For 2 threads, the overall %-drops are seemingly reasonable / as
expected.
- Between 1 and 2 server threads, one would expect to see greater
degradation for fprintf(), but it has gone down.
- write() perf has worsened - I guess this could be due to concurrent
threads logging [?]
*Performance comparison for NumClients=32, NumOps=32000000 (32 Million),
NumThreads=4*
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
| Run-Type | Server throughput | Srv:Drop | Client throughput | Cli:Drop | NumOps/thread |
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
| Baseline - No logging | ~695.46 K ops/sec | 0.00 % | ~24.44 K ops/sec | 0.00 % | 8 Million |
| L3-logging (no LOC) | ~673.46 K ops/sec | -3.16 % | ~23.68 K ops/sec | -3.11 % | 8 Million |
| L3-fast logging (no LOC) | ~658.55 K ops/sec | -5.31 % | ~22.99 K ops/sec | -5.94 % | 8 Million |
| L3-fprintf() logging (no LOC) | ~580.25 K ops/sec | -16.57 % | ~19.95 K ops/sec | -18.35 % | 8 Million |
| L3-write() logging (no LOC) | ~314.35 K ops/sec | -54.80 % | ~10.34 K ops/sec | -57.69 % | 8 Million |
| L3-logging default LOC | ~681.19 K ops/sec | -2.05 % | ~23.89 K ops/sec | -2.24 % | 8 Million |
| L3-logging LOC-ELF | ~664.07 K ops/sec | -4.51 % | ~23.27 K ops/sec | -4.79 % | 8 Million |
| spdlog-logging | ~504.25 K ops/sec | -27.49 % | ~17.06 K ops/sec | -30.18 % | 8 Million |
| spdlog-backtrace-logging | ~626.13 K ops/sec | -9.97 % | ~21.84 K ops/sec | -10.62 % | 8 Million |
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
- With 4 threads (and 8 threads, below), write() and spdlog logging
performance goes down quite a bit. This is consistent with the hypothesis
that concurrent logging degrades these schemes even more.
- But fprintf() degradation is fluctuating -- this may be due to
file-system / glibc caching?
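The fprintf()-vs-write() contrast above is consistent with buffering: glibc batches fprintf() output in user space, so most calls are just a memcpy, whereas write() pays a syscall per message and concurrent writers also serialize inside the kernel on the shared file offset. A minimal sketch of the two paths (file names and helper names are illustrative, not the actual harness):

```cpp
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Buffered path: glibc accumulates output in user space; the kernel only
// sees occasional large writes when the stdio buffer flushes.
int log_fprintf(const char *path, int n) {
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    int written = 0;
    for (int i = 0; i < n; i++)
        if (fprintf(f, "msg %d\n", i) > 0) written++;
    fclose(f);
    return written;
}

// Unbuffered path: one syscall per message; this is the cost that grows
// with thread count in the L3-write() rows above.
int log_write(const char *path, int n) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    int written = 0;
    char buf[32];
    for (int i = 0; i < n; i++) {
        int len = snprintf(buf, sizeof(buf), "msg %d\n", i);
        if (write(fd, buf, len) == len) written++;
    }
    close(fd);
    return written;
}
```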
*Performance comparison for NumClients=32, NumOps=32000000 (32 Million),
NumThreads=8*
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
| Run-Type | Server throughput | Srv:Drop | Client throughput | Cli:Drop | NumOps/thread |
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
| Baseline - No logging | ~744.79 K ops/sec | 0.00 % | ~26.40 K ops/sec | 0.00 % | 4 Million |
| L3-logging (no LOC) | ~734.90 K ops/sec | -1.33 % | ~26.08 K ops/sec | -1.22 % | 4 Million |
| L3-fast logging (no LOC) | ~719.97 K ops/sec | -3.33 % | ~25.50 K ops/sec | -3.42 % | 4 Million |
| L3-fprintf() logging (no LOC) | ~681.84 K ops/sec | -8.45 % | ~23.86 K ops/sec | -9.61 % | 4 Million |
| L3-write() logging (no LOC) | ~293.85 K ops/sec | -60.55 % | ~9.63 K ops/sec | -63.51 % | 4 Million |
| L3-logging default LOC | ~720.68 K ops/sec | -3.24 % | ~25.44 K ops/sec | -3.66 % | 4 Million |
| L3-logging LOC-ELF | ~726.58 K ops/sec | -2.44 % | ~25.68 K ops/sec | -2.75 % | 4 Million |
| spdlog-logging | ~581.29 K ops/sec | -21.95 % | ~19.93 K ops/sec | -24.50 % | 4 Million |
| spdlog-backtrace-logging | ~758.00 K ops/sec | 1.77 % | ~27.20 K ops/sec | 3.01 % | 4 Million |
+-------------------------------+-------------------+----------+-------------------+----------+---------------+
*Summary Conclusions*:
- Overall, the other logging schemes show greater degradation with
increased concurrency.
- We are not seeing the ~1% performance degradation with L3-logging;
it's always 3+%. Maybe a measurement anomaly [?]
- spdlog-backtrace logging is competitive with L3 and L3-fast logging
in terms of degradation, but you cannot unpack the log-info after the run.
(Backtrace only dumps while the server is running.)
- L3-fast logging still has some non-trivial degradation. You need to
investigate and address open issue #76.
-
All very odd, isn't it? The numthreads=1 and numthreads=4 results are reasonable. numthreads=2 is disappointing because, as you say, spdlog appears to be competitive. But at numthreads=8 we once again see spdlog and L3 both have negative slowdown - i.e. a speedup compared to baseline. This implies that we are still not able to measure accurately. Did you say the machine you're using has 72 physical cores? That is presumably a NUMA architecture, and weird shit can indeed happen there. Can you try with a smaller machine, specifically one that is not NUMA? And/or we may need to experiment with pinning threads to CPU cores.
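For the pinning experiment, a minimal Linux-specific sketch (assumes glibc; `pin_to_cpu` is a hypothetical helper, not part of test.sh):

```cpp
#include <sched.h>

// Pin the calling thread to a single CPU so the scheduler cannot migrate
// it across NUMA nodes mid-run. Linux-specific. Returns 0 on success.
int pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // pid 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Each server thread would call this with a distinct core number at startup; alternatively the whole process can be pinned externally (e.g. via taskset) without code changes.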
-
I experimented with my laptop. I carefully shut down all other programs, plugged in, and deliberately throttled the CPUs to try to avoid weird thermal/throttling effects. There is some bizarre effect with spdlog and two threads:
everything else looks vaguely sane. There are a few -ve slowdowns, but all under 1%, so I think we conclude "no measurable difference if the change against baseline is < 1% either way." But spdlog multi-threaded is just weird: 4 and 8 threads still show less slowdown than 1, which makes no sense: there is a mutex, and it will always be uncontended in the single-threaded case.
-
Oh wait, hang on - it looks like maybe you need to configure spdlog differently for multithreaded use. Do we do that?
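For reference, spdlog picks thread-safety at logger-creation time: factory functions ending in `_mt` create mutex-protected sinks, while the `_st` variants skip locking and are only safe from a single thread. A minimal sketch (logger and file names are illustrative, not necessarily how the harness is configured):

```cpp
#include "spdlog/spdlog.h"
#include "spdlog/sinks/basic_file_sink.h"

int main() {
    // "_mt": the sink takes a mutex on every log call - required if the
    // benchmark's server threads share one logger object.
    auto mt = spdlog::basic_logger_mt("server_mt", "server.log");
    mt->info("safe from any thread");

    // "_st": no locking at all - faster, but sharing this logger across
    // threads is a data race, not just a correctness-of-output issue.
    auto st = spdlog::basic_logger_st("server_st", "server_st.log");
    st->info("single-threaded use only");
}
```

If the multi-threaded runs were accidentally using an `_st` logger, both the timing numbers and the log file contents would be unreliable, which could explain the nonsensical 2-thread results.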
-
Hi, @gregthelaw --
Updated 15.Jun.2024:
I now have a new commit under PR #85 that can be used to run multiple iterations of the client-server performance tests.
I had to massage perf-report.py to find the median value of the metrics and generate the comparison report. Here are the reports:
Perf run parameters: 32 clients, 1 million msgs/client, 5 iterations for each server-thread configuration.
Observations: with increased concurrency, degradation for write() logging and spdlog logging does go up.
Performance comparison for 32 clients, 32000000 (32 Million) msgs, num_threads=1 (using median value of metric across 5 iterations)
Performance comparison for 32 clients, 32000000 (32 Million) msgs, num_threads=2 (using median value of metric across 5 iterations)
Performance comparison for 32 clients, 32000000 (32 Million) msgs, num_threads=4 (using median value of metric across 5 iterations)
Performance comparison for 32 clients, 32000000 (32 Million) msgs, num_threads=8 (using median value of metric across 5 iterations)