Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change CLI to employ multithreading by default #4211

Merged
merged 2 commits into from
Dec 12, 2024

Conversation

daniellerozenblit
Copy link
Contributor

@daniellerozenblit daniellerozenblit commented Dec 10, 2024

Summary

This PR modifies ZSTDCLI_NBTHREADS_DEFAULT so that multithreading is used by default when the library is compiled with multithreading support.

We use a conservative value of max(1, min(4, nbCores/4)) worker threads for this default value.

Testing

I benchmarked on an Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz machine with 80 logical cores. I observed that the default behavior is now similar to the behavior when explicitly doing multi-threaded compression using the -T flag.

~/zstd$ ./zstd -b3 ~/silesia/silesia/*
 3# 12 files         : 211938580 ->  66216521 (x3.201),  358.2 MB/s,  891.2 MB/s
~/zstd$ ./zstd -b3 -T4 ~/silesia/silesia/*
 3# 12 files         : 211938580 ->  66216521 (x3.201),  323.2 MB/s,  885.1 MB/s
~/zstd$ ./zstd -b3 -T2 ~/silesia/silesia/*
 3# 12 files         : 211938580 ->  66216521 (x3.201),  266.1 MB/s,  885.0 MB/s
~/zstd$ ./zstd -b3 -T1 ~/silesia/silesia/*
 3# 12 files         : 211938580 ->  66137723 (x3.205),  189.6 MB/s,  863.7 MB/s

programs/zstdcli.c Outdated Show resolved Hide resolved
programs/zstd.1.md Outdated Show resolved Hide resolved
@Cyan4973
Copy link
Contributor

Cyan4973 commented Dec 10, 2024

~/zstd$ ./zstd -b3 ~/silesia/silesia/*
 3# 12 files         : 211938580 ->  66216521 (x3.201),  358.2 MB/s,  891.2 MB/s
~/zstd$ ./zstd -b3 -T4 ~/silesia/silesia/*
 3# 12 files         : 211938580 ->  66216521 (x3.201),  323.2 MB/s,  885.1 MB/s

The difference between these 2 measurements is > 10%, which is quite substantial if they are expected to be equal.
Is it repeatable, or is it a one-off ?

Also:
I would expect -vv to specify clearly the nb of threads used.

@Cyan4973
Copy link
Contributor

Also as a follow-up,
it would be good to clarify the (potential) impact on the benchmark mode (-b).

@daniellerozenblit
Copy link
Contributor Author

~/zstd$ ./zstd -b3 ~/silesia/silesia/*
 3# 12 files         : 211938580 ->  66216521 (x3.201),  358.2 MB/s,  891.2 MB/s
~/zstd$ ./zstd -b3 -T4 ~/silesia/silesia/*
 3# 12 files         : 211938580 ->  66216521 (x3.201),  323.2 MB/s,  885.1 MB/s

The difference between these 2 measurements is > 10%, which is quite substantial if they are expected to be equal. Is it repeatable, or is it a one-off ?

Also: I would expect -vv to specify clearly the nb of threads used.

I re-did this experiment, compressing silesia with the default number threads (which on my machine evaluates to 4) and then explicitly passing -T4 and got more consistent results.

Overall it seems that there is more noise in the benchmarks with multithreading, but the behavior looks overall as expected.

# -T4
 3# 12 files         : 211938580 ->  66216521 (x3.201),  310.7 MB/s,  861.7 MB/s
 3# 12 files         : 211938580 ->  66216521 (x3.201),  306.1 MB/s,  862.6 MB/s
 3# 12 files         : 211938580 ->  66216521 (x3.201),  318.3 MB/s,  868.3 MB/s

# default
 3# 12 files         : 211938580 ->  66216521 (x3.201),  316.1 MB/s,  871.2 MB/s
 3# 12 files         : 211938580 ->  66216521 (x3.201),  315.3 MB/s,  867.9 MB/s
 3# 12 files         : 211938580 ->  66216521 (x3.201),  307.5 MB/s,  866.8 MB/s

@@ -1311,7 +1311,7 @@ int main(int argCount, const char* argv[])
DISPLAYLEVEL(3, "Note: %d physical core(s) detected \n", nbWorkers);
}
}
DISPLAYLEVEL(2, "Compressing with %u worker threads \n", nbWorkers);
DISPLAYLEVEL(3, "Compressing with %u worker threads \n", nbWorkers);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, level 3 seems better

@daniellerozenblit daniellerozenblit force-pushed the mt-cli branch 2 times, most recently from 611e4f6 to 2a8502e Compare December 11, 2024 22:04
@daniellerozenblit daniellerozenblit marked this pull request as draft December 11, 2024 22:25
… level >= 4, and add lower bound of 1 for the default number of threads
@daniellerozenblit daniellerozenblit marked this pull request as ready for review December 12, 2024 16:40
@daniellerozenblit daniellerozenblit merged commit 17beeb5 into facebook:dev Dec 12, 2024
94 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants