Introduce parallelism for batch conversion: --parallel / -P #628

Merged
yhatt merged 8 commits into main from parallel-batch-conversion
Jan 15, 2025

Conversation

yhatt
Member

@yhatt yhatt commented Jan 14, 2025

This PR introduces the --parallel / -P option, which sets the concurrency (the number of workers) used when converting multiple files in parallel.

# Convert Markdown files in `./marp-slides` in parallel (up to 5 workers, the default)
marp --parallel 5 ./marp-slides

By applying a robust async process to browser operations (asynchronous initialization runs only once, and every subsequent caller waits for that same initialization), parallel processing can be achieved with a simple queue and workers.
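The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `createOnce`, `runQueue`, `getBrowser`, and the fake browser object are hypothetical names invented for the example.

```typescript
// Hypothetical sketch: none of these names are Marp CLI APIs.

/** Wrap an async initializer so it runs at most once; later callers await the same promise. */
function createOnce<T>(init: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined
  return () => {
    if (pending === undefined) pending = init()
    return pending
  }
}

/** Process every item with at most `concurrency` workers pulling from a shared queue. */
async function runQueue<I, O>(
  items: readonly I[],
  concurrency: number,
  task: (item: I) => Promise<O>
): Promise<O[]> {
  const results = new Array<O>(items.length)
  let next = 0

  const worker = async (): Promise<void> => {
    // The index is claimed synchronously, so two workers never take the same item.
    while (next < items.length) {
      const index = next++
      results[index] = await task(items[index])
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, items.length))
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results
}

// Stand-in for an expensive browser launch (assumption: a real launch would go here).
let launches = 0
const getBrowser = createOnce(async () => {
  launches += 1
  return { name: 'fake-browser' }
})

/** Convert three pretend files with two workers that share one initialized browser. */
async function demo(): Promise<string[]> {
  const files = ['a.md', 'b.md', 'c.md']
  return runQueue(files, 2, async (file) => {
    const browser = await getBrowser()
    return `${file} via ${browser.name}`
  })
}
```

Because every worker awaits the same cached promise, the initializer body runs once no matter how many workers start at the same time, and results keep the input order even if tasks finish out of order.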

By default, Marp CLI uses up to 5 workers while converting multiple files in the queue. When converting multiple Markdown files into PDF/PPTX/image(s), this makes conversion drastically faster than before. CLI users can adjust the concurrency with the --parallel or -P option.
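As a rough illustration of how such an option could map to a worker count, here is a small sketch. `resolveParallel` and `DEFAULT_PARALLEL` are hypothetical names, and the handling of a bare flag or invalid numbers is an assumption, not Marp CLI's actual option parsing.

```typescript
// Hypothetical helper; not Marp CLI's actual option parser.
const DEFAULT_PARALLEL = 5 // default worker count stated in this PR

/**
 * Map a parsed --parallel / -P value to a worker count.
 * Assumptions: `false` stands for --no-parallel, `true` or `undefined` means
 * "no explicit number given", and invalid numbers fall back to 1 worker.
 */
function resolveParallel(option: number | boolean | undefined): number {
  if (option === false) return 1 // --no-parallel: convert one file at a time
  if (option === undefined || option === true) return DEFAULT_PARALLEL
  if (!Number.isFinite(option) || option < 1) return 1
  return Math.floor(option)
}
```

With this sketch, an omitted option resolves to 5 workers, `--no-parallel` to 1, and `--parallel=10` to 10.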

Resolves #509.

Performance improvement

The marp-parallel folder contains 30 empty Markdown files, and Marp CLI converts every one of them into PDF.

| CLI command | Parallelism | Total time | Faster | CPU usage |
| --- | --- | --- | --- | --- |
| `time marp --pdf --no-parallel ./marp-parallel` | 1 | 64.71 sec | - | 18% |
| `time marp --pdf --parallel=5 ./marp-parallel` | 5 (default) | 14.50 sec | x4.46 | 67% |
| `time marp --pdf --parallel=10 ./marp-parallel` | 10 | 8.565 sec | x7.56 | 108% |
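The "Faster" column is the ratio of the non-parallel total time to each parallel total time. A short sketch that reproduces it from the measured times (the variable and function names are illustrative only):

```typescript
// Total times are taken from the measurements above; names are illustrative.
const baselineSec = 64.71 // --no-parallel

const runs = [
  { parallelism: 5, totalSec: 14.5 },
  { parallelism: 10, totalSec: 8.565 },
]

/** Speedup factor relative to the non-parallel baseline. */
function speedup(baseline: number, total: number): number {
  return baseline / total
}

const speedupLabels = runs.map(
  (run) => `x${speedup(baselineSec, run.totalSec).toFixed(2)}`
)
// speedupLabels is ['x4.46', 'x7.56']
```

Doubling the workers from 5 to 10 raises the speedup from about 4.46 to about 7.56 rather than doubling it, which is consistent with CPU usage reaching saturation at 10 workers.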

Increasing the concurrency from 5 to 10 makes the conversion much faster, but in my environment CPU usage may stay pinned at 100% during conversion. To balance speed and CPU usage, I set the default concurrency to 5.

In PDF/PPTX/image conversion, this number effectively means "the maximum number of browser tabs used concurrently for internal conversion". So I think the best default for this value is not necessarily the number of CPUs, which is commonly regarded as a good default for concurrent operations. (Even on a 2-core computer, it is very common to open 3 or more tabs in the browser. That is why the default concurrency for parallel conversion is a fixed number.)

yhatt added 2 commits January 14, 2025 18:58
Set the maximum number of parallel conversion for multiple files as
`--parallel`/`-P` option.
@yhatt yhatt changed the title Allow parallelism for batch conversion Allow parallelism for batch conversion: --parallel / -P Jan 14, 2025
@yhatt yhatt changed the title Allow parallelism for batch conversion: --parallel / -P Introduce parallelism for batch conversion: --parallel / -P Jan 14, 2025
@yhatt yhatt merged commit a553d84 into main Jan 15, 2025
1 check passed
@yhatt yhatt deleted the parallel-batch-conversion branch January 15, 2025 12:33
Development

Successfully merging this pull request may close these issues.

Parallel Processing / Speed Up Generation?