Parallel sampling with threadpool #1252

Open
wants to merge 13 commits into
base: master
from

Conversation

mzegla
Collaborator

@mzegla mzegla commented Nov 25, 2024

This PR implements the same functionality as #1233, but in a different way. Only one of them should be merged.

Since pipeline logic is executed on a single thread, there are periods of low CPU usage while the pipeline is not executing inference but running other logic, such as sampling, which can take a large fraction of the total time. Currently, after inference is done, we sample from each sequence group in a loop on a single thread. This becomes an issue with sampling parameters that significantly extend the sampling time for a single sequence group.

This PR extracts the sampling logic for a single sequence group into a separate method that can be executed independently of any other sequence group. It includes a generic thread pool implementation that spawns a certain number of threads, which are used to run the sampling logic for different sequence groups in parallel.
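
For readers who want a concrete picture of the idea, here is a minimal sketch of a fixed-size thread pool running one sampling task per sequence group. It is not the code from this PR: `ThreadPool`, `SequenceGroup`, `sample_group`, and `sample_all` are hypothetical names, and greedy argmax stands in for the real sampling logic.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal thread pool; not the class added by this PR.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t num_threads) {
        assert(num_threads > 0);
        for (std::size_t i = 0; i < num_threads; ++i)
            m_workers.emplace_back([this] { worker_loop(); });
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_all();
        for (auto& worker : m_workers) worker.join();
    }

    // Enqueue a callable and return a future for its result.
    template <typename F>
    auto submit(F&& f) -> std::future<decltype(f())> {
        using R = decltype(f());
        auto task = std::make_shared<std::packaged_task<R()>>(std::forward<F>(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tasks.emplace([task] { (*task)(); });
        }
        m_cv.notify_one();
        return result;
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_cv.wait(lock, [this] { return m_stop || !m_tasks.empty(); });
                if (m_stop && m_tasks.empty()) return;  // drain the queue before exiting
                job = std::move(m_tasks.front());
                m_tasks.pop();
            }
            job();  // run outside the lock so workers do not serialize
        }
    }

    std::vector<std::thread> m_workers;
    std::queue<std::function<void()>> m_tasks;
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_stop = false;
};

// Stand-in for a sequence group: a single row of logits (assumption).
struct SequenceGroup {
    std::vector<float> logits;
};

// Stand-in for per-group sampling: greedy argmax. It reads only its own
// group, so calls for different groups can safely run concurrently.
std::size_t sample_group(const SequenceGroup& group) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < group.logits.size(); ++i)
        if (group.logits[i] > group.logits[best]) best = i;
    return best;
}

// Submit one task per group, then collect the results in the original order.
std::vector<std::size_t> sample_all(ThreadPool& pool,
                                    const std::vector<SequenceGroup>& groups) {
    std::vector<std::future<std::size_t>> pending;
    pending.reserve(groups.size());
    for (const auto& group : groups)
        pending.push_back(pool.submit([&group] { return sample_group(group); }));

    std::vector<std::size_t> tokens;
    tokens.reserve(groups.size());
    for (auto& result : pending) tokens.push_back(result.get());
    return tokens;
}
```

Collecting the futures in submission order keeps the output aligned with the input groups regardless of which worker finishes first. The pool is created once and reused, so the thread start-up cost is not paid on every generation step.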

Performance measurements confirm the improvement, especially for non-greedy sampling and under high concurrency: the more sequence groups are scheduled for inference, the greater the benefit from parallel sampling.

CVS-157230

@github-actions github-actions bot added category: continuous batching Continuous batching category: sampling Sampling / Decoding algorithms no-match-files labels Nov 25, 2024
@ilya-lavrenov ilya-lavrenov self-assigned this Nov 26, 2024
@ilya-lavrenov ilya-lavrenov added this to the 2025.0 milestone Dec 4, 2024
@andrei-kochin andrei-kochin modified the milestones: 2025.0, 2025.1 Jan 13, 2025
post rebase adjustments

fix finish iteration

move currently_processed_tokens update

switch to async

experimental threadpool

remove access to shared struct in parallelized code

synchronize beam search part

refactor

extended timers

style
@mzegla mzegla force-pushed the parallel_sampling_threadpool branch from 0c26c92 to 8b7a92e Compare January 14, 2025 13:02
@github-actions github-actions bot added the category: visual language Visual language pipeline label Jan 14, 2025
@github-actions github-actions bot added the category: GHA CI based on Github actions label Jan 15, 2025
@iefode iefode self-requested a review January 16, 2025 08:24
@mzegla mzegla requested a review from ilya-lavrenov January 16, 2025 10:53
Labels
category: continuous batching Continuous batching category: GHA CI based on Github actions category: sampling Sampling / Decoding algorithms category: visual language Visual language pipeline no-match-files
3 participants