Improve/Redesign ParallelRunner
#4602
Comments
Not sure if we want to add this to the parent ticket. Other than datasets, it's tricky (or impossible) to make hooks work with ParallelRunner.
Overview from Tech Design
This issue proposes a series of changes to Kedro’s ParallelRunner. The main goals are:
Summary
PartitionedDataset Caching
OS-Specific Multiprocessing Defaults
kedro-viz Hook Incompatibility
Discussion:
Polars + fork: The discussion highlighted that Polars' internal threading can conflict with the fork start method on Unix-like systems, leading to hangs. Switching to spawn (or forkserver) is recommended to avoid these issues, in line with Polars’ own guidance.
Performance vs. Stability: While fork can be more efficient, it poses risks with certain C extensions or Rust-based libraries. spawn is safer but slower to initialize new processes.
Usage Data: Historical telemetry indicates
Potential Deprecation: Some team members asked whether removing ParallelRunner is feasible, given its complexities. However, backward compatibility and the runner’s existing user base must be considered.
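The fork versus spawn trade-off discussed above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not Kedro's actual ParallelRunner code: `pick_start_method` and `make_executor` are hypothetical helper names, and the preference order (spawn first, then forkserver) is an assumption chosen for illustration.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

# Hypothetical preference order (an assumption, not Kedro's default):
# start methods that do not copy the parent's threads and locks into the child.
SAFE_METHODS = ("spawn", "forkserver")


def pick_start_method(preferred=SAFE_METHODS):
    """Return the first preferred start method available on this platform."""
    available = multiprocessing.get_all_start_methods()
    for method in preferred:
        if method in available:
            return method
    raise RuntimeError(f"none of {preferred} is available; platform offers {available}")


def make_executor(max_workers=None):
    """Build a process pool bound to an explicit, non-fork context.

    Using get_context() avoids changing the global start method, so other
    code in the same process keeps whatever default it relies on.
    """
    ctx = multiprocessing.get_context(pick_start_method())
    return ProcessPoolExecutor(max_workers=max_workers, mp_context=ctx)
```

With spawn, worker functions must be importable at module top level, and the code that creates the pool should sit under an `if __name__ == "__main__":` guard, because each child re-imports the main module. That re-import is also where the slower process start-up mentioned above comes from.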
Description
We have a growing list of issues related to the ParallelRunner. To effectively address them and improve the runner, we need a clear understanding of each issue, how the ParallelRunner's design contributes to them, and potential solutions.
The task here is to create a summary of the issues and a proposal on how to address them.