Improved error handling in the PBSProProvider #3853

Open
wants to merge 1 commit into
base: master

@yadudoc yadudoc commented May 8, 2025

Description

The PBSProProvider currently logs submit failures and returns None instead of raising an exception as expected. As a result, when the provider is misconfigured, Parsl repeatedly submits jobs while giving the user little feedback. This PR addresses the issue by raising a SubmitException that captures the stdout and stderr of the qsub command. The relevant information propagates up via the TooManyJobFailuresError, as the examples below show:

Error raised when the PBSProProvider is misconfigured with a bad queue name:

parsl.jobs.errors.TooManyJobFailuresError: Error 1:
	Failed to start block 0: Cannot launch job parsl.HighThroughputExecutor.block-0.1746723134.8832731: Submit command 'qsub  -q debug5 -A AuroraGPT /home/yadunand/parsl/parsl/providers/pbspro/runinfo/001/submit_scripts/parsl.HighThroughputExecutor.block-0.1746723134.8832731' failed; recode=None, stdout=, stderr=qsub: Unknown queue

Error 2:
	Failed to start block 1: Cannot launch job parsl.HighThroughputExecutor.block-1.1746723135.5695264: Submit command 'qsub  -q debug5 -A AuroraGPT /home/yadunand/parsl/parsl/providers/pbspro/runinfo/001/submit_scripts/parsl.HighThroughputExecutor.block-1.1746723135.5695264' failed; recode=None, stdout=, stderr=qsub: Unknown queue

Error raised when the account is incorrect:

parsl.jobs.errors.TooManyJobFailuresError: Error 1:
	Failed to start block 0: Cannot launch job parsl.HighThroughputExecutor.block-0.1746723277.4842422: Submit command 'qsub  -q debug -A NonExistentAccount /home/yadunand/parsl/parsl/providers/pbspro/runinfo/002/submit_scripts/parsl.HighThroughputExecutor.block-0.1746723277.4842422' failed; recode=None, stdout=, stderr=qsub: Request rejected.  Reason: not found: Project NonExistentAccount

Error 2:
	Failed to start block 1: Cannot launch job parsl.HighThroughputExecutor.block-1.1746723277.8245358: Submit command 'qsub  -q debug -A NonExistentAccount /home/yadunand/parsl/parsl/providers/pbspro/runinfo/002/submit_scripts/parsl.HighThroughputExecutor.block-1.1746723277.8245358' failed; recode=None, stdout=, stderr=qsub: Request rejected.  Reason: not found: Project NonExistentAccount
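
For readers unfamiliar with how per-block failures end up in a single report, here is a minimal sketch of the aggregation pattern visible in the messages above. It is not Parsl's implementation: `TooManyFailures`, `summarize_failures`, `check_failures`, and the `threshold` value are hypothetical names and settings chosen for illustration only.

```python
class TooManyFailures(Exception):
    """Hypothetical aggregate error; a stand-in, not Parsl's TooManyJobFailuresError."""


def summarize_failures(failures):
    """Format (block_id, reason) pairs into a numbered report."""
    parts = []
    for n, (block_id, reason) in enumerate(failures, start=1):
        parts.append(f"Error {n}:\n\tFailed to start block {block_id}: {reason}\n")
    return "\n".join(parts)


def check_failures(failures, threshold=2):
    """Raise one aggregate error once enough blocks have failed.

    The threshold of 2 is an assumption made for this sketch.
    """
    if len(failures) >= threshold:
        raise TooManyFailures(summarize_failures(failures))
```

The point of the pattern is that each block's failure reason, including the stderr captured from the submit command, is preserved verbatim in the aggregate message, so the user sees the scheduler's own explanation.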

Changed Behaviour

When misconfigured, the PBSProProvider now raises a SubmitException on a failed submit instead of logging the failure and returning None.
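
The behaviour change can be illustrated with a small, self-contained sketch. This is not the code from this PR: `SubmitError` is a stand-in for the provider's exception, and `submit_job` and the `execute` callable (which is assumed to return a `(retcode, stdout, stderr)` tuple) are hypothetical names introduced for illustration.

```python
class SubmitError(Exception):
    """Stand-in submit exception (hypothetical; not Parsl's SubmitException)."""

    def __init__(self, job_name, message, stdout=None, stderr=None, retcode=None):
        super().__init__(message)
        self.job_name = job_name
        self.message = message
        self.stdout = stdout
        self.stderr = stderr
        self.retcode = retcode

    def __str__(self):
        return (
            f"Cannot launch job {self.job_name}: {self.message}; "
            f"retcode={self.retcode}, stdout={self.stdout}, stderr={self.stderr}"
        )


def submit_job(job_name, command, execute):
    """Run a submit command and raise on failure instead of returning None.

    `execute` is an assumed helper: it takes the command string and
    returns (retcode, stdout, stderr).
    """
    retcode, stdout, stderr = execute(command)
    if retcode != 0:
        # Surface the scheduler's own output so the caller can report it.
        raise SubmitError(
            job_name,
            f"Submit command '{command}' failed",
            stdout=stdout.strip(),
            stderr=stderr.strip(),
            retcode=retcode,
        )
    # On success, treat the trimmed stdout as the job identifier.
    return stdout.strip()
```

Raising here, rather than returning None, lets the caller distinguish a rejected submission from a successful one and stop resubmitting when the configuration is wrong.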

Fixes

Fixes #3793

Type of change

  • Bug fix

cms21 commented May 9, 2025

Issue #3793 concerns qstat commands executed by the PBSProProvider. This PR addresses a different issue with qsub commands.

Development

Successfully merging this pull request may close these issues.

Improve error reporting for PBSProProvider