tonykao8080
Contributor

Summary:
torchx/cli/test:cmd_run_test - test_run_with_log (https://www.internalfb.com/intern/test/281475186013299?ref_report_id=0) fails regularly because its assertion on the local_scheduler output finds the expected content missing. The failing release test blocks the torchx release and creates noise for the oncall: https://fburl.com/conveyor/a5u31rby

The issue appears to be that LogIterator aborts early if the content has not been written yet: https://www.internalfb.com/code/fbsource/[922fd5827417][history]/fbcode/torchx/schedulers/local_scheduler.py?lines=1185-1189

The proposed fix is to add a small delay before fp_log is set up.
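To illustrate the suspected race, here is a minimal sketch. It is not the actual local_scheduler code: `read_log`, `demo`, the `startup_delay` parameter, and the timings are hypothetical names and values chosen for the example. It assumes a reader that stops at the first EOF, so opening the log before the writer has flushed anything yields no lines, while a short delay before opening gives the writer time to produce output.

```python
import os
import tempfile
import threading
import time
from typing import List


def read_log(path: str, startup_delay: float = 0.0) -> List[str]:
    """Hypothetical stand-in for a log reader that stops at the first EOF."""
    # Mitigation under discussion: wait briefly before opening the log file.
    time.sleep(startup_delay)
    with open(path) as fp_log:
        # readlines() returns as soon as EOF is reached, even if the
        # writer has simply not written anything yet.
        return fp_log.readlines()


def demo(startup_delay: float) -> List[str]:
    """Run a slow writer and the reader concurrently; return what was read."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stdout.log")
        open(path, "w").close()  # log file exists but is still empty

        def writer() -> None:
            time.sleep(0.05)  # simulated startup latency of the app
            with open(path, "w") as f:
                f.write("hello world\n")

        t = threading.Thread(target=writer)
        t.start()
        lines = read_log(path, startup_delay)
        t.join()
        return lines


if __name__ == "__main__":
    print(demo(0.0))  # reader wins the race and sees no content
    print(demo(0.5))  # delayed reader sees the written line
```

Note that a fixed delay only narrows the race window; a writer slower than the delay would still produce an empty read in this sketch.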

Differential Revision: D80716088

…at reads early

@meta-cla meta-cla bot added the CLA Signed label Aug 21, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D80716088

@tonykao8080 tonykao8080 merged commit 50da5af into pytorch:main Aug 21, 2025
23 of 24 checks passed
Labels
CLA Signed, fb-exported
3 participants