What is wrong?
Jobs gfscleanup and gfsmetpg2g1 both depend only upon the completion of gfsarch, so both jobs can run concurrently. This is problematic because gfscleanup removes the directory in which the gfsmetpg2g1 job is running.
This behavior was observed on Hera but likely impacts all machines.
What should have happened?
The gfsmetp suite of jobs should run to completion before gfscleanup removes the run directory.
What machines are impacted?
All or N/A, Hera
Steps to reproduce
1. Clone g-w develop
2. Set up g-w CI for GSI or JEDI ATM based DA
3. Cycle to the gfsmetp jobs
Additional information
A test of g-w CI C96C48_ufs_hybatmDA on Hera encountered the following scenario.
The 2024022400 gfsarch job completed. rocotorun then submitted gfscleanup and gfsmetpg2g1. Each of these jobs has a single XML dependency: completion of gfsarch.
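To make the race concrete, here is a minimal sketch of the dependency situation described above. It is not Rocoto code: the dictionary layout and the `ready_tasks` helper are hypothetical. The `serialized` graph is only an assumed way to express the expected ordering, not a confirmed fix.

```python
# Hypothetical, simplified model of the dependency graph described above.
# Task names come from the issue; the data layout and helper are illustrative only.

def ready_tasks(deps, completed):
    """Return tasks that have not completed and whose dependencies are all complete."""
    return sorted(
        task for task, needs in deps.items()
        if task not in completed and needs <= completed
    )

# Observed behavior: both jobs depend only on gfsarch.
current = {
    "gfscleanup": {"gfsarch"},
    "gfsmetpg2g1": {"gfsarch"},
}

# Assumed alternative (not a confirmed fix): gfscleanup also waits for gfsmetpg2g1.
serialized = {
    "gfscleanup": {"gfsarch", "gfsmetpg2g1"},
    "gfsmetpg2g1": {"gfsarch"},
}

# Once gfsarch completes, the current graph releases both jobs together.
both = ready_tasks(current, completed={"gfsarch"})

# With the extra dependency, only gfsmetpg2g1 is released at first...
only_metp = ready_tasks(serialized, completed={"gfsarch"})

# ...and gfscleanup follows after gfsmetpg2g1 completes.
then_cleanup = ready_tasks(serialized, completed={"gfsarch", "gfsmetpg2g1"})

print(both, only_metp, then_cleanup)
```

In the first graph nothing orders the two jobs relative to each other, which is exactly the window in which gfscleanup can delete a directory that gfsmetpg2g1 is still using.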
Jobs gfsmetpg2g1 and gfscleanup started at the same time, Mon Sep 9 17:03:20 UTC 2024. gfscleanup finished at Mon Sep 9 17:03:48 UTC 2024. One of the last actions gfscleanup performs is to remove the top-level gfs run directory for the cycle.
Unfortunately, gfsmetpg2g1 was running in /scratch1/NCEPDEV/stmp2/Russ.Treadon/RUNDIRS/prtest/gfs.2024022400/metpg2g1.2502384. Removal of /scratch1/NCEPDEV/stmp2/Russ.Treadon/RUNDIRS/prtest/gfs.2024022400 deleted the gfsmetpg2g1 run directory. Job gfsmetpg2g1 aborted at Mon Sep 9 17:03:52 UTC 2024 with error messages.
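The overlap can be checked directly from the paths and timestamps quoted above. The following is a small sketch using only values from this report; the `is_inside` helper and the variable names are hypothetical and are not part of g-w.

```python
from datetime import datetime
from pathlib import PurePosixPath

# Paths copied from the report above.
removed_by_cleanup = PurePosixPath(
    "/scratch1/NCEPDEV/stmp2/Russ.Treadon/RUNDIRS/prtest/gfs.2024022400"
)
metp_run_dir = PurePosixPath(
    "/scratch1/NCEPDEV/stmp2/Russ.Treadon/RUNDIRS/prtest/gfs.2024022400/metpg2g1.2502384"
)

def is_inside(child, parent):
    """True if child is parent itself or lies anywhere beneath it."""
    return child == parent or parent in child.parents

# Timestamps copied from the report above (all UTC).
both_started = datetime(2024, 9, 9, 17, 3, 20)
cleanup_finished = datetime(2024, 9, 9, 17, 3, 48)
metp_aborted = datetime(2024, 9, 9, 17, 3, 52)

# The metp run directory sits under the directory that gfscleanup removes.
metp_dir_is_removed = is_inside(metp_run_dir, removed_by_cleanup)

# gfsmetpg2g1 was still running for the whole gfscleanup window.
metp_alive_during_cleanup = both_started <= cleanup_finished <= metp_aborted

# Seconds between gfscleanup finishing and gfsmetpg2g1 aborting.
abort_delay = (metp_aborted - cleanup_finished).total_seconds()

print(metp_dir_is_removed, metp_alive_during_cleanup, abort_delay)
```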
Do you have a proposed solution?
No response