Adding dynamic files #3754

Open
wants to merge 95 commits into
base: master

astro-friedel

Description

Parsl cannot track files that an app creates at run time unless they are listed in the app's arguments when the app is called. For example:

from parsl import python_app
from parsl.data_provider.files import File

@python_app
def process(inputs=[], outputs=[]):
    with open(inputs[0], 'r') as fh:
        lines = fh.readlines()
    # Create files that were not declared when the app was called
    for i in [1, 2, 3]:
        with open(f"dat.{i}.log", 'w') as wh:
            wh.write("xyz\n")
        outputs.append(File(f"dat.{i}.log"))

@python_app
def compact(inputs=[], outputs=[]):
    # Concatenate every input file into the single output file
    with open(outputs[0], 'w') as fh:
        for f in inputs:
            with open(f) as rh:
                fh.writelines(rh.readlines())

outs = []
p = process(inputs=[File("input.dat")], outputs=outs)
c = compact(inputs=p.outputs, outputs=[File("compact.log")])

process does write the log files and append them to the list, but compact is unlikely to see them. When compact is submitted, Parsl sees the outputs of process as an empty list and does not know that any files will be created. The constructed DAG therefore has no edge between process and compact, so the two can run in parallel instead of serially as expected.
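The missing edge can be illustrated with a small sketch. This is not Parsl's dependency code; find_dependencies is a hypothetical helper that assumes a scheduler discovers dependencies by scanning the call's arguments for futures, which is enough to show why an empty list contributes nothing.

```python
from concurrent.futures import Future


def find_dependencies(kwargs):
    """Hypothetical helper: collect every Future found in the keyword
    arguments, looking one level deep into lists and tuples."""
    deps = []
    for value in kwargs.values():
        items = value if isinstance(value, (list, tuple)) else [value]
        deps.extend(item for item in items if isinstance(item, Future))
    return deps


# Outputs declared up front: there is a future to wait on.
declared = Future()
print(len(find_dependencies({"inputs": [declared]})))

# Outputs created at run time: the list is still empty at submission,
# so no dependency is found and nothing orders the two tasks.
outs = []
print(len(find_dependencies({"inputs": outs})))
```

The first call finds one dependency and the second finds none, which matches the situation where compact is submitted with an empty output list from process.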

To fix this I have created the DynamicFileList class. This class behaves just like a list, but is also a future. If outs in the above example is an instance of this class, Parsl knows that files are being created and adds a dependency in the DAG for compact, so compact does not execute until process is complete.
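The idea of a container that is both list-like and a future can be sketched with the standard library alone. FutureList and mark_complete below are hypothetical names used only for illustration, not the DynamicFileList implementation in this PR, and the sketch wraps a list rather than subclassing one to keep it short.

```python
from concurrent.futures import Future


class FutureList(Future):
    """Hypothetical stand-in: a list-like container that is also a Future."""

    def __init__(self):
        super().__init__()
        self._items = []

    def append(self, item):
        self._items.append(item)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __iter__(self):
        return iter(self._items)

    def mark_complete(self):
        """Resolve the future once the producing task has finished."""
        self.set_result(list(self._items))


outs = FutureList()
print(outs.done())        # still pending: a consumer would have to wait
outs.append("dat.1.log")  # the producer registers a file it created
outs.mark_complete()
print(outs.result())      # the consumer now sees the files that were added
```

Because the object is a future even while it is empty, a scheduler that looks for futures in the arguments has something to wait on before the first file exists.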

There is also a new wrapper in apps called bash_watcher. It was created as a way to use DynamicFileLists with bash_apps.

Changed Behaviour

With the DynamicFileList class, users can write apps that produce an unknown number of output files and have those outputs properly tracked and linked, so the apps run in the expected order.

Type of change

  • New feature

initial code to handle file related monitoring messages