Skip to content

[Core ML] Improve asset management #13560

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 23, 2025
Merged

Conversation

cymbalrush
Copy link
Contributor

@cymbalrush cymbalrush commented Aug 20, 2025

Fix asset management issues in CoreML delegate

  • Do not use the temporary directory for asset storage.
  • Use the staging directory for storing temporary assets, and ensure it is cleaned up after use.
  • Perform aggressive cleanup of the temporary directory to avoid leftovers.
  • Guarantee proper cleanup of staging directories after operations to prevent stale files.

Copy link

pytorch-bot bot commented Aug 20, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/13560

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 0541ad9 with merge base 8aca197 (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 20, 2025
Copy link

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@cymbalrush cymbalrush changed the title Improve asset management [Core ML] Improve asset management Aug 20, 2025
@cymbalrush cymbalrush force-pushed the fix_assets branch 3 times, most recently from 64ecf65 to 8f19e4a Compare August 20, 2025 20:30
@@ -254,6 +254,18 @@ BOOL is_asset_alive(NSMapTable<NSString *, ETCoreMLAsset *> *assets_in_use_map,

return assets;
}

NSURL * _Nullable move_to_directory(NSURL *url,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks similar to the pattern "[self moveURL:dstURL toUniqueURLInDirectory:self.trashDirectoryURL error:nil];"

Should we consolidate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use move_to_directory

NSFileManager *fileManager,
NSError * __autoreleasing *error) {
NSURL *dstURL = [directoryURL URLByAppendingPathComponent:[NSUUID UUID].UUIDString];
if (![fileManager moveItemAtURL:url toURL:dstURL error:error]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check if url is nil when using moveItemAtURL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added checks.


// Remove any existing contents in the staging directory by moving them to the trash directory,
// then recreate a clean staging directory for new use.
move_to_directory([assetsDirectoryURL URLByAppendingPathComponent:@"staging"], managedTrashDirectoryURL, fileManager, nil);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we have two models?

In model2, the staging directory is being used during compilation, but then in model1 we execute this line and move the staging directory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ETCoreMLAssetManager is only created once when the backend is loaded, and this happens atinit time.

Copy link
Contributor

@metascroy metascroy Aug 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ETCoreMLAssetManager is created once for a single PTE file on delegate init, but what if there are multiple PTE files?

Do they all share the same staging directory? And can that lead to issues when one tries to delete the staging directory when another is compiling from it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ETCoreMLAssetManager and ETCoreMLModelManager are only created once. You can put breakpoints to verify it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if I load two ET programs on two separate processes? Won't each process create ETCoreMLModelManager?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a persistent identifier, but I haven’t identified a suitable one yet. For now, I have removed the deletion step. Although we clean the directories in the staging area, if the app is terminated before the cleanup occurs, stale directories may remain. However, since these directories are located in the caches directory, the OS will automatically clean them up when storage is low.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I guess we could clean up things in staging that have last touched time stamps > 1 day during asset manager creation.



// Always clean the trash directory to ensure a minimal footprint.
// The `trashQueue` is serialized, so only one cleanup will run at a time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if something is moving to the trashDirectory when it's being deleted by another thread?

For example, withTemporaryDirectory moves staging files to the trashDirectory after compilation, but then another model triggers cleanupTrashDirectory on another thread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be fine trashQueue is serial and we get the list of directories to delete at the start of the method

  NSDirectoryEnumerator *enumerator = [fileManager enumeratorAtURL:self.trashDirectoryURL
                                          includingPropertiesForKeys:@[]
                                                             options:NSDirectoryEnumerationSkipsSubdirectoryDescendants
                                                        errorHandler:errorHandler];

}

// If store failed then we will load the model from compiledURL.
auto backingAsset = Asset::make(compiledModelURL, identifier, assetManager.fileManager, error);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If store fails then it's an error there is no point recovering from it.


// If modelURL is not provided, write model files to the destination directory (dstURL)
// and obtain a URL pointing to them. Otherwise, use the provided modelURL.
modelURL = (modelURL == nil) ? ::write_model_files(dstURL, self.fileManager, identifier, modelAssetType.value(), inMemoryFS, error) : modelURL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would modelURL be non-null? Is it mainly the case where the model is already compiled in the PTE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of Analyzer we also save the raw model and we don't want to write the model files again if it already exists.

return compiledModelURL;
}
}
}

#if ET_EVENT_TRACER_ENABLED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are we removing the compile flag here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, moving it.

inMemoryFS:(const inmemoryfs::InMemoryFileSystem*)inMemoryFS
error:(NSError * __autoreleasing *)error {
NSString *identifier = @(metadata.identifier.c_str());
NSString *rawIdentifier = raw_model_identifier(identifier);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does raw here mean? Is it to distinguish from the compiled asset (which gets the unqualified identified as its name)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

@facebook-github-bot
Copy link
Contributor

@metascroy has imported this pull request. If you are a Meta employee, you can view this in D80715432.

@facebook-github-bot
Copy link
Contributor

@metascroy has imported this pull request. If you are a Meta employee, you can view this in D80715432.

@metascroy
Copy link
Contributor

Overall changes look great @cymbalrush

Running internal/external CI. It does look like the linter is failing. Can you run lint?

From ET repo:

pip install -r requirements-lintrunner.txt

And then run:

lintrunner -a

@facebook-github-bot
Copy link
Contributor

@metascroy has imported this pull request. If you are a Meta employee, you can view this in D80715432.

@metascroy
Copy link
Contributor

Attemping land from phabricator

@metascroy metascroy merged commit de17042 into pytorch:main Aug 23, 2025
111 of 112 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants