Migrate MLFF geometry optimization files to JSON Lines format for fast partial loading #231
Merged
…lling them with 1.0
- scripts/evals/geo_opt.py add note explaining RMSD values are unitless

- after moving spacegroup comparison out of pred_vs_ref_struct_symmetry to analyze_geo_opt.py
- calc_structure_distances now only calculates distance metrics between predicted and reference structures
- fix tests to cover new functionality and ensure robustness against mismatched IDs

… 'distance'
- Conditional execution of symmetry analysis and structure distance calculations based on the specified analysis type

…aster loading on debug runs
uploaded to new figshare article: https://figshare.com/articles/dataset/28642406
old one is now deprecated: https://figshare.com/articles/dataset/28187999
symmetry and distance analysis files to be added to new article next

- update_yaml_at_path now allows reading from a YAML file when `data` is set to None, returning the value at the specified dotted path without modifying the file
- tests to verify read-only functionality

…a dictionary
- Refined `calc_geo_opt_metrics` to handle NaN values
- more test coverage for both functions

…IDS`
- Modified `calc_structure_distances` in `symmetry.py` to print a warning instead of raising an error when no shared IDs between predicted and reference structures
- new test cases in `test_symmetry.py` to verify the new warning behavior and ensure proper handling of NaN values in distance calculations

…imization analysis in `test_analyze_geo_opt.py`
- bump pre-commit hooks for ruff, eslint, and pyright

…analysis filename
- updated analysis file paths to include structure counts in filenames for consistency
- modified RMSD values to reflect updated config in several models

…`lines=True` for new JSON files in line-delimited format
- Updated relevant scripts and models to ensure compatibility

…pdate figshare URLs in data-files.yml
- remove DataFiles.wbm_cses_plus_init_structs altogether, usually you just need one or the other, not both initial and relaxed structures
- change all references from `wbm_cses_plus_init_structs` to `wbm_initial_structures` and `wbm_computed_structure_entries` in scripts and models
- enhance upload script with argparse for file selection

…g.md and PR template.md
update data-files.yml to reflect changes in file paths from .json.bz2 to .jsonl.gz for WBM computed and initial structures, removal of wbm_cses_plus_init_structs

`geo_opt` RMSD metric

The original trajectories can be found at: https://figshare.com/s/a629acbf3bed6a04b3ce?file=53060504

…ed structures in JSON Lines format

…y/structure/symmetry.py + matbench_discovery/data.py + tests/structure/test_symmetry.py
keep only new paragraph in module doc str
Does not fix #230 (yet). The symmetry analysis rerun for all models is pushed back to a subsequent PR. The current lineup of models was tested with a mixture of `spglib==2.5.0` and several `moyopy` versions, so a rerun would have been in order anyway for consistency. The code to calculate the RMSD metric is fixed in this PR but the existing metrics are not updated. My machine keeps crashing when trying to rerun all models, so I may need to rerun the models one by one overnight.

This PR makes debugging and iterating on future geometry optimization much faster by allowing test runs to load just ~100 DFT/model-relaxed structures instead of all 257k in WBM.
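
To illustrate why JSON Lines helps with partial loading, here is a minimal standard-library sketch. It is not code from this PR: the function name `load_first_records`, the file name, and the toy record fields are all hypothetical. It assumes one JSON record per line in a gzipped file. When working with pandas, `pd.read_json(path, lines=True, nrows=100)` serves the same purpose.

```python
import gzip
import json
from itertools import islice
from pathlib import Path
from tempfile import TemporaryDirectory


def load_first_records(path, n_records):
    """Parse only the first n_records lines of a gzipped JSON Lines file."""
    with gzip.open(path, mode="rt", encoding="utf-8") as file:
        # islice stops iterating after n_records lines, so later lines are never parsed
        return [json.loads(line) for line in islice(file, n_records)]


# Hypothetical toy records standing in for relaxed structures
with TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo-structures.jsonl.gz"
    with gzip.open(demo_path, mode="wt", encoding="utf-8") as file:
        for idx in range(1000):
            record = {"material_id": f"demo-{idx}", "n_sites": idx % 7 + 1}
            file.write(json.dumps(record) + "\n")

    first_records = load_first_records(demo_path, n_records=100)

print(len(first_records), first_records[0]["material_id"])
```

A single large JSON document, by contrast, generally has to be parsed in full before any one record can be accessed, which is what makes debug runs on the complete file slow.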
Unpolished conversion script for ML-relaxed structures from JSON to JSON Lines
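
For readers who want the general shape of such a conversion, the following is a rough sketch and not the script attached to this PR. It assumes the source JSON is a single object mapping an ID to a record dict; the real files may be laid out differently. The function name `json_to_jsonl`, the `material_id` key, and the toy data are hypothetical.

```python
import gzip
import json
from pathlib import Path
from tempfile import TemporaryDirectory


def json_to_jsonl(json_path, jsonl_path, id_key="material_id"):
    """Rewrite a JSON file of the form {id: record} as gzipped JSON Lines."""
    with open(json_path, encoding="utf-8") as file:
        records_by_id = json.load(file)  # one-time full load of the old format

    with gzip.open(jsonl_path, mode="wt", encoding="utf-8") as file:
        for record_id, record in records_by_id.items():
            # one self-contained JSON object per line, with the ID stored inline
            file.write(json.dumps({id_key: record_id, **record}) + "\n")
    return len(records_by_id)


# Hypothetical toy input standing in for ML-relaxed structures
with TemporaryDirectory() as tmp_dir:
    src_path = Path(tmp_dir) / "demo.json"
    dst_path = Path(tmp_dir) / "demo.jsonl.gz"
    toy_data = {f"demo-{idx}": {"energy": -1.5 * idx} for idx in range(3)}
    src_path.write_text(json.dumps(toy_data), encoding="utf-8")

    n_written = json_to_jsonl(src_path, dst_path)

    with gzip.open(dst_path, mode="rt", encoding="utf-8") as file:
        converted = [json.loads(line) for line in file]

print(n_written, converted[0])
```

The conversion still pays the full parsing cost once, but every later read can stop after as many lines as a test run needs.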