Parallelizes `MDAnalysis.analysis.InterRDF` and `MDAnalysis.analysis.InterRDF_s` (#4884)
Conversation
Hello @tanishy7777! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2025-01-20 20:35:32 UTC
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff            @@
##           develop    #4884   +/-   ##
========================================
  Coverage    93.86%   93.86%
========================================
  Files          179      179
  Lines        22217    22242     +25
  Branches      3157     3159      +2
========================================
+ Hits         20853    20878     +25
  Misses         902      902
  Partials       462      462
```

View full report in Codecov by Sentry.
@tanishy7777 thanks for the prompt PR! I have some comments though:
Can we just sum it up separately though? I mean, make it a part of `self.results`?
Got it! I have made `volume_cum` a part of `self.results`.
When trying to make the aggregation work, I tried to find out why this is happening by looking at the result itself, which is being aggregated along 'count'. I tried finding the dimensions of the arrays manually because it wasn't converting to a numpy array: the conversion fails because the arrays have inconsistent dimensions. I am not sure how to resolve this.
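For illustration, this is the kind of failure being described: NumPy cannot stack arrays of different shapes into a regular ndarray (the shapes below are made up, not from the PR):

```python
import numpy as np

# Made-up shapes for illustration: per-selection count arrays of
# different lengths cannot be combined into one regular ndarray.
np.array([np.ones(75), np.ones(150)])
# NumPy 1.24+ raises:
# ValueError: setting an array element with a sequence. The requested
# array has an inhomogeneous shape after 1 dimensions ...
```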
Based on the above comment (#4884 (comment)), I think we can mark the class as non-parallelizable, because it's not possible to convert an array of inhomogeneous dimensions to a numpy array, and after aggregation `self.results.count[i] / norm` won't be possible, as division is not supported between a list and a number.
@tanishy7777 the class should be marked as non-parallelizable only if the algorithm itself is not actually parallelizable, which I'm not yet convinced is the case for all the mentioned classes. But I think you're on the right path here: you just need to implement a custom aggregation function instead of the ones already implemented. Can you describe what kind of arrays you're trying to aggregate and cannot find an appropriate function for? I didn't quite get it from your screenshots.
The result should be the same as if you'd run it without parallelization. And I assume you want to stack/sum/whatever along the dimension that corresponds to the timestep; you can probably guess which one it is if you run it on some example with a known number of frames. Example trajectories can be found in MDAnalysisTests.
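For context, a minimal sketch of what such a custom aggregation function might look like, assuming each worker contributes a list of differently-shaped count arrays (the function name and input layout are illustrative, not the PR's actual code):

```python
import numpy as np

def sum_nested_arrays(arrs):
    # arrs holds one entry per worker; each entry is a list of count
    # arrays whose shapes differ between selection pairs, so they
    # cannot be stacked with np.array(). Sum matching arrays instead.
    total = [np.array(a, copy=True) for a in arrs[0]]
    for worker_result in arrs[1:]:
        for i, a in enumerate(worker_result):
            total[i] += a
    return total
```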
Got it. Will work on that!
Used a custom aggregator for `InterRDF_s`.
Thanks @marinegor for your help!
@orbeckst @RMeli @marinegor I think this is ready to be merged; can you please review it?
hi @tanishy7777, sorry for the long review (again).
good work, thanks for your contribution! I've added my comments, main action points below:
- move your custom aggregation function from the class to a standalone function, name it appropriately, and test it
- revert changes to `test_xds.py` and `core/selection.py` (I'm guessing they were introduced by `black` or something)
- make sure you don't need to track `self.volume_cum`, since you're tracking `self.results.volume_cum` and assigning `self.volume_cum` in `_conclude`. I made suggestions regarding that but might have missed something; please make sure that until `_conclude`, only `self.results.volume_cum` is used (see the sketch below).
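For reference, a sketch of the accumulation pattern being requested (the class is a stand-in and the bodies are abbreviated; `self._ts.volume` is assumed to be the current frame's box volume):

```python
class InterRDFSketch:  # stand-in for the real analysis class
    def _single_frame(self):
        # accumulate only into self.results during the per-frame loop,
        # so that per-worker results can be aggregated afterwards
        self.results.volume_cum += self._ts.volume

    def _conclude(self):
        # the plain attribute is derived only here, after aggregation
        self.volume_cum = self.results.volume_cum
```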
Sorry for the late reply. I will get to work on these changes! I had semester examinations, so I was quite busy for the last two weeks.
@marinegor Sorry for the delayed action on this PR again; college has been quite hectic. I have added the changes you requested. For the formatting issues caused by `black`, I merged develop into my branch, so I think they should be resolved. Please let me know if any further changes are needed. Thanks for reviewing my changes and for your guidance!
@tanishy7777 thanks for all your contributions. As you know, reviewer time is pretty precious. I asked @marinegor to look at PR #4896 already, so please be understanding if this one may take longer.
Thanks for your reply. Completely understandable, no worries at all.
@tanishy7777 just as a heads-up: Egor told us he has very little availability, so we'll have to find other reviewers to continue here. The first step where you can help is to resolve the conflicts and get the PR back into a mergeable state.
@p-j-smith would you be in a position to review and look after this parallelization PR? Egor has already done a lot of deep reviewing here, but as he can't do it at the moment and we'd like to move this improvement along, it would be great to have your 👀 on it. I am tentatively assigning you, but if you can't do it, please un-assign yourself and let me know with a comment + ping, and then I'll look for someone else. Thanks!!
Very nice work @tanishy7777 - just a couple of small changes to simplify the tests. And please merge in the latest changes from develop.
```python
arrs = [
    [np.ones((2, 2)), np.ones((2, 2))],
    [np.ones((2, 2)), np.ones((2, 2))],
]
```
is this needed?
```python
@pytest.mark.parametrize(
    "classname,is_parallelizable",
    [
        (mda.analysis.rdf.InterRDF, True),
    ],
)
def test_class_is_parallelizable(classname, is_parallelizable):
    assert classname._analysis_algorithm_is_parallelizable == is_parallelizable
```
you don't need to parametrise this test, this would be equivalent:
```python
def test_class_is_parallelizable():
    assert mda.analysis.rdf.InterRDF._analysis_algorithm_is_parallelizable
```
```python
@pytest.mark.parametrize(
    "classname,backends",
    [
        (
            mda.analysis.rdf.InterRDF,
            (
                "serial",
                "multiprocessing",
                "dask",
            ),
        ),
    ],
)
def test_supported_backends(classname, backends):
    assert classname.get_supported_backends() == backends
```
same here, this doesn't need to be parametrised. Also, I'm not sure we need this test - if we want to update the supported backends at some point, we will also need to update the test. Or is that the point of the test?
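If the test is kept, an unparametrised version along the lines of the earlier suggestion might look like this (a sketch, not the reviewer's exact proposal):

```python
def test_supported_backends():
    # the expected tuple mirrors the parametrised test above
    assert mda.analysis.rdf.InterRDF.get_supported_backends() == (
        "serial",
        "multiprocessing",
        "dask",
    )
```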
```python
@pytest.mark.parametrize(
    "classname,is_parallelizable",
    [
        (mda.analysis.rdf.InterRDF_s, True),
    ],
)
def test_class_is_parallelizable(classname, is_parallelizable):
    assert classname._analysis_algorithm_is_parallelizable == is_parallelizable
```
```python
@pytest.mark.parametrize(
    "classname,backends",
    [
        (
            mda.analysis.rdf.InterRDF_s,
            (
                "serial",
                "multiprocessing",
                "dask",
            ),
        ),
    ],
)
def test_supported_backends(classname, backends):
    assert classname.get_supported_backends() == backends
```
these tests don't need to be parametrised
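Applying the same simplification to the `InterRDF_s` tests might give something like this (a sketch under the same assumptions as above):

```python
def test_class_is_parallelizable():
    assert mda.analysis.rdf.InterRDF_s._analysis_algorithm_is_parallelizable


def test_supported_backends():
    assert mda.analysis.rdf.InterRDF_s.get_supported_backends() == (
        "serial",
        "multiprocessing",
        "dask",
    )
```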
Fixes #4675
Changes made in this Pull Request:
- Parallelizes `rdf.InterRDF` and `rdf.InterRDF_s` (see the usage sketch below)
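A hypothetical usage sketch of the parallel API these changes enable (file names and selections are placeholders, not from the PR):

```python
import MDAnalysis as mda
from MDAnalysis.analysis.rdf import InterRDF

u = mda.Universe("topol.tpr", "traj.xtc")  # placeholder files
oxygens = u.select_atoms("name OW")        # placeholder selection

rdf = InterRDF(oxygens, oxygens)
# backend/n_workers are part of the parallel AnalysisBase run() API
rdf.run(backend="multiprocessing", n_workers=4)
print(rdf.results.bins, rdf.results.rdf)
```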
TL;DR of the comments below: initially I thought `rdf` isn't parallelizable, but it turns out both classes in `rdf` can be parallelized.

PR Checklist
Developer Certificate of Origin
📚 Documentation preview 📚: https://mdanalysis--4884.org.readthedocs.build/en/4884/