Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu: aarch64: hot fix for segfault in cached winograd #2151

Merged
merged 3 commits into from
Oct 9, 2024

Conversation

taoye9
Copy link
Contributor

@taoye9 taoye9 commented Oct 4, 2024

Description

Same PR as #2149 but to backport to release v3.6.

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

…ution primitive

Signed-off-by: Ye Tao <ye.tao@arm.com>
@taoye9 taoye9 requested a review from a team as a code owner October 4, 2024 18:09
@theComputeKid theComputeKid changed the title cpu: aarch64: hot fix for segfault in cached winograd gradient convol… cpu: aarch64: hot fix for segfault in cached winograd Oct 4, 2024
Copy link
Contributor

@theComputeKid theComputeKid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even at its worst, it is better than having a segfault and the wrong answer.

@@ -59,7 +59,7 @@ struct acl_wino_convolution_fwd_t : public primitive_t {
private:
status_t execute_forward(const exec_ctx_t &ctx) const;
const pd_t *pd() const { return (const pd_t *)primitive_t::pd().get(); }
std::unique_ptr<acl_obj_t<Op>> acl_obj_;
mutable std::unique_ptr<acl_obj_t<Op>> acl_obj_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not declare acl_obj_ directly in execute forward?
IIUC, this patch would still break in cache of concurrent execution.

Copy link
Contributor Author

@taoye9 taoye9 Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @mgouicem, i've added the mutex following your comment. would you like to take a review?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I am still confused by the use of the mutex. Why not make the object local to the execute function call? Each execute call would have his own object and this will remove any need for synchronization.

Furthermore, as currently implemented, I believe you might still have an issue in case of concurrent execution as only the initialization is mutex protected. For example if you have

t0: lock -> acl_obj init *, dtor previous -> unlock ->  acl_obj.get() ->            ...             -> using handle
t1:                                                     lock          -> acl_obj init, dtor previous *-> unlock -> acl_obj.get() -> ... 

In the above example, t0 will get some pointer to object upon acl_obj.get() call, but the underlying object will be destroyed by t1 acl_obj init.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi, @mgouicem, thanks for your help, i've changed the acl_obj to a local variable.

FYI, I'm kind of confused about what kind of synchronisation should be posed on oneDNN primitives and how the primitives cache works? could you please share some docs/info?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding synchronization, primitive.execute function should be const qualified. So because it is not supposed to change any state of the primitive object, there should be no sync necessary in execution.

We do have synchronizations happening upon primitive cache accesses (single writer, multiple readers). This should be transparent from primitive implementation.

Regarding scratchpad, when user managed, scratchpad is just another output from a primitive perspective (so user responsibility to sync when needed). When library managed, it is trickier as it is not thread-safe by default. You can find more info in the scratchpad devguide page.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks mgouicem, that's really helpful!

Copy link
Contributor

@theComputeKid theComputeKid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to make it thread-safe.

… winograd conv

Signed-off-by: Ye Tao <ye.tao@arm.com>
@theComputeKid theComputeKid self-requested a review October 7, 2024 14:47
…v and winograd conv without lock.

Signed-off-by: Ye Tao <ye.tao@arm.com>
Change-Id: Ifb30292a8bfc5219c44515eb4d29b277a0f0b24a
Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/oncpuml/oneDNN/+/680780
Tested-by: svc_mongoosetron <svc_mongoosetron@arm.com>
Reviewed-by: Hamza Butt <hamza.butt@arm.com>
IP-Review: Hamza Butt <hamza.butt@arm.com>
@github-actions github-actions bot added platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64 backport labels Oct 8, 2024
@avmanerikar avmanerikar merged commit 3b8f5cd into uxlfoundation:rls-v3.6 Oct 9, 2024
23 checks passed
@vpirogov vpirogov added this to the v3.6 milestone Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport platform:cpu-aarch64 Codeowner: @oneapi-src/onednn-cpu-aarch64
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants