-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cpu: aarch64: hot fix for segfault in cached winograd #2151
Conversation
…ution primitive Signed-off-by: Ye Tao <ye.tao@arm.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even at its worst, it is better than having a segfault and the wrong answer.
@@ -59,7 +59,7 @@ struct acl_wino_convolution_fwd_t : public primitive_t { | |||
private: | |||
status_t execute_forward(const exec_ctx_t &ctx) const; | |||
const pd_t *pd() const { return (const pd_t *)primitive_t::pd().get(); } | |||
std::unique_ptr<acl_obj_t<Op>> acl_obj_; | |||
mutable std::unique_ptr<acl_obj_t<Op>> acl_obj_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not declare acl_obj_ directly in execute forward?
IIUC, this patch would still break in cache of concurrent execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @mgouicem, i've added the mutex following your comment. would you like to take a review?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I am still confused by the use of the mutex. Why not make the object local to the execute function call? Each execute call would have his own object and this will remove any need for synchronization.
Furthermore, as currently implemented, I believe you might still have an issue in case of concurrent execution as only the initialization is mutex protected. For example if you have
t0: lock -> acl_obj init *, dtor previous -> unlock -> acl_obj.get() -> ... -> using handle
t1: lock -> acl_obj init, dtor previous *-> unlock -> acl_obj.get() -> ...
In the above example, t0 will get some pointer to object upon acl_obj.get() call, but the underlying object will be destroyed by t1 acl_obj init.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi, @mgouicem, thanks for your help, i've changed the acl_obj to a local variable.
FYI, I'm kind of confused about what kind of synchronisation should be posed on oneDNN primitives and how the primitives cache works? could you please share some docs/info?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding synchronization, primitive.execute function should be const qualified. So because it is not supposed to change any state of the primitive object, there should be no sync necessary in execution.
We do have synchronizations happening upon primitive cache accesses (single writer, multiple readers). This should be transparent from primitive implementation.
Regarding scratchpad, when user managed, scratchpad is just another output from a primitive perspective (so user responsibility to sync when needed). When library managed, it is trickier as it is not thread-safe by default. You can find more info in the scratchpad devguide page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks mgouicem, that's really helpful!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to make it thread-safe.
… winograd conv Signed-off-by: Ye Tao <ye.tao@arm.com>
…v and winograd conv without lock. Signed-off-by: Ye Tao <ye.tao@arm.com> Change-Id: Ifb30292a8bfc5219c44515eb4d29b277a0f0b24a Reviewed-on: https://eu-gerrit-1.euhpc.arm.com/c/oncpuml/oneDNN/+/680780 Tested-by: svc_mongoosetron <svc_mongoosetron@arm.com> Reviewed-by: Hamza Butt <hamza.butt@arm.com> IP-Review: Hamza Butt <hamza.butt@arm.com>
Description
Same PR as #2149 but to backport to release v3.6.
Checklist
General
make test
andmake test_benchdnn_*
) pass locally for each commit?