Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gpu: generic: sycl: lnorm Intel GPU precision issues #2071

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions src/gpu/generic/sycl/ref_batch_normalization.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -84,9 +84,13 @@ status_t ref_batch_normalization_fwd_t::init(impl::engine_t *engine) {
= ::sycl::get_kernel_id<batch_normalization_fwd_kernel_vec_t>();
CHECK(create_kernel(engine, kid, &kernel_));
} else {
// Enabling the IEEE div compliant implementation
setenv("SYCL_PROGRAM_COMPILE_OPTIONS",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is not thread-safe to use setenv (e.g. another concurrent call to the same primitive might unset it after this one sets it).

"-cl-fp32-correctly-rounded-divide-sqrt", 1);
const auto kid = ::sycl::get_kernel_id<
batch_normalization_fwd_kernel_vec_t1>();
CHECK(create_kernel(engine, kid, &kernel_));
unsetenv("SYCL_PROGRAM_COMPILE_OPTIONS");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, here it does not restore the env var to its original value.

}
return status::success;
}
Expand Down
4 changes: 4 additions & 0 deletions src/gpu/generic/sycl/ref_layer_normalizations.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -82,9 +82,13 @@ status_t ref_layer_normalization_fwd_t::init(impl::engine_t *engine) {
= ::sycl::get_kernel_id<layer_normalization_fwd_kernel_vec_t>();
CHECK(create_kernel(engine, kid, &kernel_));
} else {
// Enabling the IEEE div compliant implementation
setenv("SYCL_PROGRAM_COMPILE_OPTIONS",
"-cl-fp32-correctly-rounded-divide-sqrt", 1);
const auto kid = ::sycl::get_kernel_id<
layer_normalization_fwd_kernel_vec1_t>();
CHECK(create_kernel(engine, kid, &kernel_));
unsetenv("SYCL_PROGRAM_COMPILE_OPTIONS");
}
return status::success;
}
Expand Down
10 changes: 10 additions & 0 deletions tests/benchdnn/lnorm/lnorm.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -450,7 +450,17 @@ void setup_cmp(compare::compare_t &cmp, const prb_t *prb, data_kind_t kind,
// exceeds `digits_f32`.
const int safe_digits = MAX2(0, digits_dt(dnnl_f32) - digits_dt(dt));
const float trh_coeff = (1 << safe_digits);
// The following code is a workaround because a SYCL compiler bug
// that is causing precision issues.
#ifdef DNNL_WITH_SYCL
float trh = trh_coeff
* ((kind == SRC || kind == DST
|| (kind == VAR && prb->dir & FLAG_FWD))
? 5e-7
: 0);
#else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still needed with the new option given to the kernel?

float trh = trh_coeff * ((kind == SRC || kind == DST) ? 5e-7 : 0);
#endif
if ((kind == SC || kind == SH) && prb->dir & FLAG_BWD)
trh = trh_coeff * 5e-6;
cmp.set_threshold(trh);
Expand Down