
ci: Add initial regression test workflow #2356

Merged · 4 commits into uxlfoundation:main from regression-ci · Feb 13, 2025

Conversation

Ryo-not-rio
Contributor

@Ryo-not-rio Ryo-not-rio commented Jan 8, 2025

Context:
We are lacking regression testing for oneDNN on aarch64, and we would like to have a precommit pipeline, a nightly pipeline on main and a manual pipeline that would enable us to run certain benchdnn tests as regression tests. This PR covers the precommit pipeline where the PR branch is tested against the main branch.

The benchdnn test cases are very selective to avoid false positives, and are chosen from the most expensive layers in selected PyTorch models. The list of tests is stored in ./tests/regression/inputs, and will be updated as newer models come out. For now, I've added a general matmul case and a convolution case.
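As a rough illustration only (the descriptor below is hypothetical; the actual cases live in ./tests/regression/inputs and may differ), a benchdnn matmul batch file is a plain-text list of driver options and problem descriptors:

```
# Hypothetical matmul batch file -- illustrative entries only,
# not the actual contents of ./tests/regression/inputs
--reset
--dt=f32
256x512:512x1024
```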

A known issue is the fluctuation in performance on AWS machines, which could cause false positives. To minimize this, the threshold of each test can be adjusted. Additionally, we have added a consistency check before the actual regression test: we run benchdnn 20 times on main and compare main against itself, which should show little to no difference in performance. If this consistency check fails, we raise a warning to the user and ignore the result of the regression test.
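The consistency-check-then-compare flow described above can be sketched as follows. This is a minimal illustration, not the actual workflow script: the function names, the use of the median, and the 5% thresholds are all assumptions.

```python
# Sketch of the regression-check logic described in the PR text.
# All names and thresholds here are hypothetical, not from the actual scripts.
from statistics import median

def is_regression(main_times, pr_times, threshold=0.05):
    """Flag a regression if the PR branch is slower than main by more
    than `threshold` (relative difference of median timings)."""
    base = median(main_times)
    cand = median(pr_times)
    return (cand - base) / base > threshold

def consistency_ok(run_a, run_b, noise_threshold=0.05):
    """Sanity check: comparing main against itself should show little to
    no difference. If it does not, the machine is too noisy to trust."""
    return not is_regression(run_a, run_b, noise_threshold)

# 20 timings each of main vs. main, and of the PR branch (simulated here)
main_a = [1.00, 1.01, 0.99, 1.02] * 5
main_b = [1.01, 1.00, 1.00, 0.99] * 5
pr     = [1.12, 1.10, 1.11, 1.13] * 5  # roughly 11% slower than main

if consistency_ok(main_a, main_b):
    print("regression" if is_regression(main_a, pr) else "ok")
else:
    print("warning: machine too noisy, ignoring regression result")
```

The median is used here instead of the mean to make the sketch less sensitive to a single noisy run, which matches the PR's stated goal of avoiding false positives.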

@Ryo-not-rio Ryo-not-rio force-pushed the regression-ci branch 2 times, most recently from 45d6ff0 to bb8cf2c Compare January 10, 2025 10:50
@Ryo-not-rio Ryo-not-rio force-pushed the regression-ci branch 2 times, most recently from 715a454 to a92ddf3 Compare January 20, 2025 18:49
@Ryo-not-rio Ryo-not-rio marked this pull request as ready for review January 20, 2025 21:06
@Ryo-not-rio Ryo-not-rio requested review from a team as code owners January 20, 2025 21:06
@dzarukin
Contributor

Hi @Ryo-not-rio, could you please describe what you are trying to achieve and why you chose this particular approach?

@Ryo-not-rio
Contributor Author

@dzarukin I've updated the PR message with the information

@theComputeKid
Contributor

Will this require further changes once #2388 goes in?

@Ryo-not-rio
Contributor Author

> Will this require further changes once #2388 goes in?

It shouldn't need to; there are no merge conflicts (for now).

@github-actions github-actions bot added devops Github automation component:tests Codeowner: @oneapi-src/onednn-arch labels Jan 21, 2025
@Ryo-not-rio Ryo-not-rio force-pushed the regression-ci branch 2 times, most recently from fef1c8a to faeb73d Compare January 21, 2025 13:33
@dzarukin
Contributor

> @dzarukin I've updated the PR message with the information

Hi @Ryo-not-rio, thank you for adding more information. I feel there are better ways to do what you are trying to achieve. As you may know, this repo supports GitHub Actions with custom syntax. Supporting that flow for aarch64, with dedicated private infrastructure underneath for performance validation, would be more beneficial and would give you more flexible control over the desired content and outcome.
I just don't see much value in the proposed solution for most users of the library, since it addresses a fairly narrow scope on a very limited set of platforms (AArch64 Linux only, with shell scripts?) and could either be done in a dedicated private repo or through the approach I've described above.

@Ryo-not-rio
Contributor Author

Ryo-not-rio commented Jan 22, 2025

@dzarukin I understand that aarch64 may be a less frequent use-case; however, we would like a check in the CI that would prevent the merging of any PR which accidentally causes a regression. This includes patches for non-aarch64 machines which inadvertently cause a regression, and I think the best way to do that is to add to the existing CI.

We already have unit testing specifically for aarch64, and the regression tests do not add a massive overhead, so I am struggling to understand the problem with extending them. The scripts we have added are also not architecture specific, so they can be reused to create regression tests for other architectures as well.

If you are worried about the execution time of the CI, these tests only add about 5 minutes to the existing ci-aarch64 workflow: https://github.com/oneapi-src/oneDNN/actions/runs/12888105593/job/35932228418?pr=2356

@dzarukin
Contributor

> @dzarukin I understand that aarch64 may be a less frequent use-case; however, we would like a check in the CI that would prevent the merging of any PR which accidentally causes a regression. This includes patches for non-aarch64 machines which inadvertently cause a regression, and I think the best way to do that is to add to the existing CI.
>
> We already have unit testing specifically for aarch64, and the regression tests do not add a massive overhead, so I am struggling to understand the problem with extending them. The scripts we have added are also not architecture specific, so they can be reused to create regression tests for other architectures as well.
>
> If you are worried about the execution time of the CI, these tests only add about 5 minutes to the existing ci-aarch64 workflow: https://github.com/oneapi-src/oneDNN/actions/runs/12888105593/job/35932228418?pr=2356

I have certain doubts about the methods used here, since the underlying problem can't be solved in general.
What's going to happen, from an administrative point of view, if:

  • A sporadic PR triggers a failure due to machine instability, which can be caused by many reasons (an extra job running, frequency changes, software updates, etc.)? Would that block the PR's promotion? If so, how is the developer supposed to address this gap?

Other related questions:

  • The coverage seems to be unified and hard-coded (at least I failed to find a way to customize it). Who decides what to cover and how important it is?
  • Once the coverage eats all of the machine-time bandwidth, what's going to happen?

As I mentioned earlier, every team working on oneDNN solves performance validation in its own way, because performance targets and priority workloads differ for each dedicated hardware, and the way of measuring what's important is also inconsistent between architectures.

So I kindly suggest re-thinking the current approach and building something to run locally on a regular basis, rather than trying to implement something on common ground.

@Ryo-not-rio Ryo-not-rio force-pushed the regression-ci branch 6 times, most recently from 580acbd to b8f249d Compare January 27, 2025 15:44
@@ -0,0 +1,22 @@
# *******************************************************************************
Contributor

@vpirogov vpirogov Jan 31, 2025

Consider using existing input files from tests/benchdnn/inputs, say conv/shapes_resnet_50 and matmul/shapes_bert_large.

If you really need specific set for AArch64 add something like inputs/conv/perf_aarch64.

Contributor Author

For the precommit tests, we really want only a few specific tests, so would it not make more sense to have them easily accessible and in one place within .github/automation?

Contributor

If you are referring to benchdnn input files, then the rationale for keeping them in the inputs directory is to make manually reproducing issues with benchdnn a bit more straightforward. Compare:

benchdnn --matmul --batch=inputs/matmul/perf_aarch64
benchdnn --matmul --batch=<REPO_ROOT>/.github/automation/perf_aarch64

It also helps with reuse and maintenance of these input files. If you feel strongly that these should sit in automation, I'm fine with that as well.

Contributor Author

In the case where the precommit performance tests fail, the failed tests would be shown to the user, so there wouldn't be a need to run the full batches. In the case where the user wants to run all the performance tests at once, they can run bench_performance.sh. I would therefore prefer to keep these files in an easily accessible place, since we're likely to update them.

@vpirogov
Contributor

vpirogov commented Jan 31, 2025

> I understand that aarch64 may be a less frequent use-case; however, we would like a check in the CI that would prevent the merging of any PR which accidentally causes a regression.

Based on our experience, avoiding regressions requires extensive coverage and infrastructure that can run relevant tests on demand. Running a few test points on every PR is unlikely to catch regressions. This consideration is not a blocker for this PR, though; we can try it and see how this system works.

@Ryo-not-rio Ryo-not-rio force-pushed the regression-ci branch 2 times, most recently from ef9d493 to ef04481 Compare February 5, 2025 16:48
@vpirogov
Contributor

vpirogov commented Feb 5, 2025

@Ryo-not-rio, do you know why AArch64 CI did not trigger on this PR?

@vpirogov
Contributor

vpirogov commented Feb 6, 2025

Ok, now AArch64 CI did trigger, but the entire performance-testing part was skipped.

@Ryo-not-rio Ryo-not-rio force-pushed the regression-ci branch 2 times, most recently from cea7384 to 17e7e19 Compare February 6, 2025 15:28
@vpirogov
Contributor

vpirogov commented Feb 6, 2025

Consider adding a commit with intentional regression to test that the check actually catches it.

@Ryo-not-rio Ryo-not-rio force-pushed the regression-ci branch 3 times, most recently from 5c122b6 to b949ce3 Compare February 11, 2025 16:58
@Ryo-not-rio
Contributor Author

@vpirogov Thanks for your comments. I have confirmed that the pipeline catches an intentional regression, and this PR is ready for review.

cc: @theComputeKid

@theComputeKid theComputeKid merged commit 7dce153 into uxlfoundation:main Feb 13, 2025
24 checks passed