
ci: Add initial regression test workflow #2356

Merged · 4 commits into uxlfoundation:main from regression-ci · Feb 13, 2025

Conversation

Ryo-not-rio
Contributor

@Ryo-not-rio Ryo-not-rio commented Jan 8, 2025

Context:
We are lacking regression testing for oneDNN on aarch64, and we would like to have a precommit pipeline, a nightly pipeline on main and a manual pipeline that would enable us to run certain benchdnn tests as regression tests. This PR covers the precommit pipeline where the PR branch is tested against the main branch.

The benchdnn test cases are very selective to avoid false positives, and are chosen from the most expensive layers in selected PyTorch models. The list of tests is stored in ./tests/regression/inputs, and will be updated as newer models come out. For now, I've added a general matmul case and a convolution case.
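As a rough illustration only (the descriptor below is hypothetical; the actual cases live in ./tests/regression/inputs and may differ), a benchdnn matmul batch file is a plain-text list of driver options and problem descriptors:

```
# Hypothetical matmul batch file -- illustrative entries only,
# not the actual contents of ./tests/regression/inputs
--reset
--dt=f32
256x512:512x1024
```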

A known issue is the fluctuation in performance on AWS machines, which could cause false positives. To minimize this, the threshold of each test can be adjusted. Additionally, we have added a consistency check before the actual regression test: we run benchdnn 20 times on main and compare main against itself, which should show little to no difference in performance. If this consistency check fails, we raise a warning to the user and ignore the result of the regression test.
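The consistency-check-then-compare flow described above can be sketched as follows. This is a minimal illustration, not the actual workflow script: the function names, the use of the median, and the 5% thresholds are all assumptions.

```python
# Sketch of the regression-check logic described in the PR text.
# All names and thresholds here are hypothetical, not from the actual scripts.
from statistics import median

def is_regression(main_times, pr_times, threshold=0.05):
    """Flag a regression if the PR branch is slower than main by more
    than `threshold` (relative difference of median timings)."""
    base = median(main_times)
    cand = median(pr_times)
    return (cand - base) / base > threshold

def consistency_ok(run_a, run_b, noise_threshold=0.05):
    """Sanity check: comparing main against itself should show little to
    no difference. If it does not, the machine is too noisy to trust."""
    return not is_regression(run_a, run_b, noise_threshold)

# 20 timings each of main vs. main, and of the PR branch (simulated here)
main_a = [1.00, 1.01, 0.99, 1.02] * 5
main_b = [1.01, 1.00, 1.00, 0.99] * 5
pr     = [1.12, 1.10, 1.11, 1.13] * 5  # roughly 11% slower than main

if consistency_ok(main_a, main_b):
    print("regression" if is_regression(main_a, pr) else "ok")
else:
    print("warning: machine too noisy, ignoring regression result")
```

The median is used here instead of the mean to make the sketch less sensitive to a single noisy run, which matches the PR's stated goal of avoiding false positives.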

@Ryo-not-rio Ryo-not-rio force-pushed the regression-ci branch 2 times, most recently from 45d6ff0 to bb8cf2c Compare January 10, 2025 10:50
@Ryo-not-rio Ryo-not-rio force-pushed the regression-ci branch 2 times, most recently from 715a454 to a92ddf3 Compare January 20, 2025 18:49
@Ryo-not-rio Ryo-not-rio marked this pull request as ready for review January 20, 2025 21:06
@Ryo-not-rio Ryo-not-rio requested review from a team as code owners January 20, 2025 21:06
@dzarukin
Contributor

Hi @Ryo-not-rio, could you please describe what you are trying to achieve and why you chose this particular approach?

@Ryo-not-rio
Contributor Author

@dzarukin I've updated the PR message with the information

@theComputeKid
Contributor

Will this require further changes once #2388 goes in?

@Ryo-not-rio
Contributor Author

> Will this require further changes once #2388 goes in?

It shouldn't need to; there are no merge conflicts (for now).

@github-actions github-actions bot added devops Github automation component:tests Codeowner: @oneapi-src/onednn-arch labels Jan 21, 2025
@Ryo-not-rio Ryo-not-rio force-pushed the regression-ci branch 2 times, most recently from fef1c8a to faeb73d Compare January 21, 2025 13:33
@dzarukin
Contributor

> @dzarukin I've updated the PR message with the information

Hi @Ryo-not-rio, thank you for adding more information. I feel there are better ways to do what you are trying to achieve. As you may know, this repo supports GitHub Actions with custom syntax. Supporting that flow for aarch64, with dedicated private infrastructure underneath for performance validation, would be more beneficial and would give you more flexible control over the desired content and outcome.
I just don't see much value in the proposed solution for most users of the library, since it addresses a fairly narrow scope on a very limited set of platforms (AArch64 Linux only, with shell scripts?) and could either be done in a dedicated private repo or through the approach I've described above.

@Ryo-not-rio
Contributor Author

Ryo-not-rio commented Jan 22, 2025

@dzarukin I understand that aarch64 may be a less frequent use-case; however, we would like a check in the CI that would prevent the merging of any PR which accidentally causes a regression. This includes patches for non-aarch64 machines which inadvertently cause a regression, and I think the best way to do that is to add to the existing CI.

We already have unit testing specifically for aarch64, and the regression tests do not add a massive overhead, so I am struggling to understand the problem with extending them. The scripts we have added are also not architecture specific, so they can be reused to create regression tests for other architectures as well.

If you are worried about the execution time of the CI, these tests only add about 5 minutes to the existing ci-aarch64 workflow: https://github.com/oneapi-src/oneDNN/actions/runs/12888105593/job/35932228418?pr=2356

@dzarukin
Contributor

> @dzarukin I understand that aarch64 may be a less frequent use-case; however, we would like a check in the CI that would prevent the merging of any PR which accidentally causes a regression. This includes patches for non-aarch64 machines which inadvertently cause a regression, and I think the best way to do that is to add to the existing CI.
>
> We already have unit testing specifically for aarch64, and the regression tests do not add a massive overhead, so I am struggling to understand the problem with extending them. The scripts we have added are also not architecture specific, so they can be reused to create regression tests for other architectures as well.
>
> If you are worried about the execution time of the CI, these tests only add about 5 minutes to the existing ci-aarch64 workflow: https://github.com/oneapi-src/oneDNN/actions/runs/12888105593/job/35932228418?pr=2356

I have certain doubts about the methods used here, since the underlying problem can't be solved in general.
What's going to happen, from an administrative point of view, if:

  • A sporadic PR triggers a failure due to machine instability, which can be caused by many reasons (an extra job running, frequency changes, software updates, etc.)? Would that block the PR's promotion? If so, how is the developer supposed to address this gap?

Other related questions:

  • The coverage seems to be unified and hard-coded (at least I failed to find a way to customize it). Who decides what to cover and how important it is?
  • Once the coverage eats all of the machine-time bandwidth, what's going to happen?

As I mentioned earlier, every team working on oneDNN solves performance validation in its own way, because performance targets and priority workloads differ for each dedicated hardware, and the way of measuring what's important is also inconsistent between architectures.

So I kindly suggest re-thinking the current approach and building something to run locally on a regular basis, rather than trying to implement something on common ground.

@Ryo-not-rio Ryo-not-rio force-pushed the regression-ci branch 6 times, most recently from 580acbd to b8f249d Compare January 27, 2025 15:44
@@ -0,0 +1,22 @@
# *******************************************************************************
Contributor

@vpirogov vpirogov Jan 31, 2025

Consider using existing input files from tests/benchdnn/inputs, say conv/shapes_resnet_50 and matmul/shapes_bert_large.

If you really need specific set for AArch64 add something like inputs/conv/perf_aarch64.

Contributor Author

For the precommit tests, we really want only a few specific tests, so would it not make more sense to have them easily accessible and in one place within .github/automation?

Contributor

If you are referring to benchdnn input files, then the rationale for keeping them in the inputs directory is to make manually reproducing issues with benchdnn a bit more straightforward. Compare:

benchdnn --matmul --batch=inputs/matmul/perf_aarch64
benchdnn --matmul --batch=<REPO_ROOT>/.github/automation/perf_aarch64

It also helps with reuse and maintenance of these input files. If you feel strongly that these should sit in automation, I'm fine with that as well.

Contributor Author

In the case where the precommit performance tests fail, the failed tests would be shown to the user, so there wouldn't be a need to run the full batches. In the case where the user wants to run all the performance tests at once, they can run bench_performance.sh. I would therefore prefer to keep these files in an easily accessible place, since we're likely to update them.

@vpirogov
Contributor

vpirogov commented Jan 31, 2025

> I understand that aarch64 may be a less frequent use-case; however, we would like a check in the CI that would prevent the merging of any PR which accidentally causes a regression.

Based on our experience, avoiding regressions requires extensive coverage and infrastructure that can run relevant tests on demand. Running a few test points on every PR is unlikely to catch regressions. This consideration is not a blocker for this PR, though; we can try it and see how this system works.

@Ryo-not-rio Ryo-not-rio force-pushed the regression-ci branch 2 times, most recently from ef9d493 to ef04481 Compare February 5, 2025 16:48
@vpirogov
Contributor

vpirogov commented Feb 5, 2025

@Ryo-not-rio, do you know why AArch64 CI did not trigger on this PR?

@vpirogov
Contributor

vpirogov commented Feb 6, 2025

Ok, now AArch64 CI did trigger, but the entire performance-testing part was skipped.

@Ryo-not-rio Ryo-not-rio force-pushed the regression-ci branch 2 times, most recently from cea7384 to 17e7e19 Compare February 6, 2025 15:28
@vpirogov
Contributor

vpirogov commented Feb 6, 2025

Consider adding a commit with intentional regression to test that the check actually catches it.

@Ryo-not-rio Ryo-not-rio force-pushed the regression-ci branch 3 times, most recently from 5c122b6 to b949ce3 Compare February 11, 2025 16:58
@Ryo-not-rio
Contributor Author

@vpirogov Thanks for your comments. I have confirmed that the pipeline catches an intentional regression, and this PR is ready for review.

cc: @theComputeKid

@theComputeKid theComputeKid merged commit 7dce153 into uxlfoundation:main Feb 13, 2025
24 checks passed