ci: Add initial regression test workflow #2356
Conversation
Hi @Ryo-not-rio, could you please describe what you are trying to achieve and why you chose this particular approach?
@dzarukin I've updated the PR message with that information.
Will this require further changes once #2388 goes in?
It shouldn't need to; there are no merge conflicts (for now).
Hi @Ryo-not-rio, thank you for adding more information. I feel there are better ways to do what you are trying to achieve. As you may know, this repo supports GitHub Actions with custom syntax. It seems that supporting that flow for AArch64 using dedicated private infrastructure underneath for performance validation purposes would be more beneficial and would give you more flexible control over the desired content and outcome.
@dzarukin I understand that aarch64 may be a less frequent use case, but we would like a check in the CI that prevents the merging of any PR which accidentally causes a regression. This includes patches for non-aarch64 machines that inadvertently cause a regression, and I think the best way to do that is to add to the existing CI. We already have unit testing specifically for aarch64, and the regression tests do not add a massive overhead, so I am struggling to understand the problem with extending them. The scripts we have added are also not architecture specific, so they can be reused to create regression tests for other architectures as well. If you are worried about the execution time of the CI, these tests add only about 5 minutes to the existing ci-aarch64 workflow: https://github.com/oneapi-src/oneDNN/actions/runs/12888105593/job/35932228418?pr=2356
I have some doubts about the methods used here, since the problem can't be solved in a general way.
Other related questions:
As I mentioned earlier, every team working on oneDNN solves performance validation in its own way, because performance targets and priority workloads are different for each dedicated hardware, and the way to measure what's important is also inconsistent between architectures. So I kindly suggest re-thinking the current approach and building something to run locally on a regular basis rather than trying to implement something on common ground.
tests/regression/inputs/conv
@@ -0,0 +1,22 @@
# *******************************************************************************
Consider using existing input files from tests/benchdnn/inputs, say conv/shapes_resnet_50 and matmul/shapes_bert_large. If you really need a specific set for AArch64, add something like inputs/conv/perf_aarch64.
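For illustration, those existing batch files would be picked up with the usual --batch option; a minimal sketch, assuming benchdnn is run from the tests/benchdnn directory (these exact invocations are not part of this PR):
benchdnn --conv --batch=inputs/conv/shapes_resnet_50
benchdnn --matmul --batch=inputs/matmul/shapes_bert_large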
For the precommit tests, we really want only a few specific tests, so would it not make more sense to have these tests easily accessible and in the same place within .github/automation?
If you are referring to benchdnn input files, then the rationale for keeping them in the inputs directory is to make manually reproducing issues with benchdnn a bit more straightforward. Compare:
benchdnn --matmul --batch=inputs/matmul/perf_aarch64
benchdnn --matmul --batch=<REPO_ROOT>/.github/automation/perf_aarch64
It also helps with reuse and maintenance of these input files. If you strongly feel that these should sit in automation, I'm fine with that as well.
In the case where the precommit performance tests fail, the failed tests would be shown to the user, so there wouldn't be a need to run the full batches. In the case where the user wants to run all the performance tests at once, they can run bench_performance.sh, so I would prefer to have these files in an easily accessible place since we're likely to update them.
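As a hypothetical illustration of that flow, the user would only need to re-run the single case reported by the CI rather than the whole batch, along the lines of (the descriptor below is a placeholder, not a real test case from this PR):
benchdnn --conv --mode=P <failing problem descriptor reported by the CI log>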
Based on our experience, avoiding regressions requires extensive coverage and infrastructure that can run relevant tests on demand. Running a few test points on every PR is unlikely to catch regressions. This consideration is not a blocker for this PR though; we can try it and see how this system works.
@Ryo-not-rio, do you know why the AArch64 CI did not trigger on this PR?
Ok, now the AArch64 CI did trigger, but the performance testing part was all skipped.
Consider adding a commit with an intentional regression to test that the check actually catches it.
@vpirogov Thanks for your comments. I have confirmed the pipeline works with a regression, and this PR is ready for review. cc: @theComputeKid
Context:
We are lacking regression testing for oneDNN on aarch64, and we would like to have a precommit pipeline, a nightly pipeline on main, and a manual pipeline that would enable us to run certain benchdnn tests as regression tests. This PR covers the precommit pipeline, where the PR branch is tested against the main branch.
The benchdnn test cases are very selective to avoid false positives and are chosen from the most expensive layers in selected PyTorch models. The list of tests is stored in ./tests/regression/inputs and will be updated as newer models come out. For now, I've added a general matmul case and a convolution case.
A known issue is the fluctuation in performance on AWS machines, which could cause false positives. To minimize this, the threshold of each test can be adjusted. Additionally, we have added a consistency check before the actual regression test, in which we benchmark main against itself 20 times; this should show little to no difference in performance. If the consistency check fails, we raise a warning to the user and ignore the result of the regression test.
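The pass/fail decision described above comes down to a threshold comparison between the two timings. Here is a minimal sketch of that check, assuming the average times from the main and PR runs have already been extracted into shell variables; the variable names and the 10% threshold are placeholders, not the values used in this PR:
# Hypothetical sketch: flag a regression if the PR branch is slower than main
# by more than THRESHOLD percent. baseline_ms and pr_ms are assumed inputs.
THRESHOLD=10
regression=$(echo "$pr_ms > $baseline_ms * (1 + $THRESHOLD / 100)" | bc -l)
if [ "$regression" -eq 1 ]; then
    echo "Possible regression: main=${baseline_ms} ms, PR=${pr_ms} ms"
    exit 1
fi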