Fix flaky compliance checks #291
Previously, the compliance check searched stdout and stderr for the strings "exited with code 127" or "exit status 127". It turns out that these strings may be missing in rare cases. To work around this, this commit checks the exit code of the relevant container directly, using the docker-compose --exit-code-from flag. The flag implies --abort-on-container-exit.
When running the actual test case, the interop runner also checks whether the test case is supported by an implementation. The same method cannot be applied there because we only get an exit code from a single service. However, the downside of not detecting an unsupported test case is not severe: it just results in a failed test. In contrast, a failed compliance check skips all test cases for that particular client and server combination.
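The difference between the two detection strategies can be sketched as follows. This is a minimal illustration, not the runner's actual code: the function names and the `cmd` parameter are hypothetical, and it assumes that a compliant implementation exits with code 127 during the check, as the discussion in this PR suggests. With `--exit-code-from client`, docker-compose returns the exit code of the `client` service as its own exit code, so the check no longer depends on what gets printed.

```python
import subprocess

# Assumption: a compliant implementation exits with this code during the check.
UNSUPPORTED_EXIT_CODE = 127


def is_compliant_by_output(output: str) -> bool:
    """Old approach: search combined stdout + stderr for marker strings."""
    return "exited with code 127" in output or "exit status 127" in output


def is_compliant_by_exit_code(returncode: int) -> bool:
    """New approach: rely on the exit code propagated by --exit-code-from."""
    return returncode == UNSUPPORTED_EXIT_CODE


def run_compliance_check(cmd: str) -> bool:
    """Hypothetical helper: run a shell command and judge it by exit code only."""
    proc = subprocess.run(
        cmd,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    return is_compliant_by_exit_code(proc.returncode)
```

In the runner, `cmd` would be the `docker-compose up ... --exit-code-from client ...` invocation. The string-based variant fails whenever docker-compose does not print the marker line, while the exit-code variant is unaffected by the log contents.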
Why would the string not be present in rare cases?
@@ -130,12 +130,12 @@ def _check_impl_is_compliant(self, name: str) -> bool:
 "DOWNLOADS=" + downloads_dir.name + " "
 'SCENARIO="simple-p2p --delay=15ms --bandwidth=10Mbps --queue=25" '
 "CLIENT=" + self._implementations[name]["image"] + " "
-"docker-compose up --timeout 0 --abort-on-container-exit -V sim client"
+"docker-compose up --timeout 0 --exit-code-from client -V sim client"
We’re starting multiple containers, and we can’t know which one exits first.
Why does the sim container exit before the client container? I think a compliant client exits first with exit code 127, then docker-compose kills sim.
I do not know. docker-compose might do some funny stuff, and/or it could be due to a race condition.
This issue shows up in the following jobs from a recent run:
https://github.com/marten-seemann/quic-interop-runner/actions/runs/3169239332/jobs/5161033391
https://github.com/marten-seemann/quic-interop-runner/actions/runs/3169239332/jobs/5161033427
But the ngtcp2 server is fully compliant in the other combinations.
365141c to e73ec56
I see the same issue too when I run locally: spurious "non-compliant" errors that usually go away on the next run.
@larseggert Does this PR fix the problem?