revive rhel-9.3 branch #19734
Merged
Cherry-picked from 61c14cb
Cherry-picked from 7b8bfe1
This is quite literally what it is defined to do. This races with the CDP driver finishing the command, so sometimes it would fail the test on throwing that RuntimeError. Cherry-picked from a324b3f
Just yanking out the disk will leave debris behind in /dev. Cherry-picked from 5e1bafc
Stop pretending that we can accurately predict the next `OnBoot` timer in TestServices.testTimerSession. It is very much *not* "now + 200 minutes", but "200 minutes after the current VM booted" (which may be long-running in Testing Farm or our CI machinery). As this is a never-ending race condition in the evenings, and we don't test the accuracy of systemd here, relax the check to just ensure that it happens today or tomorrow. Cherry-picked from 2f149fd
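The relaxed check described above could look roughly like the following. This is only a sketch: `runs_today_or_tomorrow` is a hypothetical helper name, not the function used in the actual test, and it assumes both timestamps are naive datetimes in the same time zone.

```python
from datetime import datetime, timedelta


def runs_today_or_tomorrow(next_run: datetime, now: datetime) -> bool:
    """Return True if next_run falls on the same calendar day as now, or the day after.

    Hypothetical helper: it deliberately ignores the time of day, since the
    exact trigger time depends on when the VM booted.
    """
    today = now.date()
    tomorrow = today + timedelta(days=1)
    return next_run.date() in (today, tomorrow)
```

Comparing only calendar dates avoids the evening race: a timer due 200 minutes after boot can land on either side of midnight, and both outcomes are accepted.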
It is not clear what exactly keeps /dev/sda1 busy when the kernel tries to read the new partition table. It can't be the artificial processes and services started by the test itself since unmounting and locking have already succeeded at that point. This bug happens only in quite specific conditions, and can't be expected to ever get fixed. So let's do what every user would do as well: Retry the dialog. Cherry-picked from 530c70d
martinpitt force-pushed the r93-revive branch from d014093 to 6e4641e (December 12, 2023 03:57)
Failures in current rhel-9.3 branch:
I pushed a first set of cherry-picks to adjust tests for the current services image and fix some of the worse flakes.
…tion Since commit 49ee017, the step that takes long is already the `Browser.open()`, as that loads the packages and frame from the remote machine, and the timeout now happens in `waitPageLoad()`. Still, loading the frame also takes a while, so keep the long timeout for enter-page() as well. Cherry-picked from b28914f.
In our CI, testLogs sometimes does not get a `START` and then fails. From the failed log it looks like this is due to the journal being rotated. The user test case already sleeps to let journalctl settle, so now we unify this approach.
    Nov 08 13:13:09 ubuntu systemd[1]: Started test.service - Test Service.
    Nov 08 13:13:09 ubuntu test-service[2427]: START
    Nov 08 13:13:09 ubuntu systemd-journald[271]: Received client request to rotate journal.
    Nov 08 13:19:51 ubuntu systemd-journald[271]: Received client request to rotate journal.
    Nov 08 13:19:52 ubuntu test-service[2480]: START
    Nov 08 13:19:57 ubuntu test-service[2480]: WORKING
Cherry-picked from 107c855
Close the crypto policy dialog after checking the default value. Leaving it open and clicking around on the main page is cheating and prone to race conditions, and will fail with the next commit. Cherry-picked from 8a93308
Tests like TestStorageUsed.testTeardownRetry run processes that keep a scsi_debug block device mount busy. If they fail on some assertion in the middle, the generic storage cleanup (umount, rmmod scsi_debug) fails, and the following tests get broken. Add an `fuser` kill loop to prevent that. Also show all stdout output from these commands. We don't need it returned in the code, it's more useful for developers in the test output. Cherry-picked from a45210a
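A kill loop of this kind might be sketched as follows. This is an illustration only, not the code from the commit: `kill_mount_users`, its parameters, and the injectable `run` argument are assumptions made for the example. It relies on `fuser` from psmisc, whose `-k` option kills processes accessing the file and whose `-m` option selects all processes using the given mount; `fuser` exits non-zero once no process accesses it any more.

```python
import subprocess
import time


def kill_mount_users(mountpoint, attempts=10, delay=1.0, run=subprocess.run):
    """Kill processes that keep `mountpoint` busy until none are left.

    Hypothetical helper. Returns True once fuser reports no remaining users,
    False if processes are still present after `attempts` rounds.
    """
    for _ in range(attempts):
        # stdout/stderr are not captured, so they show up in the test output
        result = run(["fuser", "-k", "-m", mountpoint], check=False)
        if result.returncode != 0:
            # non-zero exit: nothing is accessing the mount any more
            return True
        time.sleep(delay)
    return False
```

After the loop reports success, the generic cleanup (umount, rmmod scsi_debug) has a chance to succeed even when the test failed halfway through.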
This test barely makes it within the default 10-minute timeout. From what I see, most of the time is spent waiting for multiple reboots of the machine. Locally this took almost 7 minutes to run, so for CI we can bump this timeout to 20 minutes. Cherry-picked from d2ccfc0
With the impending services image refresh [1] and the new Samba container, user creation is not instantaneous any more. Add a retry loop. [1] cockpit-project/bots#4885 Cherry-picked from 7f12811
ldapmodify is not available in the quay.io/samba.org/samba-ad-server container, and it has serious trouble authenticating. But the newer Samba now supports `samba-tool user edit`. Use that with a non-interactive edit script instead. Cherry-picked from b88436b
Adjust the data host CSS selector. The new services image auto-enables the PCP plugin, so that hack can go. Unfortunately the new version now tries to download the plugin catalog in the background, and there is no working way to disable that. This breaks the test at a random place. Anticipate, wait for, and ignore that error. Cherry-picked from 7c04205
Use the officially recommended /status route, which we expect to actually succeed (unlike /candlepin, which is just a redirect). Add curl `--fail` to ensure a non-zero exit code while it fails. Cherry-picked from b2f0b4f
Apparently recent Samba/AD is a bit slower now. Cherry-picked from ff0c229
First wait for the realm user to exist before using it in chown. D'oh! Cherry-picked from fdca31b
Cherry-picked from b76a50f
Cherry-picked from 9da9229
In most cases this is fast, but quite often Samba takes annoyingly long to answer. Make the timeout consistent and enforce this with helper functions, except for the instance in TestPackageInstall as that doesn't derive from CommonTests. Cherry-picked from 9da9229
Restarting sssd in a loop is prone to run into
> systemd[1]: sssd.service: Start request repeated too quickly.
> systemd[1]: sssd.service: Failed with result 'start-limit-hit'.
Cherry-picked from 68d2eb7
With 30 seconds we are running into occasional timeout failures. Cherry-picked from 6ef43c6
Restarting sssd that often causes state corruption, as it often cannot initialize in 5s. It's also too much fiddling with the OS -- joining a domain should make the users available automatically, otherwise this is a bug. This works fine with IPA, and doesn't regress AD either. testUnqualifiedUsers() already does it that way, too. Cherry-picked from c055b47
The current service image's samba container does not look at that any more, and we also stopped using `ldapmodify`. Cherry-picked from 4727d48
https://bugzilla.redhat.com/show_bug.cgi?id=1839805 got fixed long ago. Cherry-picked from 77a329c
Password authentication sometimes fails on the first try. Cherry-picked from a61bb41
Grab the candlepin server's CA and install it both into rhsm and the general system (for `curl`). This tests subscription-manager more realistically, without having to yell "insecure" all the time. Also simplify and robustify the waiting loop. Previously, the loop could just end with 200 failures, and the test would go on. Now it will time out. Also lower the 6-minute timeout to the default 2 minutes -- starting up candlepin only takes a few seconds on our current image. Cherry-picked from 564717f
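The difference between a loop that silently runs out of attempts and one that fails on timeout can be sketched like this. The helper name `wait_for` and its parameters are hypothetical and not taken from the test suite; the actual test does its waiting differently.

```python
import time


def wait_for(check, timeout=120.0, delay=1.0):
    """Poll check() until it returns a true value, or raise TimeoutError.

    Hypothetical helper: unlike a plain bounded "for" loop that simply falls
    through after its last attempt, running out of time is an explicit
    failure here, so the caller cannot carry on with a service that never
    came up.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(delay)
```

A caller would pass a function that probes the service, for example one that returns True once a status request succeeds.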
…-project#19667) Later Grafana versions [1] fixed the page crash on "Failed to fetch plugins from catalog", and just log it to the console now. That will make the "wait for false" loop time out and eventually fail. If that happens, then all is actually well. [1] cockpit-project/bots#5601 Cherry-picked from de7ab98
With the latest service refresh [1] Grafana now handles being offline correctly. [1] cockpit-project/bots#5601 Cherry-picked from e8e4bda
When e.g. TestStorageswap.test fails in the middle, the active swap partition on the scsi_debug driver will prevent the module removal, and break all subsequent tests. Helps with cockpit-project#19683 Cherry-picked from 6c3986d
martinpitt force-pushed the r93-revive branch from 6e4641e to c17189f (December 12, 2023 09:19)
jelly approved these changes Dec 12, 2023
Thanks, assuming storage goes ✔️
We have some more z-stream updates to do. Resuscitate this branch first to bring CI back to green. These are only clean backported test fixes; code changes will happen after this. This needs an updated naughty pattern from cockpit-project/bots#5667 to "fix" the NBDE test.