revive rhel-9.3 branch #19734

martinpitt · 2023-12-12T03:07:37Z

We have some more z-stream updates to do. Resuscitate this branch first to bring CI back to green. These are only clean backported test fixes; code changes will happen after this. This needs an updated naughty pattern from cockpit-project/bots#5667 to "fix" the NBDE test.

Cherry-picked from 61c14cb

Cherry-picked from 7b8bfe1

This is quite literally what it is defined to do. This races with the CDP driver finishing the command, so sometimes it would fail the test on throwing that RuntimeError. Cherry-picked from a324b3f

Just yanking out the disk will leave debris behind in /dev. Cherry-picked from 5e1bafc

Stop pretending that we can accurately predict the next `OnBoot` timer in TestServices.testTimerSession. It is very much *not* "now + 200 minutes", but "200 minutes after the current VM booted" (which may be long-running in Testing Farm or our CI machinery). As this is a neverending race condition in evenings, and we don't test the accuracy of systemd here, relax the check to just ensure that it happens today or tomorrow. Cherry-picked from 2f149fd
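To illustrate the relaxed check, here is a minimal sketch, not the actual test code. The helper name `is_today_or_tomorrow` and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def is_today_or_tomorrow(trigger: datetime, now: datetime) -> bool:
    """Return True if the timer's next trigger falls on today's or tomorrow's date."""
    today = now.date()
    return trigger.date() in (today, today + timedelta(days=1))


# Hypothetical evening run: the VM booted an hour before the test started,
# so the OnBoot timer fires 200 minutes after boot, not 200 minutes after "now".
now = datetime(2023, 12, 12, 22, 30)
boot = now - timedelta(minutes=60)
trigger = boot + timedelta(minutes=200)

print(trigger == now + timedelta(minutes=200))  # the old exact prediction does not hold
print(is_today_or_tomorrow(trigger, now))       # the relaxed check still passes
```

The relaxed check deliberately ignores the time of day, so it does not depend on how long the VM has been running.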

It is not clear what exactly keeps /dev/sda1 busy when the kernel tries to read the new partition table. It can't be the artificial processes and services started by the test itself since unmounting and locking have already succeeded at that point. This bug happens only in quite specific conditions, and can't be expected to ever get fixed. So let's do what every user would do as well: Retry the dialog. Cherry-picked from 530c70d

martinpitt · 2023-12-12T03:58:04Z

Failures in current rhel-9.3 branch:

I pushed a first set of cherry-picks to adjust tests for the current services image and fix some of the worse flakes.

…tion Since commit 49ee017, the step that takes long is already the `Browser.open()`, as that loads the packages and frame from the remote machine, and the timeout now happens in `waitPageLoad()`. Still, loading the frame also takes a while, so keep the long timeout for enter-page() as well. Cherry-picked from b28914f.

In our CI, testLogs sometimes does not get a `START` and then fails. From the failed log it looks like this is due to the journal being rotated. The user test case already sleeps to let journalctl settle, so now we unify this approach.

Nov 08 13:13:09 ubuntu systemd[1]: Started test.service - Test Service.
Nov 08 13:13:09 ubuntu test-service[2427]: START
Nov 08 13:13:09 ubuntu systemd-journald[271]: Received client request to rotate journal.
Nov 08 13:19:51 ubuntu systemd-journald[271]: Received client request to rotate journal.
Nov 08 13:19:52 ubuntu test-service[2480]: START
Nov 08 13:19:57 ubuntu test-service[2480]: WORKING

Cherry-picked from 107c855

Close the crypto policy dialog after checking the default value. Leaving it open and clicking around on the main page is cheating and prone to race conditions, and will fail with the next commit. Cherry-picked from 8a93308

Tests like TestStorageUsed.testTeardownRetry run processes that keep a scsi_debug block device mount busy. If they fail on some assertion in the middle, the generic storage cleanup (umount, rmmod scsi_debug) fails, and the following tests get broken. Add an `fuser` kill loop to prevent that. Also show all stdout output from these commands. We don't need it returned in the code, it's more useful for developers in the test output. Cherry-picked from a45210a
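A rough sketch of such a kill loop, assuming the `fuser` tool from psmisc with its `-k` (kill) and `-m` (mount) options. The function name, the attempt count, and the injectable `run` parameter are hypothetical and only there to keep the example self-contained; the real cleanup code may look quite different.

```python
import subprocess
import time


def kill_users_and_unmount(mountpoint, run=subprocess.run, attempts=10, delay=1.0):
    """Kill processes that keep `mountpoint` busy until umount succeeds.

    Output is not captured, so stdout and stderr of both commands end up
    in the test log. Returns True once umount succeeded, False otherwise.
    """
    for _ in range(attempts):
        # fuser exits non-zero when nothing uses the mount; that is fine here
        run(["fuser", "-k", "-m", mountpoint], check=False)
        if run(["umount", mountpoint], check=False).returncode == 0:
            return True
        time.sleep(delay)
    return False
```

Bounding the number of attempts keeps a stuck mount from hanging the whole cleanup.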

This test barely makes it within the default 10 minute timeout. From what I see, most of the time is spent waiting for multiple reboots of the machine. Locally this took almost 7 minutes to run, so for CI we can bump this timeout to 20 minutes. Cherry-picked from d2ccfc0

With the impending services image refresh [1] and the new Samba container, user creation is not instantaneous any more. Add a retry loop. [1] cockpit-project/bots#4885 Cherry-picked from 7f12811
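A retry loop of this kind could look roughly like the following. This is a generic sketch; `retry` and its parameters are illustrative names, not helpers from the cockpit test library.

```python
import time


def retry(action, attempts=5, delay=1.0, sleep=time.sleep):
    """Call `action` until it succeeds, re-raising the error of the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            # give the slow service some time before trying again
            sleep(delay)
```

A test would wrap the user creation command in `action`, so that a slow container only delays the test instead of failing it.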

ldapmodify is not available in the quay.io/samba.org/samba-ad-server container, and it has serious trouble authenticating. But the newer Samba now supports `samba-tool user edit`. Use that with a non-interactive edit script instead. Cherry-picked from b88436b

Adjust the data host CSS selector. The new services image auto-enables the PCP plugin, so that hack can go. Unfortunately the new version now tries to download the plugin catalog in the background, and there is no working way to disable that. This breaks the test at a random place. Anticipate, wait for, and ignore that error. Cherry-picked from 7c04205

Use the officially recommended /status route, which we expect to actually succeed (unlike /candlepin, which is just a redirect). Add curl `--fail` to ensure a non-zero exit code while the request fails. Cherry-picked from b2f0b4f

Apparently recent Samba/AD is a bit slower now. Cherry-picked from ff0c229

First wait for the realm user to exist before using it in chown. D'oh! Cherry-picked from fdca31b

Cherry-picked from b76a50f

Cherry-picked from 9da9229

In most cases this is fast, but quite often Samba takes annoyingly long to answer. Make the timeout consistent and enforce this with helper functions, except for the instance in TestPackageInstall as that doesn't derive from CommonTests. Cherry-picked from 9da9229

Restarting sssd in a loop is prone to run into

> systemd[1]: sssd.service: Start request repeated too quickly.
> systemd[1]: sssd.service: Failed with result 'start-limit-hit'.

Cherry-picked from 68d2eb7

With 30 seconds we are running into occasional timeout failures. Cherry-picked from 6ef43c6

Restarting sssd that often causes state corruption, as it often cannot initialize in 5s. It's also too much fiddling with the OS -- joining a domain should make the users available automatically, otherwise this is a bug. This works fine with IPA, and doesn't regress AD either. testUnqualifiedUsers() already does it that way, too. Cherry-picked from c055b47

The current service image's samba container does not look at that any more, and we also stopped using `ldapmodify`. Cherry-picked from 4727d48

https://bugzilla.redhat.com/show_bug.cgi?id=1839805 got fixed long ago. Cherry-picked from 77a329c

Password authentication sometimes fails on the first try. Cherry-picked from a61bb41

Grab the candlepin server's CA and install it both into rhsm and the general system (for `curl`). This tests subscription-manager more realistically, without having to yell "insecure" all the time. Also simplify and robustify the waiting loop. Previously, the loop could just end with 200 failures, and the test would go on. Now it will time out. Also lower the 6 minute timeout to the default 2 minutes -- starting up candlepin only takes a few seconds on our current image. Cherry-picked from 564717f
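The difference between a loop that silently runs out of attempts and one that times out can be sketched like this. The helper `wait_for` and its parameters are hypothetical; the `clock` and `sleep` arguments are only injectable to keep the example easy to exercise.

```python
import time


def wait_for(probe, timeout=120.0, interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll `probe` until it returns a true value; raise TimeoutError after `timeout` seconds."""
    deadline = clock() + timeout
    while True:
        if probe():
            return
        if clock() >= deadline:
            # fail loudly instead of letting the test continue against a dead service
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(interval)
```

In the candlepin case, `probe` would run the `curl --fail` status request and report whether it exited with zero.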

…-project#19667) Later Grafana versions [1] fixed the page crash on "Failed to fetch plugins from catalog", and just log it to the console now. That will make the "wait for false" loop timeout and eventually fail. If that happens, then all is actually well. [1] cockpit-project/bots#5601 Cherry-picked from de7ab98

With the latest service refresh [1] Grafana now handles being offline correctly. [1] cockpit-project/bots#5601 Cherry-picked from e8e4bda

When e.g. TestStorageswap.test fails in the middle, the active swap partition on the scsi_debug driver will prevent the module removal, and break all subsequent tests. Helps with cockpit-project#19683 Cherry-picked from 6c3986d

jelly

Thanks, assuming storage goes ✔️

martinpitt mentioned this pull request Dec 12, 2023

Revert "testmap: Retire cockpit rhel-9.3 branch" cockpit-project/bots#5667

Merged

martinpitt and others added 6 commits December 12, 2023 04:28

test: Factorize cockpit-ws startup in TestClient

d523a49

Cherry-picked from 61c14cb

test: Use standard logout test API in check-client

2904ed6

Cherry-picked from 7b8bfe1

test: Accept destroyed execution context when clicking logout

e68c736

This is quite literally what it is defined to do. This races with the CDP driver finishing the command, so sometimes it would fail the test on throwing that RuntimeError. Cherry-picked from a324b3f
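A hedged sketch of the idea: tolerate the one error that logging out is expected to cause, and let everything else propagate. `FakeBrowser`, `click_logout`, the `#logout` selector, and the matched message text are assumptions for illustration, not the real test driver API.

```python
class FakeBrowser:
    """Hypothetical stand-in for the CDP-backed test browser."""

    def __init__(self, error=None):
        self.error = error
        self.clicked = []

    def click(self, selector):
        self.clicked.append(selector)
        if self.error is not None:
            raise self.error


def click_logout(browser, selector="#logout"):
    """Click logout, accepting that the page's execution context may vanish mid-command."""
    try:
        browser.click(selector)
    except RuntimeError as ex:
        # Assumed wording of the driver's error; anything else is a real failure
        if "Execution context was destroyed" not in str(ex):
            raise
```

Matching on the message keeps unrelated `RuntimeError`s visible instead of swallowing them.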

test: Cleanup volume group in one non-destructive test

3f35d12

Just yanking out the disk will leave debris behind in /dev. Cherry-picked from 5e1bafc

martinpitt force-pushed the r93-revive branch from d014093 to 6e4641e Compare December 12, 2023 03:57

martinpitt added the "backport" label (apply a commit from master to a stable branch) Dec 12, 2023

martinpitt and others added 20 commits December 12, 2023 10:13

test: Close modal in TestSystemInfo.testCryptoPolicies

458ea56

Close the crypto policy dialog after checking the default value. Leaving it open and clicking around on the main page is cheating and prone to race conditions, and will fail with the next commit. Cherry-picked from 8a93308

test: Allow samba user creation to take some time

3ddec5f

With the impending services image refresh [1] and the new Samba container, user creation is not instantaneous any more. Add a retry loop. [1] cockpit-project/bots#4885 Cherry-picked from 7f12811

test: Robustify waiting for candlepin

57d050b

Use the officially recommended /status route, which we expect to actually succeed (unlike /candlepin, which is just a redirect). Add curl `--fail` to ensure a non-zero exit code while the request fails. Cherry-picked from b2f0b4f

test: Increase timeout for contacting AD domain

735ce1d

Apparently recent Samba/AD is a bit slower now. Cherry-picked from ff0c229

test: Fix waiting for IdM user

775e363

First wait for the realm user to exist before using it in chown. D'oh! Cherry-picked from fdca31b

test: use become_superuser helper for switching access

5d6e0b0

Cherry-picked from b76a50f

test: Drop obsolete RHEL 8.7 special case

6056a19

Cherry-picked from 9da9229

test: Avoid sssd.service restart limit failure in check-system-realms

1828ba4

Restarting sssd in a loop is prone to run into

> systemd[1]: sssd.service: Start request repeated too quickly.
> systemd[1]: sssd.service: Failed with result 'start-limit-hit'.

Cherry-picked from 68d2eb7

test: Increaese IPA leave timeout

b9005d8

With 30 seconds we are running into occasional timeout failures. Cherry-picked from 6ef43c6

test: Drop obsolete INSECURELDAP hack

d4a29ae

The current service image's samba container does not look at that any more, and we also stopped using `ldapmodify`. Cherry-picked from 4727d48

test: Drop obsolete sssd hack

b342c7b

https://bugzilla.redhat.com/show_bug.cgi?id=1839805 got fixed long ago. Cherry-picked from 77a329c

test: Retry auth in checkClientCertAuthentication

2f13d9e

Password authentication sometimes fails on the first try. Cherry-picked from a61bb41

martinpitt and others added 4 commits December 12, 2023 10:13

test: Drop plugin update check crash hack

27ef800

With the latest service refresh [1] Grafana now handles being offline correctly. [1] cockpit-project/bots#5601 Cherry-picked from e8e4bda

test: Disable busy swap on scsi_debug

c17189f

When e.g. TestStorageswap.test fails in the middle, the active swap partition on the scsi_debug driver will prevent the module removal, and break all subsequent tests. Helps with cockpit-project#19683 Cherry-picked from 6c3986d

martinpitt force-pushed the r93-revive branch from 6e4641e to c17189f Compare December 12, 2023 09:19

martinpitt marked this pull request as ready for review December 12, 2023 09:19

martinpitt requested review from jelly and mvollmer December 12, 2023 09:20

jelly approved these changes Dec 12, 2023

martinpitt merged commit b0df0c5 into cockpit-project:rhel-9.3 Dec 12, 2023
22 checks passed

martinpitt deleted the r93-revive branch December 12, 2023 13:16
