job-runner: testing runs isolated to their own container #6017

allisonkarlitskaya · 2024-03-05T13:14:31Z

Introduce job-runner, a new asyncio python application designed to replace a good chunk of our backend CI infra:

- tests-invoke
- make-checkout
- s3-streamer
- (eventually, maybe) run-queue

and dramatically simplify others:

- issue-scan
- tests-scan

while making it easier to play with our testing infrastructure locally.

The main new feature vs. the existing stuff is that each testing job is run in its own one-time-use container. The image for that container is under control of the revision being tested, allowing for gating of task container upgrades (avoiding the usual wave of broken pixels, linting changes, etc.).

Instead of running our jobs from the giant blobs of shell pasted together in issue-scan or tests-scan, we consume them in the new "Job" JSON format, already introduced in #5932.

The new approach also makes a conscious effort to reduce the amount of exposure of secrets to test runs (although we have some more work to do here).

TODO:

- Includes workflows: Update to podman 4 in cockpituous workflow #6018 to fix tests
- Needs tasks: Add job-runner.toml configuration cockpituous#586 ; once that lands, revert the "TEMP HACK" commit
- Needs tasks: Bind the host's podman API socket cockpituous#583 deployed to prod (it isn't yet) before landing, so that it will actually work
- Needs tasks: Add integration test for image-refresh via issue-scan cockpituous#588 to ensure we don't break image refreshes
- Add host name to log, like the old s3-streamer
- ~~cidfile crash after successful runs https://paste.centos.org/view/01fcdfc8 with podman-remote; this breaks attachments, thus a blocker~~ not reproducible, ignore

lib/aio/s3.py

allisonkarlitskaya · 2024-03-05T13:56:01Z

job-runner

+        await run_job(job, ctx)
+
+if __name__ == '__main__':
+    asyncio.run(main())


lib/aio/util.py

allisonkarlitskaya · 2024-03-05T14:59:57Z

Random note: while hacking on this, despite exceptions and irregular exits for all kinds of reasons (including many programming errors), I haven't seen a single leaked container in over two weeks (since I got the finally: block working properly to remove the container image).
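For readers who want to see the shape of that cleanup pattern, here is a minimal sketch. It is not the code from this PR: `podman()` and `one_time_container()` are hypothetical helper names, and the only podman commands assumed are `podman create` (which prints the new container ID) and `podman rm --force`.

```python
import asyncio
import contextlib
from collections.abc import AsyncIterator, Awaitable, Callable

# Hypothetical type for "something that runs a podman subcommand".
Runner = Callable[..., Awaitable[str]]


async def podman(*args: str) -> str:
    """Run `podman ARGS...` and return its stripped stdout."""
    proc = await asyncio.create_subprocess_exec(
        'podman', *args, stdout=asyncio.subprocess.PIPE)
    stdout, _ = await proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(f'podman {args[0]} exited with {proc.returncode}')
    return stdout.decode().strip()


@contextlib.asynccontextmanager
async def one_time_container(image: str, *command: str,
                             run: Runner = podman) -> AsyncIterator[str]:
    """Create a container and remove it again on every exit path."""
    container_id = await run('create', image, *command)
    try:
        yield container_id
    finally:
        # Reached on normal exit, on exceptions and on task cancellation.
        await run('rm', '--force', container_id)
```

The `run` parameter exists only so that the sketch can be exercised without a real podman; a caller would normally write `async with one_time_container(image, 'some-command') as cid:` and do all work on the container inside that block.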

This still needs to happen for production, but allows us to start testing cockpit-project/bots#6017. We can drop the CONTAINER_HOST environment variable we added in the last commit — we can provide that information via job-runner.toml instead.

martinpitt

First round of review. We need to talk 😁

checkout-and-run

lib/aio/gitresolve.py

lib/aio/job.py

lib/aio/local.py

lib/aio/jsonutil.py

martinpitt · 2024-03-06T06:58:27Z

test_pr() failed. The inner bots tests succeeded (and the statuses bubble is green here), but the output check failed:

ERROR: did not find 'Running on:.*cockpituous' in '[test log]'

That used to be done by the old s3-streamer:

s3-streamer:                Running on:    {platform.node()}

While we can of course change the string format, the node that job-runner runs on is vitally important for maintenance, so this needs to be put back. I'll add a TODO entry to the description for this.
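As an illustration of what the integration test is looking for, a minimal sketch: the log line mirrors the old s3-streamer format quoted above, and the pattern is the one from the error message. The function name `running_on_line` is hypothetical and not taken from this PR.

```python
import platform
import re


def running_on_line(node: str = '') -> str:
    """Format the host line of the log header, s3-streamer style."""
    # platform.node() returns the machine's network name (may be empty).
    return f'Running on:    {node or platform.node()}'


# The pattern the integration test searches for in the test log.
EXPECTED = re.compile(r'Running on:.*cockpituous')
```

The optional `node` argument is there only so the formatting can be checked against a known host name.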

martinpitt · 2024-03-06T07:19:35Z

I adjusted the expected test output in cockpit-project/cockpituous#587 and added a FIXUP commit here which adds back "Running on:". The integration test passes now.

lib/aio/job.py

martinpitt · 2024-03-06T18:15:15Z

This failure looks very relevant:

Error: statfs /tmp/tmp7wu7vgej/checkout-and-run: no such file or directory

Fortunately it already happens in test_mock_pr(), and it effortlessly reproduces in COCKPIT_BOTS_BRANCH=job-runner tasks/run-local.sh.

This is because we are using podman-remote in that test (and will in prod), so that temp dir indeed does not exist on the side where the podman service runs.
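One possible way around this class of problem, sketched here under assumptions and not necessarily what this PR ended up doing, is to avoid bind-mounting a client-side path at all and feed the helper script to the container over stdin. The sketch assumes the helper is a Python script and that the image has `python3`; `stdin_run_argv` and `run_script_via_stdin` are hypothetical names.

```python
import asyncio


def stdin_run_argv(image: str, podman: str = 'podman-remote') -> list[str]:
    """Build a `podman run` command that reads a Python script from stdin."""
    # --interactive keeps stdin open; `python3 -` executes the script read
    # from stdin, so no path from the client has to exist on the server side.
    return [podman, 'run', '--rm', '--interactive', image, 'python3', '-']


async def run_script_via_stdin(script: bytes, image: str) -> int:
    """Run `script` inside a throwaway container and return its exit code."""
    proc = await asyncio.create_subprocess_exec(
        *stdin_run_argv(image), stdin=asyncio.subprocess.PIPE)
    await proc.communicate(script)  # writes the script, closes stdin, waits
    assert proc.returncode is not None
    return proc.returncode
```

The trade-off is that stdin is then taken by the script itself, so anything else the job needs has to arrive through arguments or the environment.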

lib/aio/job.py

martinpitt

Thanks! I ignored the design complexity aspect for now, and looked at some implementation questions/bugs.

job-runner.toml

lib/aio/job.py

lib/aio/s3.py

lib/aio/util.py

job-runner.toml

This will allow us to enable job-runner in cockpit-project/bots#6017. Do this both for our production bots and for our run-local.sh integration tests.

Introduce job-runner, a new asyncio python application designed to replace a good chunk of our backend CI infra:

- `tests-invoke`
- `make-checkout`
- `s3-streamer`
- (eventually, maybe) `run-queue`

and dramatically simplify others:

- `issue-scan`
- `tests-scan`

while making it easier to play with our testing infrastructure locally.

The main new feature vs. the existing stuff is that each testing job is run in its own one-time-use container. The image for that container is under control of the revision being tested, allowing for gating of task container upgrades (avoiding the usual wave of broken pixels, linting changes, etc.).

Instead of running our jobs from the giant blobs of shell pasted together in `issue-scan` or `tests-scan` we consume them as the new "Job" json format, already introduced in #5932.

The new approach also makes a conscious effort to reduce the amount of exposure of secrets to test runs (although we have some more work to do here).

This commit introduces job-runner (along with its helper script, `checkout-and-run`) along with tests. Nothing uses this yet.

If we have a 'job' attribute in our payload, ignore the command blob and dispatch the JSON via `job-runner`.
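The dispatch rule described in that commit message can be sketched roughly as follows. This is an illustration only: the `command` key, the shell fallback and the `./job-runner json ...` command-line shape are assumptions made for the example, not confirmed details of run-queue or job-runner.

```python
import json
from typing import Any, Mapping


def build_argv(payload: Mapping[str, Any]) -> list[str]:
    """Choose how to run a queue entry: Job JSON wins over the command blob."""
    job = payload.get('job')
    if job is not None:
        # Hypothetical CLI shape: pass the Job object as a JSON argument.
        return ['./job-runner', 'json', json.dumps(job)]
    # Legacy path: run the pre-assembled shell blob (assumed key name).
    return ['/bin/sh', '-c', payload['command']]
```

A payload that carries both keys goes down the job-runner path, which matches the "ignore the command blob" wording.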

martinpitt

Let's do this! ⭐ 🚀 🏁

allisonkarlitskaya requested a review from martinpitt March 5, 2024 13:14

github-advanced-security bot found potential problems Mar 5, 2024

martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 5, 2024

tasks: Create job-runner.toml in run-local.sh

684ef69

This still needs to happen for production, but allows us to start testing cockpit-project/bots#6017.

martinpitt mentioned this pull request Mar 5, 2024

tasks: Add job-runner.toml configuration cockpit-project/cockpituous#586

Merged

5 tasks

martinpitt added the blocked label Mar 5, 2024

allisonkarlitskaya force-pushed the job-runner branch from dc240da to a50e606 Compare March 5, 2024 13:55

allisonkarlitskaya pushed a commit to cockpit-project/cockpituous that referenced this pull request Mar 5, 2024

tasks: Create job-runner.toml in run-local.sh

5d88983

martinpitt force-pushed the job-runner branch from b1475a3 to b9e6dab Compare March 5, 2024 18:35

martinpitt force-pushed the job-runner branch from b9e6dab to 5465e59 Compare March 6, 2024 03:57

martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 6, 2024

tasks: Create job-runner.toml in run-local.sh

c5d3f0f

martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 6, 2024

tasks: Create job-runner.toml in run-local.sh

7f5f305

martinpitt force-pushed the job-runner branch from 5465e59 to 739c52c Compare March 6, 2024 05:28

martinpitt requested changes Mar 6, 2024

martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 6, 2024

tasks: Create job-runner.toml in run-local.sh

55d15a0

martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 6, 2024

tasks: Create job-runner.toml in run-local.sh

dac8b3c

martinpitt reviewed Mar 6, 2024

lib/aio/job.py Outdated

martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 6, 2024

tasks: Create job-runner.toml in run-local.sh

a77bece

allisonkarlitskaya pushed a commit to cockpit-project/cockpituous that referenced this pull request Mar 6, 2024

tasks: Create job-runner.toml in run-local.sh

57c8282

allisonkarlitskaya force-pushed the job-runner branch from f649627 to 5d35d6f Compare March 6, 2024 16:38

allisonkarlitskaya requested a review from martinpitt March 6, 2024 16:39

allisonkarlitskaya force-pushed the job-runner branch 3 times, most recently from 72caf52 to 0e1ed24 Compare March 6, 2024 17:04

allisonkarlitskaya force-pushed the job-runner branch 3 times, most recently from f2edd19 to 7c23a58 Compare March 6, 2024 19:03

github-advanced-security bot found potential problems Mar 6, 2024

lib/aio/job.py Fixed

allisonkarlitskaya force-pushed the job-runner branch from 7c23a58 to da49455 Compare March 6, 2024 19:10

martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 7, 2024

tasks: Create job-runner.toml in run-local.sh

f220d73

martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 7, 2024

tasks: Create job-runner.toml in run-local.sh

8fd5876

martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 7, 2024

tasks: Create job-runner.toml in run-local.sh

e0ea762

martinpitt mentioned this pull request Mar 7, 2024

run-local.sh: Various cleanups, prepare for job-runner cockpit-project/cockpituous#589

Merged

allisonkarlitskaya pushed a commit to cockpit-project/cockpituous that referenced this pull request Mar 7, 2024

tasks: Create job-runner.toml in run-local.sh

886c4a9

martinpitt requested changes Mar 7, 2024

martinpitt reviewed Mar 7, 2024

job-runner.toml

martinpitt marked this pull request as ready for review March 7, 2024 10:01

martinpitt marked this pull request as draft March 7, 2024 10:01

allisonkarlitskaya force-pushed the job-runner branch from da49455 to 85f5244 Compare March 7, 2024 10:35

allisonkarlitskaya force-pushed the job-runner branch from 85f5244 to 543a95e Compare March 7, 2024 14:52

allisonkarlitskaya force-pushed the job-runner branch from 366b7bf to f58bff5 Compare March 7, 2024 15:22

allisonkarlitskaya added 2 commits March 7, 2024 16:43

run-queue: invoke job-runner for Job objects

043aaf0

If we have a 'job' attribute in our payload, ignore the command blob and dispatch the JSON via `job-runner`.

allisonkarlitskaya force-pushed the job-runner branch from f58bff5 to 043aaf0 Compare March 7, 2024 15:43

martinpitt removed the blocked label Mar 7, 2024

martinpitt marked this pull request as ready for review March 7, 2024 15:44

martinpitt approved these changes Mar 7, 2024

allisonkarlitskaya merged commit 6aa2b40 into main Mar 7, 2024
6 checks passed

allisonkarlitskaya deleted the job-runner branch March 7, 2024 15:48
