Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

job-runner: testing runs isolated to their own container #6017

Merged
merged 2 commits into from
Mar 7, 2024

Conversation

allisonkarlitskaya
Copy link
Member

@allisonkarlitskaya allisonkarlitskaya commented Mar 5, 2024

Introduce job-runner, a new asyncio python application designed to replace a good chunk of our backend CI infra:

  • tests-invoke
  • make-checkout
  • s3-streamer
  • (eventually, maybe) run-queue

and dramatically simplify others:

  • issue-scan
  • tests-scan

while making it easier to play with our testing infrastructure locally.

The main new feature vs. the existing stuff is that each testing job is run in its own one-time-use container. The image for that container is under control of the revision being tested, allowing for gating of task container upgrades (avoiding the usual wave of broken pixels, linting changes, etc.).

Instead of running our jobs from the giant blobs of shell pasted together in issue-scan or tests-scan we consume them as the new "Job" json format, already introduced in #5932.

The new approach also makes a conscious effort to reduce the amount of exposure of secrets to test runs (although we have some more work to do here).

TODO:

job-runner Dismissed
await run_job(job, ctx)

if __name__ == '__main__':
asyncio.run(main())

Check warning

Code scanning / CodeQL

Use of the return value of a procedure Warning

The result of
main
is used even though it is always None.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lolwat

martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 5, 2024
This still needs to happen for production, but allows us to start
testing cockpit-project/bots#6017.
allisonkarlitskaya pushed a commit to cockpit-project/cockpituous that referenced this pull request Mar 5, 2024
This still needs to happen for production, but allows us to start
testing cockpit-project/bots#6017.
@allisonkarlitskaya
Copy link
Member Author

Random note: while hacking on this, despite all kinds of exceptions and irregular exits for all kinds of reasons (including many programming errors), I haven't seen a single leaked container in over two weeks (when I got the finally: block working properly to remove the container image).

@martinpitt

This comment was marked as resolved.

@martinpitt

This comment was marked as resolved.

allisonkarlitskaya pushed a commit to cockpit-project/cockpituous that referenced this pull request Mar 5, 2024
This still needs to happen for production, but allows us to start
testing cockpit-project/bots#6017.

We can drop the CONTAINER_HOST environment variable we added in the last
commit — we can provide that information via job-runner.toml instead.
@martinpitt

This comment was marked as outdated.

martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 6, 2024
This still needs to happen for production, but allows us to start
testing cockpit-project/bots#6017.
martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 6, 2024
This still needs to happen for production, but allows us to start
testing cockpit-project/bots#6017.
Copy link
Member

@martinpitt martinpitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round of review. We need to talk 😁

@martinpitt
Copy link
Member

test_pr() failed. The inner bots tests succeeded (and the statuses bubble is green here), but the output check failed:

ERROR: did not find 'Running on:.*cockpituous' in '[test log]`

That used to be done by the old s3-streamer:

s3-streamer:                Running on:    {platform.node()}

While we can of course change the string format, the node where job-runner runs on is vitally important for maintenance. So this needs to be put back. I'll add a TODO entry to the description for this.

martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 6, 2024
This still needs to happen for production, but allows us to start
testing cockpit-project/bots#6017.
@martinpitt
Copy link
Member

I adjusted the expected test output in cockpit-project/cockpituous#587 and added a FIXUP commit here which adds back 'Running on:". The integration test passes now.

martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 6, 2024
This still needs to happen for production, but allows us to start
testing cockpit-project/bots#6017.
martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 6, 2024
This still needs to happen for production, but allows us to start
testing cockpit-project/bots#6017.
allisonkarlitskaya pushed a commit to cockpit-project/cockpituous that referenced this pull request Mar 6, 2024
This still needs to happen for production, but allows us to start
testing cockpit-project/bots#6017.
@allisonkarlitskaya allisonkarlitskaya force-pushed the job-runner branch 3 times, most recently from 72caf52 to 0e1ed24 Compare March 6, 2024 17:04
@martinpitt
Copy link
Member

martinpitt commented Mar 6, 2024

This failure looks very relevant:

Error: statfs /tmp/tmp7wu7vgej/checkout-and-run: no such file or directory

Fortunately it already happens in test_mock_pr(), and it effortlessly reproduces in COCKPIT_BOTS_BRANCH=job-runner tasks/run-local.sh.

This is because we are using podman-remote in that test (and will in prod), so that temp dir will indeed not exist.

@allisonkarlitskaya allisonkarlitskaya force-pushed the job-runner branch 3 times, most recently from f2edd19 to 7c23a58 Compare March 6, 2024 19:03
martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 7, 2024
This still needs to happen for production, but allows us to start
testing cockpit-project/bots#6017.
martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 7, 2024
This still needs to happen for production, but allows us to start
testing cockpit-project/bots#6017.
martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 7, 2024
This still needs to happen for production, but allows us to start
testing cockpit-project/bots#6017.
allisonkarlitskaya pushed a commit to cockpit-project/cockpituous that referenced this pull request Mar 7, 2024
This still needs to happen for production, but allows us to start
testing cockpit-project/bots#6017.
Copy link
Member

@martinpitt martinpitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I ignored the design complexity aspect now, and looked at some implementation questions/bugs.

@martinpitt martinpitt marked this pull request as ready for review March 7, 2024 10:01
@martinpitt martinpitt marked this pull request as draft March 7, 2024 10:01
martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 7, 2024
This will allow us to enable job-runner in
cockpit-project/bots#6017.

Do this both for our production bots as well as our run-local.sh
integration tests.
martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 7, 2024
This will allow us to enable job-runner in
cockpit-project/bots#6017.

Do this both for our production bots as well as our run-local.sh
integration tests.
martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 7, 2024
This will allow us to enable job-runner in
cockpit-project/bots#6017.

Do this both for our production bots as well as our run-local.sh
integration tests.
martinpitt added a commit to cockpit-project/cockpituous that referenced this pull request Mar 7, 2024
This will allow us to enable job-runner in
cockpit-project/bots#6017.

Do this both for our production bots as well as our run-local.sh
integration tests.
Introduce job-runner, a new asyncio python application designed to
replace a good chunk of our backend CI infra:
 - `tests-invoke`
 - `make-checkout`
 - `s3-streamer`
 - (eventually, maybe) `run-queue`

and dramatically simplify others:
 - `issue-scan`
 - `tests-scan`

while making it easier to play with our testing infrastructure locally.

The main new feature vs. the existing stuff is that each testing job is
run in its own one-time-use container.  The image for that container is
under control of the revision being tested, allowing for gating of task
container upgrades (avoiding the usual wave of broken pixels, linting
changes, etc.).

Instead of running our jobs from the giant blobs of shell pasted
together in `issue-scan` or `tests-scan` we consume them as the new
"Job" json format, already introduced in #5932.

The new approach also makes a conscious effort to reduce the amount of
exposure of secrets to test runs (although we have some more work to do
here).

This commit introduces job-runner (along with its helper script,
`checkout-and-run`) along with tests.  Nothing uses this yet.
If we have a 'job' attribute in our payload, ignore the command blob and
dispatch the JSON via `job-runner`.
@martinpitt martinpitt removed the blocked label Mar 7, 2024
@martinpitt martinpitt marked this pull request as ready for review March 7, 2024 15:44
Copy link
Member

@martinpitt martinpitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do this! ⭐ 🚀 🏁

@allisonkarlitskaya allisonkarlitskaya merged commit 6aa2b40 into main Mar 7, 2024
6 checks passed
@allisonkarlitskaya allisonkarlitskaya deleted the job-runner branch March 7, 2024 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants