Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for agent rogue processes in integration tests #4225

Merged

Conversation

pchila
Copy link
Member

@pchila pchila commented Feb 8, 2024

This commit adds assertions at the beginning and at the end of Fixture.Install() checking that no agent processes are running before installing or after uninstalling.

If any elastic-agent processes are detected, the test is marked as failed but it still runs to completion.

What does this PR do?

This PR adds assertions for elastic-agent processes not running before installing or after uninstalling elastic agent in our integration tests.

Why is it important?

This is needed to highlight any test that leaves agent processes running (usually watchers) which may influence other tests runs and cause flakyness.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@pchila pchila added enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Team:Elastic-Agent Label for the Agent team Testing labels Feb 8, 2024
@pchila pchila self-assigned this Feb 8, 2024
@pchila pchila requested a review from a team as a code owner February 8, 2024 07:56
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@pchila pchila requested review from rdner and blakerouse February 8, 2024 07:56
Copy link
Contributor

mergify bot commented Feb 8, 2024

This pull request does not have a backport label. Could you fix it @pchila? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@pchila pchila force-pushed the add-check-for-agent-processes-in-integration-tests branch from 4f09456 to 9e52ff7 Compare February 8, 2024 20:30
@pchila pchila requested a review from leehinman February 8, 2024 20:30
@pchila pchila force-pushed the add-check-for-agent-processes-in-integration-tests branch from 9e52ff7 to 32df916 Compare February 8, 2024 20:31
Copy link
Member

@AndersonQ AndersonQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why not fail the test as soon we detect there it another agent running, it'll lead to unexpected behaviour in the test anyway. Besides failing the test will block the PR anyway, so I'd say fail fast is better.

Also perhaps it could be more optimistic and do not fail the test when it fails getting the process information. I see it limits the information we have, but it isn't a failure in the test

@pchila
Copy link
Member Author

pchila commented Feb 9, 2024

I don't see why not fail the test as soon we detect there it another agent running, it'll lead to unexpected behaviour in the test anyway. Besides failing the test will block the PR anyway, so I'd say fail fast is better.

Also perhaps it could be more optimistic and do not fail the test when it fails getting the process information. I see it limits the information we have, but it isn't a failure in the test

Originally this was to try and correlate with logs I saw in test runs that did not have this guard.

I want to fail the test because it attracts attention (nobody ever looks at the green ones) but if a test is still leaking or adding processes between start and end I want to see it, hence I let it run to completion.

There was another part to this on the branch I cherry-picked this from where we try to clean up at the end of every test that installed agent after asserting no agent processes but that may be overkill for the moment

Regarding failing if I cannot get process information, if that happens it may be a problem with the setup of the test and I wanted to bring that out too ;)

@pchila pchila force-pushed the add-check-for-agent-processes-in-integration-tests branch 2 times, most recently from 36c8de0 to 04598fd Compare February 9, 2024 18:04
@pchila pchila force-pushed the add-check-for-agent-processes-in-integration-tests branch from 04598fd to b887cdd Compare February 12, 2024 07:46
This commit adds assertions at the beginning and at the end of
Fixture.Install() checking that no agent processes are running before
installing or after uninstalling.
If any elastic-agent processes are detected, the test is marked as
failed but it still runs to completion.
@pchila pchila force-pushed the add-check-for-agent-processes-in-integration-tests branch from b887cdd to 155c3ba Compare February 13, 2024 07:25
Copy link

Quality Gate passed Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No Coverage information No data about Coverage
No Duplication information No data about Duplication

See analysis details on SonarQube

@pchila pchila merged commit bdefb2c into elastic:main Feb 13, 2024
1 check passed
@pchila pchila deleted the add-check-for-agent-processes-in-integration-tests branch February 13, 2024 15:02
@pchila pchila added backport-v8.12.0 Automated backport with mergify and removed backport-skip labels Feb 21, 2024
mergify bot pushed a commit that referenced this pull request Feb 21, 2024
* Check for agent rogue processes in integration tests

This commit adds assertions at the beginning and at the end of
Fixture.Install() checking that no agent processes are running before
installing or after uninstalling.
If any elastic-agent processes are detected, the test is marked as
failed but it still runs to completion.

(cherry picked from commit bdefb2c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v8.12.0 Automated backport with mergify enhancement New feature or request skip-changelog Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants