Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(agent): make sure phase exit codes are in the 0-255 range #385

Merged
merged 2 commits into from
Oct 21, 2024

Conversation

boukeas
Copy link
Contributor

@boukeas boukeas commented Oct 21, 2024

Description

When a process is terminated due to a "stop condition checker", it ends up receiving a SIGKILL and the returncode for a process that receives a SIGKILL (signal 9) is -9. That returncode is used as the exit status of the phase, which means that exit statuses reported by Testflinger can be negative, rather than in the expected range of 0-255.

This PR ensures that phase exit codes are in the 0-255 range.

Tests

In a local Testflinger deployment, this job was used for testing, deliberately timing out:

job_queue: myqueue
output_timeout: 60
test_data:
  test_cmds: |
    echo "Starting"
    sleep 120
    echo "Ending"

Here are the results when the agent was deployed from main:

{
    "job_state": "complete",
    "test_output": "************************************************\n* Starting testflinger test phase on agent-007 *\n************************************************\n2024-10-21 10:54:54,795 myqueue INFO: DEVICE CONNECTOR: BEGIN testrun\n\nERROR: Output timeout reached! (60s)\n",
    "test_status": -9
}

And here are the results when the agent was deployed from this branch:

{
    "job_state": "complete",
    "test_output": "************************************************\n* Starting testflinger test phase on agent-007 *\n************************************************\n2024-10-21 10:52:13,757 myqueue INFO: DEVICE CONNECTOR: BEGIN testrun\n\nERROR: Output timeout reached! (60s)\n",
    "test_status": 247
}

@boukeas boukeas requested a review from plars October 21, 2024 10:05
Copy link
Collaborator

@plars plars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but can we add a small unit test for this? Just something to ensure that when we timeout, that it gets the expected (now positive) _status for the phase should suffice.

@boukeas boukeas requested a review from plars October 21, 2024 11:06
Copy link
Collaborator

@plars plars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thanks!

@boukeas boukeas merged commit 610c6ef into main Oct 21, 2024
2 checks passed
@boukeas boukeas deleted the return-valid-exitcodes branch October 21, 2024 13:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants