[Flaky Test]: TestStandaloneUpgradeWithGPGFallbackOneRemoteFailing tried to download an incorrect artifact #4275

Closed
rdner opened this issue Feb 19, 2024 · 6 comments · Fixed by #4276
Labels: flaky-test, Team:Elastic-Agent

rdner commented Feb 19, 2024

Failing test case

TestStandaloneUpgradeWithGPGFallbackOneRemoteFailing

Error message

404 when downloading an artifact

Build

https://buildkite.com/elastic/elastic-agent/builds/7244#018db45f-064c-4ba3-a4eb-eb93b04613f1

OS

Linux, Mac, Windows

Stacktrace and notes

upgrade_gpg_test.go:143: 
        	Error Trace:	C:/Users/windows/agent/testing/integration/upgrade_gpg_test.go:143
        	Error:      	Received unexpected error:
        	            	failed to start agent upgrade to version "8.12.2": exit status 1
        	            	Error: Failed trigger upgrade of daemon: failed download of agent binary: unable to download package: 3 errors occurred:
        	            		* package 'C:\Program Files\Elastic\Agent\data\elastic-agent-8.13.0-SNAPSHOT-f23c27\downloads\elastic-agent-8.12.2-windows-x86_64.zip' not found: open C:\Program Files\Elastic\Agent\data\elastic-agent-8.13.0-SNAPSHOT-f23c27\downloads\elastic-agent-8.12.2-windows-x86_64.zip: The system cannot find the file specified.
        	            		* call to 'https://snapshots.elastic.co/8.13.0-772867d3/downloads/beats/elastic-agent/elastic-agent-8.12.2-windows-x86_64.zip' returned unsuccessful status code: 404
        	            		* call to 'https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.12.2-windows-x86_64.zip' returned unsuccessful status code: 404
        	            	
        	            	
        	            	For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.13/fleet-troubleshooting.html
        	Test:       	TestStandaloneUpgradeWithGPGFallbackOneRemoteFailing
        	Messages:   	perform upgrade failed
    fixture_install.go:179: [test TestStandaloneUpgradeWithGPGFallbackOneRemoteFailing] Inside fixture cleanup function
    fixture_install.go:195: collecting diagnostics; test failed

Note that the snapshot artifact URL is incorrect: it contains two different versions (8.13.0 in the path and 8.12.2 in the file name).

rdner added the Team:Elastic-Agent and flaky-test labels Feb 19, 2024
@elasticmachine

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@rdner rdner self-assigned this Feb 19, 2024

rdner commented Feb 19, 2024

It looks like the fixture was set up properly:

{
  "Time": "2024-02-17T00:37:16.633253Z",
  "Action": "output",
  "Package": "github.com/elastic/elastic-agent/testing/integration(windows-amd64-2022-upgrade)(sudo)",
  "Test": "TestStandaloneUpgradeWithGPGFallbackOneRemoteFailing",
  "Output": "    upgrade_gpg_test.go:122: Testing Elastic Agent upgrade from 8.13.0-SNAPSHOT to 8.12.2...\n"
}
{
  "Time": "2024-02-17T00:37:40.9782958Z",
  "Action": "output",
  "Package": "github.com/elastic/elastic-agent/testing/integration(windows-amd64-2022-upgrade)(sudo)",
  "Test": "TestStandaloneUpgradeWithGPGFallbackOneRemoteFailing",
  "Output": "    fixture.go:632: >> running binary with: [C:\\Users\\windows\\AppData\\Local\\Temp\\TestStandaloneUpgradeWithGPGFallbackOneRemoteFailing607891235\\001\\elastic-agent-8.13.0-SNAPSHOT-windows-x86_64\\elastic-agent.exe version --binary-only --yaml]\n"
}
{
  "Time": "2024-02-17T00:37:41.2865001Z",
  "Action": "output",
  "Package": "github.com/elastic/elastic-agent/testing/integration(windows-amd64-2022-upgrade)(sudo)",
  "Test": "TestStandaloneUpgradeWithGPGFallbackOneRemoteFailing",
  "Output": "    fixture.go:632: >> running binary with: [C:\\Users\\windows\\AppData\\Local\\Temp\\TestStandaloneUpgradeWithGPGFallbackOneRemoteFailing607891235\\002\\elastic-agent-8.12.2-windows-x86_64\\elastic-agent.exe version --binary-only --yaml]\n"
}
{
  "Time": "2024-02-17T00:37:41.5485431Z",
  "Action": "output",
  "Package": "github.com/elastic/elastic-agent/testing/integration(windows-amd64-2022-upgrade)(sudo)",
  "Test": "TestStandaloneUpgradeWithGPGFallbackOneRemoteFailing",
  "Output": "    fixture.go:632: >> running binary with: [C:\\Users\\windows\\AppData\\Local\\Temp\\TestStandaloneUpgradeWithGPGFallbackOneRemoteFailing607891235\\001\\elastic-agent-8.13.0-SNAPSHOT-windows-x86_64\\elastic-agent.exe install --force --non-interactive]\n"
}
{
  "Time": "2024-02-17T00:38:16.2139117Z",
  "Action": "output",
  "Package": "github.com/elastic/elastic-agent/testing/integration(windows-amd64-2022-upgrade)(sudo)",
  "Test": "TestStandaloneUpgradeWithGPGFallbackOneRemoteFailing",
  "Output": "    upgrader.go:279: Upgrading from version \"8.13.0-SNAPSHOT-f23c279e1c13095118d60253c245bfe9add68f4f\" to version \"8.12.2-d0b9e4834be63cadeca3c84c6d6cf1c034b5b35a\"\n"
}
{
  "Time": "2024-02-17T00:38:16.2139117Z",
  "Action": "output",
  "Package": "github.com/elastic/elastic-agent/testing/integration(windows-amd64-2022-upgrade)(sudo)",
  "Test": "TestStandaloneUpgradeWithGPGFallbackOneRemoteFailing",
  "Output": "    fixture.go:632: >> running binary with: [C:\\Program Files\\Elastic\\Agent\\elastic-agent.exe upgrade 8.12.2 --pgp abcDEFgGIN PGP PUBLIC KEY BLOCK-----\n"
}

The error comes from the agent itself: when it tries to upgrade from 8.13.0-SNAPSHOT to 8.12.2, it cannot download the artifacts.

rdner commented Feb 19, 2024

The only difference from a previously passing test that I could find is this:

Failing test

{
  "Time": "2024-02-17T00:38:16.2139117Z",
  "Action": "output",
  "Package": "github.com/elastic/elastic-agent/testing/integration(windows-amd64-2022-upgrade)(sudo)",
  "Test": "TestStandaloneUpgradeWithGPGFallbackOneRemoteFailing",
  "Output": "    upgrader.go:279: Upgrading from version \"8.13.0-SNAPSHOT-f23c279e1c13095118d60253c245bfe9add68f4f\" to version \"8.12.2-d0b9e4834be63cadeca3c84c6d6cf1c034b5b35a\"\n"
}

Passing test

{
  "Time": "2024-02-16T00:45:24.3896816Z",
  "Action": "output",
  "Package": "github.com/elastic/elastic-agent/testing/integration(windows-amd64-2022-upgrade)(sudo)",
  "Test": "TestStandaloneUpgradeWithGPGFallbackOneRemoteFailing",
  "Output": "    upgrader.go:279: Upgrading from version \"8.13.0-SNAPSHOT-f23c279e1c13095118d60253c245bfe9add68f4f\" to version \"8.12.2-SNAPSHOT-44c88d0ef5014dc7e8db82876bca91763664c7f2\"\n"
}

So the failing test is trying to fetch the released 8.12.2 version, while the passing test used the snapshot version 8.12.2-SNAPSHOT.
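As a rough illustration of the difference between the two log lines, the sketch below classifies the version strings by their `-SNAPSHOT` marker. `isSnapshotBuild` is a hypothetical helper written for this comment, not the check the agent code uses.

```go
package main

import (
	"fmt"
	"strings"
)

// isSnapshotBuild is a hypothetical helper: it treats any version string
// containing "-SNAPSHOT" as a snapshot build and everything else as a release.
func isSnapshotBuild(v string) bool {
	return strings.Contains(v, "-SNAPSHOT")
}

func main() {
	// Target versions copied from the failing and passing log lines.
	targets := []string{
		"8.12.2-d0b9e4834be63cadeca3c84c6d6cf1c034b5b35a",
		"8.12.2-SNAPSHOT-44c88d0ef5014dc7e8db82876bca91763664c7f2",
	}
	for _, v := range targets {
		fmt.Printf("%s snapshot=%t\n", v, isSnapshotBuild(v))
	}
}
```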

rdner commented Feb 19, 2024

@pierrehilbert have we seen this before? Is it perhaps related to our release process?

rdner commented Feb 19, 2024

I think I found the cause: the test sets the source URI to an empty string here:

upgradetest.WithSourceURI(""),

When running the upgrade, the agent checks whether the source URI is empty and, if so, replaces it with the value from settings:

sourceURI = u.sourceURI(sourceURI)

func (u *Upgrader) sourceURI(retrievedURI string) string {
	if retrievedURI != "" {
		return retrievedURI
	}

	return u.settings.SourceURI
}

I suppose in this case it would be https://snapshots.elastic.co/8.13.0-772867d3/downloads/beats, and this source URI indeed has no 8.12.2 version.
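A minimal sketch of how the empty source URI could lead to the mixed-version URL from the error message. The `settings` and `upgrader` types and the `artifactURL` helper are hypothetical stand-ins, and the URL layout is inferred from the 404 message above rather than taken from the agent code.

```go
package main

import "fmt"

// settings and upgrader are simplified, hypothetical stand-ins for the
// agent's real types; only the fallback behaviour is modelled here.
type settings struct{ SourceURI string }

type upgrader struct{ settings settings }

// sourceURI mirrors the fallback quoted above: an empty URI is replaced
// by the configured default.
func (u *upgrader) sourceURI(retrievedURI string) string {
	if retrievedURI != "" {
		return retrievedURI
	}
	return u.settings.SourceURI
}

// artifactURL is a hypothetical helper; the path layout is an assumption
// based on the URLs in the error message.
func artifactURL(base, version, platform string) string {
	return fmt.Sprintf("%s/elastic-agent/elastic-agent-%s-%s.zip", base, version, platform)
}

func main() {
	u := &upgrader{settings: settings{
		SourceURI: "https://snapshots.elastic.co/8.13.0-772867d3/downloads/beats",
	}}
	// The test passes "", so the 8.13.0 snapshot base is combined with 8.12.2.
	fmt.Println(artifactURL(u.sourceURI(""), "8.12.2", "windows-x86_64"))
}
```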

What is surprising is that the test passed before. It looks like the upgrade command falls back to the released artifacts location (artifacts.elastic.co), which didn't work here because the version has not been released yet.

I think the test case should check whether the version is released before running the upgrade.

rdner commented Feb 19, 2024

The root cause is that https://artifacts-api.elastic.co/v1/versions/ returns 8.12.2 as a released version (without the -SNAPSHOT suffix) even though it is not yet available on https://artifacts.elastic.co/downloads/beats/elastic-agent/.

Either we should not publish versions on the API until they're on our CDN, or we should introduce an additional check in

func getUpgradableVersions(ctx context.Context, vList *tools.VersionList, upgradeToVersion string, currentMajorVersions int, previousMajorVersions int) ([]*version.ParsedSemVer, error) {
	parsedUpgradeToVersion, err := version.ParseVersion(upgradeToVersion)
	if err != nil {
		return nil, fmt.Errorf("upgradeToVersion %q is not a valid version string: %w", upgradeToVersion, err)
	}
	currentMajor := parsedUpgradeToVersion.Major()
	var currentMajorSelected, previousMajorSelected int

	sortedParsedVersions := make(version.SortableParsedVersions, 0, len(vList.Versions))
	for _, v := range vList.Versions {
		pv, err := version.ParseVersion(v)
		if err != nil {
			return nil, fmt.Errorf("invalid version %q retrieved from artifact API: %w", v, err)
		}
		sortedParsedVersions = append(sortedParsedVersions, pv)
	}

	if len(sortedParsedVersions) == 0 {
		return nil, errors.New("parsed versions list is empty")
	}

	// normally the output of the versions returned by artifact API is already sorted in ascending order,
	// we want to sort in descending orders, so we sort them
	sort.Sort(sort.Reverse(sortedParsedVersions))

	// If the only available build of the most recent version is a snapshot it is unreleased.
	// This is always true on main and true until the first release of each minor version branch.
	mostRecentVersion := sortedParsedVersions[0]
	mostRecentIsUnreleased := mostRecentVersion.IsSnapshot()

	var upgradableVersions []*version.ParsedSemVer
	for _, parsedVersion := range sortedParsedVersions {
		if currentMajorSelected == currentMajorVersions && previousMajorSelected == previousMajorVersions {
			// we got all the versions we need, break the loop
			break
		}

		if !parsedVersion.Less(*parsedUpgradeToVersion) {
			// skip as testing version is less than version to upgrade from
			continue
		}

		isPrevMinor := (parsedUpgradeToVersion.Major() == parsedVersion.Major()) &&
			(parsedUpgradeToVersion.Minor()-parsedVersion.Minor()) == 1

		if parsedVersion.IsSnapshot() {
			// Allow returning the snapshot build of the previous minor if the current version is unreleased.
			// In this situation the previous minor branch may also be unreleased immediately after feature freeze.
			if !mostRecentIsUnreleased || !isPrevMinor {
				continue
			}
		} else {
			// Skip the non-snapshot build of the previous minor since it might only be available at
			// staging.elastic.co which is not a default binary download location.
			if mostRecentIsUnreleased && isPrevMinor {
				continue
			}
		}

		if parsedVersion.Major() == currentMajor && currentMajorSelected < currentMajorVersions {
			upgradableVersions = append(upgradableVersions, parsedVersion)
			currentMajorSelected++
			continue
		}

		if parsedVersion.Major() < currentMajor && previousMajorSelected < previousMajorVersions {
			upgradableVersions = append(upgradableVersions, parsedVersion)
			previousMajorSelected++
			continue
		}
	}

	return upgradableVersions, nil
}

that tries the URL and skips the version if the artifact is not there.
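
The "try the URL and skip" idea could look roughly like the sketch below. Everything here is an assumption made for illustration: `artifactURL`, `headExists` and `filterAvailable` are hypothetical names, the Windows file name pattern is copied from the 404 message above, and a network error is treated the same as a missing artifact. It is not wired into `getUpgradableVersions`.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// artifactURL is a hypothetical helper that builds the release download URL
// for a version, following the pattern seen in the 404 error message.
func artifactURL(version string) string {
	return fmt.Sprintf(
		"https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-%s-windows-x86_64.zip",
		version,
	)
}

// headExists reports whether a HEAD request to url returns 200 OK.
// Any request or network error is treated as "artifact not available".
func headExists(url string) bool {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	req, err := http.NewRequestWithContext(ctx, http.MethodHead, url, nil)
	if err != nil {
		return false
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// filterAvailable keeps only the versions whose artifact passes the exists
// check. The check is injected so tests can avoid real network calls.
func filterAvailable(versions []string, exists func(url string) bool) []string {
	var available []string
	for _, v := range versions {
		if exists(artifactURL(v)) {
			available = append(available, v)
		}
	}
	return available
}

func main() {
	// Stubbed check simulating the situation in this issue: 8.12.2 is listed
	// by the API but not yet downloadable. Pass headExists for a real check.
	stub := func(url string) bool { return url != artifactURL("8.12.2") }
	fmt.Println(filterAvailable([]string{"8.12.2", "8.12.1"}, stub))
}
```

Injecting the existence check keeps the filtering logic testable offline; the cost of the real check is one extra HEAD request per candidate version.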
