Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Flaky Test]: TestFakeInputSuite/TestManager_OutputChange – Should be true #4286

Closed
rdner opened this issue Feb 20, 2024 · 3 comments · Fixed by #4290
Closed

[Flaky Test]: TestFakeInputSuite/TestManager_OutputChange – Should be true #4286

rdner opened this issue Feb 20, 2024 · 3 comments · Fixed by #4290
Assignees
Labels
flaky-test Unstable or unreliable test cases. Team:Elastic-Agent Label for the Agent team

Comments

@rdner
Copy link
Member

rdner commented Feb 20, 2024

Failing test case

TestFakeInputSuite/TestManager_OutputChange

Error message

Should be true

Build

https://buildkite.com/elastic/elastic-agent/builds/7268#018dc3d2-0f4b-42c2-bf41-1274db4e77e9

OS

Windows

Stacktrace and notes

=== Failed
=== FAIL: pkg/component/runtime TestFakeInputSuite/TestManager_OutputChange (3.51s)
{"log.level":"info","@timestamp":"2024-02-20T00:22:50.346Z","log.origin":{"file.name":"runtime/manager.go","file.line":213},"message":"Starting grpc control protocol listener on port 6789 with max_message_size 104857600","ecs.version":"1.6.0"}
    manager_fake_input_test.go:3276: component fake-0 state changed: {State:STARTING Message:Starting: spawned pid '508' Units:map[{UnitType:input UnitID:fake-input-0-0}:{State:STARTING Message:Starting: spawned pid '508' Payload:map[] unitState:0 unitMessage:Starting unitPayload:map[] configStateIdx:0 err:<nil>} {UnitType:input UnitID:fake-input-0-1}:{State:STARTING Message:Starting: spawned pid '508' Payload:map[] unitState:0 unitMessage:Starting unitPayload:map[] configStateIdx:0 err:<nil>} {UnitType:input UnitID:fake-input-0-2}:{State:STARTING Message:Starting: spawned pid '508' Payload:map[] unitState:0 unitMessage:Starting unitPayload:map[] configStateIdx:0 err:<nil>}] Features:<nil> FeaturesIdx:0 Component:<nil> ComponentIdx:0 VersionInfo:{Name: Meta:map[] BuildHash:} expectedUnits:map[{UnitType:0 UnitID:fake-input-0-0}:{state:2 configStateIdx:1 config:0xc000a83710 err:<nil> logLevel:0} {UnitType:0 UnitID:fake-input-0-1}:{state:2 configStateIdx:1 config:0xc0004d6ea0 err:<nil> logLevel:0} {UnitType:0 UnitID:fake-input-0-2}:{state:2 configStateIdx:1 config:0xc000a838c0 err:<nil> logLevel:0}] expectedFeatures:<nil> expectedFeaturesIdx:1 expectedComponent:<nil> expectedComponentIdx:1}
{"log.level":"info","@timestamp":"2024-02-20T00:22:52.199Z","log.logger":"component.runtime.fake-0","log.origin":{"file.name":"runtime/manager.go","file.line":680},"message":"control checkin v2 protocol has chunking enabled","ecs.version":"1.6.0"}
    manager_fake_input_test.go:3276: component fake-0 state changed: {State:HEALTHY Message:Healthy: communicating with pid '508' Units:map[{UnitType:input UnitID:fake-input-0-0}:{State:STARTING Message:Starting: spawned pid '508' Payload:map[] unitState:0 unitMessage: unitPayload:map[] configStateIdx:0 err:<nil>} {UnitType:input UnitID:fake-input-0-1}:{State:STARTING Message:Starting: spawned pid '508' Payload:map[] unitState:0 unitMessage: unitPayload:map[] configStateIdx:0 err:<nil>} {UnitType:input UnitID:fake-input-0-2}:{State:STARTING Message:Starting: spawned pid '508' Payload:map[] unitState:0 unitMessage: unitPayload:map[] configStateIdx:0 err:<nil>}] Features:<nil> FeaturesIdx:0 Component:<nil> ComponentIdx:0 VersionInfo:{Name:fake Meta:map[input:fake] BuildHash:} expectedUnits:map[{UnitType:0 UnitID:fake-input-0-0}:{state:2 configStateIdx:1 config:0xc000a83710 err:<nil> logLevel:0} {UnitType:0 UnitID:fake-input-0-1}:{state:2 configStateIdx:1 config:0xc0004d6ea0 err:<nil> logLevel:0} {UnitType:0 UnitID:fake-input-0-2}:{state:2 configStateIdx:1 config:0xc000a838c0 err:<nil> logLevel:0}] expectedFeatures:<nil> expectedFeaturesIdx:1 expectedComponent:<nil> expectedComponentIdx:1}
    manager_fake_input_test.go:3276: component fake-0 state changed: {State:HEALTHY Message:Healthy: communicating with pid '508' Units:map[{UnitType:input UnitID:fake-input-0-0}:{State:HEALTHY Message:Fake Healthy 0-0 Payload:map[] unitState:2 unitMessage:Fake Healthy 0-0 unitPayload:map[] configStateIdx:1 err:<nil>} {UnitType:input UnitID:fake-input-0-1}:{State:HEALTHY Message:Fake Healthy 0-1 Payload:map[] unitState:2 unitMessage:Fake Healthy 0-1 unitPayload:map[] configStateIdx:1 err:<nil>} {UnitType:input UnitID:fake-input-0-2}:{State:HEALTHY Message:Fake Healthy 0-1 Payload:map[] unitState:2 unitMessage:Fake Healthy 0-1 unitPayload:map[] configStateIdx:1 err:<nil>}] Features:<nil> FeaturesIdx:1 Component:<nil> ComponentIdx:1 VersionInfo:{Name:fake Meta:map[input:fake] BuildHash:} expectedUnits:map[{UnitType:0 UnitID:fake-input-0-0}:{state:2 configStateIdx:1 config:0xc000a83710 err:<nil> logLevel:0} {UnitType:0 UnitID:fake-input-0-1}:{state:2 configStateIdx:1 config:0xc0004d6ea0 err:<nil> logLevel:0} {UnitType:0 UnitID:fake-input-0-2}:{state:2 configStateIdx:1 config:0xc000a838c0 err:<nil> logLevel:0}] expectedFeatures:<nil> expectedFeaturesIdx:1 expectedComponent:<nil> expectedComponentIdx:1}
{"log.level":"debug","@timestamp":"2024-02-20T00:22:52.835Z","log.origin":{"file.name":"runtime/manager.go","file.line":779},"message":"Stopping component \"fake-0\"","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2024-02-20T00:22:53.037Z","log.origin":{"file.name":"runtime/manager.go","file.line":830},"message":"component \"fake-0\" stopped.","ecs.version":"1.6.0"}
    manager_fake_input_test.go:3341: [subErrCh0] received: <nil>
    manager_fake_input_test.go:3284: component fake-1 state changed: {State:STARTING Message:Starting: spawned pid '3384' Units:map[{UnitType:input UnitID:fake-input-1-0}:{State:STARTING Message:Starting: spawned pid '3384' Payload:map[] unitState:0 unitMessage:Starting unitPayload:map[] configStateIdx:0 err:<nil>} {UnitType:input UnitID:fake-input-1-1}:{State:STARTING Message:Starting: spawned pid '3384' Payload:map[] unitState:0 unitMessage:Starting unitPayload:map[] configStateIdx:0 err:<nil>}] Features:<nil> FeaturesIdx:0 Component:<nil> ComponentIdx:0 VersionInfo:{Name: Meta:map[] BuildHash:} expectedUnits:map[{UnitType:0 UnitID:fake-input-1-0}:{state:2 configStateIdx:1 config:0xc0080234d0 err:<nil> logLevel:0} {UnitType:0 UnitID:fake-input-1-1}:{state:2 configStateIdx:1 config:0xc008023830 err:<nil> logLevel:0}] expectedFeatures:<nil> expectedFeaturesIdx:1 expectedComponent:<nil> expectedComponentIdx:1}
    manager_fake_input_test.go:3276: component fake-0 state changed: {State:STOPPED Message:Stopped: pid '508' exited with code '1' Units:map[{UnitType:input UnitID:fake-input-0-0}:{State:STOPPED Message:Stopped: pid '508' exited with code '1' Payload:map[] unitState:2 unitMessage:Fake Healthy 0-0 unitPayload:map[] configStateIdx:1 err:<nil>} {UnitType:input UnitID:fake-input-0-1}:{State:STOPPED Message:Stopped: pid '508' exited with code '1' Payload:map[] unitState:2 unitMessage:Fake Healthy 0-1 unitPayload:map[] configStateIdx:1 err:<nil>} {UnitType:input UnitID:fake-input-0-2}:{State:STOPPED Message:Stopped: pid '508' exited with code '1' Payload:map[] unitState:2 unitMessage:Fake Healthy 0-1 unitPayload:map[] configStateIdx:1 err:<nil>}] Features:<nil> FeaturesIdx:1 Component:<nil> ComponentIdx:1 VersionInfo:{Name:fake Meta:map[input:fake] BuildHash:} expectedUnits:map[{UnitType:0 UnitID:fake-input-0-0}:{state:2 configStateIdx:1 config:0xc000a83710 err:<nil> logLevel:0} {UnitType:0 UnitID:fake-input-0-1}:{state:2 configStateIdx:1 config:0xc0004d6ea0 err:<nil> logLevel:0} {UnitType:0 UnitID:fake-input-0-2}:{state:2 configStateIdx:1 config:0xc000a838c0 err:<nil> logLevel:0}] expectedFeatures:<nil> expectedFeaturesIdx:1 expectedComponent:<nil> expectedComponentIdx:1}
    manager_fake_input_test.go:3341: [subErrCh0] received: <nil>
{"log.level":"debug","@timestamp":"2024-02-20T00:22:53.331Z","log.origin":{"file.name":"runtime/manager.go","file.line":779},"message":"Stopping component \"fake-1\"","ecs.version":"1.6.0"}
    manager_fake_input_test.go:3370:
        	Error Trace:	C:/buildkite-agent/builds/bk-agent-prod-gcp-1708387354325171423/elastic/elastic-agent/pkg/component/runtime/manager_fake_input_test.go:3370
        	Error:      	Should be true
        	Test:       	TestFakeInputSuite/TestManager_OutputChange
@rdner rdner added Team:Elastic-Agent Label for the Agent team flaky-test Unstable or unreliable test cases. labels Feb 20, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@rdner rdner changed the title [Flaky Test]: <test_case> – <short_error_message> [Flaky Test]: TestFakeInputSuite/TestManager_OutputChange – Should be true Feb 20, 2024
@AndersonQ
Copy link
Member

The following is my analysis:

This test aims that once a component model is running, upon receiving a new component model, which replaces a component, the component from the 1st model is stopped before the new component starts. In other words:

  • 1st config:
    • component A -> input-a
  • 2nd config:
    • component B -> input-b

The manager starts with '1st config' then receives '2nd config'. Once it receives '2nd config', component A should stop before component B starts.

The test failed because from the test's perspective component B started before component A stopped. However, most likely all happened in the expected order. Due to the concurrent nature of how components are started and stopped and also how the state is reported, the test likely got the information out of order.

When updating the component model, Manager.update does the actual update, starting and stopping components as needed. It first stops the components, then starts the new ones. Stopping a component is async as well as the subscription which reads and sends state changes to the test.

In addition to that, even thought the test receives state changes for both components in the same goroutine, it isn't guaranteed it'll always read first the state of the 1st component then the state of the second.

Given all the above I conclude it's expected, and not a failure for the test, to see the state changes out of order. It's better to check all the necessary transitions happened. That means, using the example from the beginning, component A is stopped and component B has started, ignoring the order.

@cmacknz
Copy link
Member

cmacknz commented Feb 20, 2024

That means, using the example from the beginning, component A is stopped and component B has started, ignoring the order.

The analysis of what is happening is great, but this suggestion defeats the purpose of the test, which is a regression test for a bug that caused real problems. We do need to verify that component A and component B are started and stopped in the correct order.

The original change was #2226 which is linked back to a support case with a failed upgrade because we attempted to start a second instance of endpoint before the first one had actually stopped. We need to maintain this property of the system.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Unstable or unreliable test cases. Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants