[8.6](backport #2003) Add configurable numbness for component restarts #2025

Merged 2 commits on Dec 29, 2022

74 changes: 37 additions & 37 deletions NOTICE.txt
@@ -5598,6 +5598,43 @@ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


+--------------------------------------------------------------------------------
+Dependency : golang.org/x/time
+Version: v0.3.0
+Licence type (autodetected): BSD-3-Clause
+--------------------------------------------------------------------------------
+
+Contents of probable licence file $GOMODCACHE/golang.org/x/time@v0.3.0/LICENSE:
+
+Copyright (c) 2009 The Go Authors. All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above
+copyright notice, this list of conditions and the following disclaimer
+in the documentation and/or other materials provided with the
+distribution.
+* Neither the name of Google Inc. nor the names of its
+contributors may be used to endorse or promote products derived from
+this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
 --------------------------------------------------------------------------------
 Dependency : golang.org/x/tools
 Version: v0.1.9
@@ -15430,43 +15467,6 @@ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


---------------------------------------------------------------------------------
-Dependency : golang.org/x/time
-Version: v0.0.0-20210723032227-1f47c861a9ac
-Licence type (autodetected): BSD-3-Clause
---------------------------------------------------------------------------------
-
-Contents of probable licence file $GOMODCACHE/golang.org/x/time@v0.0.0-20210723032227-1f47c861a9ac/LICENSE:
-
-Copyright (c) 2009 The Go Authors. All rights reserved.
-
-Redistribution and use in source and binary forms, with or without
-modification, are permitted provided that the following conditions are
-met:
-
-* Redistributions of source code must retain the above copyright
-notice, this list of conditions and the following disclaimer.
-* Redistributions in binary form must reproduce the above
-copyright notice, this list of conditions and the following disclaimer
-in the documentation and/or other materials provided with the
-distribution.
-* Neither the name of Google Inc. nor the names of its
-contributors may be used to endorse or promote products derived from
-this software without specific prior written permission.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-
 --------------------------------------------------------------------------------
 Dependency : golang.org/x/xerrors
 Version: v0.0.0-20200804184101-5ec99f83aff1
2 changes: 1 addition & 1 deletion go.mod
@@ -50,6 +50,7 @@ require (
 	golang.org/x/lint v0.0.0-20210508222113-6edffad5e616
 	golang.org/x/sync v0.0.0-20210220032951-036812b2e83c
 	golang.org/x/sys v0.0.0-20220715151400-c0bba94af5f8
+	golang.org/x/time v0.3.0
 	golang.org/x/tools v0.1.9
 	google.golang.org/grpc v1.46.0
 	google.golang.org/protobuf v1.28.0
@@ -128,7 +129,6 @@ require (
 	golang.org/x/oauth2 v0.0.0-20211104180415-d3ed0bb246c8 // indirect
 	golang.org/x/term v0.0.0-20210927222741-03fcf44c2211 // indirect
 	golang.org/x/text v0.3.7 // indirect
-	golang.org/x/time v0.0.0-20210723032227-1f47c861a9ac // indirect
 	golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 // indirect
 	google.golang.org/appengine v1.6.7 // indirect
 	google.golang.org/genproto v0.0.0-20220426171045-31bebdecfb46 // indirect
3 changes: 2 additions & 1 deletion go.sum
@@ -1636,8 +1636,9 @@ golang.org/x/time v0.0.0-20191024005414-555d28b269f0/go.mod h1:tRJNPiyCQ0inRvYxb
 golang.org/x/time v0.0.0-20200416051211-89c76fbcd5d1/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
 golang.org/x/time v0.0.0-20200630173020-3af7569d3a1e/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
 golang.org/x/time v0.0.0-20210220033141-f8bda1e9f3ba/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
-golang.org/x/time v0.0.0-20210723032227-1f47c861a9ac h1:7zkz7BUtwNFFqcowJ+RIgu2MaV/MapERkDIy+mwPyjs=
 golang.org/x/time v0.0.0-20210723032227-1f47c861a9ac/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
+golang.org/x/time v0.3.0 h1:rg5rLMjNzMS1RkNLzCG38eapWhnYLFYXDXj2gOlr8j4=
+golang.org/x/time v0.3.0/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
 golang.org/x/tools v0.0.0-20180221164845-07fd8470d635/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
 golang.org/x/tools v0.0.0-20180828015842-6cd1fcedba52/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
 golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
33 changes: 29 additions & 4 deletions pkg/component/runtime/command.go
@@ -16,6 +16,7 @@ import (
 	"time"

 	"go.uber.org/zap/zapcore"
+	"golang.org/x/time/rate"

 	"github.com/elastic/elastic-agent-client/v7/pkg/client"

@@ -72,6 +73,7 @@ type CommandRuntime struct {
 	state ComponentState
 	lastCheckin time.Time
 	missedCheckins int
+	restartBucket *rate.Limiter
 }

 // NewCommandRuntime creates a new command runtime for the provided component.
@@ -99,6 +101,9 @@ func NewCommandRuntime(comp component.Component, logger *logger.Logger, monitor
 	c.logStd = createLogWriter(c.current, c.getCommandSpec(), c.getSpecType(), c.getSpecBinaryName(), ll, unitLevels, logSourceStdout)
 	ll, unitLevels = getLogLevels(comp) // don't want to share mapping of units (so new map is generated)
 	c.logErr = createLogWriter(c.current, c.getCommandSpec(), c.getSpecType(), c.getSpecBinaryName(), ll, unitLevels, logSourceStderr)
+
+	c.restartBucket = newRateLimiter(cmdSpec.RestartMonitoringPeriod, cmdSpec.MaxRestartsPerPeriod)
+
 	return c, nil
 }

@@ -352,7 +357,6 @@ func (c *CommandRuntime) stop(ctx context.Context) error {

 	// cleanup reserved resources related to monitoring
 	defer c.monitor.Cleanup(c.current.ID) //nolint:errcheck // this is ok
-
 	cmdSpec := c.getCommandSpec()
 	go func(info *process.Info, timeout time.Duration) {
 		t := time.NewTimer(timeout)
@@ -391,9 +395,14 @@ func (c *CommandRuntime) startWatcher(info *process.Info, comm Communicator) {
 func (c *CommandRuntime) handleProc(state *os.ProcessState) bool {
 	switch c.actionState {
 	case actionStart:
-		// should still be running
-		stopMsg := fmt.Sprintf("Failed: pid '%d' exited with code '%d'", state.Pid(), state.ExitCode())
-		c.forceCompState(client.UnitStateFailed, stopMsg)
+		if c.restartBucket != nil && c.restartBucket.Allow() {
+			stopMsg := fmt.Sprintf("Suppressing FAILED state due to restart for '%d' exited with code '%d'", state.Pid(), state.ExitCode())
+			c.forceCompState(client.UnitStateStopped, stopMsg)
+		} else {
+			// report failure only if bucket is full of restart events
+			stopMsg := fmt.Sprintf("Failed: pid '%d' exited with code '%d'", state.Pid(), state.ExitCode())
+			c.forceCompState(client.UnitStateFailed, stopMsg)
+		}
 		return true
 	case actionStop, actionTeardown:
 		// stopping (should have exited)
@@ -535,3 +544,19 @@ func dirPath(path string) process.CmdOption {
 		return nil
 	}
 }
+
+func newRateLimiter(restartMonitoringPeriod time.Duration, maxEventsPerPeriod int) *rate.Limiter {
+	if restartMonitoringPeriod <= 0 || maxEventsPerPeriod <= 0 {
+		return nil
+	}
+
+	freq := restartMonitoringPeriod.Seconds()
+	events := float64(maxEventsPerPeriod)
+	perSecond := events / freq
+	if perSecond > 0 {
+		bucketSize := rate.Limit(perSecond)
+		return rate.NewLimiter(bucketSize, maxEventsPerPeriod)
+	}
+
+	return nil
+}
46 changes: 46 additions & 0 deletions pkg/component/runtime/command_test.go
@@ -0,0 +1,46 @@
+// Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
+// or more contributor license agreements. Licensed under the Elastic License;
+// you may not use this file except in compliance with the Elastic License.
+
+package runtime
+
+import (
+	"testing"
+	"time"
+
+	"github.com/stretchr/testify/require"
+)
+
+func TestAddToBucket(t *testing.T) {
+	testCases := map[string]struct {
+		bucketSize int
+		add int
+		addSleep time.Duration
+		shouldBlock bool
+	}{
+		"no error": {1, 0, 1 * time.Millisecond, false},
+		"error within limit": {1, 1, 1 * time.Millisecond, false},
+		"errors > than limit but across timespans": {1, 2, 80 * time.Millisecond, false},
+		"errors > than limit within timespans, exact bucket size": {2, 2, 2 * time.Millisecond, false},
+		"errors > than limit within timespans, off by one": {2, 3, 2 * time.Millisecond, true},
+		"errors > than limit within timespans": {2, 4, 2 * time.Millisecond, true},
+	}
+
+	for name, tc := range testCases {
+		t.Run(name, func(t *testing.T) {
+			dropRate := 50 * time.Millisecond
+			b := newRateLimiter(dropRate, tc.bucketSize)
+
+			blocked := false
+			b.Allow()
+			<-time.After(dropRate + 20*time.Millisecond) // init ticker
+
+			for i := 0; i < tc.add; i++ {
+				wasBlocked := !b.Allow()
+				blocked = blocked || wasBlocked
+				<-time.After(tc.addSleep)
+			}
+			require.Equal(t, tc.shouldBlock, blocked)
+		})
+	}
+}
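The expected shouldBlock values follow from the token-bucket arithmetic in newRateLimiter: the limiter refills at bucketSize / dropRate tokens per second, holds at most bucketSize tokens, and the 70 ms wait after the warm-up Allow() refills it completely. Assuming every sleep lasts close to its nominal duration (scheduler delays are ignored in this worked example), the two boundary cases work out as follows:

```latex
\begin{aligned}
&\texttt{bucketSize}=2:\quad r = \frac{2}{0.05\,\mathrm{s}} = 40\,\mathrm{s}^{-1},\qquad
  \Delta = r \cdot 2\,\mathrm{ms} = 0.08 \\
&\qquad \text{tokens before calls } 1, 2, 3:\quad 2,\ \ 2 - 1 + 0.08 = 1.08,\ \ 1.08 - 1 + 0.08 = 0.16 < 1
  \;\Rightarrow\; \text{call 3 is blocked} \\
&\texttt{bucketSize}=1:\quad r = \frac{1}{0.05\,\mathrm{s}} = 20\,\mathrm{s}^{-1},\qquad
  \text{tokens before call 2} = \min\bigl(1,\ 0 + r \cdot 80\,\mathrm{ms}\bigr) = \min(1,\ 1.6) = 1
  \;\Rightarrow\; \text{call 2 is allowed}
\end{aligned}
```

By the same arithmetic, roughly 21 ms of accumulated extra delay across the two 2 ms sleeps would refill the missing 0.84 tokens and let the third call through in the "off by one" case, which is the timing margin this wall-clock test relies on.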
10 changes: 6 additions & 4 deletions pkg/component/spec.go
@@ -75,10 +75,12 @@ type RuntimePreventionSpec struct {

 // CommandSpec is the specification for an input that executes as a subprocess.
 type CommandSpec struct {
-	Args     []string           `config:"args,omitempty" yaml:"args,omitempty"`
-	Env      []CommandEnvSpec   `config:"env,omitempty" yaml:"env,omitempty"`
-	Timeouts CommandTimeoutSpec `config:"timeouts" yaml:"timeouts"`
-	Log      CommandLogSpec     `config:"log" yaml:"log"`
+	Args                    []string           `config:"args,omitempty" yaml:"args,omitempty"`
+	Env                     []CommandEnvSpec   `config:"env,omitempty" yaml:"env,omitempty"`
+	Timeouts                CommandTimeoutSpec `config:"timeouts" yaml:"timeouts"`
+	Log                     CommandLogSpec     `config:"log" yaml:"log"`
+	RestartMonitoringPeriod time.Duration      `config:"restart_monitoring_period,omitempty" yaml:"restart_monitoring_period,omitempty"`
+	MaxRestartsPerPeriod    int                `config:"maximum_restarts_per_period,omitempty" yaml:"maximum_restarts_per_period,omitempty"`
 }

 // CommandEnvSpec is the specification that defines environment variables that will be set to execute the subprocess.
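The two new CommandSpec fields combine into a single refill rate. The sketch below assumes the period is written as a Go duration string such as the 5s used in the specs; time.ParseDuration accepts that syntax, but whether the agent's config loader parses durations in exactly the same way is an assumption here. restartsPerSecond is a hypothetical helper that only mirrors the arithmetic in newRateLimiter, including treating a non-positive setting as "no limiter".

```go
package main

import (
	"fmt"
	"time"
)

// restartsPerSecond is a hypothetical helper (not part of the agent) that
// turns restart_monitoring_period and maximum_restarts_per_period into the
// refill rate newRateLimiter would use. A zero result means no limiter is
// created, so every unexpected exit is reported as FAILED.
func restartsPerSecond(period string, maxRestarts int) (float64, error) {
	d, err := time.ParseDuration(period)
	if err != nil {
		return 0, fmt.Errorf("invalid restart_monitoring_period %q: %w", period, err)
	}
	if d <= 0 || maxRestarts <= 0 {
		return 0, nil
	}
	return float64(maxRestarts) / d.Seconds(), nil
}

func main() {
	// Values from the auditbeat and cloudbeat specs.
	r, err := restartsPerSecond("5s", 1)
	if err != nil {
		panic(err)
	}
	fmt.Printf("restart_monitoring_period=5s, maximum_restarts_per_period=1 -> %.2f tokens/s\n", r)
}
```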
14 changes: 8 additions & 6 deletions specs/auditbeat.spec.yml
@@ -15,8 +15,12 @@ inputs:
       - kafka
       - logstash
       - redis
-    command:
-      args: &args
+    command: &command
+      restart_monitoring_period: 5s
+      maximum_restarts_per_period: 1
+      timeouts:
+        restart: 1s
+      args:
         - "-E"
         - "setup.ilm.enabled=false"
         - "-E"
@@ -37,11 +41,9 @@ inputs:
     description: "Audit File Integrity"
     platforms: *platforms
     outputs: *outputs
-    command:
-      args: *args
+    command: *command
   - name: audit/system
     description: "Audit System"
     platforms: *platforms
     outputs: *outputs
-    command:
-      args: *args
+    command: *command
14 changes: 8 additions & 6 deletions specs/cloudbeat.spec.yml
@@ -15,8 +15,12 @@ inputs:
       - kafka
       - logstash
       - redis
-    command:
-      args: &args
+    command: &command
+      restart_monitoring_period: 5s
+      maximum_restarts_per_period: 1
+      timeouts:
+        restart: 1s
+      args:
         - "-E"
         - "setup.ilm.enabled=false"
         - "-E"
@@ -35,11 +39,9 @@ inputs:
     description: "CIS Kubernetes monitoring"
     platforms: *platforms
     outputs: *outputs
-    command:
-      args: *args
+    command: *command
   - name: cloudbeat/cis_eks
     description: "CIS elastic Kubernetes monitoring"
     platforms: *platforms
     outputs: *outputs
-    command:
-      args: *args
+    command: *command