Under load, the breaker doesn’t guarantee the exact number of failures before switching to the open state. #91
Comments
A proposal for handling peak loads and traffic spikes. Let's look at the above and sum up what we have.
So how can we tune Scenario 2 to behave more like Scenario 1? Let's figure out what we need to change: a peak load flattening algorithm.
Or, even simpler, if we don't add a rate limiter to the sony CB itself:
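One possible reading of this option is to keep the limiter outside the library and simply compose it with an unmodified breaker. A rough sketch of that, assuming golang.org/x/time/rate as the token bucket; the names and limits are illustrative, not anything from this proposal:

```go
package main

import (
	"errors"
	"fmt"
	"time"

	"github.com/sony/gobreaker"
	"golang.org/x/time/rate"
)

var errThrottled = errors.New("throttled before reaching the breaker")

// guardedCall puts a token-bucket limiter in front of an unmodified breaker,
// so traffic spikes are shed before they pile up on the protected resource.
type guardedCall struct {
	limiter *rate.Limiter
	cb      *gobreaker.CircuitBreaker
}

func (g *guardedCall) Do(fn func() (interface{}, error)) (interface{}, error) {
	// Fail fast when no token is available instead of queueing behind an
	// already saturated resource.
	if !g.limiter.Allow() {
		return nil, errThrottled
	}
	return g.cb.Execute(fn)
}

func main() {
	g := &guardedCall{
		limiter: rate.NewLimiter(rate.Limit(200), 50), // ~200 req/s with a burst of 50 (illustrative)
		cb: gobreaker.NewCircuitBreaker(gobreaker.Settings{
			Name:    "postgres",
			Timeout: 5 * time.Second,
		}),
	}
	_, err := g.Do(func() (interface{}, error) {
		return "ok", nil // the real call to the protected resource goes here
	})
	fmt.Println(err)
}
```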
In both cases, the tradeoffs will be:
The Circuit Breaker doesn't track how many requests are currently in flight. If too many go out at once, the service gets overloaded and response times grow dramatically. As a result, you only get timeouts; no request finishes successfully.
If you increase the timeout, the queue grows, potentially without bound, and consumes memory.
🧪 Example Scenario
In my case, PostgreSQL responded in about 72 ms without load, but under load, response times increased to over 30–40 seconds due to an exhausted connection pool and a huge queue.
We simulate a failing service:
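As a stand-in for the simulation, a minimal Go handler that mimics the behaviour described (fast at low load, tens of seconds once saturated); the 20-request cutoff and the delays are made-up values, not the exact snippet from the test:

```go
package main

import (
	"log"
	"net/http"
	"sync/atomic"
	"time"
)

var inFlight int64

// handler answers quickly while lightly loaded, then stalls for tens of
// seconds once too many requests are in flight, mimicking an exhausted
// connection pool.
func handler(w http.ResponseWriter, r *http.Request) {
	n := atomic.AddInt64(&inFlight, 1)
	defer atomic.AddInt64(&inFlight, -1)

	if n <= 20 {
		time.Sleep(72 * time.Millisecond)
	} else {
		time.Sleep(35 * time.Second)
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/query", handler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```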
Running a stress test with k6 at 1000 RPS, the first error is only reported after 30 seconds, by which point 30,000 requests are already in flight!
Why? Because the Circuit Breaker transitions state based on results, not calls in flight or slow calls.
Until a response is received, no failure is registered:
1. ✅ When the CB works fine
2. ❌ When it breaks (e.g. an overloaded protected resource)
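To make the second case concrete, a small sketch using sony/gobreaker's Execute API: a flood of hung calls leaves the breaker closed, because failures are only counted when the wrapped function returns.

```go
package main

import (
	"errors"
	"fmt"
	"time"

	"github.com/sony/gobreaker"
)

func main() {
	cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{Name: "demo"})

	// Fire many calls against a backend that will not answer for 30 seconds.
	for i := 0; i < 1000; i++ {
		go cb.Execute(func() (interface{}, error) {
			time.Sleep(30 * time.Second) // no response yet, so nothing is recorded
			return nil, errors.New("timeout")
		})
	}

	time.Sleep(time.Second)
	// One second in, all 1000 calls are still in flight: no failures have been
	// counted and the breaker has had no reason to open.
	fmt.Println("state after 1s:", cb.State()) // prints "closed"
}
```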
💡 Possible Improvements
Track slow requests and trigger a transition to the open state
resilience4j uses SLOW_CALL_RATE_THRESHOLD
Configures a threshold in percentage. The CircuitBreaker considers a call as slow when the call duration is greater than slowCallDurationThreshold(Duration). When the percentage of slow calls is equal to or greater than the threshold, the CircuitBreaker transitions to open and starts short-circuiting calls.
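gobreaker has no slow-call threshold of its own, so as a sketch only: one way to approximate the resilience4j behaviour is to time the call inside Execute and report anything slower than a chosen duration as a failure. The threshold value, errTooSlow, and executeWithSlowCallCheck below are invented for illustration.

```go
package main

import (
	"errors"
	"time"

	"github.com/sony/gobreaker"
)

// Both of these are made up for the sketch; gobreaker itself knows nothing
// about slow calls.
var errTooSlow = errors.New("call exceeded slow-call duration threshold")

const slowCallDurationThreshold = 500 * time.Millisecond

// executeWithSlowCallCheck times the wrapped call and reports it as a failure
// when it finishes successfully but too slowly, so slow calls count toward
// ReadyToTrip just like real errors.
func executeWithSlowCallCheck(cb *gobreaker.CircuitBreaker, fn func() (interface{}, error)) (interface{}, error) {
	return cb.Execute(func() (interface{}, error) {
		start := time.Now()
		res, err := fn()
		if err == nil && time.Since(start) > slowCallDurationThreshold {
			return res, errTooSlow
		}
		return res, err
	})
}

func main() {
	cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{
		Name: "postgres",
		ReadyToTrip: func(c gobreaker.Counts) bool {
			// Trip on the failure ratio, which now also includes slow calls.
			return c.Requests >= 10 && float64(c.TotalFailures)/float64(c.Requests) >= 0.5
		},
	})

	_, _ = executeWithSlowCallCheck(cb, func() (interface{}, error) {
		time.Sleep(time.Second) // slower than the threshold, so counted as a failure
		return "row", nil
	})
}
```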
Monitor In-Flight Load
Use a ratio of in-flight / total, or just limit the number of in-flight requests. Example from Elasticsearch.
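A minimal sketch of the "just limit in-flight" option, using a buffered channel as a counting semaphore in front of the breaker; maxInFlight and the helper name are illustrative, not an existing gobreaker feature.

```go
package main

import (
	"errors"
	"fmt"

	"github.com/sony/gobreaker"
)

var errTooManyInFlight = errors.New("too many in-flight requests")

// Pick a limit that matches what the protected resource can actually absorb.
const maxInFlight = 100

var slots = make(chan struct{}, maxInFlight)

func executeBounded(cb *gobreaker.CircuitBreaker, fn func() (interface{}, error)) (interface{}, error) {
	select {
	case slots <- struct{}{}: // acquire a slot
		defer func() { <-slots }() // release it when the call finishes
	default:
		// Reject immediately instead of queueing behind a saturated resource.
		return nil, errTooManyInFlight
	}
	return cb.Execute(fn)
}

func main() {
	cb := gobreaker.NewCircuitBreaker(gobreaker.Settings{Name: "postgres"})
	_, err := executeBounded(cb, func() (interface{}, error) { return "ok", nil })
	fmt.Println(err)
}
```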
Predictive Backpressure
Track error thresholds over time.
Proactively reject new requests before overload.
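A rough sketch of this idea: keep error counters over a short interval and refuse new work once the observed error ratio crosses a limit. The interval length, minimum sample size, and 50% limit below are made-up values.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var errBackpressure = errors.New("rejected proactively: recent error ratio too high")

// errorRateGate tracks outcomes within the current interval and starts
// rejecting once the error ratio in that interval is too high.
type errorRateGate struct {
	mu            sync.Mutex
	windowStart   time.Time
	window        time.Duration
	total, failed int
	maxErrorRatio float64
	minSamples    int
}

func (g *errorRateGate) allow() error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if time.Since(g.windowStart) > g.window {
		g.windowStart, g.total, g.failed = time.Now(), 0, 0 // start a fresh interval
	}
	if g.total >= g.minSamples && float64(g.failed)/float64(g.total) >= g.maxErrorRatio {
		return errBackpressure
	}
	return nil
}

func (g *errorRateGate) record(err error) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.total++
	if err != nil {
		g.failed++
	}
}

func main() {
	gate := &errorRateGate{window: 10 * time.Second, maxErrorRatio: 0.5, minSamples: 20}
	if err := gate.allow(); err != nil {
		fmt.Println(err)
		return
	}
	// ... call the protected resource here ...
	gate.record(nil)
}
```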
See the comment with the proposal — it’s a tough but necessary way to handle this properly.
Yes, I know throttling and rate-limiting aren't typically part of a circuit breaker, but we live in the real world, where smart solutions are required.
📝 Documentation highlight needed. This behavior should be clearly documented.
Copy-pasted from a resource linked in the repo README.
📝 Artefacts
Circuit breaker config used for Postgres:
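A plausible sketch of such a config using sony/gobreaker; the numbers are illustrative, not the exact settings from the test.

```go
package main

import (
	"time"

	"github.com/sony/gobreaker"
)

func newPostgresBreaker() *gobreaker.CircuitBreaker {
	return gobreaker.NewCircuitBreaker(gobreaker.Settings{
		Name:        "postgres",
		MaxRequests: 5,                // probe calls allowed while half-open
		Interval:    10 * time.Second, // closed-state counts are cleared on this cadence
		Timeout:     30 * time.Second, // time spent open before half-open probing
		ReadyToTrip: func(c gobreaker.Counts) bool {
			return c.ConsecutiveFailures > 5
		},
	})
}

func main() {
	_ = newPostgresBreaker()
}
```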
k6 stress config used: