[v24.3.x] Fix stepping down on timeout #24773
Open
mmaslankaprv wants to merge 8 commits into redpanda-data:v24.3.x from mmaslankaprv:manual-backport-24590-v24.3.x-659
`raft::reply_result::follower_busy` indicates that the follower was unable to process the heartbeat fast enough to generate a response. Renaming the reply from `timeout` makes it less confusing for the reader and differentiates the error code from an RPC timeout. Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit 6a1e34b)
Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit 95a29db)
Wired raft RPC service handler into Raft fixture to make the tests more accurate and cover the service code with tests. Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit 5f69d9b)
Propagating timeout to the node sending RPC request is crucial for accurate testing of Raft implementation. Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit 7d33bb5)
Added a wrapper around the `storage::log` allowing us to inject storage layer failures in Raft fixture tests. Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit f04995a)
When a follower is busy, it may fail fast when processing full heartbeat requests sent by the leader. In this case the follower RPC handler sets the `follower_busy` result in `heartbeat_reply`. The leader should still treat the follower replica as online in this case, because the node hosting the replica must be online to reply with the `follower_busy` error. This prevents overly eager leader step-downs when follower replicas are slow. Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit 8b57b42)
Signed-off-by: Michał Maślanka <michal@redpanda.com> (cherry picked from commit 67e7c6e)
Fixed previously unstable test. Now the test simply blocks append entry requests from leader instead of relying on uncertain timeouts. Added waiting for enqueue of replicate requests to make sure the requests landed in the buffer before the leadership changed. Signed-off-by: Michał Maślanka <michal@redpanda.com>
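To illustrate the `follower_busy` handling described in the commits above, here is a minimal, self-contained sketch. It is not the Redpanda implementation: the `reply_result` enum below is a simplified stand-in, and `follower_state`, `on_heartbeat_outcome`, and `leader_should_step_down` are hypothetical names invented for this example. The assumption is that a leader steps down when it has not heard from a majority within some timeout, and that any reply, including `follower_busy`, counts as evidence that the follower's node is online, while an RPC-level timeout (no reply at all) does not.

```cpp
#include <chrono>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for the real reply codes.
enum class reply_result { success, failure, follower_busy };

using clock_type = std::chrono::steady_clock;

// Hypothetical per-follower bookkeeping kept by the leader.
struct follower_state {
    clock_type::time_point last_seen_alive{};
};

// Records the outcome of one heartbeat. std::nullopt models an RPC-level
// timeout, i.e. no reply was received from the follower's node at all.
inline void on_heartbeat_outcome(
  follower_state& f,
  std::optional<reply_result> reply,
  clock_type::time_point now) {
    if (!reply) {
        return; // no reply: no evidence that the node is online
    }
    // Any reply, including follower_busy, could only have been sent by a
    // live node, so the follower is treated as online.
    f.last_seen_alive = now;
}

// The leader steps down only if it cannot see a majority (itself included)
// within the timeout.
inline bool leader_should_step_down(
  const std::vector<follower_state>& followers,
  clock_type::time_point now,
  clock_type::duration timeout) {
    std::size_t alive = 1; // the leader counts itself
    for (const auto& f : followers) {
        if (now - f.last_seen_alive <= timeout) {
            ++alive;
        }
    }
    const std::size_t group_size = followers.size() + 1;
    return alive < group_size / 2 + 1;
}
```

In a three-node group under these assumptions, a single follower answering `follower_busy` is enough to keep the leader in place, whereas two followers that never reply would cause a step-down once the timeout elapses.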
mmaslankaprv force-pushed the manual-backport-24590-v24.3.x-659 branch from f056ac4 to db89ec7 on January 10, 2025 16:14
CI test results: test results on build#60601
bharathv approved these changes on Jan 10, 2025
@@ -91,20 +91,31 @@ TEST_P_CORO(monitor_test_fixture, truncation_detection) {

    for (auto& [id, node] : nodes()) {
        if (id == leader) {
            node->on_dispatch(
              [](model::node_id, raft::msg_type) { return ss::sleep(3s); });
            node->on_dispatch([](model::node_id, raft::msg_type mt) {
Should this diff be in `dev` too? This is not specific to 24.3.x, right?
Backport of PR #24590