
[BUG] pendingTimeouts may be incorrect in the HashedWheelTimer #3173

Open

gagaradio opened this issue Mar 24, 2025 · 5 comments · May be fixed by #3174
Labels
bug Something isn't working good first issue Good for newcomers

Comments

gagaradio commented Mar 24, 2025

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

If a Timeout in HashedWheelTimer has already been transferred to a bucket and is canceled before it actually expires in its target tick, pendingTimeouts may be decremented twice for the same task.
If the user sets a positive maxPendingTimeouts, the understated counter means the timer can silently accept more tasks than the configured limit.
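The double decrement can be boiled down to a few lines. The sketch below (a hypothetical class, not the real timer code) shows how decrementing once on cancel() and once more when the bucket is swept drives the counter below zero:

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal model (hypothetical names, not the actual HashedWheelTimer
// classes) of the accounting bug: the cancel path and the bucket sweep
// both decrement the pending counter for the same timeout.
public class DoubleDecrementDemo {

    public static long simulate() {
        AtomicLong pendingTimeouts = new AtomicLong();
        pendingTimeouts.incrementAndGet();   // newTimeout(): count goes to 1
        pendingTimeouts.decrementAndGet();   // cancel() decrements once...
        pendingTimeouts.decrementAndGet();   // ...the bucket sweep decrements again
        return pendingTimeouts.get();        // -1 instead of the expected 0
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```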

Expected Behavior

pendingTimeouts in HashedWheelTimer should correctly reflect the number of waiting tasks.

Steps To Reproduce

The following test code reproduces the bug.

final HashedWheelTimer timer = new HashedWheelTimer();
final CountDownLatch barrier = new CountDownLatch(1);
// A total of 11 timeouts with the same delay are submitted here, and they will be processed in the same tick.
timer.newTimeout(t -> {
    barrier.countDown();
    Thread.sleep(1000);
}, 200, TimeUnit.MILLISECONDS);
List<Timeout> timeouts = new ArrayList<>();
for (int i = 0; i < 10; i++) {
    timeouts.add(timer.newTimeout(createNoOpTimerTask(), 200, TimeUnit.MILLISECONDS));
}
barrier.await();
// Simulate the case where the timeouts have already been transferred to a bucket
// and are canceled before they actually expire in the target tick.
timeouts.forEach(Timeout::cancel);
Thread.sleep(2000);
assertEquals(0, timer.pendingTimeouts());
timer.stop();

This produces the following test failure, ending with a negative pendingTimeouts:

Expected :0
Actual   :-10

Environment

HertzBeat version(s): The latest master branch.

Debug logs

No response

Anything else?

No response

@gagaradio gagaradio added the bug Something isn't working label Mar 24, 2025
@gagaradio gagaradio changed the title [BUG] pendingTimeouts of the HashedWheelTimer may be potentially magnified [BUG] pendingTimeouts may be incorrect in the HashedWheelTimer Mar 24, 2025
@tomsun28 tomsun28 added the good first issue Good for newcomers label Mar 25, 2025
@PengJingzhao (Contributor)
I was about to create a PR to fix this bug, but I found that the bug no longer exists. So, are you still encountering this issue?


@gagaradio (Author)

Hi. We may not be talking about the same situation. What I encountered was: a Timeout has already been transferred to a bucket and is manually cancelled by the user before it actually expires in its target tick.
This may be rare, but it does happen, especially when there are many tasks in the same bucket.

@harshita2626

This appears to be a bug in the HashedWheelTimer implementation where canceled timeouts can cause incorrect pendingTimeouts counts, potentially leading to negative values.

Root Cause

1. Multiple timeouts are scheduled for the same tick

2. These timeouts get transferred to their target bucket

3. They are canceled before the wheel actually processes that bucket

4. The pendingTimeouts counter is decremented once on cancellation, and again when the bucket sweep later removes the same timeout

Impact
1. Incorrect Metrics: pendingTimeouts becomes negative, failing to accurately reflect the actual state

2. Resource Control Bypass: If maxPendingTimeouts is set, this bug could allow exceeding the intended limit
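The resource-control bypass can be sketched with a simplified admission check (hypothetical method names; the real newTimeout() logic is more involved). When the counter has drifted negative, the limit check admits more tasks than maxPendingTimeouts allows:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a typical admission check guarded by maxPendingTimeouts
// (a hypothetical simplification, not the actual newTimeout() code).
public class AdmissionDemo {

    // Returns how many timeouts get admitted when the counter starts
    // from `drift` (0 when accounting is correct, negative once the
    // double-decrement bug has occurred).
    public static int admitted(long drift, long maxPendingTimeouts, int attempts) {
        AtomicLong pending = new AtomicLong(drift);
        int admitted = 0;
        for (int i = 0; i < attempts; i++) {
            if (pending.incrementAndGet() > maxPendingTimeouts) {
                pending.decrementAndGet(); // rejected: roll the count back
            } else {
                admitted++;
            }
        }
        return admitted;
    }

    public static void main(String[] args) {
        System.out.println(admitted(0, 5, 10));   // limit respected
        System.out.println(admitted(-10, 5, 10)); // limit silently bypassed
    }
}
```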

Suggested Fixes
Option 1: Modify the cancellation logic

public void cancel() {
    // Only decrement if the timeout hasn't been transferred to a bucket yet
    if (bucket == null) {
        timer.pendingTimeouts.decrementAndGet();
    }
    // Rest of cancellation logic...
}
Option 2: Adjust bucket transfer accounting

// When transferring timeouts to buckets:
void transferTimeoutsToBuckets() {
    for (int i = 0; i < 100000; i++) {
        Timeout timeout = timeouts.poll();
        if (timeout == null) {
            break;
        }
        if (timeout.state() != CANCELLED) {
            // Only count non-canceled timeouts
            long calculated = timeout.deadline / tickDuration;
            timeout.remainingRounds = (calculated - tick) / wheel.length;
            final long ticks = Math.max(calculated, tick);
            int stopIndex = (int) (ticks & mask);
            Bucket bucket = wheel[stopIndex];
            bucket.addTimeout(timeout);
        }
    }
}

@gagaradio (Author)

Hi @harshita2626,

Thank you for your reply. In my opinion, the root cause of this bug is that the method lacks cohesion and takes on more than a single responsibility.

Consider the following code. remove(timeout) is called before timeout.expire(), so the task-state check inside timeout.expire() is no longer atomic with the removal. In addition, expireTimeouts(long) also handles the cancel action, but the right thing to do is concentrate all cancel handling in processCancelledTasks().

void expireTimeouts(long deadline) {
    HashedWheelTimeout timeout = head;
    while (timeout != null) {
        HashedWheelTimeout next = timeout.next;
        if (timeout.remainingRounds <= 0) {
            // The remove operation is performed and the `pendingTimeouts` is decreased before the task state is checked.
            next = remove(timeout);
            if (timeout.deadline <= deadline) {
                timeout.expire();
            } else {
                throw new IllegalStateException(String.format(
                        "timeout.deadline (%d) > deadline (%d)", timeout.deadline, deadline));
            }
        } else if (timeout.isCancelled()) {
            // The handling of cancel action is scattered in `processCancelledTasks()` and here.
            next = remove(timeout);
        } else {
            timeout.remainingRounds--;
        }
        timeout = next;
    }
}
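To make the intended invariant concrete, here is a toy model (hypothetical classes, not the actual HertzBeat code and not necessarily what the fix does): expireTimeouts() only expires live timeouts, processCancelledTasks() owns all cancel accounting, so pendingTimeouts is decremented exactly once per timeout.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the cohesive split between expiry and cancellation.
public class CohesiveSweepDemo {
    static class Timeout {
        volatile boolean cancelled;
    }

    final Deque<Timeout> bucket = new ArrayDeque<>();
    final AtomicInteger pendingTimeouts = new AtomicInteger();

    Timeout newTimeout() {
        Timeout t = new Timeout();
        bucket.add(t);
        pendingTimeouts.incrementAndGet();
        return t;
    }

    // Expiry sweep: touches only non-cancelled timeouts.
    void expireTimeouts() {
        for (Iterator<Timeout> it = bucket.iterator(); it.hasNext(); ) {
            Timeout t = it.next();
            if (!t.cancelled) {
                it.remove();
                pendingTimeouts.decrementAndGet(); // expired: counted once
            }
        }
    }

    // The single place where cancelled timeouts are removed and counted.
    void processCancelledTasks() {
        for (Iterator<Timeout> it = bucket.iterator(); it.hasNext(); ) {
            Timeout t = it.next();
            if (t.cancelled) {
                it.remove();
                pendingTimeouts.decrementAndGet(); // cancelled: counted once
            }
        }
    }

    public static int run() {
        CohesiveSweepDemo timer = new CohesiveSweepDemo();
        Timeout t = timer.newTimeout();
        t.cancelled = true;            // user cancels after bucket transfer
        timer.processCancelledTasks(); // removes and decrements exactly once
        timer.expireTimeouts();        // must not decrement the same task again
        return timer.pendingTimeouts.get();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```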

To address this, I have created PR #3174 with a fix.

@harshita2626

Okay, good to know that. Thank you for sharing your insightful analysis.

