Fix ThreadPoolMergeSchedulerTests testMergeSourceWithFollowUpMergesRunSequentially #126050

Merged: albertzaharovits merged 4 commits into elastic:main from fix-125639 on Apr 2, 2025


albertzaharovits
Contributor

Fixes #125639
Relates #120869

@albertzaharovits added the labels >test (Issues or PRs that are addressing/adding tests) and :Distributed Indexing/Engine (Anything around managing Lucene and the Translog in an open shard.) on Apr 1, 2025
@albertzaharovits self-assigned this on Apr 1, 2025
@elasticsearchmachine added the labels Team:Distributed Indexing (Meta label for Distributed Indexing team) and v9.1.0 on Apr 1, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@albertzaharovits
Contributor Author

FWIW here's the stack trace for the failure:

"TEST-ThreadPoolMergeSchedulerTests.testMergeSourceWithFollowUpMergesRunSequentially-seed#[43ADDD8E872EA775]" ID=3471 WAITING on java.util.concurrent.CountDownLatch$Sync@6e886e21
	at java.base@24/jdk.internal.misc.Unsafe.park(Native Method)
	- waiting on java.util.concurrent.CountDownLatch$Sync@6e886e21
	at java.base@24/java.util.concurrent.locks.LockSupport.park(LockSupport.java:223)
	at java.base@24/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:789)
	at java.base@24/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1138)
	at java.base@24/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:230)
	at app//org.elasticsearch.index.engine.ThreadPoolMergeScheduler.close(ThreadPoolMergeScheduler.java:483)
	at app//org.elasticsearch.index.engine.ThreadPoolMergeSchedulerTests.testMergeSourceWithFollowUpMergesRunSequentially(ThreadPoolMergeSchedulerTests.java:187)
	at java.base@24/java.lang.invoke.LambdaForm$DMH/0x000000000a000c00.invokeVirtual(LambdaForm$DMH)
	at java.base@24/java.lang.invoke.LambdaForm$MH/0x000000000a120800.invoke(LambdaForm$MH)
	at java.base@24/java.lang.invoke.Invokers$Holder.invokeExact_MT(Invokers$Holder)
	at java.base@24/jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(DirectMethodHandleAccessor.java:154)
	at java.base@24/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base@24/java.lang.reflect.Method.invoke(Method.java:565)
	at app//com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1763)
	at app//com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
	at app//com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
	at app//com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
	at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at app//org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
	at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at app//org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
	at app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
	at app//com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
	at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
	at app//com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
	at app//com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
	at app//com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
	at app//com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
	at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at app//org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at app//com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
	at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at app//org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
	at app//org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
	at app//org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
	at app//org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
	at app//org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
	at app//org.junit.rules.RunRules.evaluate(RunRules.java:20)
	at app//com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
	at app//com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
	at app//com.carrotsearch.randomizedtesting.ThreadLeakControl$$Lambda/0x000000000a39d5b0.run(Unknown Source)
	at java.base@24/java.lang.Thread.runWith(Thread.java:1460)
	at java.base@24/java.lang.Thread.run(Thread.java:1447)

"elasticsearch[test][merge][T#1]" ID=3476 WAITING on java.util.concurrent.Semaphore$NonfairSync@29f76e49
	at java.base@24/jdk.internal.misc.Unsafe.park(Native Method)
	- waiting on java.util.concurrent.Semaphore$NonfairSync@29f76e49
	at java.base@24/java.util.concurrent.locks.LockSupport.park(LockSupport.java:223)
	at java.base@24/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:789)
	at java.base@24/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1138)
	at java.base@24/java.util.concurrent.Semaphore.acquire(Semaphore.java:318)
	at app//org.elasticsearch.index.engine.ThreadPoolMergeSchedulerTests.lambda$testMergeSourceWithFollowUpMergesRunSequentially$1(ThreadPoolMergeSchedulerTests.java:228)
	at app//org.elasticsearch.index.engine.ThreadPoolMergeSchedulerTests$$Lambda/0x000000000ad4c208.answer(Unknown Source)
	at app//org.mockito.internal.stubbing.StubbedInvocationMatcher.answer(StubbedInvocationMatcher.java:42)
	at app//org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:103)
	at app//org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29)
	at app//org.mockito.internal.handler.InvocationNotifierHandler.handle(InvocationNotifierHandler.java:34)
	at app//org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:82)
	at app//org.mockito.internal.creation.bytebuddy.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:56)
	at app//org.mockito.internal.creation.bytebuddy.MockMethodInterceptor$DispatcherDefaultingToRealMethod.interceptAbstract(MockMethodInterceptor.java:161)
	at app//org.apache.lucene.index.MergeScheduler$MergeSource$MockitoMock$KxPeFHhn.merge(Unknown Source)
	at app//org.elasticsearch.index.engine.ThreadPoolMergeScheduler.doMerge(ThreadPoolMergeScheduler.java:267)
	at app//org.elasticsearch.index.engine.ThreadPoolMergeScheduler$MergeTask.run(ThreadPoolMergeScheduler.java:363)
	at app//org.elasticsearch.index.engine.ThreadPoolMergeExecutorService.runMergeTask(ThreadPoolMergeExecutorService.java:195)
	at app//org.elasticsearch.index.engine.ThreadPoolMergeExecutorService.lambda$enqueueMergeTaskExecution$3(ThreadPoolMergeExecutorService.java:167)
	at app//org.elasticsearch.index.engine.ThreadPoolMergeExecutorService$$Lambda/0x000000000ad490c8.run(Unknown Source)
	at app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:977)
	at java.base@24/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095)
	at java.base@24/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619)
	at java.base@24/java.lang.Thread.runWith(Thread.java:1460)
	at java.base@24/java.lang.Thread.run(Thread.java:1447)
	Locked synchronizers:
	- java.util.concurrent.ThreadPoolExecutor$Worker@70409d3f

Contributor

@henningandersen left a comment


LGTM.

Not entirely certain how it fixes the specific failure but looks good regardless.

@albertzaharovits
Contributor Author

> Not entirely certain how it fixes the specific failure but looks good regardless.

Yeah, it was a head-scratcher.
There was a race between the test thread and the merge thread(s). The merge thread uses the runMergeIdx variable to verify that the follow-up merges execute in the expected order, and the test thread uses the same variable to detect when all merges have finished executing. But depending on when it is read, runMergeIdx can hold either the index of the merge that is currently running or the index of the one that just finished.
The test deadlocks when the test thread concludes that all merges are done while the last one still needs to run. That matches the stack traces above: the test thread is parked in ThreadPoolMergeScheduler.close, and the merge thread is still waiting on the test's semaphore inside the mocked merge call.
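
To make the ambiguity concrete, here is a minimal sketch. It is not the actual test and not the fix merged in this PR; it only illustrates one way to avoid overloading a single counter. It assumes a hypothetical setup where merges run one at a time on a single worker thread. `runMergeIdx` mirrors the variable named above, while `SequentialMergeSketch`, `MERGE_COUNT`, `finishedMerges` and `allMergesDone` are names invented for illustration. The ordering index is used only for the order check, and the waiting thread relies on a dedicated `CountDownLatch` that is counted down only after a merge body has returned.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SequentialMergeSketch {
    // Hypothetical number of follow-up merges; not taken from the real test.
    public static final int MERGE_COUNT = 5;

    /** Returns how many merges had fully finished when the waiting thread was released. */
    public static int runSketch() throws InterruptedException {
        // Used ONLY to check ordering. Once incremented, the merge with that
        // index is running, not finished, so it is a poor "all done" signal.
        AtomicInteger runMergeIdx = new AtomicInteger();
        AtomicInteger finishedMerges = new AtomicInteger();
        // Dedicated completion signal, counted down after each merge body returns.
        CountDownLatch allMergesDone = new CountDownLatch(MERGE_COUNT);
        ExecutorService mergeThread = Executors.newSingleThreadExecutor();
        try {
            for (int i = 0; i < MERGE_COUNT; i++) {
                final int expectedIdx = i;
                mergeThread.execute(() -> {
                    int idx = runMergeIdx.getAndIncrement();
                    if (idx != expectedIdx) {
                        throw new AssertionError("merge ran out of order: " + idx);
                    }
                    // ... the merge work itself would run here ...
                    finishedMerges.incrementAndGet();
                    allMergesDone.countDown();
                });
            }
            // The waiting thread never looks at runMergeIdx to decide it is done.
            if (!allMergesDone.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("merges did not finish in time");
            }
            return finishedMerges.get();
        } finally {
            mergeThread.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("finished merges: " + runSketch());
    }
}
```

With a separate completion signal, "index N has been claimed" and "merge N has finished" can no longer be confused, which is the distinction the race hinged on.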

@albertzaharovits merged commit e934600 into elastic:main on Apr 2, 2025
17 checks passed
@albertzaharovits deleted the fix-125639 branch on April 2, 2025 14:13
Development

Successfully merging this pull request may close these issues.

[CI] ThreadPoolMergeSchedulerTests testMergeSourceWithFollowUpMergesRunSequentially failing