Skip to content

Conversation

sanych-sun
Copy link
Member

@sanych-sun sanych-sun commented Sep 5, 2025

Some CSOT tests are require pre-heat connection pool to have a predictable test outcome. The suggestion is to use minPoolSize together with new setting awaitMinPoolSize that makes unified test runner to await until pool will be populated before run the actual test case.

Please complete the following before merging:

  • Update changelog.
  • Test changes in at least one language driver.
  • Test these changes against all server versions and topologies (including standalone, replica set, and sharded
    clusters).

@sanych-sun sanych-sun requested review from a team as code owners September 5, 2025 23:38
@sanych-sun sanych-sun requested review from jmikola and removed request for a team September 5, 2025 23:38
@@ -2,7 +2,7 @@

description: "timeoutMS can be configured on a MongoClient"

schemaVersion: "1.9"
schemaVersion: "1.26"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prestonvasquez I've updated this file only with the new approach to pre-heat the pool. Should we apply similar settings to more tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest updating the runCursorCommand "Non-tailable cursor lifetime remaining timeoutMS applied to getMore if timeoutMode is unset" test, which would close DRIVERS-3134 (PR https://github.com/mongodb/specifications/pull/1769/files).

Otherwise, I will collect a list of candidates during Go Driver's second implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update the test and list of candidates would be much appreciated.

Copy link
Member

@prestonvasquez prestonvasquez Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add this functionality to any CSOT test that (1) doesn't require threads,(2) relies on a failpoint, and (3) uses timeoutMS either as an operation argument or on the client. The following seem relevant:

  • retryability-timeoutMS.yml
  • command-execution.jyml
  • non-tailable-cursors.yml
  • error-transformations.yml
  • convenient-transactions.yml
  • sessions-inherit-timeoutMS.yml

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -2,7 +2,7 @@

description: "timeoutMS can be configured on a MongoClient"

schemaVersion: "1.9"
schemaVersion: "1.26"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest updating the runCursorCommand "Non-tailable cursor lifetime remaining timeoutMS applied to getMore if timeoutMode is unset" test, which would close DRIVERS-3134 (PR https://github.com/mongodb/specifications/pull/1769/files).

Otherwise, I will collect a list of candidates during Go Driver's second implementation.

Comment on lines 547 to 549
- `awaitMinPoolSize`: Optional boolean. If `true`, the unified spec runner must wait for the connection pool to be
populated for all servers according to the `minPoolSize` option. If `false`, not specified, or if minPoolSize
equals 0, there is no need to wait for any specific pool state.
Copy link
Member

@prestonvasquez prestonvasquez Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drivers that asynchronously construct the pool size with minPoolSize after client construction could make this check block indefinitely. We should prescribe an appropriate amount of time to await warming the connection, perhaps 500ms? From experimentation, actual connection time seems to be around 50-200ms on local networks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put limit to 1 second, we can tune it in future if needed. In case if this is not enough, should we throw and report test as failed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, thank you

Copy link
Contributor

@baileympearson baileympearson Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a downside to setting a higher limit? Worst case, the tests takes a bit longer to fail. Best case, we avoid some flakiness and modifications of these tests if 1s proves to be too short.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could give a recommended timeoutMS based on the 20-500ms observation for 8.1 and let drivers update as needed. Does that seem reasonable? The important distinction is that we don't block indefinitely listening to the event monitor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strongly but I do think that if we're debating "is the timeout high enough" and there aren't downsides of setting it higher, we should just set it higher now to avoid test updates and test syncs in the future if the timeout value doesn't end up being high enough.

Copy link
Member Author

@sanych-sun sanych-sun Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make it resilient we might want to use formula like: waitLimit = minPoolSize * 10s. This way we will automatically extent the limit if there should be more connection to establish.
@prestonvasquez @baileympearson WDYT? Should I add this to spec?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@baileympearson I've updated text to wait for minPoolSize * 10s. I'm positive if connection could not be established during 10seconds - then we really have a problem, and there is no sense to wait any more. Does it sounds as reliable solution?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me. 10s seems plenty.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could specify the value in the spec test. For example, instead of an awaitMinPoolSize boolean, we could add an awaitMinPoolSizeMS integer.

E.g. to wait for 10 seconds:

awaitMinPoolSizeMS: 10000

@@ -2,7 +2,7 @@

description: "timeoutMS can be configured on a MongoClient"

schemaVersion: "1.9"
schemaVersion: "1.26"
Copy link
Member

@prestonvasquez prestonvasquez Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add this functionality to any CSOT test that (1) doesn't require threads,(2) relies on a failpoint, and (3) uses timeoutMS either as an operation argument or on the client. The following seem relevant:

  • retryability-timeoutMS.yml
  • command-execution.jyml
  • non-tailable-cursors.yml
  • error-transformations.yml
  • convenient-transactions.yml
  • sessions-inherit-timeoutMS.yml

@sanych-sun sanych-sun requested review from prestonvasquez and matthewdale and removed request for prestonvasquez September 12, 2025 18:08
Copy link
Contributor

@matthewdale matthewdale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants