Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: don't double increment session usage count in BrowserCrawler #2908

Merged
merged 3 commits into from
Apr 2, 2025

Conversation

barjin
Copy link
Contributor

@barjin barjin commented Apr 1, 2025

Changes to the session handling in #1709 caused the same session in BrowserCrawler to be marked as good with two different calls. This removes the additional call (keeping only the one in BasicCrawler), aligning the BrowserCrawler behavior with the other crawler instances.

Closes #2851

@barjin barjin requested a review from Copilot April 1, 2025 12:15
@barjin barjin self-assigned this Apr 1, 2025
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR resolves an issue where sessions in BrowserCrawler were being marked as good twice, leading to a double increment of the session usage count. The changes remove the duplicate call to session.markGood in BrowserCrawler and add a new test to verify the correct handling.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
test/core/crawlers/browser_crawler.test.ts Added a test to verify that the session usage count increments correctly.
packages/browser-crawler/src/internals/browser-crawler.ts Removed the duplicate call to session.markGood to prevent double increment.

@github-actions github-actions bot added this to the 111th sprint - Tooling team milestone Apr 1, 2025
@github-actions github-actions bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels Apr 1, 2025
maxPoolSize: 1,
},
requestHandler: async ({ session }) => {
// @ts-expect-error Accessing private property
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since usageCount is private, should we even test for this? This whole is rather a code quality nit to me than a huge fix with user impact.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing private stuff does not make much sense. However, maybe we could add a read only getter for that property - it might be helpful for introspection and what not

Copy link
Contributor Author

@barjin barjin Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest commit adds public getters for the Session fields that appear in the Session.getState() anyway.

While I like that we got rid of many ts-ignore-errors in the tests (indicative of the fact that these fields shouldn't have been private), I'm not completely sold on the code quality (e.g. underscoring some private fields - those with the getters - while keeping the rest as is). To a naked eye, it looks quite arbitrary and IMO bit messy.

Do you think this is acceptable way to achieve this, or you have a better pattern for those in mind?

@barjin barjin assigned B4nan and unassigned B4nan Apr 1, 2025
@barjin barjin requested review from B4nan and janbuchar April 2, 2025 06:31
Copy link
Member

@B4nan B4nan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got a feeling there was a reason for this duplicity, but no idea what exactly 🙃 Have you checked the git history when it was added?

@barjin
Copy link
Contributor Author

barjin commented Apr 2, 2025

when it was added?

It's been there since the advent of BrowserCrawler since 2020.

The reason was that originally BasicCrawler and BrowserCrawler were touching different session instances:

BasicCrawler.runHandler(context) {
  const session = newSession();
  const context = { ... session ...  };
  await BrowserCrawler.runHandler(context) {
     ...
     context.session = newSession();
     ...
     context.session.markGood();
  }

  session.markGood();
}

Note that the session in the BasicCrawler handler was not referencing the same object as context.session at the end of the function.

I fixed this (made all BasicCrawler references to session explicitly into context.session in #1709). Back then I missed the rogue markGood call in BrowserCrawler. This PR fixes this omission.

@barjin barjin force-pushed the fix/browser/session-usage-increment branch from 8d63a65 to ea50f52 Compare April 2, 2025 09:41
@janbuchar
Copy link
Contributor

Nice detective work @barjin!

@barjin barjin merged commit 3107e55 into master Apr 2, 2025
9 checks passed
@barjin barjin deleted the fix/browser/session-usage-increment branch April 2, 2025 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: Browser-based crawlers double increment session.usageCount
3 participants