-
Notifications
You must be signed in to change notification settings - Fork 788
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: don't double increment session usage count in BrowserCrawler
#2908
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR resolves an issue where sessions in BrowserCrawler were being marked as good twice, leading to a double increment of the session usage count. The changes remove the duplicate call to session.markGood in BrowserCrawler and add a new test to verify the correct handling.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
File | Description |
---|---|
test/core/crawlers/browser_crawler.test.ts | Added a test to verify that the session usage count increments correctly. |
packages/browser-crawler/src/internals/browser-crawler.ts | Removed the duplicate call to session.markGood to prevent double increment. |
maxPoolSize: 1, | ||
}, | ||
requestHandler: async ({ session }) => { | ||
// @ts-expect-error Accessing private property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since usageCount
is private, should we even test for this? This whole is rather a code quality nit to me than a huge fix with user impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing private stuff does not make much sense. However, maybe we could add a read only getter for that property - it might be helpful for introspection and what not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest commit adds public getters for the Session
fields that appear in the Session.getState()
anyway.
While I like that we got rid of many ts-ignore-error
s in the tests (indicative of the fact that these fields shouldn't have been private
), I'm not completely sold on the code quality (e.g. underscoring some private fields - those with the getters - while keeping the rest as is). To a naked eye, it looks quite arbitrary and IMO bit messy.
Do you think this is acceptable way to achieve this, or you have a better pattern for those in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got a feeling there was a reason for this duplicity, but no idea what exactly 🙃 Have you checked the git history when it was added?
It's been there since the advent of BrowserCrawler since 2020. The reason was that originally BasicCrawler.runHandler(context) {
const session = newSession();
const context = { ... session ... };
await BrowserCrawler.runHandler(context) {
...
context.session = newSession();
...
context.session.markGood();
}
session.markGood();
} Note that the I fixed this (made all |
8d63a65
to
ea50f52
Compare
Nice detective work @barjin! |
Changes to the session handling in #1709 caused the same session in
BrowserCrawler
to be marked as good with two different calls. This removes the additional call (keeping only the one inBasicCrawler
), aligning theBrowserCrawler
behavior with the other crawler instances.Closes #2851