Skip to content

Conversation

nitpicker55555
Copy link
Collaborator

@nitpicker55555 nitpicker55555 commented Aug 22, 2025

Description

Current implemented SOM screenshot:
index_0828223614_som

other open source implementation
截屏2025-08-27 21 42 16

This pr generated SOM screenshot
index_0828213427_som

Checklist

Go over all the following points, and put an x in all the boxes that apply.

  • I have read the CONTRIBUTION guide (required)
  • I have linked this PR to an issue using the Development section on the right sidebar or by adding Fixes #issue-number in the PR description (required)
  • I have checked if any dependencies need to be added or updated in pyproject.toml and uv lock
  • I have updated the tests accordingly (required for a bug fix or a new feature)
  • I have updated the documentation if needed:
  • I have added examples if this is a new feature

If you are unsure about any of these, don't hesitate to ask. We are here to help!

Copy link
Contributor

coderabbitai bot commented Aug 22, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Replaces manual SVG overlay with a labeler-driven SOM injection path, adds hierarchy-based clickable filtering and a fullVisualMode sentinel, centralizes snapshot retrieval via getSnapshotForAction, wires fullVisualMode through Python↔TS, improves TS server readiness/logging, and tightens popup/DOM-stability and click/new-tab handling.

Changes

Cohort / File(s) Summary
TS toolkit core (actions / snapshot / overlay)
camel/toolkits/hybrid_browser_toolkit/ts/src/hybrid-browser-toolkit.ts
Adds fullVisualMode wiring and getSnapshotForAction; refactors getSomScreenshot to parse clickables (including [active]), apply filterClickableByHierarchy, and call SomScreenshotInjected.captureOptimized(..., undefined); removes manual SVG overlay helpers; adds runtime logging and timing.
TS session logic (AI snapshot / click / popup / DOM stability)
camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts
Adds snapshot parsing/indexing (role/text), preserves/merges coordinates, implements waitForDOMStability, improves performClick with popup/new-tab listener and optional forced-blank behavior, and uses viewport-relative bounding logic.
TS injected SOM + parsers/filters
camel/toolkits/hybrid_browser_toolkit/ts/src/som-screenshot-injected.ts, camel/toolkits/hybrid_browser_toolkit/ts/src/snapshot-parser.ts, camel/toolkits/hybrid_browser_toolkit/ts/src/parent-child-filter.ts
New SomScreenshotInjected.captureOptimized injects labels into the page and returns VisualMarkResult with timing; new parseSnapshotHierarchy + filterClickableByHierarchy build snapshot hierarchy and prune clickables; new filterParentChildElements/ElementInfo provide parent–child filtering utilities.
TS config & WS server surface
camel/toolkits/hybrid_browser_toolkit/ts/src/config-loader.ts, camel/toolkits/hybrid_browser_toolkit/ts/websocket-server.js
Adds fullVisualMode to WebSocket config; get_snapshot_for_ai now accepts includeCoordinates and viewportLimit and delegates to session; shutdown performs guarded browser/server close with deferred exit.
Python config and toolkit constructors
camel/toolkits/hybrid_browser_toolkit/config_loader.py, camel/toolkits/hybrid_browser_toolkit/hybrid_browser_toolkit.py, camel/toolkits/hybrid_browser_toolkit/hybrid_browser_toolkit_ts.py
Adds BrowserConfig.full_visual_mode, maps fullVisualMode in/from kwargs and WS config, surfaces full_visual_mode in constructors and forwards it to TS backend; avoids in-place mutation of click responses.
Python WS wrapper (TS process logging & readiness)
camel/toolkits/hybrid_browser_toolkit/ws_wrapper.py
Adds async TS log reader and file-backed TS logs, _server_ready_future readiness coordination, merges stderr→stdout, exposes ts_log_file_path/handle, manages log-reader lifecycle, and improves startup/shutdown and timeout handling.
Example update
examples/toolkits/hybrid_browser_toolkit_example.py
Removes FileWriteToolkit/filesystem output, sets viewport_limit=True, narrows tools, updates prompt text, and ensures browser_close() via try/finally.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User
  participant PyToolkit as PythonToolkit
  participant TSServer
  participant Session
  participant Page

  rect rgba(230,240,255,0.35)
  note over User,Page: Action flow (fullVisualMode-aware)
  User->>PyToolkit: browser_click / visit_page / ...
  PyToolkit->>TSServer: send WS command (includes fullVisualMode)
  TSServer->>Session: performAction(...)
  Session->>Session: getSnapshotForAction(viewportLimit)
  alt fullVisualMode = true
    Session-->>TSServer: snapshot = "full visual mode" sentinel
  else
    Session->>Page: collect page state / compute snapshot
    Session-->>TSServer: snapshot (real)
  end
  TSServer-->>PyToolkit: { result, snapshot, tabs... }
  PyToolkit-->>User: structured response
  end
Loading
sequenceDiagram
  autonumber
  participant Session
  participant Page
  participant Parser as HierarchyParser
  participant Overlay as SomScreenshotInjected

  rect rgba(235,255,235,0.35)
  note over Session,Overlay: SOM overlay generation path
  Session->>Page: fetch page + AI snapshot
  Session->>Parser: filterClickableByHierarchy(snapshotText, clickables)
  Parser-->>Session: filteredClickableSet
  Session->>Overlay: captureOptimized(page, snapshotResult, filteredClickableSet, undefined)
  Overlay-->>Session: { imageDataURL, timing, markedElements }
  end
Loading
sequenceDiagram
  autonumber
  participant Session
  participant Page
  participant NewTab as PopupPage

  rect rgba(255,245,230,0.35)
  note over Session,NewTab: Click with popup/new-tab handling and DOM stability wait
  Session->>Page: performClick(target)
  alt popup/new-tab detected
    Session->>NewTab: wait for DOMContentLoaded
    par
      Session->>NewTab: waitForDOMStability(~100ms quiet)
      Session->>NewTab: wait for network idle
    and
      Session->>NewTab: detect content/interactive elements
    end
  else
    Session->>Page: waitForDOMStability / stability checks
  end
  Session-->>Session: continue workflow
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

enhancement, Review Required

Suggested reviewers

  • hu-xianglong

Poem

🐇
I hopped through snapshots, tiny and bright,
Pasted labels that danced in soft light.
Tabs pop open, filters trim the mess,
Full-visual sentinel keeps the view blessed.
Snip-snap—new marks, and off I sprint!

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch chore/browser_som_screenshot

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@nitpicker55555
Copy link
Collaborator Author

@coderabbitai review

Copy link
Contributor

coderabbitai bot commented Aug 22, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (12)
camel/toolkits/hybrid_browser_toolkit/ts/src/screenshot-labeler.ts (6)

4-11: Remove or utilize isClickable to avoid dead fields.

isClickable exists on ElementWithCoordinates but is not used anywhere in this module. Either remove it to avoid drift or use it to vary styling (e.g., color/opacity) for future non-clickable overlays.

 interface ElementWithCoordinates {
   ref: string;
   x: number;
   y: number;
   width: number;
   height: number;
-  isClickable: boolean;
 }

85-127: Label bounds use a fixed width; long refs will overflow and occupancy checks become inaccurate.

Both the rendered background rect (lines 429–433) and the collision checks (lines 106–121) assume LABEL_MIN_WIDTH. For multi-character refs (common), text will overflow and overlap logic underestimates occupied space.

Apply dynamic label width based on an approximate text width. This keeps occupancy and rendering consistent:

+  private readonly LABEL_MAX_WIDTH = 160;
+  private estimateTextWidth(text: string): number {
+    // Approximate 11px Arial at ~6px per char plus padding
+    const approx = this.LABEL_PADDING * 2 + Math.ceil(text.length * 6);
+    return Math.max(this.LABEL_MIN_WIDTH, Math.min(this.LABEL_MAX_WIDTH, approx));
+  }
-    sortedElements.forEach(element => {
+    sortedElements.forEach(element => {
+      const labelWidth = this.estimateTextWidth(element.ref);
       // Try different positions within the element
       const positions = [
         { x: element.x + this.LABEL_PADDING, y: element.y + this.LABEL_HEIGHT }, // Top-left
-        { x: element.x + element.width - this.LABEL_MIN_WIDTH - this.LABEL_PADDING, y: element.y + this.LABEL_HEIGHT }, // Top-right
+        { x: element.x + element.width - labelWidth - this.LABEL_PADDING, y: element.y + this.LABEL_HEIGHT }, // Top-right
         { x: element.x + this.LABEL_PADDING, y: element.y + element.height - this.LABEL_PADDING }, // Bottom-left
-        { x: element.x + element.width - this.LABEL_MIN_WIDTH - this.LABEL_PADDING, y: element.y + element.height - this.LABEL_PADDING }, // Bottom-right
-        { x: element.x + element.width / 2 - this.LABEL_MIN_WIDTH / 2, y: element.y + element.height / 2 } // Center
+        { x: element.x + element.width - labelWidth - this.LABEL_PADDING, y: element.y + element.height - this.LABEL_PADDING }, // Bottom-right
+        { x: element.x + element.width / 2 - labelWidth / 2, y: element.y + element.height / 2 } // Center
       ];
       // Find first non-overlapping position
       for (const pos of positions) {
         const labelBounds = {
           x: pos.x,
           y: pos.y - this.LABEL_HEIGHT,
-          width: this.LABEL_MIN_WIDTH,
+          width: labelWidth,
           height: this.LABEL_HEIGHT
         };

132-145: Carry dynamic widths into occupancy for external labels.

When seeding occupiedPositions from existing labels, derive each label's actual width or you’ll underestimate collisions.

-    const occupiedPositions = existingLabels.map(label => ({
-      x: label.x,
-      y: label.y - this.LABEL_HEIGHT,
-      width: this.LABEL_MIN_WIDTH,
-      height: this.LABEL_HEIGHT
-    }));
+    const occupiedPositions = existingLabels.map(label => {
+      const w = this.estimateTextWidth(label.ref);
+      return {
+        x: label.x,
+        y: label.y - this.LABEL_HEIGHT,
+        width: w,
+        height: this.LABEL_HEIGHT
+      };
+    });

205-231: Naive O(n^2) grouping; also not transitive.

Current grouping checks proximity only to the seed element. Elements A—B—C where AB and BC, but not A~C, end up split. For dense clusters, this increases label clutter.

Refactor to union-find/BFS over proximity graph:

  • Build adjacency by threshold.
  • BFS/DFS to collect connected components for groups.

I can provide a patch if you want to switch to a transitive grouping.


325-354: Use dynamic width in external position search.

findBestExternalPosition should use the measured width; otherwise left/right placements will be off-screen or overlapping.

-    const positions = [
-      { x: element.x + element.width + 10, y: element.y + element.height / 2 }, // Right
-      { x: element.x - this.LABEL_MIN_WIDTH - 10, y: element.y + element.height / 2 }, // Left
-      { x: element.x + element.width / 2 - this.LABEL_MIN_WIDTH / 2, y: element.y - this.LABEL_HEIGHT - 10 }, // Top
-      { x: element.x + element.width / 2 - this.LABEL_MIN_WIDTH / 2, y: element.y + element.height + 10 + this.LABEL_HEIGHT } // Bottom
-    ];
+    const w = this.estimateTextWidth(element.ref);
+    const positions = [
+      { x: element.x + element.width + 10, y: element.y + element.height / 2 }, // Right
+      { x: element.x - w - 10, y: element.y + element.height / 2 }, // Left
+      { x: element.x + element.width / 2 - w / 2, y: element.y - this.LABEL_HEIGHT - 10 }, // Top
+      { x: element.x + element.width / 2 - w / 2, y: element.y + element.height + 10 + this.LABEL_HEIGHT } // Bottom
+    ];
     for (const pos of positions) {
       const labelBounds = {
         x: pos.x,
         y: pos.y - this.LABEL_HEIGHT,
-        width: this.LABEL_MIN_WIDTH,
+        width: w,
         height: this.LABEL_HEIGHT
       };

383-389: Consider a small collision margin to avoid labels abutting each other.

Edges touching are currently considered “no overlap”. Adding a small margin improves legibility with dense layouts.

-  private overlapsWithExisting(
+  private overlapsWithExisting(
     label: {x: number, y: number, width: number, height: number},
     existing: Array<{x: number, y: number, width: number, height: number}>
   ): boolean {
-    return existing.some(existing => 
-      !(label.x + label.width < existing.x ||
-        existing.x + existing.width < label.x ||
-        label.y + label.height < existing.y ||
-        existing.y + existing.height < label.y)
-    );
+    const m = 2; // margin in px
+    return existing.some(e => 
+      !((label.x + label.width) <= (e.x - m) ||
+        (e.x + e.width) <= (label.x - m) ||
+        (label.y + label.height) <= (e.y - m) ||
+        (e.y + e.height) <= (label.y - m))
+    );
   }
camel/toolkits/hybrid_browser_toolkit/ts/src/hybrid-browser-toolkit.ts (6)

247-251: Clickable filtering depends on [cursor=pointer] heuristic; may miss truly interactive elements.

Many clickable elements (e.g., anchors, buttons) don’t always emit [cursor=pointer] in the snapshot. You may under-mark elements.

  • Consider alternative signals from the snapshot (role=button/link, tabindex, event listeners, tag name) to build the clickable set.
  • Fall back to visibility + size thresholds when in doubt.
    Would you like me to prototype an enriched parser?

256-266: Clamp and skip degenerate rectangles after scaling.

After scaling and clamping to the screenshot, width/height can drop to 0–1 px near edges. These produce invisible labels and can skew occupancy.

-      const elements = interactiveElements.map(([ref, element]) => {
+      const elements = interactiveElements.map(([ref, element]) => {
         const coords = element.coordinates!;
-        return {
+        const x = Math.max(0, coords.x * scaleX);
+        const y = Math.max(0, coords.y * scaleY);
+        const width = Math.min(coords.width * scaleX, screenshotWidth - x);
+        const height = Math.min(coords.height * scaleY, screenshotHeight - y);
+        if (width < 1 || height < 1) {
+          return null;
+        }
+        return {
           ref,
-          x: Math.max(0, coords.x * scaleX),
-          y: Math.max(0, coords.y * scaleY),
-          width: Math.min(coords.width * scaleX, screenshotWidth - Math.max(0, coords.x * scaleX)),
-          height: Math.min(coords.height * scaleY, screenshotHeight - Math.max(0, coords.y * scaleY)),
+          x,
+          y,
+          width,
+          height,
           isClickable: true  // All filtered elements are clickable
         };
-      });
+      }).filter(Boolean) as any;

268-274: Wrap svgContent in a grouping element for easier future styling/debug toggles.

Not required, but grouping facilitates toggling overlay visibility or styles via a single class/id.

-      const svgOverlay = `
-        <svg width="${screenshotWidth}" height="${screenshotHeight}" xmlns="http://www.w3.org/2000/svg">
-          ${svgContent}
-        </svg>
-      `;
+      const svgOverlay = `
+        <svg width="${screenshotWidth}" height="${screenshotHeight}" xmlns="http://www.w3.org/2000/svg">
+          <g id="camel-overlay" data-version="v2">
+            ${svgContent}
+          </g>
+        </svg>
+      `;

153-158: “marked visually” count can be inaccurate; align it with clickable filtering.

You report all elements-with-coordinates, but later filter to clickable ones. The message can overstate how many were actually labeled.

-      // Count elements with coordinates
-      const elementsWithCoords = Object.values(snapshotResult.elements).filter(el => el.coordinates).length;
+      // Count eligible elements (with coordinates and clickable hint)
+      const clickableSet = this.parseClickableElements(snapshotResult.snapshot);
+      const elementsWithCoords = Object.entries(snapshotResult.elements)
+        .filter(([ref, el]) => el.coordinates && clickableSet.has(ref)).length;

Optionally, return the exact count from addVisualMarksOptimized to include overlap-removal effects.

Do you want me to thread the exact labeled count back via a small return-structure change?


319-345: Topmost detection relies on iteration order of elements; not DOM/stacking-aware.

removeOverlappedElements assumes “later in array = on top”. If snapshotResult.elements isn’t in DOM or stacking order, results will be flaky.

  • If snapshot provides z-index / paint order, sort by it before filtering.
  • Otherwise, consider retaining all overlapping clickable elements but de-emphasize ones that are fully covered (e.g., dashed outline) to avoid false negatives.

Can you confirm the ordering guarantees of snapshotResult.elements?


290-293: Silent fallback on errors loses diagnostics; log minimal context in debug mode.

Returning the raw screenshot is correct, but a guarded debug log helps triage in CI.

     } catch (error) {
       // Error adding visual marks, falling back to original screenshot
       // Return original screenshot if marking fails
+      if (process.env.HYBRID_BROWSER_DEBUG === '1') {
+        console.warn('[VisualMarks] Failed to render overlay:', error);
+      }
       return screenshotBuffer;
     }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4b07763 and 65a9ff2.

📒 Files selected for processing (2)
  • camel/toolkits/hybrid_browser_toolkit/ts/src/hybrid-browser-toolkit.ts (2 hunks)
  • camel/toolkits/hybrid_browser_toolkit/ts/src/screenshot-labeler.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: pytest_apps
  • GitHub Check: pytest_examples
  • GitHub Check: pytest_package_fast_test
  • GitHub Check: pytest_package_llm_test
  • GitHub Check: Minimal dependency check
  • GitHub Check: pre-commit
🔇 Additional comments (3)
camel/toolkits/hybrid_browser_toolkit/ts/src/screenshot-labeler.ts (2)

25-32: Good encapsulation and clear, tunable constants.

The class-based design and centralized constants make this easy to evolve and test. Solid foundation for the labeling system.


1-2: Overall: strong improvement over inline mark generation.

The centralization solves overlap and readability pain points we’ve seen. With the small fixes above (escaping, dynamic widths, group occupancy), this should be robust in the wild.

If you have noUnusedLocals enabled, confirm this compiles after removing groupCenter and adding the helper methods. I can add unit tests covering:

  • Large vs small elements
  • Group labels clamping and collision avoidance
  • SVG escaping for tricky refs (e.g., <a>&"'</a>)

Would you like me to draft those tests?

camel/toolkits/hybrid_browser_toolkit/ts/src/hybrid-browser-toolkit.ts (1)

5-5: Good modularization by delegating to ScreenshotLabeler.

Importing the labeler keeps this file focused on orchestration and reduces duplication of SVG logic. Nice separation of concerns.

@nitpicker55555
Copy link
Collaborator Author

nitpicker55555 commented Aug 22, 2025

@coderabbitai review

Copy link
Contributor

coderabbitai bot commented Aug 22, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
camel/toolkits/hybrid_browser_toolkit/ts/src/screenshot-labeler.ts (2)

185-209: Good catch: group labels now clamp vertically and avoid collisions

The added clamping and overlap checks for group labels address the earlier review’s concerns about off-screen rendering and collision in dense regions. Nice improvement.

Also applies to: 307-364


487-498: Security: XML/SVG escaping is implemented correctly and background now scales with text

Escaping refs and using dynamic background width closes SVG/XML injection vectors and improves legibility. Good job.

Also applies to: 500-538

🧹 Nitpick comments (3)
camel/toolkits/hybrid_browser_toolkit/ts/src/screenshot-labeler.ts (3)

220-246: Grouping heuristic is O(n^2) with a fixed 50px threshold; consider adaptive threshold or spatial indexing

For pages with many small elements, consider:

  • Use a grid/spatial hash (bucket by, e.g., 64px) to reduce candidate pairs.
  • Make “threshold” relative to LABEL_HEIGHT or viewport size to adapt across DPIs.

487-498: Optional: Use isClickable to style/filter, or drop the property to avoid confusion

You define isClickable on ElementWithCoordinates but don’t use it. Either:

  • Style non-clickables differently (stroke/label color), or
  • Filter them out before labeling and remove the field.

435-441: Nit: Consider integer rounding for crisper rendering

Rounding x/y/width/height to integers often produces sharper lines/text in browsers.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 65a9ff2 and fdfb16c.

📒 Files selected for processing (1)
  • camel/toolkits/hybrid_browser_toolkit/ts/src/screenshot-labeler.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: pre-commit
  • GitHub Check: pytest_examples
  • GitHub Check: pytest_apps
  • GitHub Check: pytest_package_fast_test
  • GitHub Check: pytest_package_llm_test
🔇 Additional comments (1)
camel/toolkits/hybrid_browser_toolkit/ts/src/screenshot-labeler.ts (1)

481-486: Confirm consumers expect an SVG fragment (not a full element)

generateSVG returns a fragment of SVG nodes. Verify the caller wraps it in an with proper width/height/viewBox. If the consumer expects a full SVG, export a container here.

@nitpicker55555
Copy link
Collaborator Author

nitpicker55555 commented Aug 22, 2025

@coderabbitai review

Copy link
Contributor

coderabbitai bot commented Aug 22, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
camel/toolkits/hybrid_browser_toolkit/ts/src/screenshot-labeler.ts (1)

25-35: Unify label geometry (dynamic width/height) and fix “Top” placement bug to prevent collisions and off-screen rendering

Placement uses fixed LABEL_MIN_WIDTH/LABEL_HEIGHT while rendering uses dynamic width (estimateTextWidth + padding). This mismatch causes missed overlap detection and labels rendering out of bounds. Also, findBestExternalPosition subtracts LABEL_HEIGHT for the “Top” baseline, pushing labels too far up. Centralize font/padding, use a single getLabelWidth helper everywhere, and clamp consistently.

Apply the following diffs:

@@
 export class ScreenshotLabeler {
   private readonly MIN_ELEMENT_WIDTH = 40; // Minimum width for direct labeling
   private readonly MIN_ELEMENT_HEIGHT = 70; // Minimum height for direct labeling
   private readonly LABEL_PADDING = 4;
   private readonly LABEL_HEIGHT = 16;
   private readonly LABEL_MIN_WIDTH = 24;
   private readonly LINE_WIDTH = 3; // Increased line width
   private readonly LINE_COLOR = '#FF0066'; // Bright pink/red for high visibility
   private readonly ARROW_SIZE = 8; // Arrow head size
   private readonly MIN_LABEL_SPACING = 20;
+  // Centralized font settings to keep placement and rendering in sync
+  private readonly FONT_SIZE = 11;
+  private readonly FONT_FAMILY = 'Arial, sans-serif';
@@
   private createInternalLabels(elements: ElementWithCoordinates[]): LabelPosition[] {
@@
-    sortedElements.forEach(element => {
-      // Try different positions within the element
+    sortedElements.forEach(element => {
+      const w = this.getLabelWidth(element.ref);
+      // Try different positions within the element (baseline y for text)
       const positions = [
-        { x: element.x + this.LABEL_PADDING, y: element.y + this.LABEL_HEIGHT }, // Top-left
-        { x: element.x + element.width - this.LABEL_MIN_WIDTH - this.LABEL_PADDING, y: element.y + this.LABEL_HEIGHT }, // Top-right
-        { x: element.x + this.LABEL_PADDING, y: element.y + element.height - this.LABEL_PADDING }, // Bottom-left
-        { x: element.x + element.width - this.LABEL_MIN_WIDTH - this.LABEL_PADDING, y: element.y + element.height - this.LABEL_PADDING }, // Bottom-right
-        { x: element.x + element.width / 2 - this.LABEL_MIN_WIDTH / 2, y: element.y + element.height / 2 } // Center
+        { x: element.x + this.LABEL_PADDING, y: element.y + this.LABEL_HEIGHT + this.LABEL_PADDING }, // Top-left
+        { x: element.x + element.width - w - this.LABEL_PADDING, y: element.y + this.LABEL_HEIGHT + this.LABEL_PADDING }, // Top-right
+        { x: element.x + this.LABEL_PADDING, y: element.y + element.height - this.LABEL_PADDING }, // Bottom-left
+        { x: element.x + element.width - w - this.LABEL_PADDING, y: element.y + element.height - this.LABEL_PADDING }, // Bottom-right
+        { x: element.x + element.width / 2 - w / 2, y: element.y + element.height / 2 } // Center
       ];
@@
-        const labelBounds = {
-          x: pos.x,
-          y: pos.y - this.LABEL_HEIGHT,
-          width: this.LABEL_MIN_WIDTH,
-          height: this.LABEL_HEIGHT
-        };
+        const labelBounds = {
+          x: pos.x,
+          y: pos.y - this.LABEL_HEIGHT - this.LABEL_PADDING,
+          width: w,
+          height: this.LABEL_HEIGHT + this.LABEL_PADDING * 2
+        };
@@
   private createExternalLabels(
@@
-    const occupiedPositions = existingLabels.map(label => ({
-      x: label.x,
-      y: label.y - this.LABEL_HEIGHT,
-      width: this.LABEL_MIN_WIDTH,
-      height: this.LABEL_HEIGHT
-    }));
+    const occupiedPositions = existingLabels.map(label => ({
+      x: label.x,
+      y: label.y - this.LABEL_HEIGHT - this.LABEL_PADDING,
+      width: this.getLabelWidth(label.ref),
+      height: this.LABEL_HEIGHT + this.LABEL_PADDING * 2
+    }));
@@
-        if (labelPos) {
+        if (labelPos) {
           labels.push({
             ref: element.ref,
             x: labelPos.x,
             y: labelPos.y,
             lineToElement: (() => {
-              const actualLabelWidth = this.calculateLabelWidth(element.ref);
-              const padding = 4;
+              const actualLabelWidth = this.getLabelWidth(element.ref);
+              const padding = this.LABEL_PADDING;
               const connections = this.getBestConnectionPoints(
                 labelPos.x - padding,  // Account for padding
                 labelPos.y,
                 actualLabelWidth,
-                this.LABEL_HEIGHT + 4,  // Account for vertical padding
+                this.LABEL_HEIGHT + this.LABEL_PADDING * 2,  // Account for vertical padding
                 element.x,
                 element.y,
                 element.width,
                 element.height
               );
               return {
                 x1: connections.start.x,
                 y1: connections.start.y,
                 x2: connections.end.x,
                 y2: connections.end.y
               };
             })()
           });
@@
-          occupiedPositions.push({
-            x: labelPos.x,
-            y: labelPos.y - this.LABEL_HEIGHT,
-            width: this.LABEL_MIN_WIDTH,
-            height: this.LABEL_HEIGHT
-          });
+          const w = this.getLabelWidth(element.ref);
+          occupiedPositions.push({
+            x: labelPos.x,
+            y: labelPos.y - this.LABEL_HEIGHT - this.LABEL_PADDING,
+            width: w,
+            height: this.LABEL_HEIGHT + this.LABEL_PADDING * 2
+          });
@@
-            // Clamp Y position to screen bounds
-            const clampedY = Math.max(this.LABEL_HEIGHT, Math.min(screenshotHeight, label.y));
+            // Clamp Y position to screen bounds (respect padding)
+            const clampedY = Math.max(this.LABEL_HEIGHT + this.LABEL_PADDING, Math.min(screenshotHeight - this.LABEL_PADDING, label.y));
             label.y = clampedY;
@@
-            const labelBounds = {
-              x: label.x,
-              y: label.y - this.LABEL_HEIGHT,
-              width: this.LABEL_MIN_WIDTH,
-              height: this.LABEL_HEIGHT
-            };
+            const labelBounds = {
+              x: label.x,
+              y: label.y - this.LABEL_HEIGHT - this.LABEL_PADDING,
+              width: this.getLabelWidth(label.ref),
+              height: this.LABEL_HEIGHT + this.LABEL_PADDING * 2
+            };
@@
-    // Check if right side placement would go off-screen
-    if (labelX + this.LABEL_MIN_WIDTH > screenshotWidth) {
+    // Check if right side placement would go off-screen
+    const maxW = Math.max(...group.map(e => this.getLabelWidth(e.ref)));
+    if (labelX + maxW > screenshotWidth) {
       // Try left side instead
-      const leftX = groupBounds.x - this.LABEL_MIN_WIDTH - 20;
+      const leftX = groupBounds.x - maxW - 20;
       if (leftX >= 0) {
         labelX = leftX;
         useLeftSide = true;
       } else {
         // Can't place on either side
         return labels;
       }
     }
@@
-        const labelBounds = {
-          x: labelX,
-          y: labelYPosition - this.LABEL_HEIGHT,
-          width: this.LABEL_MIN_WIDTH,
-          height: this.LABEL_HEIGHT
-        };
+        const w = this.getLabelWidth(sortedGroup[i].ref);
+        const labelBounds = {
+          x: labelX,
+          y: labelYPosition - this.LABEL_HEIGHT - this.LABEL_PADDING,
+          width: w,
+          height: this.LABEL_HEIGHT + this.LABEL_PADDING * 2
+        };
@@
-        const actualLabelWidth = this.calculateLabelWidth(sortedGroup[i].ref);
-        const padding = 4;
+        const actualLabelWidth = this.getLabelWidth(sortedGroup[i].ref);
+        const padding = this.LABEL_PADDING;
         const connections = this.getBestConnectionPoints(
           labelX - padding,  // Account for padding
           labelYPosition,
           actualLabelWidth,
-          this.LABEL_HEIGHT + 4,  // Account for vertical padding
+          this.LABEL_HEIGHT + this.LABEL_PADDING * 2,  // Account for vertical padding
           sortedGroup[i].x,
           sortedGroup[i].y,
           sortedGroup[i].width,
           sortedGroup[i].height
         );
@@
-    if (!foundValidPosition) {
+    if (!foundValidPosition) {
       sortedGroup.forEach((element, index) => {
         const labelYPosition = Math.max(this.LABEL_HEIGHT, 
           Math.min(screenshotHeight, labelY + (index * this.MIN_LABEL_SPACING)));
-        
-        const actualLabelWidth = this.calculateLabelWidth(element.ref);
-        const padding = 4;
+        const actualLabelWidth = this.getLabelWidth(element.ref);
+        const padding = this.LABEL_PADDING;
         const connections = this.getBestConnectionPoints(
           labelX - padding,  // Account for padding
           labelYPosition,
           actualLabelWidth,
-          this.LABEL_HEIGHT + 4,  // Account for vertical padding
+          this.LABEL_HEIGHT + this.LABEL_PADDING * 2,  // Account for vertical padding
           element.x,
           element.y,
           element.width,
           element.height
         );
@@
   private findBestExternalPosition(
@@
-  ): {x: number, y: number} | null {
-    // Try positions around the element
-    const positions = [
-      { x: element.x + element.width + 10, y: element.y + element.height / 2 }, // Right
-      { x: element.x - this.LABEL_MIN_WIDTH - 10, y: element.y + element.height / 2 }, // Left
-      { x: element.x + element.width / 2 - this.LABEL_MIN_WIDTH / 2, y: element.y - this.LABEL_HEIGHT - 10 }, // Top
-      { x: element.x + element.width / 2 - this.LABEL_MIN_WIDTH / 2, y: element.y + element.height + 10 + this.LABEL_HEIGHT } // Bottom
-    ];
+  ): {x: number, y: number} | null {
+    const w = this.getLabelWidth(element.ref);
+    // Try positions around the element
+    const positions = [
+      { x: element.x + element.width + 10, y: element.y + element.height / 2 }, // Right
+      { x: element.x - w - 10, y: element.y + element.height / 2 }, // Left
+      { x: element.x + element.width / 2 - w / 2, y: element.y - 10 }, // Top (baseline at gap above element)
+      { x: element.x + element.width / 2 - w / 2, y: element.y + element.height + 10 + this.LABEL_HEIGHT } // Bottom
+    ];
@@
-      const labelBounds = {
-        x: pos.x,
-        y: pos.y - this.LABEL_HEIGHT,
-        width: this.LABEL_MIN_WIDTH,
-        height: this.LABEL_HEIGHT
-      };
+      const labelBounds = {
+        x: pos.x,
+        y: pos.y - this.LABEL_HEIGHT - this.LABEL_PADDING,
+        width: w,
+        height: this.LABEL_HEIGHT + this.LABEL_PADDING * 2
+      };
@@
-  /**
-   * Calculate actual label width including padding
-   */
-  private calculateLabelWidth(ref: string): number {
-    const padding = 4;
-    const textWidth = this.estimateTextWidth(ref, 11, 'Arial, sans-serif');
-    return Math.max(this.LABEL_MIN_WIDTH, textWidth + padding * 2);
-  }
+  /**
+   * Compute final label box width used consistently in placement and rendering
+   */
+  private getLabelWidth(ref: string): number {
+    const textWidth = this.estimateTextWidth(ref, this.FONT_SIZE, this.FONT_FAMILY);
+    return Math.max(this.LABEL_MIN_WIDTH, textWidth + this.LABEL_PADDING * 2);
+  }
@@
-      const textColor = '#0096FF';
-      const fontSize = 11;
-      const fontFamily = 'Arial, sans-serif';
-      const padding = 4;
+      const textColor = '#0096FF';
@@
-      // Calculate dynamic width based on text content
-      const textWidth = this.estimateTextWidth(label.ref, fontSize, fontFamily);
-      const labelWidth = Math.max(this.LABEL_MIN_WIDTH, textWidth + padding * 2);
+      // Dynamic width based on text content and padding
+      const labelWidth = this.getLabelWidth(label.ref);
@@
-        <rect x="${label.x - padding}" y="${label.y - this.LABEL_HEIGHT - 2}" 
-              width="${labelWidth}" height="${this.LABEL_HEIGHT + 4}"
+        <rect x="${label.x - this.LABEL_PADDING}" y="${label.y - this.LABEL_HEIGHT - this.LABEL_PADDING}" 
+              width="${labelWidth}" height="${this.LABEL_HEIGHT + this.LABEL_PADDING * 2}"
               fill="white" opacity="0.9" rx="2"/>
-        <text x="${label.x}" y="${label.y}" font-family="${fontFamily}"
-              font-size="${fontSize}" fill="${textColor}" font-weight="bold">${escapedRef}</text>
+        <text x="${label.x}" y="${label.y}" font-family="${this.FONT_FAMILY}"
+              font-size="${this.FONT_SIZE}" fill="${textColor}" font-weight="bold">${escapedRef}</text>

Why this matters:

  • Eliminates geometry drift between placement and rendering.
  • Prevents false-negative overlap checks and off-screen label backgrounds.
  • Corrects the “Top” position baseline so labels don’t jump above the viewport.

Please sanity-check on a dense page (e.g., a form with many short labels). Ensure no labels overlap and none render off-screen when “Top” is chosen.

Also applies to: 97-127, 142-147, 164-196, 205-221, 316-327, 351-356, 401-432, 447-468, 656-676, 591-598

🧹 Nitpick comments (2)
camel/toolkits/hybrid_browser_toolkit/ts/src/screenshot-labeler.ts (2)

332-400: Group-column placement: consider left-side early and clamp the whole column rather than per-label retries

You already try column shifts, but you clamp per-label and stop on first failure. A column-level clamp (ensuring first and last label stay within [LABEL_HEIGHT+PADDING, screenshotHeight-PADDING]) reduces attempts and avoids jitter. If the right column is tight against the edge, try left before entering shift attempts.

If helpful, I can propose a compact column-fit function that returns the best shift or null.


238-264: Grouping heuristic might under-cluster dense UIs

A fixed threshold=50 px may under-cluster on high-DPI screenshots or oversized canvases. Consider scaling threshold by screenshot dimensions (e.g., min(width,height)*0.03) or by median element size for more robust grouping.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between fdfb16c and ddc11fe.

📒 Files selected for processing (2)
  • camel/toolkits/hybrid_browser_toolkit/ts/src/hybrid-browser-toolkit.ts (3 hunks)
  • camel/toolkits/hybrid_browser_toolkit/ts/src/screenshot-labeler.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • camel/toolkits/hybrid_browser_toolkit/ts/src/hybrid-browser-toolkit.ts
🔇 Additional comments (2)
camel/toolkits/hybrid_browser_toolkit/ts/src/screenshot-labeler.ts (2)

609-621: LGTM: element overlays

Blue rectangle styling for element bounds is consistent and readable. Rounded corners and 2px stroke are a good balance.


496-553: LGTM: connection point selection

getBestConnectionPoints is clear and correct: computes candidate edge centers and picks the shortest pair. Works well with the earlier change to pass dynamic labelWidth/Height.

@nitpicker55555
Copy link
Collaborator Author

@coderabbitai review

Copy link
Contributor

coderabbitai bot commented Aug 28, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@nitpicker55555
Copy link
Collaborator Author

@coderabbitai review

Copy link
Contributor

coderabbitai bot commented Aug 28, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
camel/toolkits/hybrid_browser_toolkit/ws_wrapper.py (3)

48-53: Avoid logging secrets/PII (mask sensitive args in action_logger).

Currently, keystrokes (e.g., text) and other sensitive inputs are logged verbatim.

@@
-        # Log inputs (skip self)
-        inputs = {
-            "args": args,
-            "kwargs": kwargs,
-        }
+        # Log inputs (skip self) — mask sensitive fields
+        SENSITIVE_KEYS = {
+            "text", "password", "passwd", "token", "api_key", "apikey",
+            "authorization", "auth", "secret", "cookie", "session"
+        }
+        masked_kwargs = {
+            k: ("***" if isinstance(k, str) and k.lower() in SENSITIVE_KEYS else v)
+            for k, v in kwargs.items()
+        }
+        inputs = {"args": args, "kwargs": masked_kwargs}
@@
-            await self._log_action(
+            await self._log_action(
                 action_name=action_name,
                 inputs=inputs,
                 outputs=result,
                 execution_time=execution_time,
                 page_load_time=page_load_time,
             )
@@
-            await self._log_action(
+            await self._log_action(
                 action_name=action_name,
                 inputs=inputs,
                 outputs=None,
                 execution_time=execution_time,
                 error=error_msg,
             )

Also applies to: 65-71, 80-86


121-146: Sanitize session_id in log filenames to prevent path/name injection.

Use a safe slug when composing paths.

@@
-        self.session_id = (config or {}).get('session_id', 'default')
+        self.session_id = (config or {}).get('session_id', 'default')
+        # restrict to [A-Za-z0-9_-]
+        session_id_safe = ''.join(
+            (c if (c.isalnum() or c in ('-', '_')) else '_') for c in self.session_id
+        )
@@
-            self.log_file_path = os.path.join(
+            self.log_file_path = os.path.join(
                 log_dir,
-                f"hybrid_browser_toolkit_ws_{timestamp}_{self.session_id}.log",
+                f"hybrid_browser_toolkit_ws_{timestamp}_{session_id_safe}.log",
             )
@@
-            self.ts_log_file_path = os.path.join(
+            self.ts_log_file_path = os.path.join(
                 log_dir,
-                f"typescript_console_{timestamp}_{self.session_id}.log",
+                f"typescript_console_{timestamp}_{session_id_safe}.log",
             )

414-421: Make ping bounded; don't hang on dead connections.

Bound the health check with a timeout.

-            await self.websocket.ping()
+            await asyncio.wait_for(self.websocket.ping(), timeout=5.0)
♻️ Duplicate comments (2)
camel/toolkits/hybrid_browser_toolkit/ts/src/snapshot-parser.ts (1)

124-130: Remove dataset-specific hard-coded pair debugging.

Hard-coding ref IDs leaks dataset artifacts and creates noisy logs. Use the DEBUG flag if you must log, but never bake specific refs.

Apply this diff:

-        // Debug specific pairs
-        if ((parentRef === 'e296' && childRef === 'e297') ||
-            (parentRef === 'e361' && childRef === 'e363') ||
-            (parentRef === 'e371' && childRef === 'e373') ||
-            (parentRef === 'e344' && childRef === 'e346') ||
-            (parentRef === 'e348' && childRef === 'e350')) {
-          console.log(`[Debug] Found pair: ${parentRef} (${parentNode.type}) -> ${childRef} (${childNode.type})`);
-        }
camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (1)

426-431: Boolean coercion for onclick is correct now.

Prevents string|boolean leakage into later logic.

🧹 Nitpick comments (9)
camel/toolkits/hybrid_browser_toolkit/ts/src/snapshot-parser.ts (2)

154-159: Gate debug logs behind DEBUG_SNAPSHOT_PARSER.

Unconditional console logging will spam consumers. Respect the existing DEBUG flag.

Apply this diff:

-      filtered.delete(child);
-      console.log(`[Hierarchy Filter] Filtered ${child} (${childType}) - keeping parent ${parent} (${parentType})`);
+      filtered.delete(child);
+      if (DEBUG_SNAPSHOT_PARSER) {
+        console.log(`[Hierarchy Filter] Filtered ${child} (${childType}) - keeping parent ${parent} (${parentType})`);
+      }
@@
-      filtered.delete(parent);
-      console.log(`[Hierarchy Filter] Filtered ${parent} (${parentType}) - keeping child ${child} (${childType})`);
+      filtered.delete(parent);
+      if (DEBUG_SNAPSHOT_PARSER) {
+        console.log(`[Hierarchy Filter] Filtered ${parent} (${parentType}) - keeping child ${child} (${childType})`);
+      }
@@
-              filtered.delete(childRef);
-              console.log(`[Hierarchy Filter] Filtered ${childRef} (${childType}) contained in ${currentParent} (${parentType})`);
+              filtered.delete(childRef);
+              if (DEBUG_SNAPSHOT_PARSER) {
+                console.log(`[Hierarchy Filter] Filtered ${childRef} (${childType}) contained in ${currentParent} (${parentType})`);
+              }
@@
-      filtered.delete(ref);
-      console.log(`[Hierarchy Filter] Filtered ${ref} (generic wrapper around button ${clickableChildren[0]})`);
+      filtered.delete(ref);
+      if (DEBUG_SNAPSHOT_PARSER) {
+        console.log(`[Hierarchy Filter] Filtered ${ref} (generic wrapper around button ${clickableChildren[0]})`);
+      }

Also applies to: 195-197, 221-222


90-90: Remove unused variable debugInfo.

It’s never read.

Apply this diff:

-  const debugInfo: any[] = [];
camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (3)

248-260: Snapshot parsing helper: broaden role regex and normalize case.

Align with other modules that allow hyphens/digits and use lower-case.

Apply this diff:

-        const typeMatch = line.match(/^\s*-?\s*([\w-]+)/);
-        const role = typeMatch ? typeMatch[1] : undefined;
+        const typeMatch = line.match(/^\s*-?\s*([a-z0-9_-]+)/i);
+        const role = typeMatch ? typeMatch[1].toLowerCase() : undefined;

885-909: Use evaluate instead of evaluateHandle and drop unused local; tiny leak/readability.

No need for a handle; also remove unused lastChangeTime.

Apply this diff:

-    const startTime = Date.now();
-    const stabilityThreshold = 100; // Consider stable if no changes for 100ms
-    let lastChangeTime = Date.now();
+    const startTime = Date.now();
+    const stabilityThreshold = 100; // Consider stable if no changes for 100ms
@@
-      // Monitor DOM changes
-      await page.evaluateHandle(() => {
+      // Monitor DOM changes
+      await page.evaluate(() => {
         let changeCount = 0;
         (window as any).__domStabilityCheck = { changeCount: 0, lastChange: Date.now() };
@@
-        (window as any).__domStabilityObserver = observer;
-      });
+        (window as any).__domStabilityObserver = observer;
+      });
@@
-      await page.evaluate(() => {
+      await page.evaluate(() => {
         const observer = (window as any).__domStabilityObserver;
         if (observer) observer.disconnect();
         delete (window as any).__domStabilityObserver;
         delete (window as any).__domStabilityCheck;
       }).catch(() => {});

Also applies to: 927-935


1269-1275: Guard against non-positive scrollPositionScale.

A scale of 0 breaks viewport math. Normalize to > 0.

Apply this diff:

-    const scale = Number.isFinite(browserConfig.scrollPositionScale)
-      ? browserConfig.scrollPositionScale
-      : 1;
+    const rawScale = Number(browserConfig.scrollPositionScale);
+    const scale = Number.isFinite(rawScale) && rawScale > 0 ? rawScale : 1;
camel/toolkits/hybrid_browser_toolkit/ws_wrapper.py (4)

369-373: Harden JSON logging for non-serializable objects.

Use default=str to avoid crashes on exotic types.

-                f.write(
-                    json.dumps(log_entry, ensure_ascii=False, indent=2) + '\n'
-                )
+                f.write(json.dumps(
+                    log_entry, ensure_ascii=False, indent=2, default=str
+                ) + '\n')

721-731: Ruff SIM115: persistent file handle is intentional; annotate.

Since the handle must stay open, add a noqa to silence the false positive.

-                self.ts_log_file = open(
-                    self.ts_log_file_path, 'w', encoding='utf-8'
-                )
+                self.ts_log_file = open(  # noqa: SIM115 - persistent handle across loop
+                    self.ts_log_file_path, 'w', encoding='utf-8'
+                )

539-549: Type hint correctness for _ensure_ref_prefix.

Function returns None when ref is falsy; update hints accordingly.

-    def _ensure_ref_prefix(self, ref: str) -> str:
+    def _ensure_ref_prefix(self, ref: Optional[str]) -> Optional[str]:
@@
-        if not ref:
-            return ref
+        if not ref:
+            return ref

156-170: Use shutil.which for tool checks (faster, style-aligned).

Avoid spawning processes just to test presence.

+        import shutil
-        # Check if npm is installed
-        npm_check = subprocess.run(
-            ['npm', '--version'],
-            capture_output=True,
-            text=True,
-        )
-        if npm_check.returncode != 0:
+        # Check if npm is installed
+        if shutil.which('npm') is None:
             raise RuntimeError(
                 "npm is not installed or not in PATH. "
                 "Please install Node.js and npm from https://nodejs.org/ "
                 "to use the hybrid browser toolkit."
             )
@@
-        # Check if node is installed
-        node_check = subprocess.run(
-            ['node', '--version'],
-            capture_output=True,
-            text=True,
-        )
-        if node_check.returncode != 0:
+        # Check if node is installed
+        if shutil.which('node') is None:
             raise RuntimeError(
                 "node is not installed or not in PATH. "
                 "Please install Node.js from https://nodejs.org/ "
                 "to use the hybrid browser toolkit."
             )

Also applies to: 171-183

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 93225b4 and 230cb23.

📒 Files selected for processing (3)
  • camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (10 hunks)
  • camel/toolkits/hybrid_browser_toolkit/ts/src/snapshot-parser.ts (1 hunks)
  • camel/toolkits/hybrid_browser_toolkit/ws_wrapper.py (7 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the python code against the google python style guide and point out any mismatches.

Files:

  • camel/toolkits/hybrid_browser_toolkit/ws_wrapper.py
🪛 Ruff (0.12.2)
camel/toolkits/hybrid_browser_toolkit/ws_wrapper.py

722-722: Use a context manager for opening files

(SIM115)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: pytest_package_fast_test
  • GitHub Check: pytest_package_llm_test
  • GitHub Check: pytest_examples
  • GitHub Check: pytest_apps
  • GitHub Check: pre-commit
🔇 Additional comments (3)
camel/toolkits/hybrid_browser_toolkit/ts/src/snapshot-parser.ts (1)

51-55: Indent-based parent resolution: confirm indent semantics.

Using “>= indent” pops siblings and any ancestor at the same indentation. Ensure the snapshot emitter uses fixed-width indentation (e.g., 2 spaces per level). If tabs or mixed widths appear, consider normalizing indentation to levels.

camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (2)

67-75: Applying stealth headers/UA for CDP contexts looks good.

Keeps behavior consistent across connection modes.


137-145: Applying stealth headers/UA for launched contexts looks good.

Parity with CDP path maintained.

@nitpicker55555
Copy link
Collaborator Author

@coderabbitai review

Copy link
Contributor

coderabbitai bot commented Aug 29, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
camel/toolkits/hybrid_browser_toolkit/ws_wrapper.py (4)

250-255: Cleanup on start-time failure is incomplete (log reader/task/file leak).

If server readiness times out, only the process is killed; the log-reader task remains and the file handle may stay open. Cancel the reader, await it, close the file, and clear handles.

Apply this diff:

   if not server_ready:
-      self.process.kill()
-      raise RuntimeError(
-          "WebSocket server failed to start within timeout"
-      )
+      self.process.kill()
+      with contextlib.suppress(Exception):
+          # Ensure the process fully exits
+          self.process.wait(timeout=2)
+      # Cancel and await the log reader task
+      if self._log_reader_task and not self._log_reader_task.done():
+          self._log_reader_task.cancel()
+          with contextlib.suppress(asyncio.CancelledError):
+              await self._log_reader_task
+      # Close TS log file if open
+      if getattr(self, 'ts_log_file', None):
+          with contextlib.suppress(Exception):
+              self.ts_log_file.close()
+          self.ts_log_file = None
+      self.process = None
+      raise RuntimeError("WebSocket server failed to start within timeout")

265-269: Also cleanup when WebSocket connect fails after process start.

Mirror the above cleanup to avoid a dangling reader and file handle if connect throws.

Apply this diff:

   except Exception as e:
-      self.process.kill()
-      raise RuntimeError(
-          f"Failed to connect to WebSocket server: {e}"
-      ) from e
+      self.process.kill()
+      with contextlib.suppress(Exception):
+          self.process.wait(timeout=2)
+      if self._log_reader_task and not self._log_reader_task.done():
+          self._log_reader_task.cancel()
+          with contextlib.suppress(asyncio.CancelledError):
+              await self._log_reader_task
+      if getattr(self, 'ts_log_file', None):
+          with contextlib.suppress(Exception):
+              self.ts_log_file.close()
+          self.ts_log_file = None
+      self.process = None
+      raise RuntimeError(f"Failed to connect to WebSocket server: {e}") from e

420-427: Bound ping to a short timeout to avoid indefinite await.

Awaiting ping() can hang if the server never responds. Use asyncio.wait_for.

Apply this diff:

-        # Check if connection is still alive
-        try:
-            # Send a ping to check connection
-            await self.websocket.ping()
+        # Check if connection is still alive
+        try:
+            # Send a ping to check connection (bounded wait)
+            await asyncio.wait_for(self.websocket.ping(), timeout=5.0)

722-786: Open the TS log file with a context manager and simplify finalization.

Prevents accidental leaks and matches linter guidance.

Apply this diff:

-    async def _read_and_log_output(self):
+    async def _read_and_log_output(self):
         """Read stdout from Node.js process & handle SERVER_READY + logging."""
         if not self.process:
             return
 
         try:
-            if self.ts_log_file_path:
-                self.ts_log_file = open(
-                    self.ts_log_file_path, 'w', encoding='utf-8'
-                )
-                self.ts_log_file.write(
-                    f"TypeScript Console Log - Started at "
-                    f"{time.strftime('%Y-%m-%d %H:%M:%S')}\n"
-                )
-                self.ts_log_file.write("=" * 80 + "\n")
-                self.ts_log_file.flush()
-
-            while self.process and self.process.poll() is None:
-                try:
-                    line = await asyncio.get_running_loop().run_in_executor(
-                        None, self.process.stdout.readline
-                    )
-                    if not line:  # EOF
-                        break
-
-                    # Check for SERVER_READY message
-                    if line.startswith('SERVER_READY:'):
-                        try:
-                            self.server_port = int(
-                                line.split(':', 1)[1].strip()
-                            )
-                            logger.info(
-                                f"WebSocket server ready on port "
-                                f"{self.server_port}"
-                            )
-                            if (
-                                self._server_ready_future
-                                and not self._server_ready_future.done()
-                            ):
-                                self._server_ready_future.set_result(True)
-                        except (ValueError, IndexError) as e:
-                            logger.error(f"Failed to parse SERVER_READY: {e}")
-
-                    # Write all output to log file
-                    if self.ts_log_file:
-                        timestamp = time.strftime('%H:%M:%S')
-                        self.ts_log_file.write(f"[{timestamp}] {line}")
-                        self.ts_log_file.flush()
-
-                except Exception as e:
-                    logger.warning(f"Error reading stdout: {e}")
-                    break
-
-        except Exception as e:
-            logger.warning(f"Error in _read_and_log_output: {e}")
-        finally:
-            if self.ts_log_file:
-                self.ts_log_file.write("\n" + "=" * 80 + "\n")
-                self.ts_log_file.write(
-                    f"TypeScript Console Log - Ended at "
-                    f"{time.strftime('%Y-%m-%d %H:%M:%S')}\n"
-                )
-                self.ts_log_file.close()
-                self.ts_log_file = None
+            import contextlib
+            with contextlib.ExitStack() as stack:
+                if self.ts_log_file_path:
+                    self.ts_log_file = stack.enter_context(
+                        open(self.ts_log_file_path, 'w', encoding='utf-8')
+                    )
+                    self.ts_log_file.write(
+                        f"TypeScript Console Log - Started at "
+                        f"{time.strftime('%Y-%m-%d %H:%M:%S')}\n"
+                    )
+                    self.ts_log_file.write("=" * 80 + "\n")
+                    self.ts_log_file.flush()
+
+                while self.process and self.process.poll() is None:
+                    try:
+                        line = await asyncio.get_running_loop().run_in_executor(
+                            None, self.process.stdout.readline
+                        )
+                        if not line:  # EOF
+                            break
+
+                        # Check for SERVER_READY message
+                        if line.startswith('SERVER_READY:'):
+                            try:
+                                self.server_port = int(
+                                    line.split(':', 1)[1].strip()
+                                )
+                                logger.info(
+                                    f"WebSocket server ready on port "
+                                    f"{self.server_port}"
+                                )
+                                if (
+                                    self._server_ready_future
+                                    and not self._server_ready_future.done()
+                                ):
+                                    self._server_ready_future.set_result(True)
+                            except (ValueError, IndexError) as e:
+                                logger.error(f"Failed to parse SERVER_READY: {e}")
+
+                        # Write all output to log file
+                        if self.ts_log_file:
+                            timestamp = time.strftime('%H:%M:%S')
+                            self.ts_log_file.write(f"[{timestamp}] {line}")
+                            self.ts_log_file.flush()
+
+                    except Exception as e:
+                        logger.warning(f"Error reading stdout: {e}")
+                        break
+
+                # Footer if we had a file
+                if self.ts_log_file:
+                    self.ts_log_file.write("\n" + "=" * 80 + "\n")
+                    self.ts_log_file.write(
+                        f"TypeScript Console Log - Ended at "
+                        f"{time.strftime('%Y-%m-%d %H:%M:%S')}\n"
+                    )
+                # ExitStack closes file; clear handle
+                self.ts_log_file = None
+        except Exception as e:
+            logger.warning(f"Error in _read_and_log_output: {e}")
camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (1)

342-371: Timing fields are misleading: coordinate enrichment is reported as 0.

You integrated coordinate calc into mappingTime but set coordinate_enrichment_time_ms to 0. Split the timings.

Apply this diff:

-      const mappingTime = Date.now() - mappingStart;
+      const baseMappingTime = Date.now() - mappingStart;
+      let coordinateTime = 0;
@@
-      return {
+      return {
         snapshot: finalSnapshot,
         elements: finalElements,
         metadata: {
           elementCount: Object.keys(finalElements).length,
           url: page.url(),
           timestamp: new Date().toISOString(),
         },
         timing: {
           total_time_ms: totalTime,
           snapshot_time_ms: snapshotTime,
-          coordinate_enrichment_time_ms: 0, // Integrated into mapping
-          aria_mapping_time_ms: mappingTime,
+          coordinate_enrichment_time_ms: coordinateTime,
+          aria_mapping_time_ms: baseMappingTime,
         },
       };

And above the includeCoordinates block, measure coordinate time:

-      if (includeCoordinates) {
+      if (includeCoordinates) {
+        const coordStart = Date.now();
         // Get coordinates for each ref using aria-ref selector
         for (const ref of refs) {
           try {
             const selector = `aria-ref=${ref}`;
-            const element = await page.locator(selector).first();
-            const exists = await element.count() > 0;
+            const element = page.locator(selector).first();
+            const exists = (await element.count()) > 0;
@@
           } catch (error) {
             // Failed to get coordinates for element
           }
         }
+        coordinateTime = Date.now() - coordStart;
       }
♻️ Duplicate comments (2)
camel/toolkits/hybrid_browser_toolkit/ws_wrapper.py (1)

225-231: Good fix: create the readiness future before starting the reader.

This removes the race and aligns with earlier guidance.

camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (1)

452-461: Correlate popup to the clicking page; prefer page.waitForEvent('popup') with non-throwing wait.

Using context().waitForEvent('page') can capture unrelated tabs; correlating to the page avoids flakiness. Also keep the wait short and non-throwing.

Apply this diff:

-        // Set up popup listener before clicking
-        const popupPromise = page.context().waitForEvent('page', { timeout: browserConfig.popupTimeout });
+        // Set up popup listener correlated to this page
+        const popupPromise = page
+          .waitForEvent('popup', { timeout: Math.min(browserConfig.popupTimeout, 700) })
+          .catch(() => null);
@@
-        try {
-          // Wait for new page to open
-          const newPage = await popupPromise;
+        try {
+          // Wait for new page (if any)
+          const newPage = await popupPromise;
+          if (!newPage) {
+            return { success: true, method: 'playwright-aria-ref' };
+          }
🧹 Nitpick comments (4)
camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (4)

248-260: Remove unused parseElementFromSnapshot to avoid duplication.

Mapping now uses buildSnapshotIndex; this helper appears dead and risks divergence.

Apply this diff:

-  private parseElementFromSnapshot(snapshotText: string, ref: string): { role?: string; text?: string } {
-    const lines = snapshotText.split('\n');
-    for (const line of lines) {
-      if (line.includes(`[ref=${ref}]`)) {
-        const typeMatch = line.match(/^\s*-?\s*([\w-]+)/);
-        const role = typeMatch ? typeMatch[1] : undefined;
-        const textMatch = line.match(/"([^"]*)"/);
-        const text = textMatch ? textMatch[1] : undefined;
-        return { role, text };
-      }
-    }
-    return {};
-  }

311-339: Minor: avoid unnecessary awaits and clarify existence checks.

locator().first() is sync; awaiting it yields no benefit. Add parentheses for readability on count comparison.

Apply this diff:

-            const element = await page.locator(selector).first();
-            const exists = await element.count() > 0;
+            const element = page.locator(selector).first();
+            const exists = (await element.count()) > 0;

Repeat similarly in performType/performSelect for consistency.


430-439: Behavioral change: forcing all links to open in new tabs. Verify product intent.

This alters site behavior and can spawn many tabs. Confirm this is desired; otherwise gate behind a config flag.

I can add a forceLinksOpenInNewTabs boolean in config and default it to false if you want.


110-117: UA placement: set userAgent on contexts, not on launch().

chromium.launch() ignores userAgent; you already set it on newContext. Remove the launch-level userAgent to avoid confusion.

Apply this diff:

-      if (stealthConfig.enabled) {
-        launchOptions.args = stealthConfig.args || [];
-        
-        // Apply stealth user agent if configured
-        if (stealthConfig.userAgent) {
-          launchOptions.userAgent = stealthConfig.userAgent;
-        }
-      }
+      if (stealthConfig.enabled) {
+        launchOptions.args = stealthConfig.args || [];
+      }

Also applies to: 132-148

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 230cb23 and f039dd1.

📒 Files selected for processing (3)
  • camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (7 hunks)
  • camel/toolkits/hybrid_browser_toolkit/ts/src/snapshot-parser.ts (1 hunks)
  • camel/toolkits/hybrid_browser_toolkit/ws_wrapper.py (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • camel/toolkits/hybrid_browser_toolkit/ts/src/snapshot-parser.ts
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the python code against the google python style guide and point out any mismatches.

Files:

  • camel/toolkits/hybrid_browser_toolkit/ws_wrapper.py
🪛 Ruff (0.12.2)
camel/toolkits/hybrid_browser_toolkit/ws_wrapper.py

729-729: Use a context manager for opening files

(SIM115)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: pre-commit
  • GitHub Check: pytest_examples
  • GitHub Check: pytest_package_llm_test
  • GitHub Check: pytest_apps
  • GitHub Check: pytest_package_fast_test
🔇 Additional comments (4)
camel/toolkits/hybrid_browser_toolkit/ws_wrapper.py (3)

233-238: Helpful log: surface TS console log path.

This improves operability and debugging.


295-331: Orderly shutdown looks solid now.

Shutdown command first, then process termination and finally task cancellation with suppress — good sequencing to avoid blocked .readline().


441-444: Correct: bind futures to the running loop.

Prevents cross-loop hazards.

camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (1)

300-309: Good switch to one-pass snapshot indexing.

This removes O(n²) parsing and reduces latency on large pages.

@nitpicker55555
Copy link
Collaborator Author

@coderabbitai review

Copy link
Contributor

coderabbitai bot commented Aug 29, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (3)
camel/toolkits/hybrid_browser_toolkit/ws_wrapper.py (2)

225-231: Good: readiness future created before starting the log reader.

This eliminates the race flagged earlier; the guard in the reader also looks correct.


467-469: Good: futures are created on the running loop.

This resolves cross-loop pitfalls.

camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (1)

452-481: Scope popup detection to the clicked page and make it non-blocking short wait.

Context-level page event can capture unrelated tabs and adds latency. Prefer a short, non-throwing page.waitForEvent('popup') and handle the no-popup case.

Apply:

-        const popupPromise = page.context().waitForEvent('page', { timeout: browserConfig.popupTimeout });
+        const popupPromise = page
+          .waitForEvent('popup', { timeout: Math.min(browserConfig.popupTimeout, 700) })
+          .catch(() => null);
@@
-        try {
-          // Wait for new page to open
-          const newPage = await popupPromise;
+        // Wait for new page to open (if any)
+        const newPage = await popupPromise;
+        if (!newPage) {
+          return { success: true, method: 'playwright-aria-ref' };
+        }
@@
-          return { success: true, method: 'playwright-aria-ref-newtab', newTabId };
-        } catch (popupError) {
-          return { success: true, method: 'playwright-aria-ref' };
-        }
+          return { success: true, method: 'playwright-aria-ref-newtab', newTabId };
🧹 Nitpick comments (6)
camel/toolkits/hybrid_browser_toolkit/ws_wrapper.py (4)

24-25: Add precise type annotations for new attributes (minor style + clarity).

Annotate the new attributes and import IO to match Google style and improve IDE/type checking.

-from typing import TYPE_CHECKING, Any, Dict, List, Optional
+from typing import TYPE_CHECKING, Any, Dict, List, Optional, IO
@@
-        self._server_ready_future = None  # Future to track server ready state
+        self._server_ready_future: Optional[asyncio.Future[bool]] = None  # Future to track server ready state
@@
-        self.ts_log_file_path: Optional[str] = None
-        self.ts_log_file = None  # File handle for TypeScript logs
-        self._log_reader_task = None  # Task for reading and logging stdout
+        self.ts_log_file_path: Optional[str] = None
+        self.ts_log_file: Optional[IO[str]] = None  # File handle for TypeScript logs
+        self._log_reader_task: Optional[asyncio.Task[None]] = None  # Task for reading and logging stdout

Also applies to: 119-131


220-223: Pin subprocess text encoding to UTF-8.

Avoid locale-dependent decoding of Node output.

-            text=True,
+            text=True,
+            encoding="utf-8",

243-249: Make server start timeout configurable.

Ten seconds can be tight on CI or cold machines. Read from config with a sane default.

-        timeout = 10  # 10 seconds timeout
+        timeout = int((self.config or {}).get('server_start_timeout_sec', 10))

304-356: Shutdown ordering looks solid; also fail fast any pending futures.

Nice sequencing: send shutdown → close WS → stop process → cancel tasks → close files. Add a small block to cancel outstanding command futures to avoid hung awaiters during shutdown.

         # Close TS log file if open
         if getattr(self, 'ts_log_file', None):
             with contextlib.suppress(Exception):
                 self.ts_log_file.close()
             self.ts_log_file = None
 
         # Ensure process handle cleared
         self.process = None
+
+        # Reject any pending command futures to avoid hangs on shutdown
+        if self._pending_responses:
+            err = RuntimeError("Shutting down: pending command cancelled")
+            for fut in list(self._pending_responses.values()):
+                if not fut.done():
+                    fut.set_exception(err)
+            self._pending_responses.clear()
camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (2)

1243-1249: Viewport math fix — LGTM; drop unused scrollPos param.

filterElementsInViewport no longer uses scrollPos. Remove it here and at the call site to avoid confusion.

Apply:

-  private filterElementsInViewport(
-    elements: Record<string, SnapshotElement>, 
-    viewport: { width: number, height: number }, 
-    scrollPos: { x: number, y: number }
-  ): Record<string, SnapshotElement> {
+  private filterElementsInViewport(
+    elements: Record<string, SnapshotElement>, 
+    viewport: { width: number, height: number }
+  ): Record<string, SnapshotElement> {

And at usage (Line 351):

-        finalElements = this.filterElementsInViewport(playwrightMapping, viewport, scrollPos);
+        finalElements = this.filterElementsInViewport(playwrightMapping, viewport);

248-260: Remove unused private method parseElementFromSnapshot
This helper isn’t referenced anywhere; delete it to prevent future drift.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f039dd1 and 7205f57.

📒 Files selected for processing (2)
  • camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (8 hunks)
  • camel/toolkits/hybrid_browser_toolkit/ws_wrapper.py (10 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the python code against the google python style guide and point out any mismatches.

Files:

  • camel/toolkits/hybrid_browser_toolkit/ws_wrapper.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pytest_package_fast_test
  • GitHub Check: pytest_package_llm_test
  • GitHub Check: pytest_apps
🔇 Additional comments (6)
camel/toolkits/hybrid_browser_toolkit/ws_wrapper.py (1)

747-816: Log reader and SERVER_READY handling look robust.

  • ExitStack-managed file handle with footer, flush per line: good.
  • SERVER_READY parsing sets the future with guard: good.
  • run_in_executor with get_running_loop: good.
camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (5)

137-145: Stealth headers/UA on new context — LGTM.


262-276: Single-pass snapshot index — LGTM.


300-309: O(n) mapping from the prebuilt index — LGTM.


324-333: Coordinate enrichment merges without clobbering role/text — LGTM.


857-913: Remove unused lastChangeTime and avoid leaking JSHandle
Use page.evaluate instead of evaluateHandle to prevent leaking the handle and drop the unused lastChangeTime.

@@ camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts:857-913
-    let lastChangeTime = Date.now();
@@
-      await page.evaluateHandle(() => {
+      await page.evaluate(() => {

@nitpicker55555
Copy link
Collaborator Author

@coderabbitai review

Copy link
Contributor

coderabbitai bot commented Aug 29, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (2)

121-128: Persistent-context path misses stealth headers; UA-only applied.

launchPersistentContext supports extraHTTPHeaders and viewport. Add them so stealth is consistent across all launch modes.

Apply this diff:

       if (stealthConfig.enabled) {
         launchOptions.args = stealthConfig.args || [];
-        
-        // Apply stealth user agent if configured
+        // Apply stealth user agent/headers if configured
         if (stealthConfig.userAgent) {
           launchOptions.userAgent = stealthConfig.userAgent;
         }
+        if (stealthConfig.extraHTTPHeaders) {
+          launchOptions.extraHTTPHeaders = stealthConfig.extraHTTPHeaders;
+        }
       }
 
       if (browserConfig.userDataDir) {
-        this.context = await chromium.launchPersistentContext(
+        // Ensure viewport is honored in persistent context
+        launchOptions.viewport = browserConfig.viewport;
+        this.context = await chromium.launchPersistentContext(
           browserConfig.userDataDir,
           launchOptions
         );

Also applies to: 130-141


112-114: CDP mode inconsistency: earlier “newPage is not supported”, later tries newPage().

Either support newPage() in CDP or remove the fallback creation in getCurrentPage(). Right now behavior is contradictory and will surprise callers.

Apply this diff to align with “no newPage in CDP”:

-      if (browserConfig.connectOverCdp && this.context) {
-        console.log('[CDP] No active page found, attempting to create new page...');
-        try {
-          const newPage = await this.context.newPage();
-          const newTabId = this.generateTabId();
-          this.registerNewPage(newTabId, newPage);
-          this.currentTabId = newTabId;
-          // Set page timeouts
-          newPage.setDefaultNavigationTimeout(browserConfig.navigationTimeout);
-          newPage.setDefaultTimeout(browserConfig.navigationTimeout);
-          console.log(`[CDP] Created new page with tab ID: ${newTabId}`);
-          return newPage;
-        } catch (error) {
-          console.error('[CDP] Failed to create new page:', error);
-          throw new Error('No active page available and failed to create new page in CDP mode');
-        }
-      }
+      if (browserConfig.connectOverCdp) {
+        throw new Error('No active page available in CDP mode; frontend must pre-create blank tabs.');
+      }

Also applies to: 199-217

🧹 Nitpick comments (4)
camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (4)

259-271: Remove now-unused parseElementFromSnapshot to avoid divergence.

buildSnapshotIndex replaced per-ref parsing and normalizes role casing; keeping this duplicate parser risks inconsistent behavior.

Apply this diff:

-  private parseElementFromSnapshot(snapshotText: string, ref: string): { role?: string; text?: string } {
-    const lines = snapshotText.split('\n');
-    for (const line of lines) {
-      if (line.includes(`[ref=${ref}]`)) {
-        const typeMatch = line.match(/^\s*-?\s*([\w-]+)/);
-        const role = typeMatch ? typeMatch[1] : undefined;
-        const textMatch = line.match(/"([^"]*)"/);
-        const text = textMatch ? textMatch[1] : undefined;
-        return { role, text };
-      }
-    }
-    return {};
-  }

273-287: Minor: precompile regexes to reduce per-line allocations.

Hoist refRe, roleRe, textRe as private readonly fields to reuse across calls.

Example (outside this method):

// At class level
private readonly reRef = /\[ref=([^\]]+)\]/i;
private readonly reRole = /^\s*-?\s*([a-z0-9_-]+)/i;
private readonly reText = /"([^"]*)"/;

// And use them here instead of new RegExp each loop.

359-364: Avoid extra round-trip: reuse cached scrollPosition since viewport math ignores scroll.

You call getCurrentScrollPosition() but filterElementsInViewport now operates purely in viewport space. Use the cached field to save an eval.

Apply this diff:

-        const viewport = page.viewportSize() || { width: 1280, height: 720 };
-        const scrollPos = await this.getCurrentScrollPosition();
-        finalElements = this.filterElementsInViewport(playwrightMapping, viewport, scrollPos);
+        const viewport = page.viewportSize() || { width: 1280, height: 720 };
+        finalElements = this.filterElementsInViewport(playwrightMapping, viewport, this.scrollPosition);

377-381: Timing metric misreports coordinate cost as 0.

coordinate_enrichment_time_ms should reflect work done when includeCoordinates=true.

Apply this diff:

-          coordinate_enrichment_time_ms: 0, // Integrated into mapping
-          aria_mapping_time_ms: mappingTime,
+          coordinate_enrichment_time_ms: includeCoordinates ? mappingTime : 0, // approx
+          aria_mapping_time_ms: mappingTime,

If you want precise breakdown, time the coordinate loop separately.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7205f57 and 534c1c8.

📒 Files selected for processing (2)
  • camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (8 hunks)
  • camel/toolkits/hybrid_browser_toolkit/ws_wrapper.py (10 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • camel/toolkits/hybrid_browser_toolkit/ws_wrapper.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: pytest_package_llm_test
  • GitHub Check: pytest_package_fast_test
  • GitHub Check: pytest_examples
  • GitHub Check: pre-commit
  • GitHub Check: pytest_apps
🔇 Additional comments (6)
camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (6)

62-72: Good fix: apply stealth headers on reused CDP context; warn for immutable UA.

This closes the gap when reusing a pre-existing context.


78-86: Consistent stealth on new CDP context creation.

Applying headers and UA via contextOptions is correct.


311-321: LGTM: one-pass snapshot indexing eliminates O(n²) scans.

The mapping defaults are sensible and stable.


335-338: LGTM: merge coordinates without losing role/text.

Prevents accidental overwrites.


1254-1260: LGTM: viewport math fixed to pure viewport coordinates.

This resolves the scroll-offset double-count.

Also applies to: 1273-1273


1235-1241: No action needed: browser.close() on CDP sessions only disconnects the client
Playwright docs: calling Browser.close() on an instance obtained via chromium.connectOverCDP clears all contexts and disconnects from the browser server without terminating the remote browser process. (playwright.dev)

@nitpicker55555
Copy link
Collaborator Author

@coderabbitai review

Copy link
Contributor

coderabbitai bot commented Aug 29, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (1)

348-354: viewportLimit without coordinates yields no real filtering; force-enable coords.

If viewportLimit is true and includeCoordinates is false, elements without coords pass through.

-  private async getSnapshotForAINative(includeCoordinates = false, viewportLimit = false): Promise<SnapshotResult & { timing: DetailedTiming }> {
+  private async getSnapshotForAINative(includeCoordinates = false, viewportLimit = false): Promise<SnapshotResult & { timing: DetailedTiming }> {
     const startTime = Date.now();
     const page = await this.getCurrentPage();
     
     try {
+      if (viewportLimit && !includeCoordinates) {
+        includeCoordinates = true;
+      }
♻️ Duplicate comments (1)
camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (1)

452-482: New-tab detection can hang; bind to page 'popup' with short, non-throwing window.

Context 'page' listens globally and waits full popupTimeout; prefer page.waitForEvent('popup') with a short timeout and catch.

-        // Set up popup listener before clicking
-        const popupPromise = page.context().waitForEvent('page', { timeout: browserConfig.popupTimeout });
+        // Set up popup listener with a short, non-throwing window
+        const popupPromise = page
+          .waitForEvent('popup', { timeout: Math.min(browserConfig.popupTimeout, 700) })
+          .catch(() => null);
@@
-        try {
-          // Wait for new page to open
-          const newPage = await popupPromise;
-          
+        // Wait for new page to open (if any)
+        const newPage = await popupPromise;
+        if (!newPage) {
+          return { success: true, method: 'playwright-aria-ref' };
+        }
+
-          // Generate tab ID for the new page
+          // Generate tab ID for the new page
           const newTabId = this.generateTabId();
           this.registerNewPage(newTabId, newPage);
@@
-          return { success: true, method: 'playwright-aria-ref-newtab', newTabId };
-        } catch (popupError) {
-          return { success: true, method: 'playwright-aria-ref' };
-        }
+          return { success: true, method: 'playwright-aria-ref-newtab', newTabId };
🧹 Nitpick comments (5)
camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (5)

124-131: extraHTTPHeaders on launchOptions is ineffective for chromium.launch (noise).

LaunchOptions ignores extraHTTPHeaders; keep it only for persistent context or add a clarifying comment.

Apply this to avoid confusion and scope headers to persistent contexts only:

       if (stealthConfig.userAgent) {
         launchOptions.userAgent = stealthConfig.userAgent;
       }
-      if (stealthConfig.extraHTTPHeaders) {
-        launchOptions.extraHTTPHeaders = stealthConfig.extraHTTPHeaders;
-      }

200-206: Nit: comment in visitPage contradicts this behavior.

getCurrentPage throws in CDP mode; update the comment in visitPage (Lines 938-941) to match.

-        // If no active page is available, getCurrentPage() will create one in CDP mode
+        // In CDP mode, getCurrentPage() will NOT create a page; frontend must pre-create blank tabs

248-276: Remove unused duplicate snapshot parser.

parseElementFromSnapshot is dead code after adopting buildSnapshotIndex; keeping both invites divergence.

-  private parseElementFromSnapshot(snapshotText: string, ref: string): { role?: string; text?: string } {
-    const lines = snapshotText.split('\n');
-    for (const line of lines) {
-      if (line.includes(`[ref=${ref}]`)) {
-        const typeMatch = line.match(/^\s*-?\s*([\w-]+)/);
-        const role = typeMatch ? typeMatch[1] : undefined;
-        const textMatch = line.match(/"([^"]*)"/);
-        const text = textMatch ? textMatch[1] : undefined;
-        return { role, text };
-      }
-    }
-    return {};
-  }

289-295: Deduplicate refs before mapping to cut coordinate lookups.

Repeated refs cause redundant locator/boundingBox work.

-      const refs: string[] = [];
+      const refs: string[] = [];
       let match;
       while ((match = refPattern.exec(snapshotText)) !== null) {
         refs.push(match[1]);
       }
+      const uniqueRefs = Array.from(new Set(refs));

And use uniqueRefs in subsequent loops.


431-441: Opening all links in new tabs should be configurable.

Forcing target=_blank changes site semantics; guard with a config flag.

-      //  Open ALL links in new tabs
+      // Optionally open links in new tabs (configurable)
+      const alwaysOpenLinksInNewTabs = this.configLoader.getBrowserConfig().alwaysOpenLinksInNewTabs === true;
@@
-      const shouldOpenNewTab = naturallyOpensNewTab || isNavigableLink;
+      const shouldOpenNewTab = naturallyOpensNewTab || (alwaysOpenLinksInNewTabs && isNavigableLink);
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 534c1c8 and 0ae1c56.

📒 Files selected for processing (1)
  • camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (10 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: pytest_examples
  • GitHub Check: pre-commit
  • GitHub Check: pytest_apps
  • GitHub Check: pytest_package_fast_test
  • GitHub Check: pytest_package_llm_test
🔇 Additional comments (6)
camel/toolkits/hybrid_browser_toolkit/ts/src/browser-session.ts (6)

63-72: CDP reuse: headers applied correctly; UA warning is appropriate.

Good application of setExtraHTTPHeaders on an existing context and guarding UA.


78-86: New context: stealth injection LGTM.

Context options wiring for headers and UA is correct.


153-161: Non-persistent: stealth in context options LGTM.


300-310: Snapshot index + coordinate merge LGTM.

Single-pass index and non-destructive coordinate enrichment look good.

Also applies to: 311-339


857-904: DOM stability wait refactor LGTM.

Switch to evaluate + waitForFunction with cleanup avoids handle leaks and busy-wait.


1235-1241: Viewport math fix LGTM.

Using viewport-relative coords and removing scroll offsets is correct.

Also applies to: 1253-1254

@nitpicker55555 nitpicker55555 changed the title chore: browser_som_screenshot enhance: browser_som_screenshot Aug 29, 2025
@nitpicker55555 nitpicker55555 self-assigned this Aug 29, 2025
@Wendong-Fan Wendong-Fan merged commit 1408c25 into master Aug 29, 2025
13 checks passed
@Wendong-Fan Wendong-Fan deleted the chore/browser_som_screenshot branch August 29, 2025 17:38
@Wendong-Fan Wendong-Fan added enhancement New feature or request Tool labels Aug 29, 2025
@Wendong-Fan Wendong-Fan added this to the Sprint 36 milestone Aug 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Tool
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

2 participants