Skip to content

Conversation

chezsmithy
Copy link
Contributor

@chezsmithy chezsmithy commented Aug 24, 2025

Description

Performed extensive analysis of common grep scenarios being attempted by the LLM and researched common features enabled in grep tools in other code assistants.

Implementing recommended changes to allow the LLM to request specific arguments like -A and -B or -l for example to research code just before or just after.

Implementating scoping by add an optional path parameter so the LLM doesn't need to always search the whole repo which might result in truncation. Also allow scoping searches to specific files to avoid the need to always read the whole file.

AI Code Review

  • Team members only: AI review runs automatically when PR is opened or marked ready for review
  • Team members can also trigger a review by commenting @continue-general-review or @continue-detailed-review

Checklist

  • I've read the contributing guide
  • The relevant docs, if any, have been updated or created
  • The relevant tests, if any, have been updated or created

Screen recording or screenshot

[ When applicable, please include a short screen recording or screenshot - this makes it much easier for us as contributors to review and understand your changes. See this PR as a good example. ]

Tests

Added exstensive tests.
Perfomed extensive manual testing by asking the LLM to generate common lsTool type queries against the test repo and observed the outcomes expecting full passes.


Summary by cubic

Refactored grep search to accept full ripgrep args and added path scoping, context flags, and -l support. This makes searches more precise, reduces noise, and helps avoid repo-wide scans.

  • New Features

    • Grep tool accepts args[] and path; buildRipgrepArgs composes defaults (-i, ignore files, --heading, context), merges -A/-B/-C, and supports -l. Inputs are sanitized and paths validated (no absolute/traversal).
    • IDE API change: getSearchResults now takes rgArgs: string[] (plus optional maxResults); VS Code and IntelliJ forward args directly to ripgrep.
    • Improved formatting handles file-only (-l) output and standard results, with sensible truncation.
  • Migration

    • Update any custom IDE/protocol implementations to use getSearchResults(rgArgs, maxResults).
    • Use buildRipgrepArgs(query, { extraArgs, path, maxResults }) when invoking searches; grepSearch tool now supports args[] and path.

@chezsmithy chezsmithy requested a review from a team as a code owner August 24, 2025 04:32
@chezsmithy chezsmithy removed the request for review from a team August 24, 2025 04:32
@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Aug 24, 2025
@chezsmithy chezsmithy changed the title Codex/refactor-grep-implementation-in-extensions feat: ✨ Refactor and improve grepTool Aug 24, 2025
@chezsmithy
Copy link
Contributor Author

chezsmithy commented Aug 25, 2025

@RomneyDa thoughts on these additional enhancements? I wanted to solve for a number of use cases I saw attempted but failed. The big ones were -A -B, -l and path scoped searches. Note often I see failed grep attempts by Claude Sonnet with fallback to full file reads or terminal use.

@chezsmithy chezsmithy force-pushed the codex/refactor-grep-implementation-in-extensions branch from aede3e7 to bfae80b Compare August 26, 2025 03:44
try {
// const { query, warning } = prepareQueryForRipgrep(rawQuery);
Copy link
Contributor Author

@chezsmithy chezsmithy Aug 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RomneyDa in my testing, I noted adding this check in before processing reduces the number of successful queries slightly. Does it make sense to include it? Without it I see a higher degree of success, now this is with Claude Sonnet 4 which might be more reliable at building regex.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which cases does it reduce successful queries? Might be a specific bug with the prepareQueryForRipgrep function? We could look at fixing/adding a test case. The preparation was introduced because even claude 4 regularly outputs queries in a way where the JSON parsed pattern is invalid regex. Not sure if it's a prompting issue or model issue or parsing issue but prepareQueryForRipgrep resolves most cases

Open to other approaches especially if seems to be a prompting issue or JSON parsing issue

Copy link
Collaborator

@RomneyDa RomneyDa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note comment from first review about the systemMessageDescription
Partly requesting changes because we're still not sure we want another tool that uses array args for now, because of issues with array args when using system message tools, and many models just aren't great with array args. Seems like a definite improvement for e.g. claude 4. I think we'd want a simple pattern grep search tool available for some models. Might make sense to more or less duplicate and use config-dependent to only add the fleshed out grep search tool for great models. Thoughts?

exampleArgs: [["query", ".*main_services.*"]],
exampleArgs: [
["query", ".*main_services.*"],
["query", "throw new Error"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

path?
EDIT: to clarify, I mean the system message tools example for this will not be valid. missing args "path" and "args"

try {
// const { query, warning } = prepareQueryForRipgrep(rawQuery);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which cases does it reduce successful queries? Might be a specific bug with the prepareQueryForRipgrep function? We could look at fixing/adding a test case. The preparation was introduced because even claude 4 regularly outputs queries in a way where the JSON parsed pattern is invalid regex. Not sure if it's a prompting issue or model issue or parsing issue but prepareQueryForRipgrep resolves most cases

Open to other approaches especially if seems to be a prompting issue or JSON parsing issue

@github-project-automation github-project-automation bot moved this from Todo to In Progress in Issues and PRs Aug 27, 2025
@chezsmithy
Copy link
Contributor Author

@RomneyDa got it! Why don't I look at if it's possible to scrape the args and path out of the query string too. I like the idea of maybe two tools and applying by model. I'll take your lead on what you are doing for the search and replace work if that makes sense?

@Patrick-Erichsen Patrick-Erichsen removed their request for review August 29, 2025 22:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:XL This PR changes 500-999 lines, ignoring generated files.
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

2 participants