JSON Parsing bug in codebase indexing handle data #4932

Open
1 of 3 tasks
AfterStories opened this issue Apr 1, 2025 · 0 comments
Labels
area:indexing Relates to embedding and indexing kind:bug Indicates an unexpected problem or unintended behavior "needs-triage"

Comments

@AfterStories

AfterStories commented Apr 1, 2025

Before submitting your bug report

Relevant environment info

- OS: Windows 10
- Continue version: 0.0.9
- IDE version: JetBrains IntelliJ IDEA 2022.3

I'm currently using version 0.0.9, but I have checked that the relevant files in the latest code are the same as in 0.0.9.

Description

While testing the codebase feature, I found that the indexing process got stuck. I eventually traced the error to the `_handleLine` function in `binary\src\TcpMessenger.ts` (the same logic also exists in `binary\src\IpcMessenger.ts`), which caught the following error:

Error parsing line:  {"messageId":"2c250064-3f76-428c-a0bf-045ef736a123","messageType":"readFile","data":"package edge.el.dto\n\nuses edge.jsonmapper.JsonProperty\n\n/** Abstract definition of the expression. */\nabstract class ExpressionDTO {\n\n  /** Expression kind (aka constant, function application, etc...). */\n  @JsonProperty\n  private var _kind : String as readonly Kind\n\n\n  /** Creates a new expression DTO and initializes its kind. */\n  internal construct(ekind : String) {\n    this._kind \u003d ekind\n  }\n\n}\n"}{"messageId":"1e5425f7-b848-46ee-bf22-8ebbc9f82322","messageType":"readFile","data":"package edge.el.dto\n\nuses edge.jsonmapper.JsonProperty\n\n/**\n * Expression having a constant value.\n */\nclass ConstantExpressionDTO extends ExpressionDTO {\n\n  /** Constant type (string, number, date, etc...). */\n  @JsonProperty\n  private var _type : String as readonly Type\n\n  /** Constant value. */\n  @JsonProperty\n  private var _value : Object as readonly Value\n\n  construct(t : String, val : Object) {\n    super(\"const\")\n    this._type \u003d t\n    this._value \u003d val\n  }\n}\n"} SyntaxError: Unexpected non-whitespace character after JSON at position 512 (line 1 column 513)

 
 
 
The `_handleLine` function parses `line` as a JSON object and then reads its `messageType` field.
 
The error `SyntaxError: Unexpected non-whitespace character after JSON` means that the JSON parser successfully parsed one complete JSON object, but then found a non-whitespace character right after it, at position 512 (i.e. after the first 512 characters).
 
This is probably because two JSON objects are directly concatenated:
`{... the first JSON ends}{ the second JSON begins...}`
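A quick way to confirm this reading of the error: `JSON.parse` accepts exactly one JSON value (plus surrounding whitespace), so two objects glued together are rejected even though each one parses on its own. The reported position is the index where the second object starts, which is why the log points at position 512, right after the first message. The sample strings below are made up for illustration.

```typescript
// Hypothetical sample messages, shaped like the readFile messages in the log above.
const first = '{"messageId":"a","messageType":"readFile","data":"x"}';
const second = '{"messageId":"b","messageType":"readFile","data":"y"}';

// Each object parses fine on its own.
console.log(JSON.parse(first).messageType); // readFile
console.log(JSON.parse(second).messageType); // readFile

// Glued together with no delimiter, the same text is rejected: the parser
// finishes the first object, then hits the "{" of the second one.
let errorName = "none";
try {
  JSON.parse(first + second);
} catch (e) {
  errorName = (e as Error).name;
}
console.log(errorName); // SyntaxError

// With a separator between them, each piece parses again.
const pieces = (first + "\r\n" + second).split(/\r\n/).map((s) => JSON.parse(s));
console.log(pieces.length); // 2
```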
 

Looking at the upstream logic: in the `_handleData` function, the `data` parameter is converted to a string and then split with `d.split(/\r\n/)`. The error message, however, indicates that the data actually received can contain multiple JSON objects concatenated directly, with no line break between them.
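For comparison, here is a minimal sketch of newline-delimited framing as described above: split on `\r\n` and keep the trailing partial piece for the next chunk. The function name `splitFrames` is hypothetical and this is not the actual `TcpMessenger` code; it only shows why a chunk containing `{...}{...}` without a separator comes out as a single "line".

```typescript
// Hypothetical sketch of \r\n-delimited framing, not the real Continue code.
// `pending` is the unfinished tail left over from the previous chunk.
function splitFrames(pending: string, chunk: string): { lines: string[]; rest: string } {
  const parts = (pending + chunk).split(/\r\n/);
  // The last element is "" (chunk ended with \r\n) or an incomplete line.
  const rest = parts.pop() ?? "";
  // Ignore empty lines produced by consecutive delimiters.
  const lines = parts.filter((p) => p.length > 0);
  return { lines, rest };
}

// Well-formed input: one message per \r\n.
const ok = splitFrames("", '{"a":1}\r\n{"b":2}\r\n');
console.log(ok.lines.length); // 2

// Missing delimiter between messages: both objects end up in one "line",
// which is the kind of string JSON.parse rejects in the log above.
const bad = splitFrames("", '{"a":1}{"b":2}\r\n');
console.log(bad.lines); // [ '{"a":1}{"b":2}' ]

// A message split across two chunks is reassembled through `rest`.
const part1 = splitFrames("", '{"a":');
const part2 = splitFrames(part1.rest, "1}\r\n");
console.log(part2.lines); // [ '{"a":1}' ]
```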
 
I don't know why the source code assumes that JSON messages are always separated by `\r\n`, but judging from the error message, `_handleData` can end up putting several JSON messages into a single entry of the `lines` variable.
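One plausible reason `\r\n` works as a delimiter at all: `JSON.stringify` escapes control characters inside strings, so a serialized message never contains a raw CR or LF, and the delimiter cannot appear in the middle of a message. Assuming every sender terminates each message with `\r\n`, the split is unambiguous; the log above suggests that at least one message was written without it. The `encodeMessage` helper below is hypothetical and is not the actual sender code.

```typescript
// Hypothetical sender-side helper: serialize one message and terminate it
// with the \r\n delimiter the receiver splits on.
function encodeMessage(message: object): string {
  return JSON.stringify(message) + "\r\n";
}

// File contents with real newlines, like the readFile payloads in the log.
const payload = { messageType: "readFile", data: "package edge.el.dto\n\nclass A {\n}\n" };
const frame = encodeMessage(payload);

// The newlines in `data` are serialized as the two characters "\" and "n",
// so the only raw CR/LF in the frame is the terminating delimiter.
const body = frame.slice(0, -2);
console.log(body.includes("\n") || body.includes("\r")); // false
console.log(frame.endsWith("\r\n")); // true

// Two frames written back to back still split cleanly.
const stream = encodeMessage({ id: 1 }) + encodeMessage({ id: 2 });
console.log(stream.split("\r\n").filter((s) => s.length > 0).length); // 2
```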
 

FYI, ChatGPT suggested the following enhanced code, which splits JSON messages even when they are not separated by `\r\n`:

```typescript
protected _handleData(data: Buffer) {
    // Convert buffer to string
    const d = data.toString();

    // Combine with any previously unfinished data
    const fullData = this._unfinishedLine ? this._unfinishedLine + d : d;
    this._unfinishedLine = undefined;

    // Variables to track JSON parsing state
    let startPos = 0;        // Starting position of current JSON object
    let depth = 0;           // Tracks nesting depth of braces {}
    let inString = false;    // Whether we're currently inside a string
    let escapeNext = false;  // Whether the next character is escaped

    // Scan through each character in the data
    for (let i = 0; i < fullData.length; i++) {
        const char = fullData[i];

        // Skip this character if it's escaped
        if (escapeNext) {
            escapeNext = false;
            continue;
        }

        // Check for escape character within a string
        if (char === '\\' && inString) {
            escapeNext = true;
            continue;
        }

        // Toggle string state when encountering quotes (not escaped)
        if (char === '"') {
            inString = !inString;
            continue;
        }

        // Only process braces when not inside a string
        if (!inString) {
            if (char === '{') {
                depth++;
            } else if (char === '}') {
                depth--;

                // Stray '}' outside any object: discard everything up to it
                if (depth < 0) {
                    depth = 0;
                    startPos = i + 1;
                    continue;
                }

                // We found a complete JSON object when depth returns to 0
                if (depth === 0) {
                    // Trim separators such as \r\n left between objects
                    const jsonStr = fullData.substring(startPos, i + 1).trim();
                    // Advance past this object even if it fails to parse: the
                    // depth count is exact, so keeping the old startPos would
                    // only glue every following object onto the bad one
                    startPos = i + 1;
                    try {
                        // Validate that this is a valid JSON object
                        JSON.parse(jsonStr);
                    } catch (e) {
                        console.warn("Discarding malformed JSON message:", e);
                        continue;
                    }
                    // Process this JSON object
                    this._handleLine(jsonStr);
                }
            }
        }
    }

    // Save any unfinished JSON data for the next data chunk
    // (whitespace-only leftovers such as a trailing \r\n are dropped)
    const rest = fullData.substring(startPos);
    this._unfinishedLine = rest.trim().length > 0 ? rest : undefined;
}
```
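To exercise the brace-counting idea without the messenger class, here is a standalone sketch. `extractJsonObjects` is a hypothetical name; it follows the same string- and escape-aware depth counting as the code above, but returns the complete object strings plus the leftover tail instead of calling `_handleLine`.

```typescript
// Hypothetical standalone version of the brace-scanning splitter.
// Returns every complete top-level JSON object in `buffer` and the unfinished tail.
function extractJsonObjects(buffer: string): { objects: string[]; rest: string } {
  const objects: string[] = [];
  let start = -1; // index of the '{' that opened the current object, or -1
  let depth = 0;
  let inString = false;
  let escapeNext = false;

  for (let i = 0; i < buffer.length; i++) {
    const ch = buffer[i];
    if (inString) {
      // Inside a string only escapes and the closing quote matter.
      if (escapeNext) escapeNext = false;
      else if (ch === "\\") escapeNext = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      // Strings only matter inside an object; stray quotes between messages are ignored.
      if (depth > 0) inString = true;
    } else if (ch === "{") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === "}" && depth > 0) {
      depth--;
      if (depth === 0) {
        objects.push(buffer.substring(start, i + 1));
        start = -1;
      }
    }
    // Anything else at depth 0 (e.g. \r\n separators) is skipped.
  }
  // Keep an incomplete object for the next chunk; drop separator-only leftovers.
  const rest = start >= 0 ? buffer.substring(start) : "";
  return { objects, rest };
}

// Two messages glued together, the second one split across chunks.
// The first payload contains a "}" inside a string, the second an escaped quote.
const chunk1 = '{"messageType":"readFile","data":"a } b"}{"messageType":"re';
const chunk2 = 'adFile","data":"c \\" d"}\r\n';

const r1 = extractJsonObjects(chunk1);
console.log(r1.objects.length); // 1
const r2 = extractJsonObjects(r1.rest + chunk2);
console.log(r2.objects.length); // 1
console.log(JSON.parse(r2.objects[0]).data); // c " d
```

Unlike the suggested version, this sketch does not call `JSON.parse` during the scan; a malformed object is simply handed to the caller, whose own parse reports it, so one bad message cannot hold up the rest of the buffer.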
 
 
@sestinj self-assigned this on Apr 1, 2025
@dosubot (bot) added the area:indexing and kind:bug labels on Apr 1, 2025