Thank you for raising an issue. We will look into the matter and get back to you as soon as possible.
Please make sure you have given us as much context as possible.
The issue you're encountering is due to the rejectRequestPattern parameter, which is currently hardcoded to reject certain file types, including .cgi pages. This is causing the browserless service to abort requests to URLs like the one you're trying to access.
To resolve this, you can modify the code to allow the rejectRequestPattern to be set via an environment variable. This would provide flexibility to adjust the pattern as needed for different use cases.
Here's a general approach to implement this change:
Modify the Code: Update the part of the code where rejectRequestPattern is defined to check for an environment variable. If the environment variable is set, use its value; otherwise, fall back to the default pattern.
Set Environment Variable: In your deployment environment (e.g., Docker), set the environment variable with the desired pattern that includes .cgi files.
This change would involve editing the code where rejectRequestPattern is used, such as in the browserless function or related configuration files. Unfortunately, I don't have the exact location of the crawlSinglePage function in the codebase, but you can look for where rejectRequestPattern is defined and used, such as in the test file packages/web-crawler/src/crawImpl/__tests__/browserless.test.ts[1].
By making this change, you should be able to customize the request pattern to allow .cgi pages and other necessary file types.
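The approach above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the variable name `CRAWLER_REJECT_REQUEST_PATTERN`, the helper `resolveRejectRequestPattern`, and the default pattern are all hypothetical placeholders. It assumes the value is supplied as a JSON array of regex strings, since regex patterns may themselves contain commas.

```typescript
// Placeholder default; the real hardcoded value in the project may differ.
const DEFAULT_REJECT_REQUEST_PATTERN: string[] = ['.*\\.(png|jpg|gif|cgi)$'];

// Hypothetical variable name, not an existing project setting.
const ENV_KEY = 'CRAWLER_REJECT_REQUEST_PATTERN';

function resolveRejectRequestPattern(
  env: Record<string, string | undefined>,
): string[] {
  const raw = env[ENV_KEY];

  // Unset or blank: keep the built-in default.
  if (raw === undefined || raw.trim() === '') {
    return DEFAULT_REJECT_REQUEST_PATTERN;
  }

  try {
    const parsed: unknown = JSON.parse(raw);

    // Only accept a JSON array of strings.
    if (!Array.isArray(parsed) || !parsed.every((p) => typeof p === 'string')) {
      return DEFAULT_REJECT_REQUEST_PATTERN;
    }

    // Compile each entry once so an invalid regex is caught here, not at request time.
    for (const pattern of parsed) {
      new RegExp(pattern);
    }

    return parsed as string[];
  } catch {
    // Malformed JSON or an invalid regex: fall back to the default.
    return DEFAULT_REJECT_REQUEST_PATTERN;
  }
}
```

In a Node.js service, the caller would pass `process.env` and use the returned array wherever `rejectRequestPattern` is currently hardcoded. Under this sketch, a Docker deployment could set the variable to a JSON array that omits `cgi` so that `.cgi` requests are no longer aborted.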
📦 Deployment environment
Docker
📦 Deployment mode
Server mode (lobe-chat-database image)
📌 Version
v1.77.3
💻 Operating system
Windows
🌐 Browser
Edge
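🐛 Bug description
When the self-hosted browserless method in crawlSinglePage is used to answer the input below,
the error shown in the screenshot occurs.
browserless reports the following error:
After analysis, the likely cause is that rejectRequestPattern blocks the required .cgi page request. Looking at the source code, this parameter is hardcoded. Could it be changed so that it can be passed in through an environment variable?
📷 Steps to reproduce
Select deepseek-r1 as the model and deepseek-v3 as the search assistant model, then ask
🚦 Expected result
The web page content is retrieved correctly.
📝 Additional information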
No response