
Vercel deployment: Edge Function initial response timeout #7207

Open
sxjeru opened this issue Mar 28, 2025 · 4 comments
Labels
⚡️ Performance Performance issue | 性能问题 unconfirm 未被维护者确认的问题

Comments

@sxjeru
Contributor

sxjeru commented Mar 28, 2025

📦 Deployment environment

Vercel

📦 Deployment mode

Server-side mode (lobe-chat-database image)

📌 Software version

latest

💻 System environment

Windows

🌐 Browser

Chrome

🐛 Problem description

https://vercel.com/docs/functions/runtimes/edge/edge-functions#streaming

According to the Vercel documentation, Edge Functions must begin sending an initial response within 25 seconds, otherwise the request fails directly with a timeout error.

However, more and more reasoning models are now coming online, and their time to first token can easily exceed this limit. The result is that the client side has already timed out and cut off the request while the API side is only just starting to produce output, wasting tokens for nothing.

I'm not sure whether the following approach is feasible: for reasoning models, could we send a fake initial response so that Vercel Edge Functions believes streaming output has already started? That way the request probably wouldn't be cut off by the initial-response timeout.

[screenshot]
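A rough sketch of what that might look like in an Edge route handler (purely illustrative: the route shape and the upstream URL are placeholders, and I'm not sure this matches how LobeChat structures its handlers). The idea is to write a harmless SSE comment line immediately, then relay the real upstream stream into the same response:

    export const runtime = 'edge';

    export async function POST(req: Request) {
      const encoder = new TextEncoder();
      const { readable, writable } = new TransformStream<Uint8Array, Uint8Array>();
      const writer = writable.getWriter();
      const body = await req.text();

      // Relay in the background so the Response below is returned immediately.
      void (async () => {
        try {
          // Harmless SSE comment line sent right away, so the platform sees an
          // initial response long before the 25-second limit.
          await writer.write(encoder.encode(': keep-alive\n\n'));

          // Placeholder upstream model endpoint, not a real provider URL.
          const upstream = await fetch('https://api.example.com/v1/chat/completions', {
            body,
            headers: { 'Content-Type': 'application/json' },
            method: 'POST',
          });

          // Pipe the upstream streaming body through to the client.
          const reader = upstream.body!.getReader();
          while (true) {
            const { done, value } = await reader.read();
            if (done) break;
            await writer.write(value);
          }
        } finally {
          await writer.close();
        }
      })();

      return new Response(readable, {
        headers: { 'Content-Type': 'text/event-stream' },
      });
    }

Whether an SSE comment line alone is enough to count as an "initial response" on Vercel's side is an assumption on my part; it may need to be a real data chunk instead.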

📷 Steps to reproduce

No response

🚦 Expected result

No response

📝 Additional information

For now this can be worked around by enabling "client-side request" mode, or by deploying with Docker instead.

It would also be good to show some hint text while waiting, so that a long wait doesn't make users mistake it for a network problem or the like.

[screenshot]

@sxjeru sxjeru added the unconfirm 未被维护者确认的问题 label Mar 28, 2025
@lobehubbot
Member

👀 @sxjeru

Thank you for raising an issue. We will investigate into the matter and get back to you as soon as possible.
Please make sure you have given us as much context as possible.


dosubot bot commented Mar 28, 2025

I found a similar closed issue that addresses the problem of Vercel's 25-second limit on the initial response from Edge Functions. The suggested solution involves leveraging the streaming capabilities of the Lobe Chat framework to ensure that the response starts streaming back to the client before the 25-second limit is reached. Here are the key steps:

  1. Enable Streaming: Set the stream parameter to true when making requests to the OpenAI API.

    const response = await openai.chat.completions.create(
      {
        messages: formatMessages,
        ...params,
        stream: true,
      },
      { headers: { Accept: '*/*' } },
    );
  2. Handle Streaming Responses: Use the OpenAIStream utility to process the streaming response and return it as a StreamingTextResponse.

    const stream = OpenAIStream(response);
    return new StreamingTextResponse(stream);
  3. Frontend Handling: Ensure that the frontend is set up to handle streaming responses using the fetchSSE method.

    export const fetchSSE = async (fetchFn: () => Promise<Response>, options: FetchSSEOptions = {}) => {
      const response = await fetchFn();

      if (!response.ok) {
        const chatMessageError = await getMessageError(response);
        options.onErrorHandle?.(chatMessageError);
        return;
      }

      const returnRes = response.clone();
      const data = response.body;

      if (!data) return;
      const reader = data.getReader();
      const decoder = new TextDecoder();

      // Accumulate the decoded chunks so the full text can be handed to onFinish.
      let output = '';
      let done = false;

      while (!done) {
        const { value, done: doneReading } = await reader.read();
        done = doneReading;
        const chunkValue = decoder.decode(value, { stream: true });

        output += chunkValue;
        options.onMessageHandle?.(chunkValue);
      }

      await options.onFinish?.(output);

      return returnRes;
    };

By implementing these steps, you can avoid the 504 Gateway Timeout error by ensuring that the response starts streaming back to the client within the 25-second limit [1].
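For completeness, a hypothetical call site for the fetchSSE helper above might look like the following; the /api/chat route and the handler bodies are illustrative only, not LobeChat's actual API:

    const messages = [{ content: 'Hello', role: 'user' }];

    await fetchSSE(
      () =>
        fetch('/api/chat', {
          body: JSON.stringify({ messages }),
          headers: { 'Content-Type': 'application/json' },
          method: 'POST',
        }),
      {
        // Called if the response is not ok.
        onErrorHandle: (error) => console.error(error),
        // Called once with the full accumulated text.
        onFinish: (output) => console.log('done:', output),
        // Called for each streamed chunk as it arrives.
        onMessageHandle: (chunk) => console.log('chunk:', chunk),
      },
    );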

@dosubot dosubot bot added the ⚡️ Performance Performance issue | 性能问题 label Mar 28, 2025
@arvinxx
Contributor

arvinxx commented Mar 29, 2025

For reasoning models, could we send a fake initial response so that Vercel Edge Functions believes streaming output has already started

Wow, this idea is brilliant. I've also been struggling with how to handle this problem before. Looking at it this way, it should be feasible.

