Replies: 1 comment 1 reply
-
An easy thing you can try is to move the accept loop to a worker thread, so that it runs as a regular spawned task on the runtime's worker threads rather than inside the future that `#[tokio::main]` drives on the main thread:

```rust
#[tokio::main]
async fn main() -> Result<()> {
    // run() contains the accept loop and per-connection handling;
    // spawning it lets the worker threads schedule it like any other task.
    tokio::spawn(run()).await.unwrap()
}
```
-
Background
I'm building a long-lived TCP connection gateway in Rust using Tokio. Each incoming TCP connection is accepted in one task, and I then spawn a new task to handle the per-connection loop. Inside that loop, I forward each request via gRPC (using Tonic) to a backend service.
The code looks roughly like this:
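A minimal sketch of the pattern, assuming a plain read/write loop per connection; `grpc_call`, `handle_conn`, and the listen address are illustrative placeholders, and the real code would call a Tonic-generated client instead of the stand-in below:

```rust
use std::time::Duration;

use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::{TcpListener, TcpStream};

// Stand-in for the Tonic client call: the real gateway would forward the
// request to the backend over a gRPC channel instead of sleeping.
async fn grpc_call(_req: &[u8]) -> Vec<u8> {
    tokio::time::sleep(Duration::from_millis(1)).await;
    b"ok".to_vec()
}

async fn handle_conn(mut stream: TcpStream) {
    let mut buf = vec![0u8; 4096];
    loop {
        // Read one request from the long-lived connection.
        let n = match stream.read(&mut buf).await {
            Ok(0) | Err(_) => return, // connection closed or failed
            Ok(n) => n,
        };
        // Forward it to the backend and write the response back.
        let resp = grpc_call(&buf[..n]).await;
        if stream.write_all(&resp).await.is_err() {
            return;
        }
    }
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:9000").await?;
    loop {
        let (stream, _addr) = listener.accept().await?;
        // One task per connection.
        tokio::spawn(handle_conn(stream));
    }
}
```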
Problem
When a large number of TCP connections are established, I notice that a few gRPC requests experience a significant latency spike. Under higher concurrency, this spike becomes more noticeable.
In my tests, I replaced the gRPC server with a mock implementation that essentially does nothing. In fact, I even replaced the entire grpc_call logic with tokio::time::sleep to simulate a fixed delay. However, the measured latency is still much higher than the specified sleep duration, indicating that the overhead doesn’t come from the backend itself. Therefore, I strongly suspect that creating a new task for each connection might cause scheduling overhead or delays within the Tokio runtime, leading to these inflated response times.
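A sketch of that measurement (the 5 ms figure and the names are illustrative, not the actual values used): the gRPC call is replaced by a fixed sleep, and the elapsed time around it is compared against the sleep duration.

```rust
use std::time::{Duration, Instant};

// Hypothetical mock standing in for the real gRPC call: a fixed sleep.
async fn mock_grpc_call() {
    tokio::time::sleep(Duration::from_millis(5)).await;
}

#[tokio::main]
async fn main() {
    let start = Instant::now();
    mock_grpc_call().await;
    // With a fixed 5 ms sleep, anything observed well above 5 ms here is
    // time spent waiting on the runtime, not on the backend.
    println!("observed latency: {:?}", start.elapsed());
}
```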
I can also confirm that the increased gRPC latency is not due to excessive concurrency. I created a batch of test clients that generate requests at a stable RPS, and the latency only spikes while connections are being established or torn down. Once past that phase, the gRPC request latency remains consistently low, indicating that at this level of concurrency there is no bottleneck in the gRPC calls.
Question
So, how do I avoid this high latency? I would rather trade the speed of TCP connection establishment for stable latency. I know that limiting the TCP accept rate is a simple solution, but is there a better option? Choosing the right rate is always difficult.
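For reference, one minimal shape of the accept-rate-limiting idea mentioned above is to pace the accept loop with a timer; the tick interval, address, and function names are illustrative placeholders rather than recommended values:

```rust
use std::time::Duration;

use tokio::net::TcpListener;
use tokio::time::{interval, MissedTickBehavior};

// Pace the accept loop so new connections are admitted at a bounded rate.
// A 1 ms tick (~1000 accepts per second) is an arbitrary placeholder.
async fn paced_accept_loop(listener: TcpListener) -> std::io::Result<()> {
    let mut tick = interval(Duration::from_millis(1));
    tick.set_missed_tick_behavior(MissedTickBehavior::Delay);
    loop {
        tick.tick().await;
        let (stream, _addr) = listener.accept().await?;
        tokio::spawn(async move {
            // The per-connection loop would go here.
            let _ = stream;
        });
    }
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:9000").await?;
    paced_accept_loop(listener).await
}
```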