
Pinned

  1. vllm Public

    A high-throughput and memory-efficient inference and serving engine for LLMs (see the usage sketch after this list)

    Python · 56.2k stars · 9.6k forks

  2. llm-compressor Public

    Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

    Python · 1.8k stars · 213 forks

  3. recipes Public

    Common recipes to run vLLM

    111 stars · 25 forks
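Since the pinned descriptions are terse, here is a minimal offline-inference sketch using vLLM's Python API (LLM and SamplingParams are the documented entry points; the model ID and sampling values are arbitrary examples):

    # Minimal vLLM offline-inference sketch; the model ID and sampling
    # values below are illustrative, not recommendations.
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")  # any Hugging Face-compatible model ID
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # generate() returns one RequestOutput per prompt.
    for out in llm.generate(["The capital of France is"], params):
        print(out.outputs[0].text)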

Repositories

Showing 10 of 21 repositories
  • vllm-ascend Public

    Community maintained hardware plugin for vLLM on Ascend

    Python · 1,033 stars · Apache-2.0 · 366 forks · 340 issues (5 need help) · 140 PRs · Updated Aug 25, 2025
  • vllm-spyre Public

    Community maintained hardware plugin for vLLM on Spyre

    Python · 32 stars · Apache-2.0 · 21 forks · 7 issues · 15 PRs · Updated Aug 25, 2025
  • vllm-gaudi Public

    Community maintained hardware plugin for vLLM on Intel Gaudi

    Python · 8 stars · 26 forks · 1 issue · 22 PRs · Updated Aug 25, 2025
  • aibrix Public

    Cost-efficient and pluggable infrastructure components for GenAI inference

    Go · 4,087 stars · Apache-2.0 · 433 forks · 213 issues (21 need help) · 17 PRs · Updated Aug 25, 2025
  • vllm Public

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 56,245 stars · Apache-2.0 · 9,639 forks · 1,775 issues (16 need help) · 1,017 PRs · Updated Aug 25, 2025
  • vllm-xpu-kernels Public

    The vLLM XPU kernels for Intel GPU

    Python · 5 stars · Apache-2.0 · 8 forks · 0 issues · 7 PRs · Updated Aug 25, 2025
  • flash-attention Public (forked from Dao-AILab/flash-attention)

    Fast and memory-efficient exact attention (a naive reference computation follows the repository list)

    Python · 88 stars · BSD-3-Clause · 1,930 forks · 0 issues · 13 PRs · Updated Aug 24, 2025
  • llm-compressor Public

    Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM (a quantization sketch follows the repository list)

    Python · 1,839 stars · Apache-2.0 · 213 forks · 50 issues (7 need help) · 34 PRs · Updated Aug 23, 2025
  • vllm-project.github.io Public

    HTML · 14 stars · 20 forks · 0 issues · 0 PRs · Updated Aug 23, 2025
  • production-stack Public

    vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

    Python · 1,710 stars · Apache-2.0 · 265 forks · 71 issues (3 need help) · 45 PRs · Updated Aug 23, 2025