Local LLM Server with NPU Acceleration
Updated Apr 25, 2025 - Python
Function-calling API for LLMs from multiple providers
A flexible FastAPI-based framework for handling AI tasks using Large Language Models (LLMs). It supports multiple providers, extensible tasks and routers, Redis caching, and OpenAI integration, and is intended to scale across a range of LLM-based applications.
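To make the "multiple providers plus caching" idea concrete, here is a minimal sketch of how such a framework might route a prompt to a named provider and cache the result. It is not taken from the repository: the names `PROVIDERS`, `register_provider`, `complete`, and `echo_provider` are hypothetical, and a plain dictionary stands in for Redis so the example runs with the standard library only.

```python
import hashlib
import json
from typing import Callable, Dict

# Hypothetical provider type: takes a prompt, returns generated text.
Provider = Callable[[str], str]

# Hypothetical registry mapping a provider name to its callable.
PROVIDERS: Dict[str, Provider] = {}

# In-memory stand-in for a Redis cache (assumption: the real project uses Redis).
CACHE: Dict[str, str] = {}


def register_provider(name: str, fn: Provider) -> None:
    """Make a provider available under the given name."""
    PROVIDERS[name] = fn


def cache_key(provider: str, prompt: str) -> str:
    """Derive a stable key from the provider name and the prompt."""
    payload = json.dumps({"provider": provider, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def complete(provider: str, prompt: str) -> str:
    """Return a completion, calling the provider only on a cache miss."""
    if provider not in PROVIDERS:
        raise KeyError(f"unknown provider: {provider}")
    key = cache_key(provider, prompt)
    if key in CACHE:
        return CACHE[key]
    result = PROVIDERS[provider](prompt)
    CACHE[key] = result
    return result


def echo_provider(prompt: str) -> str:
    """Toy provider used for illustration; a real one would call an LLM API."""
    return f"echo: {prompt}"


register_provider("echo", echo_provider)
```

Calling `complete("echo", "hi")` twice invokes `echo_provider` only once; the second call is served from the cache. In a FastAPI application, a function like `complete` would typically sit behind a route handler, with the dictionary replaced by a Redis client.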