---
sidebar_position: 2
title: "🗨️ Kokoro-FastAPI Using Docker"
---

:::warning
This tutorial is a community contribution and is not supported by the Open WebUI team. It serves only as a demonstration of how to customize Open WebUI for your specific use case. Want to contribute? Check out the contributing tutorial.
:::

# Integrating `Kokoro-FastAPI` 🗣️ with Open WebUI

## What is `Kokoro-FastAPI`?

[Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) is a dockerized FastAPI wrapper for the [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) text-to-speech model that implements the OpenAI API endpoint specification. It offers high-performance text-to-speech with impressive generation speeds:

- 100x+ real-time speed via HF A100
- 35-50x+ real-time speed via 4060Ti
- 5x+ real-time speed via M3 Pro CPU

Key Features:
- OpenAI-compatible Speech endpoint with inline voice combination
- NVIDIA GPU accelerated or CPU ONNX inference
- Streaming support with variable chunking
- Multiple audio format support (mp3, wav, opus, flac, aac, pcm)
- Web UI interface for easy testing
- Phoneme endpoints for conversion and generation
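
To illustrate what chunked streaming buys you on the client side, here is a minimal, hypothetical sentence-level chunker. It is not part of Kokoro-FastAPI (the server does its own chunking internally); it only sketches the idea of splitting long input so playback can start before the whole text is synthesized:

```python
# Illustrative sketch only: split long input into sentence-sized chunks so
# each chunk can be synthesized and played while later chunks are pending.
import re

def chunk_text(text, max_chars=200):
    """Pack whole sentences into chunks of at most max_chars characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would overflow.
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```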

Voices:
- af
- af_bella
- af_nicole
- af_sarah
- af_sky
- am_adam
- am_michael
- bf_emma
- bf_isabella
- bm_george
- bm_lewis
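
Because the Speech endpoint supports inline voice combination, a voice spec may name several voices joined with `+` (e.g. `af_bella+af_sky`); the blending itself happens server-side. Below is a small, hypothetical client-side validator for such a spec. The voice set follows the Kokoro-82M release, where George and Lewis are British male voices (`bm_`-prefixed):

```python
# Voice IDs from the Kokoro-82M release. The "+" separator follows
# Kokoro-FastAPI's inline voice combination feature (assumption: the server
# handles the actual blending of the listed voices).
KNOWN_VOICES = {
    "af", "af_bella", "af_nicole", "af_sarah", "af_sky",
    "am_adam", "am_michael",
    "bf_emma", "bf_isabella", "bm_george", "bm_lewis",
}

def validate_voice(spec):
    """Split a spec like 'af_bella+af_sky' and reject unknown voices."""
    parts = spec.split("+")
    unknown = [p for p in parts if p not in KNOWN_VOICES]
    if unknown:
        raise ValueError(f"unknown voice(s): {unknown}")
    return parts
```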

Languages:
- en_us
- en_uk

## Requirements

- Docker installed on your system
- Open WebUI running
- For GPU support: NVIDIA GPU with CUDA 12.1
- For CPU-only: No special requirements

## ⚡️ Quick start

You can choose between GPU or CPU versions:

```bash
# GPU Version (Requires NVIDIA GPU with CUDA 12.1)
docker run -d -p 8880:8880 -p 7860:7860 remsky/kokoro-fastapi:latest

# CPU Version (ONNX optimized inference)
docker run -d -p 8880:8880 -p 7860:7860 remsky/kokoro-fastapi:cpu-latest
```
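
Once a container is running, you can exercise the OpenAI-compatible endpoint directly. The sketch below only builds a `/v1/audio/speech` request with Python's standard library; it assumes the default port 8880, and the payload fields mirror the TTS settings used in this tutorial:

```python
# Build (but do not send) an OpenAI-style text-to-speech request against a
# local Kokoro-FastAPI container. Assumes the default port 8880.
import json
from urllib import request

def build_speech_request(text, voice="af_bella",
                         base_url="http://localhost:8880/v1"):
    """Return a urllib Request for the /audio/speech endpoint."""
    payload = {"model": "kokoro", "input": text, "voice": voice}
    return request.Request(
        f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# With the container up, request.urlopen(build_speech_request("Hello!"))
# returns a response whose body is the generated audio.
```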

## Setting up Open WebUI to use `Kokoro-FastAPI`

- Open the Admin Panel and go to Settings -> Audio
- Set your TTS Settings to match the following:
  - Text-to-Speech Engine: OpenAI
  - API Base URL: `http://localhost:8880/v1`
  - API Key: `not-needed`
  - TTS Model: `kokoro`
  - TTS Voice: `af_bella`

:::info
The default API key is the string `not-needed`. You do not have to change that value if you do not need the added security.
:::

**And that's it!**

Please see the [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) repo for instructions on how to build the docker container yourself (for changing ports, etc.).