Merge pull request #354 from d4v3y0rk/add-Kokoro-FastAPI-docs
add instructions for kokoro
tjbck authored Jan 10, 2025
2 parents d07a595 + 34a958b commit ac76134
1 changed file: docs/tutorials/text-to-speech/Kokoro-FastAPI-integration.md (82 additions, 0 deletions)

---
sidebar_position: 2
title: "🗨️ Kokoro-FastAPI Using Docker"
---

:::warning
This tutorial is a community contribution and is not supported by the Open WebUI team. It serves only as a demonstration of how to customize Open WebUI for your specific use case. Want to contribute? Check out the contributing tutorial.
:::

# Integrating `Kokoro-FastAPI` 🗣️ with Open WebUI

## What is `Kokoro-FastAPI`?

[Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) is a dockerized FastAPI wrapper for the [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) text-to-speech model that implements the OpenAI API endpoint specification. It offers high-performance text-to-speech with impressive generation speeds:

- 100x+ real-time speed on an HF A100
- 35-50x+ real-time speed on a 4060Ti
- 5x+ real-time speed on an M3 Pro CPU

Key Features:
- OpenAI-compatible Speech endpoint with inline voice combination
- NVIDIA GPU-accelerated or CPU ONNX inference
- Streaming support with variable chunking
- Multiple audio format support (mp3, wav, opus, flac, aac, pcm)
- Web UI interface for easy testing
- Phoneme endpoints for conversion and generation

Voices:
- af
- af_bella
- af_nicole
- af_sarah
- af_sky
- am_adam
- am_michael
- bf_emma
- bf_isabella
- bm_george
- bm_lewis

Languages:
- en_us
- en_uk

## Requirements

- Docker installed on your system
- Open WebUI running
- For GPU support: NVIDIA GPU with CUDA 12.1
- For CPU-only: No special requirements
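
For the GPU path, Docker also needs the NVIDIA Container Toolkit installed so containers can access the GPU. A quick way to check is to run `nvidia-smi` inside a throwaway CUDA container; the sketch below is only an example (the CUDA image tag is arbitrary, substitute any CUDA base image you already have):

```bash
# Verify that Docker can see the GPU (requires the NVIDIA Container Toolkit).
# The image tag below is just an example CUDA base image.
docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
```

If this prints your GPU table, the GPU version of Kokoro-FastAPI should work; otherwise, use the CPU image.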

## ⚡️ Quick start

You can choose between GPU or CPU versions:

```bash
# GPU Version (requires an NVIDIA GPU with CUDA 12.1 and the NVIDIA Container Toolkit)
docker run -d --gpus all -p 8880:8880 -p 7860:7860 remsky/kokoro-fastapi:latest

# CPU Version (ONNX optimized inference)
docker run -d -p 8880:8880 -p 7860:7860 remsky/kokoro-fastapi:cpu-latest
```
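
Once the container is up, you can sanity-check the OpenAI-compatible Speech endpoint from the command line. This is a minimal sketch that assumes the server is listening on `localhost:8880` and that the defaults used in this guide (`kokoro` model, `af_bella` voice) are available; the output filename is arbitrary:

```bash
# Request speech from the OpenAI-style /v1/audio/speech route and save it as MP3.
curl -s http://localhost:8880/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
        "model": "kokoro",
        "input": "Hello from Kokoro!",
        "voice": "af_bella",
        "response_format": "mp3"
      }' \
  --output hello.mp3
```

If you get a playable `hello.mp3` back, the server is ready for Open WebUI.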

## Setting up Open WebUI to use `Kokoro-FastAPI`

- Open the Admin Panel and go to Settings -> Audio
- Set your TTS Settings to match the following:
  - Text-to-Speech Engine: OpenAI
  - API Base URL: `http://localhost:8880/v1`
  - API Key: `not-needed`
  - TTS Model: `kokoro`
  - TTS Voice: `af_bella`
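
If Open WebUI itself runs in a Docker container, `localhost` inside that container does not reach services on your host, so the two containers need to share a network. The commands below are a minimal sketch rather than part of the official guide: the network and container names (`webui-net`, `kokoro`, `open-webui`) are arbitrary, and the Open WebUI image and port mapping shown are the common defaults; adjust them to your deployment.

```bash
# Minimal sketch: run Kokoro-FastAPI and Open WebUI on a shared Docker network.
# Names (webui-net, kokoro, open-webui) are arbitrary examples.
docker network create webui-net

docker run -d --name kokoro --network webui-net -p 8880:8880 -p 7860:7860 \
  remsky/kokoro-fastapi:latest

docker run -d --name open-webui --network webui-net -p 3000:8080 \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main
```

With this layout, set the API Base URL in the Audio settings to `http://kokoro:8880/v1` (the Kokoro container's name) instead of `http://localhost:8880/v1`.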



:::info
The default API key is the string `not-needed`. You do not have to change that value unless you want the added security of a real key.
:::

**And that's it!**

Please see the [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) repository for instructions on how to build the Docker container yourself (for changing ports, etc.).
