feat: improve models page
louisjoecodes committed Dec 20, 2024
1 parent 08ece38 commit c9a0e6f
Showing 4 changed files with 214 additions and 182 deletions.
2 changes: 1 addition & 1 deletion fern/changelog/2024-12-19.md
@@ -1,6 +1,6 @@
## Model

- **Introducing Flash TTS Model**: Our fastest text-to-speech model yet, generating speech in just 75ms. Available in two versions: Flash v2 (English-only) and Flash v2.5 (32 languages). Access it via the API with model IDs `eleven_flash_v2` and `eleven_flash_v2_5`. Perfect for low-latency conversational AI applications. [Try it now](https://elevenlabs.io/docs/api-reference/text-to-speech).
- **Introducing Flash**: Our fastest text-to-speech model yet, generating speech in just 75ms. Access it via the API with model IDs `eleven_flash_v2` and `eleven_flash_v2_5`. Perfect for low-latency conversational AI applications. [Try it now](https://elevenlabs.io/docs/api-reference/text-to-speech).

## Launches

Binary file added fern/developer-guides/images/model-latency.webp
Binary file not shown.
141 changes: 87 additions & 54 deletions fern/developer-guides/models.mdx
@@ -3,40 +3,43 @@ title: Models
slug: developer-guides/models
---

ElevenLabs is the leading provider of AI-powered audio technology. This guide helps developers choose the right model for their use case.
## Flagship Models

<CardGroup cols={2}>
<Card title="Eleven Multilingual v2">
Our most advanced speech synthesis model.
<br />
<Icon icon="check" iconType="solid" /> Highest realism, emotional range

<Icon icon="check" iconType="solid" /> Best for voiceovers, audiobooks, content

<Icon icon="check" iconType="solid" /> Multilingual

<Card title="Eleven Multilingual v2" href="#eleven-multilingual-v2">
Our most lifelike, emotionally rich speech synthesis model
<div className="mt-4 space-y-2">
<div className="text-sm">Most natural-sounding output</div>
<div className="text-sm">32 languages supported</div>
<div className="text-sm">10,000 character limit</div>
<div className="text-sm">Higher price per character</div>
</div>
</Card>
<Card title="Eleven Flash v2.5" href="#eleven-flash-v25">
Our fast, affordable speech synthesis model
<div className="mt-4 space-y-2">
<div className="text-sm">Ultra-low latency (~75ms&dagger;)</div>
<div className="text-sm">32 languages supported</div>
<div className="text-sm">40,000 character limit</div>
<div className="text-sm">Faster model, 50% lower price per character</div>
</div>
</Card>
<Card title="Eleven v2.5 Flash">
Our lowest latency model.
<br />
<Icon icon="check" iconType="solid" /> ~75ms latency (excl. network)

<Icon icon="check" iconType="solid" /> Ideal for real-time conversational AI

<Icon icon="check" iconType="solid" /> Multilingual

</Card>
</CardGroup>
<div className="text-center">
[Model pricing details](https://elevenlabs.io/pricing)
</div>
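
The character limits on the cards above matter when the input is longer than a single request allows. A minimal sketch of splitting text to fit, assuming the limit is counted in plain characters of the input string (the helper name `split_for_model` and the `CHAR_LIMITS` table are illustrative, not part of the ElevenLabs API):

```python
# Per-request character limits as listed on the model cards.
CHAR_LIMITS = {
    "eleven_multilingual_v2": 10_000,
    "eleven_flash_v2_5": 40_000,
}


def split_for_model(text: str, model_id: str) -> list[str]:
    """Split text on whitespace into chunks no longer than the model's limit.

    Hypothetical helper. Note that runs of whitespace (including newlines)
    are collapsed to single spaces.
    """
    limit = CHAR_LIMITS[model_id]
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # Hard-split any single word that exceeds the limit on its own.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Splitting on whitespace is the simplest choice; splitting on sentence boundaries would likely give more natural prosody at chunk edges.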

# Flagship Models
## Models Overview

| Model ID | Description | Max Characters | Languages | Best For |
| ---------------------------- | ------------------------------------------ | -------------- | ------------ | ----------------------------------------------------------- |
| `eleven_multilingual_v2` | Most life-like, emotionally rich model | 10,000 | 29 languages | Voice overs, audiobooks, content creation |
| `eleven_flash_v2_5` | High quality, low-latency model (~75ms) | 40,000 | 32 languages | Developer use cases requiring speed and multiple languages |
| `eleven_flash_v2` | High quality, low-latency model (~75ms) | 30,000 | English only | Developer use cases requiring speed (English only) |
| `eleven_english_sts_v2` | State-of-the-art speech-to-speech | 5,000 | English only | Maximum control over content and prosody |
| `eleven_multilingual_sts_v2` | Cutting-edge multilingual speech-to-speech | 10,000 | 29 languages | Advanced multilingual speech synthesis with prosody control |
The ElevenLabs API offers a range of speech synthesis models optimized for different use cases, quality levels, and performance requirements.

| Model ID | Description | Languages |
| ---------------------------- | -------------------------------------------------------------------- | --------- |
| `eleven_multilingual_v2` | Our most lifelike model with rich emotional expression | 32 |
| `eleven_flash_v2_5` | Ultra-fast model optimized for real-time use (~75ms&dagger;) | 32 |
| `eleven_flash_v2` | Ultra-fast model optimized for real-time use (~75ms&dagger;) | English |
| `eleven_multilingual_sts_v2` | State-of-the-art multilingual voice changer model (Speech to Speech) | 29 |
| `eleven_english_sts_v2` | English-only voice changer model (Speech to Speech) | English |

<Accordion title="Older Models">

@@ -46,37 +49,67 @@ These models are maintained for backward compatibility but are not recommended f

</Warning>

| Model ID | Description | Max Characters | Languages | Best For |
| ------------------------ | ---------------------------------------------- | -------------- | ------------ | ----------------------------------------- |
| `eleven_monolingual_v1` | First generation TTS model | 10,000 | English only | Legacy model (outclassed by v2 models) |
| `eleven_multilingual_v1` | First multilingual model | 10,000 | 9 languages | Legacy model (outclassed by v2 models) |
| `eleven_turbo_v2_5` | High quality, low-latency model (~250ms-300ms) | 40,000 | 32 languages | Legacy model (outclassed by Flash models) |
| `eleven_turbo_v2` | High quality, low-latency model (~250ms-300ms) | 30,000 | English only | Legacy model (outclassed by Flash models) |
| Model ID | Description | Languages |
| ------------------------ | --------------------------------------------------------------------------- | --------- |
| `eleven_monolingual_v1` | First generation TTS model (outclassed by v2 models) | English |
| `eleven_multilingual_v1` | First multilingual model (outclassed by v2 models) | 9 |
| `eleven_turbo_v2_5` | High quality, low-latency model (~250ms-300ms) (outclassed by Flash models) | 32 |
| `eleven_turbo_v2` | High quality, low-latency model (~250ms-300ms) (outclassed by Flash models) | English |

</Accordion>

# Model Selection Guide

Choose your model based on these primary considerations:

<Steps>
<Step title="Latency Requirements">
*Quality over Speed?* Use Standard Multilingual models <br /> • *Need
real-time?* Use Flash models
</Step>
<Step title="Language Support">
*English only?* → Consider `eleven_flash_v2` <br /> • *Multiple
languages?* → Use `eleven_multilingual_v2` or `eleven_flash_v2_5`
</Step>
<Step title="Use Case">
*Content Creation*`eleven_multilingual_v2` <br /> • *Conversational AI*
`eleven_flash_v2_5` <br /> • *Professional Voice Clones* → Either model{" "}
<br /> • *Speech to Speech?*`eleven_english_sts_v2` or
`eleven_multilingual_sts_v2` family
</Step>
</Steps>
## Eleven Multilingual v2

Eleven Multilingual v2 is our most advanced, emotionally aware speech synthesis model. It produces natural, lifelike speech with high emotional range and contextual understanding across multiple languages.

The model delivers consistent voice quality and personality across all supported languages while maintaining the speaker's unique characteristics and accent.

This model excels in scenarios requiring high-quality, emotionally nuanced speech:

- **Audiobook Production**: Perfect for long-form narration with complex emotional delivery
- **Character Voiceovers**: Ideal for gaming and animation due to its emotional range
- **Professional Content**: Well-suited for corporate videos and e-learning materials
- **Multilingual Projects**: Maintains consistent voice quality across language switches

Although it has higher latency and a higher cost per character than the Flash models, it delivers superior quality for projects where lifelike speech is the priority.

## Eleven Flash v2.5

Eleven Flash v2.5 is our fastest speech synthesis model, designed for real-time applications and conversational AI. It delivers high-quality speech with ultra-low latency (~75ms&dagger;) across 32 languages.

The model balances speed and quality, making it ideal for interactive applications while maintaining natural-sounding output and consistent voice characteristics across languages.

This model is particularly well-suited for:

- **Conversational AI**: Perfect for real-time voice agents and chatbots
- **Interactive Applications**: Ideal for games and applications requiring immediate response
- **Large-Scale Processing**: Efficient for bulk text-to-speech conversion

With its lower price point and ~75ms&dagger; latency, Flash v2.5 is the cost-effective choice for developers needing fast, reliable speech synthesis across multiple languages.
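
As a rough illustration of selecting Flash v2.5 in a request, the sketch below builds (but does not send) a text-to-speech call using only the standard library. The endpoint path, the `xi-api-key` header, and the `text`/`model_id` body fields are assumptions here, as none of them appear on this page; check the API reference before relying on them. `build_tts_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumed base URL for the ElevenLabs REST API.
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(
    text: str,
    voice_id: str,
    api_key: str,
    model_id: str = "eleven_flash_v2_5",
) -> urllib.request.Request:
    """Build a POST request that asks for speech from the given model.

    Hypothetical helper: it only constructs the request object, so it can
    be inspected or tested without network access.
    """
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and read the audio bytes from the response. Switching to `eleven_flash_v2` for English-only input is a one-argument change.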

<Frame background="subtle">
<img src="/developer-guides/images/model-latency.webp" />
</Frame>

## Model Selection Guide

### Requirements
**Quality:** Use `eleven_multilingual_v2`

**Low-latency:** Use Eleven Flash models

**Multilingual support:** Use `eleven_multilingual_v2` or `eleven_flash_v2_5`

### Use Case
**Content Creation:** Use `eleven_multilingual_v2`

**Conversational AI:** Use `eleven_flash_v2_5` or `eleven_flash_v2`

**Voice Changer (Speech to Speech):** Use `eleven_multilingual_sts_v2`
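
The guidance above can be expressed as a small lookup. This is a sketch of one way to encode it, not an official helper; the function name `select_model` and its flags are illustrative, and the English-only branch is inferred from the models table rather than stated in this guide.

```python
def select_model(
    *,
    low_latency: bool = False,
    multilingual: bool = True,
    voice_changer: bool = False,
) -> str:
    """Return a model ID following the selection guide (hypothetical helper)."""
    if voice_changer:
        # Voice Changer (Speech to Speech)
        return "eleven_multilingual_sts_v2"
    if low_latency:
        # Conversational AI and other real-time use: Flash models.
        # Flash v2 is English-only; Flash v2.5 covers multiple languages.
        return "eleven_flash_v2_5" if multilingual else "eleven_flash_v2"
    # Quality-first content creation
    return "eleven_multilingual_v2"
```

For example, a real-time English-only voice agent would call `select_model(low_latency=True, multilingual=False)`.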

<Note>
For detailed language support information and troubleshooting guidance, refer
to our [help documentation](https://help.elevenlabs.io).
</Note>

&dagger; Excluding application & network latency