feat: improve models page

elevenlabs · Dec 20, 2024 · c9a0e6f
1 parent 08ece38
Showing 4 changed files with 214 additions and 182 deletions.
diff --git a/fern/changelog/2024-12-19.md b/fern/changelog/2024-12-19.md
@@ -1,6 +1,6 @@
 ## Model
 
-- **Introducing Flash TTS Model**: Our fastest text-to-speech model yet, generating speech in just 75ms. Available in two versions: Flash v2 (English-only) and Flash v2.5 (32 languages). Access it via the API with model IDs `eleven_flash_v2` and `eleven_flash_v2_5`. Perfect for low-latency conversational AI applications. [Try it now](https://elevenlabs.io/docs/api-reference/text-to-speech).
+- **Introducing Flash**: Our fastest text-to-speech model yet, generating speech in just 75ms. Access it via the API with model IDs `eleven_flash_v2` and `eleven_flash_v2_5`. Perfect for low-latency conversational AI applications. [Try it now](https://elevenlabs.io/docs/api-reference/text-to-speech).
 
 ## Launches
 

diff --git a/fern/developer-guides/images/model-latency.webp b/fern/developer-guides/images/model-latency.webp
diff --git a/fern/developer-guides/models.mdx b/fern/developer-guides/models.mdx
@@ -3,40 +3,43 @@ title: Models
 slug: developer-guides/models
 ---
 
-ElevenLabs is the leading provider of AI-powered audio technology. This guide helps developers choose the right model for their use case.
+## Flagship Models
 
 <CardGroup cols={2}>
-  <Card title="Eleven Multilingual v2">
-    Our most advanced speech synthesis model.
-    <br />
-    <Icon icon="check" iconType="solid" /> Highest realism, emotional range
-
-    <Icon icon="check" iconType="solid" /> Best for voiceovers, audiobooks, content
-
-    <Icon icon="check" iconType="solid" /> Multilingual
-
+  <Card title="Eleven Multilingual v2" href="#eleven-multilingual-v2">
+    Our most lifelike, emotionally rich speech synthesis model
+    <div className="mt-4 space-y-2">
+      <div className="text-sm">Most natural-sounding output</div>
+      <div className="text-sm">32 languages supported</div>
+      <div className="text-sm">10,000 character limit</div>
+      <div className="text-sm">Higher price per character</div>
+    </div>
+  </Card>
+  <Card title="Eleven Flash v2.5" href="#eleven-flash-v25">
+    Our fast, affordable speech synthesis model
+    <div className="mt-4 space-y-2">
+      <div className="text-sm">Ultra-low latency (~75ms&dagger;)</div>
+      <div className="text-sm">32 languages supported</div>
+      <div className="text-sm">40,000 character limit</div>
+      <div className="text-sm">Faster model, 50% lower price per character</div>
+    </div>
   </Card>
-  <Card title="Eleven v2.5 Flash">
-  Our lowest latency model.
-   <br />
-    <Icon icon="check" iconType="solid" /> ~75ms latency (excl. network)
-
-    <Icon icon="check" iconType="solid" /> Ideal for real-time conversational AI
-
-    <Icon icon="check" iconType="solid" /> Multilingual
-
-</Card>
 </CardGroup>
+<div className="text-center">
+  [Model pricing details](https://elevenlabs.io/pricing)
+</div>
 
-# Flagship Models
+## Models Overview
 
-| Model ID                     | Description                                | Max Characters | Languages    | Best For                                                    |
-| ---------------------------- | ------------------------------------------ | -------------- | ------------ | ----------------------------------------------------------- |
-| `eleven_multilingual_v2`     | Most life-like, emotionally rich model     | 10,000         | 29 languages | Voice overs, audiobooks, content creation                   |
-| `eleven_flash_v2_5`          | High quality, low-latency model (~75ms)    | 40,000         | 32 languages | Developer use cases requiring speed and multiple languages  |
-| `eleven_flash_v2`            | High quality, low-latency model (~75ms)    | 30,000         | English only | Developer use cases requiring speed (English only)          |
-| `eleven_english_sts_v2`      | State-of-the-art speech-to-speech          | 5,000          | English only | Maximum control over content and prosody                    |
-| `eleven_multilingual_sts_v2` | Cutting-edge multilingual speech-to-speech | 10,000         | 29 languages | Advanced multilingual speech synthesis with prosody control |
+The ElevenLabs API offers a range of speech synthesis models optimized for different use cases, quality levels, and performance requirements.
+
+| Model ID                     | Description                                                          | Languages |
+| ---------------------------- | -------------------------------------------------------------------- | --------- |
+| `eleven_multilingual_v2`     | Our most lifelike model with rich emotional expression               | 32        |
+| `eleven_flash_v2_5`          | Ultra-fast model optimized for real-time use (~75ms&dagger;)         | 32        |
+| `eleven_flash_v2`            | Ultra-fast model optimized for real-time use (~75ms&dagger;)         | English   |
+| `eleven_multilingual_sts_v2` | State-of-the-art multilingual voice changer model (Speech to Speech) | 29        |
+| `eleven_english_sts_v2`      | English-only voice changer model (Speech to Speech)                  | English   |
 
 <Accordion title="Older Models">
 
@@ -46,37 +49,67 @@ These models are maintained for backward compatibility but are not recommended f
 
 </Warning>
 
-| Model ID                 | Description                                    | Max Characters | Languages    | Best For                                  |
-| ------------------------ | ---------------------------------------------- | -------------- | ------------ | ----------------------------------------- |
-| `eleven_monolingual_v1`  | First generation TTS model                     | 10,000         | English only | Legacy model (outclassed by v2 models)    |
-| `eleven_multilingual_v1` | First multilingual model                       | 10,000         | 9 languages  | Legacy model (outclassed by v2 models)    |
-| `eleven_turbo_v2_5`      | High quality, low-latency model (~250ms-300ms) | 40,000         | 32 languages | Legacy model (outclassed by Flash models) |
-| `eleven_turbo_v2`        | High quality, low-latency model (~250ms-300ms) | 30,000         | English only | Legacy model (outclassed by Flash models) |
+| Model ID                 | Description                                                                 | Languages |
+| ------------------------ | --------------------------------------------------------------------------- | --------- |
+| `eleven_monolingual_v1`  | First generation TTS model (outclassed by v2 models)                        | English   |
+| `eleven_multilingual_v1` | First multilingual model (outclassed by v2 models)                          | 9         |
+| `eleven_turbo_v2_5`      | High quality, low-latency model (~250ms-300ms) (outclassed by Flash models) | 32        |
+| `eleven_turbo_v2`        | High quality, low-latency model (~250ms-300ms) (outclassed by Flash models) | English   |
 
 </Accordion>
 
-# Model Selection Guide
-
-Choose your model based on these primary considerations:
-
-<Steps>
-  <Step title="Latency Requirements">
-    • *Quality over Speed?* Use Standard Multilingual models <br /> • *Need
-    real-time?* Use Flash models
-  </Step>
-  <Step title="Language Support">
-    • *English only?* → Consider `eleven_flash_v2` <br /> • *Multiple
-    languages?* → Use `eleven_multilingual_v2` or `eleven_flash_v2_5`
-  </Step>
-  <Step title="Use Case">
-    • *Content Creation* → `eleven_multilingual_v2` <br /> • *Conversational AI*
-    → `eleven_flash_v2_5` <br /> • *Professional Voice Clones* → Either model{" "}
-    <br /> • *Speech to Speech?* → `eleven_english_sts_v2` or
-    `eleven_multilingual_sts_v2` family
-  </Step>
-</Steps>
+## Eleven Multilingual v2
+
+Eleven Multilingual v2 is our most advanced, emotionally aware speech synthesis model. It produces natural, lifelike speech with high emotional range and contextual understanding across multiple languages.
+
+The model delivers consistent voice quality and personality across all supported languages while maintaining the speaker's unique characteristics and accent.
+
+This model excels in scenarios requiring high-quality, emotionally nuanced speech:
+
+- **Audiobook Production**: Perfect for long-form narration with complex emotional delivery
+- **Character Voiceovers**: Ideal for gaming and animation due to its emotional range
+- **Professional Content**: Well-suited for corporate videos and e-learning materials
+- **Multilingual Projects**: Maintains consistent voice quality across language switches
+
+While it has higher latency and a higher cost per character than Flash models, it delivers superior quality for projects where lifelike speech is important.
+
+## Eleven Flash v2.5
+
+Eleven Flash v2.5 is our fastest speech synthesis model, designed for real-time applications and conversational AI. It delivers high-quality speech with ultra-low latency (~75ms&dagger;) across 32 languages.
+
+The model balances speed and quality, making it ideal for interactive applications while maintaining natural-sounding output and consistent voice characteristics across languages.
+
+This model is particularly well-suited for:
+
+- **Conversational AI**: Perfect for real-time voice agents and chatbots
+- **Interactive Applications**: Ideal for games and applications requiring immediate response
+- **Large-Scale Processing**: Efficient for bulk text-to-speech conversion
+
+With its lower price point and ~75ms&dagger; latency, Flash v2.5 is the cost-effective choice for developers needing fast, reliable speech synthesis across multiple languages.
+
+<Frame background="subtle">
+  <img src="/developer-guides/images/model-latency.webp" />
+</Frame>
+
+## Model Selection Guide
+
+### Requirements
+**Quality:** Use `eleven_multilingual_v2`
+
+**Low-latency:** Use Eleven Flash models
+
+**Multilingual support:** Use `eleven_multilingual_v2` or `eleven_flash_v2_5`
+
+### Use Case
+**Content Creation:** Use `eleven_multilingual_v2`  
+
+**Conversational AI:** Use `eleven_flash_v2_5` or `eleven_flash_v2`
+
+**Voice Changer (Speech to Speech):** Use `eleven_multilingual_sts_v2`
 
 <Note>
   For detailed language support information and troubleshooting guidance, refer
   to our [help documentation](https://help.elevenlabs.io).
 </Note>
+
+&dagger; Excluding application & network latency
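
Note (not part of the commit): the new Model Selection Guide can be read as a small lookup. The sketch below is one possible encoding of it in Python. The function name `pick_model`, the use-case keys, and the `english_only` flag are hypothetical names introduced for illustration. The rule that English-only conversational use maps to `eleven_flash_v2` is an assumption drawn from the overview table, since the guide lists both Flash models without choosing between them. Only the model IDs come from the page itself.

```python
# Model IDs copied from the tables in models.mdx.
MULTILINGUAL_V2 = "eleven_multilingual_v2"
FLASH_V2_5 = "eleven_flash_v2_5"
FLASH_V2 = "eleven_flash_v2"
MULTILINGUAL_STS_V2 = "eleven_multilingual_sts_v2"


def pick_model(use_case: str, english_only: bool = False) -> str:
    """Return a model ID for a use case, following the selection guide.

    `use_case` keys are hypothetical labels, not API values:
    "content_creation", "conversational_ai", "voice_changer".
    """
    if use_case == "content_creation":
        # Guide: quality-first content uses Multilingual v2.
        return MULTILINGUAL_V2
    if use_case == "conversational_ai":
        # Guide lists both Flash models; choosing v2 for English-only
        # input is an assumption based on the languages column.
        return FLASH_V2 if english_only else FLASH_V2_5
    if use_case == "voice_changer":
        # Guide: Speech to Speech uses the multilingual STS model.
        return MULTILINGUAL_STS_V2
    raise ValueError(f"unknown use case: {use_case!r}")


if __name__ == "__main__":
    for case in ("content_creation", "conversational_ai", "voice_changer"):
        print(case, "->", pick_model(case))
```

The returned string would then be passed as the model ID in a text-to-speech request. The request itself is left out because the page does not describe it.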