
Commit

com.rest.elevenlabs 3.4.0 (#100)
- com.utilities.rest -> 3.3.0
- com.utilities.encoder.ogg -> 4.0.2
- Added additional request properties for TextToSpeechRequest
  - `previous_text`, `next_text`, `previous_request_ids`, `next_request_ids`, `languageCode`, `withTimestamps`
  - `cacheFormat` which can be `None`, `Wav`, or `Ogg`
- Added support for transcription timestamps by @tomkail
- Added support for language code in `TextToSpeechRequest` by @Mylan719
- Refactored `VoiceClip`
  - clip samples and data are now prioritized over the `AudioClip`
    - the `AudioClip` will not be created until you access the `VoiceClip.AudioClip` property
    - if an audio clip is not loaded, you can load it with `LoadCachedAudioClipAsync`
- Refactored demo scene to use `OnAudioFilterRead` for better-quality stream playback

---------

Co-authored-by: Milan Mikuš <mylan719@gmail.com>
Co-authored-by: Milan Mikuš <milan.mikus@riganti.cz>
Co-authored-by: Tom Kail <thomas.kail@betterup.co>
Co-authored-by: Tom Kail <tkail92@gmail.com>
5 people authored Nov 25, 2024
1 parent 4600769 commit fd14bc5
Showing 26 changed files with 892 additions and 421 deletions.
41 changes: 19 additions & 22 deletions Documentation~/README.md
@@ -40,6 +40,7 @@ The recommended installation method is though the unity package manager and [Ope
- [com.utilities.extensions](https://github.com/RageAgainstThePixel/com.utilities.extensions)
- [com.utilities.audio](https://github.com/RageAgainstThePixel/com.utilities.audio)
- [com.utilities.encoder.ogg](https://github.com/RageAgainstThePixel/com.utilities.encoder.ogg)
- [com.utilities.encoder.wav](https://github.com/RageAgainstThePixel/com.utilities.encoder.wav)
- [com.utilities.rest](https://github.com/RageAgainstThePixel/com.utilities.rest)

---
@@ -59,7 +60,7 @@ The recommended installation method is though the unity package manager and [Ope
- [Text to Speech](#text-to-speech)
- [Stream Text To Speech](#stream-text-to-speech)
- [Voices](#voices)
- [Get Shared Voices](#get-shared-voices) :new:
- [Get Shared Voices](#get-shared-voices)
- [Get All Voices](#get-all-voices)
- [Get Default Voice Settings](#get-default-voice-settings)
- [Get Voice](#get-voice)
@@ -70,13 +71,13 @@ The recommended installation method is though the unity package manager and [Ope
- [Samples](#samples)
- [Download Voice Sample](#download-voice-sample)
- [Delete Voice Sample](#delete-voice-sample)
- [Dubbing](#dubbing) :new:
- [Dub](#dub) :new:
- [Get Dubbing Metadata](#get-dubbing-metadata) :new:
- [Get Transcript for Dub](#get-transcript-for-dub) :new:
- [Get dubbed file](#get-dubbed-file) :new:
- [Delete Dubbing Project](#delete-dubbing-project) :new:
- [SFX Generation](#sfx-generation) :new:
- [Dubbing](#dubbing)
- [Dub](#dub)
- [Get Dubbing Metadata](#get-dubbing-metadata)
- [Get Transcript for Dub](#get-transcript-for-dub)
- [Get dubbed file](#get-dubbed-file)
- [Delete Dubbing Project](#delete-dubbing-project)
- [SFX Generation](#sfx-generation)
- [History](#history)
- [Get History](#get-history)
- [Get History Item](#get-history-item)
@@ -265,8 +266,8 @@ Convert text to speech.
var api = new ElevenLabsClient();
var text = "The quick brown fox jumps over the lazy dog.";
var voice = (await api.VoicesEndpoint.GetAllVoicesAsync()).FirstOrDefault();
var defaultVoiceSettings = await api.VoicesEndpoint.GetDefaultVoiceSettingsAsync();
var voiceClip = await api.TextToSpeechEndpoint.TextToSpeechAsync(text, voice, defaultVoiceSettings);
var request = new TextToSpeechRequest(voice, text);
var voiceClip = await api.TextToSpeechEndpoint.TextToSpeechAsync(request);
audioSource.PlayOneShot(voiceClip.AudioClip);
```

@@ -284,18 +285,14 @@ Stream text to speech.
var api = new ElevenLabsClient();
var text = "The quick brown fox jumps over the lazy dog.";
var voice = (await api.VoicesEndpoint.GetAllVoicesAsync()).FirstOrDefault();
var partialClips = new Queue<AudioClip>();
var voiceClip = await api.TextToSpeechEndpoint.StreamTextToSpeechAsync(
text,
voice,
partialClip =>
{
// Note: Best to queue them and play them in update loop!
// See TextToSpeech sample demo for details
partialClips.Enqueue(partialClip);
});
// The full completed clip:
audioSource.clip = voiceClip.AudioClip;
var partialClips = new Queue<VoiceClip>();
var request = new TextToSpeechRequest(voice, text, model: Model.EnglishTurboV2, outputFormat: OutputFormat.PCM_44100);
var voiceClip = await api.TextToSpeechEndpoint.StreamTextToSpeechAsync(request, partialClip =>
{
// Note: check demo scene for best practices
// on how to handle playback with OnAudioFilterRead
partialClips.Enqueue(partialClip);
});
```

### [Voices](https://docs.elevenlabs.io/api-reference/voices)
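The streaming README example queues partial `VoiceClip`s and leaves playback to `OnAudioFilterRead`. A minimal, Unity-independent sketch of the kind of buffer such a callback might drain follows. It assumes mono float samples (for example from `partialClip.ClipSamples`) and that the stream's sample rate already matches the output rate, so no resampling is done. `StreamingSampleBuffer` is a hypothetical helper, not part of the package or its demo scene.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of com.rest.elevenlabs): a thread-safe FIFO
// of mono samples that an audio-thread callback can drain.
// Assumption: incoming samples are mono and need no resampling.
public sealed class StreamingSampleBuffer
{
    private readonly Queue<float> samples = new Queue<float>();
    private readonly object gate = new object();

    public int Count
    {
        get { lock (gate) { return samples.Count; } }
    }

    // Called from the main thread as each partial clip arrives.
    public void Enqueue(float[] chunk)
    {
        lock (gate)
        {
            foreach (var sample in chunk)
            {
                samples.Enqueue(sample);
            }
        }
    }

    // Called from the audio thread. Copies each mono sample to every
    // interleaved channel and writes silence once the queue runs dry.
    // Returns how many frames were filled with queued audio.
    public int Fill(float[] data, int channels)
    {
        if (channels <= 0) throw new ArgumentOutOfRangeException(nameof(channels));

        var framesWritten = 0;
        lock (gate)
        {
            for (var i = 0; i < data.Length; i += channels)
            {
                var value = 0f;
                if (samples.Count > 0)
                {
                    value = samples.Dequeue();
                    framesWritten++;
                }

                for (var c = 0; c < channels && i + c < data.Length; c++)
                {
                    data[i + c] = value;
                }
            }
        }

        return framesWritten;
    }
}
```

In a `MonoBehaviour`, the partial-clip callback could call `buffer.Enqueue(partialClip.ClipSamples)` and `OnAudioFilterRead(float[] data, int channels)` could call `buffer.Fill(data, channels)`. Taking a lock on the audio thread keeps the sketch short; a lock-free ring buffer avoids that contention at the cost of more code. Without resampling, playback is only correct when the requested `OutputFormat` (e.g. `PCM_44100`) matches Unity's `AudioSettings.outputSampleRate`.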
6 changes: 3 additions & 3 deletions Editor/ElevenLabsDashboard.cs
@@ -1106,7 +1106,7 @@ private async void GenerateSynthesizedText()
Directory.CreateDirectory(downloadDir);
}

voiceClip = await api.TextToSpeechEndpoint.TextToSpeechAsync(speechSynthesisTextInput, currentVoiceOption, currentVoiceSettings, currentModelOption);
voiceClip = await api.TextToSpeechEndpoint.TextToSpeechAsync(new(currentVoiceOption, speechSynthesisTextInput, voiceSettings: currentVoiceSettings, model: currentModelOption));
voiceClip.CopyIntoProject(editorDownloadDirectory);
}
catch (Exception e)
@@ -1225,7 +1225,7 @@ private void RenderVoiceLab()
EditorGUILayout.Space(EndWidth);
EditorGUILayout.EndHorizontal();
EditorGUI.indentLevel++;

EditorGUILayout.BeginHorizontal();
{
EditorGUILayout.LabelField(voice.Id, EditorStyles.boldLabel);
@@ -1242,7 +1242,7 @@ private void RenderVoiceLab()
EditorGUILayout.Space(EndWidth);
EditorGUILayout.EndHorizontal();
EditorGUI.indentLevel++;

if (!voiceLabels.TryGetValue(voice.Id, out var cachedLabels))
{
cachedLabels = new Dictionary<string, string>();
11 changes: 11 additions & 0 deletions Runtime/Common/CacheFormat.cs
@@ -0,0 +1,11 @@
// Licensed under the MIT License. See LICENSE in the project root for license information.

namespace ElevenLabs
{
public enum CacheFormat
{
None,
Ogg,
Wav
}
}
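The new `CacheFormat` enum selects how a generated clip is cached (`None`, `Ogg`, or `Wav`), and `GeneratedClip.LoadCachedAudioClipAsync` later infers the audio type from the cached file's extension with `EndsWith`. A small sketch of a two-way mapping between the enum and file extensions follows, written to be null-safe and case-insensitive. The helper class and method names are hypothetical, and the enum is mirrored locally only so the snippet compiles on its own.

```csharp
using System;
using System.IO;

// Local mirror of ElevenLabs.CacheFormat so this sketch is self-contained.
public enum CacheFormat
{
    None,
    Ogg,
    Wav
}

// Hypothetical helpers; not part of com.rest.elevenlabs.
public static class CacheFormatHelpers
{
    // File extension a cached clip could be written with; null means "not cached".
    public static string ToFileExtension(this CacheFormat format) => format switch
    {
        CacheFormat.Ogg => ".ogg",
        CacheFormat.Wav => ".wav",
        _ => null
    };

    // Reverse lookup from a cached path. Tolerates a null path and
    // ignores extension casing (".WAV" and ".wav" both map to Wav).
    public static CacheFormat FromPath(string path)
    {
        var extension = Path.GetExtension(path ?? string.Empty);

        if (string.Equals(extension, ".ogg", StringComparison.OrdinalIgnoreCase))
        {
            return CacheFormat.Ogg;
        }

        if (string.Equals(extension, ".wav", StringComparison.OrdinalIgnoreCase))
        {
            return CacheFormat.Wav;
        }

        return CacheFormat.None;
    }
}
```

Guarding against a null path matters here because `cachedPath` now defaults to `null` in the `GeneratedClip` constructors.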


81 changes: 78 additions & 3 deletions Runtime/Common/GeneratedClip.cs
@@ -2,8 +2,12 @@

using ElevenLabs.Extensions;
using System;
using System.Threading;
using System.Threading.Tasks;
using UnityEngine;
using UnityEngine.Scripting;
using Utilities.Audio;
using Utilities.WebRequestRest;

namespace ElevenLabs
{
@@ -12,16 +16,30 @@ namespace ElevenLabs
public class GeneratedClip : ISerializationCallbackReceiver
{
[Preserve]
internal GeneratedClip(string id, string text, AudioClip audioClip, string cachedPath)
internal GeneratedClip(string id, string text, AudioClip audioClip, string cachedPath = null)
{
this.id = id;
this.text = text;
TextHash = $"{id}{text}".GenerateGuid();
textHash = TextHash.ToString();
this.audioClip = audioClip;
this.cachedPath = cachedPath;
SampleRate = audioClip.frequency;
}

[Preserve]
internal GeneratedClip(string id, string text, ReadOnlyMemory<byte> clipData, int sampleRate, string cachedPath = null)
{
this.id = id;
this.text = text;
TextHash = $"{id}{text}".GenerateGuid();
textHash = TextHash.ToString();
this.cachedPath = cachedPath;
ClipData = clipData;
SampleRate = sampleRate;
}

private readonly ReadOnlyMemory<byte> audioData;

[SerializeField]
private string id;

@@ -44,16 +62,73 @@ internal GeneratedClip(string id, string text, AudioClip audioClip, string cache
private AudioClip audioClip;

[Preserve]
public AudioClip AudioClip => audioClip;
public AudioClip AudioClip
{
get
{
if (audioClip == null && !ClipData.IsEmpty)
{
var pcmData = PCMEncoder.Decode(ClipData.ToArray());
audioClip = AudioClip.Create(Id, pcmData.Length, 1, SampleRate, false);
audioClip.SetData(pcmData, 0);
}

if (audioClip == null)
{
Debug.LogError($"{nameof(audioClip)} is null, try loading it with LoadCachedAudioClipAsync");
}

return audioClip;
}
}

[SerializeField]
private string cachedPath;

[Preserve]
public string CachedPath => cachedPath;

public ReadOnlyMemory<byte> ClipData { get; }

private float[] clipSamples;

public float[] ClipSamples
{
get
{
if (!ClipData.IsEmpty)
{
clipSamples ??= PCMEncoder.Decode(ClipData.ToArray());
}
else if (audioClip != null)
{
clipSamples = new float[audioClip.samples];
audioClip.GetData(clipSamples, 0);
}

return clipSamples;
}
}

public int SampleRate { get; }

public void OnBeforeSerialize() => textHash = TextHash.ToString();

public void OnAfterDeserialize() => TextHash = Guid.Parse(textHash);

public static implicit operator AudioClip(GeneratedClip clip) => clip?.AudioClip;

public async Task<AudioClip> LoadCachedAudioClipAsync(CancellationToken cancellationToken = default)
{
var audioType = cachedPath switch
{
var path when path.EndsWith(".ogg") => AudioType.OGGVORBIS,
var path when path.EndsWith(".wav") => AudioType.WAV,
var path when path.EndsWith(".mp3") => AudioType.MPEG,
_ => AudioType.UNKNOWN
};

return await Rest.DownloadAudioClipAsync($"file://{cachedPath}", audioType, cancellationToken: cancellationToken);
}
}
}
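Both `ClipSamples` and the lazy `AudioClip` getter turn raw `ClipData` bytes into floats with `PCMEncoder.Decode` from com.utilities.audio, and `AudioClip.Create` is called with one channel. A stand-alone sketch of that conversion follows, assuming 16-bit signed little-endian mono PCM (the layout of ElevenLabs' `pcm_*` output formats) scaled by 1/32768. `Pcm16Sketch` is a hypothetical name, and the library's own decoder may differ in details such as scaling.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical sketch, not the com.utilities.audio implementation.
// Assumption: input is 16-bit signed little-endian mono PCM.
public static class Pcm16Sketch
{
    // Converts each 2-byte sample to a float in [-1, 1).
    // A trailing odd byte (an incomplete sample) is ignored.
    public static float[] Decode(ReadOnlySpan<byte> pcm)
    {
        var samples = new float[pcm.Length / 2];

        for (var i = 0; i < samples.Length; i++)
        {
            short value = BinaryPrimitives.ReadInt16LittleEndian(pcm.Slice(i * 2, 2));
            samples[i] = value / 32768f;
        }

        return samples;
    }
}
```

With a `VoiceClip` in hand this would be called as `Pcm16Sketch.Decode(voiceClip.ClipData.Span)`, which reads the bytes in place instead of copying them with `ToArray()` first.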
24 changes: 0 additions & 24 deletions Runtime/Common/OutputFormatExtensions.cs

This file was deleted.

44 changes: 44 additions & 0 deletions Runtime/Common/TimestampedTranscriptCharacter.cs
@@ -0,0 +1,44 @@
// Licensed under the MIT License. See LICENSE in the project root for license information.

using Newtonsoft.Json;
using UnityEngine.Scripting;

namespace ElevenLabs
{
/// <summary>
/// Represents timing information for a single character in the transcript
/// </summary>
[Preserve]
public class TimestampedTranscriptCharacter
{
[Preserve]
[JsonConstructor]
internal TimestampedTranscriptCharacter(string character, double startTime, double endTime)
{
Character = character;
StartTime = startTime;
EndTime = endTime;
}

/// <summary>
/// The character being spoken
/// </summary>
[Preserve]
[JsonProperty("character")]
public string Character { get; }

/// <summary>
/// The time in seconds when this character starts being spoken
/// </summary>
[Preserve]
[JsonProperty("character_start_times_seconds")]
public double StartTime { get; }

/// <summary>
/// The time in seconds when this character finishes being spoken
/// </summary>
[Preserve]
[JsonProperty("character_end_times_seconds")]
public double EndTime { get; }
}
}
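`VoiceClip.TimestampedTranscriptCharacters` exposes per-character timing, while captions and lip-sync usually need word-level timing. A minimal sketch that merges consecutive non-whitespace characters into words follows, taking the first character's start time and the last character's end time. It uses value tuples in place of `TimestampedTranscriptCharacter` (whose constructor is internal) so it compiles on its own; `TranscriptWords.Group` is a hypothetical helper.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper; not part of com.rest.elevenlabs.
public static class TranscriptWords
{
    // Merges runs of non-whitespace characters into words. Whitespace
    // characters end the current word and are otherwise dropped.
    public static List<(string Word, double StartTime, double EndTime)> Group(
        IEnumerable<(string Character, double StartTime, double EndTime)> characters)
    {
        var words = new List<(string Word, double StartTime, double EndTime)>();
        var current = new StringBuilder();
        double start = 0, end = 0;

        foreach (var c in characters)
        {
            if (string.IsNullOrWhiteSpace(c.Character))
            {
                Flush();
                continue;
            }

            if (current.Length == 0)
            {
                start = c.StartTime; // first character opens the word
            }

            current.Append(c.Character);
            end = c.EndTime; // last character so far closes the word
        }

        Flush();
        return words;

        // Emits the word being built, if any, and resets the builder.
        void Flush()
        {
            if (current.Length == 0) return;
            words.Add((current.ToString(), start, end));
            current.Clear();
        }
    }
}
```

The library's objects could be projected into this shape with `clip.TimestampedTranscriptCharacters.Select(c => (c.Character, c.StartTime, c.EndTime))` (requires `System.Linq`). Splitting on whitespace only is a simplifying assumption; punctuation stays attached to its word.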
11 changes: 11 additions & 0 deletions Runtime/Common/TimestampedTranscriptCharacter.cs.meta


12 changes: 11 additions & 1 deletion Runtime/Common/VoiceClip.cs
@@ -12,16 +12,26 @@ namespace ElevenLabs
public sealed class VoiceClip : GeneratedClip
{
[Preserve]
internal VoiceClip(string id, string text, Voice voice, AudioClip audioClip, string cachedPath)
internal VoiceClip(string id, string text, Voice voice, AudioClip audioClip, string cachedPath = null)
: base(id, text, audioClip, cachedPath)
{
this.voice = voice;
}

[Preserve]
internal VoiceClip(string id, string text, Voice voice, ReadOnlyMemory<byte> clipData, int sampleRate, string cachedPath = null)
: base(id, text, clipData, sampleRate, cachedPath)
{
this.voice = voice;
}

[SerializeField]
private Voice voice;

[Preserve]
public Voice Voice => voice;

[Preserve]
public TimestampedTranscriptCharacter[] TimestampedTranscriptCharacters { get; internal set; }
}
}
2 changes: 0 additions & 2 deletions Runtime/Dubbing/DubbingEndpoint.cs
@@ -7,11 +7,9 @@
using System.Diagnostics;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using UnityEngine;
using UnityEngine.Networking;
using Utilities.WebRequestRest;
using Debug = UnityEngine.Debug;

16 changes: 16 additions & 0 deletions Runtime/Extensions/Extensions.cs
@@ -0,0 +1,16 @@
// Licensed under the MIT License. See LICENSE in the project root for license information.

namespace ElevenLabs.Extensions
{
public static class Extensions
{
public static int GetSampleRate(this OutputFormat format) => format switch
{
OutputFormat.PCM_16000 => 16000,
OutputFormat.PCM_22050 => 22050,
OutputFormat.PCM_24000 => 24000,
OutputFormat.PCM_44100 => 44100,
_ => 44100
};
}
}
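`GetSampleRate` maps the `PCM_*` output formats to their sample rates, and its fallback arm returns 44100 for any other format. Given that rate, the length of a raw PCM payload follows from the byte count. The sketch below assumes 16-bit mono PCM (2 bytes per sample, matching the single channel passed to `AudioClip.Create`); `PcmMath` is a hypothetical helper.

```csharp
using System;

// Hypothetical helper; assumes 16-bit (2-byte) mono PCM samples.
public static class PcmMath
{
    private const int BytesPerSample = 2;

    // Number of mono samples contained in a raw PCM payload.
    public static int SampleCount(int byteCount) => byteCount / BytesPerSample;

    // Playback length in seconds, e.g. using OutputFormat.GetSampleRate().
    public static double DurationSeconds(int byteCount, int sampleRate)
    {
        if (sampleRate <= 0) throw new ArgumentOutOfRangeException(nameof(sampleRate));
        return SampleCount(byteCount) / (double)sampleRate;
    }
}
```

For example, a `PCM_44100` payload of 88,200 bytes holds 44,100 samples, which is one second of audio.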
11 changes: 11 additions & 0 deletions Runtime/Extensions/Extensions.cs.meta

