
Commit 4124f27

Add GCP Text-to-Speech and Cloud Run (#1)
* add gcp tts interface
* add TODOs to readme
* add optional TTS and adjust autoplay
* more updates
* add precommit and flake8
* parameterize more
* parameterize more
* formatting
* config update
* add dockerfile
* update readme
* docker updates
* more docker updates, now tested
* more docker updates, now tested
* add cloud run tested
1 parent 33fde8e commit 4124f27

8 files changed: +399 -35 lines changed

.flake8

+18
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,18 @@
1+
[flake8]
2+
# Some sane defaults for the code style checker flake8
3+
exclude =
4+
.tox
5+
build
6+
dist
7+
.eggs
8+
docs/conf.py
9+
max-line-length = 88
10+
ignore =
11+
# Whitespace before ':'
12+
E203
13+
# Whitespace at end of line
14+
W291
15+
# Line break before binary operator
16+
W503
17+
# Missing f-string placeholders
18+
F541

.gitignore

+8
@@ -127,3 +127,11 @@ dmypy.json
127127

128128
# Pyre type checker
129129
.pyre/
130+
131+
# Other
132+
**/*.jpg
133+
**/*.jpeg
134+
**/*.mp3
135+
**/*.wav
136+
**/*.json
137+
**/*.txt

.pre-commit-config.yaml

+31
@@ -0,0 +1,31 @@
1+
---
2+
fail_fast: true
3+
repos:
4+
- repo: https://github.com/pre-commit/pre-commit-hooks
5+
rev: v2.1.0
6+
hooks:
7+
- id: check-executables-have-shebangs
8+
- id: check-json
9+
- id: pretty-format-json
10+
args: ["--autofix"]
11+
- id: check-merge-conflict
12+
- id: debug-statements
13+
- id: detect-private-key
14+
- id: forbid-new-submodules
15+
- id: trailing-whitespace
16+
- id: requirements-txt-fixer
17+
- repo: https://github.com/adrienverge/yamllint
18+
rev: v1.14.0
19+
hooks:
20+
- id: yamllint
21+
args: ['-d {rules: {line-length: disable}}', '-s']
22+
- repo: https://github.com/ambv/black
23+
rev: 22.3.0
24+
hooks:
25+
- id: black
26+
language_version: python3
27+
- repo: https://github.com/pycqa/flake8
28+
rev: 3.8.3
29+
hooks:
30+
- id: flake8
31+
args: ["--config=.flake8"]

Dockerfile

+23
@@ -0,0 +1,23 @@
1+
# Use the official Python image as the parent image
2+
FROM python:3.8-slim-buster
3+
4+
# Set the working directory to /app
5+
WORKDIR /app
6+
7+
# Install FFmpeg and other system dependencies
8+
RUN apt-get update && apt-get install -y --no-install-recommends \
9+
ffmpeg \
10+
&& rm -rf /var/lib/apt/lists/*
11+
12+
# Install the required dependencies
13+
COPY requirements.txt ./
14+
RUN pip install --no-cache-dir -r requirements.txt
15+
16+
# Copy the necessary files to the Docker image
17+
COPY storyteller.py config.py ./
18+
19+
# Expose port 7860
20+
EXPOSE 7860
21+
22+
# Set the default command to execute the `storyteller.py` script
23+
CMD ["python", "storyteller.py"]

README.md

+95-9
@@ -2,21 +2,107 @@
22

33
---
44

5-
TODO: Document. Medium?
6-
Make note of pricing.
5+
This is a Gradio UI application that takes in a request for a story from the microphone
6+
and speaks an interactive Choose-Your-Own-Adventure style children's story. It leverages:
7+
8+
- [OpenAI Whisper](https://openai.com/research/whisper): to transcribe the user's spoken request
9+
- [OpenAI ChatGPT (3.5-turbo)](https://platform.openai.com/docs/models/gpt-3-5):
10+
to generate a story chapter given the user's inputs
11+
- (Optional) [Google Cloud Text-to-Speech](https://cloud.google.com/text-to-speech/):
12+
to use realistic voices when telling the story.
13+
14+
## Pricing
15+
16+
**WARNING: This application uses paid API services. Create quotas and watch your usage.**
17+
18+
At the time of writing, the pricing is as follows:
19+
20+
- [whisper](https://openai.com/pricing): $0.006 / minute (rounded to the nearest second)
21+
- [gpt-3.5-turbo](https://openai.com/pricing): $0.002 / 1K tokens
22+
- [Google Text-to-Speech](https://cloud.google.com/text-to-speech/pricing):
23+
- 0 to 1 million bytes free per month
24+
- $0.000016 USD per byte ($16.00 USD per 1 million bytes)
25+
26+
Check the links as these can change often. But at the time of writing it costs less
27+
than one USD for light use.
28+
29+
Both OpenAI and Google offer free credits for new users.
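
To make the pricing above concrete, here is a rough per-session cost estimate in Python. The rates are the ones listed above and may be out of date. The helper name, the example session sizes, and the assumption that the monthly free TTS tier is consumed before any byte is billed are illustrative and not part of the application.

```python
# Rates copied from the pricing list above (check the links; they can change).
WHISPER_USD_PER_MINUTE = 0.006
GPT35_USD_PER_1K_TOKENS = 0.002
TTS_USD_PER_BYTE = 0.000016
TTS_FREE_BYTES_PER_MONTH = 1_000_000


def estimate_cost_usd(audio_seconds, tokens, tts_bytes, tts_bytes_already_used=0):
    """Hypothetical helper: estimate the cost of one storytelling session."""
    # Whisper is billed per minute, rounded to the nearest second.
    whisper = WHISPER_USD_PER_MINUTE * round(audio_seconds) / 60
    chat = GPT35_USD_PER_1K_TOKENS * tokens / 1000
    # Assumption: the monthly free tier is used up before any byte is billed.
    free_left = max(TTS_FREE_BYTES_PER_MONTH - tts_bytes_already_used, 0)
    billable_bytes = max(tts_bytes - free_left, 0)
    tts = TTS_USD_PER_BYTE * billable_bytes
    return whisper + chat + tts


if __name__ == "__main__":
    # Made-up session: 30 s of speech, 3,000 tokens, 6,000 bytes of narration.
    print(f"${estimate_cost_usd(30, 3000, 6000):.4f}")
```

With these rates, a session stays well under one cent until the free TTS bytes for the month run out, which matches the "less than one USD for light use" estimate.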
730

831
## Setup
932

10-
1. Get an OpenAI API key.
33+
Note there are two ways to speak the story: Mac or GCP Text-to-Speech. If using a Mac,
34+
the Mac `say` command is used and that's the easiest/fastest route to running this.
35+
It uses the System voice set up in the Accessibility settings.
36+
However, if not on a Mac or if you prefer a more realistic voice, the GCP Text-to-Speech may be used.
37+
This requires (a) a GCP project, (b) the TTS API enabled, and (c) your account authenticated
38+
in gcloud (or GOOGLE_APPLICATION_CREDENTIALS environment variable set).
39+
40+
This application has only been tested on a MacBook.
41+
42+
1. Sign up at OpenAI and acquire an [OpenAI API key](https://platform.openai.com/account/api-keys).
1143
1. Add it as an environment variable with: `export OPENAI_API_KEY="sk-xxxxxxxxxxxxxxx"`
1244
1. Create virtual environment
13-
1. `pip install -r requirements.txt`
14-
1. Brew install `ffmpeg`: `brew install ffmpeg`
15-
1. Update config in `config.py` as desired
45+
1. Run `pip install -r requirements.txt`
46+
1. If on Mac, install `ffmpeg` with Homebrew: `brew install ffmpeg`
47+
48+
- Linux may also need `ffmpeg` installed, but this is untested.
49+
50+
1. Review and update config in `config.py` as desired
51+
1. If using GCP TTS
52+
1. set in `config.py`: `SPEECH_METHOD = SpeechMethod.GCP`
53+
1. Navigate to the [Google API page](https://console.cloud.google.com/apis/api/texttospeech.googleapis.com/) and enable the API
54+
1. Confirm you are authenticated in gcloud and your account has access to that API.
1655
1. Run with: `python storyteller.py`
1756
1. Navigate to `http://127.0.0.1:7860/` and have fun!
1857

19-
## TODO
58+
## Running as Docker Container
59+
60+
Replace `<image-name>` with a name of your choice.
61+
62+
1. Build Docker image: `docker build -t <image-name> .`
63+
1. Run locally with something similar to:
64+
65+
```
66+
docker run -it --rm \
67+
-e GOOGLE_APPLICATION_CREDENTIALS=/tmp/creds.json \
68+
-v ${HOME}/.config/gcloud/application_default_credentials.json:/tmp/creds.json \
69+
-e OPENAI_API_KEY=<openai-api-key> \
70+
-p <port>:7860 \
71+
<image-name> \
72+
python storyteller.py \
73+
--address=0.0.0.0 \
74+
--port=7860 \
75+
--user=<username> \
76+
--password=<password>
77+
```
78+
79+
Fill in `<openai-api-key>`, `<port>`, and the optional `<username>` and `<password>`.
80+
Once the container is running, open `127.0.0.1:<port>` in a browser and enter the
81+
optional username and password you provided.
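
The `docker run` example above passes `--address`, `--port`, `--user`, and `--password` to `storyteller.py`, which is not shown in this diff. As a minimal sketch of how such flags could be parsed and handed to Gradio, assuming the flag names from the command above and a placeholder interface (the helper names are hypothetical):

```python
import argparse


def parse_args(argv=None):
    """Parse flags mirroring the `docker run` example (names assumed from it)."""
    parser = argparse.ArgumentParser(description="Audio storyteller")
    parser.add_argument("--address", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=7860)
    parser.add_argument("--user", default=None)
    parser.add_argument("--password", default=None)
    return parser.parse_args(argv)


def launch_kwargs(args):
    """Translate parsed flags into keyword arguments for Gradio's launch()."""
    kwargs = {"server_name": args.address, "server_port": args.port}
    # Only enable the login prompt when both username and password are given.
    if args.user and args.password:
        kwargs["auth"] = (args.user, args.password)
    return kwargs


def main(argv=None):
    import gradio as gr  # imported lazily so the helpers work without Gradio

    args = parse_args(argv)
    # Placeholder UI; the real interface lives in storyteller.py.
    demo = gr.Interface(fn=lambda text: text, inputs="text", outputs="text")
    demo.launch(**launch_kwargs(args))
```

Binding to `0.0.0.0` rather than `127.0.0.1` is what makes the server reachable through the published container port.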
82+
83+
## Deploying to Google Cloud Run
84+
85+
1. Follow the directions above to create a local docker image.
86+
1. Tag and push (Note: Follow [these directions](https://cloud.google.com/container-registry/docs/advanced-authentication) to authenticate)
87+
```
88+
docker tag <image-name> gcr.io/<project-id>/<image-name>
89+
docker push gcr.io/<project-id>/<image-name>
90+
```
91+
1. Create a service account on your GCP project IAM page named: `audio-storytelling-bot@<project-id>.iam.gserviceaccount.com`
92+
1. Deploy with the following command, setting anything in `<>` appropriately:
93+
94+
```
95+
gcloud run deploy audio-storytelling-bot \
96+
--image gcr.io/<project-id>/<image-name> \
97+
--platform managed \
98+
--service-account=audio-storytelling-bot@<project-id>.iam.gserviceaccount.com \
99+
--set-env-vars=OPENAI_API_KEY=<openai-key-string> \
100+
--no-allow-unauthenticated \
101+
--port=7860 \
102+
--cpu=1 \
103+
--memory=512Mi \
104+
--min-instances=0 \
105+
--max-instances=1
106+
```
20107

21-
- [ ] Fix the audio thread error that pops up
22-
- [ ] Document
108+
Cloud Run scales the number of instances with incoming traffic, here between 0 and 1 as set by `--min-instances` and `--max-instances`. You can access the deployed Gradio application via the URL provided by the Cloud Run service.
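
Because the deploy command uses `--no-allow-unauthenticated`, requests to the service URL need an identity token from an account that is allowed to invoke the service. The following is a sketch of an authenticated request from Python, assuming the `gcloud` CLI is installed and logged in; the function names are hypothetical and the service URL is whatever Cloud Run reports after deployment.

```python
import subprocess
import urllib.request


def identity_token():
    """Ask the gcloud CLI for an identity token for the active account."""
    result = subprocess.run(
        ["gcloud", "auth", "print-identity-token"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def authenticated_request(service_url, token):
    """Build a request carrying the bearer token that Cloud Run checks."""
    return urllib.request.Request(
        service_url, headers={"Authorization": f"Bearer {token}"}
    )
```

A call would look like `urllib.request.urlopen(authenticated_request(url, identity_token()))`, where `url` is the deployed service URL.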

config.py

+38
@@ -1,9 +1,41 @@
1+
from enum import Enum
12
import time
23

4+
"""
5+
Speech method
6+
SpeechMethod.NONE: No speech
7+
SpeechMethod.GCP: Google Cloud Platform Text-to-Speech API
8+
SpeechMethod.MAC: macOS say command
9+
10+
Note: For GCP, you must be authenticated with the gcloud CLI or set the
11+
GOOGLE_APPLICATION_CREDENTIALS environment variable
12+
"""
13+
14+
15+
# Define the class enum
16+
class SpeechMethod(Enum):
17+
NONE = 1
18+
GCP = 2
19+
MAC = 3
20+
21+
22+
# Set the method here
23+
SPEECH_METHOD = SpeechMethod.GCP
24+
25+
26+
"""
27+
Other configuration
28+
"""
329
RESOLUTION = "512x512" # One of 256x256, 512x512, 1024x1024
430
PROMPT_MAX_LEN = 1000 # Max length of prompt for DALL-E
531
IMAGE_PATH = "generated_image.jpg" # path to save generated image
632
TRANSCRIPT_PATH = f"transcript-{int(time.time())}.txt"
33+
GENERATED_SPEECH_PATH = "generated_speech.mp3"
34+
TTS_SPEECH_DELAY = 5.0 # seconds to wait before playing generated speech
35+
36+
# Voice for GCP Text-to-Speech API
37+
# Samples: https://cloud.google.com/text-to-speech/docs/voices
38+
TTS_VOICE = "en-GB-Neural2-C"
739

840
"""
941
Example Prompts
@@ -33,3 +65,9 @@
3365
of each chapter first pause for a moment. Then ask the reader a single question
3466
that chooses the path for their next chapter in their story.
3567
"""
68+
69+
"""
70+
DERIVED CONFIG
71+
"""
72+
# Derive the language code (the leading xx-XX part, e.g. en-GB) from TTS_VOICE
73+
TTS_VOICE_LANGUAGE_CODE = "-".join(TTS_VOICE.split("-")[0:2])
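
The speech code that consumes this config lives in `storyteller.py` and is not shown here. As a sketch of how `SPEECH_METHOD`, `TTS_VOICE`, and the derived language code could fit together, assuming the `google-cloud-texttospeech` client library and valid GCP credentials (the helper names `language_code`, `synthesize_gcp`, and `speak` are hypothetical):

```python
from enum import Enum


class SpeechMethod(Enum):
    NONE = 1
    GCP = 2
    MAC = 3


def language_code(voice_name):
    """Keep the first two dash-separated parts of a voice name (xx-XX)."""
    return "-".join(voice_name.split("-")[0:2])


def synthesize_gcp(text, voice_name, out_path):
    """Write MP3 speech for `text` using Google Cloud Text-to-Speech."""
    from google.cloud import texttospeech  # lazy: needs the GCP client library

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(
            language_code=language_code(voice_name), name=voice_name
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(out_path, "wb") as out:
        out.write(response.audio_content)


def speak(text, method, voice_name="en-GB-Neural2-C", out_path="generated_speech.mp3"):
    """Dispatch on the configured speech method (hypothetical helper)."""
    if method is SpeechMethod.MAC:
        import subprocess

        # macOS only: speak with the system voice.
        subprocess.run(["say", text], check=True)
    elif method is SpeechMethod.GCP:
        synthesize_gcp(text, voice_name, out_path)
    # SpeechMethod.NONE: stay silent.
```

The voice name and language code must agree, which is why the code is derived from `TTS_VOICE` instead of being configured separately.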

requirements.txt

+5-1
@@ -1,2 +1,6 @@
1+
black
2+
flake8
3+
google-cloud-texttospeech
4+
gradio
15
openai
2-
gradio
6+
pre-commit
