|
2 | 2 |
|
3 | 3 | ---
|
4 | 4 |
|
5 |
| -TODO: Document. Medium? |
6 |
| -Make note of pricing. |
| 5 | +This is a Gradio UI application that takes in a request for a story from the microphone |
| 6 | +and speaks an interactive Choose-Your-Own-Adventure style children's story. It leverages: |
| 7 | + |
| 8 | +- [OpenAI Whisper](https://openai.com/research/whisper): to transcribe the user's spoken request |
| 9 | +- [OpenAI ChatGPT (3.5-turbo)](https://platform.openai.com/docs/models/gpt-3-5): |
| 10 | + to generate a story chapter given the user's inputs |
| 11 | +- (Optional) [Google Cloud Text-to-Speech](https://cloud.google.com/text-to-speech/): |
| 12 | + to use realistic voices when telling the story. |
| 13 | + |
| 14 | +## Pricing |
| 15 | + |
| 16 | +**WARNING: This application uses paid API services. Create quotas and watch your usage.** |
| 17 | + |
| 18 | +At the time of writing, the pricing is as follows: |
| 19 | + |
| 20 | +- [whisper](https://openai.com/pricing): $0.006 / minute (rounded to the nearest second) |
| 21 | +- [gpt-3.5-turbo](https://openai.com/pricing): $0.002 / 1K tokens |
| 22 | +- [Google Text-to-Speech](https://cloud.google.com/text-to-speech/pricing): |
| 23 | + - 0 to 1 million bytes free per month |
| 24 | + - $0.000016 USD per byte ($16.00 USD per 1 million bytes) |
| 25 | + |
| 26 | +Check the links, as these prices can change often. At the time of writing, light use costs less |
| 27 | +than one USD. |
| 28 | + |
| 29 | +Both OpenAI and Google offer free credits for new users. |
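To make the arithmetic concrete, here is a minimal sketch that estimates the cost of one session from the prices listed above. The function name, its parameters, and the example usage figures are illustrative assumptions, not part of this application, and the prices are only as current as the list above.

```python
# Prices copied from the list above at the time of writing; check the linked
# pricing pages before relying on them.
WHISPER_USD_PER_MINUTE = 0.006
GPT35_USD_PER_1K_TOKENS = 0.002
TTS_FREE_BYTES_PER_MONTH = 1_000_000
TTS_USD_PER_BYTE = 0.000016


def estimate_cost_usd(audio_seconds, gpt_tokens, tts_bytes, tts_bytes_already_used=0):
    """Hypothetical helper: rough cost of one session, in USD."""
    whisper = WHISPER_USD_PER_MINUTE * audio_seconds / 60
    gpt = GPT35_USD_PER_1K_TOKENS * gpt_tokens / 1000
    # Only bytes beyond what is left of the monthly free tier are billed.
    free_left = max(0, TTS_FREE_BYTES_PER_MONTH - tts_bytes_already_used)
    billable_bytes = max(0, tts_bytes - free_left)
    tts = TTS_USD_PER_BYTE * billable_bytes
    return {"whisper": whisper, "gpt": gpt, "tts": tts, "total": whisper + gpt + tts}


# Assumed example: one minute of speech, 10K tokens, 20 KB of synthesized text.
costs = estimate_cost_usd(audio_seconds=60, gpt_tokens=10_000, tts_bytes=20_000)
print(f"Estimated total: ${costs['total']:.3f}")
```

With these assumed inputs the total comes to about $0.026, and the TTS portion is zero because it stays within the free tier.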
7 | 30 |
|
8 | 31 | ## Setup
|
9 | 32 |
|
10 |
| -1. Get an OpenAI API key. |
| 33 | +Note there are two ways to speak the story: the Mac `say` command or GCP Text-to-Speech. On a Mac, |
| 34 | +the `say` command is used, which is the easiest and fastest route to running this. |
| 35 | +It uses the System voice set up in the Accessibility settings. |
| 36 | +If you are not on a Mac, or if you prefer a more realistic voice, GCP Text-to-Speech may be used instead. |
| 37 | +This requires (a) a GCP project, (b) the TTS API enabled, and (c) your account authenticated |
| 38 | +in gcloud (or the `GOOGLE_APPLICATION_CREDENTIALS` environment variable set). |
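The choice between the two speech routes can be sketched as follows. This is only an illustration: `SpeechMethod.GCP` appears in `config.py`, but the `MAC` member, the helper names, and the platform-based default are assumptions rather than the application's actual code.

```python
import enum
import platform
import subprocess


class SpeechMethod(enum.Enum):
    MAC = "mac"  # assumed member name; only GCP is confirmed by this README
    GCP = "gcp"


def default_speech_method(system=None):
    """Hypothetical default: use `say` on macOS, GCP Text-to-Speech elsewhere."""
    system = system or platform.system()
    return SpeechMethod.MAC if system == "Darwin" else SpeechMethod.GCP


def build_say_command(text):
    """Build the argument list for the macOS `say` command."""
    return ["say", text]


def speak_on_mac(text):
    """Speak text with the System voice; only works on macOS."""
    subprocess.run(build_say_command(text), check=True)


print(default_speech_method())
```

Passing the command as a list avoids shell quoting issues when the story text contains quotes or other special characters.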
| 39 | + |
| 40 | +This application has only been tested on a MacBook. |
| 41 | + |
| 42 | +1. Sign up at OpenAI and acquire an [OpenAI API key](https://platform.openai.com/account/api-keys). |
11 | 43 | 1. Add it to your environment with: `export OPENAI_API_KEY="sk-xxxxxxxxxxxxxxx"`
|
12 | 44 | 1. Create virtual environment
|
13 |
| -1. `pip install -r requirements.txt` |
14 |
| -1. Brew install `ffmpeg`: `brew install ffmpeg` |
15 |
| -1. Update config in `config.py` as desired |
| 45 | +1. Run `pip install -r requirements.txt` |
| 46 | +1. If on Mac, install `ffmpeg` with Homebrew: `brew install ffmpeg` |
| 47 | + |
| 48 | +- Linux users may also need to install `ffmpeg`, but this is untested. |
| 49 | + |
| 50 | +1. Review and update config in `config.py` as desired |
| 51 | +1. If using GCP TTS: |
| 52 | +   1. Set in `config.py`: `SPEECH_METHOD = SpeechMethod.GCP` |
| 53 | +   1. Navigate to the [Google API page](https://console.cloud.google.com/apis/api/texttospeech.googleapis.com/) and enable the API. |
| 54 | +   1. Confirm you are authenticated in gcloud and your account has access to that API. |
16 | 55 | 1. Run with: `python storyteller.py`
|
17 | 56 | 1. Navigate to `http://127.0.0.1:7860/` and have fun!
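Before running the application, the two prerequisites from the steps above (the API key and `ffmpeg`) can be checked with a short script. This is a minimal sketch; the function name and its injectable parameters are hypothetical and not part of the repository.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Hypothetical preflight check: return a list of setup problems found."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("Missing:", problem)
```

An empty list means both checks passed. It does not verify that the key is valid or that GCP credentials are configured.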
|
18 | 57 |
|
19 |
| -## TODO |
| 58 | +## Running as Docker Container |
| 59 | + |
| 60 | +Replace `<image-name>` with a name of your choice. |
| 61 | + |
| 62 | +1. Build Docker image: `docker build -t <image-name> .` |
| 63 | +1. Run locally with something similar to: |
| 64 | + |
| 65 | +``` |
| 66 | +docker run -it --rm \ |
| 67 | + -e GOOGLE_APPLICATION_CREDENTIALS=/tmp/creds.json \ |
| 68 | + -v ${HOME}/.config/gcloud/application_default_credentials.json:/tmp/creds.json \ |
| 69 | + -e OPENAI_API_KEY=<openai-api-key> \ |
| 70 | + -p <port>:7860 \ |
| 71 | + <image-name> \ |
| 72 | + python storyteller.py \ |
| 73 | + --address=0.0.0.0 \ |
| 74 | + --port=7860 \ |
| 75 | + --user=<username> \ |
| 76 | + --password=<password> |
| 77 | +``` |
| 78 | + |
| 79 | +Fill in `<openai-api-key>`, `<port>`, and the optional `<username>` and `<password>`. |
| 80 | +Once it is running, navigate in a browser to `127.0.0.1:<port>` and enter the |
| 81 | +username and password you provided, if any. |
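The flags passed to `storyteller.py` in the command above could be handled roughly as sketched below. This is an assumption about how such flags might map onto Gradio, not the application's actual code: the helper names and defaults are hypothetical, while `server_name`, `server_port`, and `auth` are parameters of Gradio's `launch()`.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags shown in the docker run command (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Audio storyteller launcher sketch")
    parser.add_argument("--address", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=7860)
    parser.add_argument("--user", default=None)
    parser.add_argument("--password", default=None)
    return parser.parse_args(argv)


def launch_kwargs(args):
    """Map parsed flags onto keyword arguments for Gradio's launch()."""
    # Only enable login when both a username and a password were supplied.
    auth = (args.user, args.password) if args.user and args.password else None
    return {"server_name": args.address, "server_port": args.port, "auth": auth}


example = parse_args(["--address=0.0.0.0", "--port=7860", "--user=alice", "--password=secret"])
print(launch_kwargs(example))
```

Binding to `0.0.0.0` inside the container is what lets the `-p <port>:7860` mapping reach the server from the host.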
| 82 | + |
| 83 | +## Deploying to Google Cloud Run |
| 84 | + |
| 85 | +1. Follow the directions above to create a local docker image. |
| 86 | +1. Tag and push (Note: Follow [these directions](https://cloud.google.com/container-registry/docs/advanced-authentication) to authenticate) |
| 87 | + ``` |
| 88 | + docker tag <image-name> gcr.io/<project-id>/<image-name> |
| 89 | + docker push gcr.io/<project-id>/<image-name> |
| 90 | + ``` |
| 91 | +1. Create a service account on your GCP project IAM page named: `audio-storytelling-bot@<project-id>.iam.gserviceaccount.com` |
| 92 | +1. Deploy with the following command, setting anything in `<>` appropriately: |
| 93 | + |
| 94 | + ``` |
| 95 | + gcloud run deploy audio-storytelling-bot \ |
| 96 | + --image gcr.io/<project-id>/<image-name> \ |
| 97 | + --platform managed \ |
| 98 | + --service-account=audio-storytelling-bot@<project-id>.iam.gserviceaccount.com \ |
| 99 | + --set-env-vars=OPENAI_API_KEY=<openai-key-string> \ |
| 100 | + --no-allow-unauthenticated \ |
| 101 | + --port=7860 \ |
| 102 | + --cpu=1 \ |
| 103 | + --memory=512Mi \ |
| 104 | + --min-instances=0 \ |
| 105 | + --max-instances=1 |
| 106 | + ``` |
20 | 107 |
|
21 |
| -- [ ] Fix the audio thread error that pops up |
22 |
| -- [ ] Document |
| 108 | +With the settings above, Cloud Run scales between zero and one instance based on incoming traffic. You can access the deployed Gradio application via the URL provided by the Cloud Run service. |