Releases: KoljaB/RealtimeSTT
v0.2.2
- new parameter silero_deactivity_detection (bool, default=False)
Enables the Silero model for end-of-speech detection. More robust against background noise. Utilizes additional GPU resources but improves accuracy in noisy environments. When False, uses the default WebRTC VAD, which is more sensitive and may continue recording longer due to background sounds.
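As a minimal sketch (assuming RealtimeSTT is installed; the constructor and parameter name are as documented above, the usage pattern follows the library's standard recorder workflow), enabling Silero-based end-of-speech detection looks like:

```python
# Sketch: enable Silero-based end-of-speech detection (v0.2.2 parameter).
# Requires: pip install RealtimeSTT
from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    silero_deactivity_detection=True,  # Silero VAD: more robust against background noise
)
print(recorder.text())  # records until end of speech, then returns the transcription
```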
v0.2.1
- implements #85 (on Linux, a CUDA initialization error was caused by models being loaded multiple times through the PyTorch multiprocessing library, while the standard threading.Thread() works fine; this commit consolidates thread creation to use one mechanism or the other and defaults to threading.Thread() on Linux; shoutout to Daniel Williams for providing this patch)
- upgrades to faster_whisper==1.0.3
- removed the "match" statement because it is only available from Python 3.10 onwards
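As an illustration of the last point (the function name and platform mapping below are hypothetical, not taken from the codebase), a Python 3.10 match statement can be rewritten as an if/elif chain so the package also runs on older interpreters:

```python
# Hypothetical example: replacing a Python 3.10+ "match" statement
# with an equivalent if/elif chain that also runs on Python 3.8/3.9.
def thread_backend(system: str) -> str:
    # Pre-3.10 equivalent of:
    # match system:
    #     case "Linux": return "threading"
    #     case _:       return "multiprocessing"
    if system == "Linux":
        return "threading"        # plain threading.Thread avoids the CUDA init error (#85)
    else:
        return "multiprocessing"

print(thread_backend("Linux"))
```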
v0.2.0 with OpenWakeWord Support
Training models
Look here for information about how to train your own OpenWakeWord models. You can use a simple Google Colab notebook for a start or use a more detailed notebook that enables more customization (can produce high quality models, but requires more development experience).
Convert model to ONNX format
You might need to use tf2onnx to convert TensorFlow Lite (.tflite) models to ONNX format:
pip install -U tf2onnx
python -m tf2onnx.convert --tflite my_model_filename.tflite --output my_model_filename.onnx
Configure RealtimeSTT
Suggested starting parameters for OpenWakeWord usage:
from RealtimeSTT import AudioToTextRecorder

with AudioToTextRecorder(
    wakeword_backend="oww",
    wake_words_sensitivity=0.35,
    openwakeword_model_paths="word1.onnx,word2.onnx",
    wake_word_buffer_duration=1,
) as recorder:
    print(recorder.text())
OpenWakeWord Test
Set up the openwakeword test project:
mkdir samantha_wake_word && cd samantha_wake_word
curl -O https://raw.githubusercontent.com/KoljaB/RealtimeSTT/master/tests/openwakeword_test.py
curl -L https://huggingface.co/KoljaB/SamanthaOpenwakeword/resolve/main/suh_mahn_thuh.onnx -o suh_mahn_thuh.onnx
curl -L https://huggingface.co/KoljaB/SamanthaOpenwakeword/resolve/main/suh_man_tuh.onnx -o suh_man_tuh.onnx
Ensure you have curl installed for downloading files. If not, you can manually download the files from the provided URLs.
Create and activate a virtual environment:
python -m venv venv
- For Windows:
venv\Scripts\activate
- For Unix-like systems (Linux/macOS):
source venv/bin/activate
- For macOS: use python3 instead of python and pip3 instead of pip if needed.
Install dependencies:
python -m pip install --upgrade pip
python -m pip install RealtimeSTT
python -m pip install -U torch torchaudio --index-url https://download.pytorch.org/whl/cu121
The PyTorch installation command includes CUDA 12.1 support. Adjust if a different version is required.
Run the test script:
python openwakeword_test.py
On the first start, some models for openwakeword are downloaded.
v0.1.16
v0.1.15
- added parameter beam_size (int, default=5)
The beam size to use for beam search decoding.
- added parameter beam_size_realtime (int, default=3)
The beam size to use for real-time transcription beam search decoding.
- added parameter initial_prompt (str or iterable of int, default=None)
Initial prompt to be fed to the transcription models.
- added parameter suppress_tokens (list of int, default=[-1])
Tokens to be suppressed from the transcription output.
- added method set_microphone(microphone_on=True)
Allows dynamic switching between recording from the input device configured in RealtimeSTT and audio chunks injected into the processing pipeline with the feed_audio() method.
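A sketch of how the new parameters and set_microphone() might be combined with feed_audio() (assuming RealtimeSTT is installed; the file name and the raw 16 kHz, 16-bit mono chunk format are assumptions, not taken from the release notes):

```python
# Sketch: configure the v0.1.15 parameters, then inject audio instead of
# recording from the microphone. File name and PCM format are hypothetical.
from RealtimeSTT import AudioToTextRecorder

recorder = AudioToTextRecorder(
    beam_size=5,                      # beam search width for full transcription
    beam_size_realtime=3,             # smaller beam for faster real-time decoding
    initial_prompt="Meeting notes:",  # bias the model toward this context
    suppress_tokens=[-1],             # default: suppress special tokens
)

recorder.set_microphone(False)        # stop reading from the configured input device
with open("speech.raw", "rb") as f:   # raw PCM chunks (assumed 16 kHz, 16-bit mono)
    while chunk := f.read(4096):
        recorder.feed_audio(chunk)    # inject chunks into the processing pipeline
print(recorder.text())
```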
v0.1.13
- added beam_size: int = 5 and beam_size_realtime: int = 3 parameters to the AudioToTextRecorder constructor, allowing faster (real-time) transcriptions by lowering the beam sizes
- added last_transcription_bytes, containing the raw bytes from the last transcription
You can retrieve those bytes with recorder.last_transcription_bytes for further analysis, saving to a file, etc.
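For example, the raw bytes could be written out as a WAV file for later inspection. The snippet below uses dummy PCM data as a stand-in for recorder.last_transcription_bytes, and the 16 kHz mono 16-bit format is an assumption about the library's internal audio format:

```python
# Sketch: save raw transcription bytes to a WAV file for later analysis.
# In real use, replace `audio_bytes` with recorder.last_transcription_bytes.
import wave

audio_bytes = b"\x00\x00" * 16000  # 1 second of silence as stand-in PCM data

with wave.open("last_transcription.wav", "wb") as wav:
    wav.setnchannels(1)        # mono
    wav.setsampwidth(2)        # 16-bit samples
    wav.setframerate(16000)    # assumed sample rate
    wav.writeframes(audio_bytes)
```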
v0.1.12
- fixed qsize issue for macOS
- upgraded requirements to torch 2.2.2