richiejp
Collaborator

@richiejp richiejp commented Aug 18, 2025

Description

Converts the Whisper backend to use Purego, similar to the stablediffusion backend. Also adds some features that are not in the upstream CGO bindings.

Notes for Reviewers

  • We could upstream the Purego bindings, but I'm not sure what that would look like, so I will just try it here first.
  • Initially I've added just a new VAD backend for testing; I will then convert the rest.

Signed commits

  • Yes, I signed my commits.

TODO:

  • fix VAD end time (speech segments are detected, but the RT API is not submitting them for transcription after a period of silence; possibly the time units on the segments are wrong)
  • convert rest of whisper backend to purego
  • fix transcription failed bug
  • use the transcription's built-in VAD mode


netlify bot commented Aug 18, 2025

Deploy Preview for localai ready!

Name Link
🔨 Latest commit f740b48
🔍 Latest deploy log https://app.netlify.com/projects/localai/deploys/68ad79dbf559d1000855c37e
😎 Deploy Preview https://deploy-preview-6087--localai.netlify.app

@richiejp richiejp force-pushed the chore/whisper-purego branch from 206a71d to 0345dfb on August 22, 2025 09:55
@richiejp
Collaborator Author

Ah, now I realise that the VAD model can be combined with the transcribe model. So we can just call transcribe, and it runs VAD first and short-circuits if no speech is detected. This changes a lot.
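
The control flow described here (VAD first, skip transcription when there is no speech) can be sketched roughly as follows. In the PR this happens inside the library's transcribe call; the sketch below only mirrors the idea in plain Go. All names (`detectFn`, `transcribeFn`, `transcribeWithVAD`, `energyDetect`) are hypothetical, and the energy threshold detector is a toy stand-in for a real VAD model.

```go
package main

import "fmt"

// detectFn and transcribeFn are hypothetical stand-ins for the
// VAD and transcription calls; they are not the backend's API.
type detectFn func(samples []float32) (bool, error)
type transcribeFn func(samples []float32) (string, error)

// transcribeWithVAD runs the detector first and returns an empty
// result without calling transcribe when no speech is found.
func transcribeWithVAD(samples []float32, detect detectFn, transcribe transcribeFn) (string, error) {
	speech, err := detect(samples)
	if err != nil {
		return "", fmt.Errorf("vad: %w", err)
	}
	if !speech {
		return "", nil // short circuit: nothing to transcribe
	}
	return transcribe(samples)
}

// energyDetect is a toy detector: it reports speech when the mean
// squared amplitude exceeds threshold.
func energyDetect(threshold float32) detectFn {
	return func(samples []float32) (bool, error) {
		if len(samples) == 0 {
			return false, nil
		}
		var sum float32
		for _, s := range samples {
			sum += s * s
		}
		return sum/float32(len(samples)) > threshold, nil
	}
}

func main() {
	calls := 0
	transcribe := func(samples []float32) (string, error) {
		calls++
		return "hello", nil
	}
	detect := energyDetect(0.01)

	// All-zero audio: transcribe is never invoked.
	text, _ := transcribeWithVAD(make([]float32, 16000), detect, transcribe)
	fmt.Printf("%q %d\n", text, calls) // "" 0

	// Audio with energy above the threshold: transcribe runs once.
	text, _ = transcribeWithVAD([]float32{0.2, -0.3, 0.25}, detect, transcribe)
	fmt.Printf("%q %d\n", text, calls) // "hello" 1
}
```

With the gate inside a single call, the caller no longer needs a separate VAD backend round trip before deciding whether to transcribe.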
