Hello,
I'm currently working on transcribing group discussion data for a research project. For our use case, fillers like "uhm" and "hmm" as well as stutters are quite important. I previously worked with whisperx and got quite good results using prompting. However, because of hardware constraints I switched to mlx-whisper to get that sweet GPU speed.
I started with the large-v3-turbo German model from the mlx-community, but the results were quite underwhelming: the transcription quality was poor and no amount of prompting produced any "uhm". Then I found Crisper Whisper, which was trained for exactly my purposes. The conversion worked like a charm and the transcript is much better, but there are still no "uhm"s. Oddly, while the resulting words are correct, they have stray spaces in between (like "bir thday" or "h ighlight"). The model directory also contains tokenizer JSON files, so I thought that might be the problem? But the tokenization in mlx-whisper.transcribe works with tiktoken, and I have no idea how to implement that the Crisper Whisper way.
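To check the tokenizer suspicion, the token strings in the two `tokenizer.json` files could be compared ID by ID. The following is only a minimal sketch. It assumes both files follow the Hugging Face `tokenizer.json` layout (a `model.vocab` map plus an `added_tokens` list). The helper names `load_hf_vocab` and `diff_vocabs` are hypothetical and not part of mlx-whisper or tiktoken.

```python
import json
from pathlib import Path


def load_hf_vocab(tokenizer_json_path):
    """Return {token_id: token_string} from a Hugging Face style tokenizer.json.

    Assumption: the file has a "model" -> "vocab" map (token -> id) and an
    optional "added_tokens" list with "id" and "content" fields.
    """
    data = json.loads(Path(tokenizer_json_path).read_text(encoding="utf-8"))
    id_to_token = {idx: tok for tok, idx in data["model"]["vocab"].items()}
    for added in data.get("added_tokens", []):
        id_to_token[added["id"]] = added["content"]
    return id_to_token


def diff_vocabs(reference, candidate):
    """List (id, reference_token, candidate_token) wherever the two maps disagree."""
    ids = sorted(set(reference) | set(candidate))
    return [
        (i, reference.get(i), candidate.get(i))
        for i in ids
        if reference.get(i) != candidate.get(i)
    ]
```

Calling `diff_vocabs(load_hf_vocab(a), load_hf_vocab(b))` with the stock Whisper tokenizer file as `a` and the Crisper Whisper one as `b` (both paths are placeholders) would list every ID whose string differs. If many entries differ only in a leading-space marker, decoding the model's output with the stock vocabulary could plausibly produce misplaced spaces like the ones described, though that remains a guess until the diff is inspected.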
Any advice would be much appreciated!