Questions tagged [speech-to-text]
The speech-to-text tag has no summary.
40 questions
4 votes
2 answers
82 views
Validation loss zigzagging
I'm training a speech recognition model using the NVIDIA NeMo framework. Results with just the small FastConformer model and two dozen iterations are pretty good; for my data I would say they are ...
1 vote
1 answer
2k views
How do OpenAI Whisper's medium.en, large and whisper-large-v2 compare in terms of word error rate?
I want to use OpenAI's Whisper to transcribe some speech files in English. I only care about minimizing the word error rate. How do medium.en, ...
0 votes
1 answer
158 views
Are there any pre-trained non-English models for DeepSpeech?
I want to try the DeepSpeech model. I found only an English pre-trained model. Are there any other pre-trained non-English models of ...
1 vote
0 answers
38 views
Procedure or term for analyzing transcribed text and returning bulleted output
I am attempting to analyze transcribed text from an audio file to group bullet points based on known key phrases in the text. Example: I have verbally stated the following keywords in the text, which ...
1 vote
1 answer
140 views
Evaluating text-to-speech without human involvement?
I've explored text-to-speech evaluation metrics, and they seem to use Mean Opinion Score (MOS) to evaluate a particular model. This metric requires humans to judge the model based on a scale ...
1 vote
0 answers
60 views
How do I initialize a Hidden Markov Model when using MFCC features for speech recognition?
I have a personal dataset of 10000 audio files, each consisting of a single spoken sentence. Each file comes with a transcribed text label that I can use for supervised HMM training. Now ...
2 votes
2 answers
414 views
How to evaluate the quality of speech-to-text data without access to the true labels?
I am dealing with a data set of transcribed call center data, where customers are recorded while interacting with an agent. The recordings are then automatically transcribed by an external transcription ...
2 votes
1 answer
174 views
How is an ASR's output compared to ground truth for validation?
I am curious how it is done, as I am interested in doing something similar. I have some manually transcribed data that contains tags for multiple speakers. I want to compare how well the out-of-the-box ...