Newest 'speech-to-text' Questions

4 votes

2 answers

82 views

Validation loss zigzagging

I'm training a speech recognition model using the NVIDIA NeMo framework. Results with just the small FastConformer model and two dozen iterations are pretty good; for my data I would say they are ...

comodoro

143

asked Feb 8 at 11:58

1 vote

1 answer

2k views

How do OpenAI Whisper's medium.en, large and whisper-large-v2 compare in terms of word error rate?

I want to use OpenAI's Whisper to transcribe some speech files in English. I only care about minimizing the word error rate. How do medium.en, ...

Franck Dernoncourt

5,882

asked Oct 8, 2023 at 21:03

0 votes

1 answer

158 views

Are there any pre-trained non-English models for DeepSpeech?

I want to try the DeepSpeech model. I found only an English pre-trained model. Are there any other pre-trained non-English models of ...

user3668129

829

asked Mar 4, 2023 at 7:19

1 vote

0 answers

38 views

Procedure or term for analyzing transcribed text and returning bulleted output

I am attempting to analyze transcribed text from an audio file to group bullet points based on known key phrases in the text. Example: I have verbally stated the following keywords in the text, which ...

Ryan Watts

111

asked Aug 3, 2022 at 6:44

1 vote

1 answer

140 views

Evaluate text-to-speech without human involvement?

I've explored text-to-speech evaluation metrics and they seem to use Mean Opinion Score (MOS) to evaluate a particular model. This metric requires humans to help judge the model based on a scale ...

Nontawat Wutticome

35

asked Oct 4, 2021 at 5:04

1 vote

0 answers

60 views

How do I initialize a Hidden Markov Model when using MFCC features for speech recognition?

I have a personal dataset of 10000 audio files, each consisting of a single spoken sentence. These files each have the transcribed text labels with them that I can use for supervised HMM training. Now ...

Zander

11

asked May 2, 2021 at 1:55

2 votes

2 answers

414 views

How to evaluate the quality of speech-to-text data without access to the true labels?

I am dealing with a data set of transcribed call center data, where customers are being recorded when interacting with the agent. This is then automatically transcribed by an external transcription ...

miri_h_ds

21

asked Jan 24, 2021 at 1:12

2 votes

1 answer

174 views

How is an ASR's output compared to ground truth for validation?

I am curious how it is done as I am interested in doing something similar. I have some manually transcribed data that contains tags for multiple speakers. I want to compare how well the out of the box ...

Samarth

359

asked Oct 20, 2020 at 22:16
