
Questions tagged [speech-processing]

Speech processing is the study of speech signals and the processing methods of these signals.

1 vote
0 answers
37 views

I have been trying to implement the complex cepstrum from scratch as part of a speech signal processing course. The hidden catch is the removal of the linear phase term, which my professor ...
Nishchala Mukku ee24s004
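
Below is a minimal NumPy sketch of one common way to remove the linear phase. It is not necessarily the method the course expects. The steps are: unwrap the FFT phase, read the integer linear-phase term off the unwrapped phase at ω = π (for a real signal X(π) is real, so the phase there is a multiple of π), subtract it, then inverse-transform the complex log spectrum. The function name `complex_cepstrum`, the zero-padding length, and the requirement that `n_fft` be even are assumptions.

```python
import numpy as np

def complex_cepstrum(x, n_fft=1024):
    """Complex cepstrum with linear-phase removal (n_fft assumed even).

    Returns (xhat, delay): delay is the integer linear-phase term, in
    samples, that was removed before the inverse FFT.
    """
    x = np.asarray(x, dtype=float)
    # Zero-pad to limit time aliasing of the (infinitely long) cepstrum.
    X = np.fft.fft(x, n=n_fft)
    # Continuous (unwrapped) phase for omega_k = 2*pi*k/n_fft.
    phase = np.unwrap(np.angle(X))
    half = n_fft // 2  # bin index of omega = pi
    # X(pi) is real for a real x, so the unwrapped phase there is r*pi.
    r = int(np.round(phase[half] / np.pi))
    # Subtract the linear term r*pi*k/half, which equals r*omega_k.
    k = np.arange(n_fft)
    phase = phase - r * np.pi * k / half
    log_X = np.log(np.abs(X)) + 1j * phase
    xhat = np.real(np.fft.ifft(log_X))
    return xhat, -r

# A delayed copy of a signal gets the same cepstrum plus a reported delay.
c1, d1 = complex_cepstrum([1.0, 0.5], 512)
c2, d2 = complex_cepstrum([0.0, 0.0, 0.0, 1.0, 0.5], 512)
```

Without the phase correction, the delay's −dω term would dominate the cepstrum. The recipe still fails if X has zeros exactly on the unit circle, because log|X| is undefined there.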
0 votes
1 answer
117 views

Consider a first-order high-pass filter like this: $$y[n] = x[n] - \alpha x[n-1]$$ I found that Praat's manual gives the relationship between its cutoff frequency $f_c$ and $\alpha$ as: $$\alpha=...
C.K.
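
Whatever form the manual's formula takes, you can check the filter's behaviour directly from its frequency response. With $H(z) = 1 - \alpha z^{-1}$, $|H(e^{j\omega})|^2 = 1 - 2\alpha\cos\omega + \alpha^2$, so the gain rises from $1-\alpha$ at DC to $1+\alpha$ at Nyquist. The NumPy sketch below evaluates this in dB. The function name, sample rate and $\alpha$ in the example are illustrative assumptions.

```python
import numpy as np

def highpass_magnitude_db(alpha, freqs_hz, fs):
    """Magnitude response in dB of y[n] = x[n] - alpha * x[n-1]."""
    omega = 2 * np.pi * np.asarray(freqs_hz, dtype=float) / fs
    # |1 - alpha e^{-j omega}|^2 = 1 - 2 alpha cos(omega) + alpha^2
    mag_sq = 1 - 2 * alpha * np.cos(omega) + alpha ** 2
    return 10 * np.log10(mag_sq)

# Example: alpha = 0.9 at fs = 16 kHz, at DC, 1 kHz and Nyquist.
print(highpass_magnitude_db(0.9, [0, 1000, 8000], 16000))
```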
0 votes
0 answers
36 views

I am implementing an HMM-GMM speech recognition model and am stuck on the following problem. Given phone-level HMMs A and B, build word-level HMM C. In this question, let's assume that ...
ASR
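
Here is one common construction, sketched under explicit assumptions. Each phone HMM is given as three parts: initial probabilities over its emitting states, a transition matrix among those states, and a per-state exit probability, so that each transition row plus its exit probability sums to 1. Concatenating A then B routes A's exit mass into B's initial distribution. This representation, the function name `concat_hmms`, and the absence of a transition that skips a whole phone are assumptions, not necessarily the setup in the question.

```python
import numpy as np

def concat_hmms(pi_a, T_a, exit_a, pi_b, T_b, exit_b):
    """Concatenate HMM A followed by HMM B into one word-level HMM.

    pi_*   : initial probabilities over emitting states, shape (n,)
    T_*    : transitions between emitting states, shape (n, n)
    exit_* : probability of leaving the model from each state, shape (n,)
    Assumes T[i].sum() + exit[i] == 1 for every state i.
    """
    na, nb = len(pi_a), len(pi_b)
    T = np.zeros((na + nb, na + nb))
    T[:na, :na] = T_a
    T[na:, na:] = T_b
    # Leaving A from state i means entering B's states according to pi_b.
    T[:na, na:] = np.outer(exit_a, pi_b)
    pi = np.concatenate([pi_a, np.zeros(nb)])        # C always starts in A
    exit_c = np.concatenate([np.zeros(na), exit_b])  # C can only exit from B
    return pi, T, exit_c
```

The emission models (GMMs) of C are A's followed by B's, in the same state order as the rows of `T`.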
0 votes
0 answers
33 views

Using Praat to extract the bandwidth of formants, I noticed there is no option to extract the bandwidth of the pitch. Since the F0 values do not match the pitch values, I cannot apply the same method ...
איתי עשהאל
0 votes
2 answers
74 views

I am reading Chapter 28 (Phonetics) of Jurafsky and Martin's Speech and Language Processing, pages 15 and 16, where they introduce waveforms and spectra. What I don't understand is how they came from a ...
heretoinfinity
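
The usual way to go from a waveform to its spectrum is the discrete Fourier transform. Here is a minimal NumPy sketch using a synthetic waveform made of two sine waves; the frequencies, amplitudes and 1 kHz sample rate are illustrative choices, not the book's figures. The amplitude spectrum shows one peak per component sinusoid, with height equal to that component's amplitude.

```python
import numpy as np

fs = 1000                      # sample rate in Hz (illustrative)
t = np.arange(fs) / fs         # 1 second of samples
# Waveform: sum of two sine waves with different frequencies and amplitudes.
x = 1.0 * np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 100 * t)

# One-sided amplitude spectrum via the real FFT.
X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
amplitude = 2 * np.abs(X) / len(x)

# The two largest peaks sit at the component frequencies.
peaks = np.sort(freqs[np.argsort(amplitude)[-2:]])
print(peaks)
```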
0 votes
1 answer
414 views

I have performed Voice Activity Detection (VAD) on my data, and it returns a file whose columns are, in order: Segment Id, Audio file name, Start time, End time. For ...
zero_day
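
This small sketch reads such a segment file and turns each row into sample indices. It assumes whitespace-separated columns in the order given, times in seconds, and file names without spaces. The delimiter, the units, and the names `Segment`, `parse_segments` and `to_sample_range` are all hypothetical.

```python
from typing import List, NamedTuple, Tuple

class Segment(NamedTuple):
    seg_id: str
    audio_file: str
    start: float  # seconds (assumed unit)
    end: float    # seconds (assumed unit)

def parse_segments(text: str) -> List[Segment]:
    """Parse 'id file start end' rows; blank lines are skipped."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        seg_id, audio_file, start, end = fields
        segments.append(Segment(seg_id, audio_file, float(start), float(end)))
    return segments

def to_sample_range(seg: Segment, fs: int) -> Tuple[int, int]:
    """Convert a segment's times to [start, end) sample indices at rate fs."""
    return int(round(seg.start * fs)), int(round(seg.end * fs))
```

With the audio loaded as an array `audio` at rate `fs`, `s, e = to_sample_range(seg, fs)` followed by `audio[s:e]` extracts one voiced segment.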
3 votes
1 answer
109 views

Context: $\bar{\Theta}$ denotes the room regression filter coefficients (RRC); $$X_{t} = \bar{\Theta}^{H}\bar{X}_{t-1} + s_{t}$$ means, in words: the filter that defines how the room causes reverberation to ...
user3371266
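
One way to read this model is as per-frequency linear prediction. The reverberant part of $X_t$ is predicted from stacked past frames $\bar{X}_{t-1}$, and $s_t$ is what remains. Given the coefficients, the desired-signal estimate is $\hat{s}_t = X_t - \bar{\Theta}^{H}\bar{X}_{t-1}$. A plain least-squares estimate of $\bar{\Theta}$ solves

$$\Big(\sum_t \bar{X}_{t-1}\bar{X}_{t-1}^{H}\Big)\bar{\Theta} = \sum_t \bar{X}_{t-1} X_t^{*}.$$

The sketch below assumes a single frequency bin, regressors that are already stacked, and unweighted least squares. Methods such as WPE additionally weight each frame by an estimate of the desired signal's power, which is not shown. The function names are hypothetical.

```python
import numpy as np

def estimate_rrc(X, Xbar):
    """Unweighted least-squares Theta for X_t = Theta^H Xbar_t + s_t.

    X    : observed STFT values at one frequency bin, shape (T,)
    Xbar : regressors, shape (T, K); row t is the stacked past-frame
           vector written as Xbar_{t-1} in the model.
    """
    R = Xbar.T @ Xbar.conj()   # sum_t Xbar_t Xbar_t^H, shape (K, K)
    r = Xbar.T @ X.conj()      # sum_t Xbar_t X_t^*,    shape (K,)
    return np.linalg.solve(R, r)

def dereverberate(X, Xbar, theta):
    """Desired-signal estimate s_t = X_t - Theta^H Xbar_t."""
    return X - Xbar @ theta.conj()
```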
2 votes
0 answers
89 views

I'm studying the perception of vowel formants (resonances of the vocal tract) and need to create stimuli where the signal below the first (lowest) formant is removed. I have some synthesised vowels ...
Renata Koch
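
For synthesized stimuli, one simple option is to zero every FFT bin below a chosen cutoff (for example, just under F1) and inverse-transform. This NumPy sketch does exactly that. The function name and the choice of a brick-wall spectral cut are assumptions. Because the cut is applied to the whole signal at once, such a sharp edge can cause temporal ringing for non-stationary stimuli, so a smooth high-pass filter may suit some designs better.

```python
import numpy as np

def remove_below(x, fs, cutoff_hz):
    """Zero all spectral content below cutoff_hz (brick-wall, whole-signal FFT)."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    X[freqs < cutoff_hz] = 0    # also removes the DC component
    return np.fft.irfft(X, n=len(x))
```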
1 vote
1 answer
59 views

I am working through Fundamentals of Speech Recognition (Rabiner) and came across the concept of two-level dynamic programming. Can you suggest any online resources for studying it?
Anantha Krishnan
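
A compact sketch of the idea follows. At level one, DTW scores every word template against every test segment. At level two, a second dynamic program runs over segment end points, $D(e) = \min_{b,\,w}\,[D(b) + d_w(b, e)]$ with $D(0) = 0$, and then backtracks to recover the best word string. This is a simplified illustration, not Rabiner's exact formulation. It assumes scalar features, an absolute-difference local distance, no slope constraints, and no word-insertion penalty. The function names are hypothetical.

```python
import numpy as np

def dtw_cost(x, y):
    """Accumulated DTW distance between 1-D sequences x and y."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            D[i, j] = d + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

def two_level_dp(test, templates):
    """Best word string for `test` given a {word: template} dictionary."""
    T = len(test)
    best = np.full(T + 1, np.inf)   # best[e]: cost of explaining test[:e]
    best[0] = 0.0
    back = [None] * (T + 1)         # back[e] = (segment start b, word)
    for e in range(1, T + 1):
        for b in range(e):
            for word, ref in templates.items():
                # Level 1: one word template against segment test[b:e].
                cost = best[b] + dtw_cost(test[b:e], ref)
                # Level 2: keep the cheapest way to reach end point e.
                if cost < best[e]:
                    best[e], back[e] = cost, (b, word)
    words, e = [], T
    while e > 0:                    # backtrack from the utterance end
        b, word = back[e]
        words.append(word)
        e = b
    return words[::-1], best[T]

words, cost = two_level_dp([0, 0, 0, 5, 5, 5], {"a": [0, 0, 0], "b": [5, 5, 5]})
```

Computing every $d_w(b, e)$ by an independent DTW is wasteful. Practical implementations share work across segments, but the independent version keeps the two levels easy to see.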
2 votes
1 answer
129 views

"The digitised speech signal $s(n)$ is put through a low order digital system (usually first order FIR filter) to spectrally flatten the signal and make it less susceptible to finite precision ...
Anantha Krishnan
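
The pre-emphasis filter described in the quote is the first-order FIR filter $\tilde{s}(n) = s(n) - a\,s(n-1)$. Here is a minimal NumPy sketch. Treating $s(-1)$ as 0 is an assumption, and the coefficient in the example is only an illustrative choice.

```python
import numpy as np

def pre_emphasis(s, a):
    """Apply y(n) = s(n) - a * s(n - 1), assuming s(-1) = 0."""
    s = np.asarray(s, dtype=float)
    y = s.copy()
    y[1:] -= a * s[:-1]
    return y

dc = pre_emphasis([1, 1, 1, 1], 0.9)       # slowly varying input is attenuated
alt = pre_emphasis([1, -1, 1, -1], 0.9)    # rapidly alternating input is boosted
```

After the first sample, a constant input is scaled by $1 - a$ and an alternating (Nyquist-rate) input by $1 + a$. This is how the filter tilts the spectrum toward high frequencies.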
1 vote
1 answer
73 views

Dynamic time warping is applied for time normalization. As shown in the diagram, two different signals with $T_x$ and $T_y$ time instants are time-normalized to have $T$ time instants. $\phi$ is ...
Anantha Krishnan
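
This sketch assumes $\phi$ denotes the warping-function pair $(\phi_x(k), \phi_y(k))$, $k = 1, \dots, T$, which maps a common index $k$ to a time index in each signal. A minimal DTW that returns those pairs (0-based here) is shown below. It assumes 1-D features, an absolute-difference local distance, and the three basic steps without slope weights or global path constraints. The function name is hypothetical.

```python
import numpy as np

def dtw_path(x, y):
    """Return (cost, path) where path[k] = (phi_x(k), phi_y(k))."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            D[i, j] = d + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from (n, m) to (1, 1); min() keeps the first (diagonal) on ties.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda p: D[p])
        path.append((i - 1, j - 1))
    return D[n, m], path[::-1]

cost, path = dtw_path([0, 1, 2], [0, 0, 1, 2])
T = len(path)   # common time-normalized length
```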
0 votes
0 answers
64 views

I am studying HMMs from "Fundamentals of Speech Recognition" by Rabiner. For the problem of how to adjust the parameters of an HMM, the proposed method was the Baum-Welch method (Expectation-...
Anantha Krishnan
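
One Baum-Welch (EM) re-estimation step for a discrete-observation HMM can be written using the forward and backward variables. From them, $\gamma_t(i)$ and $\xi_t(i,j)$ give the expected state occupancies and transitions, and each new parameter is a ratio of expected counts. This NumPy sketch assumes a single short observation sequence and omits the scaling that long sequences need. The function names are illustrative.

```python
import numpy as np

def forward(A, B, pi, obs):
    """alpha[t, i] = P(o_0..o_t, q_t = i)."""
    alpha = np.zeros((len(obs), A.shape[0]))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(A, B, obs):
    """beta[t, i] = P(o_{t+1}..o_{T-1} | q_t = i)."""
    beta = np.ones((len(obs), A.shape[0]))
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def baum_welch_step(A, B, pi, obs):
    """One EM update; returns (A_new, B_new, pi_new, P(O | old model))."""
    obs = np.asarray(obs)
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    prob = alpha[-1].sum()
    gamma = alpha * beta / prob                            # gamma[t, i]
    emit_next = (B[:, obs[1:]].T * beta[1:])[:, None, :]   # b_j(o_{t+1}) beta_{t+1}(j)
    xi = alpha[:-1, :, None] * A[None, :, :] * emit_next / prob  # xi[t, i, j]
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.stack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])], axis=1)
    B_new /= gamma.sum(axis=0)[:, None]
    return A_new, B_new, pi_new, prob
```

Repeating the step until the likelihood stops improving gives the usual training loop. Each step is guaranteed not to decrease $P(O \mid \lambda)$.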
1 vote
0 answers
390 views

I have a question about waveform synthesis, or more specifically speech synthesis. Most state-of-the-art papers use mel spectrograms as their inputs because they mimic the human ...
Yalçın Cenik
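
For context, a mel spectrogram is a short-time power spectrum passed through a bank of triangular filters spaced evenly on the mel scale, commonly defined as $m = 2595 \log_{10}(1 + f/700)$. Below is a minimal NumPy sketch of such a filterbank. It assumes that mel formula, unit-peak triangles and no area normalization; libraries differ on these choices. The function names are illustrative.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, fs, fmin=0.0, fmax=None):
    """Triangular filters, shape (n_mels, n_fft // 2 + 1), peak value 1."""
    fmax = fs / 2 if fmax is None else fmax
    # n_mels + 2 edge points equally spaced in mel, converted back to Hz.
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    bin_freqs = np.fft.rfftfreq(n_fft, d=1 / fs)
    fb = np.zeros((n_mels, len(bin_freqs)))
    for m in range(n_mels):
        lo, center, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (bin_freqs - lo) / (center - lo)
        falling = (hi - bin_freqs) / (hi - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

# Mel energies of one windowed frame: filterbank times its power spectrum.
fs, n_fft = 16000, 512
frame = np.random.default_rng(0).standard_normal(n_fft)
power = np.abs(np.fft.rfft(frame * np.hanning(n_fft))) ** 2
mel_energies = mel_filterbank(40, n_fft, fs) @ power   # shape (40,)
```

Stacking `mel_energies` for successive frames, usually on a log scale, gives the mel spectrogram that synthesis models consume.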
1 vote
0 answers
276 views

In "Mel spectrogram" or "Mel filterbanks", what does Mel mean, and why is it capitalized? It doesn't seem to be the name of a person.
f10w
0 votes
0 answers
152 views

After I use a deep learning algorithm to enhance the speech, the output still contains weak background noise. This background noise has little effect on the calculated SNR, but it ...
Killuaisaack
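
Global SNR pools energy over the whole signal, so weak residual noise contributes little compared with the total speech energy. Segmental SNR instead averages per-frame values, so quiet frames dominated by residual noise count as much as loud ones. The sketch below contrasts the two measures. It assumes a clean reference that is time-aligned with the enhanced output. The 256-sample frame and the per-frame clipping to [-10, 35] dB are common conventions chosen here for illustration, and the function names are hypothetical.

```python
import numpy as np

def snr_db(clean, enhanced):
    """Global SNR: total clean energy over total error energy."""
    clean, enhanced = np.asarray(clean, float), np.asarray(enhanced, float)
    noise = clean - enhanced
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

def segmental_snr_db(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs, each clipped to [lo, hi] dB."""
    clean, enhanced = np.asarray(clean, float), np.asarray(enhanced, float)
    eps = np.finfo(float).eps   # avoids log(0) and division by zero
    values = []
    for k in range(len(clean) // frame_len):
        sl = slice(k * frame_len, (k + 1) * frame_len)
        s = np.sum(clean[sl] ** 2)
        e = np.sum((clean[sl] - enhanced[sl]) ** 2)
        values.append(np.clip(10 * np.log10((s + eps) / (e + eps)), lo, hi))
    return float(np.mean(values))
```

For example, a signal that is perfect during speech but carries faint noise in a silent frame still scores a high global SNR. Its segmental SNR drops sharply, because that silent frame is clipped to the lower bound.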
