Questions tagged [speech-processing]
Speech processing is the study of speech signals and the processing methods of these signals.
281 questions
1 vote
0 answers
37 views
Implementing computation of complex cepstrum from scratch
I have been trying to implement the complex cepstrum from scratch as part of a speech signal processing course. The hidden catch is the removal of the linear phase term, which my professor ...
0 votes
1 answer
117 views
Cutoff frequency of 1st-order high-pass pre-emphasis filter?
Consider a 1st order high-pass filter like this: $$y[n] = x[n] - \alpha x[n-1]$$ I found that in Praat's manual the relationship of its cutoff frequency $f_c$ and $\alpha$ is illustrated as: $$\alpha=...
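For readers skimming this entry, here is a minimal sketch of the difference equation $y[n] = x[n] - \alpha x[n-1]$ quoted above. The function name `pre_emphasis`, the default `alpha=0.97`, and the choice to treat $x[-1]$ as zero are assumptions for illustration only; they do not come from the question or from Praat, and the relationship between $\alpha$ and $f_c$ is left as stated in the excerpt.

```python
def pre_emphasis(x, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1] to a sequence of samples.

    alpha=0.97 is an assumed, commonly quoted default, not a value
    taken from the question. x[-1] is assumed to be 0.
    """
    y = []
    prev = 0.0  # assumed initial condition: x[-1] = 0
    for sample in x:
        y.append(sample - alpha * prev)
        prev = sample
    return y


# A constant (zero-frequency) input is attenuated after the first sample,
# which is the high-pass behaviour the question refers to.
print(pre_emphasis([1.0, 1.0, 1.0], alpha=0.5))
```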
0 votes
0 answers
36 views
Speech recognition. Building word-level HMM from phone-level HMMs. Transition matrix
I am implementing my HMM-GMM speech recognition model. Right now I am facing the problem described below. Given phone-level HMMs A and B, build word-level HMM C. In this question, let's assume that ...
0 votes
0 answers
33 views
How to Extract Pitch Bandwidth?
Using Praat to extract the bandwidth of formants, I noticed there is no option to extract the bandwidth of the pitch. Since the F0 values do not match the pitch values, I cannot apply the same method ...
0 votes
2 answers
74 views
Interpreting spectrum from waveforms from simple and complex examples
I am reading Jurafsky and Martin's Speech and Language Processing Chapter 28 on Phonetics (pages 15,16) and they introduce waveforms and spectrum. What I don't understand is how they came from a ...
0 votes
1 answer
414 views
How do I extract a part of an audio clip whose start and end times are given into a .wav file?
I have performed Voice Activity Detection (VAD) on my data, and this returns a file containing columns in the following order: Segment ID, Audio file name, Start time, End time. For ...
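One possible approach to the task in this entry, sketched with only Python's standard `wave` module. The function name `extract_segment` and its parameters are hypothetical, and the sketch assumes the source is an uncompressed PCM WAV file with start and end times given in seconds; it is not taken from the question or its answer.

```python
import wave


def extract_segment(src_path, dst_path, start_s, end_s):
    """Copy the audio between start_s and end_s (seconds) into a new WAV file.

    Assumes src_path is an uncompressed PCM WAV file.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Convert times to frame indices and clamp them to the file length.
        start = min(max(0, int(round(start_s * rate))), total)
        end = min(max(start, int(round(end_s * rate))), total)
        src.setpos(start)
        frames = src.readframes(end - start)

    with wave.open(dst_path, "wb") as dst:
        # Reuse channels, sample width and rate; the frame count in the
        # header is corrected automatically when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
```

Calling this once per row of the VAD output file (audio file name, start time, end time) would write one `.wav` file per segment.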
3 votes
1 answer
109 views
Deriving the posterior distribution parameters of a normal distribution in the context of dereverberation
Context: $\bar{\Theta}$ is the room regression filter coefficients (RRC); $$X_{t} = \bar{\Theta}^{H}\bar{X}_{t-1} + s_{t}$$ means in words: the filter that defines how the room causes reverberation to ...
2 votes
0 answers
89 views
A filter to remove f0 and lower harmonics from the signal
I'm studying the perception of vowel formants (resonances of the vocal tract) and need to create stimuli where the signal below the first (lowest) formant is removed. I have some synthesised vowels ...
1 vote
1 answer
59 views
Two level Dynamic Programming
I am going through Fundamentals of Speech Recognition (Rabiner) and stumbled upon the concept of two-level dynamic programming. Can you suggest any online resources for studying it?
2 votes
1 answer
129 views
How does Pre-emphasis mitigate finite precision effects?
"The digitised speech signal $s(n)$ is put through a low order digital system (usually first order FIR filter) to spectrally flatten the signal and make it less susceptible to finite precision ...
1 vote
1 answer
73 views
Constraints in Dynamic Time Warping for Speech
Dynamic time warping is applied for time normalization. As shown in the diagram, two different signals with $T_x$ and $T_y$ time instants are time-normalized to have $T$ time instants. $\phi$ is ...
0 votes
0 answers
64 views
Understanding Baum's auxiliary function used in Hidden Markov Model
I am studying HMMs from "Fundamentals of Speech Recognition" by Rabiner. For the problem of adjusting the parameters of an HMM, the proposed method is the Baum-Welch method (Expectation-...
1 vote
0 answers
390 views
Advantages and disadvantages of using mel spectrograms over STFT in speech/waveform synthesis
I have a question about waveform synthesis, or more specifically speech synthesis. Most state-of-the-art papers use mel spectrograms as their inputs, because they mimic the human ...
1 vote
0 answers
276 views
In "Mel spectrogram" or "Mel filterbanks", what does Mel mean and why is it often capitalized? [closed]
In "Mel spectrogram" or "Mel filterbanks", what does Mel mean and why is it capitalized? It doesn't seem to be the name of a person.
0 votes
0 answers
152 views
How do I eliminate background noise?
After I use a deep learning algorithm to enhance the speech, a weak background noise still remains. This background noise has little effect on the calculation of SNR, but it ...