$\begingroup$

I am currently looking at the STFT in librosa. I am trying to understand the output of the STFT function, specifically which frequencies the different values represent.

Say I have an n_fft of 256. The first dimension of the STFT output will then be 129 (1 + n_fft/2), so for each frame I have 129 bins, each representing some frequency within that frame. I am assuming that the bins start at a frequency of 0 Hz and go up to some maximum value, with each bin covering an equally large range.

But how do I figure out what the maximum frequency in the STFT output is? And what range does each of the 129 bins cover?

Just to be clear, I have tried looking at the source code and the documentation, but I have not become much wiser.

$\endgroup$
  • $\begingroup$ Have a look at this question: FFT resolution $\endgroup$ Commented May 8, 2019 at 8:53
  • $\begingroup$ So if my sampling frequency is 16000 Hz and I have 129 bins, then the resolution is ~124 Hz, meaning that each of the bins covers a frequency range of 124 Hz? Is that a correct way of looking at it? $\endgroup$ Commented May 10, 2019 at 9:47

1 Answer

$\begingroup$

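Looking at the source of librosa.display.specshow reveals how bins are converted into frequencies internally: the plotting uses librosa.core.fft_frequencies, which is basically the same as the non-negative half of numpy.fft.fftfreq (that is, numpy.fft.rfftfreq). The bins are evenly spaced sr / n_fft apart, running from 0 Hz up to the Nyquist frequency sr / 2: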

>>> librosa.fft_frequencies(sr=22050, n_fft=16)
array([    0.   ,  1378.125,  2756.25 ,  4134.375,  5512.5  ,
        6890.625,  8268.75 ,  9646.875, 11025.   ])
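
To apply this to the numbers in the question and comments (n_fft of 256, sampling rate of 16000 Hz), here is a minimal pure-Python sketch of the same bin-to-frequency mapping. The helper name bin_frequencies is made up for illustration and is not part of librosa. It assumes bin k sits at k * sr / n_fft, which should match what numpy.fft.rfftfreq(n_fft, d=1/sr) returns.

```python
def bin_frequencies(sr, n_fft):
    """Frequency in Hz of each one-sided FFT bin (hypothetical helper).

    Assumes bin k corresponds to k * sr / n_fft, for k = 0 .. n_fft // 2.
    """
    return [k * sr / n_fft for k in range(n_fft // 2 + 1)]


sr = 16000    # sampling rate from the comment, in Hz
n_fft = 256   # FFT size from the question

freqs = bin_frequencies(sr, n_fft)

n_bins = len(freqs)                 # 1 + n_fft // 2
bin_spacing = freqs[1] - freqs[0]   # sr / n_fft
max_freq = freqs[-1]                # Nyquist frequency, sr / 2

print(n_bins, bin_spacing, max_freq)
```

Under these assumptions the spacing comes out as sr / n_fft rather than sr divided by the number of bins, and the highest bin sits at half the sampling rate.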
$\endgroup$
