$\begingroup$

I am using FFTs to perform real-time convolution of audio signals on small embedded microcontrollers. I am thinking about testing different/smaller data types to represent the complex bins. I am particularly interested in the q15 fixed-point type, which can be stored in an int16. I know that one usually scales the output of the IFFT by 1/N, where N is the transform size.

  1. What effect does this have on the forward transform? Does it mean that my real input values will be scaled by N after the forward transform, or only by N/2?

  2. The audio signals I am working with range from -1 to 1. Is it possible to calculate the absolute maximum values that can be generated by a transform of size N?

  3. Are the real and imaginary parts affected the same way?

$\endgroup$

1 Answer

$\begingroup$

Tricky.

When choosing the quantization for any signal $x[t]$, your SNR (signal-to-noise ratio) with respect to the quantization noise is determined by the crest factor of the signal, which is the peak divided by the RMS value, i.e.

$$C_x = \frac{x_{peak}}{x_{rms}} \tag{1} $$

If you quantize with N bits at the clipping level (N here is the word length, not the transform size), your quantization step will be (assuming signed signals)

$$\Delta_q = \frac{x_{peak}}{2^{N-1}} \tag{2}$$

and the quantization noise level in dB is

$$ L_q = 20\log_{10}\frac{\Delta_q}{\sqrt{12}} = 20\log_{10}\frac{x_{peak}}{2^{N-1}\sqrt{12}} \tag{3}$$

and your SNR becomes

$$ \mathrm{SNR} = 20\log_{10} x_{rms} - L_q = 20\log_{10}\left( 2^{N-1}\sqrt{12}\,\frac{x_{rms}}{x_{peak}}\right) = 20\log_{10} \frac{2^{N-1}\sqrt{12}}{C_x}\tag{4}$$

So the SNR is inversely proportional to the crest factor: every dB of crest factor costs you one dB of SNR. Most audio signals have a "moderate" crest factor in the time domain, maybe 15 dB or so, which gives you an SNR of 86 dB or thereabouts for 16-bit quantization. Not great, but not terrible either.
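As a quick numeric check of equation (4), here is a minimal Python sketch that evaluates the SNR for a given word length and crest factor. The function name `quantization_snr_db` is made up for illustration and does not come from any library; the 15 dB crest factor is just the rough figure quoted above.

```python
import math

def quantization_snr_db(bits: int, crest_factor_db: float) -> float:
    """Equation (4): 20*log10(2^(bits-1) * sqrt(12)) minus the crest factor in dB."""
    full_scale_db = 20 * math.log10(2 ** (bits - 1) * math.sqrt(12))
    return full_scale_db - crest_factor_db

# 16-bit quantization, 15 dB time-domain crest factor (assumed typical value)
print(round(quantization_snr_db(16, 15.0), 1))  # -> 86.1
```

Setting the crest factor to 0 dB gives the upper limit of about 101 dB for a 16-bit word, from which the crest factor is then subtracted.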

However, the crest factor in the frequency domain can be much higher. An extreme case is a full-scale sine wave, which has a crest factor of roughly

$$C_{sine,frequency} = \sqrt{\frac{M_{FFT}}{2}} \tag{5}$$

where $M_{FFT}$ is the FFT length. For an FFT length of 2048 this comes out to be a whopping 30 dB, which for 16-bit quantization reduces your SNR to the 70 dB range.
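Equation (5) can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a naive DFT written with the Python standard library (not an FFT library), a small length of 64 to keep it fast, and a sine with a whole number of cycles so that all its energy lands on two bins. The names `dft` and `crest_factor` are hypothetical helpers.

```python
import cmath
import math

def dft(x):
    """Naive O(M^2) DFT using the unscaled 'textbook' forward convention."""
    m = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / m) for n in range(m))
            for k in range(m)]

def crest_factor(values):
    """Peak magnitude divided by RMS magnitude."""
    mags = [abs(v) for v in values]
    rms = math.sqrt(sum(a * a for a in mags) / len(mags))
    return max(mags) / rms

M = 64       # illustrative length, kept small for the naive DFT
CYCLES = 5   # whole number of cycles, so the sine sits exactly on a bin
sine = [math.sin(2 * math.pi * CYCLES * n / M) for n in range(M)]
spectrum = dft(sine)

time_cf = crest_factor(sine)                   # sqrt(2) for a sine
freq_cf = crest_factor(spectrum)               # sqrt(M / 2) per equation (5)
peak_bin = max(abs(v) for v in spectrum)       # M / 2 for a full-scale sine

print(time_cf, freq_cf, math.sqrt(M / 2), peak_bin)

# Crest factor in dB predicted by equation (5) for a 2048-point FFT
print(round(20 * math.log10(math.sqrt(2048 / 2)), 1))  # -> 30.1
```

The same sine that has a crest factor of only about 3 dB in the time domain concentrates into two bins of magnitude $M_{FFT}/2$ in the frequency domain, which is where the extra headroom requirement comes from.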

You have to decide whether this is acceptable for your application, or whether you need to deploy some dynamic per-stage scaling scheme in your FFT.
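One common form of dynamic stage scaling is block floating point: before each butterfly stage the whole block is shifted right only if it lacks headroom, and the number of shifts is tracked as a shared exponent. The following is a rough Python sketch of that idea, not a production q15 routine. All names (`block_float_fft`, `HEADROOM_LIMIT`, and so on) are hypothetical, the threshold is my own conservative choice based on the worst-case $1+\sqrt{2}$ growth of one radix-2 stage, and Python integers stand in for int16 values.

```python
import cmath
import math

INT16_MAX = 32767
# Assumption: one radix-2 stage can grow a component by at most 1 + sqrt(2),
# so values at or below this limit cannot leave the int16 range in one stage.
HEADROOM_LIMIT = 13000

def bit_reverse_copy(values):
    """Return the input reordered by bit-reversed index (length must be 2^p, p >= 1)."""
    bits = len(values).bit_length() - 1
    return [values[int(format(i, "0{}b".format(bits))[::-1], 2)]
            for i in range(len(values))]

def peak_of(re, im):
    return max(max(abs(v) for v in re), max(abs(v) for v in im))

def block_float_fft(re, im):
    """Radix-2 decimation-in-time FFT on integers with dynamic per-stage scaling.

    Returns (re, im, exponent); the unscaled spectrum is approximately
    (re + 1j*im) * 2**exponent.
    """
    m = len(re)
    re, im = bit_reverse_copy(re), bit_reverse_copy(im)
    exponent = 0
    half = 1
    while half < m:
        # Shift the whole block only when this stage could overflow.
        while peak_of(re, im) > HEADROOM_LIMIT:
            re = [v >> 1 for v in re]
            im = [v >> 1 for v in im]
            exponent += 1
        for k in range(half):
            w = cmath.exp(-1j * math.pi * k / half)
            # Q15 twiddles; 1.0 is kept as 32768 here, which real int16 code
            # would have to saturate or special-case.
            wr, wi = round(w.real * 32768), round(w.imag * 32768)
            for start in range(0, m, 2 * half):
                i, j = start + k, start + k + half
                tr = (wr * re[j] - wi * im[j]) >> 15
                ti = (wr * im[j] + wi * re[j]) >> 15
                re[j], im[j] = re[i] - tr, im[i] - ti
                re[i], im[i] = re[i] + tr, im[i] + ti
        half *= 2
    return re, im, exponent

# Usage: full-scale q15 sine, 5 whole cycles in 64 samples
M, CYCLES = 64, 5
x_re = [round(INT16_MAX * math.sin(2 * math.pi * CYCLES * n / M)) for n in range(M)]
out_re, out_im, exp2 = block_float_fft(x_re, [0] * M)
peak_estimate = math.hypot(out_re[CYCLES], out_im[CYCLES]) * 2 ** exp2
print(exp2, peak_estimate, INT16_MAX * M / 2)
```

Compared with unconditionally halving at every stage, this keeps low-level signals at a higher resolution, at the cost of a peak scan before each stage and a shared exponent that has to be carried through the rest of the processing chain.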

$\endgroup$
  • $\begingroup$ Thanks for the answer. I wasn't aware of the crest factor, very good to know. I'm still unsure about the values to expect after the forward transform. Does a transform of 2048 mean that I can expect values as high as 2048 in either real or imaginary part of the bins? $\endgroup$ Commented Jan 23, 2024 at 17:54
  • $\begingroup$ If you use the "textbook" scaling convention (unscaled forward transform), the max would be 1024 for a 2048-point FFT of a full-scale sine; a full-scale constant (DC) input would reach 2048 in a single bin. There are better scaling options though; a good starting point is scaling both the forward and backward transforms by $1/\sqrt{M_{FFT}}$. A fixed-point FFT with a fairly limited word width is quite complicated and requires quite a bit of careful headroom management unless you can tolerate a lot of extra noise. $\endgroup$ Commented Jan 23, 2024 at 20:54
