I'm designing a $2$-stage filter and have been simulating the effect of quantization to a certain number of bits at the filter output. The result I got was far from what I expected, and I would appreciate any intuition from the experts here.
The high-level idea is:
- Filter input data is $N$ bits.
- $\rm Stage_1$ is a half-band interpolation by $2$.
- Internally the filter accuracy is maintained (to approx $30$ bits), but the output is rounded to $M$ bits.
- The $2^{\textrm{nd}}$ stage takes the $M$-bit data in and filters it at full accuracy (again a half-band interpolation-by-$2$ filter, with full internal precision).
- The output of the filter is rounded to $P$ bits.
Initially I set $N=M=P=16$ bits. The filters themselves have a stop-band attenuation of $\approx90\textrm{ dB}$. What surprised me was that $16$ bits between the filters were not enough - at least in terms of the frequency response.
If I do not quantize the inter-filter values I of course get ideal behaviour: the overall frequency response (measured via an FFT of the impulse response) is the product of the individual frequency responses, and my stop-band attenuation is $\approx90\textrm{ dB}$. The input bit-width does not matter here, of course.
However, if I quantize to $16$ bits between the stages, with rounding, I would have thought this more than sufficient to preserve the filter's frequency-response accuracy - but I was wrong: the stop-band attenuation dropped to $\approx84\textrm{ dB}$ in places. It took an $18$-bit inter-filter bit-width to maintain performance.
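For concreteness, here is a minimal sketch of the experiment. The half-band design is a hypothetical stand-in built with `scipy.signal.firwin` (not my actual coefficients), and the quantizer assumes signed fixed point with full scale $\pm1$:

```python
import numpy as np
from scipy.signal import firwin, upfirdn

def quantize(x, bits):
    # round to a signed fixed-point grid with 'bits' bits, full scale +/-1
    scale = 2 ** (bits - 1)
    return np.round(x * scale) / scale

# hypothetical ~90 dB lowpass standing in for the actual half-band design
h = firwin(99, 0.5, window=("kaiser", 9.0))

def interp2(x, h):
    # zero-stuff by 2, lowpass-filter, compensate the interpolation gain of 2
    return 2 * upfirdn(h, x, up=2)

M = 16                                   # inter-filter bit-width under test
x = np.zeros(1024)
x[0] = 0.5                               # half-scale impulse into the cascade
y1 = quantize(interp2(x, h), M)          # stage 1 output, rounded to M bits
y2 = interp2(y1, h)                      # stage 2 at full internal precision
H = 20 * np.log10(np.abs(np.fft.rfft(y2, 8192)) + 1e-300)  # response in dB
```

Comparing `H` against the same cascade without the `quantize` call is how I observe the stop-band degradation.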
So the question is: is there a rule of thumb for how many bits to preserve? Why are $16$ bits not enough for $\approx90\textrm{ dB}$ attenuation?


