I'm designing a $2$-stage filter and have been simulating the effect of quantization to a certain number of bits at the filter output. The result I got was far from what I expected, and I would appreciate any intuition from the experts here.
The high-level idea is:
- Filter input data is $N$ bits.
- $\rm Stage_1$ is a half-band interpolation by $2$.
- Internally the filter accuracy is maintained (to approx $30$ bits), but the output is rounded to $M$ bits.
- The $2^{\textrm{nd}}$ stage takes the $M$-bit data in and filters it at full accuracy (again a half-band interpolation-by-$2$ filter, with full internal precision).
- The output of the filter is rounded to $P$ bits.
Initially I set $N=M=P=16$ bits. The filters themselves have a stop-band attenuation of $\approx90\textrm{ dB}$. What surprised me was that $16$ bits between the filters were not enough - at least in terms of the frequency response.
If I do not quantize the inter-filter values I of course get ideal behaviour: the overall frequency response (measured via an FFT of the impulse response) is the product of the individual frequency responses, and my stop-band attenuation is $\approx90\textrm{ dB}$. The input bit-width does not matter here, of course.
However, if I quantize to $16$ bits between the stages, with rounding, I would have thought this more than sufficient to preserve the filter's frequency-response accuracy - but I was wrong: the stop-band attenuation dropped to $\approx84\textrm{ dB}$ in places. It took an $18$-bit inter-filter bit-width to maintain performance.
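For concreteness, here is a minimal sketch of the experiment. The half-band design is a hypothetical stand-in built with `scipy.signal.firwin` (not my actual coefficients), and the quantizer assumes signed fixed point with full scale $\pm1$:

```python
import numpy as np
from scipy.signal import firwin, upfirdn

def quantize(x, bits):
    # round to a signed fixed-point grid with 'bits' bits, full scale +/-1
    scale = 2 ** (bits - 1)
    return np.round(x * scale) / scale

# hypothetical ~90 dB lowpass standing in for the actual half-band design
h = firwin(99, 0.5, window=("kaiser", 9.0))

def interp2(x, h):
    # zero-stuff by 2, lowpass-filter, compensate the interpolation gain of 2
    return 2 * upfirdn(h, x, up=2)

M = 16                                   # inter-filter bit-width under test
x = np.zeros(1024)
x[0] = 0.5                               # half-scale impulse into the cascade
y1 = quantize(interp2(x, h), M)          # stage 1 output, rounded to M bits
y2 = interp2(y1, h)                      # stage 2 at full internal precision
H = 20 * np.log10(np.abs(np.fft.rfft(y2, 8192)) + 1e-300)  # response in dB
```

Comparing `H` against the same cascade without the `quantize` call is how I observe the stop-band degradation.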
So the question is: is there a rule of thumb for how many bits to preserve? Why are $16$ bits not enough for $\approx90\textrm{ dB}$ attenuation?


