$\begingroup$

I am seeking information about the compression ratios achieved at each stage of DCT-based compression. Specifically, I'm interested in the compression ratio immediately after applying the Discrete Cosine Transform (DCT), and how this ratio changes after quantization. Despite reviewing numerous sources, I have yet to find a systematic analysis of these individual stages. Could someone please recommend research that evaluates the impact of each step on the overall compression ratio? Understanding the individual contribution of each phase is crucial for fully appreciating the efficiency of the entire compression process.

$\endgroup$
  • $\begingroup$ We can't know what your ratios are without you actually telling us about the coder you're working with (they are different!) and without knowing the nature of your signals. Typically, whenever someone proposes a methodology, they will have metrics in their papers, so the relevant research is probably the original papers by the authors and their follow-ups. $\endgroup$ Commented Apr 25, 2024 at 17:48
  • $\begingroup$ Can you please recommend any paper that conducts the research in a stepwise manner (irrespective of the coder they are using)? $\endgroup$ Commented Apr 25, 2024 at 18:16
  • $\begingroup$ How about you name a compressor you're interested in, and pick the paper that introduced it? Throwing arbitrary papers at you will not really help anyone. And frankly, I don't think the thing you think exists exists for every coder; quite the contrary, in most encoders all of the compression happens in the very last step, so it's 0% of the compression for every step except the one that actually reduces precision through quantization and then applies an entropy encoder. That's not surprising – things like the DCT are invertible and hence cannot themselves be compressors. $\endgroup$ Commented Apr 25, 2024 at 18:17
  • $\begingroup$ @SahilSharma There is no such thing as "irrespective of the coder they are using". A 1D audio codec works completely differently from a 2D image codec, even though both use a DCT. The DCT itself does not compress or encode/decode anything. What matters is what you do with the transformed data. $\endgroup$ Commented Apr 25, 2024 at 18:19
  • $\begingroup$ I understand your point, but my query specifically concerns the method for calculating the compression ratio immediately after applying the Discrete Cosine Transform (DCT). My interest lies in understanding how the precision of the coefficients is adjusted post-transform. Could you recommend or share any research or implementations that detail how the compression ratio is computed at this stage? Even though it is indeed encoder-dependent, I just need a proof. $\endgroup$ Commented Apr 25, 2024 at 20:41

2 Answers

$\begingroup$

Starting from a uint8 image, a classical DCT-II yields real-valued coefficients, since the underlying cosine basis values are generally irrational.

So the first step on its own does not compress at all: it turns 8-bit integers into 32-bit floats, a potential 4-fold expansion.

Then you quantize, and the data becomes discrete again: the coefficients from the first step are divided by a quantization table and rounded. At that point you can get a rough hint of the eventual size by summing the code lengths (or the entropy) of all the resulting tokens, although the actual figure still depends on the entropy coder that follows.
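
This is only a back-of-the-envelope sketch, not how any particular encoder reports its ratio: it assumes NumPy/SciPy, a made-up smooth 8×8 block, the standard JPEG luminance quantization table as a stand-in, and the zeroth-order entropy of the quantized symbols as a proxy for the entropy-coded size.

```python
import numpy as np
from scipy.fft import dctn

# A smooth 8x8 uint8 block (a plain gradient), standing in for image data.
block = (np.add.outer(np.arange(8), np.arange(8)) * 8).astype(np.uint8)

# 2-D DCT-II of the level-shifted block: 8-bit integers become
# floating-point coefficients, so this step alone expands the data.
coeffs = dctn(block.astype(np.float64) - 128.0, norm="ortho")

# Standard JPEG luminance quantization table, used here purely as an
# example; real encoders scale or replace it.
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

quantized = np.round(coeffs / Q).astype(np.int32)

# Zeroth-order entropy of the quantized symbols: a rough lower bound on the
# bits an ideal entropy coder would spend per coefficient.
_, counts = np.unique(quantized, return_counts=True)
p = counts / counts.sum()
entropy_bits = float(-np.sum(p * np.log2(p)))

original_bits = 8 * block.size                       # 8 bits per pixel
estimated_bits = max(entropy_bits, 1e-9) * quantized.size
print(f"{entropy_bits:.2f} bits/coefficient after quantization")
print(f"estimated ratio = {original_bits / estimated_bits:.1f} : 1")
```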

$\endgroup$
$\begingroup$

The classical lossy codecs can perhaps be simplified as [transform]->[quantize]->[entropy coding]. The end goal is to achieve a reduction in file size / bandwidth (compression ratio) at minimal perceptual loss, but I don't think it makes sense to apportion the compression among the individual components, any more than it makes sense to attribute my knowledge (or lack thereof) of signal processing to my kidneys. I need my kidneys in order to live, and I need to live in order to practice signal processing.

The DCT (or a similar transform) serves several functions, I believe.

  1. "Energy compaction" means that it has been shown to behave somewhat like PCA/KLT for the types of input often enountered, where most of the energy/variance is pushed to a few bins, leaving other bins largely empty/noisy/residual.
  2. A structured representation sorting the signal according to "frequency", meaning that perceptual characteristics that depend on frequency can be exploited (reduced accuracy in high frequencies, masking,...)
  3. FFT-like symmetry that can be used for efficient computation
  4. Integer approximations that mean encoder and decoder can be exact inverses of each other and hardware-independent, even when computed and transmitted at finite precision
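
To make point 1 concrete, here is a minimal sketch; the assumptions are NumPy/SciPy and a toy smooth signal of my own choosing, standing in for the kind of correlated data where the DCT compacts energy well.

```python
import numpy as np
from scipy.fft import dct

# A smooth, correlated test signal (a made-up example).
n = 64
t = np.arange(n)
x = np.cos(2 * np.pi * t / n) + 0.3 * np.cos(6 * np.pi * t / n)

# Orthonormal DCT-II, so total energy is preserved (Parseval).
X = dct(x, norm="ortho")

# How many coefficients, taken in order of decreasing energy, are needed
# to capture 99% of the signal energy?
energy = np.sort(X ** 2)[::-1]
cumulative = np.cumsum(energy) / energy.sum()
k = int(np.searchsorted(cumulative, 0.99)) + 1
print(f"{k} of {n} DCT coefficients hold 99% of the energy")
```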

A simple and common application would be to use the DCT/quantization to produce long runs of (near) zero coefficients. Those can be coded efficiently using run-length type compression.
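
As a crude illustration of that last point, here is a sketch of the bare run/value idea; it is my own simplification, not JPEG's actual symbol set (no zigzag scan, no size categories, no end-of-block marker), and the example coefficients are made up.

```python
from typing import List, Tuple

def run_length_zeros(coeffs: List[int]) -> List[Tuple[int, int]]:
    """Encode a coefficient sequence as (zero_run, value) pairs."""
    pairs: List[Tuple[int, int]] = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    if run:
        pairs.append((run, 0))   # collapse any trailing zeros into one token
    return pairs

# A typical post-quantization pattern: a few nonzero low-frequency values
# followed by long runs of zeros.
example = [-36, 5, -3, 2, 0, 0, 1, 0, 0, 0, 0, -1] + [0] * 52
print(run_length_zeros(example))
# 64 coefficients collapse into a handful of (run, value) tokens.
```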

$\endgroup$
