
I've always been comfortable with the way the DFT picks out the magnitude of a frequency component, but much less so the phase. Mathematically, I know they're fairly similar - if we assume $x[n]$ is a complex exponential with frequency that lines up with a DFT frequency bin, then the magnitude and phase each just kind of "pop out" of the sum:

$$X[k_0] = \sum_{n=0}^{N-1} x[n] \cdot e^{-j \frac{2\pi}{N} k_0 n} = \sum_{n=0}^{N-1} A e^{j\left(\frac{2\pi}{N} k_0 n + \phi\right)} \cdot e^{-j \frac{2\pi}{N} k_0 n} = A e^{j\phi} \sum_{n=0}^{N-1} 1 = N A e^{j\phi}$$
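This on-bin case is easy to verify numerically (the values of N, k0, A, and phi below are arbitrary choices for illustration):

```python
import numpy as np

# Numerical check of the on-bin case above: for a complex exponential whose
# frequency lands exactly on bin k0, the magnitude and phase "pop out" of the
# DFT sum as N*A and phi, and every other bin is ~0.
N, k0, A, phi = 64, 5, 1.7, 0.8
n = np.arange(N)
x = A * np.exp(1j * (2 * np.pi * k0 * n / N + phi))

X = np.fft.fft(x)
print(abs(X[k0]) / N)     # A
print(np.angle(X[k0]))    # phi
```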

The difference comes in my intuitive picture: I've always just gone ahead and thought of the DFT as lining up your basis function against the signal and tweaking the phase of the basis exponential until it correlates the most with (is most similar to) your original signal - until the time average of the point-wise product of the signals is maximized. Then whatever phase maximizes this is the phase of the corresponding DFT coefficient, and the magnitude of that sum is its magnitude. But obviously there's no "phase tweaking" going on mathematically. (Whereas the summing for the magnitude is right there in the definition.) So the first part of my question is:

Question A: Is this intuitive picture actually equivalent to the DFT calculation (in that it has the same results), and if so, is there a good way of showing that it's true?

Now, for sure what it’s doing is finding the phase coefficients for a DFS representation of your signal (as can be seen in the IDFT synthesis equation). But it’s less clear to me why finding DFS coefficients would be accomplished by finding the phase which lines each basis function up against your signal the best it can. Now, if one of your basis functions matches the signal frequency, it’s clear: lining up that basis function against your signal and twiddling the phase/mag until it matches, is exactly what gives you the right amplitude/phase for the DFS representation. Then for the other bins, you would twiddle the phase knob for a while, notice that no matter what the phase is the sum of the pointwise product is zero, so you set the magnitude to zero and move on.

However, consider the case where none of your basis frequencies match the signal (e.g. the case with spectral leakage). If you're trying to get the most "bang for your buck" out of these nearby frequencies, then maybe setting the phase so that it correlates the most with your original signal is the right move. You're using these components to build your signal, and maybe the best way to do this is to set the phase of each such that they're "as close as possible" to your signal. But it's not obvious to me that this "bang for your buck" approach would lead to the correct DFS representation. You need things to cancel off destructively and constructively, and merely making everything as "close as possible" to the original doesn't seem like it would obviously accomplish that. I'll note that proving that setting the phase to maximize correlation would also lead to the phase terms in the DFS representation would be a satisfactory answer to my Question A.
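One hedged way to test this picture numerically (assuming "correlates the most" means maximizing the summed real part of the pointwise product between the signal and the phase-shifted basis exponential; the frequency and phase values are arbitrary choices):

```python
import numpy as np

# Brute-force the "phase knob" picture for an off-bin sinusoid and compare the
# winning phase against the phase of the corresponding DFT coefficient.
N = 64
n = np.arange(N)
f0, phi0 = 5.3, 0.7                      # off-bin frequency -> spectral leakage
x = np.exp(1j * (2 * np.pi * f0 * n / N + phi0))

k = 5                                    # nearest DFT bin
X_k = np.fft.fft(x)[k]

# Sweep the phase of the basis exponential, maximizing the real correlation.
thetas = np.linspace(-np.pi, np.pi, 20001)
corr = [np.sum(np.real(x * np.conj(np.exp(1j * (2 * np.pi * k * n / N + th)))))
        for th in thetas]
theta_best = thetas[int(np.argmax(corr))]

print(theta_best, np.angle(X_k))         # agree to the sweep resolution
```

The agreement is no accident under this reading of "correlation": rotating the basis phase by theta rotates the complex sum X[k] by -theta without changing its magnitude, and the real part of the rotated sum is largest when it lands on the positive real axis, i.e. when theta equals arg X[k].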

Now for the second part of the question: In this answer I show that referencing the phase to the center of the signal with fftshift() is necessary for getting even a semi-accurate phase measurement when your input sinusoid doesn't line up exactly with a basis function (second plot, lower left - "zero centered" = the "correct phase" is the phase in the center of the signal, and the purple "shifted" dots are the ones with fftshift()). Moreover, it makes the phase consistently near the input sinusoid phase for nearby bins (see the zero-padded version in the same plot, lower right - this "rock steadiness" is also corroborated by this answer on the same thread). Again, fftshift()-ing the signal effectively re-references your phase to the middle of the signal, rather than the beginning. Or equivalently, it multiplies all of your frequency bins by $(-1)^k$ - this should make sense, since every basis sinusoid is $N$-periodic, so shifting it over by $N/2$ is going to shift the phase by either $\pi$ or $2\pi$ (depending on whether the integer number of cycles of that frequency fitting in the window is odd or even). This "rock steadiness" seems to indicate that those nearby bins are being lined up with the original signal such that they are in phase with it - at least, in the middle of the signal. This brings us to:

Question B: Why would the DFT line up the basis function in these bins so that it's (approximately) close to the original signal in phase - not at the beginning of the signal, but in the middle?

If I assume that the DFT calculation is trying to "set the phase" of the basis functions so that they correlate the most with the original signal, then I've come up with the following rough argument for why it would be best to line it up with the middle of the signal, rather than at the beginning. Assume you have a signal at frequency $f_0$ and a nearby basis sinusoid at frequency $f_0 + \epsilon$. First, let's see what happens if you try lining it up so the basis sinusoid is in phase with your signal at the beginning. By the end of the signal, even the small difference in frequency will have caused the basis function to be significantly misaligned. I've plotted a signal along with the real part of a basis sinusoid (I believe we can ignore the imaginary part since it will get cancelled off by its negative frequency partner, but maybe this is part of my problem).

[Figure: signal with the basis sinusoid phase-aligned at the beginning of the window]

If, on the other hand, you line it up so they are in phase in the middle, it seems to me that it will be more aligned overall. It will still be misaligned by the edges, but less so - you've “averaged out” the misalignment on both sides.

[Figure: signal with the basis sinusoid phase-aligned at the middle of the window]

However, I tried many different lengths of signals, and found that - while the time averaged pointwise product was usually greater in the middle-phase-alignment case - sometimes it's greater when they begin in phase, if only slightly. I'm not sure if this disproves my theory, or if it's because I am neglecting the imaginary part of the basis sinusoid (as mentioned above), or if it's an effect of the following fact: the phase of a bin nearby the signal frequency is not going to be exactly in phase with the signal (even if you fftshift()), it's going to be slightly off (the rock-steadiness shown above isn't perfectly rock-steady), and that small adjustment is enough to make the middle-aligned version win out in the end.
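The fftshift() re-referencing described above is straightforward to check numerically (the off-bin frequency and phase below are arbitrary choices): for even $N$, fftshift()-ing in time multiplies bin $k$ by $(-1)^k$, and the phase of the bin nearest the input then sits close to - but not exactly at - the signal's instantaneous phase at the center of the window.

```python
import numpy as np

N = 64
n = np.arange(N)
f0, phi0 = 5.3, 0.7                           # off-bin input sinusoid
x = np.exp(1j * (2 * np.pi * f0 * n / N + phi0))

X = np.fft.fft(x)
X_shift = np.fft.fft(np.fft.fftshift(x))      # circularly shift by N/2 first

k = np.arange(N)
print(np.allclose(X_shift, X * (-1.0) ** k))  # True: the (-1)^k identity

# Signal's instantaneous phase at the center sample n = N/2, compared (with
# wrapping) against the phase of the nearest bin after the shift.
phase_center = 2 * np.pi * f0 * (N // 2) / N + phi0
err = np.angle(np.exp(1j * (np.angle(X_shift[5]) - phase_center)))
print(err)   # small but nonzero: "rock steady", though not perfectly so
```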

6 Comments

  • When talking about "phase" with signal processing, I actually find it MUCH easier to throw away the notion that phase is the time delay between two sinusoids. What is actually consistent is that phase is a rotation on the complex plane. And then with that, throw away using sinusoids to explain the basic concepts and go right to the spinning phasor: $e^{j\omega t}$ -- ultimately that makes so much more sense. I make that point in this video starting at 13:26. Once you really grasp that, I believe it will answer your fundamental question here. youtube.com/watch?v=RxQWk1PjJLQ&t=792s Commented Feb 16 at 2:30
  • empty, you keep asking the same question even after it's been answered for you. Commented Feb 17 at 4:39
  • @robert I thought your answers very effectively demonstrated situations where the fftshift() circular shift has a favorable effect. But there are also "why" and "how" questions I have, such as this one. I do not believe this has been addressed in another answer, but would be more than happy to be proven wrong! Commented Feb 18 at 22:24
  • @dan Totally agree! But as far as I can tell, I'm not thinking about phase as a time delay between two sinusoids. I can see why you'd think that, but I'm talking about phase difference because it's precisely the difference I'm interested in - the difference in (instantaneous) phase of a sinusoidal input signal and in a basis exponential of the nearest frequency bin, and the inevitable fluctuation of this difference when the input frequency does not align with the bin. I used the real part of the basis function here in plots for ready 2D visualization of this fluctuation. Commented Feb 18 at 22:25
  • @empty-inch I am on board if you mean an exponential, not a real sinusoid; otherwise there are two basis functions to compare to and they can interact Commented Feb 19 at 0:13

1 Answer


I recommend taking a look at my answer to a related question here about how a DFT is calculated on a sample-by-sample basis.

Rather than thinking of your signal in terms of frequency and phase, think of it purely as a function of phase:

$$x[n]= A e^{j(2 \pi f_0 t_n +\theta)}= A e^{j\phi[n]}$$

In the above equation, we specified a constant frequency w.r.t. time $t_n$, but our signal could be more general and have a phase changing at a non-constant rate that could be instead equivalently represented by multiple frequency components. What's important is that we realize it is some signal that exists (and rotates if it's not DC!) in the complex plane.

We can look at the kernel of the Discrete Fourier Transform (DFT) as also being a phasor in the complex plane:

$$X[k]=\sum_{n=0}^{N-1}x[n]e^{-j2\pi k n/N}$$

$$e^{-j2\pi k n/N}\rightarrow e^{j(-\Phi_k [n])}$$

Rewriting your first equation as,

$$X[k_0] = \sum_{n=0}^{N-1} x[n] \cdot e^{j(-\Phi_{k_0} [n])} = \sum_{n=0}^{N-1} A e^{j \phi[n]} \cdot e^{j(-\Phi_{k_0}[n])} = A \sum_{n=0}^{N-1} e^{j(\phi[n]-\Phi_{k_0}[n])}$$

we can see that for each sample $n$, the DFT calculates a phasor whose angle is the difference between the input signal's phase and the kernel's phase, before averaging them all together. I like to think of a DFT bin calculation as comparing, at any given sample, the measured phase against some expected phase. If both phase functions, $\phi[n]$ and $\Phi_k[n]$, are changing at the same rate, then there is a constant offset angle at every sample and the summation will basically grow that DFT bin in that angular direction in the complex plane. So to your first question:
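This phasor-difference view can be sketched numerically (the signal parameters are arbitrary): forming, at each sample, the phasor whose angle is the signal's phase minus the kernel's phase, and summing, reproduces the DFT bin exactly.

```python
import numpy as np

N, A = 64, 2.0
n = np.arange(N)
phi = 2 * np.pi * 5.3 * n / N + 0.7      # signal's instantaneous phase (off-bin)
x = A * np.exp(1j * phi)

k = 5
Phi_k = 2 * np.pi * k * n / N            # kernel's instantaneous phase for bin k
bin_via_phasors = A * np.sum(np.exp(1j * (phi - Phi_k)))

print(np.allclose(bin_via_phasors, np.fft.fft(x)[k]))   # True
```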

Question A: Is this intuitive picture actually equivalent to the DFT calculation (in that it has the same results), and if so, is there a good way of showing that it's true?

Not really. The DFT makes no assumptions about what your signal looks like and is not trying to optimize/maximize anything (though you might perform that search on a DFT's result!). The DFT simply compares, on a sample-by-sample basis, the signal to a set of complex exponentials defined by the frequencies you choose and averages the comparisons together. If it so happens that your signal matches the complex exponential associated with some frequency, that large sum just means your signal has consistent (apparent) phase offsets with respect to the kernel. I say apparent because if your signal is aliased, then you just have consistent phase offsets at the time instants you sample, whereas a continuous waveform would not.

I think that the Discrete Fourier Series interpretation is a bit misleading for understanding the primary topic of the question and is probably worth another question (one that has probably been answered on this site before).

Your comment about spectral leakage encourages me to dig a little deeper into this explanation. Spectral leakage is exactly the case in which your signal doesn't have a good phase offset match with any kernel you compare against (e.g., the bins in your DFT), and so your DFT representation tries to represent that signal using a bunch of different frequencies (which may or may not be satisfying for your opinionated or automated analysis). Window functions help control this leakage by reducing the contribution of the edges of the signal and smoothly focusing the phase comparisons on the middle samples of the signal, which makes it easier to represent the signal with fewer DFT components/bins.
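A sketch of the windowing point, assuming a periodic Hann window: since $w[n] = 0.5 - 0.25\,e^{j2\pi n/N} - 0.25\,e^{-j2\pi n/N}$, windowing an on-bin complex exponential moves all of its energy onto exactly three bins.

```python
import numpy as np

N, k0 = 64, 10
n = np.arange(N)
x = np.exp(2j * np.pi * k0 * n / N)            # on-bin complex exponential
w = 0.5 * (1 - np.cos(2 * np.pi * n / N))      # periodic Hann window

X = np.fft.fft(w * x)
# Analytically: X[k0] = N/2, X[k0 +/- 1] = -N/4, and every other bin is ~0.
print(X[k0], X[k0 - 1], X[k0 + 1])
```

For an off-bin input the three-bin grouping is no longer exact, but the low sidelobes of the Hann window keep the energy tightly clustered, unlike the rectangular window's slowly decaying leakage.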

But sometimes even windowing is not enough to satisfy our desires for signal detection! If your signal has a variable frequency, such as a chirp, you need an appropriate kernel that also has the phase progression of a chirp, such as the kernel of the Fractional Fourier Transform. If your frequency is constant but your amplitude changes exponentially, you may desire a kernel that accounts for that, as the Chirp Z-transform can. The whole point of all these transforms is the same: have a kernel whose phase changes very similarly to that of your expected signal (and preferably in an orthogonal manner for uniqueness of detection/estimation).
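A hedged illustration of the matched-kernel idea, using a hypothetical linear chirp whose instantaneous frequency sweeps from bin 0 to bin 16 (the sweep range and $N$ are arbitrary): a kernel with the same phase law "despins" the chirp completely, while no constant-frequency DFT kernel can.

```python
import numpy as np

N = 256
n = np.arange(N)
chirp_phase = np.pi * 16 * n**2 / N**2       # quadratic phase = linear freq sweep
x = np.exp(1j * chirp_phase)

# Matched kernel: all N per-sample phasor differences are zero, so they align.
matched = abs(np.sum(x * np.conj(np.exp(1j * chirp_phase))))   # exactly N

# Best any single constant-frequency kernel can do: energy is smeared
# across the ~16 bins the chirp sweeps through.
best_dft_bin = np.abs(np.fft.fft(x)).max()

print(matched, best_dft_bin)
```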

Question B: Why would the DFT line up the basis function in these bins so that it's (approximately) close to the original signal in phase - not at the beginning of the signal, but in the middle?

This goes back to the averaging of phase offsets by the DFT. The DFT doesn't care when any given phase offset occurred; it just takes an average of all the offsets that exist. fftshift()-ing just changes what that offset is for each sample, and the average phase offset naturally corresponds to the middle of the signal: for a constant frequency mismatch the offset drifts linearly across the window, and the average of a linearly drifting angle is the angle at the middle sample. What if your signal has a phase offset that is not biased (e.g., half the time it leads the kernel by 90° and the other half of the time it lags by 90°)? You will get destructive interference; the sum of complex exponentials will land somewhere near the origin of the complex plane - a case in which it's important to review both the phase and the amplitude of the DFT bin.
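The unbiased lead/lag example above can be sketched directly (the bin index and $N$ are arbitrary): relative to the $k_0$ kernel, the signal leads by 90° for the first half of the window and lags by 90° for the second half, so the per-sample phasors cancel at bin $k_0$.

```python
import numpy as np

N, k0 = 64, 5
n = np.arange(N)
offset = np.where(n < N // 2, np.pi / 2, -np.pi / 2)   # lead, then lag
x = np.exp(1j * (2 * np.pi * k0 * n / N + offset))

X = np.fft.fft(x)
print(abs(X[k0]))   # ~0: the +90 and -90 degree halves destructively interfere
```

The signal's energy hasn't vanished; it has simply been pushed into other bins, which is why the answer suggests reviewing both the phase and the amplitude of the bin.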

7 Comments

  • Thanks!! Could you explain further why weighting the phase comparisons more in the middle of the signal via a window function makes it easier to represent the signal with fewer frequencies? I expect the answer to this will help me understand the original motivation for this thread, which was to understand why an input sinusoid with frequency between bins only has its phase correctly identified when you "ask" the DFT what the instantaneous phase is in the middle of the signal by fftshift()-ing (second plot, first column here dsp.stackexchange.com/a/96231/74786) Commented Feb 18 at 22:48
  • As for A, of course the DFT is not making any assumptions about your signal or optimizing anything. And I'm sure you're right that the DFS interpretation is not the right viewpoint for this particular question - the picture of the phase of the DFT output as the average of the phase offsets between the signal and the basis function is much more helpful. But can you give me an example of a situation where the phase of the DFT output is not the phase which maximizes the pointwise product between the input signal and the corresponding basis function with that phase? Commented Feb 18 at 22:50
  • @empty-inch, The point about windowing making signal representation possible with fewer frequencies should probably be restated as "a tighter group of frequencies". This is because the spectral response of window functions has two qualities: 1) a wider main lobe and 2) lower sidelobe levels. If you calculate the DFT of a Hann-windowed sinusoid, it will always have strong energy grouped at ~3 bins and low energy at the rest. In contrast, a rectangular window will either have all the energy focused at one bin (with the other bins falling on nulls of the spectral response) OR spectral leakage will distribute... Commented Feb 19 at 3:56
  • that energy into many bins because the side lobes are relatively high. I'm not fully understanding your second question, as you refer to phase in several ways. If you assume an input and kernel with constant frequency (i.e., a DFT of some sinusoid), you have 1) the phase of the sinusoid, $\theta$, relative to some time, 2) the instantaneous phase, $\phi[n] = 2 \pi f_0 t_n + \theta$, 3) the instantaneous phase of the kernel, $\Phi_k[n]$, and 4) the resulting phase of the DFT at the $k$'th bin, $\arg{X[k]}$. If $\theta$ is relative to the start of the kernel ($\Phi_k[0] = 0$), then... Commented Feb 19 at 4:10
  • $\arg{X[k]} = \theta$ when the frequency of the input matches the frequency of the bin. As the bin frequency deviates from the input's frequency, the phase error will deviate linearly. Remember that because the kernel is complex, the product of the input and the kernel is also a complex value, and changing the phase of the kernel simply performs a rotation on the result. The magnitude does not change and therefore nothing can be "maximized" by changing the kernel phase. Commented Feb 19 at 4:20
