I've always been comfortable with the way the DFT picks out the magnitude of a frequency component, but much less so the phase. Mathematically, I know they're fairly similar - if we assume $x[n]$ is a complex exponential with frequency that lines up with a DFT frequency bin, then the magnitude and phase each just kind of "pop out" of the sum:
$$X[k_0] = \sum_{n=0}^{N-1} x[n] \cdot e^{-j \frac{2\pi}{N} k_0 n} = \sum_{n=0}^{N-1} A e^{j\left(\frac{2\pi}{N} k_0 n + \phi\right)} \cdot e^{-j \frac{2\pi}{N} k_0 n} = A e^{j\phi} \sum_{n=0}^{N-1} 1 = N A e^{j\phi}$$
The difference comes in my intuitive picture: I’ve always just gone ahead and thought of the DFT as lining up your basis function against the signal and tweaking the phase of the basis exponential until it correlates the most (is most similar to) your original signal - until the time average of the point-wise product of the signals is maximized. Then whatever phase maximizes this is the phase of the corresponding DFT coefficient, and the magnitude of that sum is its magnitude. But obviously there's no "phase tweaking" going on mathematically. (Whereas the summing for the magnitude is right there in the definition.) So the first part of my question is:
Question A: Is this intuitive picture actually equivalent to the DFT calculation (in that it has the same results), and if so, is there’s a good way of showing that it’s true?
Now, for sure what it’s doing is finding the phase coefficients for a DFS representation of your signal (as can be seen in the IDFT synthesis equation). But it’s less clear to me why finding DFS coefficients would be accomplished by finding the phase which lines each basis function up against your signal the best it can. Now, if one of your basis functions matches the signal frequency, it’s clear: lining up that basis function against your signal and twiddling the phase/mag until it matches, is exactly what gives you the right amplitude/phase for the DFS representation. Then for the other bins, you would twiddle the phase knob for a while, notice that no matter what the phase is the sum of the pointwise product is zero, so you set the magnitude to zero and move on.
However, consider the case where none of your basis frequencies match the signal (e.g. the case with spectral leakage). If you're trying to get the most "bang for your buck” out of these nearby frequencies, then maybe setting the phase so that it correlates the most with your original signal is the right move. You’re using these components to build your signal, and maybe the best way to do this is to set the phase of each such that they’re "as close as possible" to your signal. But it’s not obvious to me that this "bang for your buck" approach would lead to the correct DFS representation. You need things to cancel off destructively and constructively, and merely making everything as "close as possible" to the original doesn’t seem like it would obviously accomplish that. I'll note that proving that setting the phase to maximize correlation would also lead to the phase terms in the DFS representation would be a satisfactory answer to my Question A.
Now for the second part of the question: In this answer I show that referencing the phase to the center of the signal with ‘fftshift()’ is necessary for getting even a semi-accurate phase measurement when your input sinusoid doesn’t line up exactly with a basis function (second plot, lower left - "zero centered" = the "correct phase" is the phase in the center of the signal, and the purple "shifted" dots are the ones with fftshift()). Moreover, it makes the phase consistently near the input sinusoid phase for nearby bins (see zero padded version in same plot, lower right - this “rock steadiness” is also corroborated by this answer on the same thread). Again, fftshift()-ing the signal effectively re-references your phase to the middle of the signal, rather than the beginning. Or equivalently, it multiplies all of your frequency bins by $(-1)^k$ - this should make sense, as if every sinusoid is $N$ periodic, shifting it over by $N/2$ is going to either shift the phase by $\pi$ or $2\pi$ (depending on if the integer number of cycles of the frequency that fit in the window is odd or even). This "rock steadiness" seems to indicate that those nearby bins are being lined up with the original signal such that they are in phase with it - at least, in the middle of the signal. This brings us to:
Question B: Why would the DFT line up the basis function in these bins so that it's (approximately) close to the original signal in phase - not at the beginning of the signal, but in the middle?
If I assume that the DFT calculation is trying to "set the phase" of the basis functions so that they correlate the most with the original signal, then I've come up with the following rough argument for why it would be best to line it up with the middle of the signal, rather than at the beginning. Assume you have a signal at frequency $f_0$ and nearby basis sinusoid at frequency $f_0 + \epsilon$. First, let's see what happens if you try lining it up so the basis sinusoid is in phase with your signal at the beginning. By the end of the signal, even the small difference in frequency will have caused the basis function to be significantly misaligned. I've plotted a signal along with the real part of a basis sinusoid (I believe we can ignore the imaginary part since it will get cancelled off by its negative frequency partner, but maybe this is part of my problem).
If, on the other hand, you line it up so they are in phase in the middle, it seems to me that it will be more aligned overall. It will still be misaligned by the edges, but less so - you've “averaged out” the misalignment on both sides.
However, I tried many different lengths of signals, and found that - while the time averaged pointwise product was usually greater in the middle-phase-alignment case - sometimes it's greater when they begin in phase, if only slightly. I'm not sure if this disproves my theory, or if it's because I am neglecting the imaginary part of the basis sinusoid (as mentioned above), or if it's an effect of the following fact: the phase of a bin nearby the signal frequency is not going to be exactly in phase with the signal (even if you fftshift()), it's going to be slightly off (the rock-steadiness shown above isn't perfectly rock-steady), and that small adjustment is enough to make the middle-aligned version win out in the end.


fftshift()circular shift has a favorable effect. But there are also "why" and "how" questions I have, such as this one. I do not believe this has been addressed in another answer, but would be more than happy to be proven wrong! $\endgroup$