I'm reading Brian McFee's excellent book Digital Signals Theory.
He presents the DFT as a measure of similarity $S$ between a digital signal $x[n]$ with $N$ samples and a set of reference signals $y_m[n]$, where $m$ is one of the set of analysis frequencies $m = 0, 1, 2, ... N-1$. So the total similarity $S(x, y_m)$ is the sum across all analysis frequencies. (This presentation makes a lot of sense to me, but I haven't seen it in other sources, and it's central to my question, so I'm stating it first.)
He first introduces this similarity using cosine waves. Then, to motivate the need to use the complex sinusoid $e^{-j\phi}$ as our reference signals / basis function, he shows that, in the worst case scenario, measuring the similarity of a signal $x[n]$ at an analysis frequency $m$ with the cosine waves could give us a similarity score of 0.
I'll show the derivation since that's what my question is about.
Let $x[n]$ be a sine wave at analysis frequency $f_m=3/40$ (where the signal has a duration of $N=40$ samples and we complete $m=3$ cycles per period at the sampling frequency $f_s=20$). $$ x[n] = sin(2\pi \cdot \frac{3}{40} \cdot n) $$
We can convert this sin wave into a standard cosine: $$ \begin{align} x[n] &= \text{sin}(2 \pi \cdot \frac{3}{N} \cdot n) &\text{by definition of }x \newline &= \text{cos}(\frac{\pi}{2} - 2\pi \cdot \frac{3}{N} \cdot n) &\text{by the conversion rule} \newline &= \text{cos}(2\pi \cdot \frac{3}{N} \cdot n - \frac{\pi}{2}) &\text{cos}(\phi) = \text{cos}(-\phi) \newline \end{align} $$
Then if we measure similarity with our cosine waves we have: $$ \begin{align} S(x, y_m) &= \sum_{n=0}^{N-1} \text{cos}(2\pi \cdot \frac{3}{N} \cdot n - \frac{\pi}{2}) \cdot \text{cos}(2\pi \cdot \frac{m}{N} \cdot n) \newline &= \frac{1}{2} \sum_{n=0}^{N-1} \left( \text{cos}(2\pi \cdot \frac{3 + m}{N} \cdot n - \frac{\pi}{2}) + \text{cos}(2\pi \cdot \frac{3 - m}{N} \cdot n - \frac{\pi}{2}) \right) \newline \end{align} $$ where the second line follows from the product-sum rule from trigonemetry.
When $m = 3$, the first term sums to $0$ (because the sum of samples of any sinusoid that completes a whole number of cycles in the duration of the signal will be zero). The second term $(3 -m = 0)$ will simplify to $$ \text{cos} \left(2\pi \cdot \frac{0}{N} \cdot n - \frac{\pi}{2} \right) = \text{cos} \left( - \frac{\pi}{2} \right) = \text{cos} \left( \frac{\pi}{2} \right) = 0. $$
So in the worse case, when the phase offset is $\phi = \frac{\pi}{2}$, we'd get $S = 0$.
Finally, the chapter states that more generally, for a signal with offset $\phi$ we have the similarity scores
$$ S(x, y_m) = \begin{cases} \frac{N}{2} \cdot \text{cos}(\phi) & \text{if } m \neq \frac{N}{2} \cr N \cdot \text{cos}(\phi) & \text{if } m = \frac{N}{2} \end{cases} $$
and that this can be shown by plugging in to the derivation above.
Intuitively this last claim makes sense to me. The similarity score will vary with $\phi$.
But I can't get this derivation to work.
Instead I get $$ \begin{align} S(x, y_m) &= \sum_{n=0}^{N-1} \text{cos}(2\pi \cdot \frac{m}{N} \cdot n + \phi) \cdot \text{cos}(2\pi \cdot \frac{m}{N} \cdot n) \newline &= \frac{1}{2} \sum_{n=0}^{N-1} \left( \text{cos}(2\pi \cdot \frac{m + m}{N} \cdot n + \phi) + \text{cos}(\phi) \right) \newline \end{align} $$
And I don't see how to reduce this to the cases above.
What am I missing?
I realize this might be simple math mistake, and less of a DSP-specific question, but I'm hoping this forum will be quicker to spot it. Thanks!