tl;dr: perspective-correct interpolation of NDCs doesn't work; they need to be linearly interpolated in screen-space instead.
This answer uses the math notation from this paper, section 3 (Interpolating Vertex Attributes). (I highly recommend reading the entire paper to understand how perspective-correct interpolation of vertex attributes works; the following discussion is based on it.)
Why does the code in the tutorial work?
The hardware does perspective-correct interpolation of vertex clip-space coordinates to yield a pixel's clip-space coordinates. Then the perspective divide (by the interpolated w-coordinate) in the pixel shader transforms that into NDC.
But let's dive a bit deeper and see what the interpolation really does under the hood:
Let $I_1$ and $I_2$ be the clip-space coordinates of two vertices, and let $Z_1$ and $Z_2$ be their respective view-space z-coordinates (or clip-space w-coordinates). We want to find a pixel's NDCs, mathematically $I_t/Z_t$, where $I_t$ are the pixel's clip-space coordinates and $Z_t$ is its view-space z-coordinate.
Now recall that NDCs are the coordinates of a 3D object after projecting it onto the 2D projection window (screen-space). Since perspective projection maps straight lines to straight lines, NDCs vary linearly across a projected triangle, so to find a pixel's NDCs it suffices to linearly interpolate the vertices' NDCs in screen-space. Mathematically:
$$ \tag{*}\label{*} \frac{I_t}{Z_t}=\frac{I_1}{Z_1}+s\left(\frac{I_2}{Z_2}-\frac{I_1}{Z_1}\right) $$
($s$ is a screen-space barycentric coordinate of the triangle; see the paper linked above.)
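A quick numeric check of equation (*), not part of the original answer: it builds true 3D points along a triangle edge, projects them, and confirms that their NDCs equal a screen-space lerp of the endpoint NDCs. It assumes a hypothetical toy projection in which clip-space x, y equal view-space x, y and clip-space w equals view-space z (so NDC = x/z, y/z); the vertex values and the helpers `ndc` and `lerp` are made up for illustration. The screen-space parameter $s$ of each point is recovered from the fact that $1/z$ varies linearly in screen-space.

```python
# Numeric check of equation (*): NDCs of true 3D points on an edge equal a
# screen-space lerp of the endpoint NDCs.
# Assumption (hypothetical toy projection): clip-space x, y equal view-space
# x, y and clip-space w equals view-space z, so NDC = (x/z, y/z).

def ndc(p):
    """Perspective divide of a view-space point (x, y, z) -> 2D NDC."""
    x, y, z = p
    return (x / z, y / z)

def lerp(a, b, s):
    """Component-wise linear interpolation between tuples a and b."""
    return tuple(ai + s * (bi - ai) for ai, bi in zip(a, b))

P1 = (-1.0, 0.5, 2.0)   # hypothetical view-space vertex, Z_1 = 2
P2 = (3.0, -1.0, 10.0)  # hypothetical view-space vertex, Z_2 = 10

for t in (0.0, 0.25, 0.5, 0.9, 1.0):
    P = lerp(P1, P2, t)      # a true 3D point on the edge (linear in view space)
    z_t = P[2]
    # 1/z varies linearly in screen space, which gives the point's screen-space s.
    s = (1 / z_t - 1 / P1[2]) / (1 / P2[2] - 1 / P1[2])
    expected = ndc(P)                      # I_t / Z_t
    predicted = lerp(ndc(P1), ndc(P2), s)  # right-hand side of (*)
    assert all(abs(e - q) < 1e-12 for e, q in zip(expected, predicted))
```

Note that $s$ differs from the view-space parameter $t$ except at the endpoints; that gap is exactly why a plain lerp in 3D and a lerp in screen-space disagree.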
To get this, we first let the hardware do perspective-correct interpolation of the vertex clip-space coordinates. This is equation (16) in the paper, and it gives us $I_t$, the pixel's clip-space coords: $$ I_t=\left[\frac{I_1}{Z_1}+s\left(\frac{I_2}{Z_2}-\frac{I_1}{Z_1}\right)\right]Z_t $$ Then, in the pixel shader, we divide by w, which is simply the pixel's $Z_t$ (remember that w gets interpolated as well). This gives us $I_t/Z_t$ (NDC), as desired.
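The same pipeline can be simulated numerically. This Python sketch (an illustration, not the tutorial's code) applies equation (16) to hypothetical clip-space vertices, divides by the interpolated w as the pixel shader would, and compares the result with equation (*). It assumes clip-space w equals view-space z and computes $Z_t$ from $1/Z_t = 1/Z_1 + s(1/Z_2 - 1/Z_1)$, the standard screen-space-linear behavior of $1/z$; `perspective_correct` is a hypothetical helper, not a hardware or API function.

```python
# Simulate: rasterizer interpolates clip-space position perspective-correctly
# (equation (16)), then the pixel shader divides by the interpolated w.
# Assumption: clip-space w equals view-space z; vertex values are hypothetical.

def perspective_correct(a1, a2, z1, z2, s):
    """Perspective-correct interpolation of tuple attributes at screen-space s.
    Returns the interpolated attribute and the interpolated depth Z_t."""
    inv_z_t = 1 / z1 + s * (1 / z2 - 1 / z1)   # 1/Z is linear in screen space
    z_t = 1 / inv_z_t
    attr = tuple((u / z1 + s * (v / z2 - u / z1)) * z_t for u, v in zip(a1, a2))
    return attr, z_t

I1 = (-1.0, 0.5, 1.5, 2.0)    # clip-space (x, y, z, w), w = Z_1
I2 = (3.0, -1.0, 9.8, 10.0)   # clip-space (x, y, z, w), w = Z_2
Z1, Z2 = I1[3], I2[3]

for s in (0.0, 0.3, 0.7, 1.0):
    I_t, Z_t = perspective_correct(I1, I2, Z1, Z2, s)
    # The interpolated w component is exactly Z_t.
    assert abs(I_t[3] - Z_t) < 1e-12
    # Pixel shader: divide by the interpolated w.
    ndc_pixel = tuple(c / I_t[3] for c in I_t)
    # Equation (*): linear screen-space interpolation of the vertex NDCs.
    ndc_lerp = tuple(u / Z1 + s * (v / Z2 - u / Z1) for u, v in zip(I1, I2))
    assert all(abs(p - q) < 1e-12 for p, q in zip(ndc_pixel, ndc_lerp))
```

The multiplication by $Z_t$ inside `perspective_correct` and the division by `I_t[3]` cancel, leaving only the screen-space lerp.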
So the hardware does the perspective divide and the screen-space interpolation for us, which yields the pixel's NDCs, but it then also multiplies by $Z_t$, which transforms from NDC back to clip-space. What the perspective divide in the pixel shader really does is transform back to NDC by cancelling out that $Z_t$.
What if we do the division in the vertex shader instead?
The vertex NDCs are $I_1/Z_1$ and $I_2/Z_2$, and the perspective-correct interpolation equation gives: $$ \left[\frac{I_1}{Z_1^2}+s\left(\frac{I_2}{Z_2^2}-\frac{I_1}{Z_1^2}\right)\right]Z_t $$ which is clearly not $I_t/Z_t$ as defined in $\eqref{*}$. This equation fails here because its derivation requires the attribute to vary linearly across the triangle in 3D space (view space). Screen-space NDC coords don't vary linearly in 3D space because of the perspective divide, so the premise is broken. (Clip-space coords do vary linearly in view space because they are the result of a linear transform.)
Perspective-correct interpolation of values that have already undergone the perspective divide doesn't work. They no longer live in 3D space but in 2D projection space, and thus require linear interpolation in screen-space to give correct results. In practice, this is accomplished by perspective-correct interpolation of clip-space coords plus a per-pixel perspective divide.
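To see the failure numerically, here is a sketch that feeds vertex NDCs (clip coords already divided by w) into perspective-correct interpolation. It assumes clip-space w equals view-space z, uses hypothetical vertex values, and defines a hypothetical helper `perspective_correct`; the result matches the $I/Z^2$ expression above but not equation (*).

```python
# Perspective-correct interpolation applied to vertex NDCs, i.e. what happens
# when the perspective divide is done in the vertex shader.
# Assumptions: clip-space w equals view-space z; all values are hypothetical.

def perspective_correct(a1, a2, z1, z2, s):
    """Perspective-correct interpolation of tuple attributes at screen-space s.
    Returns the interpolated attribute and the interpolated depth Z_t."""
    z_t = 1 / (1 / z1 + s * (1 / z2 - 1 / z1))
    attr = tuple((u / z1 + s * (v / z2 - u / z1)) * z_t for u, v in zip(a1, a2))
    return attr, z_t

I1 = (-1.0, 0.5, 1.5, 2.0)   # clip-space (x, y, z, w), w = Z_1
I2 = (3.0, -1.0, 9.8, 10.0)  # clip-space (x, y, z, w), w = Z_2
Z1, Z2 = I1[3], I2[3]

ndc1 = tuple(c / Z1 for c in I1)   # divide done per vertex
ndc2 = tuple(c / Z2 for c in I2)

for s in (0.3, 0.5, 0.7):
    wrong, Z_t = perspective_correct(ndc1, ndc2, Z1, Z2, s)
    # The I/Z^2 expression derived in the text.
    formula = tuple((u / Z1**2 + s * (v / Z2**2 - u / Z1**2)) * Z_t
                    for u, v in zip(I1, I2))
    # Equation (*): the NDCs we actually want.
    correct = tuple(u / Z1 + s * (v / Z2 - u / Z1) for u, v in zip(I1, I2))
    assert all(abs(w - f) < 1e-12 for w, f in zip(wrong, formula))
    # At least the x component is visibly off.
    assert any(abs(w - c) > 1e-3 for w, c in zip(wrong, correct))
```

For example, at $s=0.5$ the x component comes out near $-0.37$ instead of $-0.1$.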
Fun fact: $Z_{ndc}$ has the form $Z_{ndc}=A+\frac{B}{z}$ (where $z$ is in view space) and is also linearly interpolated in screen-space for z-buffering:
- Perspective-correct interpolation of clip-space z of the form $z'=Az+B$: $$ \left[\frac{Az_1+B}{z_1}+s\left(\frac{Az_2+B}{z_2}-\frac{Az_1+B}{z_1}\right)\right]z_t = \left\{A+\frac{B}{z_1}+s\left[A+\frac{B}{z_2}-\left(A+\frac{B}{z_1}\right)\right]\right\}z_t $$
- Perspective divide per-pixel (div by $z_t$): $$ A+\frac{B}{z_1}+s\left[A+\frac{B}{z_2}-\left(A+\frac{B}{z_1}\right)\right] $$
Thus the two steps together amount to a screen-space lerp of $Z_{ndc}$ from $A+\frac{B}{z_1}$ to $A+\frac{B}{z_2}$.
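A numeric check of the z-buffer claim, assuming a Direct3D-style projection where w is view-space z and depth maps to [0, 1], so $A = f/(f-n)$ and $B = -nf/(f-n)$ for near and far planes $n$ and $f$. The near/far distances and vertex depths are hypothetical; the loop interpolates $z'$ perspective-correctly, divides by $z_t$, and compares the result with a screen-space lerp of $Z_{ndc}$.

```python
# Perspective-correct interpolation of z' = A*z + B, then per-pixel divide by
# z_t, versus a screen-space lerp of Z_ndc = A + B/z.
# Assumption: D3D-style projection (w = view-space z, depth in [0, 1]);
# near/far planes and vertex depths are hypothetical.

n, f = 0.1, 100.0
A = f / (f - n)
B = -n * f / (f - n)

def z_ndc(z):
    """NDC depth of a view-space depth z."""
    return A + B / z

z1, z2 = 2.0, 50.0   # view-space depths of two vertices

for s in (0.0, 0.2, 0.5, 0.8, 1.0):
    z_t = 1 / (1 / z1 + s * (1 / z2 - 1 / z1))        # interpolated w
    c1, c2 = A * z1 + B, A * z2 + B                    # clip-space z'
    clip_z_t = (c1 / z1 + s * (c2 / z2 - c1 / z1)) * z_t   # perspective-correct
    depth = clip_z_t / z_t                             # per-pixel divide
    lerped = z_ndc(z1) + s * (z_ndc(z2) - z_ndc(z1))   # screen-space lerp
    assert abs(depth - lerped) < 1e-12
```

With these constants $Z_{ndc}$ is 0 at the near plane and 1 at the far plane.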
Proj texture-accessing functions that do the divide for you. I'm surprised that the tutorial doesn't bother to use them, since letting you know they exist is half the point of such a tutorial in the first place.