tl;dr: perspective-correct interpolation of NDCs doesn't give us what we want; we need to _linearly_ interpolate them in screen-space.
----------------------------------------
This answer uses math notation from [this paper](https://www.comp.nus.edu.sg/~lowkl/publications/lowk_persp_interp_techrep.pdf), _section 3: interpolating vertex attributes_. (_I highly recommend reading the entire paper_ to understand how perspective-correct interpolation of vertex attributes works; the following discussion is based on it.)
Let $I_1$ and $I_2$ be _clip-space_ coordinates of two vertices, and let $Z_1$ and $Z_2$ be their respective view-space _z_-coordinate (or clip-space _w_-coordinate).
We want to find a pixel's NDCs, or mathematically: $I_t/Z_t$ where $I_t$ is its clip-space coordinates and $Z_t$ is the view-space _z_-coordinate.
Now recall that NDCs are coordinates of a 3D object _after projecting it on the 2D projection window_ (screen-space). So to find a pixel's NDCs in screen-space, it suffices to linearly interpolate the vertices' NDCs _in screen-space_. Mathematically:
$$
\tag{*}\label{*}
\frac{I_t}{Z_t}=\frac{I_1}{Z_1}+s\left(\frac{I_2}{Z_2}-\frac{I_1}{Z_1}\right)
$$
($s$ is a screen-space barycentric coordinate, see in the paper linked above).
To get this, we first let the hardware do perspective-correct interpolation of the vertex _clip-space_ coordinates. This is **equation (16)** in the paper and it gives us $I_t$ - the pixel's _clip-space_ coords:
$$
I_t=\left[\frac{I_1}{Z_1}+s\left(\frac{I_2}{Z_2}-\frac{I_1}{Z_1}\right)\right]Z_t
$$
Then in the pixel shader we divide by _w_ which is simply the $Z_t$ of a pixel (remember that _w_ gets interpolated as well), which gives us $I_t/Z_t$ (NDC) as desired.
So _the hardware does the perspective divide and interpolation for us_, which gives the pixel's NDCs, but it also multiplies by $Z_t$ which transforms from NDC back to clip-space. So all that's left to do is transform back to NDC by cancelling out the $Z_t$ using division (another perspective divide).
---
Now let's see what happens if we do the division in the vertex shader instead:
The vertex NDCs are $I_1/Z_1$ and $I_2/Z_2$ and the perspective-correct interpolation equation gives:
$$
\left[\frac{I_1}{Z_1^2}+s\left(\frac{I_2}{Z_2^2}-\frac{I_1}{Z_1^2}\right)\right]Z_t
$$
which is clearly not $I_t/Z_t$ as defined in $\eqref{*}$. So perspective-correct interpolation of NDCs is wrong and _linear_ interpolation in screen-space is really what we want.
And yes, to get correctly interpolated NDCs we need to do the perspective division _per-pixel_.