I'm trying to understand the connection between the geometric interpretation of solving an inconsistent system $A\mathbf{x} = \mathbf{b}$ and the name "least squares."
I understand the geometric approach perfectly. When an exact solution doesn't exist, the best approximate solution $\hat{\mathbf{x}}$ is the one for which $A\hat{\mathbf{x}}$ is the closest vector to $\mathbf{b}$ in the column space of $A$. This closest vector is the orthogonal projection of $\mathbf{b}$ onto the column space.
This leads to the condition that the error vector, $\mathbf{e} = \mathbf{b} - A\hat{\mathbf{x}}$, must be orthogonal to the column space of $A$. This orthogonality is expressed by the normal equation: $$A^T(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0} \implies A^T A \hat{\mathbf{x}} = A^T \mathbf{b}$$ This derivation is based entirely on the geometry of vector spaces and the concept of orthogonality.
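To make the setup concrete, here is a minimal numerical sketch (NumPy, with a made-up $3 \times 2$ inconsistent system; the particular matrix and right-hand side are purely illustrative) checking that the normal-equation solution leaves an error vector orthogonal to the columns of $A$, and that it agrees with `numpy.linalg.lstsq`:

```python
import numpy as np

# A made-up 3x2 inconsistent system: three equations, two unknowns,
# with b deliberately chosen outside the column space of A.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

# Solve the normal equations A^T A x = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# The error e = b - A x_hat should be orthogonal to col(A),
# i.e. A^T e should be (numerically) zero.
e = b - A @ x_hat
print(x_hat)                          # [0.333..., 0.333...]
print(A.T @ e)                        # ~ [0., 0.]

# Cross-check against NumPy's built-in least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_hat, x_lstsq))    # True
```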
My confusion is with the terminology. I know that one can also find this solution by using calculus to minimize the sum of the squared errors. This calculus-based method is clearly a "least squares" problem.
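For reference, the calculus version I have in mind minimizes the sum of squared errors $$S(\mathbf{x}) = \|\mathbf{b} - A\mathbf{x}\|^2 = \sum_i \big(b_i - (A\mathbf{x})_i\big)^2,$$ and setting the gradient to zero, $\nabla S = -2A^T(\mathbf{b} - A\mathbf{x}) = \mathbf{0}$, recovers exactly the normal equation above.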
My question is: Why do we also call the purely geometric/matrix method a "least squares" solution? Is it simply because its solution happens to be identical to the one derived from the calculus method? Or is there a more fundamental reason that the geometric condition of orthogonality is itself an expression of "least squares," independent of the calculus perspective?
Thank you.