It is stated that the equation for the best $\hat{x}$ uses the covariance matrix $V$: $$ A^TV^{-1}A\hat{x}=A^TV^{-1}b $$ and that the quantity to minimize is the weighted error $E=(b-Ax)^TV^{-1}(b-Ax)$, which for a diagonal $V$ with entries $\sigma_i^2$ becomes $E=\sum_{i=1}^m\dfrac{(b-Ax)_i^2}{\sigma_i^2}$.
How do I make sense of this idea of weighted least squares in comparison with the ordinary least squares method?
And how do the inverses of the variances become the respective weights?
My Understanding
Least square
We have a system $Ax=b$ and $b$ need not have to be in the column space of $A$.
We need the vector $A\hat{x}$ in $C(A)$ that is the best fit to $b$, i.e., closest to $b$.
So we need to minimize the squared length of the error vector $\vec{e}=b-Ax$, i.e., we need to minimize $e=e_1^2+\cdots+e_m^2=(b-Ax)^T(b-Ax)=\|b-Ax\|^2=\sum_{i=1}^m(b-Ax)_i^2$
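A small NumPy sketch of this projection idea (the system $Ax=b$ here is made up for illustration): solving the normal equations $A^TA\hat{x}=A^Tb$ gives the $\hat{x}$ whose error $b-A\hat{x}$ is orthogonal to $C(A)$.

```python
import numpy as np

# Hypothetical overdetermined system: three equations, two unknowns,
# so b is generally not in the column space of A.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 4.0])

# Normal equations A^T A x_hat = A^T b project b onto C(A).
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# The error vector e = b - A x_hat is orthogonal to every column of A.
e = b - A @ x_hat
print(x_hat)
print(A.T @ e)  # ≈ [0, 0]: error is perpendicular to C(A)
```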
Weighted Least square
I think here we assign a weight $w_i$ to each error, since the individual errors are not all equally reliable.
Then we need to minimize the weighted error $e=\sum_{i=1}^m w_ie_i^2=(b-Ax)^TW(b-Ax)=\sum_{i=1}^m w_i(b-Ax)_i^2$, where $W$ is the diagonal matrix with entries $w_i$.
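If it helps to connect this quadratic with the normal equations quoted at the top: setting its gradient with respect to $x$ to zero gives, for symmetric positive definite $W$,
$$\nabla_x\,(b-Ax)^TW(b-Ax)=-2A^TW(b-Ax)=0\quad\Longrightarrow\quad A^TWA\hat{x}=A^TWb,$$
which matches the quoted equation exactly when $W=V^{-1}$.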
But why is the weight matrix $W$ the inverse of the covariance matrix, $W=V^{-1}$?
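One linear-algebra way to check the claim numerically (matrix and $\sigma_i$ values are made up for illustration): with diagonal $V$, solving $A^TV^{-1}A\hat{x}=A^TV^{-1}b$ gives the same answer as ordinary least squares applied to the rescaled system in which equation $i$ is divided by $\sigma_i$, so that every rescaled error has variance $1$.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 4.0])
sigma = np.array([1.0, 0.5, 2.0])   # assumed standard deviations per equation
Vinv = np.diag(1.0 / sigma**2)      # W = V^{-1} for diagonal V

# Weighted normal equations: A^T V^{-1} A x_hat = A^T V^{-1} b
x_w = np.linalg.solve(A.T @ Vinv @ A, A.T @ Vinv @ b)

# Ordinary least squares on the rescaled system (row i divided by sigma_i):
As = A / sigma[:, None]
bs = b / sigma
x_s = np.linalg.solve(As.T @ As, As.T @ bs)

print(np.allclose(x_w, x_s))  # True: weighting by V^{-1} = rescaling to unit variance
```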
Reference: Page 557, Introduction to Linear Algebra, Gilbert Strang
Note: I am actually looking for an explanation based on linear algebra, without going deep into statistics.

