So, I was learning about curve fitting methods, and the teacher showed an example where we had some data: each point was represented by $(x_1, x_2, x_3)$, and we had a measurement $y$ at that point. We wanted to approximate $y$ with a linear combination of polynomials.
We first defined:
$$ \begin{matrix} p_1(x_1, x_2, x_3) = 1 & p_2(x_1, x_2, x_3) = x_1 & p_3(x_1, x_2, x_3) = x_2 \\ p_4(x_1, x_2, x_3) = x_3 & p_5(x_1, x_2, x_3) = x_1^2 & p_6(x_1, x_2, x_3) = x_2^2 \\ p_7(x_1, x_2, x_3) = x_3^2 & p_8(x_1, x_2, x_3) = x_1x_2 & p_9(x_1, x_2, x_3) = x_1x_3 \\ p_{10}(x_1, x_2, x_3) = x_2x_3 \end{matrix} $$
We proceeded to define a matrix $A$ such that:
$$ A = \begin{bmatrix} \left<p_1, p_1\right> & \left<p_1, p_2\right> & \dots & \left<p_1, p_{10}\right> \\ \left<p_2, p_1\right> & \left<p_2, p_2\right> & \dots & \left<p_2, p_{10}\right> \\ \vdots & \vdots & \ddots & \vdots \\ \left<p_{10}, p_1\right> & \left<p_{10}, p_2\right> & \dots & \left<p_{10}, p_{10}\right> \\ \end{bmatrix} $$
And $b$ such that:
$$ b = \begin{bmatrix} \left<y, p_1\right> \\ \left<y, p_2\right> \\ \vdots \\ \left<y, p_{10}\right> \\ \end{bmatrix} $$
Finally, we solved the equation $A\vec{c} = \vec{b}$, where $\vec{c}$ is the vector of coefficients (using $c$ because $x$ was already used for the features) such that:
$$ f(x_1, x_2, x_3) = \sum_{i=1}^{10} c_i p_i(x_1, x_2, x_3) $$
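To make the procedure concrete, here is a small numpy sketch of what I believe we were doing. The data here is made up, and I'm assuming the inner product is a sum over the sample points (as I explain in Edit 2 below); the variable names are mine:

```python
import numpy as np

# Made-up sample data: n points with features (x1, x2, x3) and a measurement y.
rng = np.random.default_rng(0)
n = 50
X = rng.uniform(-1, 1, size=(n, 3))   # columns are x1, x2, x3
y = rng.uniform(-1, 1, size=n)        # measurements

# The ten basis polynomials p_1, ..., p_10 evaluated at a point (x1, x2, x3).
def basis(x1, x2, x3):
    return np.array([1.0, x1, x2, x3,
                     x1**2, x2**2, x3**2,
                     x1*x2, x1*x3, x2*x3])

# Evaluate every polynomial at every data point: P[k, i] = p_i(point k).
P = np.array([basis(*point) for point in X])

# A[i, j] = <p_i, p_j> and b[i] = <y, p_i>, with the inner product taken as
# a sum of products over the data points (see Edit 2).
A = np.array([[np.sum(P[:, i] * P[:, j]) for j in range(10)] for i in range(10)])
b = np.array([np.sum(y * P[:, i]) for i in range(10)])

c = np.linalg.solve(A, b)             # coefficients c_1, ..., c_10
print(c)
```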
The thing is, I didn't really understand why we need to compute the inner products of those polynomials to build those matrices. My first idea would be to just evaluate the polynomials at the data points and fit using the Moore-Penrose pseudoinverse. What is the difference between those two approaches? I didn't really get the teacher's explanation, and I can't find an explanation in any book.
Edit 1: I wasn't very clear on how I would approach the problem instead of my professor's solution, so I want to clarify with an example. Let's say I had the data
$$ \begin{array}{|cc|c|} \hline x_1 & x_2 & y \\ \hline 1 & 2 & 1\\ 1 & 3 & 2\\ 2 & 1 & 3\\ 2 & 2 & 2\\ \hline \end{array} $$
If I wanted to approximate it with $f(x_1, x_2) = c_1 + c_2 x_1 + c_3 x_2 + c_4 x_1^2 + c_5 x_2^2 + c_6 x_1 x_2$, I would create a matrix $A$ like this:
$$ A = \begin{bmatrix} 1 & x_1 & x_2 & x_1^2 & x_2^2 & x_1x_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 2 & 1 & 4 & 2 \\ 1 & 1 & 3 & 1 & 9 & 3 \\ 1 & 2 & 1 & 4 & 1 & 2 \\ 1 & 2 & 2 & 4 & 4 & 4 \end{bmatrix} $$
And $b$ would be:
$$ b = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 2 \end{bmatrix} $$
and then solve $A\vec{c} = \vec{b}$ (using the Moore-Penrose pseudoinverse) to find the coefficients, without computing any inner products.
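In code, my idea would look something like this small numpy sketch, where `np.linalg.pinv` is the Moore-Penrose pseudoinverse:

```python
import numpy as np

# Data from the table above: columns x1, x2 and the measurements y.
X = np.array([[1, 2],
              [1, 3],
              [2, 1],
              [2, 2]], dtype=float)
y = np.array([1, 2, 3, 2], dtype=float)

x1, x2 = X[:, 0], X[:, 1]

# Design matrix: one column per basis polynomial 1, x1, x2, x1^2, x2^2, x1*x2.
A = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

# Fit with the Moore-Penrose pseudoinverse (my proposed approach).
c = np.linalg.pinv(A) @ y
print(c)      # coefficients c_1, ..., c_6
print(A @ c)  # should reproduce y (here the system is underdetermined)
```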
Edit 2: Another clarification, on how the inner product is defined in my teacher's example. Using the same data as in my example above:
$$ \left<p_2, p_3\right> = 1\cdot2 + 1\cdot3 + 2\cdot1 + 2\cdot2 = 11 $$
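So, if I understood it correctly, for $n$ data points the inner product is just a sum over the samples, and I assume the same goes for the entries of $b$:

$$ \left<p_i, p_j\right> = \sum_{k=1}^{n} p_i\!\left(x_1^{(k)}, x_2^{(k)}\right) p_j\!\left(x_1^{(k)}, x_2^{(k)}\right), \qquad \left<y, p_i\right> = \sum_{k=1}^{n} y^{(k)}\, p_i\!\left(x_1^{(k)}, x_2^{(k)}\right) $$

where $\left(x_1^{(k)}, x_2^{(k)}, y^{(k)}\right)$ is the $k$-th row of the data table.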
Also, in this example the fit is exact because there are more features than data points, which I did to keep the example short, but you can imagine that in the real scenario I have much more data than features. The best fit is then the one that minimizes the squared error.
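To be explicit, by best fit I mean the coefficients $\vec{c}$ that minimize the sum of squared residuals over the data:

$$ \sum_{k=1}^{n} \left( y^{(k)} - f\!\left(x_1^{(k)}, x_2^{(k)}\right) \right)^2 $$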