
My favorite explanation of linear regression is geometric, but not visual. It treats the data set as a single point in a high-dimensional space, rather than breaking it up into a cloud of points in two-dimensional space.

The area $a$ and price $p$ of a house are a pair of numbers, which you can think of as the coordinates of a point $(a, p)$ in two-dimensional space. The areas $a_1, \ldots, a_{1000}$ and prices $p_1, \ldots, p_{1000}$ of a thousand houses are a thousand pairs of numbers, which you can think of as the coordinates of a point $$D = (a_1, \ldots, a_{1000}, p_1, \ldots, p_{1000})$$ in two-thousand-dimensional space. For convenience, I'll call two-thousand-dimensional space "data space." Your data set $D$ is a single point in data space.
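To make this concrete, here is a small sketch (with made-up areas and prices for four houses rather than a thousand, and using NumPy) of assembling a data set into a single point in data space:

```python
import numpy as np

# Hypothetical data: areas and prices for n = 4 houses.
# With n houses, the whole data set is ONE point in 2n-dimensional space.
areas = np.array([1200.0, 1500.0, 1700.0, 2000.0])
prices = np.array([250.0, 310.0, 330.0, 410.0])

# The single data point D in "data space":
D = np.concatenate([areas, prices])
print(D.shape)  # (8,) -- one point in 2n = 8 dimensions
```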

If the relationship between area and price were perfectly linear, the point $D$ would sit in a very special region of data space, which I'll call the "linear sheet." It consists of the points $$M(\rho, \beta) = (a_1, \ldots, a_{1000}, \rho a_1 + \beta, \ldots, \rho a_{1000} + \beta).$$ The numbers $\rho$ and $\beta$ are allowed to vary, but $a_1, \ldots, a_{1000}$ are fixed to be the same areas that appear in your data set. I'm calling the linear sheet a "sheet" because it's two-dimensional: a point on it is specified by the two coordinates $\rho$ and $\beta$. If you want to get a sense of how the linear sheet is shaped, imagine a thin, straight wire stretched across three-dimensional space. The linear sheet is like that: it's perfectly flat, and its dimension is very low compared to the dimension of the space it sits inside.
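A quick sketch of the linear sheet, continuing with the same made-up four-house data: each choice of the two coordinates $\rho$ and $\beta$ picks out one point on the sheet, with the area coordinates held fixed.

```python
import numpy as np

# The same hypothetical areas as before; these stay fixed on the sheet.
areas = np.array([1200.0, 1500.0, 1700.0, 2000.0])

def model_point(rho, beta):
    """The point M(rho, beta) on the linear sheet: the area coordinates
    are fixed, and the price coordinates are the perfectly linear
    values rho*a + beta."""
    return np.concatenate([areas, rho * areas + beta])

# Varying the two numbers rho and beta sweeps out the whole
# two-dimensional sheet inside 2n-dimensional data space.
M = model_point(0.2, 10.0)
```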

In a real neighborhood, the relationship between area and price won't be perfectly linear, so the point $D$ won't sit exactly on the linear sheet. However, it might sit very close to the linear sheet. The goal of linear regression is to find the point $M(\rho, \beta)$ on the linear sheet that sits closest to the data point $D$. That point is the best linear model for the data.

Using the Pythagorean theorem, you can figure out that the square of the distance between $D$ and $M(\rho, \beta)$ is $$[p_1 - (\rho a_1 + \beta)]^2 + \ldots + [p_{1000} - (\rho a_{1000} + \beta)]^2.$$ In other words, the distance between the data point and the model point is the total squared error of the model! Minimizing the total squared error of a model is the same thing as minimizing the distance between the model and the data in data space.
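You can check this identity numerically. In the sketch below (same hypothetical four-house data), the squared Euclidean distance between $D$ and $M(\rho, \beta)$ in data space comes out equal to the model's total squared error, because the area coordinates of the two points agree exactly and only the price coordinates differ:

```python
import numpy as np

areas = np.array([1200.0, 1500.0, 1700.0, 2000.0])
prices = np.array([250.0, 310.0, 330.0, 410.0])
D = np.concatenate([areas, prices])

def model_point(rho, beta):
    return np.concatenate([areas, rho * areas + beta])

rho, beta = 0.2, 10.0
M = model_point(rho, beta)

# Squared Euclidean distance between D and M in data space...
dist_sq = np.sum((D - M) ** 2)

# ...equals the total squared error of the model, since the first
# n coordinates of D and M are identical.
sse = np.sum((prices - (rho * areas + beta)) ** 2)
assert np.isclose(dist_sq, sse)
```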

As Chris Rackauckas pointed out, calculus gives a very practical way to find the coordinates $\rho$ and $\beta$ that minimize the distance between $D$ and $M(\rho, \beta)$.
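Setting the partial derivatives of the squared distance with respect to $\rho$ and $\beta$ to zero gives a pair of linear equations, which a least-squares solver handles directly. A minimal sketch, again with the made-up four-house data, using NumPy's `lstsq`:

```python
import numpy as np

areas = np.array([1200.0, 1500.0, 1700.0, 2000.0])
prices = np.array([250.0, 310.0, 330.0, 410.0])

# Design matrix: one column multiplying rho, one constant column for beta.
A = np.column_stack([areas, np.ones_like(areas)])

# lstsq minimizes ||A @ [rho, beta] - prices||^2, which is exactly the
# squared distance between D and M(rho, beta) in data space.
(rho, beta), *_ = np.linalg.lstsq(A, prices, rcond=None)

# (rho, beta) are the coordinates of the point on the linear sheet
# closest to the data point D.
```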
