This follows the videos at Khan Academy for proving the formula for a linear regression line. Thanks, Sal!
\[SE_{line} = \sum\limits_{i=1}^n (y_i - (mx_i+b))^2\]
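This Python sketch computes \(SE_{line}\) straight from the sum above, assuming the data arrive as two equal-length lists of numbers. The function name `squared_error` is hypothetical and is not part of the original derivation.

```python
def squared_error(xs, ys, m, b):
    """Sum of squared vertical distances from each (x_i, y_i) to the line y = m*x + b."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Points (1, 2), (2, 3), (3, 5) against the line y = 2x + 0:
# the residuals are 0, -1, -1, so the squared error is 0 + 1 + 1.
print(squared_error([1, 2, 3], [2, 3, 5], 2, 0))  # → 2
```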
\[SE_{line} = (y_1 - (mx_1+b))^2 + (y_2 - (mx_2+b))^2 + \dots + (y_n - (mx_n+b))^2\]
\[= y_1^2 - 2y_1(mx_1+b) + (mx_1+b)^2\] \[+ y_2^2 - 2y_2(mx_2+b) + (mx_2+b)^2\] \[\vdots\] \[+ y_n^2 - 2y_n(mx_n+b) + (mx_n+b)^2\]
\[=y_1^2 - 2y_1mx_1 - 2y_1b + m^2x_1^2 + 2mx_1b + b^2\] \[+y_2^2 - 2y_2mx_2 - 2y_2b + m^2x_2^2 + 2mx_2b + b^2\] \[\vdots\] \[+y_n^2 - 2y_nmx_n - 2y_nb + m^2x_n^2 + 2mx_nb + b^2\]
\[=(y_1^2 + y_2^2+\dots+y_n^2) -2m(y_1x_1 + y_2x_2 + \dots + y_nx_n) - 2b(y_1+y_2+\dots+y_n) + m^2(x_1^2+x_2^2+\dots+x_n^2) + 2mb(x_1+x_2+\dots+x_n) + nb^2\]
Note that:
\[\overline{y^2} = \frac{y_1^2+y_2^2+\dots+y_n^2}{n}\]
And so:
\[y_1^2+y_2^2+\dots+y_n^2 = n\overline{y^2}\]
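Similarly, each of the other sums is \(n\) times an average:
\[x_1y_1 + x_2y_2 + \dots+x_ny_n = n\overline{xy}\] \[y_1+y_2+\dots+y_n = n\overline{y}, \quad x_1^2+x_2^2+\dots+x_n^2 = n\overline{x^2}, \quad x_1+x_2+\dots+x_n = n\overline{x}\]
Substituting all of these: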
\[SE_{line} = n\overline{y^2} - 2mn\overline{xy} -2bn\overline{y} + m^2n\overline{x^2} + 2mbn\overline{x} + nb^2\]
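As a sanity check on the algebra, the sketch below compares the direct sum with the average-based form on a small made-up data set. Both function names are hypothetical, and the two results should agree up to floating-point rounding.

```python
def mean(values):
    return sum(values) / len(values)


def se_direct(xs, ys, m, b):
    # Sum of squared residuals, straight from the definition of SE_line
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


def se_from_means(xs, ys, m, b):
    # The same quantity written with averages, term by term as in the expansion
    n = len(xs)
    y2_bar = mean([y * y for y in ys])
    xy_bar = mean([x * y for x, y in zip(xs, ys)])
    y_bar = mean(ys)
    x2_bar = mean([x * x for x in xs])
    x_bar = mean(xs)
    return (n * y2_bar - 2 * m * n * xy_bar - 2 * b * n * y_bar
            + m * m * n * x2_bar + 2 * m * b * n * x_bar + n * b * b)


xs, ys = [1, 2, 3, 4], [2, 3, 5, 4]
print(se_direct(xs, ys, 0.8, 1.5), se_from_means(xs, ys, 0.8, 1.5))
```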
Another way of working out the above:
\[SE_{line} = \sum\limits_{i=1}^n (y_i - (mx_i + b))^2\] \[= \sum\limits_{i=1}^n \left(y_i^2 - 2y_i(mx_i + b) + (mx_i + b)^2\right)\] \[=\sum\limits_{i=1}^n \left(y_i^2 -2my_ix_i - 2y_ib + m^2x_i^2 + 2mx_ib + b^2\right)\] \[= n\overline{y^2} - 2mn\overline{xy} -2bn\overline{y} + m^2n\overline{x^2} + 2mnb\overline{x} + nb^2\]
Now we can minimize the above expression. The data, meaning every \(x_i\) and \(y_i\) and therefore every average, are constants; only \(m\) and \(b\) vary. As they vary, \(SE_{line}\) traces out a surface in three dimensions: \(m\) and \(b\) are two of the axes, and the squared error is the third. Because \(SE_{line}\) is quadratic in \(m\) and \(b\), this surface is a three-dimensional parabola (a paraboloid). The goal is to find its lowest point, where the partial derivatives with respect to both \(m\) and \(b\) are zero:
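To see the bowl numerically, this brute-force sketch evaluates the squared error on a coarse grid of \((m, b)\) values and keeps the lowest point. It only illustrates the surface; it is not how the line is actually fit. The grid bounds, step count and function names are arbitrary assumptions.

```python
def squared_error(xs, ys, m, b):
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


def grid_minimum(xs, ys, lo=-5.0, hi=5.0, steps=41):
    """Evaluate SE on a steps x steps grid of (m, b) values and return the lowest (m, b, SE) found."""
    step = (hi - lo) / (steps - 1)
    best = None
    for i in range(steps):
        m = lo + i * step
        for j in range(steps):
            b = lo + j * step
            se = squared_error(xs, ys, m, b)
            if best is None or se < best[2]:
                best = (m, b, se)
    return best


# Points lying exactly on y = 2x + 1, so the bottom of the bowl is at m = 2, b = 1 with SE = 0
print(grid_minimum([0, 1, 2, 3], [1, 3, 5, 7]))
```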
\[\frac{\partial SE}{\partial m} = 0\]
(the partial derivative with respect to the slope)
And:
\[\frac{\partial SE}{\partial b} = 0\]
(the partial derivative with respect to the y-intercept)
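As an extra check (not part of the videos), the second partial derivatives of the averaged form of \(SE_{line}\) confirm that this stationary point is a minimum, not a maximum or a saddle:
\[\frac{\partial^2 SE}{\partial m^2} = 2n\overline{x^2}, \quad \frac{\partial^2 SE}{\partial b^2} = 2n, \quad \frac{\partial^2 SE}{\partial m\,\partial b} = 2n\overline{x}\]
\[\frac{\partial^2 SE}{\partial m^2}\cdot\frac{\partial^2 SE}{\partial b^2} - \left(\frac{\partial^2 SE}{\partial m\,\partial b}\right)^2 = 4n^2\left(\overline{x^2} - (\overline{x})^2\right) = 4n\sum_{i=1}^n (x_i - \overline{x})^2\]
This is positive whenever the \(x_i\) are not all equal, and \(\partial^2 SE/\partial b^2 = 2n > 0\). Together these make the Hessian positive definite, so the paraboloid opens upward and the point where both partial derivatives vanish is its unique lowest point.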
So the next step is to take the partial derivative of \(SE_{line}\) with respect to \(m\).
\[SE_{line} = n\overline{y^2} - 2mn\overline{xy} -2bn\overline{y} + m^2n\overline{x^2} + 2mbn\overline{x} + nb^2\]
The first term, \(n\overline{y^2}\), contains no \(m\), so it is a constant with respect to \(m\) and its derivative is zero. The same is true of the third term, \(-2bn\overline{y}\), and the last term, \(nb^2\).
So:
\[\frac{\partial SE}{\partial m} = -2n\overline{xy} + 2mn\overline{x^2} +2bn\overline{x}\]
And, differentiating with respect to \(b\) in the same way:
\[\frac{\partial SE}{\partial b} = -2n\overline{y} + 2mn\overline{x} + 2nb\]
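The two partial derivatives can be checked numerically. Because \(SE_{line}\) is quadratic, a central finite difference should match the formulas up to rounding. This is a minimal sketch on made-up data, and the helper names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def squared_error(xs, ys, m, b):
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


def analytic_gradient(xs, ys, m, b):
    """dSE/dm and dSE/db from the average-based formulas."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    x2_bar = mean([x * x for x in xs])
    xy_bar = mean([x * y for x, y in zip(xs, ys)])
    d_m = -2 * n * xy_bar + 2 * m * n * x2_bar + 2 * b * n * x_bar
    d_b = -2 * n * y_bar + 2 * m * n * x_bar + 2 * n * b
    return d_m, d_b


def numeric_gradient(xs, ys, m, b, h=1e-6):
    """Central-difference approximation of the same two partial derivatives."""
    d_m = (squared_error(xs, ys, m + h, b) - squared_error(xs, ys, m - h, b)) / (2 * h)
    d_b = (squared_error(xs, ys, m, b + h) - squared_error(xs, ys, m, b - h)) / (2 * h)
    return d_m, d_b


xs, ys = [1, 2, 3, 4], [2, 3, 5, 4]
print(analytic_gradient(xs, ys, 0.5, 0.5))  # → (-38.0, -14.0)
print(numeric_gradient(xs, ys, 0.5, 0.5))
```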
Now set each of these partial derivatives equal to 0 and solve.
First, for m:
\[-2n\overline{xy} + 2mn\overline{x^2} +2bn\overline{x} = 0\] \[2n(-\overline{xy} + \overline{x^2}m + b\overline{x}) = 0\] \[-\overline{xy} + \overline{x^2}m + b\overline{x} = 0\]
Second, for b (first, second and fourth terms are all constants):
\[-2n\overline{y} + 2mn\overline{x} + 2nb = 0\] \[-\overline{y} + m\overline{x} + b = 0\]
Now rewrite, moving toward \(mx+b\) form:
\[m\overline{x^2} + b\overline{x}= \overline{xy}\] \[m\overline{x} + b = \overline{y}\]
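These two equations form a 2×2 linear system in \(m\) and \(b\). One way to solve it directly is Cramer's rule, sketched below; this is my choice of method, not the route the derivation takes. It never divides by \(\overline{x}\), so it also works when \(\overline{x} = 0\). The function name and the degenerate-case guard are assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def solve_normal_equations(xs, ys):
    """Solve  m*mean(x^2) + b*mean(x) = mean(xy)  and  m*mean(x) + b = mean(y)
    for (m, b) with Cramer's rule."""
    x_bar, y_bar = mean(xs), mean(ys)
    x2_bar = mean([x * x for x in xs])
    xy_bar = mean([x * y for x, y in zip(xs, ys)])
    # Determinant of the coefficient matrix [[x2_bar, x_bar], [x_bar, 1]];
    # zero when all x values are equal (rounding may leave a tiny nonzero value)
    det = x2_bar - x_bar * x_bar
    if det == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    m = (xy_bar - x_bar * y_bar) / det
    b = (x2_bar * y_bar - x_bar * xy_bar) / det
    return m, b


# Data with mean(x) == 0: no division by mean(x) is needed here
print(solve_normal_equations([-1, 0, 1], [1, 2, 4]))
```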
We want both of these in \(mx + b\) form, and the second is already there: it says that the point \((\overline{x},\overline{y})\) lies on the optimized (minimized) line.
To put the first in the same form, divide both sides by \(\overline{x}\) (assuming \(\overline{x} \neq 0\)):
\[m\frac{\overline{x^2}}{\overline{x}} + b = \frac{\overline{xy}}{\overline{x}}\]
Now we have another point on the minimized line: \(\left(\frac{\overline{x^2}}{\overline{x}}, \frac{\overline{xy}}{\overline{x}}\right)\).
Now we can finish the problem in two ways: (1) use the two points to find the line, or (2) solve the two equations simultaneously.
Subtract one equation from the other (multiply one by -1 first, then add them):
\[m\overline{x} + b = \overline{y}\] \[-m\frac{\overline{x^2}}{\overline{x}} - b = -\frac{\overline{xy}}{\overline{x}}\]
This results in:
\[m(\overline{x} - \frac{\overline{x^2}}{\overline{x}}) = \overline{y} - \frac{\overline{xy}}{\overline{x}}\] \[m = \frac{\overline{y} - \frac{\overline{xy}}{\overline{x}}}{\overline{x} - \frac{\overline{x^2}}{\overline{x}}}\]
Comparing this with the two points found earlier shows that it is exactly the slope through those two points: the change in \(y\) divided by the change in \(x\).
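Here is option (1) as a short sketch: compute rise over run between \((\overline{x},\overline{y})\) and \(\left(\frac{\overline{x^2}}{\overline{x}}, \frac{\overline{xy}}{\overline{x}}\right)\). The function name and sample data are made up, and the sketch assumes \(\overline{x} \neq 0\). For the sample data it returns 0.8 up to floating-point rounding, the same slope the simplified formula gives.

```python
def mean(values):
    return sum(values) / len(values)


def slope_from_two_points(xs, ys):
    """Slope of the line through (x_bar, y_bar) and (mean(x^2)/x_bar, mean(xy)/x_bar).
    Requires x_bar != 0."""
    x_bar, y_bar = mean(xs), mean(ys)
    x2_bar = mean([x * x for x in xs])
    xy_bar = mean([x * y for x, y in zip(xs, ys)])
    p1 = (x_bar, y_bar)
    p2 = (x2_bar / x_bar, xy_bar / x_bar)
    # Change in y over change in x between the two points
    return (p1[1] - p2[1]) / (p1[0] - p2[0])


print(slope_from_two_points([1, 2, 3, 4], [2, 3, 5, 4]))
```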
Next, simplify by multiplying numerator and denominator by \(\overline{x}\):
\[m = \frac{\overline{y} - \frac{\overline{xy}}{\overline{x}}}{\overline{x} - \frac{\overline{x^2}}{\overline{x}}} \times \frac{\overline{x}}{\overline{x}} = \frac{\overline{x}\overline{y}-\overline{xy}}{(\overline{x})^2 - \overline{x^2}}\]
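Multiplying the numerator and denominator by \(-1\) gives an equivalent form that is often easier to read. Expanding the products shows that its numerator and denominator are averages of deviations from the means. So the slope is the covariance of \(x\) and \(y\) divided by the variance of \(x\), and the \(\frac{1}{n}\) factors cancel:
\[m = \frac{\overline{xy} - \overline{x}\,\overline{y}}{\overline{x^2} - (\overline{x})^2} = \frac{\frac{1}{n}\sum_{i=1}^n (x_i - \overline{x})(y_i - \overline{y})}{\frac{1}{n}\sum_{i=1}^n (x_i - \overline{x})^2} = \frac{\sum_{i=1}^n (x_i - \overline{x})(y_i - \overline{y})}{\sum_{i=1}^n (x_i - \overline{x})^2}\]
The same result follows from solving the two equations directly, without dividing by \(\overline{x}\), so the formula also holds when \(\overline{x} = 0\).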
Now you can plug in the actual values to find \(m\), then use it to solve for \(b\) in \(m\overline{x} + b = \overline{y}\):
\[b = \overline{y} - m\overline{x}\]
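Putting the two final formulas together gives a minimal line-fitting sketch, assuming plain Python lists of numbers. The name `fit_line` and its error handling are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Least-squares line y = m*x + b using
    m = (mean(x)*mean(y) - mean(xy)) / (mean(x)**2 - mean(x^2))  and  b = mean(y) - m*mean(x)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar, y_bar = mean(xs), mean(ys)
    x2_bar = mean([x * x for x in xs])
    xy_bar = mean([x * y for x, y in zip(xs, ys)])
    denom = x_bar * x_bar - x2_bar
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    m = (x_bar * y_bar - xy_bar) / denom
    b = y_bar - m * x_bar
    return m, b


m, b = fit_line([1, 2, 3, 4], [2, 3, 5, 4])
print(m, b)
```

On Python 3.10 or later, the standard library's `statistics.linear_regression(xs, ys)` returns the least-squares slope and intercept, which makes a convenient cross-check.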