The goal of the least squares method is to choose the estimates \(\hat{\beta}_1\) and \(\hat{\beta}_2\) so that the sum of squared residuals (SSR) is minimized. The residual \(e_i\) of observation \(i\) is defined as:
\[ e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_1 + \hat{\beta}_2 x_i) \]
The sum of squared residuals (SSR) is:
\[ SSR = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n \left( y_i - (\hat{\beta}_1 + \hat{\beta}_2 x_i) \right)^2 \]
To minimize \(SSR\), we take the partial derivatives of \(SSR\) with respect to \(\hat{\beta}_1\) and \(\hat{\beta}_2\), set them to zero, and solve for \(\hat{\beta}_1\) and \(\hat{\beta}_2\). This gives us the normal equations:
\[ \frac{\partial SSR}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^n (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0 \]
\[ \frac{\partial SSR}{\partial \hat{\beta}_2} = -2 \sum_{i=1}^n x_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0 \]
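As a sketch of the intermediate step (not spelled out in the original), dividing each normal equation by \(-2\) and separating the sums gives a linear system in \(\hat{\beta}_1\) and \(\hat{\beta}_2\):
\[ \sum_{i=1}^n y_i = n \hat{\beta}_1 + \hat{\beta}_2 \sum_{i=1}^n x_i \]
\[ \sum_{i=1}^n x_i y_i = \hat{\beta}_1 \sum_{i=1}^n x_i + \hat{\beta}_2 \sum_{i=1}^n x_i^2 \]
Dividing the first equation by \(n\) gives \(\bar{y} = \hat{\beta}_1 + \hat{\beta}_2 \bar{x}\). Substituting \(\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}\) into the second equation leaves a single equation in \(\hat{\beta}_2\):
\[ \hat{\beta}_2 \left( \sum_{i=1}^n x_i^2 - n \bar{x}^2 \right) = \sum_{i=1}^n x_i y_i - n \bar{x} \bar{y} \]
where \(\sum x_i^2 - n \bar{x}^2 = \sum (x_i - \bar{x})^2\) and \(\sum x_i y_i - n \bar{x} \bar{y} = \sum (x_i - \bar{x})(y_i - \bar{y})\).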
Solving these equations yields the least squares estimators:
\[ \hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x} \]
\[ \hat{\beta}_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]
Here:
- \(\bar{y}\) is the mean of \(y\),
- \(\bar{x}\) is the mean of \(x\),
- \(\hat{\beta}_1\) is the intercept,
- \(\hat{\beta}_2\) is the slope.
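The closed-form estimators can be checked numerically. The following is a minimal Python sketch using only the standard library. The function name `ols_fit` and the five data points are illustrative assumptions, not part of the original material.

```python
def ols_fit(x, y):
    """Return (beta1_hat, beta2_hat) for the line y = beta1 + beta2 * x.

    Hypothetical helper: implements the closed-form least squares
    estimators for a single regressor.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Numerator and denominator of the slope estimator.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    beta2_hat = s_xy / s_xx                # slope
    beta1_hat = y_bar - beta2_hat * x_bar  # intercept
    return beta1_hat, beta2_hat


# Made-up sample data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

beta1_hat, beta2_hat = ols_fit(x, y)
residuals = [yi - (beta1_hat + beta2_hat * xi) for xi, yi in zip(x, y)]
print(f"intercept = {beta1_hat:.4f}, slope = {beta2_hat:.4f}")
```

For this sample the intercept is about 2.2 and the slope about 0.6. The residuals sum to roughly zero, which is what the first normal equation requires.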
Total variation is the sum of squared deviations of \(y_i\) from the mean \(\bar{y}\):
\[ \text{Total Variation} = \sum_{i=1}^n (y_i - \bar{y})^2 \]
Unexplained variation is the sum of squared residuals \(e_i\), which is the same quantity as the SSR defined earlier:
\[ \text{Unexplained Variation} = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n \left( y_i - \hat{y}_i \right)^2 \]
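The two sums can be computed directly from their definitions. The Python sketch below uses made-up sample data and refits the line inline so that it runs on its own. All variable names are illustrative assumptions.

```python
# Made-up sample data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates from the closed-form formulas.
beta2_hat = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)
beta1_hat = y_bar - beta2_hat * x_bar

# Fitted values on the regression line.
y_hat = [beta1_hat + beta2_hat * xi for xi in x]

# Total variation: squared deviations of y from its mean.
total_variation = sum((yi - y_bar) ** 2 for yi in y)

# Unexplained variation: squared residuals (the SSR).
unexplained_variation = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

print(f"total = {total_variation:.4f}, unexplained = {unexplained_variation:.4f}")
```

For this sample the total variation is 6 and the unexplained variation is about 2.4, so the fitted line leaves less than half of the variation in \(y\) unexplained.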
In Google Sheets, each of these sums can be computed with the SUM function applied to a column of squared deviations.
Instruction: Use a t-distribution table to find the two-sided critical value for a 95% confidence level (5% significance level) with \(n-2\) degrees of freedom.
Why? If the absolute value of the t-statistic exceeds the critical value, we reject the null hypothesis (\(H_0: \beta_2 = 0\)).
Rule of thumb for large \(n\): reject \(H_0\) if \(t_{\hat{\beta}_2} < -2\) or \(t_{\hat{\beta}_2} > 2\).
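The test can be sketched in Python as follows. The text does not give the standard error of the slope, so this sketch assumes the usual textbook formula \(se(\hat{\beta}_2) = \sqrt{s^2 / \sum (x_i - \bar{x})^2}\) with \(s^2 = SSR/(n-2)\). The function name, the sample data, and the hard-coded critical value (3.182, the two-sided 5% entry for 3 degrees of freedom in a standard t-table) are likewise illustrative assumptions.

```python
import math


def slope_t_statistic(x, y):
    """Return (t_statistic, degrees_of_freedom) for H0: beta2 = 0.

    Hypothetical helper; assumes the textbook standard error
    se(beta2_hat) = sqrt(s^2 / sum((x_i - x_bar)^2)), s^2 = SSR / (n - 2).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")

    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    beta2_hat = s_xy / s_xx
    beta1_hat = y_bar - beta2_hat * x_bar

    # Sum of squared residuals and the residual variance estimate.
    ssr = sum((yi - beta1_hat - beta2_hat * xi) ** 2 for xi, yi in zip(x, y))
    s_squared = ssr / (n - 2)

    se_beta2 = math.sqrt(s_squared / s_xx)
    return beta2_hat / se_beta2, n - 2


# Made-up sample data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

t_stat, df = slope_t_statistic(x, y)

# Two-sided 5% critical value for df = 3, read from a t-table (assumed input).
T_CRITICAL_DF3 = 3.182
reject_h0 = abs(t_stat) > T_CRITICAL_DF3

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

With this small sample the t-statistic is roughly 2.12 on 3 degrees of freedom. That is above 2 but below the tabulated critical value, so \(H_0\) is not rejected. This illustrates why the \(\pm 2\) rule of thumb should be reserved for large \(n\).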