Session Overview


Session 1: Calculating Residuals, \(R^2\), and Error Variance (45 minutes)

1. Refresher on Parameter Selection by Least Squares Method

The goal of the least squares method is to select the parameters \(\hat{\beta}_1\) and \(\hat{\beta}_2\) such that the sum of squared residuals (SSR) is minimized. The residuals (\(e_i\)) are defined as:

\[ e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_1 + \hat{\beta}_2 x_i) \]

The sum of squared residuals (SSR) is:

\[ SSR = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n \left( y_i - (\hat{\beta}_1 + \hat{\beta}_2 x_i) \right)^2 \]

To minimize \(SSR\), we take the partial derivatives of \(SSR\) with respect to \(\hat{\beta}_1\) and \(\hat{\beta}_2\), set them to zero, and solve for \(\hat{\beta}_1\) and \(\hat{\beta}_2\). This gives us the normal equations:

\[ \frac{\partial SSR}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^n (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0 \]

\[ \frac{\partial SSR}{\partial \hat{\beta}_2} = -2 \sum_{i=1}^n x_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0 \]

Solving these equations yields the least squares estimators:

\[ \hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x} \]

\[ \hat{\beta}_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

Here:
  • \(\bar{y}\) is the mean of \(y\),
  • \(\bar{x}\) is the mean of \(x\),
  • \(\hat{\beta}_1\) is the intercept,
  • \(\hat{\beta}_2\) is the slope.
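
As a minimal sketch of the estimator formulas above, the following Python snippet computes the slope and intercept directly from the sums and then checks both normal equations. The function name `fit_simple_ols` and the five data points are illustrative assumptions; they are not the Advertising-Sales dataset.

```python
def fit_simple_ols(x, y):
    """Return (beta1_hat, beta2_hat): the least squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta2_hat = s_xy / s_xx
    beta1_hat = y_bar - beta2_hat * x_bar
    return beta1_hat, beta2_hat


# Made-up toy data (NOT the course dataset)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

beta1_hat, beta2_hat = fit_simple_ols(x, y)
residuals = [yi - (beta1_hat + beta2_hat * xi) for xi, yi in zip(x, y)]

# Both normal equations should hold up to floating-point rounding
print("intercept:", beta1_hat, "slope:", beta2_hat)
print("sum of e_i:", sum(residuals))
print("sum of x_i * e_i:", sum(xi * ei for xi, ei in zip(x, residuals)))
```

For these toy numbers the slope is 0.6 and the intercept is 2.2, and both sums are zero apart from rounding error, which is exactly what the two normal equations require.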


2. Refresher on Residuals and \(R^2\) (10 minutes)

  • Objective: Recap the concepts of residuals and \(R^2\) and their interpretations.
  • Key Points:
    • Residuals (\(e_i\)): \[ e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_1 + \hat{\beta}_2 x_i) \]
      • What is it?: The difference between the actual value (\(y_i\)) and the predicted value (\(\hat{y}_i\)).
      • Why is it important?: Residuals help us measure the error in our predictions. Smaller residuals indicate a better-fitting model.
    • \(R^2\) (Coefficient of Determination): \[ R^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = 1 - \frac{\text{Unexplained Variation}}{\text{Total Variation}}=1 - \frac{\sum e_i^2}{\sum (y_i - \bar{y})^2} \]
Step 1: Define Total Variation

Total variation is the sum of squared deviations of \(y_i\) from the mean \(\bar{y}\):

\[ \text{Total Variation} = \sum_{i=1}^n (y_i - \bar{y})^2 \]

Step 2: Define Unexplained Variation

Unexplained variation is the sum of squared residuals (\(e_i\)):

\[ \text{Unexplained Variation} = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n \left( y_i - \hat{y}_i \right)^2 \]

  • What is it?: \(R^2\) measures the proportion of variation in \(y\) that is explained by \(x\).
  • Why is it important?: A high \(R^2\) (close to 1) indicates that the model explains a large portion of the variation in \(y\). A low \(R^2\) (close to 0) indicates that the model does not explain much of the variation.
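
The two forms of \(R^2\) given above agree because, for a least squares line that includes an intercept, total variation splits exactly into an explained part and an unexplained part. As a short worked step (writing explained variation as \(\sum (\hat{y}_i - \bar{y})^2\) is the standard definition, stated here as an assumption because it is not spelled out above):

\[ \sum_{i=1}^n (y_i - \bar{y})^2 = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^n e_i^2 \]

Dividing both sides by total variation gives

\[ 1 = \frac{\text{Explained Variation}}{\text{Total Variation}} + \frac{\text{Unexplained Variation}}{\text{Total Variation}} \]

which rearranges to \(R^2 = 1 - \sum e_i^2 / \sum (y_i - \bar{y})^2\) and also shows why \(R^2\) lies between 0 and 1.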

3. Hands-On Calculation of Residuals and \(R^2\) (15 minutes)

  • Objective: Calculate residuals and \(R^2\) manually using Google Sheets.
  • Exercise: Use the Advertising-Sales dataset.
    • Step 1: The squared residuals \(e_i^2\) and their sum were calculated in the previous session, so here compute only \((y_i - \bar{y})^2\) for each observation.
      • Instruction: Subtract \(\bar{y}\) from each \(y_i\) and square the result.
    • Step 2: Sum \((y_i - \bar{y})^2\) to get \(\sum (y_i - \bar{y})^2\).
      • Instruction: Use the SUM function in Google Sheets.
      • Why?: These sums are used to calculate \(R^2\).
    • Step 3: Calculate \(R^2\) using the formula. \[ R^2 = 1 - \frac{\sum e_i^2}{\sum (y_i - \bar{y})^2} \]
      • Instruction: Divide \(\sum e_i^2\) by \(\sum (y_i - \bar{y})^2\), then subtract the result from 1.
      • Why?: \(R^2\) tells us how well the model explains the variation in \(y\). For example, if \(R^2 = 0.85\), it means that 85% of the variation in sales is explained by advertising spending.
    • Discussion: What does \(R^2\) tell us about the model’s explanatory power? How well does advertising spending explain sales? Are there other factors that might influence sales?
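
The spreadsheet steps above can be mirrored in a few lines of Python as a cross-check. This is only a sketch on made-up toy numbers, not the Advertising-Sales dataset, and the variable names are illustrative.

```python
# Made-up toy data (NOT the course dataset)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# Step 1: squared residuals and squared deviations from the mean
squared_residuals = [(yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)]
squared_deviations = [(yi - y_bar) ** 2 for yi in y]

# Step 2: the two sums (what SUM does in a spreadsheet)
unexplained = sum(squared_residuals)
total = sum(squared_deviations)

# Step 3: R^2 = 1 - unexplained / total
r_squared = 1 - unexplained / total
print("unexplained:", unexplained, "total:", total, "R^2:", r_squared)
```

For the toy data the unexplained variation is 2.4 and the total variation is 6, so \(R^2\) is 0.6: the fitted line accounts for 60% of the variation in \(y\).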

4. Estimate of Error Variance (\(s^2\)) and Why We Use \(s\) (15 minutes)

  • Objective: Calculate the estimate of error variance (\(s^2\)) and explain why we use \(s\) (the standard error of the regression) instead of the variance of the regression error term.
  • Key Points:
    • Estimate of Error Variance (\(s^2\)): \[ s^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2 \]
      • What is it?: \(s^2\) estimates the variance of the error term (\(\varepsilon_i\)) in the regression model.
      • Why is it important?: It measures the variability of the residuals, which helps us assess the accuracy of the regression model.
    • Why We Use \(s\) Instead of the Variance of the Regression Error Term:
      • The true variance of the regression error term (\(\sigma^2\)) is unknown in practice. We estimate it using \(s^2\), which is based on the residuals (\(e_i\)).
      • \(s\) (the standard error of the regression) is the square root of \(s^2\): \[ s = \sqrt{s^2} \]
      • \(s\) is used in hypothesis testing, confidence intervals, and prediction intervals because it provides a measure of the spread of the residuals around the regression line.
      • Working with \(s\) rather than the unknown true standard deviation of the errors (\(\sigma\)) reflects the fact that we only have sample data and must estimate the variability of the errors.
    • Why \(n-2\)?:
      • \(n-2\) is the degrees of freedom, where \(n\) is the number of observations and 2 is the number of parameters estimated (\(\hat{\beta}_1\) and \(\hat{\beta}_2\)).
      • Why is it important?: Dividing by \(n-2\) instead of \(n\) gives us an unbiased estimate of the error variance.
  • Exercise: Use the Advertising-Sales dataset.
    • Step 1: Compute \(s^2\) using the formula. \[ s^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2 \]
      • Instruction: Divide the sum of squared residuals by \(n-2\).
      • Why?: This gives us an unbiased estimate of the error variance.
    • Step 2: Compute \(s\) (the standard error of the regression).
      • Instruction: Take the square root of \(s^2\).
      • Why?: \(s\) is measured in the same units as \(y\), and it feeds into the standard errors used for hypothesis tests and confidence intervals.
    • Discussion: Why do we use \(s\) instead of the true variance (\(\sigma^2\))? How does \(s\) help us understand the uncertainty in our regression model?
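
A minimal sketch of the two exercise steps, assuming the residuals are already available as a list. The helper name `error_variance` and the residual values are hypothetical; the last line shows the divide-by-\(n\) version only for comparison.

```python
import math


def error_variance(residuals, n_params=2):
    """Return s^2 = (sum of squared residuals) / (n - n_params)."""
    n = len(residuals)
    if n <= n_params:
        raise ValueError("need more observations than estimated parameters")
    ssr = sum(e ** 2 for e in residuals)
    return ssr / (n - n_params)


# Hypothetical residuals from a fitted line (toy values, NOT the course dataset)
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]

s_squared = error_variance(residuals)  # Step 1: divide SSR by n - 2
s = math.sqrt(s_squared)               # Step 2: standard error of the regression

# For comparison only: dividing by n gives a smaller value
divided_by_n = sum(e ** 2 for e in residuals) / len(residuals)
print("s^2:", s_squared, "s:", s, "SSR / n:", divided_by_n)
```

With five residuals whose squares sum to 2.4, \(s^2 = 2.4 / 3 = 0.8\), while dividing by \(n = 5\) would give 0.48. The gap between the two shrinks as \(n\) grows.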

Session 2: Variance of \(\hat{\beta}_2\), Hypothesis Testing, and Confidence Interval

2.1 Variance of \(\hat{\beta}_2\)

  • Objective: Calculate the estimated variance of the slope coefficient (\(\sigma_{\hat{\beta}_2}^2\)), using \(s^2\) in place of the unknown error variance \(\sigma^2\).
  • Exercise: Use the Advertising-Sales dataset.
    • Step 1: Compute \((x_i - \bar{x})^2\) for each observation.
      • Instruction: Square the deviations from the mean.
      • Why?: These calculations are part of the formula for the variance of the slope coefficient.
    • Step 2: Sum \((x_i - \bar{x})^2\) to get \(\sum (x_i - \bar{x})^2\).
      • Instruction: Use the SUM function in Google Sheets.
      • Why?: This sum is used to calculate the variance of the slope coefficient.
    • Step 3: Calculate \(\sigma_{\hat{\beta}_2}^2\) using the formula. \[ \sigma_{\hat{\beta}_2}^2 = \frac{s^2}{\sum (x_i - \bar{x})^2} \]
      • Instruction: Divide \(s^2\) by the sum of squared deviations.
      • Why?: \(\sigma_{\hat{\beta}_2}^2\) tells us the variability of the slope coefficient, which helps us assess the precision of our estimate.
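
The three steps above can be sketched in Python as follows. The function name `slope_variance`, the toy \(x\) values, and the value \(s^2 = 0.8\) are assumptions made for illustration; substitute the figures from your own sheet.

```python
import math


def slope_variance(x, s_squared):
    """Estimated variance of the slope: s^2 / sum((x_i - x_bar)^2)."""
    x_bar = sum(x) / len(x)
    # Steps 1 and 2: squared deviations of x, then their sum
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    # Step 3: divide s^2 by the sum of squared deviations
    return s_squared / s_xx


# Made-up toy values (NOT the course dataset)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
s_squared = 0.8  # assumed to come from the error-variance step

var_slope = slope_variance(x, s_squared)
se_slope = math.sqrt(var_slope)  # standard error of the slope
print("variance:", var_slope, "standard error:", se_slope)
```

Here \(\sum (x_i - \bar{x})^2 = 10\), so the estimated variance is 0.08 and its square root, the standard error of the slope, is about 0.283.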

Interpretation of the Formula

1. Numerator: Estimate of Error Variance (\(s^2\))

  • What it represents:
    \(s^2\) is the estimated variance of the regression errors (\(\varepsilon_i\)). It measures, on average, the squared distance between the actual data points (\(y_i\)) and the predicted values (\(\hat{y}_i\)).
  • Why it matters:
    • A larger \(s^2\) means the model has higher prediction error (residuals are large), leading to greater uncertainty in \(\hat{\beta}_2\).
    • A smaller \(s^2\) implies the regression line fits the data tightly, reducing uncertainty in the slope estimate.

2. Denominator: Sum of Squared Deviations of \(x\) (\(\sum (x_i - \bar{x})^2\))

  • What it represents:
    This term measures the spread/variability of the independent variable (\(x\)) around its mean (\(\bar{x}\)).
  • Why it matters:
    • A larger denominator (more spread in \(x\)) means the slope estimate \(\hat{\beta}_2\) is more precise (lower variance). Intuitively, if \(x\) varies widely, it’s easier to detect its relationship with \(y\).
    • A smaller denominator (less spread in \(x\)) makes \(\hat{\beta}_2\) less precise (higher variance). If all \(x_i\) are close to \(\bar{x}\), small changes in \(y\) could drastically alter the slope.
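
To make the role of the denominator concrete, the sketch below holds \(s^2\) fixed and compares two hypothetical sets of \(x\) values with the same mean but very different spread. All numbers are invented for illustration.

```python
def sum_squared_deviations(x):
    """Return sum((x_i - x_bar)^2), the denominator of the variance formula."""
    x_bar = sum(x) / len(x)
    return sum((xi - x_bar) ** 2 for xi in x)


s_squared = 0.8  # assumed error variance estimate, held fixed in both cases

# Two hypothetical designs with the same mean (3.0) but different spread
narrow_x = [2.8, 2.9, 3.0, 3.1, 3.2]
wide_x = [1.0, 2.0, 3.0, 4.0, 5.0]

var_narrow = s_squared / sum_squared_deviations(narrow_x)
var_wide = s_squared / sum_squared_deviations(wide_x)

print("narrow spread:", var_narrow)
print("wide spread:", var_wide)
```

Spreading the \(x\) values ten times as far apart multiplies the denominator by 100 (from 0.1 to 10), so the estimated slope variance falls by the same factor, from about 8 to 0.08.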

2.2 Hypothesis Testing for the Slope Parameter \(\beta_2\)

Objective: Test whether advertising spending has a statistically significant effect on sales.
Rationale: Even if the estimate is \(\hat{\beta}_2 = 3\) (each additional $1,000 of advertising is associated with $3,000 more in sales), a nonzero estimate can arise from random sampling error alone. Hypothesis testing asks whether the data provide enough evidence to conclude that the true population slope \(\beta_2\) differs from zero.

  • Step 0: State the Null and Alternative Hypotheses

  • Null hypothesis (\(H_0\)): \(\beta_2 = 0\)
    Meaning: There is no linear relationship between advertising and sales. Any observed slope is purely by chance.

  • Alternative hypothesis (\(H_1\)): \(\beta_2 \neq 0\) (two‑tailed test)
    Meaning: There is a linear relationship (positive or negative).
    Why two‑tailed? We don’t assume direction – advertising could theoretically hurt sales (negative slope) or help (positive slope).

Real‑world interpretation:
- \(H_0\): “Our advertising campaign has no impact on sales.”
- \(H_1\): “Advertising does affect sales.”

  • Step 1: Compute the Standard Error of the Slope (\(s_{\hat{\beta}_2}\))

From the variance formula:

\[ \sigma_{\hat{\beta}_2}^2 = \frac{s^2}{\sum (x_i - \bar{x})^2} \] \[ s_{\hat{\beta}_2} = \sqrt{\sigma_{\hat{\beta}_2}^2} \]

  • \(s_{\hat{\beta}_2}\) measures the typical sampling variability of \(\hat{\beta}_2\).

  • A smaller \(s_{\hat{\beta}_2}\) means the estimate is more precise.

  • Step 2: Calculate the t‑Statistic

\[ t_{\hat{\beta}_2} = \frac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \frac{\hat{\beta}_2}{s_{\hat{\beta}_2}} \]

  • This tells us how many standard errors \(\hat{\beta}_2\) is away from the null hypothesis value (zero).

  • Large absolute t‑statistic → evidence against \(H_0\).

  • Step 3: Compare to Critical Value or Use Rule‑of‑Thumb

Degrees of freedom (df) = \(n - 2\) (we estimated two parameters: intercept and slope).

Option A: Exact critical value (small samples)

  • For a 95% confidence level (\(\alpha = 0.05\)) and two‑tailed test, look up \(t_{\alpha/2, df}\) in a t‑table.
  • Reject \(H_0\) if \(|t_{\hat{\beta}_2}| > t_{\alpha/2, df}\).

Option B: Rule‑of‑thumb for large \(n\) (e.g., \(n > 30\))

  • Reject \(H_0\) if \(|t_{\hat{\beta}_2}| > 2\) (approximately 95% confidence).
  • If \(|t_{\hat{\beta}_2}| > 3\) → very strong evidence.
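
Steps 2 and 3 can be sketched as a small function. The name `slope_t_test` and the input values are hypothetical (a toy fit with \(n = 5\), so \(df = 3\)); the critical value 3.182 is the standard two-tailed 5% t-table entry for 3 degrees of freedom and has to be looked up, since it is not computed here.

```python
def slope_t_test(beta2_hat, se_slope, t_crit):
    """Two-tailed test of H0: beta2 = 0. Returns (t_stat, reject_h0)."""
    t_stat = (beta2_hat - 0.0) / se_slope
    return t_stat, abs(t_stat) > t_crit


# Hypothetical toy inputs (NOT the course dataset): n = 5, so df = 3
beta2_hat = 0.6
se_slope = 0.2828

# Option A: exact critical value from a t-table (alpha = 0.05, two-tailed, df = 3)
t_stat, reject_exact = slope_t_test(beta2_hat, se_slope, 3.182)

# Option B: the rule-of-thumb cutoff of 2, which is meant for large n only
_, reject_rule_of_thumb = slope_t_test(beta2_hat, se_slope, 2.0)

print("t:", round(t_stat, 3))
print("reject with exact critical value:", reject_exact)
print("reject with rule of thumb:", reject_rule_of_thumb)
```

With so few observations the two options disagree: the t-statistic of roughly 2.12 exceeds 2 but not 3.182, which is why the rule of thumb should be reserved for large samples.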

2.3 Construct a 95% Confidence Interval for \(\beta_2\)

  • Step 4: Calculate the 95% confidence interval for the slope coefficient

\[ \hat{\beta}_2 \pm t_{\alpha/2, df} \cdot s_{\hat{\beta}_2} \]

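  • Interpretation: We are 95% confident that the true slope lies within this interval, in the sense that about 95% of intervals constructed this way across repeated samples would contain \(\beta_2\).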
  • If the interval does not contain 0, we reject \(H_0\) at the 5% significance level (consistent with the t‑test).

Example with Advertising‑Sales Data (Assume \(n = 25\))

Given:
  • \(\hat{\beta}_2 = 3.0\)
  • \(s_{\hat{\beta}_2} = 0.5\)
  • \(df = 23\)

t‑statistic:
\[ t = \frac{3.0}{0.5} = 6.0 \]

Critical value (from t‑table, \(\alpha=0.05\), two‑tailed, df=23): about 2.069.

Since \(6.0 > 2.069\), we reject \(H_0\).

Conclusion: There is strong statistical evidence that advertising spending affects sales (the true slope is not zero).

95% Confidence Interval:
\[ 3.0 \pm 2.069 \times 0.5 = 3.0 \pm 1.0345 \quad \Rightarrow \quad (1.9655,\; 4.0345) \]
We are 95% confident that each additional $1,000 spent on advertising is associated with an increase in sales of between about $1,966 and $4,035.
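
The arithmetic of this example can be reproduced with a short Python sketch. The function name `slope_confidence_interval` is an assumption; the inputs 3.0, 0.5, and 2.069 are the values given above.

```python
def slope_confidence_interval(beta2_hat, se_slope, t_crit):
    """Return (lower, upper) for beta2_hat +/- t_crit * se_slope."""
    margin = t_crit * se_slope
    return beta2_hat - margin, beta2_hat + margin


# Values from the worked example (n = 25, df = 23)
beta2_hat = 3.0
se_slope = 0.5
t_crit = 2.069

t_stat = beta2_hat / se_slope
lower, upper = slope_confidence_interval(beta2_hat, se_slope, t_crit)

# The t-test and the interval should lead to the same decision
rejects_h0 = abs(t_stat) > t_crit
excludes_zero = not (lower <= 0.0 <= upper)

print("t:", t_stat)
print("interval:", round(lower, 4), round(upper, 4))
print("reject H0:", rejects_h0, "| interval excludes 0:", excludes_zero)
```

Both checks agree, as they always do for a two-tailed test and a confidence interval built from the same critical value.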


Discussion Questions

  1. What if the t‑statistic were 1.5 with \(n=25\)?
    → Fail to reject \(H_0\); we cannot conclude advertising matters (the observed slope might be due to chance).

  2. Why use a two‑tailed test?
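    → Because we don't know in advance whether advertising could backfire (negative slope). A one-tailed test would look only for a positive effect and uses a smaller critical value, so it rejects \(H_0\) more easily; the two-tailed test is the more conservative choice.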

  3. How does the sample size affect the test?

    • Larger \(n\) → more degrees of freedom → smaller critical values → easier to reject \(H_0\).
    • Also, \(s_{\hat{\beta}_2}\) tends to decrease with \(n\) (more data → more precise estimate).
  4. Practical significance vs. statistical significance

    • A very small slope (e.g., \(\hat{\beta}_2 = 0.01\)) could be statistically significant if \(n\) is huge, but it may not be practically meaningful for business decisions. Always check the confidence interval bounds.
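
For discussion question 3, the effect of sample size on the critical value can be seen from a handful of standard t-table entries. The sketch below simply stores those entries (two-tailed, \(\alpha = 0.05\), rounded to three decimals and transcribed by hand) in a dictionary; the name `T_CRIT_95` is illustrative.

```python
# Standard two-tailed 5% critical values of the t-distribution, keyed by df.
# Transcribed from a t-table and rounded to three decimals.
T_CRIT_95 = {
    3: 3.182,
    5: 2.571,
    10: 2.228,
    23: 2.069,
    30: 2.042,
    60: 2.000,
    120: 1.980,
}

for df in sorted(T_CRIT_95):
    n = df + 2  # simple regression: df = n - 2
    print(f"n = {n:>3}, df = {df:>3}, critical value = {T_CRIT_95[df]:.3f}")

# The critical value falls steadily as df grows, approaching 1.96
values_by_df = [T_CRIT_95[df] for df in sorted(T_CRIT_95)]
is_decreasing = all(a > b for a, b in zip(values_by_df, values_by_df[1:]))
print("strictly decreasing:", is_decreasing)
```

The drop is steep for very small samples and almost flat beyond about 30 observations, which is where the cutoff of 2 becomes a reasonable approximation.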

Summary Table of Hypothesis Testing Steps

| Step | Action | Formula / Tool |
|------|--------|----------------|
| 1 | State \(H_0\) and \(H_1\) | \(H_0: \beta_2 = 0\); \(H_1: \beta_2 \neq 0\) |
| 2 | Choose significance level (\(\alpha\)) | Usually 0.05 (95% confidence) |
| 3 | Compute \(s_{\hat{\beta}_2}\) | \(s_{\hat{\beta}_2} = \sqrt{ s^2 / \sum (x_i - \bar{x})^2 }\) |
| 4 | Calculate t-statistic | \(t = \hat{\beta}_2 / s_{\hat{\beta}_2}\) |
| 5 | Find critical value \(t_{\alpha/2, n-2}\) | t-table or rule-of-thumb (\(\approx 2\)) |
| 6 | Compare | Reject \(H_0\) if \(\lvert t \rvert > t_{\text{crit}}\) |
| 7 | (Optional) Build confidence interval | \(\hat{\beta}_2 \pm t_{\text{crit}} \cdot s_{\hat{\beta}_2}\) |
| 8 | Interpret in context | “Advertising has a significant positive effect on sales.” |

Project Reminder: For your dataset, apply these same hypothesis testing steps.
- Write the null and alternative hypotheses in words and symbols.
- Compute the t‑statistic and state whether you reject \(H_0\) at \(\alpha = 0.05\).
- Provide a business interpretation.