Teaching Notes for Applied Econometrics and Economic Modelling Lab Session

Session Overview

Duration: 2 sessions of 45 minutes each
Tools: Google Sheets (for manual calculations)
Objective: Students will manually calculate key regression quantities: the coefficient of determination ($R^2$), the estimate of the error variance, a prediction interval for $y$, the variance of the slope coefficient, and a hypothesis test and confidence interval for the slope parameter. This will help them understand the intuition behind regression analysis, interpret the results, and apply the model to real-world scenarios.
Focus: Hands-on calculations, intuitive explanations, and real-world interpretations using the Advertising-Sales dataset.

Session 1: Calculating Residuals, $R^2$, and Error Variance (45 minutes)

1. Refresher on Parameter Selection by Least Squares Method

The goal of the least squares method is to select the parameters $\hat{\beta}_1$ and $\hat{\beta}_2$ such that the sum of squared residuals (SSR) is minimized. The residuals ($e_i$) are defined as:

\[ e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_1 + \hat{\beta}_2 x_i) \]

The sum of squared residuals (SSR) is:

\[ SSR = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n \left( y_i - (\hat{\beta}_1 + \hat{\beta}_2 x_i) \right)^2 \]

To minimize $SSR$, we take the partial derivatives of $SSR$ with respect to $\hat{\beta}_1$ and $\hat{\beta}_2$, set them to zero, and solve for $\hat{\beta}_1$ and $\hat{\beta}_2$. This gives us the normal equations:

\[ \frac{\partial SSR}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^n (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0 \] \[ \frac{\partial SSR}{\partial \hat{\beta}_2} = -2 \sum_{i=1}^n x_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0 \]

Solving these equations yields the least squares estimators:

\[ \hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x} \] \[ \hat{\beta}_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

Here:
- $\bar{y}$ is the mean of $y$,
- $\bar{x}$ is the mean of $x$,
- $\hat{\beta}_1$ is the intercept,
- $\hat{\beta}_2$ is the slope.
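As an optional cross-check on the spreadsheet work, the closed-form estimators can be sketched in a few lines of Python. The data below are hypothetical illustrative numbers (assumed: $x$ is advertising in thousands of dollars, $y$ is sales), not the course's Advertising-Sales dataset, and `least_squares` is a helper name introduced only for this sketch. The last two checks confirm that the fitted residuals satisfy both normal equations.

```python
# Hypothetical illustrative data (NOT the course's Advertising-Sales dataset):
# x = advertising spend (assumed thousands of dollars), y = sales.
x = [1, 2, 3, 4, 5, 6]
y = [3, 5, 6, 9, 10, 12]

def least_squares(x, y):
    """Return (b1, b2): intercept and slope from the closed-form OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx            # slope
    b1 = y_bar - b2 * x_bar   # intercept
    return b1, b2

b1, b2 = least_squares(x, y)
residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]

# The two normal equations say these sums are zero (up to rounding error).
print(round(b1, 4), round(b2, 4))                              # 1.2 1.8
print(abs(sum(residuals)) < 1e-9)                              # True
print(abs(sum(xi * e for xi, e in zip(x, residuals))) < 1e-9)  # True
```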

2. Refresher on Residuals and $R^2$ (10 minutes)

Objective: Recap the concepts of residuals and $R^2$ and their interpretations.
Key Points:
- Residuals ($e_i$): \[ e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_1 + \hat{\beta}_2 x_i) \]
  - What is it?: The difference between the actual value ($y_i$) and the predicted value ($\hat{y}_i$).
  - Why is it important?: Residuals help us measure the error in our predictions. Smaller residuals indicate a better-fitting model.
- $R^2$ (Coefficient of Determination): \[ R^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = 1 - \frac{\text{Unexplained Variation}}{\text{Total Variation}}=1 - \frac{\sum e_i^2}{\sum (y_i - \bar{y})^2} \]

Step 1: Define Total Variation

Total variation is the sum of squared deviations of $y_i$ from the mean $\bar{y}$:

\[ \text{Total Variation} = \sum_{i=1}^n (y_i - \bar{y})^2 \]

Step 2: Define Unexplained Variation

Unexplained variation is the sum of squared residuals ($e_i$):

\[ \text{Unexplained Variation} = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n \left( y_i - \hat{y}_i \right)^2 \]

What is it?: $R^2$ measures the proportion of variation in $y$ that is explained by $x$.
Why is it important?: A high $R^2$ (close to 1) indicates that the model explains a large portion of the variation in $y$. A low $R^2$ (close to 0) indicates that the model does not explain much of the variation.
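The two forms of $R^2$ agree because, for a least squares line that includes an intercept, total variation splits exactly into explained variation $\sum (\hat{y}_i - \bar{y})^2$ plus unexplained variation $\sum e_i^2$. A minimal Python sketch of that decomposition, using hypothetical illustrative data rather than the course dataset:

```python
# Hypothetical illustrative data (not the course dataset).
x = [1, 2, 3, 4, 5, 6]
y = [3, 5, 6, 9, 10, 12]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b1 = y_bar - b2 * x_bar
y_hat = [b1 + b2 * xi for xi in x]

total = sum((yi - y_bar) ** 2 for yi in y)                     # Total variation
unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # sum of e_i^2
explained = sum((yh - y_bar) ** 2 for yh in y_hat)             # Explained variation

r2_ratio = explained / total           # Explained / Total
r2_residual = 1 - unexplained / total  # 1 - Unexplained / Total
print(round(total, 4), round(explained, 4), round(unexplained, 4))  # 57.5 56.7 0.8
print(round(r2_ratio, 4), round(r2_residual, 4))                    # 0.9861 0.9861
```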

3. Hands-On Calculation of Residuals and $R^2$ (15 minutes)

Objective: Calculate residuals and $R^2$ manually using Google Sheets.
Exercise: Use the Advertising-Sales dataset.
- Step 1: We computed the residuals $e_i$ and their squares $e_i^2$ in the previous session. Now compute $(y_i - \bar{y})^2$ for each observation.
  - Instruction: Subtract $\bar{y}$ from each $y_i$ and square the result.
- Step 2: Sum the two columns to get $\sum e_i^2$ and $\sum (y_i - \bar{y})^2$.
  - Instruction: Use the SUM function in Google Sheets.
  - Why?: These sums are used to calculate $R^2$.
- Step 3: Calculate $R^2$ using the formula. \[ R^2 = 1 - \frac{\sum e_i^2}{\sum (y_i - \bar{y})^2} \]
  - Instruction: Divide $\sum e_i^2$ by $\sum (y_i - \bar{y})^2$ and subtract the result from 1.
  - Why?: $R^2$ tells us how well the model explains the variation in $y$. For example, if $R^2 = 0.85$, it means that 85% of the variation in sales is explained by advertising spending.
- Discussion: What does $R^2$ tell us about the model’s explanatory power? How well does advertising spending explain sales? Are there other factors that might influence sales?
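The spreadsheet steps above can be mirrored in Python, one list per Google Sheets column, if students want to check their sheet. The $y$ values and residuals are hypothetical (assumed to come from a previously fitted line $\hat{y} = 1.2 + 1.8x$), not the course dataset.

```python
# Hypothetical values (not the course dataset): y and the residuals e_i
# carried over from an assumed fitted line y_hat = 1.2 + 1.8x.
y = [3, 5, 6, 9, 10, 12]
e = [0.0, 0.2, -0.6, 0.6, -0.2, 0.0]

y_bar = sum(y) / len(y)

# One "spreadsheet column" per quantity, one row per observation.
e_sq = [ei ** 2 for ei in e]              # column: e_i^2
dev_sq = [(yi - y_bar) ** 2 for yi in y]  # column: (y_i - y_bar)^2

sum_e_sq = sum(e_sq)      # like =SUM(...) over the e_i^2 column
sum_dev_sq = sum(dev_sq)  # like =SUM(...) over the (y_i - y_bar)^2 column
r2 = 1 - sum_e_sq / sum_dev_sq
print(round(sum_e_sq, 4), sum_dev_sq, round(r2, 4))  # 0.8 57.5 0.9861
```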

4. Estimate of Error Variance ($s^2$) and Why We Use $s$ (15 minutes)

Objective: Calculate the estimate of error variance ($s^2$) and explain why we use $s$ (the standard error of the regression) instead of the variance of the regression error term.
Key Points:
- Estimate of Error Variance ($s^2$): \[ s^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2 \]
  - What is it?: $s^2$ estimates the variance of the error term ($\varepsilon_i$) in the regression model.
  - Why is it important?: It measures the variability of the residuals, which helps us assess the accuracy of the regression model.
- Why We Use $s$ Instead of the Variance of the Regression Error Term:
  - The true variance of the regression error term ($\sigma^2$) is unknown in practice. We estimate it using $s^2$, which is based on the residuals ($e_i$).
  - $s$ (the standard error of the regression) is the square root of $s^2$: \[ s = \sqrt{s^2} \]
  - $s$ is used in hypothesis testing, confidence intervals, and prediction intervals because it provides a measure of the spread of the residuals around the regression line.
  - Using $s$ in place of the unknown $\sigma$ (the true standard deviation of the errors) reflects the fact that we are working with sample data and must estimate the variability of the errors.
- Why $n-2$?:
  - $n-2$ is the degrees of freedom, where $n$ is the number of observations and 2 is the number of parameters estimated ($\hat{\beta}_1$ and $\hat{\beta}_2$).
  - Why is it important?: Dividing by $n-2$ instead of $n$ gives us an unbiased estimate of the error variance.
Exercise: Use the Advertising-Sales dataset.
- Step 1: Compute $s^2$ using the formula. \[ s^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2 \]
  - Instruction: Divide the sum of squared residuals by $n-2$.
  - Why?: This gives us an unbiased estimate of the error variance.
- Step 2: Compute $s$ (the standard error of the regression).
  - Instruction: Take the square root of $s^2$.
  - Why?: $s$ is used in hypothesis testing and prediction intervals.
- Discussion: Why do we use $s$ instead of the true error standard deviation ($\sigma$)? How does $s$ help us understand the uncertainty in our regression model?
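A short Python sketch of the two steps, assuming hypothetical residuals from a fitted line with two estimated parameters (not the course dataset). The last line shows, for contrast only, the smaller value obtained by dividing by $n$ instead of $n-2$.

```python
import math

# Hypothetical residuals (not the course dataset) from a fitted line with two
# estimated parameters (intercept and slope).
e = [0.0, 0.2, -0.6, 0.6, -0.2, 0.0]
n = len(e)

sse = sum(ei ** 2 for ei in e)  # sum of squared residuals
s2 = sse / (n - 2)              # Step 1: divide by degrees of freedom n - 2
s = math.sqrt(s2)               # Step 2: standard error of the regression
print(round(s2, 4), round(s, 4))  # 0.2 0.4472
print(round(sse / n, 4))          # 0.1333 (dividing by n gives a smaller value)
```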

Session 2: Prediction Interval, Variance of $\hat{\beta}_2$ and Hypothesis Testing

2.1 Prediction Interval for $y$ (10 minutes)

Objective: Calculate the prediction interval for $y$ manually using Google Sheets.
Exercise: Use the Advertising-Sales dataset.
- Step 1: Predict sales for a week with \$7,000 and for a week with \$8,000 spent on advertising.
  - Instruction: Use the regression equation $\hat{y} = \hat{\beta}_1 + \hat{\beta}_2 x$ to make the predictions.
  - Why?: Predictions help us understand how the model can be applied in real-world scenarios.
- Step 2: Calculate the prediction interval for $y$.
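  - Instruction: For an advertising level $x_0$ with prediction $\hat{y}_0 = \hat{\beta}_1 + \hat{\beta}_2 x_0$, use the formula: \[ \text{Prediction Interval} = (\hat{y}_0 - ks, \hat{y}_0 + ks) \] where $k = 2$ for an approximate 95% prediction interval. This rule of thumb ignores the sampling uncertainty in $\hat{\beta}_1$ and $\hat{\beta}_2$, so it is most reliable when $n$ is large and $x_0$ is close to $\bar{x}$.
  - Why?: The prediction interval gives a range within which we expect the actual value of $y$ to fall about 95% of the time.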
- Discussion: What does the prediction interval tell us about the uncertainty in our predictions? How can we use this interval in decision-making?
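A minimal Python sketch of the two steps, assuming hypothetical fitted values $\hat{\beta}_1 = 1.2$, $\hat{\beta}_2 = 1.8$ and $s = \sqrt{0.2}$ (not estimates from the course dataset), with $x$ measured in thousands of dollars so that \$7,000 corresponds to $x_0 = 7$. The helper `prediction_interval` is a name introduced for illustration and implements only the simplified $\hat{y}_0 \pm ks$ interval.

```python
import math

# Hypothetical fitted values (not from the course dataset): intercept b1, slope b2,
# and standard error of the regression s. x is assumed to be in thousands of dollars.
b1, b2, s = 1.2, 1.8, math.sqrt(0.2)
k = 2  # rule-of-thumb multiplier for an approximate 95% interval

def prediction_interval(x0, b1, b2, s, k=2):
    """Return (y_hat0, lower, upper) using the simplified interval y_hat0 +/- k*s."""
    y_hat0 = b1 + b2 * x0
    return y_hat0, y_hat0 - k * s, y_hat0 + k * s

for x0 in (7, 8):  # $7,000 and $8,000 of advertising
    y_hat0, lo, hi = prediction_interval(x0, b1, b2, s, k)
    print(x0, round(y_hat0, 2), round(lo, 2), round(hi, 2))
# 7 13.8 12.91 14.69
# 8 15.6 14.71 16.49
```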

2.2 Variance of $\hat{\beta}_2$

Objective: Calculate the variance of the slope coefficient ($\sigma_{\hat{\beta}_2}^2$).
Exercise: Use the Advertising-Sales dataset.
- Step 1: Compute $(x_i - \bar{x})^2$ for each observation.
  - Instruction: Square the deviations from the mean.
  - Why?: These calculations are part of the formula for the variance of the slope coefficient.
- Step 2: Sum $(x_i - \bar{x})^2$ to get $\sum (x_i - \bar{x})^2$.
  - Instruction: Use the SUM function in Google Sheets.
  - Why?: This sum is used to calculate the variance of the slope coefficient.
- Step 3: Calculate $\sigma_{\hat{\beta}_2}^2$ using the formula. \[ \sigma_{\hat{\beta}_2}^2 = \frac{s^2}{\sum (x_i - \bar{x})^2} \]
  - Instruction: Divide $s^2$ by the sum of squared deviations.
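  - Why?: $\sigma_{\hat{\beta}_2}^2$ measures how much $\hat{\beta}_2$ would vary from sample to sample, which tells us how precise our estimate is. Because $s^2$ stands in for the unknown $\sigma^2$, the result is an estimate of this variance.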
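The three steps can be reproduced in Python as a check on the sheet. The $x$ values and $s^2 = 0.2$ below are hypothetical, not taken from the course dataset.

```python
# Hypothetical data (not the course dataset) and an assumed s^2 from its fitted line.
x = [1, 2, 3, 4, 5, 6]
s2 = 0.2

x_bar = sum(x) / len(x)
dev_sq = [(xi - x_bar) ** 2 for xi in x]  # Step 1: (x_i - x_bar)^2 per row
sxx = sum(dev_sq)                         # Step 2: sum of squared deviations
var_b2 = s2 / sxx                         # Step 3: estimated variance of the slope
print(sxx, round(var_b2, 5))  # 17.5 0.01143
```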

Interpretation of the Formula

1. Numerator: Estimate of Error Variance ($s^2$)

What it represents:
$s^2$ is the estimated variance of the regression errors ($\varepsilon_i$). It measures the average squared deviation of the actual data points ($y_i$) from the predicted values ($\hat{y}_i$), using $n-2$ rather than $n$ in the denominator.
Why it matters:
- A larger $s^2$ means the model has higher prediction error (residuals are large), leading to greater uncertainty in $\hat{\beta}_2$.
- A smaller $s^2$ implies the regression line fits the data tightly, reducing uncertainty in the slope estimate.

2. Denominator: Sum of Squared Deviations of $x$ ($\sum (x_i - \bar{x})^2$)

What it represents:
This term measures the spread/variability of the independent variable ($x$) around its mean ($\bar{x}$).
Why it matters:
- A larger denominator (more spread in $x$) means the slope estimate $\hat{\beta}_2$ is more precise (lower variance). Intuitively, if $x$ varies widely, it’s easier to detect its relationship with $y$.
- A smaller denominator (less spread in $x$) makes $\hat{\beta}_2$ less precise (higher variance). If all $x_i$ are close to $\bar{x}$, small changes in $y$ could drastically alter the slope.
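To make the denominator's role concrete, here is a small hypothetical illustration in Python: hold $s^2$ fixed and compare two sets of $x$ values with the same mean but different spread. Both $x$ sets and $s^2$ are made-up numbers chosen for clean arithmetic, and `slope_variance` is a helper name introduced for this sketch.

```python
# Hypothetical illustration: hold s^2 fixed and compare two designs for x
# that share the same mean (3.5) but differ in spread.
s2 = 0.2
x_wide = [1, 2, 3, 4, 5, 6]             # sum of squared deviations = 17.5
x_narrow = [2.5, 3, 3.5, 3.5, 4, 4.5]   # sum of squared deviations = 2.5

def slope_variance(x, s2):
    """Estimated variance of the slope: s^2 / sum((x_i - x_bar)^2)."""
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return s2 / sxx

v_wide = slope_variance(x_wide, s2)
v_narrow = slope_variance(x_narrow, s2)
print(round(v_wide, 4), round(v_narrow, 4), round(v_narrow / v_wide, 1))  # 0.0114 0.08 7.0
```

With the same error variance, the narrower spread in $x$ makes the slope variance seven times larger in this example.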

2.3 Hypothesis Testing for $\hat{\beta}_2$ (15 minutes)

Objective: Perform a hypothesis test for the slope parameter ($\beta_2$) manually using Google Sheets.
Exercise: Use the Advertising-Sales dataset.
- Step 1: Compute the standard error of the slope coefficient ($s_{\hat{\beta}_2}$).
  - Instruction: Take the square root of $\sigma_{\hat{\beta}_2}^2$.
  - Why?: $s_{\hat{\beta}_2}$ is the estimated standard deviation of the sampling distribution of $\hat{\beta}_2$; it is the denominator of the t-statistic.
- Step 2: Calculate the t-statistic for the slope coefficient.
  - Instruction: Use the formula: \[ t_{\hat{\beta}_2} = \frac{\hat{\beta}_2}{s_{\hat{\beta}_2}} \]
  - Why?: The t-statistic helps us test whether the slope coefficient is significantly different from zero.
- Step 3: Compare the t-statistic to the critical value.
  - Instruction: Use a t-distribution table to find the two-sided critical value $t_{0.025, n-2}$ for a 5% significance level (equivalently, a 95% confidence level) with $n-2$ degrees of freedom.
  - Why?: If the absolute value of the t-statistic exceeds the critical value, we reject the null hypothesis $H_0: \beta_2 = 0$.
  - Rule of thumb for large $n$: reject $H_0$ if $t_{\hat{\beta}_2} < -2$ or $t_{\hat{\beta}_2} > 2$.
- Discussion: What does the t-statistic tell us about the significance of the slope coefficient? How does this affect our interpretation of the regression model?
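A Python sketch of Steps 1 to 3, assuming hypothetical inputs ($\hat{\beta}_2 = 1.8$, $\sigma_{\hat{\beta}_2}^2 = 0.2/17.5$, $n = 6$) rather than results from the course dataset. Because Python's standard library has no t-distribution, the critical value 2.776 is typed in from a standard t-table for $n - 2 = 4$ degrees of freedom; with such a small $n$, the table value rather than the rule of thumb of 2 is the appropriate cutoff.

```python
import math

# Hypothetical inputs (not the course dataset): slope estimate, its estimated
# variance, and the sample size.
b2 = 1.8
var_b2 = 0.2 / 17.5
n = 6

se_b2 = math.sqrt(var_b2)  # Step 1: standard error of the slope
t_stat = b2 / se_b2        # Step 2: t-statistic for H0: beta_2 = 0

# Step 3: two-sided test at the 5% level. 2.776 is the tabulated t critical
# value for n - 2 = 4 degrees of freedom; look up the value for your own n.
t_crit = 2.776
print(round(se_b2, 4), round(t_stat, 2))  # 0.1069 16.84
print("reject H0" if abs(t_stat) > t_crit else "do not reject H0")  # reject H0
```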

2.4 Confidence Interval for $\hat{\beta}_2$

Step 4: Calculate the 95% confidence interval for the slope coefficient manually using Google Sheets.
- Instruction: Use the formula: \[ \text{95\% Confidence Interval} = \hat{\beta}_2 \pm t_{\alpha/2, n-2} \cdot s_{\hat{\beta}_2} \] where $t_{\alpha/2, n-2}$ is the critical value from the t-distribution table with $\alpha = 0.05$ and $n-2$ degrees of freedom.
- Why?: The confidence interval gives a range of plausible values for the true slope $\beta_2$; if the procedure were repeated over many samples, about 95% of such intervals would contain $\beta_2$.
Discussion: What does $\sigma_{\hat{\beta}_2}^2$ tell us about the precision of the slope coefficient? How does it affect our confidence in the regression model?
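A Python check of Step 4, assuming the same kind of hypothetical inputs as a small worked example ($\hat{\beta}_2 = 1.8$, $s_{\hat{\beta}_2} = \sqrt{0.2/17.5}$, $n = 6$), not values from the course dataset. The critical value 2.776 is again taken from a t-table for 4 degrees of freedom.

```python
import math

# Hypothetical inputs (not the course dataset).
b2 = 1.8                       # slope estimate
se_b2 = math.sqrt(0.2 / 17.5)  # standard error of the slope
t_crit = 2.776                 # tabulated t_{0.025, n-2} for n - 2 = 4 (alpha = 0.05)

margin = t_crit * se_b2
lower, upper = b2 - margin, b2 + margin
print(round(lower, 3), round(upper, 3))  # 1.503 2.097
```

Since this interval excludes 0, it agrees with rejecting $H_0: \beta_2 = 0$ at the 5% level in the t-test.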

Why This Matters

Policy Decisions: If $\hat{\beta}_2$ measures the effect of advertising on sales, a smaller $\sigma_{\hat{\beta}_2}^2$ means we can be more confident when using the estimate to allocate advertising budgets.
Model Reliability: High variance in $\hat{\beta}_2$ suggests the estimated relationship is sensitive to small changes in data (e.g., outliers).

Homework/Follow-Up

Assignment: Students should manually calculate $s^2$, the prediction interval for $y$, $\sigma_{\hat{\beta}_2}^2$, and the t-statistic for the Income-Consumption dataset (provided) and submit their calculations.

Teaching Notes for Applied Econometrics and Economic Modelling Lab Session - Day 2

2025-03-24
