Teaching Notes for Applied Econometrics and Economic Modelling Lab Session

Session Overview

Duration: 2 sessions of 45 minutes each
Tools: Google Sheets (for manual calculations)
Objective: Students will manually calculate key regression estimators, including R-squared ($R^2$), estimate of error variance, prediction interval for $y$, variance for the slope coefficient, and test for the slope parameter. This will help them understand the intuition behind regression analysis, interpret the results, and apply the model to real-world scenarios.
Focus: Hands-on calculations, intuitive explanations, and real-world interpretations using the Advertising-Sales dataset.

Session 1: Calculating Residuals, $R^2$, and Error Variance (45 minutes)

1. Refresher on Parameter Selection by Least Squares Method

The goal of the least squares method is to select the parameters $\hat{\beta}_1$ and $\hat{\beta}_2$ such that the sum of squared residuals (SSR) is minimized. The residuals ($e_i$) are defined as:

\[ e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_1 + \hat{\beta}_2 x_i) \]

The sum of squared residuals (SSR) is:

\[ SSR = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n \left( y_i - (\hat{\beta}_1 + \hat{\beta}_2 x_i) \right)^2 \]

To minimize $SSR$, we take the partial derivatives of $SSR$ with respect to $\hat{\beta}_1$ and $\hat{\beta}_2$, set them to zero, and solve for $\hat{\beta}_1$ and $\hat{\beta}_2$. This gives us the normal equations:

\[ \frac{\partial SSR}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^n (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0 \] \[ \frac{\partial SSR}{\partial \hat{\beta}_2} = -2 \sum_{i=1}^n x_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0 \]
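
A sketch of the intermediate algebra: dividing the first normal equation by $-2n$ gives

\[ \bar{y} - \hat{\beta}_1 - \hat{\beta}_2 \bar{x} = 0 \quad \Rightarrow \quad \hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x} \]

Substituting this expression for $\hat{\beta}_1$ into the second normal equation gives

\[ \sum_{i=1}^n x_i \left( (y_i - \bar{y}) - \hat{\beta}_2 (x_i - \bar{x}) \right) = 0 \quad \Rightarrow \quad \hat{\beta}_2 = \frac{\sum x_i (y_i - \bar{y})}{\sum x_i (x_i - \bar{x})} \]

Because $\sum (y_i - \bar{y}) = 0$ and $\sum (x_i - \bar{x}) = 0$, the leading $x_i$ in each sum can be replaced by $(x_i - \bar{x})$ without changing its value, which gives the deviation form of the slope.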

Solving these equations yields the least squares estimators:

\[ \hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x} \] \[ \hat{\beta}_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

Here:
- $\bar{y}$ is the mean of $y$,
- $\bar{x}$ is the mean of $x$,
- $\hat{\beta}_1$ is the intercept,
- $\hat{\beta}_2$ is the slope.
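
For instructors who want to check the spreadsheet results, the two estimator formulas can be sketched in a few lines of Python. The data below are made up for illustration and are not the Advertising-Sales dataset; the function name `least_squares` is our own.

```python
# Hypothetical data for illustration only (not the Advertising-Sales dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # advertising spending
y = [2.0, 4.0, 5.0, 4.0, 5.0]   # sales


def least_squares(x, y):
    """Return (beta1_hat, beta2_hat) for the simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta2_hat = s_xy / s_xx                 # slope
    beta1_hat = y_bar - beta2_hat * x_bar   # intercept
    return beta1_hat, beta2_hat


beta1_hat, beta2_hat = least_squares(x, y)
print(beta1_hat, beta2_hat)
```

With these made-up numbers the slope is about 0.6 and the intercept about 2.2, which students can reproduce by hand in Google Sheets.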

2. Refresher on Residuals and $R^2$ (10 minutes)

Objective: Recap the concepts of residuals and $R^2$ and their interpretations.
Key Points:
- Residuals ($e_i$): \[ e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_1 + \hat{\beta}_2 x_i) \]
  - What is it?: The difference between the actual value ($y_i$) and the predicted value ($\hat{y}_i$).
  - Why is it important?: Residuals help us measure the error in our predictions. Smaller residuals indicate a better-fitting model.
- $R^2$ (Coefficient of Determination): \[ R^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = 1 - \frac{\text{Unexplained Variation}}{\text{Total Variation}}=1 - \frac{\sum e_i^2}{\sum (y_i - \bar{y})^2} \]

Step 1: Define Total Variation

Total variation is the sum of squared deviations of $y_i$ from the mean $\bar{y}$:

\[ \text{Total Variation} = \sum_{i=1}^n (y_i - \bar{y})^2 \]

Step 2: Define Unexplained Variation

Unexplained variation is the sum of squared residuals ($e_i$):

\[ \text{Unexplained Variation} = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n \left( y_i - \hat{y}_i \right)^2 \]

What is it?: $R^2$ measures the proportion of variation in $y$ that is explained by $x$.
Why is it important?: A high $R^2$ (close to 1) indicates that the model explains a large portion of the variation in $y$. A low $R^2$ (close to 0) indicates that the model does not explain much of the variation.

3. Hands-On Calculation of Residuals and $R^2$ (15 minutes)

Objective: Calculate residuals and $R^2$ manually using Google Sheets.
Exercise: Use the Advertising-Sales dataset.
- Step 1: The sum of squared residuals $\sum e_i^2$ was calculated in the previous session. Therefore, only $(y_i - \bar{y})^2$ needs to be computed for each observation.
  - Instruction: Subtract the mean $\bar{y}$ from each $y_i$ and square the result.
- Step 2: Sum $(y_i - \bar{y})^2$ to get $\sum (y_i - \bar{y})^2$.
  - Instruction: Use the SUM function in Google Sheets.
  - Why?: This sum (total variation), together with $\sum e_i^2$ (unexplained variation), is used to calculate $R^2$.
- Step 3: Calculate $R^2$ using the formula. \[ R^2 = 1 - \frac{\sum e_i^2}{\sum (y_i - \bar{y})^2} \]
  - Instruction: Divide $\sum e_i^2$ by $\sum (y_i - \bar{y})^2$ and subtract the result from 1.
  - Why?: $R^2$ tells us how well the model explains the variation in $y$. For example, if $R^2 = 0.85$, it means that 85% of the variation in sales is explained by advertising spending.
- Discussion: What does $R^2$ tell us about the model’s explanatory power? How well does advertising spending explain sales? Are there other factors that might influence sales?
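
The steps above can be mirrored outside the spreadsheet as a cross-check. The sketch below assumes a small made-up dataset (not the Advertising-Sales data) and recomputes the fitted line before forming $R^2$.

```python
# Hypothetical data for illustration only (not the Advertising-Sales dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # advertising spending
y = [2.0, 4.0, 5.0, 4.0, 5.0]   # sales

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates, using the formulas from the refresher.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
beta2_hat = s_xy / s_xx
beta1_hat = y_bar - beta2_hat * x_bar

# Residuals e_i = y_i - y_hat_i.
residuals = [yi - (beta1_hat + beta2_hat * xi) for xi, yi in zip(x, y)]

unexplained = sum(e ** 2 for e in residuals)   # sum of e_i^2
total = sum((yi - y_bar) ** 2 for yi in y)     # sum of (y_i - y_bar)^2
r_squared = 1 - unexplained / total
print(round(r_squared, 4))
```

For these made-up numbers the unexplained variation is about 2.4 and the total variation is 6, so $R^2$ is about 0.6.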

4. Estimate of Error Variance ($s^2$) and Why We Use $s$ (15 minutes)

Objective: Calculate the estimate of error variance ($s^2$) and explain why we use $s$ (the standard error of the regression) instead of the variance of the regression error term.
Key Points:
- Estimate of Error Variance ($s^2$): \[ s^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2 \]
  - What is it?: $s^2$ estimates the variance of the error term ($\varepsilon_i$) in the regression model.
  - Why is it important?: It measures the variability of the residuals, which helps us assess the accuracy of the regression model.
- Why We Use $s$ Instead of the Variance of the Regression Error Term:
  - The true variance of the regression error term ($\sigma^2$) is unknown in practice. We estimate it using $s^2$, which is based on the residuals ($e_i$).
  - $s$ (the standard error of the regression) is the square root of $s^2$: \[ s = \sqrt{s^2} \]
  - $s$ is used in hypothesis testing, confidence intervals, and prediction intervals because it provides a measure of the spread of the residuals around the regression line.
  - Using $s$ instead of the true variance ($\sigma^2$) accounts for the fact that we are working with sample data and need to estimate the variability of the errors.
- Why $n-2$?:
  - $n-2$ is the degrees of freedom, where $n$ is the number of observations and 2 is the number of parameters estimated ($\hat{\beta}_1$ and $\hat{\beta}_2$).
  - Why is it important?: Dividing by $n-2$ instead of $n$ gives us an unbiased estimate of the error variance.
Exercise: Use the Advertising-Sales dataset.
- Step 1: Compute $s^2$ using the formula. \[ s^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2 \]
  - Instruction: Divide the sum of squared residuals by $n-2$.
  - Why?: This gives us an unbiased estimate of the error variance.
- Step 2: Compute $s$ (the standard error of the regression).
  - Instruction: Take the square root of $s^2$.
  - Why?: $s$ is used in hypothesis testing, confidence intervals, and prediction intervals.
- Discussion: Why do we use $s$ instead of the true variance ($\sigma^2$)? How does $s$ help us understand the uncertainty in our regression model?
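
A minimal sketch of the two exercise steps, assuming a hypothetical list of residuals (not taken from the Advertising-Sales dataset). The helper name `error_variance` and its `n_params` argument are our own.

```python
import math

# Hypothetical residuals from a fitted line (illustration only).
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]


def error_variance(residuals, n_params=2):
    """Return s^2 = (sum of squared residuals) / (n - n_params)."""
    n = len(residuals)
    if n <= n_params:
        raise ValueError("need more observations than estimated parameters")
    ssr = sum(e ** 2 for e in residuals)
    return ssr / (n - n_params)


s2 = error_variance(residuals)   # Step 1: estimate of error variance
s = math.sqrt(s2)                # Step 2: standard error of the regression
print(s2, s)
```

Here $n = 5$, so the divisor is $n - 2 = 3$ and $s^2$ is about 0.8. Dividing by $n$ instead would give 0.48, which understates the error variance.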

Session 2: Variance of $\hat{\beta}_2$, Hypothesis Testing, and Confidence Interval

2.1 Variance of $\hat{\beta}_2$

Objective: Calculate the variance of the slope coefficient ($\sigma_{\hat{\beta}_2}^2$).
Exercise: Use the Advertising-Sales dataset.
- Step 1: Compute $(x_i - \bar{x})^2$ for each observation.
  - Instruction: Square the deviations from the mean.
  - Why?: These calculations are part of the formula for the variance of the slope coefficient.
- Step 2: Sum $(x_i - \bar{x})^2$ to get $\sum (x_i - \bar{x})^2$.
  - Instruction: Use the SUM function in Google Sheets.
  - Why?: This sum is used to calculate the variance of the slope coefficient.
- Step 3: Calculate $\sigma_{\hat{\beta}_2}^2$ using the formula. \[ \sigma_{\hat{\beta}_2}^2 = \frac{s^2}{\sum (x_i - \bar{x})^2} \]
  - Instruction: Divide $s^2$ by the sum of squared deviations.
  - Why?: $\sigma_{\hat{\beta}_2}^2$ tells us the variability of the slope coefficient, which helps us assess the precision of our estimate.

Interpretation of the Formula

1. Numerator: Estimate of Error Variance ($s^2$)

What it represents:
$s^2$ is the estimated variance of the regression errors ($\varepsilon_i$). It measures, roughly, the average squared deviation of the actual data points ($y_i$) from the predicted values ($\hat{y}_i$).
Why it matters:
- A larger $s^2$ means the model has higher prediction error (residuals are large), leading to greater uncertainty in $\hat{\beta}_2$.
- A smaller $s^2$ implies the regression line fits the data tightly, reducing uncertainty in the slope estimate.

2. Denominator: Sum of Squared Deviations of $x$ ($\sum (x_i - \bar{x})^2$)

What it represents:
This term measures the spread/variability of the independent variable ($x$) around its mean ($\bar{x}$).
Why it matters:
- A larger denominator (more spread in $x$) means the slope estimate $\hat{\beta}_2$ is more precise (lower variance). Intuitively, if $x$ varies widely, it’s easier to detect its relationship with $y$.
- A smaller denominator (less spread in $x$) makes $\hat{\beta}_2$ less precise (higher variance). If all $x_i$ are close to $\bar{x}$, small changes in $y$ could drastically alter the slope.
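
The role of the denominator can be illustrated with a short sketch. Both $x$ samples and the value of $s^2$ below are hypothetical, and the function name `slope_variance` is our own.

```python
def slope_variance(x, s2):
    """Estimated variance of the slope: s^2 / sum((x_i - x_bar)^2)."""
    x_bar = sum(x) / len(x)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x has no variation, so the slope variance is undefined")
    return s2 / s_xx


s2 = 0.8                               # hypothetical error variance estimate
x_narrow = [1.0, 2.0, 3.0, 4.0, 5.0]   # little spread in x
x_wide = [2.0, 4.0, 6.0, 8.0, 10.0]    # same n, twice the spread

var_narrow = slope_variance(x_narrow, s2)
var_wide = slope_variance(x_wide, s2)
print(var_narrow, var_wide)
```

Doubling the spread of $x$ multiplies $\sum (x_i - \bar{x})^2$ by four (from 10 to 40 here), so the slope variance falls from about 0.08 to about 0.02 with the same $s^2$.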

2.2 Hypothesis Testing for the Slope Parameter $\beta_2$

Objective: Test whether advertising spending has a statistically significant effect on sales.
Rationale: Even if $\hat{\beta}_2 = 3$ (each additional \$1,000 spent on advertising increases sales by \$3,000), this estimate could be due to random sampling error. Hypothesis testing tells us whether the data provide enough evidence that the true population slope $\beta_2$ is different from zero.

Step 0: State the Null and Alternative Hypotheses
Null hypothesis ($H_0$): $\beta_2 = 0$
Meaning: There is no linear relationship between advertising and sales. Any observed slope is purely by chance.
Alternative hypothesis ($H_1$): $\beta_2 \neq 0$ (two‑tailed test)
Meaning: There is a linear relationship (positive or negative).
Why two‑tailed? We don’t assume direction – advertising could theoretically hurt sales (negative slope) or help (positive slope).

Real‑world interpretation:
- $H_0$: “Our advertising campaign has no impact on sales.”
- $H_1$: “Advertising does affect sales.”

Step 1: Compute the Standard Error of the Slope ($s_{\hat{\beta}_2}$)

From the variance formula:

\[ \sigma_{\hat{\beta}_2}^2 = \frac{s^2}{\sum (x_i - \bar{x})^2} \] \[ s_{\hat{\beta}_2} = \sqrt{\sigma_{\hat{\beta}_2}^2} \]

$s_{\hat{\beta}_2}$ measures the typical sampling variability of $\hat{\beta}_2$.
A smaller $s_{\hat{\beta}_2}$ means the estimate is more precise.

Step 2: Calculate the t‑Statistic

\[ t_{\hat{\beta}_2} = \frac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \frac{\hat{\beta}_2}{s_{\hat{\beta}_2}} \]

This tells us how many standard errors $\hat{\beta}_2$ is away from the null hypothesis value (zero).
Large absolute t‑statistic → evidence against $H_0$.

Step 3: Compare to Critical Value or Use Rule‑of‑Thumb

Degrees of freedom (df) = $n - 2$ (we estimated two parameters: intercept and slope).

Option A: Exact critical value (small samples)

For a 95% confidence level ($\alpha = 0.05$) and two‑tailed test, look up $t_{\alpha/2, df}$ in a t‑table.
Reject $H_0$ if $|t_{\hat{\beta}_2}| > t_{\alpha/2, df}$.

Option B: Rule‑of‑thumb for large $n$ (e.g., $n > 30$)

Reject $H_0$ if $|t_{\hat{\beta}_2}| > 2$ (approximately 95% confidence).
If $|t_{\hat{\beta}_2}| > 3$ → very strong evidence.
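
The decision rule in Steps 2 and 3 can be sketched as follows. The estimate, standard error, and sample size are hypothetical; the critical value 3.182 is the standard two-tailed 5% t-table entry for 3 degrees of freedom. The function name `slope_t_test` is our own.

```python
def slope_t_test(beta2_hat, se_beta2, t_crit):
    """Return (t statistic, True if H0: beta2 = 0 is rejected)."""
    t_stat = beta2_hat / se_beta2
    return t_stat, abs(t_stat) > t_crit


# Hypothetical small sample: n = 5, so df = n - 2 = 3.
beta2_hat = 0.6
se_beta2 = 0.08 ** 0.5   # square root of a hypothetical slope variance
t_crit_df3 = 3.182       # two-tailed 5% critical value for df = 3 (t-table)

# Option A: exact critical value.
t_stat, reject_exact = slope_t_test(beta2_hat, se_beta2, t_crit_df3)
# Option B: rule of thumb |t| > 2, applied here only for comparison.
_, reject_rule_of_thumb = slope_t_test(beta2_hat, se_beta2, 2.0)
print(round(t_stat, 4), reject_exact, reject_rule_of_thumb)
```

In this made-up case the t-statistic is about 2.12. The rule of thumb would reject $H_0$, while the exact critical value for 3 degrees of freedom would not, which is why Option B should be reserved for large samples.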

2.3 Construct a 95% Confidence Interval for $\beta_2$

Step 4: Calculate the 95% confidence interval for the slope coefficient

\[ \hat{\beta}_2 \pm t_{\alpha/2, df} \cdot s_{\hat{\beta}_2} \]

Interpretation: We are 95% confident that the true slope lies within this interval.
If the interval does not contain 0, we reject $H_0$ at the 5% significance level (consistent with the t‑test).

Example with Advertising‑Sales Data (Assume $n = 25$)

Given:
- $\hat{\beta}_2 = 3.0$
- $s_{\hat{\beta}_2} = 0.5$
- $df = 23$

t‑statistic:
\[ t = \frac{3.0}{0.5} = 6.0 \]

Critical value (from t‑table, $\alpha=0.05$, two‑tailed, df=23): about 2.069.

Since $6.0 > 2.069$, we reject $H_0$.

Conclusion: There is strong statistical evidence that advertising spending affects sales (the true slope is not zero).

95% Confidence Interval:
\[ 3.0 \pm 2.069 \times 0.5 = 3.0 \pm 1.0345 \quad \Rightarrow \quad (1.9655,\; 4.0345) \]
We are 95% confident that each additional \$1,000 spent on advertising increases sales by between \$1,965.50 and \$4,034.50.
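
The interval above can be reproduced with a short sketch that uses the values given in the example ($\hat{\beta}_2 = 3.0$, $s_{\hat{\beta}_2} = 0.5$, critical value 2.069). The function name `confidence_interval` is our own.

```python
def confidence_interval(beta_hat, se, t_crit):
    """Return (lower, upper) bounds of beta_hat +/- t_crit * se."""
    margin = t_crit * se
    return beta_hat - margin, beta_hat + margin


# Values taken from the worked example (n = 25, df = 23).
lower, upper = confidence_interval(3.0, 0.5, 2.069)

# If zero lies outside the interval, H0: beta2 = 0 is rejected at the 5% level.
contains_zero = lower <= 0.0 <= upper
print(round(lower, 4), round(upper, 4), contains_zero)
```

Zero lies outside the interval, which agrees with the t-test result of rejecting $H_0$.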

Discussion Questions

What if the t‑statistic were 1.5 with $n=25$?
→ Fail to reject $H_0$; we cannot conclude advertising matters (the observed slope might be due to chance).
Why use a two‑tailed test?
→ Because we don’t know whether advertising could backfire (negative slope). A one‑tailed test would only look for a positive effect, which is less conservative.
How does the sample size affect the test?
- Larger $n$ → more degrees of freedom → smaller critical values → easier to reject $H_0$.
- Also, $s_{\hat{\beta}_2}$ tends to decrease with $n$ (more data → more precise estimate).
Practical significance vs. statistical significance
- A very small slope (e.g., $\hat{\beta}_2 = 0.01$) could be statistically significant if $n$ is huge, but it may not be practically meaningful for business decisions. Always check the confidence interval bounds.

Summary Table of Hypothesis Testing Steps

Step	Action	Formula / Tool
1	State $H_0$ and $H_1$	$H_0: \beta_2 = 0$; $H_1: \beta_2 \neq 0$
2	Choose significance level ($\alpha$)	Usually 0.05 (95% confidence)
3	Compute $s_{\hat{\beta}_2}$	$s_{\hat{\beta}_2} = \sqrt{ s^2 / \sum (x_i - \bar{x})^2 }$
4	Calculate t‑statistic	$t = \hat{\beta}_2 / s_{\hat{\beta}_2}$
5	Find critical value $t_{\alpha/2, n-2}$	t‑table or rule‑of‑thumb ($\approx 2$)
6	Compare	Reject $H_0$ if $|t| > t_{\text{crit}}$
7	(Optional) Build confidence interval	$\hat{\beta}_2 \pm t_{\text{crit}} \cdot s_{\hat{\beta}_2}$
8	Interpret in context	“Advertising has a significant positive effect on sales.”

Project Reminder: For your dataset, apply these same hypothesis testing steps.
- Write the null and alternative hypotheses in words and symbols.
- Compute the t‑statistic and state whether you reject $H_0$ at $\alpha = 0.05$.
- Provide a business interpretation.

Teaching Notes for Applied Econometrics and Economic Modelling Lab Session - Day 2

Gül Ertan Özgüzer

2026-04-09
