Session Overview


Session 1: Hypothesis Testing and Confidence Intervals (45 minutes)

1.1 Recap of the Exercise Results

  • Linear regression model (population):

\[ \text{Sales}_i = \beta_1 + \beta_2 \times \text{Advertising}_i + \epsilon_i \]

  • \(\beta_1 + \beta_2 \times \text{Advertising}_i\) is the population regression line.

  • Create a table showing the values of \(\hat{\beta}_2\), \(s_{\hat{\beta}_2}\), and the \(t\)-statistic:

| Statistic | Value |
| --- | --- |
| \(\hat{\beta}_2\) | 0.017 |
| \(s_{\hat{\beta}_2}\) | 0.00136 |
| \(t\)-stat | 12.4 |

  • \(R^2 = 0.75\)

  • \(s\) (the standard error of the regression) = 8.74

  • Recap the concepts of hypothesis testing.
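
As a quick check on the recap values, the \(t\)-statistic for \(H_0: \beta_2 = 0\) is the slope estimate divided by its standard error. The following is a minimal Python sketch, not part of the course material; the function name `t_statistic` is illustrative. With the rounded inputs shown above it gives 12.5, so the reported 12.4 presumably reflects unrounded values.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """t-statistic for H0: coefficient == hypothesized."""
    return (estimate - hypothesized) / std_error


# Rounded values taken from the recap above.
beta2_hat = 0.017
se_beta2 = 0.00136

t_stat = t_statistic(beta2_hat, se_beta2)
print(f"t-stat = {t_stat:.2f}")

# Rule of thumb: |t| > 2 rejects H0: beta_2 = 0 at roughly the 5% level.
reject_h0 = abs(t_stat) > 2
print("Reject H0:", reject_h0)
```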

1.2 Confidence Intervals

  • Objective: Recap the concepts of confidence intervals.
  • Key Points:
    • Confidence Interval for the Slope:
      • What is it?: A range of values within which the true slope coefficient is expected to lie with a certain level of confidence (e.g., 95%).
      • Why is it important?: It provides a measure of the uncertainty around the slope estimate.

1.3 Hands-On Calculation of Confidence Interval for the Slope

  • Objective: Calculate the confidence interval for the slope manually using Google Sheets.
  • Exercise: Use the Advertising-Sales dataset.
    • Step 1: Compute the standard error of the slope coefficient (\(s_{\hat{\beta}_2}\)).
      • Instruction: Use the formula: \[ s_{\hat{\beta}_2} = \sqrt{\frac{s^2}{\sum (x_i - \bar{x})^2}} \]
      • Why?: This measures the variability of the slope estimate.
    • Step 2: Calculate the 95% confidence interval for the slope.
      • Instruction: Use the formula: \[ \text{CI} = \hat{\beta}_2 \pm 2 \times s_{\hat{\beta}_2} \] where 2 is a rule-of-thumb approximation to the 95% critical value (about 1.96 in large samples).
      • Why?: This gives a range within which the true slope is likely to lie.
    • Discussion: What does the confidence interval tell us about the slope? How does it help in interpreting the regression results?
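
The two steps above can be sketched in Python as a cross-check on the spreadsheet. This is an illustrative sketch only: the function names are assumptions, and the interval uses the rounded estimates from Section 1.1 with the rule-of-thumb multiplier of 2.

```python
import math


def slope_std_error(x, residuals):
    """Step 1: s_beta2 = sqrt(s^2 / sum((x_i - x_bar)^2)), with s^2 = SSR / (n - 2)."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    s2 = sum(e ** 2 for e in residuals) / (n - 2)
    return math.sqrt(s2 / sxx)


def slope_confidence_interval(beta2_hat, se_beta2, critical_value=2.0):
    """Step 2: beta2_hat +/- critical_value * se_beta2."""
    half_width = critical_value * se_beta2
    return beta2_hat - half_width, beta2_hat + half_width


# Rounded estimates from the recap in Section 1.1.
lower, upper = slope_confidence_interval(0.017, 0.00136)
print(f"Approximate 95% CI for the slope: [{lower:.5f}, {upper:.5f}]")
```

Because the interval excludes zero, it agrees with the hypothesis test: the slope is statistically different from zero at roughly the 5% level.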

Session 2: Forecasting with Linear and Non-Linear Models (45 minutes)

2.1 Linear Model for Men and Women

  • Objective: Fit a linear trend model to the Olympic 100m dataset for men and women.
  • Exercise: Use the Olympic 100m dataset.
    • Step 1: Fit a linear trend model to the data.
      • Instruction: Use the formula: \[ W_i = \beta_1 + \beta_2 G_i + \epsilon_i \] where \(W_i\) is the winning time (in seconds) and \(G_i\) is the index of the Olympic Games, for \(i = 1, \dots, 15\) (1 = 1948, ..., 15 = 2004).
        • Find \(\hat{\beta}_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}\) and \(\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}\), with \(x_i = G_i\) and \(y_i = W_i\).
      • Why?: This provides a baseline forecast based on a linear trend.
    • Step 2: Calculate the outcomes of the linear model.
      • Instruction: Compute \(s^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2\) and \(s_{\hat{\beta}_2}^2 = \frac{s^2}{\sum (x_i - \bar{x})^2}\), where \(e_i\) are the regression residuals.

      • Create a table summarizing the results.

      • Construct a 95% CI for the slope coefficients.

      • A 95% two-sided confidence interval for the slope coefficient is an interval that contains the true value of the slope coefficient with a 95% probability; that is, it contains the true value of the slope coefficient in 95% of all possible randomly drawn samples.

      • Women seem to have made the most progress.

      • The linear model assumes a fixed gain (in seconds) per game.

      • A negative slope means that winning times decrease over time (an improvement).
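
The manual calculations in this section can be sketched in Python as follows. This is a hedged illustration: `fit_linear_trend` is a hypothetical helper name, and the `times` values are made-up placeholders, NOT the actual Olympic results. Replace them with the dataset (separately for men and women) before interpreting anything.

```python
import math


def fit_linear_trend(g, w):
    """OLS fit of W = b1 + b2 * G, plus s^2 and the standard error of b2."""
    n = len(g)
    g_bar = sum(g) / n
    w_bar = sum(w) / n
    sxx = sum((gi - g_bar) ** 2 for gi in g)
    sxy = sum((gi - g_bar) * (wi - w_bar) for gi, wi in zip(g, w))
    b2 = sxy / sxx
    b1 = w_bar - b2 * g_bar
    residuals = [wi - (b1 + b2 * gi) for gi, wi in zip(g, w)]
    s2 = sum(e ** 2 for e in residuals) / (n - 2)
    se_b2 = math.sqrt(s2 / sxx)
    return {"b1": b1, "b2": b2, "s2": s2, "se_b2": se_b2}


# Game index 1..15 as in the text (1 = 1948, ..., 15 = 2004).
games = list(range(1, 16))
# PLACEHOLDER winning times: a downward trend plus alternating noise.
times = [11.0 - 0.02 * gi + (0.05 if gi % 2 else -0.05) for gi in games]

fit = fit_linear_trend(games, times)
ci = (fit["b2"] - 2 * fit["se_b2"], fit["b2"] + 2 * fit["se_b2"])
print(f"b1 = {fit['b1']:.4f}, b2 = {fit['b2']:.4f}, se(b2) = {fit['se_b2']:.4f}")
print(f"Approximate 95% CI for the slope: [{ci[0]:.4f}, {ci[1]:.4f}]")
```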

2.2 Non-Linear (Exponential) Model for Men and Women

  • Objective: Fit a non-linear (exponential) trend model to the data and transform it into a log-linear form.
  • Exercise: Use the Olympic 100m dataset.
    • Step 1: Fit a non-linear (exponential) trend model to the data.
      • Instruction: Use the formula: \[ W_i = e^{\beta_1 + \beta_2 G_i + \epsilon_i} \]
      • Why?: This model assumes a constant percentage change per game rather than a constant change in seconds, which may give a more realistic forecast.
    • Step 2: Transform the non-linear model into a log-linear form.
      • Instruction: Take the natural logarithm of both sides of the non-linear model: \[ \ln(W_i) = \beta_1 + \beta_2 G_i + \epsilon_i \]
      • Why?: This allows us to use linear regression techniques on the transformed model.
    • Step 3: Calculate the outcomes of the non-linear model.
      • Instruction: Compute the predicted winning times for men and women using the non-linear model.
      • Why?: This gives the forecasted winning times based on the non-linear trend.
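
The three steps above can be sketched in Python: regress \(\ln(W_i)\) on \(G_i\) by ordinary least squares, then exponentiate the fitted values. The helper names are illustrative assumptions, and the `times` series is a synthetic placeholder generated from an exact exponential trend, not the Olympic data.

```python
import math


def fit_ols(x, y):
    """Return (b1, b2) for the least-squares line y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2


def fit_exponential_trend(g, w):
    """Fit ln(W) = b1 + b2 * G by OLS on the log-transformed times."""
    log_w = [math.log(wi) for wi in w]
    return fit_ols(g, log_w)


def predict_exponential(b1, b2, g):
    """Transform a log-scale prediction back to seconds."""
    return math.exp(b1 + b2 * g)


games = list(range(1, 16))
# PLACEHOLDER data: an exact exponential trend, for illustration only.
times = [math.exp(2.4 - 0.002 * gi) for gi in games]

b1, b2 = fit_exponential_trend(games, times)
fitted = [predict_exponential(b1, b2, gi) for gi in games]
print(f"b1 = {b1:.4f}, b2 = {b2:.4f} (about {100 * b2:.2f}% change per game)")
```

In the log-linear form, \(\hat{\beta}_2\) is approximately the proportional change in winning time per game, which is the main interpretive difference from the linear model.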

2.3 Forecasting Winning Times for 2008 and 2012

  • Objective: Use the fitted models to forecast winning times for men and women in the Olympic games of 2008 and 2012.
  • Exercise: Use the Olympic 100m dataset.
    • Step 1: Forecast winning times for 2008 and 2012 using the linear model.
      • Instruction: Plug the game indices for 2008 and 2012 (\(G = 16\) and \(G = 17\)) into the estimated linear equation: \[ \hat{W} = \hat{\beta}_1 + \hat{\beta}_2 G \]
      • Why?: This provides a forecast based on the linear trend.
    • Step 2: Forecast winning times for 2008 and 2012 using the non-linear model.
      • Instruction: Plug the game indices for 2008 and 2012 (\(G = 16\) and \(G = 17\)) into the estimated log-linear equation and then exponentiate the result: \[ \hat{W} = e^{\hat{\beta}_1 + \hat{\beta}_2 G} \]
      • Why?: This provides a forecast based on the non-linear trend.
    • Discussion: How do the forecasts differ between the linear and non-linear models? Which model is more realistic?
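
A small Python sketch of the forecasting step follows. It assumes the games are indexed every four years starting from 1 = 1948, as in Section 2.1, so 2008 and 2012 map to \(G = 16\) and \(G = 17\). The coefficient values are hypothetical placeholders, not estimates from the dataset; substitute the fitted values from Sections 2.1 and 2.2.

```python
import math

FIRST_YEAR = 1948      # G = 1 corresponds to 1948 in the text
YEARS_PER_GAME = 4     # assumption: one game every four years


def game_index(year):
    """Convert a calendar year to the game index G used in the models."""
    return (year - FIRST_YEAR) // YEARS_PER_GAME + 1


def forecast_linear(b1, b2, g):
    """Linear model forecast: W = b1 + b2 * G."""
    return b1 + b2 * g


def forecast_exponential(b1, b2, g):
    """Log-linear model forecast: W = exp(b1 + b2 * G)."""
    return math.exp(b1 + b2 * g)


# HYPOTHETICAL coefficients, for illustration only.
lin_b1, lin_b2 = 10.0, -0.01
log_b1, log_b2 = math.log(10.0), -0.001

for year in (2008, 2012):
    g = game_index(year)
    w_lin = forecast_linear(lin_b1, lin_b2, g)
    w_exp = forecast_exponential(log_b1, log_b2, g)
    print(f"{year} (G = {g}): linear {w_lin:.3f} s, exponential {w_exp:.3f} s")
```

Over a short horizon the two forecasts are usually close; they diverge further out, where the linear model eventually predicts impossible (zero or negative) times and the exponential model does not.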

2.4 Introduction to GRETL and Verification of Results

  • Objective: Introduce GRETL and verify the results of the linear and non-linear models.
  • Exercise: Use the Olympic 100m dataset.
    • Step 1: Load the dataset into GRETL.
      • Instruction: Use the GRETL interface to load the dataset.
      • Why?: This allows us to perform regression analysis using GRETL.
    • Step 2: Perform linear regression in GRETL.
      • Instruction: Use GRETL to run the linear regression and compare the results with the manual calculations.
      • Why?: This verifies the accuracy of the manual calculations.
    • Step 3: Estimate the non-linear (exponential) model in GRETL.
      • Instruction: Use GRETL to run the log-linear regression (ordinary least squares with \(\ln(W_i)\) as the dependent variable) and compare the results with the manual calculations.
      • Why?: This verifies the accuracy of the manual calculations.
    • Discussion: How do the GRETL results compare with the manual calculations? What are the benefits of using GRETL for regression analysis?

Homework/Follow-Up