Session Overview


Session 1: Forecasting with Linear and Non-Linear Models (45 minutes)

2.1 Linear Model for Women

  • Objective: Fit a linear regression model to the Olympic 100m dataset for women.
  • Exercise: Use the Olympic 100m dataset.
    • Step 1: Fit a linear regression model to the data.
      • Instruction: Use the formula: \[ W_i = \beta_1 + \beta_2 G_i + \epsilon_i \] where \(W_i\) is the winning time (seconds) and \(G_i\) is the Games index for i = 1,…,15 (1 = 1948, …, 15 = 2004).
        • Find \(\hat{\beta}_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}\) and \(\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}\), with \(x_i = G_i\) and \(y_i = W_i\).
      • Why?: This provides a baseline forecast based on a linear trend.
    • Step 2: Calculate the outcomes of the linear model.
      • Instruction: Compute \(s^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2\) and \(\sigma_{\hat{\beta}_2}^2 = \frac{s^2}{\sum (x_i - \bar{x})^2}\), where \(e_i = y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i\) are the residuals.
      • Create a table summarizing the results.
      • Construct a 95% CI for the slope coefficient.
      • A 95% two-sided confidence interval for the slope coefficient is an interval that contains the true value of the slope coefficient with a 95% probability; that is, it contains the true value of the slope coefficient in 95% of all possible randomly drawn samples.
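
The formulas in Steps 1 and 2 can be sketched in plain Python as follows. This is a minimal illustration, not a reference solution: the function names `fit_ols` and `slope_ci` are made up for this example, and the five data points are placeholder values, not the Olympic dataset. The critical value 3.182 is the two-sided 95% t value for 3 degrees of freedom (n − 2 with n = 5); for the 15 Olympic observations the corresponding value is 2.160.

```python
import math

def fit_ols(x, y):
    """Closed-form least-squares fit of y = b1 + b2 * x (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx                      # slope estimate
    b1 = y_bar - b2 * x_bar             # intercept estimate
    residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - 2)   # residual variance s^2
    se_b2 = math.sqrt(s2 / sxx)         # standard error of the slope
    return {"b1": b1, "b2": b2, "s2": s2, "se_b2": se_b2}

def slope_ci(b2, se_b2, t_crit):
    """Two-sided confidence interval for the slope, given a t critical value."""
    return (b2 - t_crit * se_b2, b2 + t_crit * se_b2)

# Placeholder data (NOT the Olympic dataset): Games index and winning time.
G = [1, 2, 3, 4, 5]
W = [11.9, 11.6, 11.5, 11.1, 11.0]

fit = fit_ols(G, W)
low, high = slope_ci(fit["b2"], fit["se_b2"], t_crit=3.182)
print(fit)
print("95% CI for slope:", (round(low, 4), round(high, 4)))
```

Replacing `G` and `W` with the 15 Olympic observations and `t_crit` with 2.160 should reproduce the hand-calculated table, up to rounding.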

Results

- Women's winning times appear to improve from one Olympics to the next (every four years).
- The model assumes a fixed gain (in seconds per Games).
- Negative slope means winning times decrease over time (improvement).
- Each additional Games (moving from one Olympics to the next, i.e., every 4 years) is associated with a decrease of 0.063 seconds in the winning time.
- Over the full period (from 1948 to 2004, i.e., 14 steps): 14 × (−0.063) ≈ −0.88 seconds, an improvement of about 0.88 seconds.
- Winning times are getting faster (lower times = better performance).
- We are 95% confident that the true per‑Olympics change in winning time lies between −0.0873 and −0.0386 seconds, i.e., an improvement of 0.0386 to 0.0873 seconds per Games.
- Since the entire interval is below zero, the improvement is statistically significant (the slope is not zero).
- Practical significance: Even the slowest estimated improvement (0.0386 sec per Olympics) accumulates to about 0.54 seconds over 14 Olympics – a meaningful margin in a 100m sprint.
- About 67.2% of the variation in women’s winning times is explained by the linear time trend.
- The remaining 32.8% is due to other factors (e.g., weather, wind, individual athlete performance, measurement error).
- This indicates a moderately strong linear trend – the Olympic year is a useful predictor, but there is still noticeable scatter around the line.
- The standard error of the regression is s = 0.204 seconds, meaning that the typical prediction error when using this model is about 0.20 seconds.
- In sprinting, 0.2 seconds is substantial (often several places in a close race). This confirms that while the trend is clear, **individual race outcomes can deviate noticeably from the line.**
- ∣−5.165∣ > 2.160 → reject the null hypothesis that the slope is zero. There is strong statistical evidence that women’s 100m times have been decreasing (improving) over time.
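
The goodness-of-fit and test quantities quoted above can be computed along these lines. This is a rough sketch under stated assumptions: `trend_diagnostics` is a hypothetical helper, the data are placeholder values rather than the Olympic dataset, and 3.182 is the critical value for the 5-point toy sample (not the 2.160 used above). The last lines use the identity t² = (n − 2)·R²/(1 − R²), which holds for a regression with one explanatory variable, to check that the reported R² and t statistic agree with each other.

```python
import math

def trend_diagnostics(x, y):
    """Return (R^2, regression standard error s, t statistic of the slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    ssr = sum((yi - (b1 + b2 * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)                       # total sum of squares
    r2 = 1 - ssr / sst
    s = math.sqrt(ssr / (n - 2))
    t_stat = b2 / (s / math.sqrt(sxx))
    return r2, s, t_stat

# Placeholder data (NOT the Olympic dataset).
G = [1, 2, 3, 4, 5]
W = [11.9, 11.6, 11.5, 11.1, 11.0]

r2, s, t_stat = trend_diagnostics(G, W)
reject_null = abs(t_stat) > 3.182   # critical value for n - 2 = 3 degrees of freedom
print(round(r2, 3), round(s, 3), round(t_stat, 2), reject_null)

# Consistency check on the figures reported above (n = 15, R^2 = 0.672):
t_from_r2 = math.sqrt(13 * 0.672 / (1 - 0.672))
print(round(t_from_r2, 2))  # about 5.16, in line with the reported |t| = 5.165
```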

Conclusion

The linear model confirms a clear downward trend in women’s 100m winning times from 1948 to 2004. The negative slope is statistically significant, and the confidence interval provides a plausible range for the rate of improvement. However, the moderate \(R^2\) and residual standard error of 0.2 seconds remind us that year alone does not perfectly predict winning times – other factors matter in any given Olympics.

2.2 Non-Linear (Exponential) Model for Women

  • Objective: Fit a non-linear (exponential) regression model to the data and transform it into a log-linear form.
  • Exercise: Use the Olympic 100m dataset.

Why consider a non‑linear trend?

The linear model assumes winning times decrease by the same absolute number of seconds every Olympics (e.g., –0.063 sec per Games). But in reality, improvement often slows down over time. Athletes start from a higher baseline and make big leaps early, then improvements become smaller as they approach human limits. An exponential decay model captures this pattern: times decrease by a constant percentage each period, not a constant amount.
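
The difference between a constant absolute drop and a constant percentage drop can be illustrated with a few lines of Python. The starting time and both rates below are invented purely for illustration; they are not estimates from the Olympic data.

```python
import math

start = 12.0        # hypothetical starting time in seconds
abs_drop = 0.06     # hypothetical constant absolute drop per Games (linear model)
pct_rate = 0.005    # hypothetical constant proportional rate per Games (exponential model)

steps = range(15)
linear = [start - abs_drop * k for k in steps]
exponential = [start * math.exp(-pct_rate * k) for k in steps]

# Improvement from one Games to the next under each model.
linear_steps = [linear[k] - linear[k + 1] for k in range(14)]
exp_steps = [exponential[k] - exponential[k + 1] for k in range(14)]

print("linear drops:     ", [round(d, 4) for d in linear_steps[:3]], "...")
print("exponential drops:", [round(d, 4) for d in exp_steps[:3]], "...")
```

Under the linear path every step is the same size, while under the exponential path each step is slightly smaller than the one before it.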

  • Step 1: Fit a non-linear (exponential) regression model to the data.
    • Instruction: Use the formula: \[ W_i = e^{\beta_1 + \beta_2 G_i + \epsilon_i} \]
    • Why?: This model can capture exponential trends in the data and may provide a more accurate forecast. Unlike the linear model (constant absolute drop), the exponential model implies a constant percentage drop. That means the absolute drop gets smaller as times get faster – which is more realistic for elite sport.
  • Step 2: Transform the non-linear model into a log-linear form.
    • Instruction: Take the natural logarithm of both sides of the non-linear model: \[ \ln(W_i) = \beta_1 + \beta_2 G_i + \epsilon_i \]
    • Why?: This allows us to use linear regression techniques on the transformed model. Now it’s a linear model in the log of time. We can estimate the coefficients using ordinary least squares on \(\ln(W_i)\).
  • Step 3: Calculate the outcomes of the non-linear model.
    • Instruction: Compute the predicted winning times for women using the non-linear model.
    • Why?: This gives the forecasted winning times based on the non-linear trend.
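
Steps 1 to 3 can be sketched as follows: take logs, fit a straight line to \(\ln(W_i)\), then exponentiate the fitted values. The helper names `fit_log_linear` and `predict_time` are assumptions made for this example, and the data are placeholder values rather than the Olympic dataset.

```python
import math

def fit_log_linear(g, w):
    """OLS fit of ln(w) = b1 + b2 * g; returns (b1, b2)."""
    log_w = [math.log(wi) for wi in w]
    n = len(g)
    g_bar = sum(g) / n
    lw_bar = sum(log_w) / n
    sgg = sum((gi - g_bar) ** 2 for gi in g)
    sgl = sum((gi - g_bar) * (li - lw_bar) for gi, li in zip(g, log_w))
    b2 = sgl / sgg
    b1 = lw_bar - b2 * g_bar
    return b1, b2

def predict_time(b1, b2, g):
    """Back-transform a log-scale prediction to seconds."""
    return math.exp(b1 + b2 * g)

# Placeholder data (NOT the Olympic dataset).
G = [1, 2, 3, 4, 5]
W = [11.9, 11.6, 11.5, 11.1, 11.0]

b1, b2 = fit_log_linear(G, W)
fitted = [predict_time(b1, b2, g) for g in G]
pct_change = (math.exp(b2) - 1) * 100   # implied percentage change per Games

print("b1, b2:", round(b1, 4), round(b2, 4))
print("fitted times:", [round(f, 3) for f in fitted])
print("percent change per Games:", round(pct_change, 2))
```

Because the model is linear in logs, the slope is read as an approximate proportional change per Games rather than a change in seconds.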

2.3 Forecasting Winning Times for 2008 and 2012

  • Objective: Use the fitted models to forecast winning times for women in the Olympic games of 2008 and 2012.
  • Exercise: Use the Olympic 100m dataset.
    • Step 1: Forecast winning times for 2008 and 2012 using the linear model.
      • Instruction: Plug the Games indices \(G = 16\) (2008) and \(G = 17\) (2012) into the fitted linear equation: \[ \hat{W}_i = \hat{\beta}_1 + \hat{\beta}_2 G_i \]
      • Why?: This provides a forecast based on the linear trend.
    • Step 2: Forecast winning times for 2008 and 2012 using the non-linear model.
      • Instruction: Plug the Games indices \(G = 16\) (2008) and \(G = 17\) (2012) into the fitted log-linear equation and then exponentiate the result: \[ \hat{W}_i = e^{\hat{\beta}_1 + \hat{\beta}_2 G_i} \]
      • Why?: This provides a forecast based on the non-linear trend.
    • Discussion: How do the forecasts differ between the linear and non-linear models? Which model is more realistic?
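
A side-by-side forecast from both models might look like the sketch below. It assumes the Games index continues as 16 for 2008 and 17 for 2012. The series `W` is synthetic (a straight line plus an alternating wiggle), generated only so the example runs; it is not the Olympic dataset, so the printed forecasts are not forecasts of actual winning times.

```python
import math

def ols(x, y):
    """Closed-form OLS for y = b1 + b2 * x; returns (b1, b2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2

# Synthetic series for 15 Games (NOT the Olympic dataset).
G = list(range(1, 16))
W = [11.9 - 0.06 * g + 0.05 * (-1) ** g for g in G]

a1, a2 = ols(G, W)                              # linear model in seconds
c1, c2 = ols(G, [math.log(w) for w in W])       # log-linear model

forecasts = {}
for year, g in [(2008, 16), (2012, 17)]:
    linear_forecast = a1 + a2 * g
    exp_forecast = math.exp(c1 + c2 * g)        # back-transform to seconds
    forecasts[year] = (linear_forecast, exp_forecast)
    print(year, round(linear_forecast, 3), round(exp_forecast, 3))
```

Over a short horizon the two forecasts tend to be close; the gap widens the further the models are extrapolated, which is a useful starting point for the discussion question.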

Session 2

2.4 Introduction to GRETL and Verification of Results

Objective: Introduce GRETL and verify the results of the linear and non‑linear models using the Olympic 100m dataset.

Exercise: Use the Olympic 100m dataset.

Step 1: Load the dataset into GRETL

Instruction:

  • Open GRETL.
  • Option A – Manual entry:

  1. Click File → New → Dataset.
  2. Set number of observations = 15, number of variables = 2.
  3. Name the variables: G (year index, 1 to 15) and W (winning time in seconds).
  4. Enter the data from the table below.

Why?: Loading the data correctly is the first step to using GRETL for regression analysis.

Step 2: Perform linear regression in GRETL

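Instruction: Click Model → Ordinary Least Squares, select W as the dependent variable and G as the regressor, then run the regression.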

Why?: Running the linear regression in GRETL will produce the same coefficients (\(\hat{\beta}_1, \hat{\beta}_2\)), \(R^2\), and standard errors you calculated manually. Compare the GRETL output with your hand calculations.

Step 3: Perform non‑linear (log‑linear) regression in GRETL

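Instruction: Create the log of W (select W, then Add → Logs of selected variables; GRETL names the new series l_W). Then click Model → Ordinary Least Squares with l_W as the dependent variable and G as the regressor.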

Why?: This estimates the exponential model \(\ln(W_i) = \beta_1 + \beta_2 G_i + \varepsilon_i\). Exponentiate the fitted values to get predicted winning times in seconds.

Step 4: Verify your manual calculations

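Instruction: Compare GRETL's coefficient estimates, standard errors, and \(R^2\) with your manual results for both models. Any differences should be attributable to rounding.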

Why?: GRETL provides a reliable benchmark. Verifying your manual results builds confidence in your understanding of regression calculations.

Step 5: (Optional) Forecast for 2008 and 2012

Instruction (to extend the dataset and forecast):

  1. Add observations for 2008 and 2012:

    • Click Edit → Add observations → number = 2.
    • In the new rows (16 and 17), set G = 16 and G = 17. Leave W blank.
  2. Re‑run the linear regression (Model → Ordinary Least Squares with W and G).

  3. From the model window, click Analysis → Forecasts.

    • GRETL will predict W for the missing rows.
  4. For the log‑linear model:

    • Re‑run the regression with l_W and G.
    • Use Analysis → Forecasts. To obtain forecasts in original units, check “Exponential forecast” (or manually compute exp(predicted_log)).

Why?: Forecasting helps you understand how each model extrapolates beyond the observed data.

Discussion