Session Overview
- Duration: 2 sessions of 90 minutes each
- Tools: Google Sheets (for manual
calculations) and GRETL (for verification in Session 2).
- Objective: Students will analyze the
Olympic 100m dataset using linear and non-linear
models. They will also learn how to log-linearize non-linear
models.
- Focus: Hands-on calculations, intuitive
explanations, and forecasting using both linear and non-linear
models.
Session 1: Forecasting with Linear and Non-Linear Models (45
minutes)
2.1 Linear Model for Women
- Objective: Fit a linear regression model to the
Olympic 100m dataset for women.
- Exercise: Use the Olympic 100m
dataset.
- Step 1: Fit a linear regression model to the data.
- Instruction: Use the formula: \[
W_i = \beta_1 + \beta_2 G_i + \epsilon_i
\] where \(W_i\) is the winning
time (seconds) and \(G_i\) is the index of the Games
for i = 1,…,15 (1 = 1948 through 15 = 2004, one step every four years).
- Find \(\hat{\beta}_2 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}\) and \(\hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}\), with \(x_i = G_i\) and \(y_i = W_i\).
- Why?: This provides a baseline forecast based on a
linear trend.
- Step 2: Calculate the outcomes of the linear model.
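- Instruction: Compute \(s^2 = \frac{1}{n-2} \sum_{i=1}^{n} e_i^2\), where \(e_i = W_i - \hat{\beta}_1 - \hat{\beta}_2 G_i\) are the residuals,
and \(\sigma_{\hat{\beta}_2}^2 =
\frac{s^2}{\sum (x_i - \bar{x})^2}\).
- Create a table summarizing the results.
- Construct a 95% CI for the slope coefficient: \(\hat{\beta}_2 \pm t_{0.975,\,n-2}\,\sigma_{\hat{\beta}_2}\).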
- A 95% two-sided confidence interval for the slope coefficient is an
interval that contains the true value of the slope coefficient with a
95% probability; that is, it contains the true value of the slope
coefficient in 95% of all possible randomly drawn samples.
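The hand calculations in Steps 1 and 2 can be sketched in Python as a cross-check on the spreadsheet. This is a minimal sketch, not part of the original exercise: the function name `simple_ols` and the five data points are placeholders (they are not the Olympic data), and the critical t-value has to be supplied by the user (2.160 for the 15 Games, i.e., 13 degrees of freedom).

```python
import math

def simple_ols(x, y, t_crit):
    """Fit y = b1 + b2*x with the textbook formulas from Steps 1 and 2.

    t_crit is the two-sided 5% critical t-value for n-2 degrees of freedom,
    supplied by the caller (e.g. 2.160 when n = 15).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equally long x and y with at least 3 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx                      # slope
    b1 = y_bar - b2 * x_bar             # intercept
    residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    s2 = sse / (n - 2)                  # error variance estimate
    se_b2 = math.sqrt(s2 / sxx)         # standard error of the slope
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return {
        "b1": b1,
        "b2": b2,
        "s2": s2,
        "se_b2": se_b2,
        "ci_low": b2 - t_crit * se_b2,
        "ci_high": b2 + t_crit * se_b2,
        "t_stat": b2 / se_b2 if se_b2 > 0 else math.inf,
        "r2": 1 - sse / sst,
    }

# Placeholder data (NOT the Olympic dataset), only to show the mechanics.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
result = simple_ols(x, y, t_crit=3.182)  # 3.182: critical value for 3 d.o.f.
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Replacing `x` and `y` with the 15 Games indices and winning times, and `t_crit` with 2.160, should reproduce the numbers in the summary table.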
Results
- Women seem to have made progress every four years, from one Games to the next.
- The model assumes a fixed gain (in seconds per Games).
- Negative slope means winning times decrease over time (improvement).
- Each additional Games (one step in \(G\), i.e., four years later) is associated with a decrease of 0.063 seconds in the winning time.
- Over the full period (from 1948 to 2004, i.e., 14 steps): 14×(−0.063)≈−0.88 seconds improvement.
- Winning times are getting faster (lower times = better performance).
- We are 95% confident that the true per‑Olympics improvement lies between 0.0386 and 0.0873 seconds, i.e., the slope lies between −0.0873 and −0.0386.
- Since the entire interval for the slope is below zero, the improvement is statistically significant (the slope is not zero).
- Practical significance: Even the slowest estimated improvement (0.0386 sec per Olympics) accumulates to about 0.54 seconds over 14 Olympics – a meaningful margin in a 100m sprint.
- About 67.2% of the variation in women’s winning times is explained by the linear model over time.
- The remaining 32.8% is due to other factors (e.g., weather, wind, individual athlete performance, measurement error).
- This indicates a moderately strong linear trend – the Olympic year is a useful predictor, but there is still noticeable scatter around the line.
- Standard error of the regression s=0.204 seconds, which means that the typical prediction error when using this model is about 0.20 seconds.
- In sprinting, 0.2 seconds is substantial (often several places in a close race). This confirms that while the trend is clear, **individual race outcomes can deviate noticeably from the line.**
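- ∣−5.165∣ > 2.160 (the t-statistic of the slope against the 5% two-sided critical value with 13 degrees of freedom) → reject the null hypothesis that the slope is zero. There is strong statistical evidence that women’s 100m times have been decreasing (improving) over time.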
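The arithmetic behind these bullet points can be re-checked in a few lines. The sketch below only reuses the figures quoted in the list above; the one addition is the identity \(R^2 = t^2 / (t^2 + n - 2)\), which holds for a regression with a single explanatory variable and serves here as a consistency check between the reported t-statistic and \(R^2\).

```python
# Figures quoted in the Results list above.
slope = -0.063            # seconds per Games
steps = 14                # number of steps from 1948 to 2004
ci_low_magnitude = 0.0386 # slowest plausible improvement per Games
t_stat = -5.165
t_crit = 2.160            # 5% two-sided critical value, 13 d.o.f.
n = 15

total_change = steps * slope
smallest_total_gain = steps * ci_low_magnitude
reject_null = abs(t_stat) > t_crit
r2_from_t = t_stat ** 2 / (t_stat ** 2 + (n - 2))

print(round(total_change, 2))         # -0.88
print(round(smallest_total_gain, 2))  # 0.54
print(reject_null)                    # True
print(round(r2_from_t, 3))            # 0.672
```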
Conclusion
The linear model confirms a clear downward trend in women’s 100m
winning times from 1948 to 2004. The negative slope is statistically
significant, and the confidence interval provides a plausible range for
the rate of improvement. However, the moderate \(R^2\) and residual standard error of 0.2
seconds remind us that year alone does not perfectly predict winning
times – other factors matter in any given Olympics.
2.2 Non-Linear (Exponential) Model for Women
- Objective: Fit a non-linear (exponential)
regression model to the data and transform it into a log-linear
form.
- Exercise: Use the Olympic 100m
dataset.
Why consider a non‑linear trend?
The linear model assumes winning times decrease by the same absolute
number of seconds every Olympics (e.g., –0.063 sec per Games). But in
reality, improvement often slows down over time. Athletes start from a
higher baseline and make big leaps early, then improvements become
smaller as they approach human limits. An exponential decay model
captures this pattern: times decrease by a constant percentage each
period, not a constant amount.
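A short derivation makes the "constant percentage" claim concrete. It writes the exponential model as \(W_i = e^{\beta_1 + \beta_2 G_i}\), with the error term omitted for readability. The numerical line plugs in the log-linear slope of –0.00563 quoted in the GRETL verification step and is a rounded illustration only.

\[
\frac{W_{i+1}}{W_i}
= \frac{e^{\beta_1 + \beta_2 (G_i + 1)}}{e^{\beta_1 + \beta_2 G_i}}
= e^{\beta_2},
\qquad
\frac{W_{i+1} - W_i}{W_i} = e^{\beta_2} - 1 \approx \beta_2
\quad \text{for small } |\beta_2| .
\]
\[
\hat{\beta}_2 = -0.00563
\;\Rightarrow\;
e^{-0.00563} - 1 \approx -0.0056
\quad \text{(about a 0.56\% drop in winning time per Games).}
\]

The ratio does not depend on \(G_i\), so the percentage drop is the same at every step, while the drop in seconds shrinks as \(W_i\) gets smaller.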
- Step 1: Fit a non-linear (exponential) regression
model to the data.
- Instruction: Use the formula: \[
W_i = e^{\beta_1 + \beta_2 G_i + \epsilon_i}
\]
- Why?: This model can capture exponential trends in
the data and may provide a more realistic forecast. Unlike the linear model
(constant absolute drop), the exponential model implies a
constant percentage drop. That means the absolute drop
gets smaller as times get faster – which is more realistic for elite
sport.
- Step 2: Transform the non-linear model into a
log-linear form.
- Instruction: Take the natural logarithm of both
sides of the non-linear model: \[
\ln(W_i) = \beta_1 + \beta_2 G_i + \epsilon_i
\]
- Why?: This allows us to use linear regression
techniques on the transformed model. Now it’s a linear model in the log
of time. We can estimate the coefficients using ordinary least squares
on \(\ln(W_i)\).
- Step 3: Calculate the outcomes of the non-linear
model.
- Instruction: Compute the predicted winning times
for women using the non-linear model.
- Why?: This gives the forecasted winning times based
on the non-linear trend.
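The three steps above (take logs, fit by ordinary least squares, exponentiate the fitted values) can be sketched as follows. The function names `fit_log_linear` and `predict_time` are illustrative, and the data are placeholders generated from an exact exponential curve, not the Olympic times, so that the recovered coefficients are easy to check.

```python
import math

def fit_log_linear(g, w):
    """OLS of ln(w) on g; returns (b1, b2) for ln(W) = b1 + b2*G."""
    log_w = [math.log(wi) for wi in w]
    n = len(g)
    g_bar = sum(g) / n
    l_bar = sum(log_w) / n
    sxx = sum((gi - g_bar) ** 2 for gi in g)
    sxy = sum((gi - g_bar) * (li - l_bar) for gi, li in zip(g, log_w))
    b2 = sxy / sxx
    b1 = l_bar - b2 * g_bar
    return b1, b2

def predict_time(b1, b2, g):
    """Back-transform a fitted log value to seconds."""
    return math.exp(b1 + b2 * g)

# Placeholder data (NOT the Olympic dataset): an exact exponential curve.
g = [1, 2, 3, 4, 5]
w = [math.exp(2.5 - 0.005 * gi) for gi in g]

b1, b2 = fit_log_linear(g, w)
fitted = [predict_time(b1, b2, gi) for gi in g]
print(b1, b2)
print(fitted)
```

In Google Sheets the same route is a column of `LN(W)` values followed by the slope and intercept formulas from the linear model.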
2.3 Forecasting Winning Times for 2008 and 2012
- Objective: Use the fitted models to forecast
winning times for women in the Olympic games of 2008 and 2012.
- Exercise: Use the Olympic 100m
dataset.
- Step 1: Forecast winning times for 2008 and 2012
using the linear model.
- Instruction: Plug in the Games indices for 2008 and 2012 (\(G = 16\) and \(G = 17\)) into
the fitted linear equation: \[
\hat{W}_i = \hat{\beta}_1 + \hat{\beta}_2 G_i
\]
- Why?: This provides a forecast based on the linear
trend.
- Step 2: Forecast winning times for 2008 and 2012
using the non-linear model.
- Instruction: Plug in the Games indices for 2008 and 2012 (\(G = 16\) and \(G = 17\)) into
the fitted log-linear equation and then exponentiate the result: \[
\hat{W}_i = e^{\hat{\beta}_1 + \hat{\beta}_2 G_i}
\]
- Why?: This provides a forecast based on the
non-linear trend.
- Discussion: How do the forecasts differ between the
linear and non-linear models? Which model is more realistic?
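A minimal sketch of the two forecasting steps, assuming the index convention used throughout (1 = 1948, one step every four years). The helper names are illustrative. The intercepts are not quoted in this plan, so the forecast functions take the estimates as arguments rather than hard-coding them.

```python
import math

FIRST_YEAR = 1948   # corresponds to G = 1
STEP_YEARS = 4      # one Games every four years

def year_to_index(year):
    """Convert an Olympic year to the Games index G."""
    offset = year - FIRST_YEAR
    if offset < 0 or offset % STEP_YEARS != 0:
        raise ValueError(f"{year} is not on the 4-year grid starting in 1948")
    return offset // STEP_YEARS + 1

def forecast_linear(b1, b2, g):
    """Forecast from the linear model W = b1 + b2*G."""
    return b1 + b2 * g

def forecast_exponential(b1, b2, g):
    """Forecast from the log-linear model, back-transformed to seconds."""
    return math.exp(b1 + b2 * g)

for year in (2008, 2012):
    print(year, "-> G =", year_to_index(year))

# With your own estimates (b1_hat and b2_hat are whatever you computed):
# forecast_linear(b1_hat, b2_hat, year_to_index(2008))
# forecast_exponential(b1_hat_log, b2_hat_log, year_to_index(2012))
```

Comparing the two forecasts for the same `G` shows how far apart the models drift once they extrapolate beyond 2004.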
Session 2
2.4 Introduction to GRETL and Verification of
Results
Objective: Introduce GRETL and verify the results of
the linear and non‑linear models using the Olympic 100m dataset.
Exercise: Use the Olympic 100m dataset.
Step 1: Load the dataset into GRETL
Instruction:
- Open GRETL.
Option A – Manual entry:
- Click File → New → Dataset.
- Set number of observations = 15, number of variables = 2.
- Name the variables: G (year index, 1 to 15) and W (winning time in seconds).
- Enter the data from the table below.
Option B – Import CSV:
- Create a CSV file with columns G and W.
- Click File → Open data → Import → CSV and select the file.
Why?: Loading the data correctly is the first step
to using GRETL for regression analysis.
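For Option B, the CSV file can be typed by hand or generated. The sketch below is one possible way to build it: `build_csv` is an illustrative helper, the two times are placeholders (not real results), and the file name in the final comment is hypothetical.

```python
import csv
import io

def build_csv(times):
    """Return CSV text with columns G (index from 1) and W (seconds)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["G", "W"])
    for index, time in enumerate(times, start=1):
        writer.writerow([index, time])
    return buffer.getvalue()

# Placeholder values only; replace with the 15 winning times in order.
placeholder_times = [12.0, 11.5]
csv_text = build_csv(placeholder_times)
print(csv_text)

# To save it for import (hypothetical file name):
# with open("olympic100m_women.csv", "w", newline="") as f:
#     f.write(csv_text)
```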
Step 4: Verify your manual calculations
Instruction:
- Compare the GRETL output with the numbers you computed by hand:
- Linear model: \(\hat{\beta}_2\) ≈ –0.063; \(s\) (standard error of regression) ≈ 0.204; \(R^2\) ≈ 0.672.
- Log‑linear model: \(\hat{\beta}_2\) ≈ –0.00563; \(s^2\) (error variance) ≈ 0.0416.
- If they match, your manual work is correct. If not, check your
arithmetic.
Why?: GRETL provides a reliable benchmark. Verifying
your manual results builds confidence in your understanding of
regression calculations.
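Because the benchmark values are rounded, "match" is best read as "agree to the reported number of decimals". The sketch below shows one way to apply that rule. The `agrees` helper is illustrative, and the "manual" numbers are hypothetical student results, not values taken from the dataset.

```python
def agrees(manual, reported, decimals):
    """True if manual is within half a unit of the last reported digit."""
    return abs(manual - reported) <= 0.5 * 10 ** (-decimals)

# (label, hypothetical manual value, benchmark from this step, decimals)
checks = [
    ("linear slope", -0.0629, -0.063, 3),
    ("regression s", 0.2041, 0.204, 3),
    ("R-squared", 0.6723, 0.672, 3),
    ("log-linear slope", -0.005634, -0.00563, 5),
]

for label, manual, reported, decimals in checks:
    status = "match" if agrees(manual, reported, decimals) else "check arithmetic"
    print(f"{label}: {status}")
```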
Step 5: (Optional) Forecast for 2008 and 2012
Instruction (to extend the dataset and
forecast):
Add observations for 2008 and 2012:
- Click Edit → Add observations → number = 2.
- In the new rows (16 and 17), set G = 16 (for 2008) and G = 17 (for 2012). Leave W blank.
Re‑run the linear regression (Model → Ordinary
Least Squares with W and G).
From the model window, click Analysis →
Forecasts.
- GRETL will predict
W for the missing rows.
For the log‑linear model:
- Re‑run the regression with l_W (the natural log of W) and G.
- Use Analysis → Forecasts. To obtain forecasts in
original units, check “Exponential forecast” (or
manually compute
exp(predicted_log)).
Why?: Forecasting helps you understand how each
model extrapolates beyond the observed data.
Discussion
- How do the GRETL results compare with your manual
calculations?
- What are the benefits of using GRETL for regression analysis (speed,
accuracy, diagnostic tools, ease of forecasting)?
- Which model (linear or exponential) gives more realistic forecasts
for future Olympics, and why?