2026-02-06

Dataset Airmiles

We are going to explore regression by looking at the airmiles dataset.

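data(airmiles)
# airmiles is an annual ts object; convert it to a data frame with year and miles columns
df <- data.frame(year = as.numeric(time(airmiles)), miles = as.numeric(airmiles))
head(df)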
  year miles
1 1937   412
2 1938   480
3 1939   683
4 1940  1052
5 1941  1385
6 1942  1418
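Because airmiles ships with base R as an annual ts object rather than a data frame, a short sanity check of the conversion can be useful before modeling. This is a minimal sketch that rebuilds df the same way; the specific checks are illustrative. The series should cover 24 years, 1937 to 1960.

```r
# Load the built-in annual time series (datasets package, attached by default)
data(airmiles)

# Convert the ts object into a data frame with one row per year
df <- data.frame(
  year  = as.numeric(time(airmiles)),
  miles = as.numeric(airmiles)
)

# Basic structural checks on the converted data
str(df)
range(df$year)   # first and last year in the series
nrow(df)         # number of annual observations
```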

Airmiles

Linear trend


R code for the linear trend

Here is the R code used to build the linear trend plot. geom_smooth(method = "lm") overlays a least-squares trend line with a 95% confidence band (se = TRUE), which makes the overall direction of the airmiles data easy to see.

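library(ggplot2)

# Scatter plot of miles by year with a least-squares line and 95% confidence band
ggplot(df, aes(x = year, y = miles)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(
    title = "Linear Trend with 95% Confidence Band",
    x = "Year",
    y = "Passenger miles (millions)"
  )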

Regression Model

Modeling airline passenger miles as a linear function of time

\[ \text{Miles}_t = \beta_0 + \beta_1 \cdot \text{Year}_t + \varepsilon_t,\quad \varepsilon_t \sim N(0, \sigma^2) \]

  • Miles is measured in millions
  • \(\beta_0\): intercept
  • \(\beta_1\): average yearly change in passenger miles
  • \(\varepsilon_t\): random error term
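The model above can be fit by ordinary least squares with base R's lm(). This is a minimal sketch, assuming the df data frame built from airmiles (rebuilt here so the block runs on its own); fit is just an illustrative name. The estimated slope is \(\hat{\beta}_1\), the average yearly change in passenger miles.

```r
# Rebuild the data frame from the built-in annual series
data(airmiles)
df <- data.frame(
  year  = as.numeric(time(airmiles)),
  miles = as.numeric(airmiles)
)

# Ordinary least squares fit of Miles_t = beta_0 + beta_1 * Year_t + error_t
fit <- lm(miles ~ year, data = df)

coef(fit)    # beta_0 (intercept) and beta_1 (slope per year)
sigma(fit)   # residual standard error, the estimate of sigma
```

Because year is not centered, \(\hat{\beta}_0\) is the fitted value at year 0 and is not meaningful on its own; the slope is the quantity of interest here.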

Hypotheses for an upward trend

We test whether airline passenger miles have increased over time. The null hypothesis is that passenger miles have no linear trend over time (\(\beta_1 = 0\)), against the alternative that they increase (\(\beta_1 > 0\)).

\[ H_0: \beta_1 = 0 \qquad \text{vs.}\qquad H_A: \beta_1 > 0 \]

The test statistic for the slope is:

\[ t=\frac{\hat{\beta}_1-0}{SE(\hat{\beta}_1)} \]

  • A small p-value provides evidence of a positive time trend; this corresponds to the upward slope of the fitted line in the regression plot.
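summary() reports the slope's t statistic together with a two-sided p-value. For the one-sided alternative \(H_A: \beta_1 > 0\), the p-value is the upper-tail probability \(P(T_{n-2} \ge t)\). This is a minimal sketch, assuming the same df and lm() fit as in the regression model; t_stat and p_one_sided are illustrative names.

```r
data(airmiles)
df <- data.frame(
  year  = as.numeric(time(airmiles)),
  miles = as.numeric(airmiles)
)
fit <- lm(miles ~ year, data = df)

# Row "year" of the coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_tab <- summary(fit)$coefficients
b1  <- coef_tab["year", "Estimate"]
se1 <- coef_tab["year", "Std. Error"]

# Test statistic t = (beta1_hat - 0) / SE(beta1_hat)
t_stat <- (b1 - 0) / se1

# One-sided p-value for H_A: beta_1 > 0, using n - 2 residual degrees of freedom
p_one_sided <- pt(t_stat, df = df.residual(fit), lower.tail = FALSE)

c(t = t_stat, p_one_sided = p_one_sided)
```

Since the estimated slope is positive, this one-sided p-value is exactly half of the two-sided Pr(>|t|) printed by summary(fit).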

Interactive time series of airline passenger miles increasing by year