Time Series Forecasting

library(faraway)
## Warning: package 'faraway' was built under R version 3.4.3
data(airpass)
attach(airpass)
head(airpass)
##   pass     year
## 1  112 49.08333
## 2  118 49.16667
## 3  132 49.25000
## 4  129 49.33333
## 5  121 49.41667
## 6  135 49.50000

We begin by plotting the data to get a general idea of what kind of model would be the best fit:

plot(pass ~ year, type = "l")

The plot shows a clear upward trend: the number of passengers increases with year. However, it is difficult to determine from the plot alone whether the relationship is linear.
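One way to make the shape of the trend easier to judge is to overlay a smoother on the raw series. The sketch below is only an illustration: it rebuilds the data from the base R `AirPassengers` series, which is assumed here to be the same series as `airpass`, and the names `trend`, `pass` and `year` are local to the example.

```r
# Assumption: base R's AirPassengers holds the same monthly counts as faraway::airpass.
pass <- as.numeric(AirPassengers)
# Rebuild the year coding shown by head(airpass): 49.08333, 49.16667, ...
year <- 49 + seq_along(pass) / 12

# Raw series, as in the original plot
plot(pass ~ year, type = "l")

# Locally weighted smoother; visible curvature in this line hints at a non-linear trend
trend <- lowess(year, pass)
lines(trend, col = "red", lwd = 2)
```

If the smoothed line bends upward rather than following a straight path, that is a first hint that a straight line may not be enough.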

Next, we create a linear model:

mod <- lm(pass ~ year)
summary(mod)
## 
## Call:
## lm(formula = pass ~ year)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -93.858 -30.727  -5.757  24.489 164.999 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1474.771     61.106  -24.14   <2e-16 ***
## year           31.886      1.108   28.78   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 46.06 on 142 degrees of freedom
## Multiple R-squared:  0.8536, Adjusted R-squared:  0.8526 
## F-statistic: 828.2 on 1 and 142 DF,  p-value: < 2.2e-16

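The p-value for year is very small, so we can conclude that there is a statistically significant linear trend between year and the number of passengers: on average, pass increases by about 31.9 per year. Note that a small p-value shows that the slope is not zero; it does not by itself show that a straight line is the right shape.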
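A confidence interval for the slope can complement the p-value, because it shows the range of plausible yearly increases. This is a minimal sketch that assumes the base R `AirPassengers` series matches `airpass`; `fit_lin` and `ci` are names used only in this example.

```r
# Assumption: AirPassengers is the same series as faraway::airpass.
pass <- as.numeric(AirPassengers)
year <- 49 + seq_along(pass) / 12   # same coding as the year column in airpass

# Refit the linear model from the text
fit_lin <- lm(pass ~ year)

# 95% confidence interval for the slope on year
ci <- confint(fit_lin, "year", level = 0.95)
print(ci)
```

An interval that lies well away from zero tells the same story as the small p-value, but on the scale of the data.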

We can look at the diagnostic plots, in particular the plot of residuals against fitted values, to see whether a linear model is adequate:

plot(mod)

Since the residuals are not evenly dispersed around zero, it seems that a linear model is not the best fit for this data.
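Because `plot()` on an `lm` object produces several diagnostic plots, it can help to draw the residuals against the fitted values directly. The following sketch does that by hand under the assumption that base R's `AirPassengers` matches `airpass`; the object names are illustrative.

```r
# Assumption: AirPassengers is the same series as faraway::airpass.
pass <- as.numeric(AirPassengers)
year <- 49 + seq_along(pass) / 12
fit_lin <- lm(pass ~ year)

# Residuals against fitted values, with a reference line at zero
plot(fitted(fit_lin), resid(fit_lin),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# A smoother through the residuals; a curved line suggests a missing non-linear term
lines(lowess(fitted(fit_lin), resid(fit_lin)), col = "red")
```

The built-in equivalent is `plot(fit_lin, which = 1)`, which shows only the residuals versus fitted panel.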

We can look at a quadratic model instead of a linear model to see if that will fit the relationship better.

mod_quad <- lm(pass ~ year + I(year^2))
summary(mod_quad)
## 
## Call:
## lm(formula = pass ~ year + I(year^2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -100.353  -27.339   -7.442   21.603  146.116 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 1570.5174  1053.9017   1.490  0.13841   
## year         -79.2078    38.4007  -2.063  0.04098 * 
## I(year^2)      1.0092     0.3487   2.894  0.00441 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 44.91 on 141 degrees of freedom
## Multiple R-squared:  0.8618, Adjusted R-squared:  0.8599 
## F-statistic: 439.8 on 2 and 141 DF,  p-value: < 2.2e-16

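The summary tells us that the quadratic term is significant (p = 0.00441), so this model fits better than the linear model. The improvement is modest: the residual standard error drops from 46.06 to 44.91 and the adjusted R-squared rises from 0.8526 to 0.8599. The squared term captures curvature in the trend that the linear model could not fit.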
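Since the linear model is nested inside the quadratic one, the two can also be compared with an F-test via `anova()`. With a single added term, the F statistic equals the square of the t value of that term, so the test agrees with the summary. This sketch assumes base R's `AirPassengers` matches `airpass`; `fit_lin`, `fit_quad` and `cmp` are illustrative names.

```r
# Assumption: AirPassengers is the same series as faraway::airpass.
pass <- as.numeric(AirPassengers)
year <- 49 + seq_along(pass) / 12

fit_lin  <- lm(pass ~ year)
fit_quad <- lm(pass ~ year + I(year^2))

# F-test of the nested models: does the squared term reduce the residual sum of squares?
cmp <- anova(fit_lin, fit_quad)
print(cmp)

# Adjusted R-squared penalises the extra parameter, so it is a fairer comparison than R-squared
c(linear    = summary(fit_lin)$adj.r.squared,
  quadratic = summary(fit_quad)$adj.r.squared)
```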

We look at the residual plot to see if it looks better than the previous one.

plot(mod_quad)

By looking at the residual plot and the summary, we can conclude that the quadratic model fits the data better than the linear model.

We can use both models to forecast the likely number of passengers at year = 62, that is, in 1962, and compare the results:

coef(mod) %*% c(1, 62)
##          [,1]
## [1,] 502.1735
coef(mod_quad) %*% c(1, 62, 62^2)
##          [,1]
## [1,] 538.9268
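The same forecasts can be obtained with `predict()`, which also returns a prediction interval and so gives a sense of the uncertainty around each point forecast. The sketch below assumes base R's `AirPassengers` matches `airpass`, and the data frame `air` and the other object names are introduced only for illustration. The intervals rely on the usual linear model assumptions, and year 62 lies beyond the observed data, so they should be read with caution.

```r
# Assumption: AirPassengers is the same series as faraway::airpass.
air <- data.frame(pass = as.numeric(AirPassengers),
                  year = 49 + seq_along(AirPassengers) / 12)

# Fit with a data argument so predict() can find 'year' in newdata
fit_lin  <- lm(pass ~ year, data = air)
fit_quad <- lm(pass ~ year + I(year^2), data = air)

# Point forecast plus 95% prediction interval at year = 62
new <- data.frame(year = 62)
pred_lin  <- predict(fit_lin,  newdata = new, interval = "prediction")
pred_quad <- predict(fit_quad, newdata = new, interval = "prediction")
print(pred_lin)
print(pred_quad)
```

The `fit` column of each result is the same quantity as the matrix product of the coefficients with `c(1, 62)` or `c(1, 62, 62^2)`; the `lwr` and `upr` columns bound the interval.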