Using Dr. Seaver’s airline data, we are to fit the no-change model as well as three other naive models. Evaluations of the various models utilizing MSE, RMSE, MPE, and MAPE are presented, and the best model is then used for a forecast over a short horizon.
setwd("~/Documents/time_series/Homework_2")
library(forecast)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: timeDate
## This is forecast 6.2
library(ggplot2)
library(ggfortify)
# Reading in data
AIRLINE <- read.csv("airline.csv")
We first begin by creating a dataframe from Dr. Seaver’s data that will allow us to work with the time series conveniently:
# Remove 'trend' from data
AIRLINE$Trend <- NULL
# Remove 'commuters'
AIRLINE$Commuters <- NULL # When ready, remove this and reload the csv
# Remove 'Year'
AIRLINE$Year <- NULL # When ready, remove this and reload the csv
After formatting the dataframe, create time series objects for \(R\) and plot the series.
# Creating a time series object.
commuter_series <- ts(data = AIRLINE, start = 1982, end = 2014, frequency = 1)
# Plots time series for both 'Commuters' and 'Majair'
autoplot(commuter_series, ts.colour = "red")
# Creating a ggplot object for commuters
#commuters_plot <- ggplot(data = air_series, aes(x = Year, y = Commuters))
# Creating ggplot object for majair
#majair_plot <- ggplot(data = majair_series, aes(x = Year, y = Majair))
# Plotting
#commuters_plot + geom_line() # need to rescale x
#majair_plot + geom_line()
We will consider four naive models: the No-Change Model, the Percent Change Model, the Proportion of Most Recent Change Model, and one slightly more complex naive model.
The No-Change Model is a random walk model given by
\[{ \hat { y } }_{ t }={ y }_{ t-1 }\]
where our estimate, \({ \hat { y} }\), is equal to the previous observation.
# No-change model (also called lag_1)
no_change <- lag(commuter_series, k = 1)
# Append no change to data.frame
#AIRLINE$no_change <- no_change
# Printing accuracy evaluations for model
accuracy(no_change, commuter_series)
## ME RMSE MAE MPE MAPE ACF1
## Test set -0.0040625 0.9641236 0.7471875 -4.782192 24.80711 -0.5307255
## Theil's U
## Test set 1.023921
The percent change model is small extension of the no change model, where our predicted output is some proportion, \(p\), of the no change model given by
\[{ \hat { y } }_{ t }=(1+p)\cdot { y }_{ t-1 }\]
# Setting p, the percent change
p <- 0.05
# Percent change model
percent_change <- (1+p)*(no_change)
# Append model to data.frame
#AIRLINE$percent_change <- percent_change
# Printing accuracy evaluations for model
accuracy(percent_change, commuter_series)
## ME RMSE MAE MPE MAPE ACF1
## Test set -0.1597188 1.00149 0.7745625 -10.0213 26.32945 -0.529056
## Theil's U
## Test set 1.054298
Next we will consider a model as the ‘proportion of most recent change’ given by
\[{ \hat { y } }_{ t }={ y }_{ t-1 }+(1+p)({ y }_{ t-1 }-{ y }_{ t-2 })\]
where \(p\) is the percent change.
# Creating lag two component for next model.
lag_2 <- lag(commuter_series, 2)
# Initializng percent value
p <- 0.05
# Proportion of most recent change model
proportion_change <- no_change + (1 + p)*(no_change-lag_2)
# Append model to data.frame
#AIRLINE$proportion_change <- proportion_change
# Printing accuracy evaluations for model
accuracy(proportion_change, commuter_series)
## ME RMSE MAE MPE MAPE ACF1
## Test set -0.01251613 1.731603 1.336903 -5.080532 43.98106 -0.6769156
## Theil's U
## Test set 1.791769
Our final (slightly more complex) model is given by
\[{ \hat { y } }_{ t }={ y }_{ t-1 }\left( \frac { { y }_{ t-1 } }{ { y }_{ t-2 } } \right) +{ y }_{ t-1 }\]
# Final Model
complex_model <- no_change*(no_change/lag_2) + no_change
# Append model to data.frame
#AIRLINE$complex_model <- complex_model
# Printing accuracy evaluations for model
accuracy(complex_model, commuter_series)
## ME RMSE MAE MPE MAPE ACF1
## Test set -3.460607 4.422238 3.553702 -119.2907 121.1653 -0.4517987
## Theil's U
## Test set 4.366796
For clarity, repeated here are the model evaluations measuring forecast accuracies:
| Model | ME | RMSE | MAE | MPE | MAPE | ACF1 |
|---|---|---|---|---|---|---|
| No Change Model | -0.203 | 4.795 | 3.797 | -23.630 | 56.817 | -0.414 |
| Percent Change Model | -0.624 | 4.956 | 3.922 | -29.812 | 60.168 | -0.412 |
| Proportion Change Model | 0.016 | 8.269 | 7.073 | -22.357 | 103.335 | -0.649 |
| Complex Model | -11.555 | 17.865 | 12.816 | -190.096 | 202.439 | -0.431 |
Based upon the above measures of accuracy, we find that the No Change Model performs best. This is unsatisfying; the model sets a low threshold to compare with other, more complicated models, predicting each new value to be that of the previous.
Utilizing a \(95\)% confidence interval, the forecast for the No Change Model gives the forllowing immediate forecast horizon:
# Creating a forecast object and printing point estimate
(no_change_forecast <- forecast(no_change, h = 1, level = 95))
## Point Forecast Lo 95 Hi 95
## 2014 2.740404 1.438502 4.042306
# Printing a plot of the forecast to illustrate confidence interval
autoplot(no_change_forecast, ts.colour = "red")