For the week two discussion, I had a bit of trouble with the data in the link provided, so I found my own. I got it here. I decided to try to forecast air travelers arriving in New Zealand. The time period covered by the data is:
df <- read.csv("C:/Users/jawilliams/Desktop/Forecasting/Data/NZAirPassengers.csv")
print(paste("The first observation occurs on", head(df$DATE, 1)))
## [1] "The first observation occurs on 2000M01"
print(paste("The last observation occurs on", tail(df$DATE,1)))
## [1] "The last observation occurs on 2012M10"
str(df)
## 'data.frame': 154 obs. of 3 variables:
## $ DATE : Factor w/ 154 levels "2000M01","2000M02",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Arrivals : int 284361 273092 234368 263813 202172 200423 281012 234306 232494 296586 ...
## $ Departures: int 288701 252533 286140 290177 235108 222173 250872 245299 242026 244084 ...
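As a quick sanity check on that date range, the "YYYYMmm" labels can be parsed into real dates and the number of months counted. This is only a sketch: the `labels` vector below is a hypothetical stand-in for `df$DATE`, holding just a few values in the same format.

```r
# Hypothetical stand-in for df$DATE: labels in the same "YYYYMmm" form as the data
labels <- c("2000M01", "2000M02", "2012M10")

# Swap the "M" for a dash and append a day so as.Date() can parse the label
dates <- as.Date(paste0(sub("M", "-", labels), "-01"))
range(dates)

# Number of monthly observations implied by the first and last labels
first <- as.POSIXlt(min(dates))
last <- as.POSIXlt(max(dates))
n_months <- (last$year - first$year) * 12 + (last$mon - first$mon) + 1
n_months  # 154, which matches the 154 obs. reported by str(df)
```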
So 1/1/2000 - 10/1/2012. Cool. Let’s make a ts and plot. As stated earlier, I am interested in passengers arriving in NZ in this time period.
library(ggplot2)
library(forecast)
arrivals <- ts(df$Arrivals, start = c(2000, 1), end = c(2012,10), frequency = 12)
autoplot(arrivals)+
scale_x_continuous(breaks=seq(2000, 2012, 1)) +
ggthemes::theme_calc()
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
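For anyone unsure how `start` and `frequency` map onto the time index, here is a small base R sketch. The values are random numbers I made up (not the arrivals data); only the length of 154 is chosen to match the real series.

```r
# Hypothetical monthly values; 154 points to match the length of the real series
set.seed(1)
x <- ts(rnorm(154), start = c(2000, 1), frequency = 12)

start(x)      # first period as (year, month)
end(x)        # last period as (year, month); 154 months from 2000M01 ends at 2012M10
frequency(x)  # 12 observations per year

# window() pulls out a sub-period, e.g. the calendar year 2005
y2005 <- window(x, start = c(2005, 1), end = c(2005, 12))
length(y2005)
```

Since the length of the data already fixes the end point, the `end` argument in `ts()` is optional when `start` and `frequency` are given.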
We see an upward trend. There are some components of seasonality: arrivals spike sharply at the beginning of the year, dip towards the middle of the year, come back up, and dip again. Let’s decompose this series using both additive and multiplicative decomposition, starting with additive. The default setting for the function is additive.
autoplot(decompose(arrivals)) +
ggtitle("Additive Decomp of NZ Air Passenger Arrivals")
Now let’s use the multiplicative decomposition.
autoplot(decompose(arrivals, type = "multiplicative")) +
ggtitle("Multi Decomp of NZ Air Passenger Arrivals")
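For reference, the two models differ only in how the pieces combine: additive assumes series = trend + seasonal + random, while multiplicative assumes series = trend * seasonal * random. The sketch below checks this on a made-up series (a linear trend plus a fixed monthly pattern plus noise, all assumptions for illustration), not on the arrivals data.

```r
# Hypothetical series: linear trend + fixed monthly pattern + noise
set.seed(42)
n <- 120
pattern <- c(30, 10, -20, -5, -40, -45, 25, -15, -20, 35, 15, 30)
y <- ts(1000 + 5 * seq_len(n) + rep(pattern, length.out = n) + rnorm(n, sd = 5),
        start = c(2000, 1), frequency = 12)

add <- decompose(y)                            # y = trend + seasonal + random
mult <- decompose(y, type = "multiplicative")  # y = trend * seasonal * random

# The moving-average trend is NA for the first and last six months,
# so only compare where it is defined
ok <- !is.na(add$trend)
add_rebuilt <- add$trend + add$seasonal + add$random
mult_rebuilt <- mult$trend * mult$seasonal * mult$random

max(abs(add_rebuilt[ok] - y[ok]))    # effectively zero
max(abs(mult_rebuilt[ok] - y[ok]))   # effectively zero
```

When the seasonal swings stay roughly the same size as the level rises, the two decompositions tend to look alike; multiplicative usually matters more when the swings grow with the level.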
There is really not much of a difference here. The plots are all pretty much the same. Notably, the random component does not look any more random in one than in the other. Let’s use the additive series. We will use STL decomposition. Per the documentation of the function in the forecast package, “Forecasts of STL objects are obtained by applying a non-seasonal forecasting method to the seasonally adjusted data and re-seasonalizing using the last year of the seasonal component.” Let’s plot this decomposition and compare it to the other two to see whether it does a better job with the random component and the variation in the seasonal component. Not really going to play around with the params. Then plot the season a
autoplot(stl(arrivals, s.window = "periodic")) +
ggtitle("STL Decomp of NZ Air Passenger Arrivals")
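Eyeballing the panels is subjective, so one rough way to compare the leftover components is to look at their standard deviations. This is a sketch on a made-up series (same kind of trend plus monthly pattern plus noise as an assumption), using only base R; it does not use the arrivals data.

```r
# Hypothetical series: linear trend + fixed monthly pattern + noise
set.seed(3)
n <- 120
pattern <- c(30, 10, -20, -5, -40, -45, 25, -15, -20, 35, 15, 30)
y <- ts(1000 + 5 * seq_len(n) + rep(pattern, length.out = n) + rnorm(n, sd = 5),
        start = c(2000, 1), frequency = 12)

# Leftover component from classical additive decomposition (NA at both ends)
rem_classical <- decompose(y)$random

# Leftover component from STL with a periodic (fixed) seasonal pattern
rem_stl <- stl(y, s.window = "periodic")$time.series[, "remainder"]

sd_classical <- sd(rem_classical, na.rm = TRUE)
sd_stl <- sd(rem_stl)
c(classical = sd_classical, stl = sd_stl)
```

If the two numbers are close, the methods are splitting the series in much the same way, which would back up what the plots suggest.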
No change. Let’s just use the additive method for simplicity’s sake. We are going to fit it, then forecast out another two years using stlf(), which will decompose the time series using STL, forecast the seasonally adjusted series, and return the reseasonalised forecasts.
arr_preds <- stlf(arrivals, method='naive', h = 24)
autoplot(arr_preds)
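To make the three steps concrete, here is a hand-rolled base R sketch of the idea behind a naive STL forecast: seasonally adjust, carry the last adjusted value forward, then add back the last year of the seasonal component. It is not the forecast package's implementation; the series is made up, and the `s.window = "periodic"` setting is my assumption, so the numbers would not necessarily match `stlf()` output.

```r
# Hypothetical series: linear trend + fixed monthly pattern + noise
set.seed(7)
n <- 120
pattern <- c(30, 10, -20, -5, -40, -45, 25, -15, -20, 35, 15, 30)
y <- ts(1000 + 5 * seq_len(n) + rep(pattern, length.out = n) + rnorm(n, sd = 5),
        start = c(2000, 1), frequency = 12)

# Step 1: decompose with STL and remove the seasonal component
fit <- stl(y, s.window = "periodic")
seasonal <- fit$time.series[, "seasonal"]
adjusted <- y - seasonal

# Step 2: naive forecast of the adjusted series (repeat its last value)
h <- 24
naive_fc <- rep(as.numeric(tail(adjusted, 1)), h)

# Step 3: re-seasonalize using the last year of the seasonal component
last_year <- as.numeric(tail(seasonal, 12))
point_fc <- ts(naive_fc + rep(last_year, length.out = h),
               start = c(2010, 1), frequency = 12)  # y ends in 2009M12
point_fc
```

Because the naive step is flat, the two forecast years are identical: all of the month-to-month movement comes from the seasonal component being added back.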