library(fpp2)
library(readxl)

Forecasting

3.1

a. usnetelec

a<-usnetelec
lambdaa <- BoxCox.lambda(a)
lambdaa
## [1] 0.5167714
adf <- cbind(Raw = a, Transformed = BoxCox(a, lambdaa)) 
autoplot(adf, facet=TRUE) + 
    labs(title = "Annual US Net Electricity Generation",
         x = "Year", y = "")

b. usgdp

b<-usgdp
lambdab <- BoxCox.lambda(b)
lambdab
## [1] 0.366352
bdf <- cbind(Raw = b, Transformed = BoxCox(b, lambdab)) 
autoplot(bdf, facet=TRUE) + 
    labs(title = "US GDP",
         x = "Year", y = "")

c. mcopper

c<-mcopper
lambdac <- BoxCox.lambda(c)
lambdac
## [1] 0.1919047
cdf <- cbind(Raw = c, Transformed = BoxCox(c, lambdac)) 
autoplot(cdf, facet=TRUE) + 
    labs(title = "Monthly Copper Prices",
         x = "Year", y = "")

d. enplanements

d<-enplanements
lambdad <- BoxCox.lambda(d)
lambdad
## [1] -0.2269461
ddf <- cbind(Raw = d, Transformed = BoxCox(d, lambdad)) 
autoplot(ddf, facet=TRUE) + 
    labs(title = "US Domestic Enplanements",
         x = "Year", y = "")
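
To make clear what the transformed panels above represent, here is a minimal sketch of the Box-Cox formula itself, assuming strictly positive data. The helpers `box_cox` and `inv_box_cox` are illustrative names written for this note, not functions from fpp2 or forecast, and `AirPassengers` from base R is used as a stand-in series so the snippet runs on its own.

```r
# Illustrative helpers (not part of fpp2/forecast); valid for strictly positive data.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  # lambda = 0 is the log transform; otherwise a scaled power transform.
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

inv_box_cox <- function(w, lambda) {
  if (abs(lambda) < 1e-8) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- as.numeric(AirPassengers)   # stand-in series from base R
w <- box_cox(y, lambda = 0.5)

# Back-transforming recovers the original values.
all.equal(inv_box_cox(w, 0.5), y)

# lambda = 1 only shifts the series by 1, so its shape is unchanged.
all.equal(box_cox(y, 1), y - 1)
```

The closer lambda is to 1, the less the transformation changes the shape of the series; the closer it is to 0, the more it behaves like a log.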

3.2

The Box-Cox transformation is unhelpful for cangas because the variation does not simply increase or decrease with the level of the series; instead, it follows a more complex pattern. The variation increases slowly at first, then increases rapidly, and finally decreases slowly towards the end. As a result, the transformed time series still has non-constant variance.

two<-cangas
lambdatwo <- BoxCox.lambda(two)
lambdatwo
## [1] 0.5767759
twodf <- cbind(Raw = two, Transformed = BoxCox(two, lambdatwo)) 
autoplot(twodf, facet=TRUE) + 
    labs(title = "Monthly Canadian Gas Production",
         x = "Year", y = "")
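
One rough way to check whether a Box-Cox transformation is likely to help is to compare the spread of each year with its level. The sketch below is only an informal diagnostic under that assumption: `spread_vs_level` is a hypothetical helper, and `AirPassengers` from base R stands in for the series so the code runs without fpp2.

```r
# Hypothetical helper: correlation between yearly mean (level) and yearly sd (spread).
spread_vs_level <- function(x) {
  level  <- as.numeric(aggregate(x, nfrequency = 1, FUN = mean))
  spread <- as.numeric(aggregate(x, nfrequency = 1, FUN = sd))
  cor(level, spread)
}

# Stand-in series; pass cangas instead once fpp2 is loaded.
spread_vs_level(AirPassengers)
```

A correlation near +1 means the spread grows steadily with the level, which is the situation a single lambda can correct. A weak or erratic relationship, like the rise and fall in variation described for cangas, means no single lambda will stabilize the variance.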

3.3

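For the retail data, I chose the 8th column of the retail dataset (the 7th series, since the first column holds the dates). Its estimated lambda is close to 1, so the Box-Cox transformation changes the shape of the series very little.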

retaildata <- readxl::read_excel("retail.xlsx", skip=1)
colID <- colnames(retaildata)[8]
myts <- ts(retaildata[ , colID], frequency=12, start=c(1982,4))

lambdathree <- BoxCox.lambda(myts)
lambdathree
## [1] 0.9165544
threedf <- cbind(Raw = myts, Transformed = BoxCox(myts, lambdathree)) 
autoplot(threedf, facet=TRUE) + 
  labs(title = "Turnover - NSW",
       x = "Month", y = "")

3.8

a. Split data into training and test sets

myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)

b. Plot the data

autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test") + 
  geom_vline(xintercept = 2011, lty = 2) + 
  labs(title = "Turnover - NSW", 
       x = "Month", y = "") + 
  guides(colour=guide_legend(title="Series"))

c. Calculate forecasts using snaive applied to myts.train.

fc <- snaive(myts.train)
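
The seasonal naive rule sets each forecast equal to the last observed value from the same season. As a minimal sketch of that rule, not the forecast package's implementation, the hypothetical helper `seasonal_naive` below repeats the final year of the training data; `AirPassengers` from base R stands in for myts.train so the snippet is self-contained.

```r
# Hypothetical helper mirroring the seasonal naive rule.
seasonal_naive <- function(y, h) {
  m <- frequency(y)                      # seasonal period (12 for monthly data)
  last_season <- tail(as.numeric(y), m)  # final m observations
  ts(rep(last_season, length.out = h),
     start = tsp(y)[2] + 1 / m, frequency = m)
}

train <- window(AirPassengers, end = c(1959, 12))  # stand-in for myts.train
fc_manual <- seasonal_naive(train, h = 12)

# The 1960 forecasts are exactly the 1959 observations.
last_year <- window(AirPassengers, start = c(1959, 1), end = c(1959, 12))
all(as.numeric(fc_manual) == as.numeric(last_year))
```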

d. Compare the accuracy of your forecasts against the actual values stored in myts.test.

accuracy(fc,myts.test)
##                     ME     RMSE      MAE      MPE      MAPE      MASE      ACF1
## Training set  9.460661 26.30758 21.23363 4.655690 12.762886 1.0000000 0.8070166
## Test set     17.212500 21.26067 17.39583 4.748234  4.807728 0.8192584 0.4843871
##              Theil's U
## Training set        NA
## Test set     0.6934111
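
To show where the columns in this table come from, the sketch below computes several of the measures by hand for a seasonal naive forecast. It assumes `AirPassengers` from base R as a stand-in for the retail series, so the numbers will not match the table above.

```r
train <- window(AirPassengers, end = c(1959, 12))   # stand-in for myts.train
test  <- window(AirPassengers, start = c(1960, 1))  # stand-in for myts.test
m <- frequency(train)

fc <- tail(as.numeric(train), m)   # seasonal naive forecasts for the 12 test months
e  <- as.numeric(test) - fc        # forecast errors

# MASE scale: in-sample MAE of the seasonal naive method.
q <- mean(abs(diff(as.numeric(train), lag = m)))

measures <- c(
  ME   = mean(e),
  RMSE = sqrt(mean(e^2)),
  MAE  = mean(abs(e)),
  MAPE = 100 * mean(abs(e / as.numeric(test))),
  MASE = mean(abs(e)) / q
)
round(measures, 3)
```

Because the MASE scale is the training-set MAE of the seasonal naive method, the training-set MASE of a seasonal naive forecast is exactly 1, as in the table above. A test-set MASE below 1 means the test errors are smaller on average than the in-sample seasonal naive errors.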

e. Check the residuals.

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 856.11, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

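The residuals are strongly autocorrelated: the Ljung-Box test rejects the hypothesis of no autocorrelation (p-value < 2.2e-16). The histogram looks roughly normal but is slightly skewed, most visibly in the left tail. Together with the positive training-set mean error (ME = 9.46), this indicates that the forecasts are biased.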
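
The autocorrelation and bias checks can also be reproduced with base R alone. This is a sketch under the assumption that the seasonal naive residuals are the lag-12 differences of the training data; `AirPassengers` again stands in for myts.train, so the statistics differ from the output above.

```r
train <- window(AirPassengers, end = c(1959, 12))  # stand-in for myts.train
res <- diff(train, lag = frequency(train))         # seasonal naive residuals

# A mean far from zero indicates biased forecasts.
mean(res)

# Ljung-Box test on 24 lags with no fitted parameters (Model df: 0).
lb <- Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 0)
lb
```

A small p-value means the residuals still contain structure that the seasonal naive method has not captured.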

f. How sensitive are the accuracy measures to the training/test split?

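The accuracy measures are quite sensitive to the training/test split. The values differ substantially between the two sets: RMSE is 26.31 on the training set versus 21.26 on the test set, and MAPE is 12.76% versus 4.81%. Because the measures change this much with the evaluation period, a single split gives an unreliable picture of how well the model generalizes.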
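
A more direct way to gauge this sensitivity is to move the split point and recompute the test-set error each time. The sketch below does so for a seasonal naive forecast with a one-year test window. `split_rmse` is a hypothetical helper, the candidate years are arbitrary, and `AirPassengers` from base R stands in for myts.

```r
# Hypothetical helper: test-set RMSE of seasonal naive when the training
# data ends in the last period of `last_train_year` and the test set is the next year.
split_rmse <- function(y, last_train_year) {
  m <- frequency(y)
  train <- window(y, end = c(last_train_year, m))
  test  <- window(y, start = c(last_train_year + 1, 1),
                  end = c(last_train_year + 1, m))
  fc <- tail(as.numeric(train), m)   # repeat the final training year
  sqrt(mean((as.numeric(test) - fc)^2))
}

years <- 1955:1959
rmse_by_split <- setNames(
  sapply(years, function(yr) split_rmse(AirPassengers, yr)),
  years
)
round(rmse_by_split, 2)
```

If the RMSE values vary widely across split years, conclusions drawn from any one split should be treated with caution.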