Question 3.1

library(fpp2)
library(ggplot2)

Question 3.1.1

autoplot(usnetelec) + 
  ylab("billion kWh") +
  xlab("Year") +
  ggtitle("Annual US net electricity generation (billion kWh) for 1949-2003")

l_usnetelec <- BoxCox.lambda(usnetelec)
l_usnetelec
## [1] 0.5167714
autoplot(BoxCox(usnetelec,l_usnetelec)) +  
  ggtitle("Box Cox Transformation of Annual US Net Electricity Generation")

Inference: The time series plot of annual electricity generation in the US shows an upward trend, little variation around that trend and, because the data are annual, no seasonality. The BoxCox.lambda function was used to choose a value for lambda that stabilises the variance. The value of lambda chosen is 0.5167. The transformed data is more linear and has less variance.
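For reference, the transformation applied above can be sketched in base R. This is a minimal sketch of the standard Box-Cox formula for strictly positive data, not the forecast package's own code; the name box_cox_sketch and the input values are made up for illustration.

```r
# Minimal sketch of the Box-Cox transformation for strictly positive data.
# box_cox_sketch is a hypothetical helper, not part of the forecast package.
box_cox_sketch <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (lambda == 0) {
    log(y)                      # lambda = 0 is defined as the natural log
  } else {
    (y^lambda - 1) / lambda     # power transformation otherwise
  }
}

# Made-up values, not usnetelec.
y <- c(100, 400, 900, 1600)
w <- box_cox_sketch(y, lambda = 0.5)   # lambda = 0.5 is a square-root scale
w
```

With lambda near 0.5, as chosen for usnetelec, large values are compressed more than small ones, which is why the transformed series looks closer to a straight line.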

Question 3.1.2

autoplot(usgdp) + 
  ylab("US Dollars") +  
  xlab("Year") +
  ggtitle("Quarterly US GDP")

l_usgdp <- BoxCox.lambda(usgdp)
l_usgdp
## [1] 0.366352
autoplot(BoxCox(usgdp,l_usgdp)) +  
  ggtitle("Box Cox Transformation of Quarterly US GDP")

Inference: The plot of quarterly US GDP shows an upward trend and no seasonality. The BoxCox.lambda function was used to choose a value for lambda that makes the variation roughly constant across the series. The value of lambda chosen is 0.37. The transformed data is more linear and has less variation than the original data.

Question 3.1.3

autoplot(mcopper) + 
  ylab("pounds per ton") +  
  xlab("Year") +
  ggtitle("Monthly copper price")

l_mcopper <- BoxCox.lambda(mcopper)
l_mcopper
## [1] 0.1919047
autoplot(BoxCox(mcopper,l_mcopper)) +  
  ggtitle("Box Cox Transformation of Monthly copper price")

Inference: The plot of monthly copper prices shows an upward trend and cyclic behavior. There is less variation between 1960 and 1970 and a sharp increase in price around 2007. To make the size of the variation more constant, the BoxCox.lambda function was used to choose lambda, which gave 0.1919. The transformed data shows more consistent variation throughout and a less prominent spike.

Question 3.1.4

autoplot(enplanements) + 
  ylab("millions") +  
  xlab("Year") +
  ggtitle("US domestic enplanements")

l_enplanements <- BoxCox.lambda(enplanements)
l_enplanements
## [1] -0.2269461
autoplot(BoxCox(enplanements,l_enplanements)) +  
  ggtitle("Box Cox Transformation of US domestic enplanements")

Inference: The plot of monthly US domestic enplanements shows an upward trend and a seasonal pattern with a period of one year. The BoxCox.lambda function was used to choose a value for lambda to make the size of the seasonal variation constant. The value of lambda chosen is -0.23. The transformed data has more uniform seasonal variation throughout.

Question 3.2

Why is a Box-Cox transformation unhelpful for the cangas data?

autoplot(cangas) + 
  ylab("billion cubic metres") +  
  xlab("Year") +
  ggtitle("Monthly Canadian gas production")

l_cangas <- BoxCox.lambda(cangas)
l_cangas
## [1] 0.5767759
autoplot(BoxCox(cangas,l_cangas)) +  
  ggtitle("Box Cox Transformation of Monthly Canadian gas production")

Inference: The plot of monthly Canadian gas production displays a seasonal pattern with a period of one year. The seasonal variation is low through the early part of the data, somewhat higher between 1978 and 1988, and smaller again from 1988 through 2005. A Box-Cox transformation changes the variation as a function of the level of the series, and here the level keeps rising while the seasonal variation increases and then decreases. As a result, the Box-Cox transformation cannot make the seasonal variation uniform.
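The argument above can be illustrated with a small base R sketch. It uses a synthetic monthly series, not cangas, whose level rises steadily while its seasonal amplitude grows and then shrinks. The helper yearly_sd and all the numbers are assumptions made for illustration.

```r
# Synthetic monthly series (not cangas): steadily rising level, seasonal
# amplitude that peaks in the middle of the sample and then declines.
n_years <- 30
idx <- seq_len(n_years * 12)
level <- 10 + 0.02 * idx
amplitude <- 0.2 + 1.5 * exp(-((idx - 180) / 60)^2)
y <- ts(level + amplitude * sin(2 * pi * idx / 12),
        start = c(1960, 1), frequency = 12)

# Hypothetical helper: standard deviation within each year, used here as a
# rough measure of the size of the seasonal variation.
yearly_sd <- function(x) as.numeric(aggregate(x, nfrequency = 1, FUN = sd))

sd_raw  <- yearly_sd(y)
sd_log  <- yearly_sd(log(y))              # Box-Cox with lambda = 0
sd_sqrt <- yearly_sd(2 * (sqrt(y) - 1))   # Box-Cox with lambda = 0.5

# Year (counted from the start) in which the within-year spread is largest.
c(raw = which.max(sd_raw), log = which.max(sd_log), sqrt = which.max(sd_sqrt))
```

In this sketch the spread still peaks in the middle years after either transformation, which mirrors what the transformed cangas plot shows.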

Question 3.3

What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349399C"],frequency=12, start=c(1982,4))
autoplot(myts) + ylab("Retail Clothing Sales") + ggtitle("New South Wales - Clothing Sales")

l_retail <- BoxCox.lambda(myts)
l_retail
## [1] 0.02074707
autoplot(BoxCox(myts,l_retail)) +  
  ggtitle("Box Cox Transformation of Retail Clothing Sales in New South Wales")

Inference: The plot of clothing sales in New South Wales shows an upward trend and a seasonal pattern with a period of one year. The seasonal variation increases with time. The BoxCox.lambda function was used to choose a value for lambda to make the size of the seasonal variation constant. The value of lambda chosen is 0.02. The transformed data has more uniform seasonal variation throughout.
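Since a lambda of 0.02 is close to zero, the selected transformation behaves much like a natural log. The base R sketch below compares the two on made-up values (not the retail data), so the numbers only illustrate how close the two scales are.

```r
# Compare Box-Cox with a small lambda against the natural log.
# The values in y are made up for illustration; they are not the retail data.
y <- c(50, 100, 200, 400)
lambda <- 0.02
bc <- (y^lambda - 1) / lambda   # Box-Cox with lambda = 0.02
lg <- log(y)                    # limiting case, lambda = 0
round(cbind(y, box_cox = bc, log = lg, ratio = bc / lg), 3)
```

Over this range the two columns differ by only a few percent, so a plain log transformation would be a reasonable, simpler alternative to consider.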

Question 3.8

For your retail time series (from Exercise 3 in Section 2.10): split the data into two parts using

myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)  

Check that your data have been split appropriately by producing the following plot.

autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")

Calculate forecasts using snaive applied to myts.train.

fc <- snaive(myts.train) 

Compare the accuracy of your forecasts against the actual values stored in myts.test

accuracy(fc,myts.test)
##                     ME     RMSE      MAE      MPE     MAPE     MASE      ACF1
## Training set  9.007207 21.13832 16.58859 4.224080 7.494415 1.000000 0.5277855
## Test set     10.362500 21.50499 18.99583 2.771495 5.493632 1.145115 0.7420700
##              Theil's U
## Training set        NA
## Test set     0.3223094
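To make the columns above more concrete, here is a minimal base R sketch of how several of them can be computed by hand. It assumes MASE is scaled by the in-sample seasonal naive MAE, which is consistent with the training-set MASE of exactly 1 in the table. The helper accuracy_sketch and the toy numbers are hypothetical.

```r
# Hypothetical helper: recompute a few of the accuracy() columns by hand.
# actual and fcast are numeric vectors of equal length, train is the
# training series and m is its seasonal period.
accuracy_sketch <- function(actual, fcast, train, m) {
  e <- actual - fcast                          # forecast errors
  scale <- mean(abs(diff(train, lag = m)))     # in-sample seasonal naive MAE
  c(ME   = mean(e),
    RMSE = sqrt(mean(e^2)),
    MAE  = mean(abs(e)),
    MAPE = mean(abs(100 * e / actual)),
    MASE = mean(abs(e)) / scale)
}

# Toy example with made-up numbers (m = 4), not the retail data.
train  <- c(10, 20, 30, 40, 12, 22, 33, 44)
actual <- c(14, 25, 35, 47)
fcast  <- tail(train, 4)        # seasonal naive: repeat the last season
res <- accuracy_sketch(actual, fcast, train, m = 4)
res
```

A MASE above 1 on the test set, as in the table, means the out-of-sample errors are larger on average than the in-sample one-step seasonal naive errors.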

Check the residuals.

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 342, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24
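The very small p-value above rejects the hypothesis that the residuals are uncorrelated, so the seasonal naive method leaves structure in the data unexplained. The same kind of test can be run in base R with Box.test; the sketch below uses simulated series rather than the retail residuals, so its numbers are illustrative only.

```r
# Ljung-Box test on simulated data (not the retail residuals).
set.seed(1)
white <- rnorm(200)                                        # no autocorrelation
ar1 <- as.numeric(arima.sim(model = list(ar = 0.8), n = 200))  # strongly autocorrelated

p_white <- Box.test(white, lag = 24, type = "Ljung-Box")$p.value
p_ar1   <- Box.test(ar1,   lag = 24, type = "Ljung-Box")$p.value

# White noise usually gives a large p-value (though this is not guaranteed),
# while the AR(1) series gives a p-value close to zero.
c(white = p_white, ar1 = p_ar1)
```

The retail residuals behave like the autocorrelated case here: the test statistic is far too large for the residuals to pass as white noise.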

How sensitive are the accuracy measures to the training/test split?

The errors for the training and test sets are fairly similar. The test set has slightly larger values than the training set for the mean error (ME), root mean squared error (RMSE), mean absolute error (MAE), mean absolute scaled error (MASE) and lag-1 autocorrelation of the errors (ACF1). The test set has lower values for the mean percentage error (MPE) and the mean absolute percentage error (MAPE).
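One way to probe the sensitivity to the split directly is to repeat the comparison for several split years. The base R sketch below does this with a hand-written seasonal naive forecast on a synthetic series standing in for myts. The helper snaive_test_rmse is hypothetical, and it forecasts across the whole test set, whereas snaive defaults to a horizon of two seasonal periods.

```r
# Hypothetical helper: seasonal naive RMSE on the test set for one split.
# y is a seasonal ts; the training set ends with the last period of
# last_train_year and the test set is everything after it.
snaive_test_rmse <- function(y, last_train_year) {
  m <- frequency(y)
  train <- window(y, end = c(last_train_year, m))
  test  <- window(y, start = c(last_train_year + 1, 1))
  # Seasonal naive forecast: repeat the last observed season over the horizon.
  fcast <- rep(tail(as.numeric(train), m), length.out = length(test))
  sqrt(mean((as.numeric(test) - fcast)^2))
}

# Synthetic monthly series (trend plus seasonality plus noise), not myts.
set.seed(42)
idx <- seq_len(12 * 20)
y <- ts(100 + 0.5 * idx + 10 * sin(2 * pi * idx / 12) + rnorm(length(idx)),
        start = c(1994, 1), frequency = 12)

split_years <- 2006:2011
rmse_by_split <- sapply(split_years, snaive_test_rmse, y = y)
names(rmse_by_split) <- split_years
rmse_by_split
```

On the retail data the equivalent call would be sapply(2006:2011, snaive_test_rmse, y = myts). If the resulting RMSE values vary a lot from one split year to the next, the accuracy measures are sensitive to the split.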