For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.
usnetelec, usgdp, mcopper, enplanements
library(fpp2)
library(ggplot2)
usnetelec
autoplot(usnetelec) + ylab("Annual US Electricity Generation") + ggtitle("Annual US Net Electricity Generation")+ xlab("Year")
usnetelec_lamda <- BoxCox.lambda(usnetelec)
usnetelec_lamda
## [1] 0.5167714
autoplot(BoxCox(usnetelec,usnetelec_lamda)) + ggtitle("Box Cox Transformation of Annual US Net Electricity Generation") + xlab("Year")
Both plots show a roughly linear upward trend. The Box-Cox parameter selected for usnetelec is λ = 0.5168, but the transformed series looks much like the original, so the transformation makes little difference to the variation.
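To make the meaning of λ concrete, here is a minimal base R sketch of the Box-Cox transformation for positive data. The function names `box_cox` and `inv_box_cox` are my own and only illustrate the formula; the analysis in this document uses `BoxCox()` from the forecast package. With λ = 1 the data are only shifted by one, λ = 0 gives the natural log, and a λ near 0.5 (like the 0.5168 found for usnetelec) behaves like a square root.

```r
# Hand-rolled Box-Cox transform for positive data (illustrative only).
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Inverse transform, used to return to the original scale.
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(1, 4, 9, 16)   # toy data, not one of the exercise series

print(box_cox(y, 0.5))  # square-root-like: 2 * (sqrt(y) - 1)
print(box_cox(y, 1))    # y - 1: shape unchanged, so no real transformation
print(box_cox(y, 0))    # natural log

# Round trip: transforming and back-transforming recovers the data.
print(inv_box_cox(box_cox(y, 0.5), 0.5))
```

The closer the selected λ is to 1, the less the transformed plot will differ in shape from the original.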
usgdp
autoplot(usgdp)
usgdp_lamda <- BoxCox.lambda(usgdp)
usgdp_lamda
## [1] 0.366352
autoplot(BoxCox(usgdp,usgdp_lamda))
The Box-Cox parameter selected for usgdp is λ = 0.3664. After the transformation, the curve flattens into something closer to a straight line and shows less variation.
mcopper
autoplot(mcopper) + ylab("Monthly Copper Price") + xlab("Year")
mcopper_lamda <- BoxCox.lambda(mcopper)
mcopper_lamda
## [1] 0.1919047
autoplot(BoxCox(mcopper,mcopper_lamda)) + ylab("Monthly Copper Price") + xlab("Year")
The series has an upward trend, and the selected Box-Cox parameter is λ = 0.1919. The transformed series appears to show more variation than the original, so a Box-Cox transformation does not seem necessary in this case.
enplanements
autoplot(enplanements) + ylab("Domestic Revenue Enplanements") + xlab("Year")
enplanement_lamda <- BoxCox.lambda(enplanements)
enplanement_lamda
## [1] -0.2269461
autoplot(BoxCox(enplanements,enplanement_lamda)) + ylab("Domestic Revenue Enplanements") + xlab("Year")
The series shows an upward trend and seasonality, and the selected Box-Cox parameter is λ = -0.2269. After the transformation, the seasonal variation is smaller.
Why is a Box-Cox transformation unhelpful for the cangas data?
autoplot(cangas)
cangas_lambda <- BoxCox.lambda(cangas)
cangas_lambda
## [1] 0.5767759
autoplot(BoxCox(cangas,cangas_lambda))
The Box-Cox transformation is not very useful here: it only reduces the variation in the beginning and ending parts of the series and makes no difference to the middle part.
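One way to check this numerically, rather than only by eye, is to compare the within-year standard deviation before and after transforming. This is a rough sketch: the helper `yearly_sd` and the max/min ratio summary are my own choices, not functions from fpp2. It relies on `aggregate()` grouping each block of 12 consecutive months starting from the first observation.

```r
library(fpp2)  # provides cangas, BoxCox() and BoxCox.lambda()

# Standard deviation within each block of 12 months (hypothetical helper).
yearly_sd <- function(x) aggregate(x, nfrequency = 1, FUN = sd)

lambda <- BoxCox.lambda(cangas)
sd_raw <- yearly_sd(cangas)
sd_bc  <- yearly_sd(BoxCox(cangas, lambda))

# If the variance were stabilised, the yearly standard deviations would be
# roughly equal, so the ratio of largest to smallest would be close to 1.
ratio_raw <- max(sd_raw) / min(sd_raw)
ratio_bc  <- max(sd_bc) / min(sd_bc)
print(c(original = ratio_raw, transformed = ratio_bc))
```

If the transformed ratio is still far from 1, a single λ cannot even out the variation across the whole series.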
What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349882C"], frequency = 12, start =c(1982,4))
head(myts)
##        Apr   May   Jun   Jul   Aug   Sep
## 1982 139.3 136.0 143.5 150.2 144.0 146.9
autoplot(myts)
lambda_myts <- BoxCox.lambda(myts)
lambda_myts
## [1] 0.2324297
autoplot(BoxCox(myts, lambda_myts))
The selected Box-Cox parameter is λ = 0.2324 for my retail time series from homework 1. As the two plots show, the transformed series is flatter and smoother.
For your retail time series (from Exercise 3 in Section 2.10):
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
autoplot(myts) +
autolayer(myts.train, series="Training") +
autolayer(myts.test, series="Test")
fc <- snaive(myts.train)
accuracy(fc,myts.test)
##                    ME     RMSE      MAE      MPE     MAPE     MASE      ACF1
## Training set 46.37387 59.27744 47.03213 7.755343 7.848159 1.000000 0.8279637
## Test set     67.42917 76.67352 67.42917 4.447568 4.447568 1.433683 0.6327744
##              Theil's U
## Training set        NA
## Test set      1.004278
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 2099.4, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Do the residuals appear to be uncorrelated and normally distributed?
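No, the residuals are correlated. The Ljung-Box test gives Q* = 2099.4 with a p-value below 2.2e-16, which rejects the hypothesis of no autocorrelation, and the lag-1 autocorrelation of the training residuals is 0.83. The mean training residual is also well above zero (ME = 46.37), so the forecasts are biased. The histogram of the residuals looks roughly normal but is slightly right skewed.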
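The Ljung-Box statistic printed by `checkresiduals()` can also be computed directly with `Box.test()` from base R. The sketch below does this for seasonal naive residuals, which are just the seasonal differences y_t - y_{t-12}. As an assumption for illustration, it uses the built-in `AirPassengers` series as a stand-in, because retail.xlsx is not bundled with R; the numbers will therefore differ from the output above.

```r
# Stand-in monthly series (assumption: not the retail data used above).
y <- AirPassengers
m <- frequency(y)            # 12 for monthly data

# Seasonal naive residuals: each value minus the value one year earlier.
res <- diff(y, lag = m)

# Ljung-Box test over 2 * m = 24 lags, matching the lag count reported above.
# fitdf = 0 because the seasonal naive method estimates no parameters.
lb <- Box.test(res, lag = 2 * m, type = "Ljung-Box", fitdf = 0)
print(lb)

# Lag-1 autocorrelation of the residuals (the ACF1 column in accuracy()).
acf1 <- acf(res, plot = FALSE)$acf[2]
print(acf1)
```

A small p-value means the residuals are autocorrelated, so the method leaves information in the data unused.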
myt.train <- window(myts, end=c(2011,12))
myt.test <- window(myts, start=2012)
fc2 <- snaive(myt.train)
accuracy(fc2,myt.test)
##                    ME     RMSE      MAE      MPE     MAPE   MASE       ACF1
## Training set 46.15826 58.85224 46.79362 7.579884 7.669472 1.0000 0.82587185
## Test set     70.39167 75.98102 70.39167 4.523895 4.523895 1.5043 0.02185922
##              Theil's U
## Training set        NA
## Test set     0.9224141
The accuracy measures are not very sensitive to the training/test split. I moved the end of the training set from December 2010 to December 2011 and measured the accuracy again. Comparing the original and revised splits, the test-set RMSE changes from 76.67 to 75.98, the MAPE from 4.45 to 4.52, and the MASE from 1.43 to 1.50. These differences are small, so the accuracy is not sensitive to the split.
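Comparing only two splits is a limited check, so the same comparison can be repeated over several split points. The sketch below is a base R approximation of that idea. The helper `snaive_rmse` is hypothetical, and `AirPassengers` is again used as a stand-in series (an assumption, since the retail file is not available here). It forecasts up to 24 months ahead by repeating the last observed year, which is what the seasonal naive method does.

```r
# Test-set RMSE of a seasonal naive forecast for one split (hypothetical helper).
# Assumes the series has at least one full year before and after the split.
snaive_rmse <- function(y, train_end_year, h = 24) {
  m     <- frequency(y)
  train <- window(y, end = c(train_end_year, m))        # ends in last period of year
  test  <- window(y, start = c(train_end_year + 1, 1))  # starts in the next year
  h     <- min(h, length(test))
  last_season <- tail(as.numeric(train), m)             # last full year of training
  fc    <- rep(last_season, length.out = h)             # repeat it over the horizon
  sqrt(mean((as.numeric(test)[seq_len(h)] - fc)^2))
}

y <- AirPassengers          # stand-in series (assumption)
split_years <- 1955:1958    # candidate final years of the training set

rmse_by_split <- sapply(split_years, function(yr) snaive_rmse(y, yr))
names(rmse_by_split) <- split_years
print(round(rmse_by_split, 2))
```

If the RMSE values stay close together across the split years, the conclusion does not depend much on where the series is cut.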