HW2_DATA624_Chunjie

Do exercises 3.1, 3.2, 3.3 and 3.8 from the online Hyndman book. Please include your Rpubs link along with your rmd file.

3.1

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.

usnetelec, usgdp, mcopper, enplanements
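For reference, the Box-Cox family as defined in the Hyndman book depends on a single parameter lambda. Written out for an original observation y_t and a transformed observation w_t:

```latex
w_t =
\begin{cases}
  \log(y_t) & \text{if } \lambda = 0, \\
  (y_t^{\lambda} - 1)/\lambda & \text{otherwise.}
\end{cases}
```

With lambda = 1 the series is only shifted down by one, so its shape is unchanged. With lambda = 0 the transformation is the natural log.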

library(fpp2)

## Loading required package: ggplot2

## Loading required package: forecast

## Warning: package 'forecast' was built under R version 3.6.2

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

## Loading required package: fma

## Loading required package: expsmooth

library(ggplot2)

usnetelec

autoplot(usnetelec) + ylab("Annual US Electricity Generation") +  ggtitle("Annual US Net Electricity Generation")+ xlab("Year")

usnetelec_lamda <- BoxCox.lambda(usnetelec)
usnetelec_lamda

## [1] 0.5167714

autoplot(BoxCox(usnetelec,usnetelec_lamda)) +  ggtitle("Box Cox Transformation of Annual US Net Electricity Generation") + xlab("Year")

Both plots show a roughly linear upward trend. BoxCox.lambda() selects lambda = 0.5168 for usnetelec, and there is not much difference in variation between the original and transformed plots.
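A lambda close to 0.5 behaves much like a square root. The sketch below shows this in base R on made-up numbers. The helper `box_cox()` is hypothetical and written only for illustration; it is not the BoxCox() function from the forecast package, and the vector `y` is not the usnetelec data.

```r
# Hypothetical helper (not part of the forecast package): Box-Cox transform
# for a strictly positive numeric vector or ts object.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Made-up positive values standing in for an annual series.
y <- c(100, 400, 900, 1600, 2500)

w_sqrt <- box_cox(y, 0.5)  # square-root-like: 2 * (sqrt(y) - 1)
w_log  <- box_cox(y, 0)    # natural log
w_id   <- box_cox(y, 1)    # y - 1, shape unchanged

print(w_sqrt)
```

Because the square root compresses large values only mildly, a series whose spread is already fairly even will look much the same before and after.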

usgdp

autoplot(usgdp)

usgdp_lamda <- BoxCox.lambda(usgdp)
usgdp_lamda

## [1] 0.366352

autoplot(BoxCox(usgdp,usgdp_lamda))

BoxCox.lambda() selects lambda = 0.3664 for usgdp. After the transformation, the curve is flatter (closer to a straight line) and shows less variation.

mcopper

autoplot(mcopper) + ylab("Monthly Copper Price") + xlab("Year")

mcopper_lamda <- BoxCox.lambda(mcopper)
mcopper_lamda

## [1] 0.1919047

autoplot(BoxCox(mcopper,mcopper_lamda)) +  ylab("Monthly Copper Price") + xlab("Year")

The series has an upward trend, and BoxCox.lambda() selects lambda = 0.1919. The transformed series appears to show more variation than the original, so a Box-Cox transformation does not seem necessary in this case.

enplanements

autoplot(enplanements) + ylab("Domestic Revenue Enplanements") + xlab("Year")

enplanement_lamda <- BoxCox.lambda(enplanements)
enplanement_lamda

## [1] -0.2269461

autoplot(BoxCox(enplanements,enplanement_lamda)) +  ylab("Domestic Revenue Enplanements") + xlab("Year")

The series shows an upward trend and seasonality, and BoxCox.lambda() selects lambda = -0.2269. After the transformation, the seasonal variation is smaller and changes less over time.
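One way to check a statement like this with numbers rather than by eye is to compare the spread within each year before and after transforming. The sketch below does that in base R. The series is synthetic (not the enplanements data), the helper `yearly_sd()` is hypothetical, and a log transformation (lambda = 0) stands in for the fitted lambda.

```r
# Synthetic monthly series (an assumption, not the enplanements data):
# a rising level multiplied by a fixed seasonal pattern, so the seasonal
# swings grow with the level.
n_years  <- 10
level    <- rep(seq(100, 1000, length.out = n_years), each = 12)
seasonal <- rep(1 + 0.2 * sin(2 * pi * (1:12) / 12), times = n_years)
y        <- ts(level * seasonal, frequency = 12, start = c(1990, 1))

# Hypothetical helper: standard deviation within each block of 12 months.
yearly_sd <- function(x) {
  year <- rep(seq_len(length(x) / 12), each = 12)
  as.numeric(tapply(as.numeric(x), year, sd))
}

sd_raw <- yearly_sd(y)
sd_log <- yearly_sd(log(y))   # Box-Cox with lambda = 0

# Ratio of the largest to the smallest yearly spread; values near 1 mean
# the seasonal variation is about the same size in every year.
spread_raw <- max(sd_raw) / min(sd_raw)
spread_log <- max(sd_log) / min(sd_log)
print(c(raw = spread_raw, log = spread_log))
```

In this constructed example the raw spread grows tenfold with the level, while the logged series has the same spread in every year.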

3.2

Why is a Box-Cox transformation unhelpful for the cangas data?

autoplot(cangas)

cangas_lambda <- BoxCox.lambda(cangas)
cangas_lambda

## [1] 0.5767759

autoplot(BoxCox(cangas,cangas_lambda))

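The Box-Cox transformation is not very useful here. It only reduces the variation at the beginning and end of the series and makes no real difference to the middle part. A single lambda rescales the spread monotonically with the level of the series, so it cannot even out variation that rises and then falls as the level increases.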
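The point can be illustrated on a small synthetic example. The sketch below uses made-up data (not cangas) whose level rises in three steps while the noise is small, then large, then small again, and checks a grid of lambda values between -1 and 2. The helper `box_cox()` is hypothetical.

```r
set.seed(624)

# Synthetic series (an assumption, not the cangas data): the level rises in
# three steps while the noise is small, then large, then small again.
n     <- 200
level <- rep(c(100, 200, 300), each = n)
noise <- c(rnorm(n, sd = 1), rnorm(n, sd = 10), rnorm(n, sd = 1))
y     <- level + noise
part  <- rep(c("start", "middle", "end"), each = n)

# Hypothetical helper: Box-Cox transform for positive values.
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Standard deviation of each part of the series for every lambda on the grid.
lambdas <- seq(-1, 2, by = 0.5)
sd_by_part <- t(sapply(lambdas, function(l) {
  w <- box_cox(y, l)
  c(start  = sd(w[part == "start"]),
    middle = sd(w[part == "middle"]),
    end    = sd(w[part == "end"]))
}))
rownames(sd_by_part) <- lambdas

# Does the middle part stay the most variable for every lambda tried?
middle_always_largest <- all(
  sd_by_part[, "middle"] > sd_by_part[, "start"] &
  sd_by_part[, "middle"] > sd_by_part[, "end"]
)
print(middle_always_largest)
```

No lambda on this grid brings the three parts to a common spread, which is the same pattern described for the plot above.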

3.3

What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349882C"], frequency = 12, start =c(1982,4))
head(myts)

##        Apr   May   Jun   Jul   Aug   Sep
## 1982 139.3 136.0 143.5 150.2 144.0 146.9

autoplot(myts)

lambda_myts <- BoxCox.lambda(myts)
lambda_myts

## [1] 0.2324297

autoplot(BoxCox(myts, lambda_myts))

BoxCox.lambda() selects lambda = 0.2324 for my retail time series from homework 1. As the two plots show, the transformed series is flatter and smoother than the original.

3.8

For your retail time series (from Exercise 3 in Section 2.10):

Split the data into two parts using

myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)

Check that your data have been split appropriately by producing the following plot.

autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")

Calculate forecasts using snaive applied to myts.train.

fc <- snaive(myts.train)
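As a rough picture of what the seasonal naive method does, each forecast is set to the last observed value from the same season. The base R sketch below uses a hypothetical helper, `snaive_base()`, and made-up data. It is not the implementation of snaive() in the forecast package and produces point forecasts only.

```r
# Hypothetical helper (not the forecast package's snaive()): repeat the last
# full seasonal cycle of the training data for h future periods.
snaive_base <- function(train, h = 2 * frequency(train)) {
  m    <- frequency(train)
  last <- tail(as.numeric(train), m)          # last observed seasonal cycle
  fc   <- rep(last, length.out = h)           # recycle it over the horizon
  ts(fc, frequency = m, start = tsp(train)[2] + 1 / m)
}

# Made-up monthly series: 3 years, values 1..36.
train <- ts(1:36, frequency = 12, start = c(2008, 1))
fc_base <- snaive_base(train, h = 24)
print(fc_base)
```

Here every forecast month repeats the matching month of the final training year, so the 24 forecasts are the values 25 to 36 repeated twice.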

Compare the accuracy of your forecasts against the actual values stored in myts.test.

accuracy(fc,myts.test)

##                    ME     RMSE      MAE      MPE     MAPE     MASE      ACF1
## Training set 46.37387 59.27744 47.03213 7.755343 7.848159 1.000000 0.8279637
## Test set     67.42917 76.67352 67.42917 4.447568 4.447568 1.433683 0.6327744
##              Theil's U
## Training set        NA
## Test set      1.004278
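To make the columns of this table concrete, the sketch below computes the main test-set measures by hand in base R. The helper `accuracy_base()` and the numbers are made up for illustration; the MASE scaling by the in-sample seasonal naive error is an assumption that is consistent with the training-set MASE of 1 shown above.

```r
# Hypothetical helper: test-set accuracy measures for a point forecast.
# `m` is the seasonal period used for the MASE scaling.
accuracy_base <- function(train, test, fc, m) {
  train <- as.numeric(train); test <- as.numeric(test); fc <- as.numeric(fc)
  e     <- test - fc                       # forecast errors
  pe    <- 100 * e / test                  # percentage errors
  # In-sample MAE of the seasonal naive method, the MASE scaling factor.
  scale <- mean(abs(diff(train, lag = m)))
  c(ME = mean(e), RMSE = sqrt(mean(e^2)), MAE = mean(abs(e)),
    MPE = mean(pe), MAPE = mean(abs(pe)), MASE = mean(abs(e)) / scale)
}

train   <- c(10, 20, 12, 22, 14, 24)   # made-up data with period m = 2
test    <- c(16, 25)
fc_vals <- c(14, 24)                   # seasonal naive: last cycle repeated
acc     <- accuracy_base(train, test, fc_vals, m = 2)
print(round(acc, 3))
```

In the made-up example ME equals MAE because every error is positive. The same pattern appears in the test-set row above (ME = MAE = 67.43 and MPE = MAPE), which means every forecast in the test period was below the actual value.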

Check the residuals.

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 2099.4, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Do the residuals appear to be uncorrelated and normally distributed?

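No. The residuals are clearly autocorrelated: the Ljung-Box test gives Q* = 2099.4 on 24 lags with a p-value < 2.2e-16, which rejects the hypothesis of uncorrelated residuals, and the training-set ACF1 of 0.83 in the accuracy table points the same way. From the histogram, the residuals look approximately normally distributed but slightly right skewed. They are also not centred on zero (training-set ME of 46.37), so the seasonal naive forecasts are biased.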
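The same kind of test can be run directly with Box.test() from base R's stats package. The sketch below applies it to synthetic autocorrelated values (an AR(1) series, not the retail residuals), using the 24 lags and zero model degrees of freedom reported by checkresiduals() above.

```r
set.seed(624)

# Synthetic residuals (an assumption, not the retail residuals): an AR(1)
# process with coefficient 0.8, so neighbouring values are strongly correlated.
n   <- 300
res <- as.numeric(stats::filter(rnorm(n), filter = 0.8, method = "recursive"))

# Ljung-Box test on the first 24 lags; fitdf = 0 because a seasonal naive
# forecast estimates no parameters.
lb <- Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 0)
print(lb)

# Lag-1 sample autocorrelation, comparable to the ACF1 column of accuracy().
acf1 <- acf(res, lag.max = 1, plot = FALSE)$acf[2]
print(acf1)
```

A very small p-value means the autocorrelations are too large to be consistent with white noise.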

How sensitive are the accuracy measures to the training/test split?

myt.train <- window(myts, end=c(2011,12))
myt.test <- window(myts, start=2012)
fc2 <- snaive(myt.train)
accuracy(fc2,myt.test)

##                    ME     RMSE      MAE      MPE     MAPE   MASE       ACF1
## Training set 46.15826 58.85224 46.79362 7.579884 7.669472 1.0000 0.82587185
## Test set     70.39167 75.98102 70.39167 4.523895 4.523895 1.5043 0.02185922
##              Theil's U
## Training set        NA
## Test set     0.9224141

The accuracy measures are not very sensitive to the training/test split. I moved the end of the training set from December 2010 to December 2011 and measured the accuracy again. The test-set results changed only slightly (RMSE 76.67 vs 75.98, MAE 67.43 vs 70.39, MAPE 4.45 vs 4.52, MASE 1.43 vs 1.50), so for this one-year shift the accuracy is not sensitive to the split.
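Comparing only two splits gives a limited view, so one possible extension is to repeat the calculation over several split points. The base R sketch below does this for a synthetic monthly series (not the retail data) with a hypothetical helper, `snaive_rmse()`, that scores a seasonal naive forecast on the 12 months after each split.

```r
# Hypothetical helper: test-set RMSE of a seasonal naive forecast when the
# training data end in the last period of `end_year` and the following
# year is used as the test set.
snaive_rmse <- function(y, end_year) {
  m     <- frequency(y)
  train <- window(y, end = c(end_year, m))
  test  <- window(y, start = c(end_year + 1, 1), end = c(end_year + 1, m))
  fc    <- tail(as.numeric(train), m)     # last observed year, used once
  sqrt(mean((as.numeric(test) - fc)^2))
}

# Synthetic monthly series (an assumption, not the retail data): linear trend
# plus a fixed seasonal pattern.
y <- ts(10 * (1:120) / 12 + rep(sin(2 * pi * (1:12) / 12), 10),
        frequency = 12, start = c(2000, 1))

end_years <- 2004:2008
rmse      <- sapply(end_years, function(yr) snaive_rmse(y, yr))
names(rmse) <- end_years
print(round(rmse, 3))
```

In this constructed series the year-on-year change is always 10, so the RMSE is identical for every split. With real data, the spread of these values across split points gives a direct measure of sensitivity.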

HW2_DATA624_Chunjie_Nan

Chunjie Nan

9/13/2020
