DATA 624 - HOMEWORK 2
library(tidyverse)
library(fpp2)
library(readxl)
library(rio)
library(gridExtra)
library(ggpubr)
library(ggthemes)
#library(TSstudio)
1 Question - 3.1
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.
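As a reference for the plots in this section, the transformation itself can be written out by hand. The sketch below is a minimal base R version, assuming strictly positive data. The names box_cox and inv_box_cox are hypothetical helpers for illustration only; the code in this report calls BoxCox from the forecast package instead.

```r
# Hypothetical helper: Box-Cox transform for strictly positive data.
# lambda = 0 gives the natural log; otherwise (y^lambda - 1) / lambda.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Hypothetical helper: inverse transform back to the original scale.
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(1, 4, 9, 16)
w <- box_cox(y, lambda = 0.5)            # behaves like a square root
y_back <- inv_box_cox(w, lambda = 0.5)   # recovers the original values
```

A lambda of 1 only shifts the series and leaves its shape unchanged, while a lambda of 0 is the log transform, so values between 0 and 1 sit between those two extremes.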
1.1 usnetelec
## [1] "Box-Cox Lambda: 0.516771443964645"
cbind(usnetelec,
      usnetelec_BoxCox = BoxCox(usnetelec, BoxCox.lambda(usnetelec))) %>%
  autoplot(facets = TRUE) +
  xlab('Year') +
  ylab('billion kwh') +
  ggtitle('Annual US net electricity generation (Billion kwh) for 1949-2003') +
  theme_hc()
1.2 usgdp
## [1] "Box-Cox Lambda: 0.366352049520934"
cbind(usgdp,
      usgdp_BoxCox = BoxCox(usgdp, BoxCox.lambda(usgdp))) %>%
  autoplot(facets = TRUE) +
  xlab('Quarter') +
  ylab('GDP') +
  ggtitle('Quarterly US GDP. 1947:1 - 2006.1') +
  theme_hc()
1.3 mcopper
## [1] "Box-Cox Lambda: 0.191904709003829"
cbind(mcopper,
      mcopper_BoxCox = BoxCox(mcopper, BoxCox.lambda(mcopper))) %>%
  autoplot(facets = TRUE) +
  xlab('Month') +
  ylab('Price') +
  ggtitle('Monthly copper prices') +
  theme_hc()
1.4 enplanements
## [1] "Box-Cox Lambda: -0.226946111237065"
cbind(enplanements,
      enplanements_BoxCox = BoxCox(enplanements, BoxCox.lambda(enplanements))) %>%
  autoplot(facets = TRUE) +
  xlab('Month') +
  ylab('Domestic Revenue Enplanements (millions)') +
  ggtitle('Monthly US domestic enplanements: 1996-2000') +
  theme_hc()
2 Question - 3.2
Why is a Box-Cox transformation unhelpful for the cangas data?
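Answer: The seasonal variation in this series is not uniform over time. A Box-Cox transformation applies the same power transformation, with a single lambda, to every observation, so it cannot even out variation that changes unevenly across the series. The Box-Cox algorithm picks the lambda that minimises the variability (SD) of the transformed data on the assumption that the result is then likely to be close to normally distributed; however, it does not guarantee normality after transformation.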
## [1] "Box-Cox Lambda: 0.576775938228139"
cbind(cangas,
      cangas_BoxCox = BoxCox(cangas, BoxCox.lambda(cangas))) %>%
  autoplot(facets = TRUE) +
  xlab('Month') +
  ylab('Gas Production (billions of cubic metres)') +
  ggtitle('Monthly Canadian gas production: 1960.1.-2005.2.') +
  theme_hc()
3 Question - 3.3
What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?
3.1 Read data from Ex 2.3
3.2 Select column A3349398A
3.3 Calculate Best Lambda
Answer: The best value of lambda, as estimated by the BoxCox.lambda function, is 0.123156269082221. For easier interpretation, I would round it to one decimal place, giving 0.1.
## [1] "Box-Cox Lambda: 0.123156269082221"
cbind(myts,
      myts_BoxCox = BoxCox(myts, BoxCox.lambda(myts))) %>%
  autoplot(facets = TRUE) +
  ggtitle('Monthly Food Retailing in Australia') +
  theme_hc()
4 Question - 3.8
For your retail time series (from Exercise 3 in Section 2.10):
4.1 a.
Split the data into two parts, a training set (myts.train) and a test set (myts.test).
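A minimal sketch of such a split, using window() in the same way as the loop in part f. Because the retail data file is not included here, the sketch uses the built-in AirPassengers series as a stand-in (an assumption), and the 1958/1959 cut-off is chosen only to suit that series.

```r
# Stand-in monthly series (assumption): AirPassengers ships with base R
# and runs from January 1949 to December 1960.
myts <- AirPassengers

# Training set: everything up to and including December 1958.
myts.train <- window(myts, end = c(1958, 12))

# Test set: everything from January 1959 onwards.
myts.test <- window(myts, start = 1959)

# The two parts should cover the whole series with no overlap.
covers_all <- length(myts.train) + length(myts.test) == length(myts)
```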
4.2 b.
Check that your data have been split appropriately by producing the following plot.
4.4 d.
Compare the accuracy of your forecasts against the actual values stored in myts.test.
##                     ME      RMSE       MAE      MPE     MAPE     MASE      ACF1 Theil's U
## Training set  73.94114  88.31208  75.13514 6.068915 6.134838 1.000000 0.6312891        NA
## Test set     115.00000 127.92727 115.00000 4.459712 4.459712 1.530576 0.2653013 0.7267171
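To make clear what the test-set measures compare, they can be computed by hand. The sketch below is a base R approximation, not the accuracy() call itself: it uses the built-in AirPassengers series as a stand-in for the retail data (an assumption), and all variable names are illustrative.

```r
# Stand-in data (assumption): AirPassengers instead of the retail series.
train <- window(AirPassengers, end = c(1958, 12))
test  <- window(AirPassengers, start = 1959)

m <- frequency(train)   # 12 observations per seasonal cycle
h <- length(test)       # forecast horizon

# Seasonal naive forecast: repeat the last observed season over the horizon.
last_season <- tail(as.numeric(train), m)
fc <- rep(last_season, length.out = h)

# Forecast errors on the test set.
e <- as.numeric(test) - fc

rmse <- sqrt(mean(e^2))                        # root mean squared error
mae  <- mean(abs(e))                           # mean absolute error
mape <- mean(abs(e / as.numeric(test))) * 100  # mean absolute percentage error

# MASE scales the MAE by the in-sample MAE of the seasonal naive method.
scale_mae <- mean(abs(diff(as.numeric(train), lag = m)))
mase <- mae / scale_mae
```

Because MASE is scaled by the in-sample seasonal naive errors, a seasonal naive forecast always scores exactly 1 on its own training set, which matches the training-set row of the table.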
4.5 e.
Check the residuals.
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 671.41, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Do the residuals appear to be uncorrelated and normally distributed?
Answer: The residuals appear to be neither uncorrelated nor normally distributed.
In the time plot, the variation of the residuals grows over time.
The ACF plot shows significant autocorrelation, and the Ljung-Box test (p-value < 2.2e-16) rejects the hypothesis that the residuals are uncorrelated.
The histogram shows a right-skewed distribution.
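The checks described in this answer can also be sketched in base R without the plotting helpers. This is an illustration under stated assumptions rather than the code behind the output above: AirPassengers stands in for the retail series, and the variable names are hypothetical.

```r
# Stand-in training data (assumption): AirPassengers up to December 1958.
train <- window(AirPassengers, end = c(1958, 12))
m <- frequency(train)

# Seasonal naive residuals: each value minus the value one season earlier.
res <- diff(train, lag = m)

# Ljung-Box test over two seasonal cycles (24 lags for monthly data).
lb <- Box.test(res, lag = 2 * m, type = "Ljung-Box")

# Sample autocorrelations at lags 1 to 24 (lag 0 dropped), no plot.
rho <- acf(res, lag.max = 2 * m, plot = FALSE)$acf[-1]

# Approximate 95% white-noise bounds and the number of lags outside them.
bound <- 1.96 / sqrt(length(res))
n_signif <- sum(abs(rho) > bound)

# Sample skewness: a positive value indicates a longer right tail.
e <- as.numeric(res)
skew <- mean((e - mean(e))^3) / sd(e)^3
```

A small p-value in lb and many lags counted in n_signif both point to autocorrelated residuals, while skew gives a rough numeric counterpart to reading the histogram.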
4.6 f.
How sensitive are the accuracy measures to the training/test split?
Answer: The plot below shows the accuracy metrics for the training and test sets as the train-test cut-off year varies from 1985 to 2010. The training-set metrics are relatively insensitive to the cut-off point, whereas the test-set metrics are very sensitive to it.
# Collect accuracy metrics for each candidate cut-off year
acc_df <- data.frame()
for (year in seq(1985, 2010)) {
  # Train on data up to December of the previous year, test on the rest
  myts.train <- window(myts, end = c(year - 1, 12))
  myts.test <- window(myts, start = year)
  fc <- snaive(myts.train)
  acc_year <- accuracy(fc, myts.test) %>%
    data.frame() %>%
    rownames_to_column()
  acc_df <- acc_df %>% rbind(cbind(year, acc_year))
}

# Reshape to long format and plot each metric against the cut-off year
acc_df %>%
  rename(Data_Type = rowname) %>%
  select(year, Data_Type, RMSE, MAE, MAPE, MASE) %>%
  gather(key = 'Acc_Metrics', value = 'Value', -year, -Data_Type) %>%
  ggplot(aes(x = year, y = Value)) +
  geom_line() +
  facet_grid(Acc_Metrics ~ Data_Type, scales = 'free_y') +
  theme_hc() +
  ylab('Accuracy Metrics') +
  xlab('Train-Test-Split Cutting Point (year)') +
  ggtitle('Accuracy Metrics with different Train-Test-split')