library(tidyverse)
library(fpp2)
library(readxl)
library(rio)
library(gridExtra)
library(ggpubr)
library(ggthemes)
#library(TSstudio)

1 Question - 3.1

For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance.
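As a reminder of what lambda controls, the one-parameter Box-Cox family can be sketched in base R as below. The helper name `box_cox_sketch` is hypothetical and handles positive data only; the exercises that follow use `BoxCox()` from the forecast package, which is the function to rely on in practice.

```r
# Minimal sketch of the one-parameter Box-Cox transform for positive data.
# `box_cox_sketch` is a hypothetical helper, not part of any package.
box_cox_sketch <- function(y, lambda) {
  stopifnot(all(y > 0))
  if (lambda == 0) {
    log(y)                    # limiting case as lambda -> 0
  } else {
    (y^lambda - 1) / lambda   # power transform, shifted and scaled
  }
}

bc_half <- box_cox_sketch(c(1, 4, 9), lambda = 0.5)  # square-root-like scale
bc_one  <- box_cox_sketch(c(1, 4, 9), lambda = 1)    # only shifts the data by 1
bc_log  <- box_cox_sketch(exp(2), lambda = 0)        # natural log
```

Values of lambda below 1 compress large observations more than small ones, which is why they can help when the variation grows with the level of the series.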

1.1 `usnetelec`

print(paste0('Box-Cox Lambda: ',BoxCox.lambda(usnetelec)))

## [1] "Box-Cox Lambda: 0.516771443964645"

cbind(usnetelec,
      usnetelec_BoxCox = BoxCox(usnetelec,BoxCox.lambda(usnetelec))) %>%
  autoplot(facet=TRUE) +
  xlab('Year') + 
  ylab('billion kwh') +
  ggtitle('Annual US net electricity generation (Billion kwh) for 1949-2003') +
  theme_hc()

1.2 `usgdp`

print(paste0('Box-Cox Lambda: ',BoxCox.lambda(usgdp)))

## [1] "Box-Cox Lambda: 0.366352049520934"

cbind(usgdp,
      usgdp_BoxCox = BoxCox(usgdp,BoxCox.lambda(usgdp))) %>%
  autoplot(facet=TRUE) +
  xlab('Quarter') + 
  ylab('GDP') +
  ggtitle('Quarterly US GDP: 1947:1 - 2006:1') +
  theme_hc()

1.3 `mcopper`

print(paste0('Box-Cox Lambda: ',BoxCox.lambda(mcopper)))

## [1] "Box-Cox Lambda: 0.191904709003829"

cbind(mcopper,
      mcopper_BoxCox = BoxCox(mcopper,BoxCox.lambda(mcopper))) %>%
  autoplot(facet=TRUE) +
  xlab('Month') + 
  ylab('Price') +
  ggtitle('Monthly copper prices') +
  theme_hc()

1.4 `enplanements`

print(paste0('Box-Cox Lambda: ',BoxCox.lambda(enplanements)))

## [1] "Box-Cox Lambda: -0.226946111237065"

cbind(enplanements,
      enplanements_BoxCox = BoxCox(enplanements,BoxCox.lambda(enplanements))) %>%
  autoplot(facet=TRUE) +
  xlab('Month') + 
  ylab('Domestic Revenue Enplanements (millions)') +
  ggtitle('Monthly US domestic enplanements: 1996-2000') +
  theme_hc()

2 Question - 3.2

Why is a Box-Cox transformation unhelpful for the cangas data?

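Answer: The seasonal variation in `cangas` is not uniform, and it does not change in step with the level of the series: it appears small in the early years, larger in the middle of the sample, and smaller again towards the end. A Box-Cox transformation is monotone in the level, so it can only stabilise variance that grows or shrinks steadily with the level. `BoxCox.lambda()` still returns a value (by default it uses Guerrero's method, which minimises a coefficient of variation across subseries), but that guarantees neither constant variance nor normality after transformation.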

print(paste0('Box-Cox Lambda: ',BoxCox.lambda(cangas)))

## [1] "Box-Cox Lambda: 0.576775938228139"

cbind(cangas,
      cangas_BoxCox = BoxCox(cangas,BoxCox.lambda(cangas))) %>%
  autoplot(facet=TRUE) +
  xlab('Month') + 
  ylab('Gas Production (billions of cubic metres)') +
  ggtitle('Monthly Canadian gas production: 1960.1.-2005.2.') +
  theme_hc()
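One way to see why a single power transformation can fail when the seasonal variation is not uniform is a small synthetic experiment. The series below is made up for illustration (it is not the `cangas` data), and `sd_by_block` is a hypothetical helper: the level rises steadily while the seasonal amplitude peaks in the middle of the sample.

```r
# Synthetic illustration only (not the cangas data): a rising level whose
# seasonal amplitude is largest in the middle of the sample.
n   <- 360
t   <- seq_len(n)
amp <- 0.2 + 1.5 * exp(-((t - n / 2) / 60)^2)      # hump-shaped amplitude
x   <- (1 + t / 20) + amp * sin(2 * pi * t / 12)   # strictly positive series

# SD of month-to-month changes in each third of the sample.
block <- rep(1:3, each = n / 3)
sd_by_block <- function(y) tapply(diff(y), block[-1], sd)

s_raw  <- sd_by_block(x)        # middle block is the most variable
s_log  <- sd_by_block(log(x))   # log (lambda = 0) does not equalise the blocks
s_sqrt <- sd_by_block(sqrt(x))  # neither does a square-root-type transform

max(s_log) / min(s_log)         # ratio stays well above 1
max(s_sqrt) / min(s_sqrt)
```

Because the transform depends only on the level, it cannot shrink the variation in the middle of the sample without also distorting the two ends, where the variation is already small.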

3 Question - 3.3

What Box-Cox transformation would you select for your retail data (from Exercise 3 in Section 2.10)?

3.1 Read data from Ex 2.3

#retaildata <- read_excel('retail.xlsx', skip=1)
retaildata <- import('https://raw.githubusercontent.com/oggyluky11/DATA624-SPRING-2021/main/HW_1-WEEK_2/retail.xlsx', skip=1)

3.2 Select column `A3349398A`

myts<- ts(retaildata[,"A3349398A"], frequency=12, start=c(1982,4))

3.3 Calculate Best Lambda

Answer: `BoxCox.lambda()` returns 0.123156269082221 as the best value of lambda. For easier interpretation, I would round it to one decimal place and use 0.1.

print(paste0('Box-Cox Lambda: ',BoxCox.lambda(myts)))

## [1] "Box-Cox Lambda: 0.123156269082221"

cbind(myts,
      myts_BoxCox = BoxCox(myts,BoxCox.lambda(myts))) %>%
  autoplot(facet=TRUE) +
  ggtitle('Monthly Food Retailing in Australia') +
  theme_hc()

4 Question - 3.8

For your retail time series (from Exercise 3 in Section 2.10):

4.1 a.

Split the data into two parts using:

myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)

4.2 b.

Check that your data have been split appropriately by producing the following plot.

autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")

4.3 c.

Calculate forecasts using snaive applied to myts.train.

fc <- snaive(myts.train)
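For intuition, the seasonal naive point forecast simply repeats the last observed value from the same season. A minimal base-R sketch follows; `snaive_sketch` is a hypothetical helper and the built-in `AirPassengers` series stands in for the retail data, so this is an illustration rather than a substitute for `snaive()`.

```r
# Hypothetical helper: seasonal naive point forecasts for a ts object.
snaive_sketch <- function(y, h) {
  m <- frequency(y)                 # seasonal period, e.g. 12 for monthly data
  n <- length(y)
  stopifnot(n >= m, h >= 1)
  last_season <- as.numeric(y)[(n - m + 1):n]
  # Recycle the last observed season until h forecasts are produced.
  point <- rep(last_season, length.out = h)
  ts(point, start = tsp(y)[2] + 1 / m, frequency = m)
}

fc_sketch <- snaive_sketch(AirPassengers, h = 24)
window(AirPassengers, start = c(1960, 1))  # last observed year
fc_sketch                                  # the same 12 values, repeated twice
```

Note that `snaive()` defaults to `h = 2 * frequency(y)`, which is 24 months for monthly data. As far as I understand, `accuracy()` then compares only the months where forecasts and test data overlap, so a longer test set needs an explicit horizon such as `h = length(myts.test)`.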

4.4 d.

Compare the accuracy of your forecasts against the actual values stored in myts.test.

accuracy(fc,myts.test)

##                     ME      RMSE       MAE      MPE     MAPE     MASE      ACF1
## Training set  73.94114  88.31208  75.13514 6.068915 6.134838 1.000000 0.6312891
## Test set     115.00000 127.92727 115.00000 4.459712 4.459712 1.530576 0.2653013
##              Theil's U
## Training set        NA
## Test set     0.7267171
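To help read the table: MASE scales the MAE by the in-sample MAE of the seasonal naive method on the training data, which is why the training-set MASE is exactly 1 here. The arithmetic below reproduces the test-set MASE from the rounded figures printed above; the variable names are mine.

```r
# Values copied from the accuracy() output above (rounded as printed).
mae_train <- 75.13514    # in-sample MAE of the seasonal naive method
mae_test  <- 115.00000   # MAE on the test set

# MASE = MAE / (in-sample seasonal naive MAE on the training data)
mase_test <- mae_test / mae_train
round(mase_test, 4)      # about 1.5306, agreeing with the table up to rounding
```

On the test set ME equals MAE (and MPE equals MAPE), which means every forecast in the comparison window fell below the actual value.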

4.5 e.

Check the residuals.

checkresiduals(fc)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 671.41, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

Do the residuals appear to be uncorrelated and normally distributed?

Answer: The residuals do not appear to be uncorrelated or normally distributed.

From the time plot, the variation of the residuals grows over time.
The ACF plot demonstrates significant autocorrelation, consistent with the Ljung-Box test above (p-value < 2.2e-16).
The histogram shows a right-skewed distribution.
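The two checks used here can also be run directly with base R. The sketch below applies them to synthetic series (not the retail residuals): `Box.test()` is the base-R Ljung-Box test, and `skew` is a hypothetical helper for sample skewness.

```r
set.seed(624)
# Synthetic residual-like series: strongly autocorrelated AR(1) noise.
e_ar <- arima.sim(model = list(ar = 0.8), n = 300)

# Same test type and lag count as in the checkresiduals() output above.
lb <- Box.test(e_ar, lag = 24, type = "Ljung-Box")
lb$statistic   # large Q* when autocorrelation is present
lb$p.value     # small p-value: reject the hypothesis of no autocorrelation

# Rough normality check via sample skewness (positive = right-skewed).
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3
skew_exp <- skew(rexp(1000))  # an exponential sample is clearly right-skewed
skew_exp
```

For the retail series, the same calls could be applied to `residuals(fc)` after dropping the leading `NA` values that the seasonal naive method produces for the first year.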

4.6 f.

How sensitive are the accuracy measures to the training/test split?

Answer: The plot below shows the accuracy metrics for both the training set and the test set, with the train-test split cut-off point varying from 1985 to 2010. The training-set metrics are relatively insensitive to the cut-off point, whereas the test-set metrics are very sensitive to it.

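# Collect accuracy metrics for each candidate cut-off year.
acc_df <- data.frame()
for (year in seq(1985, 2010)){
  # Train on data up to December of the previous year, test on the rest.
  myts.train <- window(myts, end=c(year-1,12))
  myts.test <- window(myts, start=year)
  # Default forecast horizon is h = 2 * frequency(myts.train) = 24 months.
  fc <- snaive(myts.train)
  # Keep the row names ("Training set" / "Test set") as a column.
  acc_year <- accuracy(fc,myts.test) %>%
    data.frame() %>%
    rownames_to_column()
  # Append this year's two rows to the running result.
  acc_df <- acc_df %>% rbind(cbind(year, acc_year))
}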
acc_df %>% 
  rename(Data_Type = rowname) %>%
  select(year, Data_Type, RMSE, MAE, MAPE, MASE) %>%
  gather(key = 'Acc_Metrics', value = 'Value', -year, -Data_Type) %>%
  ggplot(aes(x = year, y = Value)) +
  geom_line() +
  facet_grid(Acc_Metrics~Data_Type, scales = 'free_y') +
  theme_hc() +
  ylab('Accuracy Metrics') +
  xlab('Train-Test-Split Cutting Point (year)') +
  ggtitle('Accuracy Metrics with different Train-Test-split')

DATA 624 - HOMEWORK 2
