Predictive
Analytics

Dan Wigodsky

Data 624 Homework 2

February 14, 2019

Question 3:1

For four series, we find an appropriate Box-Cox transformation to stabilize the variance.

For all of our series, the appropriate lambda is less than 1.067113, meaning variance in the later part of the series is decreased, while it increases variance in the earlier range. This is often more appropriate because many datasets get larger as time goes on.

.
Series one, Annual US net electricity generation, 1949-2003, This series calls for a lambda of .5168.
##  Time-Series [1:55] from 1949 to 2003: 296 334 375 404 447 ...
## [1] 0.5167714

Quarterly US GDP. 1947 - 2006. This series calls for a lambda of .3664.
##  Time-Series [1:237] from 1947 to 2006: 1570 1569 1568 1591 1616 ...
## [1] 0.366352

Monthly copper prices. Source: UNCTAD. This series calls for a lambda of .1919.
##  Time-Series [1:564] from 1960 to 2007: 255 260 249 258 244 ...
## [1] 0.1919047

Domestic Revenue Enplanements SOURCE: Department of Transportation, Bureau of Transportation Statistics. This series calls for a lambda of -0.2269.
##  Time-Series [1:282] from 1979 to 2002: 21.1 22.9 25.9 24.4 23.4 ...
## [1] -0.2269461

Question 3:2

The natural log, a monotonically increasing function, allows us to transform a series and preserve size relationships. It has a decreasing slope so it spreads out the left of a distribution while squeezing higher values. It works well at unskewing data. But in this case, the data has increased variance in the center of the distribution. Stretching or contracting one side or another won’t do the same in the opposite end of the interval.
##  Time-Series [1:542] from 1960 to 2005: 1.43 1.31 1.4 1.17 1.12 ...
## [1] 0.5767759

Question 3:3

The retail set from homework 1, Turnover for Clothing, footwear and personal accessory retail sector in Australian Capital Territory, could use a Box-Cox transformation of -0.0518. This retail set is appropriate for a transformation. Its variance regularly increases through the time period.
## [1] -0.05182901

Question 3:8

Train. Predict and Test

Split the data into training and testing sets.
Forecast the test data using seasonal naive method.
Compare accuracy of our seasonal naive method.
.
.
Our test root mean square error is low compared to our data, but lower for our test set than our training set. Our seasonal naive model could provide a better model. The mean error is positive for the training set, but nagative for the test set, indicating that the model postulates a higher set of values than perfect test values would imply. The mean absolute percent error is low for our test set. These are best used in comparing models. The p-value for the Ljung-Box test is incredibly low. We conclude that our residuals are not independent. Our graph of residuals shows that they are not independent. The ACF graph also concurs. We could improve these results by trying models more appropriate to our data.

##                      ME     RMSE      MAE       MPE     MAPE      MASE
## Training set  0.8279279 1.965322 1.444144  5.197633 8.689904 1.0000000
## Test set     -0.2333333 1.247330 1.041667 -1.095009 3.720904 0.7213038
##                   ACF1 Theil's U
## Training set 0.7239773        NA
## Test set     0.4263336 0.2379859

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 580.25, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24
The train/test split given to us, at the end of 2010, leads to the lowest RMSE. The graph below shows 6 options, separated by 6 months each.

—————————————————————————

_______________________Appendix____________________________________________

suppressWarnings(suppressMessages(library(fpp2))) suppressWarnings(suppressMessages(library(showtext))) suppressWarnings(suppressMessages(library(ggplot2))) suppressWarnings(suppressMessages(library(kableExtra))) suppressWarnings(suppressMessages(library(expsmooth))) suppressWarnings(suppressMessages(library(gridExtra))) font_add_google(name = “Corben”, family = “corben”, regular.wt = 400, bold.wt = 700) showtext_auto()

str(usnetelec) lambda <- BoxCox.lambda(usnetelec) lambda plota<-autoplot(usnetelec,xlab=‘’)+theme(text = element_text(family = “corben”,color=’#249382’,size=22)) plotb<-autoplot(BoxCox(usnetelec,lambda),xlab=‘’,ylab=’Usnetelec, after Box-Cox’)+theme(text = element_text(family = “corben”,color=‘#249382’,size=22)) grid.arrange(plota,plotb,nrow = 1)

str(usgdp) lambda <- BoxCox.lambda(usgdp) lambda plota<-autoplot(usgdp,xlab=‘’)+theme(text = element_text(family = “corben”,color=’#249382’,size=22)) plotb<-autoplot(BoxCox(usgdp,lambda),xlab=‘’,ylab=’Usgdp, after Box-Cox’)+theme(text = element_text(family = “corben”,color=‘#249382’,size=22)) grid.arrange(plota,plotb,nrow = 1)

str(mcopper) lambda <- BoxCox.lambda(mcopper) lambda plota<-autoplot(mcopper,xlab=‘’)+theme(text = element_text(family = “corben”,color=’#249382’,size=22)) plotb<-autoplot(BoxCox(mcopper,lambda),xlab=‘’,ylab=’mcopper, after Box-Cox’)+theme(text = element_text(family = “corben”,color=‘#249382’,size=22)) grid.arrange(plota,plotb,nrow = 1)

str(enplanements) lambda <- BoxCox.lambda(enplanements) lambda plota<-autoplot(enplanements,xlab=‘’)+theme(text = element_text(family = “corben”,color=’#249382’,size=22)) plotb<-autoplot(BoxCox(enplanements,lambda),xlab=‘’,ylab=’enplanements, after Box-Cox’)+theme(text = element_text(family = “corben”,color=‘#249382’,size=22)) grid.arrange(plota,plotb,nrow = 1)

str(cangas) plota<-autoplot(cangas,xlab=‘’)+theme(text = element_text(family = “corben”,color=’#249382’,size=20)) lambda <- BoxCox.lambda(cangas) lambda plotb<-autoplot(BoxCox(cangas,lambda),xlab=‘’,ylab=’with Box-Cox’)+theme(text = element_text(family = “corben”,color=‘#249382’,size=20)) grid.arrange(plota,plotb,nrow = 2)

retaildata <- readxl::read_excel(“C:/Users/dawig/Desktop/Data624/retail.xlsx”, skip=1) turnover <- ts(retaildata[,“A3349608L”], frequency=12, start=c(1982,4)) plota<-autoplot(turnover, ylab=“turnover”, xlab=“”)+ theme(panel.background = element_rect(fill = ‘#efeae8’))+theme(text = element_text(family = “corben”,color=‘#249382’,size=20)) lambda <- BoxCox.lambda(turnover) lambda plotb<-autoplot(BoxCox(turnover,lambda), ylab=“turnover, after Box-Cox”, xlab=“”)+theme(text = element_text(family = “corben”,color=‘#249382’,size=20)) grid.arrange(plota,plotb,nrow = 1)

turnover.train <- window(turnover, end=c(2010,12)) turnover.test <- window(turnover, start=2011) last.train<-window(turnover.train, start=2008) last.turnover<-window(turnover, start=2008) plota<-autoplot(turnover,xlab=‘’) + autolayer(turnover.train, series=“Training”) + autolayer(turnover.test, series=“Test”)+theme(text = element_text(family = “corben”,color=’#249382’,size=20)) #show a naive prediction fc_1 <- naive(turnover.train) plotb<-autoplot(last.turnover,xlab=‘’,ylab=’Naive’) + autolayer(last.train, series=“Training”) + autolayer(fc_1, series=“naive prediction”)+theme(text = element_text(family = “corben”,color=‘#249382’,size=20)) #show a seasonal naive prediction fc <- snaive(turnover.train) plotc<-autoplot(last.turnover,xlab=‘’,ylab=’Seasonal Naive’) + autolayer(last.train, series=“Training”) + autolayer(fc, series=“prediction”) +theme(text = element_text(family = “corben”,color=‘#249382’,size=20)) #look at a tighter time span plotd<-autoplot(last.turnover,xlab=‘’,ylab=’‘) + autolayer(last.train, series=“Training”) + autolayer(turnover.test, series=“Test”)+theme(text = element_text(family = “corben”,color=’#249382’,size=20)) grid.arrange(plota,plotd,plotc,plotb,nrow = 2) accuracy(fc,turnover.test) RMSE_set<-rep(0,6) RMSE_set[4]<-accuracy(fc,turnover.test)[4] checkresiduals(fc)

turnover.train <- window(turnover, end=c(2009,6)) turnover.test <- window(turnover, start=c(2009,7)) fc <- snaive(turnover.train) RMSE_set[1]<-accuracy(fc,turnover.test)[4]

turnover.train <- window(turnover, end=c(2009,12)) turnover.test <- window(turnover, start=2010) fc <- snaive(turnover.train) RMSE_set[2]<-accuracy(fc,turnover.test)[4]

turnover.train <- window(turnover, end=c(2010,6)) turnover.test <- window(turnover, start=c(2010,7)) fc <- snaive(turnover.train) RMSE_set[3]<-accuracy(fc,turnover.test)[4]

turnover.train <- window(turnover, end=c(2011,6)) turnover.test <- window(turnover, start=c(2011,7)) fc <- snaive(turnover.train) RMSE_set[5]<-accuracy(fc,turnover.test)[4]

turnover.train <- window(turnover, end=c(2012,12)) turnover.test <- window(turnover, start=2013) fc <- snaive(turnover.train) RMSE_set[6]<-accuracy(fc,turnover.test)[4]

ggplot()+geom_line(aes(y=RMSE_set,x=seq_along(RMSE_set)),size=1.5,color=‘#3dc666’)+scale_x_discrete(limits=c(2,4,6), labels=c(“Dec. 2009”,“Dec. 2010”,“Dec. 2011”))+labs(y=‘RMSE’,x=‘train/test split’)+theme(text = element_text(family = “corben”,color=‘#3dc666’,size=20))