Question 1

Are the following statements true or false? Explain your answer.

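Good forecast methods should have normally distributed residuals. Ans: False. Normally distributed residuals are useful because they make prediction intervals easier to compute, but they are not required. The essential properties are that the residuals are uncorrelated and have zero mean.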

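A model with small residuals will give good forecasts. Ans: False. This is not always correct. Residuals measure the fit to the training data, so a model can have small residuals because it overfits and still forecast poorly. Small residuals can also be biased: if all the residuals are positive, their mean is not zero. For a good forecast the residuals should have zero mean and be uncorrelated, and ideally they should also have constant variance and a normal distribution.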
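
The bias point can be illustrated with a minimal sketch in base R. The residual values below are made up for illustration: they are small but all positive, so their mean is not zero, and shifting the forecasts by that mean removes the bias.

```r
# Hypothetical residuals (actual minus fitted): small, but all positive.
res <- c(0.4, 0.6, 0.5, 0.3, 0.7)

# A mean far from zero means the method forecasts too low on average.
bias <- mean(res)
print(bias)

# Adding the bias to every forecast shifts the residuals so they average zero.
res_adjusted <- res - bias
print(mean(res_adjusted))
```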

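The best measure of forecast accuracy is MAPE. Ans: False. No single measure is best in every case. MAPE is unit-free, but it becomes infinite or extreme when the actual values are zero or close to zero, so it should not be used when working with low-volume data.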
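
A small sketch of the low-volume problem, using made-up demand values and forecasts (the names `actual` and `fc` are assumptions for illustration). One actual value is zero, so the percentage error divides by zero and the MAPE is infinite, while the MAE stays well defined.

```r
# Hypothetical low-volume demand series and forecasts (illustrative values).
actual <- c(1, 0, 2, 1, 3)
fc     <- c(2, 1, 2, 2, 2)

err <- actual - fc

mae  <- mean(abs(err))                  # scale-dependent, always defined
mape <- mean(abs(100 * err / actual))   # divides by zero when actual == 0

print(mae)
print(mape)
```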

If your model doesn’t forecast well, you should make it more complicated. Ans: False. A more complicated model does not guarantee better accuracy. We may need to adjust the data first, and strong domain knowledge is often required in order to find the root cause of a poor model.
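
As a rough sketch of why extra complexity is not a cure, the code below fits a simple and a complicated polynomial to simulated data (a linear trend plus noise, which is an assumption made only for this illustration). The complicated fit can never have a larger training error than the simple one, but its test error when extrapolating is usually much worse.

```r
set.seed(1)
# Simulated data: a linear trend plus noise (an assumption for illustration).
x <- 1:60
y <- 0.5 * x + rnorm(60, sd = 5)

train <- data.frame(x = x[1:40], y = y[1:40])
test  <- data.frame(x = x[41:60], y = y[41:60])

rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))

# Fit a simple (degree 1) and a complicated (degree 8) polynomial.
fit_simple  <- lm(y ~ poly(x, 1), data = train)
fit_complex <- lm(y ~ poly(x, 8), data = train)

train_rmse <- c(simple  = rmse(train$y, fitted(fit_simple)),
                complex = rmse(train$y, fitted(fit_complex)))
test_rmse  <- c(simple  = rmse(test$y, predict(fit_simple, newdata = test)),
                complex = rmse(test$y, predict(fit_complex, newdata = test)))

print(train_rmse)
print(test_rmse)
```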

Always choose the model with the best forecast accuracy as measured on the test set. Ans: True. Accuracy should be judged on the test data. A model can overfit the training data, so training accuracy is not a reliable guide to forecast accuracy, and that is one of the reasons we focus on the test set.

Question 2

library(fpp2)

#2.a.Produce a time plot of the series.
autoplot(dowjones)

#2.b.Produce forecasts using the drift method and plot them.
drift_fct <- rwf(dowjones, h = 13, drift = TRUE)
autoplot(drift_fct) +
  ggtitle("Dow-Jones index, 28 Aug - 18 Dec 1972.") +
  ylab("Closing Price") +
  xlab("Days")

#2.c.Try using some of the other basic forecast functions to forecast the same data set. Which do you think is best? Why?

mean_fct <- meanf(dowjones, h = 13)
autoplot(mean_fct) +
  ggtitle("Dow-Jones index, 28 Aug - 18 Dec 1972.") +
  ylab("Closing Price") +
  xlab("Days")

naive_fct <- rwf(dowjones, h = 13)
autoplot(naive_fct) +
  ggtitle("Dow-Jones index, 28 Aug - 18 Dec 1972.") +
  ylab("Closing Price") +
  xlab("Days")

#Plot all the forecast functions in a single graph
autoplot(dowjones) +
  autolayer(meanf(dowjones, h = 13), series = "Mean", PI = FALSE) +
  autolayer(rwf(dowjones, h = 13, drift = TRUE), series = "Drift", PI = FALSE) +
  autolayer(rwf(dowjones, h = 13), series = "Naive", PI = FALSE) +
  ggtitle("Dow-Jones index, 28 Aug - 18 Dec 1972.") +
  ylab("Closing Price") +
  xlab("Days")

#Check the residuals of all the forecast functions
checkresiduals(drift_fct)

## 
##  Ljung-Box test
## 
## data:  Residuals from Random walk with drift
## Q* = 35.039, df = 9, p-value = 5.865e-05
## 
## Model df: 1.   Total lags used: 10

checkresiduals(mean_fct)

## 
##  Ljung-Box test
## 
## data:  Residuals from Mean
## Q* = 646.45, df = 9, p-value < 2.2e-16
## 
## Model df: 1.   Total lags used: 10

checkresiduals(naive_fct)

## 
##  Ljung-Box test
## 
## data:  Residuals from Random walk
## Q* = 35.039, df = 10, p-value = 0.000123
## 
## Model df: 0.   Total lags used: 10

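I will go with the naive forecast method because the stock price has been declining over the most recent days. Instead of considering all the historical data (mean method) or extrapolating the average historical change (drift method), I will use the last observation to predict the future. Note that none of the three methods produces white-noise residuals: every Ljung-Box p-value above is below 0.05.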
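
To make the difference between the three methods concrete, here is a minimal sketch that computes the mean, naive and drift point forecasts by hand in base R. The series `y` is made up for illustration and is not the dowjones data.

```r
# Hypothetical short series (illustrative values, not the dowjones data).
y <- c(110, 112, 111, 115, 118, 120)
n <- length(y)
h <- 3

mean_fc  <- rep(mean(y), h)                           # average of all observations
naive_fc <- rep(y[n], h)                              # last observation
drift_fc <- y[n] + (1:h) * (y[n] - y[1]) / (n - 1)    # last value plus average change

print(mean_fc)
print(naive_fc)
print(drift_fc)
```

The drift forecast is the line through the first and last observations extended into the future, so it depends only on those two values.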

Question 3

?ibmclose

#3.a.Produce some plots of the data in order to become familiar with it.
autoplot(ibmclose)

length(ibmclose)

## [1] 369

#3.b.Split the data into a training set of 300 observations and a test set of 69 observations.
train_ibm <- subset(ibmclose, end = 300)
test_ibm <- subset(ibmclose, start = 301)

#3.c.Try using various basic methods to forecast the training set and compare the results on the test set. Which method did best?

mean_fit3 <- meanf(train_ibm, h = 69)
naive_fit3 <- rwf(train_ibm, h = 69)
# ibmclose has frequency 1, so snaive() gives the same forecasts as the naive method
snaive_fit3 <- snaive(train_ibm, h = 69)
drift_fit3 <- rwf(train_ibm, drift = TRUE, h = 69)

autoplot(mean_fit3)+autolayer(test_ibm)

autoplot(naive_fit3)+autolayer(test_ibm)

autoplot(snaive_fit3)+autolayer(test_ibm)

autoplot(drift_fit3)+autolayer(test_ibm)

accuracy(mean_fit3, test_ibm)

##                         ME      RMSE       MAE        MPE     MAPE
## Training set  1.660438e-14  73.61532  58.72231  -2.642058 13.03019
## Test set     -1.306180e+02 132.12557 130.61797 -35.478819 35.47882
##                  MASE      ACF1 Theil's U
## Training set 11.52098 0.9895779        NA
## Test set     25.62649 0.9314689  19.05515

accuracy(naive_fit3, test_ibm)

##                      ME      RMSE      MAE         MPE     MAPE     MASE
## Training set -0.2809365  7.302815  5.09699 -0.08262872 1.115844 1.000000
## Test set     -3.7246377 20.248099 17.02899 -1.29391743 4.668186 3.340989
##                   ACF1 Theil's U
## Training set 0.1351052        NA
## Test set     0.9314689  2.973486

accuracy(snaive_fit3, test_ibm)

##                      ME      RMSE      MAE         MPE     MAPE     MASE
## Training set -0.2809365  7.302815  5.09699 -0.08262872 1.115844 1.000000
## Test set     -3.7246377 20.248099 17.02899 -1.29391743 4.668186 3.340989
##                   ACF1 Theil's U
## Training set 0.1351052        NA
## Test set     0.9314689  2.973486

accuracy(drift_fit3, test_ibm)

##                        ME      RMSE       MAE         MPE     MAPE
## Training set 2.870480e-14  7.297409  5.127996 -0.02530123 1.121650
## Test set     6.108138e+00 17.066963 13.974747  1.41920066 3.707888
##                  MASE      ACF1 Theil's U
## Training set 1.006083 0.1351052        NA
## Test set     2.741765 0.9045875  2.361092
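
The columns in these tables can be reproduced by hand. The sketch below computes RMSE, MAE, MAPE and MASE in base R for a made-up training and test set with a naive forecast. The values and variable names are assumptions for illustration, and the MASE scaling shown is the non-seasonal version (in-sample MAE of the one-step naive forecast).

```r
# Accuracy measures computed by hand for a hypothetical train/test split.
train  <- c(10, 12, 11, 13, 14, 13, 15)
actual <- c(16, 15, 17)     # test set
fc     <- rep(15, 3)        # naive forecast: last training value

err <- actual - fc

rmse <- sqrt(mean(err^2))
mae  <- mean(abs(err))
mape <- mean(abs(100 * err / actual))

# MASE scales the MAE by the in-sample MAE of the one-step naive forecast.
naive_mae <- mean(abs(diff(train)))
mase      <- mae / naive_mae

print(c(RMSE = rmse, MAE = mae, MAPE = mape, MASE = mase))
```

A MASE below 1 means the forecast errors are smaller on average than the one-step naive errors on the training data.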

#3.d.Check the residuals of your preferred method. Do they resemble white noise?
checkresiduals(drift_fit3)

## 
##  Ljung-Box test
## 
## data:  Residuals from Random walk with drift
## Q* = 22.555, df = 9, p-value = 0.007278
## 
## Model df: 1.   Total lags used: 10

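Answer 3.c. The drift method has the lowest test-set errors (RMSE 17.07, MAE 13.97, MAPE 3.71), compared with 20.25, 17.03 and 4.67 for the naive method and 132.13, 130.62 and 35.48 for the mean method, so we go with the drift model. The seasonal naive results are identical to the naive results because ibmclose has no seasonal period (frequency 1).

Answer 3.d. No. The Ljung-Box test gives a p-value of 0.007278, which is below 0.05, so the residuals of the drift method do not resemble white noise.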
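
The Ljung-Box test reported by checkresiduals() can also be run directly with Box.test() from base R. The sketch below uses simulated residuals (an assumption for illustration, not the ibmclose residuals) to show how the p-value behaves for independent and for autocorrelated series.

```r
set.seed(42)
# Simulated residuals (assumptions for illustration).
white_noise <- rnorm(200)                                      # independent draws
autocorr    <- as.numeric(arima.sim(list(ar = 0.8), n = 200))  # AR(1), strongly correlated

lb_white <- Box.test(white_noise, lag = 10, type = "Ljung-Box")
lb_auto  <- Box.test(autocorr,    lag = 10, type = "Ljung-Box")

print(lb_white$p.value)
print(lb_auto$p.value)
```

A small p-value (below 0.05) is evidence against white noise. When the residuals come from a model with estimated parameters, the `fitdf` argument of Box.test() reduces the degrees of freedom, which matches the "Model df: 1" line in the drift output above.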

Question 4

#4.a. Repeat the exercise for the data set hsales. (Split the data set into a training set and a test set, where the test set is the last two years of data.)
?hsales
autoplot(hsales)

length(hsales)

## [1] 275

frequency(hsales)

## [1] 12

hsales_training <- subset(hsales,end=length(hsales)-24)
length(hsales_training)

## [1] 251

hsales_test <- subset(hsales,start=252)
length(hsales_test)

## [1] 24

#4.b.Try using various basic methods to forecast the training set and compare the results on the test set. Which method did best?
mean_fit4 <- meanf(hsales_training, h = 24)
naive_fit4 <- naive(hsales_training, h = 24)
drift_fit4 <- rwf(hsales_training, drift = TRUE, h = 24)
snaive_fit4 <- snaive(hsales_training, h = 24)

#Plotting the model with test data
autoplot(mean_fit4)+autolayer(hsales_test)

autoplot(naive_fit4)+autolayer(hsales_test)

autoplot(snaive_fit4)+autolayer(hsales_test)

autoplot(drift_fit4)+autolayer(hsales_test)

#Checking the accuracy of the models.
accuracy(mean_fit4, hsales_test)

##                        ME      RMSE      MAE       MPE     MAPE      MASE
## Training set 3.510503e-15 12.162811 9.532738 -6.144876 20.38306 1.1234341
## Test set     3.839475e+00  9.022555 7.561587  4.779122 13.26183 0.8911338
##                   ACF1 Theil's U
## Training set 0.8661998        NA
## Test set     0.5377994  1.131713

accuracy(naive_fit4, hsales_test)

##                     ME     RMSE      MAE       MPE      MAPE      MASE
## Training set -0.008000 6.301111 5.000000 -0.767457  9.903991 0.5892505
## Test set      2.791667 8.628924 7.208333  2.858639 12.849194 0.8495028
##                   ACF1 Theil's U
## Training set 0.1824472        NA
## Test set     0.5377994  1.098358

accuracy(snaive_fit4, hsales_test)

##                     ME      RMSE      MAE       MPE      MAPE      MASE
## Training set 0.1004184 10.582214 8.485356 -2.184269 17.633696 1.0000000
## Test set     1.0416667  5.905506 4.791667  0.972025  8.545729 0.5646984
##                   ACF1 Theil's U
## Training set 0.8369786        NA
## Test set     0.1687797 0.7091534

accuracy(drift_fit4, hsales_test)

##                        ME     RMSE      MAE        MPE      MAPE      MASE
## Training set 1.506410e-15 6.301106 4.999872 -0.7511048  9.903063 0.5892354
## Test set     2.891667e+00 8.658795 7.249000  3.0426108 12.901697 0.8542954
##                   ACF1 Theil's U
## Training set 0.1824472        NA
## Test set     0.5378711  1.100276

#4.c.Check the residuals of your preferred method. Do they resemble white noise?
checkresiduals(snaive_fit4)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 682.2, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

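Answer 4.b. The seasonal naive method has the lowest test-set errors (RMSE 5.91, MAE 4.79, MAPE 8.55), compared with an RMSE of 8.63 for the naive method, 8.66 for the drift method and 9.02 for the mean method, so we go with the seasonal naive model.

Answer 4.c. No. The Ljung-Box test gives a p-value below 2.2e-16, so the residuals of the seasonal naive method do not resemble white noise.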
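
For reference, the seasonal naive method simply repeats the last observed value from the same season. The sketch below reproduces it by hand on a made-up quarterly series (the values are assumptions for illustration, not the hsales data).

```r
# Hypothetical quarterly series covering three years (illustrative values).
y <- ts(c(10, 20, 30, 40,
          12, 22, 32, 42,
          14, 24, 34, 44), frequency = 4)
m <- frequency(y)
n <- length(y)
h <- 6

# Each forecast repeats the value from the same season in the last observed year.
last_season <- y[(n - m + 1):n]
snaive_fc   <- last_season[((seq_len(h) - 1) %% m) + 1]
print(snaive_fc)

# Residuals are the differences between each value and the one m periods before.
snaive_res <- diff(y, lag = m)
print(as.numeric(snaive_res))
```

Because the residuals are seasonal differences, any trend left in the data shows up as residuals that are consistently above or below zero, as in this example.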

Question 5

#5. Calculate the residuals from a seasonal naïve forecast applied to the WWWusage and bricksq data .
#Test if the residuals are white noise and normally distributed. What do you conclude?

?WWWusage
snaive_fct51 <- snaive(WWWusage)
autoplot(snaive_fct51)

checkresiduals(snaive_fct51)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 145.58, df = 10, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 10

?bricksq
snaive_fct52 <- snaive(bricksq)
autoplot(snaive_fct52)

checkresiduals(snaive_fct52)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 233.2, df = 8, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 8

5. WWWusage: Most of the spikes in the ACF plot extend beyond the blue dashed lines, and the Ljung-Box p-value is below 2.2e-16, so the residuals are not white noise. In the third graph (the histogram) the distribution is not centered at 0 and is skewed to the right, which indicates that the residuals are not normally distributed.

bricksq: Most of the spikes in the ACF plot extend beyond the blue dashed lines, and the Ljung-Box p-value is below 2.2e-16, so the residuals are not white noise. In the third graph the long left tail suggests that the residuals may not be normally distributed.
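
The normality judgement above is based on the histogram. As one possible complement, which is not part of the original analysis, a formal test such as Shapiro-Wilk can be applied with shapiro.test() from base R. The sketch uses simulated residuals (assumptions for illustration) to show the behaviour on symmetric and on right-skewed data.

```r
set.seed(123)
# Simulated residuals (assumptions for illustration).
symmetric_res <- rnorm(200)          # drawn from a normal distribution
skewed_res    <- rexp(200) - 1       # right-skewed, mean close to zero

sw_symmetric <- shapiro.test(symmetric_res)
sw_skewed    <- shapiro.test(skewed_res)

print(sw_symmetric$p.value)
print(sw_skewed$p.value)
```

A p-value below 0.05 is evidence against normality, so the skewed series should be rejected clearly.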

Question 6

#6. For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect. dole, usgdp, bricksq, enplanements.

#If necessary, find an appropriate Box-Cox transformation in order to stabilize the variance.

#?dole
lambda <- BoxCox.lambda(dole)
print(lambda)

## [1] 0.3290922

dole_df <- cbind(Raw = dole, BoxCox = BoxCox(dole, lambda))

autoplot(dole_df, facets = TRUE) + ggtitle("Unemployment benefits in Australia") + ylab("Number of People") + xlab("Month")

#?usgdp
autoplot(usgdp)

lambda_usgdp <- BoxCox.lambda(usgdp)
print(lambda_usgdp)

## [1] 0.366352

autoplot(BoxCox(usgdp, lambda_usgdp))

#?bricksq
autoplot(bricksq)

lambda_bricksq <- BoxCox.lambda(bricksq)
print(lambda_bricksq)

## [1] 0.2548929

autoplot(BoxCox(bricksq, lambda_bricksq))

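#?enplanements
# Estimate lambda for enplanements rather than reusing the value estimated for dole
lambda_enplanements <- BoxCox.lambda(enplanements)
enplanements_df <- cbind(Raw = enplanements, BoxCox = BoxCox(enplanements, lambda_enplanements))
autoplot(enplanements_df, facets = TRUE) + ggtitle("Monthly US domestic enplanements") + ylab("Millions") + xlab("Month")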

dole: The data has a positive trend and does not show any seasonal pattern. The variation increases with time, so a Box-Cox transformation (lambda = 0.33) is worthwhile to stabilize the variance.

usgdp: The data has a positive trend both before and after transformation. After transformation (lambda = 0.37) the series looks closer to a straight line, but there is little variation around the trend to stabilize, so the transformation may not help much.

bricksq: The data shows a clear seasonal pattern and a significant positive trend. The transformation (lambda = 0.25) may help make the size of the seasonal variation more even across the whole time series.

enplanements: The data shows a clear seasonal pattern and a significant positive trend over time, except for a dip in the last few years. The transformation may help make the size of the seasonal variation more even across the whole time series.
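
For reference, the Box-Cox transformation used above can be written out by hand. The sketch below is a minimal base R version for positive data; the function names `box_cox` and `inv_box_cox` are hypothetical helpers for illustration (the forecast package provides BoxCox() and InvBoxCox()), and the input values are made up.

```r
# Box-Cox transformation for positive data: log(y) when lambda is 0,
# otherwise (y^lambda - 1) / lambda.
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Inverse transformation, used to bring forecasts back to the original scale.
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(5, 50, 500, 5000)          # hypothetical values spanning several scales
w <- box_cox(y, lambda = 0.33)
print(w)
print(inv_box_cox(w, lambda = 0.33))
```

A lambda of 1 only shifts the data, while values closer to 0 compress large observations more strongly, which is how the transformation evens out variation that grows with the level of the series.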

DSCI-FXP-SP19- Assignment #2 Forecasting Basics Nidhin