1. Are the following statements true or false? Explain your answer.

  1. Good forecast methods should have normally distributed residuals. # False. Normality is helpful because it makes prediction intervals easier to compute, but it is not required.
  2. A model with small residuals will give good forecasts. # False. Small residuals on the training data can simply reflect overfitting and say little about out-of-sample accuracy. What matters is that the residuals are uncorrelated and have zero mean. Constant variance and normality are useful for prediction intervals but are not essential.
  3. The best measure of forecast accuracy is MAPE. # False. No single measure is best for every data set. MAPE is undefined when an observation is zero, becomes extreme when observations are close to zero, and assumes the data have a meaningful zero.
  4. If your model doesn’t forecast well, you should make it more complicated. # False. A more complicated method does not guarantee better forecasts, and it may overfit the training data.
  5. Always choose the model with the best forecast accuracy as measured on the test set. # False. Test-set accuracy matters, but the residual diagnostics should be considered as well.
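
The point about MAPE in item 3 can be illustrated with a small sketch in base R. The numbers below are made up for illustration and do not come from any data set used in these exercises.

```r
# Hypothetical actuals and forecasts; the last actual is close to zero.
actual <- c(100, 50, 0.5)
fc     <- c(110, 55, 1.5)

err  <- actual - fc
mae  <- mean(abs(err))                 # scale-dependent error
mape <- mean(abs(err / actual)) * 100  # percentage error

# The smallest absolute error (1 unit) produces the largest
# percentage error (200%), so it dominates the MAPE.
print(c(MAE = mae, MAPE = mape))
```

A scale-dependent measure such as MAE or a scaled measure such as MASE does not have this problem, which is one reason the choice of measure depends on the data.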

2. Use the Dow Jones index (data set dowjones) to do the following:

Produce a time plot of the series

library(fma)
## Loading required package: forecast
autoplot(dowjones)

# Produce forecasts using the drift method and plot them.

drift_dj <- rwf(dowjones, drift = TRUE)
autoplot(drift_dj)

# Try using some of the other basic forecast functions to forecast the same data set.

mean_dj<-meanf(dowjones)
autoplot(mean_dj)

naive_dj <- naive(dowjones)
autoplot(naive_dj)

snaive_dj <- snaive(dowjones, h = 10)
autoplot(snaive_dj)

Of the four basic forecasting methods, I think the naive method is the most suitable here. Past observations say little about future movements in a stock index, so the last observed value is the most reasonable forecast. The seasonal naive forecasts are identical to the naive ones because dowjones has no seasonal period (its frequency is 1).
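
To make the difference between the naive and drift methods concrete, here is a minimal sketch in base R that computes both by hand. The series `y` is a made-up toy vector, not the dowjones data, and the variable names are only illustrative.

```r
# Toy series, made up for illustration.
y <- c(10, 12, 11, 15, 14)
n <- length(y)
h <- 1:3  # forecast horizons

# Drift forecast: last value plus h times the average historical change,
# which equals the slope of the line joining the first and last points.
slope    <- (y[n] - y[1]) / (n - 1)
drift_fc <- y[n] + h * slope

# Naive forecast: the last value, repeated for every horizon.
naive_fc <- rep(y[n], length(h))

print(drift_fc)
print(naive_fc)
```

The drift method only extrapolates the straight line between the first and last observations, so its forecasts depend heavily on where the series happens to start and end.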

3. Consider the daily closing IBM stock prices (data set ibmclose).

Produce some plots of the data in order to become familiar with it.

library(fma)
autoplot(ibmclose)

# Split the data into a training set of 300 observations and a test set of 69 observations.

ibm_training <- subset(ibmclose, end=300)
ibm_test <- subset(ibmclose,start=301)

Try using various basic methods to forecast the training set and compare the results on the test set. Which method did best?

mean_ibm <- meanf(ibm_training,h=69)
naive_ibm<- naive(ibm_training, h=69)
drift_ibm<- rwf(ibm_training, drift = TRUE, h=69)
snaive_ibm<-snaive(ibm_training, h=69)

autoplot(mean_ibm)+autolayer(ibm_test)

autoplot(naive_ibm)+autolayer(ibm_test)

autoplot(drift_ibm)+autolayer(ibm_test)

autoplot(snaive_ibm)+autolayer(ibm_test)

# Check accuracy on the IBM training and test sets
accuracy(mean_ibm,ibm_test)
##                         ME      RMSE       MAE        MPE     MAPE
## Training set  1.660438e-14  73.61532  58.72231  -2.642058 13.03019
## Test set     -1.306180e+02 132.12557 130.61797 -35.478819 35.47882
##                  MASE      ACF1 Theil's U
## Training set 11.52098 0.9895779        NA
## Test set     25.62649 0.9314689  19.05515
accuracy(naive_ibm,ibm_test)
##                      ME      RMSE      MAE         MPE     MAPE     MASE
## Training set -0.2809365  7.302815  5.09699 -0.08262872 1.115844 1.000000
## Test set     -3.7246377 20.248099 17.02899 -1.29391743 4.668186 3.340989
##                   ACF1 Theil's U
## Training set 0.1351052        NA
## Test set     0.9314689  2.973486
accuracy(drift_ibm,ibm_test)
##                        ME      RMSE       MAE         MPE     MAPE
## Training set 2.870480e-14  7.297409  5.127996 -0.02530123 1.121650
## Test set     6.108138e+00 17.066963 13.974747  1.41920066 3.707888
##                  MASE      ACF1 Theil's U
## Training set 1.006083 0.1351052        NA
## Test set     2.741765 0.9045875  2.361092
accuracy(snaive_ibm,ibm_test)
##                      ME      RMSE      MAE         MPE     MAPE     MASE
## Training set -0.2809365  7.302815  5.09699 -0.08262872 1.115844 1.000000
## Test set     -3.7246377 20.248099 17.02899 -1.29391743 4.668186 3.340989
##                   ACF1 Theil's U
## Training set 0.1351052        NA
## Test set     0.9314689  2.973486

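The drift method did best: on the test set it has the lowest RMSE (17.07), MAE (13.97), MAPE (3.71) and MASE (2.74). The naive and seasonal naive results are identical because ibmclose has no seasonal period.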
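
The MASE column is easier to read once the scaling is written out. The sketch below is a base R illustration under the assumption that MASE is the test-set MAE divided by the in-sample MAE of the naive method (or the seasonal naive method when the period `m` is greater than 1). The function `mase()` and the toy data are hypothetical and are not part of the forecast package.

```r
# MASE: MAE of the forecasts divided by the in-sample MAE of the
# (seasonal) naive method. m is the seasonal period; m = 1 gives the
# ordinary naive scaling.
mase <- function(train, actual, fc, m = 1) {
  scale <- mean(abs(diff(train, lag = m)))
  mean(abs(actual - fc)) / scale
}

# Toy data, made up for illustration.
train  <- c(10, 12, 11, 15, 14)
actual <- c(16, 13)
fc     <- rep(14, 2)  # naive forecast: last training value repeated

mase_value <- mase(train, actual, fc)
print(mase_value)
```

Under this definition the naive method scores exactly 1 on its own training set, which matches the training-set MASE of 1.000000 reported for naive_ibm above. A value above 1 means the forecasts are worse than in-sample one-step naive forecasts.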

Check the residuals of your preferred method. Do they resemble white noise?

checkresiduals(mean_ibm)

## 
##  Ljung-Box test
## 
## data:  Residuals from Mean
## Q* = 2697.2, df = 9, p-value < 2.2e-16
## 
## Model df: 1.   Total lags used: 10
checkresiduals(naive_ibm)

## 
##  Ljung-Box test
## 
## data:  Residuals from Naive method
## Q* = 22.555, df = 10, p-value = 0.01251
## 
## Model df: 0.   Total lags used: 10
checkresiduals(drift_ibm)

## 
##  Ljung-Box test
## 
## data:  Residuals from Random walk with drift
## Q* = 22.555, df = 9, p-value = 0.007278
## 
## Model df: 1.   Total lags used: 10
checkresiduals(snaive_ibm)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 22.555, df = 10, p-value = 0.01251
## 
## Model df: 0.   Total lags used: 10

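The preferred method is drift, so its residuals are the ones to check. The Ljung-Box test rejects the white-noise hypothesis (Q* = 22.555, df = 9, p-value = 0.007278), so the residuals do not resemble white noise. The naive and seasonal naive residuals fail the same test at the 5% level (p-value = 0.01251), and the residuals of the mean method are far more strongly autocorrelated (Q* = 2697.2).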
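
The Ljung-Box test that checkresiduals() reports can also be run directly with Box.test() from the stats package. The sketch below uses simulated series rather than the IBM residuals, so the numbers it prints are not comparable with the output above. It only shows how the test separates an uncorrelated series from a strongly autocorrelated one.

```r
set.seed(123)

# Simulated data: white noise, and a random walk built from it.
white_noise <- rnorm(200)
random_walk <- cumsum(white_noise)

# Ljung-Box test on the first 10 autocorrelations of each series.
lb_wn <- Box.test(white_noise, lag = 10, type = "Ljung-Box")
lb_rw <- Box.test(random_walk, lag = 10, type = "Ljung-Box")

# A small p-value is evidence against the white-noise hypothesis.
print(lb_wn$p.value)
print(lb_rw$p.value)
```

Box.test() also accepts a `fitdf` argument for the number of estimated parameters, which corresponds to the "Model df" line in the checkresiduals() output.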

4. Repeat the exercise for the data set hsales. (Split the data set into a training set and a test set, where the test set is the last two years of data.)

library(fma)
autoplot(hsales)

hsales_training <- subset(hsales, end = length(hsales) - 24)
hsales_test <- subset(hsales, start = length(hsales) - 23)
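
The same kind of split can be sketched in base R with window(), which keeps the time index of each piece. The monthly series `x` below is a made-up stand-in for hsales, and the names `train` and `test` are only illustrative.

```r
# Toy monthly series: 5 years of data, made up for illustration.
x <- ts(1:60, start = c(2000, 1), frequency = 12)
n <- length(x)

# Hold out the last 24 observations (two years of monthly data).
train <- window(x, end = time(x)[n - 24])
test  <- window(x, start = time(x)[n - 23])

print(length(train))
print(length(test))
print(start(test))
```

Indexing by position through time(x) avoids hard-coding calendar dates, so the same two lines work for a series of any length.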

mean_hsales <- meanf(hsales_training, h = 24)
naive_hsales <- naive(hsales_training, h = 24)
drift_hsales <- rwf(hsales_training, drift = TRUE, h = 24)
snaive_hsales <- snaive(hsales_training, h = 24)

autoplot(mean_hsales) +
  autolayer(hsales_test)

autoplot(naive_hsales) +
  autolayer(hsales_test)

autoplot(drift_hsales) +
  autolayer(hsales_test)

autoplot(snaive_hsales) +
  autolayer(hsales_test)

# Check accuracy on the hsales training and test sets
accuracy(mean_hsales,hsales_test)
##                        ME      RMSE      MAE       MPE     MAPE      MASE
## Training set 3.510503e-15 12.162811 9.532738 -6.144876 20.38306 1.1234341
## Test set     3.839475e+00  9.022555 7.561587  4.779122 13.26183 0.8911338
##                   ACF1 Theil's U
## Training set 0.8661998        NA
## Test set     0.5377994  1.131713
accuracy(naive_hsales,hsales_test)
##                     ME     RMSE      MAE       MPE      MAPE      MASE
## Training set -0.008000 6.301111 5.000000 -0.767457  9.903991 0.5892505
## Test set      2.791667 8.628924 7.208333  2.858639 12.849194 0.8495028
##                   ACF1 Theil's U
## Training set 0.1824472        NA
## Test set     0.5377994  1.098358
accuracy(drift_hsales,hsales_test)
##                        ME     RMSE      MAE        MPE      MAPE      MASE
## Training set 1.506410e-15 6.301106 4.999872 -0.7511048  9.903063 0.5892354
## Test set     2.891667e+00 8.658795 7.249000  3.0426108 12.901697 0.8542954
##                   ACF1 Theil's U
## Training set 0.1824472        NA
## Test set     0.5378711  1.100276
accuracy(snaive_hsales,hsales_test)
##                     ME      RMSE      MAE       MPE      MAPE      MASE
## Training set 0.1004184 10.582214 8.485356 -2.184269 17.633696 1.0000000
## Test set     1.0416667  5.905506 4.791667  0.972025  8.545729 0.5646984
##                   ACF1 Theil's U
## Training set 0.8369786        NA
## Test set     0.1687797 0.7091534

The seasonal naive method did best: it has the lowest test-set RMSE (5.91), MAE (4.79), MAPE (8.55) and MASE (0.56).

# Check residuals
checkresiduals(mean_hsales)

## 
##  Ljung-Box test
## 
## data:  Residuals from Mean
## Q* = 887.75, df = 23, p-value < 2.2e-16
## 
## Model df: 1.   Total lags used: 24
checkresiduals(naive_hsales)

## 
##  Ljung-Box test
## 
## data:  Residuals from Naive method
## Q* = 322.61, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24
checkresiduals(drift_hsales)

## 
##  Ljung-Box test
## 
## data:  Residuals from Random walk with drift
## Q* = 322.61, df = 23, p-value < 2.2e-16
## 
## Model df: 1.   Total lags used: 24
checkresiduals(snaive_hsales)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 682.2, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

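None of the residuals resemble white noise: the Ljung-Box p-value is below 2.2e-16 for all four methods, including the preferred seasonal naive method.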

5. Calculate the residuals from a seasonal naïve forecast applied to the WWWusage and bricksq data.

# seasonal naive method of WWWusage
snaive_WWW <- snaive(WWWusage)
autoplot(snaive_WWW)

checkresiduals(snaive_WWW)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 145.58, df = 10, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 10
# seasonal naive method of bricksq
snaive_bricksq <- snaive(bricksq)
autoplot(snaive_bricksq)

checkresiduals(snaive_bricksq)

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 233.2, df = 8, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 8

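Neither set of residuals resembles white noise: the Ljung-Box test gives a p-value below 2.2e-16 for both WWWusage and bricksq. The residuals are also not normally distributed. Because WWWusage has no seasonal period (its frequency is 1), snaive() is equivalent to naive() for that series.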
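
Seasonal naive residuals are just the differences between each observation and the one a full season earlier, so they can be computed by hand. The sketch below does this in base R for WWWusage, which ships with R in the datasets package. It is an illustration of the calculation, not a replacement for checkresiduals(), and the names `m`, `res` and `lb` are only illustrative.

```r
# Seasonal period of the series; WWWusage is not seasonal, so m is 1.
m <- frequency(WWWusage)

# Seasonal naive residuals: y[t] - y[t - m].
# With m = 1 these are the ordinary naive residuals (first differences).
res <- diff(WWWusage, lag = m)

# Ljung-Box test on the first 10 autocorrelations of the residuals.
lb <- Box.test(res, lag = 10, type = "Ljung-Box")

print(m)
print(lb$p.value)
```

For a quarterly series such as bricksq the same code would use a lag of 4, which is what makes the method seasonal.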

6. For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect. dole, usgdp, bricksq, enplanements.

# data set dole
autoplot(dole)

lambda_dole <- BoxCox.lambda(dole)
autoplot(BoxCox(dole, lambda_dole))

# data set usgdp
library(fpp2)
## Loading required package: ggplot2
## Loading required package: expsmooth
autoplot(usgdp)

lambda_usgdp <- BoxCox.lambda(usgdp)
autoplot(BoxCox(usgdp, lambda_usgdp))

# data set bricksq
autoplot(bricksq)

lambda_bricksq <- BoxCox.lambda(bricksq)
autoplot(BoxCox(bricksq, lambda_bricksq))

# data set enplanements
autoplot(enplanements)

lambda_enplanements <- BoxCox.lambda(enplanements)
autoplot(BoxCox(enplanements, lambda_enplanements))
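
For reference, the transformation behind these plots can be written out in a few lines. The sketch below is a base R version of the standard Box-Cox formula for positive data. The functions `box_cox()` and `inv_box_cox()` are hypothetical helpers written for illustration, and the assumption is that they agree with the package version for positive series.

```r
# Box-Cox transformation for positive data:
# log(y) when lambda is 0, otherwise (y^lambda - 1) / lambda.
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Inverse transformation, used to return to the original scale.
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

# Toy data, made up for illustration.
y <- c(1, 4, 9, 16)
w <- box_cox(y, 0.5)  # (sqrt(y) - 1) / 0.5

print(w)
print(inv_box_cox(w, 0.5))
```

A lambda of 1 leaves the shape of the series unchanged apart from a shift, while values closer to 0 compress large values more strongly, which is why the transformation can stabilise seasonal variation that grows with the level of the series.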
