Question 1

Are the following statements true or false? Explain your answer.

Question 2

Use the Dow Jones index (data set dowjones) to do the following:

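# Packages assumed by the code in this document
library(fpp2)       # attaches forecast, ggplot2 and the example data sets
library(gridExtra)  # grid.arrange()

autoplot(dowjones) + ggtitle("Dow-Jones Index") + xlab("Day") + ylab("Dollars")

# Forecast the Dow-Jones index for the next 20 days using the drift method
autoplot(dowjones) + autolayer(rwf(dowjones, h = 20, drift = TRUE), PI = TRUE) + ggtitle("Dow-Jones Index Forecasting") + xlab("Day") + ylab("Dollars")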

autoplot(dowjones) + 
  autolayer(meanf(dowjones, h = 20), series = "Mean", PI = FALSE) +
  autolayer(rwf(dowjones, h = 20), series = "Naïve", PI = FALSE) +
  autolayer(rwf(dowjones, h = 20, drift = TRUE), series = "Drift", PI = FALSE) +
  ggtitle("Dow-Jones Index Forecasting") + xlab("Day") + ylab("Dollars") + guides(colour = guide_legend(title = "Forecast"))

The naïve method is best for these data.
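To make the three benchmarks concrete, here is a minimal base-R sketch of the point forecasts that meanf(), rwf() and rwf(drift = TRUE) produce. The toy series `y` and the helper name `benchmark_forecasts` are illustrative assumptions, not part of the exercise.

```r
# Point forecasts for the three benchmark methods, computed by hand.
# `y` is a made-up toy series; `benchmark_forecasts` is a hypothetical helper.
benchmark_forecasts <- function(y, h) {
  n <- length(y)
  slope <- (y[n] - y[1]) / (n - 1)    # average change per period
  list(
    mean  = rep(mean(y), h),           # mean method: historical average
    naive = rep(y[n], h),              # naïve method: last observation
    drift = y[n] + slope * seq_len(h)  # drift method: last value plus average change
  )
}

y <- c(100, 102, 101, 105, 108)
fc <- benchmark_forecasts(y, h = 3)
print(fc)
```

The drift forecast is the line through the first and last observations, extended h steps ahead, which is why it differs from the naïve forecast only when the series has moved overall.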

Question 3

Consider the daily closing IBM stock prices (data set ibmclose).

p1 <- autoplot(ibmclose, main = NULL) + geom_smooth() + xlab("Day") + ylab("Dollars")
p2 <- ggAcf(ibmclose, main = NULL)
grid.arrange(p1, p2, ncol = 2, top = "Closing IBM Stock Price")

The series trends in both directions: it rises in some stretches and falls in others. There is no seasonal or cyclic pattern.

ibmclose_test <- tail(ibmclose, 69)
ibmclose_train <- head(ibmclose, 300)
ibmfit1 <- meanf(ibmclose_train, h = 69)
ibmfit2 <- rwf(ibmclose_train, h = 69)
ibmfit3 <- rwf(ibmclose_train, h = 69, drift = TRUE)
ibmfit4 <- snaive(ibmclose_train, h = 69)

# ibmclose is non-seasonal (frequency 1), so the seasonal naïve forecasts coincide
# with the naïve forecasts; the naïve layer is therefore commented out.
autoplot(ibmclose) + 
  autolayer(ibmfit1, series = "Mean", PI = FALSE) +
  #autolayer(ibmfit2, series = "Naïve", PI = FALSE) +
  autolayer(ibmfit3, series = "Drift", PI = FALSE) +
  autolayer(ibmfit4, series = "Seasonal Naïve", PI = FALSE) +
  ggtitle("Closing IBM Stock Price Forecasting") + xlab("Day") + ylab("Dollars") + guides(colour = guide_legend(title = "Forecast"))

According to the time plot above, the drift method is the best: its forecasts are closest to the actual values, and it captures the downward trend.
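A visual comparison can be backed by test-set error measures. The sketch below assumes the fpp2 package is installed and uses accuracy() from the forecast package to score each benchmark against the 69 held-out observations. It refits the models so that it runs on its own; the object names `train`, `test`, `fits` and `scores` are illustrative.

```r
library(fpp2)  # assumed to be installed; provides forecast and ibmclose

# Same split as above: first 300 observations for training, last 69 for testing
train <- subset(ibmclose, end = 300)
test  <- subset(ibmclose, start = 301)

fits <- list(
  Mean  = meanf(train, h = 69),
  Naive = rwf(train, h = 69),
  Drift = rwf(train, h = 69, drift = TRUE)
)

# Keep only the test-set row of each accuracy table
scores <- t(sapply(fits, function(f) {
  accuracy(f, test)["Test set", c("RMSE", "MAE", "MAPE")]
}))
print(round(scores, 2))
```

The method with the smallest test-set errors is the one the held-out data favour.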

checkresiduals(rwf(ibmclose, drift = TRUE))

## 
##  Ljung-Box test
## 
## data:  Residuals from Random walk with drift
## Q* = 14.064, df = 9, p-value = 0.12
## 
## Model df: 1.   Total lags used: 10

The residual plots reveal the following features:

The Ljung-Box test is not significant (p-value = 0.12, above 0.05). Therefore, we cannot reject the hypothesis that the residuals are white noise.
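For reference, the statistic reported by checkresiduals() is the Ljung-Box statistic. In the standard textbook notation (the symbols below are not used elsewhere in this document), with T observations, h lags, K estimated parameters and r_k the lag-k autocorrelation of the residuals:

```latex
Q^* = T(T+2) \sum_{k=1}^{h} \frac{r_k^2}{T-k},
\qquad
Q^* \sim \chi^2_{h-K} \quad \text{under the white-noise hypothesis.}
```

In the output above, h = 10 and K = 1, which gives the 9 degrees of freedom shown.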

Question 4

Repeat the exercise for the data set hsales. (Split the data set into a training set and a test set, where the test set is the last two years of data.)

autoplot(hsales) + ggtitle("Monthly Sales of One-family Houses, USA") + xlab("Month") + ylab("Number of Houses")

There is no clear trend in this time series data. However, there might be a seasonal or cyclic pattern. Thus, seasonal plots are created to verify this observation.

p1 <- ggsubseriesplot(hsales, year.labels = TRUE, year.labels.left = TRUE, main = NULL) + xlab("Month") + ylab("Number of Houses")
p2 <- ggAcf(hsales, main = NULL)
grid.arrange(p1, p2, ncol = 2, top = "Monthly Sales of One-family Houses, USA")

According to the subseries plot, there is a seasonal pattern: the number of houses sold increases from January to March and then decreases until the end of the year. The correlogram confirms that the data are seasonal.

hsales_test <- window(hsales, start = 1994)
hsales_train <- window(hsales, end = c(1993, 12))
hsalesfit1 <- meanf(hsales_train, h = 12*2)
hsalesfit2 <- rwf(hsales_train, h = 12*2)
hsalesfit3 <- rwf(hsales_train, h = 12*2, drift = TRUE)
hsalesfit4 <- snaive(hsales_train, h = 12*2)

autoplot(hsales) + 
  autolayer(hsalesfit1, series = "Mean", PI = FALSE) +
  autolayer(hsalesfit2, series = "Naïve", PI = FALSE) +
  autolayer(hsalesfit3, series = "Drift", PI = FALSE) +
  autolayer(hsalesfit4, series = "Seasonal Naïve", PI = FALSE) +
  ggtitle("Monthly Sales of One-family Houses Forecasting") + xlab("Month") + ylab("Number of Houses") + guides(colour = guide_legend(title = "Forecast"))

According to the time plot above, the seasonal naïve method is the best: its forecasts are closest to the actual values, and it matches both the trend and the seasonality.

checkresiduals(snaive(hsales))

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 700.44, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

The residual plots reveal the following features:

The Ljung-Box test is significant (p-value < 2.2e-16, well below 0.05). Therefore, we conclude that the residuals still contain autocorrelation: the forecasting model leaves some information unexplained.
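The same test can be run directly with Box.test() from base R, which makes the lag and degrees-of-freedom choices explicit. This sketch assumes fpp2 is installed and mirrors the settings in the output above (24 lags, no fitted parameters); the leading NA residuals are dropped before testing.

```r
library(fpp2)  # assumed to be installed; provides snaive() and hsales

# Residuals of the seasonal naïve method; the first 12 values are NA
# because there is no observation one year earlier to forecast from.
res <- na.omit(residuals(snaive(hsales)))

# Ljung-Box test with 24 lags and no degrees-of-freedom adjustment
lb <- Box.test(res, lag = 24, fitdf = 0, type = "Ljung-Box")
print(lb)
```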

Question 5

Calculate the residuals from a seasonal naïve forecast applied to the WWWusage and bricksq data. Test if the residuals are white noise and normally distributed. What do you conclude?

WWWusage_res <- residuals(snaive(WWWusage))
checkresiduals(snaive(WWWusage))

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 145.58, df = 10, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 10
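The question also asks whether the residuals are normally distributed, which checkresiduals() only addresses visually through the histogram. One option, sketched here in base R, is a Shapiro-Wilk test. Because WWWusage has frequency 1, the seasonal naïve residuals are simply the first differences, so no extra packages are needed. The choice of Shapiro-Wilk is an assumption of this sketch, not something the exercise prescribes.

```r
# WWWusage ships with base R (datasets package) and has frequency 1,
# so seasonal naïve residuals equal the first differences of the series.
res <- diff(WWWusage)

# White-noise check: Ljung-Box test with 10 lags, as in the output above
lb <- Box.test(res, lag = 10, type = "Ljung-Box")

# Normality check: Shapiro-Wilk test (null hypothesis: residuals are normal)
sw <- shapiro.test(res)

print(lb)
print(sw)
```

A small p-value in either test is evidence against the corresponding hypothesis (white noise or normality).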

The residual plots reveal the following features:

res <- residuals(snaive(bricksq))
checkresiduals(snaive(bricksq))

## 
##  Ljung-Box test
## 
## data:  Residuals from Seasonal naive method
## Q* = 233.2, df = 8, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 8

The residual plots reveal the following features:

Question 6

For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect. dole, usgdp, bricksq, enplanements.

If necessary, find an appropriate Box-Cox transformation in order to stabilize the variance.

  1. dole Data
    Data Description: Monthly total of people on unemployment benefits in Australia, from Jan 1956 to Jul 1992.
lambda <- BoxCox.lambda(dole)
print(lambda)
## [1] 0.3290922
df <- cbind(Raw = dole, BoxCox = BoxCox(dole, lambda))

autoplot(df, facets = TRUE) + ggtitle("People on Unemployment Benefits in Australia (Jan 1956 - Jul 1992)") + ylab("Number of People") + xlab("Month")

According to the time plot, the original data has a clear upward trend; however, no seasonal pattern is observed. The variation increases with the level of the series, so a transformation can be helpful.

Comparing the time plots before and after the Box-Cox transformation, the transformation helps stabilize the variation across the whole series, while the upward trend is retained. The optimal value of lambda, 0.329, is determined by the BoxCox.lambda() function.

  2. usgdp Data
    Data Description: Quarterly US GDP from Q1 1947 to Q1 2006.
autoplot(usgdp) + ylab("Billions") + xlab("Quarter") + ggtitle("Quarterly US GDP (1947 Q1 - 2006 Q1)")

According to the time plot, the usgdp data has a clear increasing trend but no seasonality. The variation stays small and roughly constant over time, so a transformation is not needed.

  3. bricksq Data
    Data Description: Australian quarterly clay brick production from 1956 to 1994.
lambda <- BoxCox.lambda(bricksq)
print(lambda)
## [1] 0.2548929
df <- cbind(Raw = bricksq, BoxCox = BoxCox(bricksq, lambda))
autoplot(df, facets = TRUE) + ggtitle("Australian Quarterly Clay Brick Production (1956 - 1994)") + ylab("Production") + xlab("Quarter")

According to the time plot, the bricksq data has an increasing trend over time and a strong seasonal pattern. The variation increases with the level of the series, so a transformation can be helpful.

Comparing the time plots before and after the Box-Cox transformation, the transformation helps stabilize the seasonal variation across the whole series. The optimal value of lambda, 0.255, is determined by the BoxCox.lambda() function.

  4. enplanements Data
    Data Description: US Domestic Monthly Revenue Enplanements (millions), from 1996 to 2000.
lambda <- BoxCox.lambda(enplanements)
print(lambda)
## [1] -0.2269461
df <- cbind(Raw = enplanements, BoxCox = BoxCox(enplanements, lambda))
autoplot(df, facets = TRUE) + ggtitle("US Domestic Monthly Revenue Enplanements (1996 - 2000)")+ ylab("Millions") + xlab("Month")

According to the time plot, the enplanements data has a clear increasing trend and a strong seasonal pattern. The variation increases with the level of the series, so a transformation can be helpful.

Comparing the time plots before and after the Box-Cox transformation, the transformation helps stabilize the seasonal variation across the whole series. The optimal value of lambda, -0.227, is determined by the BoxCox.lambda() function.
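For reference, the Box-Cox family used throughout this question can be written out in a few lines of base R. The helper names `box_cox` and `inv_box_cox` are hypothetical and implement the standard textbook definition for positive data. They are a sketch of the idea behind BoxCox(), not a replacement for it.

```r
# Standard Box-Cox transformation for positive data:
#   log(y)                   if lambda == 0
#   (y^lambda - 1) / lambda  otherwise
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Inverse transformation, used to return values to the original scale
inv_box_cox <- function(w, lambda) {
  if (abs(lambda) < 1e-8) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(10, 50, 200, 1000)        # made-up positive values
w <- box_cox(y, lambda = 0.25)   # compresses the larger values
print(w)
print(inv_box_cox(w, lambda = 0.25))
```

Values of lambda below 1 shrink large observations more than small ones, which is why the transformation evens out variation that grows with the level of the series.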