#install.packages('fpp2')
library(fpp2)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## ── Attaching packages ────────────────────────────────────────────── fpp2 2.5 ──
## ✔ ggplot2 3.4.4 ✔ fma 2.5
## ✔ forecast 8.21.1 ✔ expsmooth 2.3
##
library(forecast)
1) For each of the following series, find an appropriate Box-Cox transformation in order to stabilise the variance: usnetelec, usgdp, mcopper, enplanements.
lambda <- BoxCox.lambda(usnetelec)
#> [1] 0.5167714
autoplot(BoxCox(usnetelec,lambda))
lambda <- BoxCox.lambda(usgdp)
#> [1] 0.366352
autoplot(BoxCox(usgdp,lambda))
(lambda <- BoxCox.lambda(mcopper))
autoplot(BoxCox(mcopper,lambda))
(lambda <- BoxCox.lambda(enplanements))
autoplot(BoxCox(enplanements,lambda))
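For reference, the transformation applied by BoxCox() above is the standard Box-Cox used in fpp2: a log when lambda = 0 and a power transformation otherwise. A minimal hand-rolled sketch, purely for illustration (box_cox is a made-up name; the package's BoxCox() is what the code above actually uses):
box_cox <- function(y, lambda) {
  # w_t = log(y_t) when lambda = 0, (y_t^lambda - 1)/lambda otherwise
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
# e.g. box_cox(usnetelec, BoxCox.lambda(usnetelec)) should closely match
# BoxCox(usnetelec, BoxCox.lambda(usnetelec)) for this positive-valued series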
2) Why is a Box-Cox transformation unhelpful for the cangas data?
(lambda <- BoxCox.lambda(cangas))
## [1] 0.5767759
autoplot(cangas)
autoplot(BoxCox(cangas,lambda))
# After applying the Box-Cox transformation the plot looks much the same. The seasonal variation in cangas increases and then decreases over the series, so no single value of lambda can make it uniform, which is why the transformation is unhelpful here.
3) What Box-Cox transformation would you select for your retail data?
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349337W"],
frequency=12, start=c(1982,4))
(lambda <- BoxCox.lambda(myts))
## [1] 0.9165544
autoplot(BoxCox(myts,lambda))
autoplot(myts)
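# lambda is close to 1 here, so the Box-Cox transformation changes the shape of this retail series very little.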
4) For each of the following series, make a graph of the data. If transforming seems appropriate, do so and describe the effect: dole, usdeaths, bricksq.
autoplot(dole)
autoplot(usdeaths)
autoplot(bricksq)
(lambda <- BoxCox.lambda(dole))
## [1] 0.3290922
autoplot(BoxCox(dole,lambda))
(lambda <- BoxCox.lambda(usdeaths))
## [1] -0.03363775
autoplot(BoxCox(usdeaths,lambda))
# usdeaths did not need a transformation; the seasonal variation is already fairly stable.
(lambda <- BoxCox.lambda(bricksq))
## [1] 0.2548929
autoplot(BoxCox(bricksq,lambda))
# bricksq did not appear to need a transformation either.
#Calculate the residuals from a seasonal naïve forecast applied to the quarterly Australian beer production data from 1992. The following code will help.
beer <- window(ausbeer, start=1992)
fc <- snaive(beer)
autoplot(fc)
res <- residuals(fc)
autoplot(res)
#Test if the residuals are white noise and normally distributed.
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 32.269, df = 8, p-value = 8.336e-05
##
## Model df: 0. Total lags used: 8
# What do you conclude?
# The residuals are not white noise: the Ljung-Box p-value (8.3e-05) indicates significant autocorrelation, and the residual histogram does not look normally distributed.
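The same Ljung-Box test can be run directly with Box.test(); a sketch of what checkresiduals() is doing here, dropping the leading NA residuals that snaive produces and using fitdf = 0 because the seasonal naive method estimates no parameters:
Box.test(na.omit(res), lag = 8, fitdf = 0, type = "Ljung-Box")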
Good forecast methods should have normally distributed residuals. False. Good forecast methods do not necessarily require normally distributed residuals. Normality of the residuals (the differences between the observed and fitted values) makes the standard prediction-interval calculations convenient, but it is not a strict requirement for a forecast method to be effective.
A model with small residuals will give good forecasts. False. Small residuals show that the model fits the training data well, but that is no guarantee of good forecasts; an over-fitted model can have tiny residuals and still forecast poorly (see the sketch below).
The best measure of forecast accuracy is MAPE. False. MAPE is widely used, but it is undefined when the data contain zeros (division by zero) and is distorted by values close to zero or by extreme observations, so it is not always the best measure.
If your model doesn't forecast well, you should make it more complicated. False. Adding complexity often leads to overfitting, where the model learns the noise in the training data rather than the underlying pattern, and out-of-sample forecasts can get worse rather than better.
Always choose the model with the best forecast accuracy as measured on the test set. False. Test-set accuracy matters, but so do model interpretability, computational cost, and the specific requirements of the forecasting task; a single test set may also be too small to discriminate reliably between models.
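Two of these points can be made concrete with a short sketch. The first part uses invented numbers to show how MAPE breaks down when the data contain zeros; the second re-uses the beer data from above to show that the training-set and test-set rows of accuracy() measure different things and need not agree.
# MAPE with zeros (hypothetical values, for illustration only)
actual <- c(100, 50, 0.5, 0)
pred   <- c(110, 45, 1.0, 2)
100 * abs(actual - pred) / abs(actual)   # last element is Inf: division by zero
# In-sample fit vs out-of-sample accuracy: compare the two rows of accuracy()
beer.train <- window(ausbeer, start = 1992, end = c(2007, 4))
beer.test  <- window(ausbeer, start = 2008)
accuracy(snaive(beer.train, h = length(beer.test)), beer.test)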
# Split the data into two parts.
myts.train <- window(myts, end=c(2010,12))
myts.test <- window(myts, start=2011)
# Check that the data have been split appropriately by producing the following plot.
autoplot(myts) +
  autolayer(myts.train, series="Training") +
  autolayer(myts.test, series="Test")
# Calculate forecasts using snaive applied to myts.train.
fc <- snaive(myts.train)
# Compare the accuracy of the forecasts against the actual values stored in myts.test.
accuracy(fc,myts.test)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 9.460661 26.30758 21.23363 4.655690 12.762886 1.0000000 0.8070166
## Test set 17.212500 21.26067 17.39583 4.748234 4.807728 0.8192584 0.4843871
## Theil's U
## Training set NA
## Test set 0.6934111
# Check the residuals.
checkresiduals(fc)
##
## Ljung-Box test
##
## data: Residuals from Seasonal naive method
## Q* = 856.11, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
Do the residuals appear to be uncorrelated and normally distributed? No. The distribution does not appear normal and there are outliers, and the Ljung-Box p-value is extremely small (< 2.2e-16), which is strong evidence of autocorrelation in the residuals.
How sensitive are the accuracy measures to the training/test split? Fairly sensitive: moving the split point changes the test-set error measures noticeably. A quick check is sketched below.
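A sketch of that check, assuming myts is the retail series defined earlier: move the split point back a year and compare the resulting test-set row with the one reported above. (tsCV() in the forecast package generalises this idea to a rolling forecast origin.)
myts.train2 <- window(myts, end = c(2009, 12))
myts.test2  <- window(myts, start = 2010)
fc.alt <- snaive(myts.train2, h = length(myts.test2))
accuracy(fc.alt, myts.test2)   # compare with the test-set row above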
9) visnights contains quarterly visitor nights (in millions) from 1998 to 2016 for twenty regions of Australia.
Use window() to create three training sets for visnights[,"QLDMetro"], omitting the last 1, 2 and 3 years; call these train1, train2, and train3, respectively. For example train1 <- window(visnights[, "QLDMetro"], end = c(2015, 4)).
Compute one year of forecasts for each training set using the snaive() method. Call these fc1, fc2 and fc3, respectively.
Use accuracy() to compare the MAPE over the three test sets. Comment on these.
# Create training sets using window function
train1 <- window(visnights[, "QLDMetro"], end = c(2015, 4))
train2 <- window(visnights[, "QLDMetro"], end = c(2014, 4))
train3 <- window(visnights[, "QLDMetro"], end = c(2013, 4))
# Generate one year (four quarters) of forecasts from each training set using snaive
fc1 <- snaive(train1, h = 4)
fc2 <- snaive(train2, h = 4)
fc3 <- snaive(train3, h = 4)
# Calculate accuracy measures for each forecast (no actual values supplied, so these are training-set measures)
accuracy_fc1 <- accuracy(fc1)
accuracy_fc2 <- accuracy(fc2)
accuracy_fc3 <- accuracy(fc3)
# Extract MAPE values
mape_fc1 <- accuracy_fc1[,"MAPE"]
mape_fc2 <- accuracy_fc2[,"MAPE"]
mape_fc3 <- accuracy_fc3[,"MAPE"]
# Print MAPE values
print(mape_fc1)
## [1] 7.97676
print(mape_fc2)
## [1] 8.284216
print(mape_fc3)
## [1] 8.271365
# train1 gives the lowest MAPE of the three, although the values are very close. Note that these are training-set MAPEs; the test-set comparison the exercise asks for is sketched below.
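For the comparison the exercise actually asks for, the held-out years can be passed to accuracy() so that it reports a Test set row. A sketch, using the fact from the prompt that visnights runs from 1998 to 2016 (so train1, train2 and train3 hold out 2016, 2015 and 2014 respectively):
qld <- visnights[, "QLDMetro"]
test1 <- window(qld, start = c(2016, 1))                     # year omitted from train1
test2 <- window(qld, start = c(2015, 1), end = c(2015, 4))   # year omitted from train2
test3 <- window(qld, start = c(2014, 1), end = c(2014, 4))   # year omitted from train3
accuracy(fc1, test1)["Test set", "MAPE"]
accuracy(fc2, test2)["Test set", "MAPE"]
accuracy(fc3, test3)["Test set", "MAPE"]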