Homework 3

Question 1

install.packages("ggplot2")
install.packages("forecast")

tsprice <- ts(rnorm(100, mean = 50, sd = 10), start = c(2020, 1), frequency = 12)
library(ggplot2)

## Warning: package 'ggplot2' was built under R version 4.4.3

library(forecast)

## Warning: package 'forecast' was built under R version 4.4.3

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

autoplot(tsprice)

## The data appears to be seasonal.

acf(tsprice)

# The data appears to be trended and seasonal. For white noise, about 95% of the spikes in the ACF should lie within ±2/√T, where T is the length of the series. More than 5% of the spikes lie outside these bounds, so this is not a white noise series.
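The white-noise bound mentioned above can be checked numerically instead of by eye. The sketch below is only an illustration: it uses a freshly simulated series `x` with a fixed seed (an assumption, not the series plotted above), and the names `bound`, `r`, and `share_outside` are hypothetical.

```r
# Illustrative only: `x` is a newly simulated series, not the homework data.
set.seed(1)
x <- ts(rnorm(100, mean = 50, sd = 10), start = c(2020, 1), frequency = 12)

n_obs <- length(x)
bound <- 2 / sqrt(n_obs)   # approximate 95% limit for white noise

# Sample autocorrelations; drop the first entry, which is lag 0 (always 1).
r <- acf(x, plot = FALSE)$acf[-1]

# Fraction of lags whose autocorrelation falls outside +/- bound.
share_outside <- mean(abs(r) > bound)

cat("Bound: +/-", round(bound, 3), "\n")
cat("Share of spikes outside the bound:", round(share_outside, 3), "\n")
```

If the share is well above 0.05, or one or more spikes are very large, the series is probably not white noise. Note that `acf()` draws its dashed lines at roughly ±1.96/√T, so the plotted limits are slightly tighter than ±2/√T.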

train.price <- window(tsprice, end = c(2026, 8))
test.price <- window(tsprice, start = c(2026, 9))
autoplot(tsprice) + autolayer(train.price, series = "Train") + autolayer(test.price, series = "Test")

library(knitr)
model <- auto.arima(train.price)
forecast_values <- forecast(model, h = length(test.price))
accuracy_results <- accuracy(forecast_values, test.price)
kable(accuracy_results, format.args = list(big.mark = ","), digits = 2)

	ME	RMSE	MAE	MPE	MAPE	MASE	ACF1	Theil’s U
Training set	-0.13	9.35	7.19	-4.10	15.50	0.52	-0.04	NA
Test set	1.82	9.95	7.65	-0.27	16.12	0.55	-0.28	0.61

# Mean Absolute Error (MAE) is the average of the absolute errors between the forecasted values and the actual values. The Naive method has the smallest value: its forecast is off from the actual price by 53 cents on average.

# Root Mean Squared Error (RMSE) is the square root of the average squared difference between the actual values and the forecasted values. The Naive method again has the smallest value, 53 cents.

# Mean Absolute Percentage Error (MAPE) is the average of the absolute percentage differences between the actual values and the forecasted values. The Naive method again has the smallest value, which means its forecasts are 4.46% off from the observed values on average.

# Mean Absolute Scaled Error (MASE) is a scaled version of MAE that compares the forecast error to the error obtained from a naive forecast. The Naive method is again the lowest: its forecast error is 38% of the naive benchmark's error.
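The four measures described above can be computed by hand to see what each one does. This is a minimal sketch with small made-up vectors (`train`, `actual`, and `fc` are hypothetical and are not the homework data). The MASE scaling shown here uses one-step naive errors on the training data; for seasonal series, `accuracy()` in the forecast package is understood to scale by seasonal naive errors instead, so its numbers can differ.

```r
# Hypothetical numbers chosen for illustration; not the homework data.
train  <- c(10, 12, 11, 13, 12)
actual <- c(14, 13, 15)
fc     <- c(12, 12, 12)   # flat naive forecast: last training value repeated

err <- actual - fc        # forecast errors: 2, 1, 3

mae  <- mean(abs(err))                  # average absolute error
rmse <- sqrt(mean(err^2))               # square root of the mean squared error
mape <- mean(abs(err / actual)) * 100   # average absolute percentage error

# MASE: MAE divided by the in-sample MAE of one-step naive forecasts.
naive_mae <- mean(abs(diff(train)))     # absolute changes 2, 1, 2, 1
mase <- mae / naive_mae

print(round(c(MAE = mae, RMSE = rmse, MAPE = mape, MASE = mase), 3))
```

A MASE below 1 means the forecast beats the in-sample naive benchmark on average, and a value above 1 means it does worse.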

library(forecast)
if (!exists("train.price")) {
  stop("Error: train.price does not exist! Please define it first.")
}
arima_model <- auto.arima(train.price)
exists("arima_model")  # Should return TRUE

## [1] TRUE

checkresiduals(arima_model)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,0)(1,0,0)[12] with non-zero mean
## Q* = 14.412, df = 14, p-value = 0.4195
## 
## Model df: 2.   Total lags used: 16

tsprice <- ts(rnorm(100, mean = 50, sd = 10), start = c(2020, 1), frequency = 12)
train_length <- round(length(tsprice) * 0.8)
# Count periods from the series start (2020, 1) so the split falls between
# observations train_length and train_length + 1 (an 80/20 split).
train.price <- window(tsprice, end = c(2020, train_length))
test.price <- window(tsprice, start = c(2020, train_length + 1))

naive_model <- naive(train.price, h = length(test.price))
checkresiduals(naive_model)

## 
##  Ljung-Box test
## 
## data:  Residuals from Naive method
## Q* = 44.366, df = 20, p-value = 0.001344
## 
## Model df: 0.   Total lags used: 20

#12. Yes, the ACF shows that there is a pattern in the residuals. They are not white noise: the Ljung-Box p-value (0.001344) is very low and spikes fall beyond the dotted blue lines in the ACF. Yes, the histogram shows that the residuals roughly follow a normal curve. One may point to the length of the left tail, but I do not think that is enough to rule out a normal pattern. The null hypothesis of the Ljung-Box test is that the autocorrelations come from white noise. We reject the null because the p-value is very low, which suggests that the autocorrelations do not come from white noise.
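The Ljung-Box decision rule described above can also be tried directly with `Box.test()` from base R. The sketch below is an illustration on simulated series, not on the homework residuals: `white` and `ar1` are hypothetical names, and the lag of 10 is an arbitrary choice. When testing residuals from a fitted model, the `fitdf` argument can be set to the number of estimated parameters.

```r
# Illustrative only: simulated series, not the homework residuals.
set.seed(42)
white <- rnorm(200)                                            # no autocorrelation
ar1   <- as.numeric(arima.sim(model = list(ar = 0.8), n = 200)) # strong autocorrelation

# Null hypothesis: the first `lag` autocorrelations are jointly zero (white noise).
lb_white <- Box.test(white, lag = 10, type = "Ljung-Box")
lb_ar1   <- Box.test(ar1, lag = 10, type = "Ljung-Box")

cat("White noise p-value:", format.pval(lb_white$p.value), "\n")
cat("AR(1) p-value:      ", format.pval(lb_ar1$p.value), "\n")

# A p-value below 0.05 leads to rejecting the white-noise null.
reject_ar1 <- lb_ar1$p.value < 0.05
```

A small p-value, as for the strongly autocorrelated series here, is the same signal that `checkresiduals()` reports for the naive residuals.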


2025-03-03
