This data set contains monthly Brent crude oil spot prices from 1990 to 2022. The variables are the month and the price of oil per barrel.
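library(forecast)  # meanf(), naive(), snaive(), rwf()
library(knitr)     # kable()
# Read the data and drop the first column, keeping the prices
Oil = read.csv("https://raw.githubusercontent.com/anafrance/STA-321/main/www/Time%20Series")[, -1]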
Next, the data are split into training and testing sets so that the model can be checked against actual data. The training set is the first 380 observations, and the testing set is the last 16 months.
training = Oil[1:380]
testing = Oil[381:396]
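# NOTE: monthly data would normally be declared with frequency = 12. The value 10 is
# kept here because the outputs shown in this section were produced with it; it only
# affects the seasonal naive forecast and the time labels.
Oil.ts = ts(training, frequency = 10, start = c(1990, 1))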
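# 16-step-ahead point forecasts from four benchmark methods
pred.mv = meanf(Oil.ts, h=16)$mean                # mean of all training values
pred.naive = naive(Oil.ts, h=16)$mean             # last observed value
pred.snaive = snaive(Oil.ts, h=16)$mean           # last full seasonal cycle, repeated
pred.rwf = rwf(Oil.ts, h=16, drift = TRUE)$mean   # random walk with drift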
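Because the seasonal naive method repeats the last full seasonal cycle, the frequency passed to ts() decides which past values are reused and how the forecast dates are labeled. The following is a minimal base-R sketch of that effect. The series x is a placeholder (an assumption, not the oil data); only its length of 380 matters.

```r
# Placeholder series of 380 values; only its length matters (illustrative)
x <- seq_len(380)

# Same data declared with two different seasonal frequencies
ts10 <- ts(x, frequency = 10, start = c(1990, 1))
ts12 <- ts(x, frequency = 12, start = c(1990, 1))

end(ts10)   # last observation falls in year 2027, period 10
end(ts12)   # last observation falls in year 2021, period 8 (August)
```

With frequency = 10 the first forecast is labeled period 1 of 2028, which matches the Start = c(2028, 1) printed for pred.mv further down. With frequency = 12 the first forecast would be labeled September 2021.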
###
###
pred.table = cbind( pred.mv = pred.mv,
pred.naive = pred.naive,
pred.snaive = pred.snaive,
pred.rwf = pred.rwf)
kable(pred.table, caption = "Forecasting Table")
| pred.mv | pred.naive | pred.snaive | pred.rwf |
|---|---|---|---|
| 49.20751 | 63 | 42.69 | 63.11016 |
| 49.20751 | 63 | 49.99 | 63.22031 |
| 49.20751 | 63 | 54.77 | 63.33047 |
| 49.20751 | 63 | 62.28 | 63.44062 |
| 49.20751 | 63 | 65.41 | 63.55078 |
| 49.20751 | 63 | 65.00 | 63.66093 |
| 49.20751 | 63 | 66.00 | 63.77109 |
| 49.20751 | 63 | 65.00 | 63.88125 |
| 49.20751 | 63 | 64.00 | 63.99140 |
| 49.20751 | 63 | 63.00 | 64.10156 |
| 49.20751 | 63 | 42.69 | 64.21171 |
| 49.20751 | 63 | 49.99 | 64.32187 |
| 49.20751 | 63 | 54.77 | 64.43202 |
| 49.20751 | 63 | 62.28 | 64.54218 |
| 49.20751 | 63 | 65.41 | 64.65234 |
| 49.20751 | 63 | 65.00 | 64.76249 |
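To make the four columns easier to interpret, the benchmark forecasts can be reproduced by hand in base R. This is a sketch on a small made-up series: y, the seasonal period m, and the horizon h are illustrative assumptions, not the oil data.

```r
# Illustrative series, not the oil data: 12 values with an assumed period of 4
y <- c(10, 12, 14, 11, 11, 13, 15, 12, 12, 14, 16, 13)
m <- 4            # assumed seasonal period
h <- 6            # forecast horizon
n <- length(y)

# Mean method: every forecast equals the average of all observed values
f.mean <- rep(mean(y), h)

# Naive method: every forecast equals the last observed value
f.naive <- rep(y[n], h)

# Seasonal naive: repeat the last full seasonal cycle as often as needed
f.snaive <- rep(tail(y, m), length.out = h)

# Drift: extend the straight line through the first and last observations
slope <- (y[n] - y[1]) / (n - 1)
f.drift <- y[n] + slope * seq_len(h)

cbind(f.mean, f.naive, f.snaive, f.drift)
```

The same behavior is visible in the table above: pred.mv and pred.naive are constant, pred.rwf increases by a fixed step, and pred.snaive starts repeating after 10 rows because the series was declared with frequency = 10.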
Next, the series is plotted together with the predicted values. Note that the forecasts are based on the 380 historical observations in the training set. The plot shows only observations #363 to #380 and the 16 forecasted values.
plot(363:380, Oil[363:380], type="l", xlim=c(363,396), ylim=c(10, 80),
xlab = "observation sequence",
ylab = "Crude Oil Prices",
main = "Monthly Crude Oil Prices and forecasting")
points(363:380, Oil[363:380],pch=20)
pred.mv
## Time Series:
## Start = c(2028, 1)
## End = c(2029, 6)
## Frequency = 10
## [1] 49.20751 49.20751 49.20751 49.20751 49.20751 49.20751 49.20751 49.20751
## [9] 49.20751 49.20751 49.20751 49.20751 49.20751 49.20751 49.20751 49.20751
##
points(381:396, pred.mv, pch=15, col = "red")
points(381:396, pred.naive, pch=16, col = "blue")
points(381:396, pred.rwf, pch=18, col = "navy")
points(381:396, pred.snaive, pch=17, col = "purple")
##
lines(381:396, pred.mv, lty=2, col = "red")
lines(381:396, pred.snaive, lty=2, col = "purple")
lines(381:396, pred.naive, lty=2, col = "blue")
lines(381:396, pred.rwf, lty=2, col = "navy")
##
# Legend entries follow the pch order 15:18 used in points() above
legend("bottomright", c("moving average", "naive", "seasonal naive", "drift"),
col=c("red", "blue", "purple", "navy"), pch=15:18, lty=rep(2,4),
bty="n", cex = 0.8)
The drift and naive forecasts stay close to the last observed values, whereas the moving average (computed here with meanf, the mean of all training values) averages over the entire history and therefore appears too low compared to the other methods. The naive and drift forecasts are close to each other, while the seasonal naive forecast has some lower values. It is hard to tell from the graph which method works best.
We will use the mean absolute percentage error (MAPE), together with the mean absolute deviation (MAD) and the mean squared error (MSE), to compare the performance of the four forecasting methods.
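For reference, the standard definitions of these measures can be written as follows. The notation is an assumption introduced here: y_i is the actual price in test month i, \hat{y}_i is the corresponding forecast, and n = 16 is the number of test months.

```latex
\begin{aligned}
e_i &= y_i - \hat{y}_i, \qquad i = 1, \dots, n \\
\mathrm{MAPE} &= \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \\
\mathrm{MAD} &= \frac{1}{n} \sum_{i=1}^{n} \left| e_i \right| \\
\mathrm{MSE} &= \frac{1}{n} \sum_{i=1}^{n} e_i^{2}
\end{aligned}
```

The code that follows computes MAPE and MSE exactly as above, but it computes MAD with sum() rather than mean(), so the MAD values it reports are n times the quantity defined here.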
true.value = Oil[381:396]
PE.mv = 100*(true.value - pred.mv)/true.value
PE.naive = 100*(true.value - pred.naive)/true.value
PE.snaive = 100*(true.value - pred.snaive)/true.value
PE.rwf = 100*(true.value - pred.rwf)/true.value
##
MAPE.mv = mean(abs(PE.mv))
MAPE.naive = mean(abs(PE.naive))
MAPE.snaive = mean(abs(PE.snaive))
MAPE.rwf = mean(abs(PE.rwf))
##
MAPE = c(MAPE.mv, MAPE.naive, MAPE.snaive, MAPE.rwf)
## residual-based Error
e.mv = true.value - pred.mv
e.naive = true.value - pred.naive
e.snaive = true.value - pred.snaive
e.rwf = true.value - pred.rwf
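## MAD (computed here as the sum, not the mean, of the absolute errors over the
## 16 test months; the ranking of the methods is unaffected)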
MAD.mv = sum(abs(e.mv))
MAD.naive = sum(abs(e.naive))
MAD.snaive = sum(abs(e.snaive))
MAD.rwf = sum(abs(e.rwf))
MAD = c(MAD.mv, MAD.naive, MAD.snaive, MAD.rwf)
## MSE
MSE.mv = mean((e.mv)^2)
MSE.naive = mean((e.naive)^2)
MSE.snaive = mean((e.snaive)^2)
MSE.rwf = mean((e.rwf)^2)
MSE = c(MSE.mv, MSE.naive, MSE.snaive, MSE.rwf)
##
accuracy.table = cbind(MAPE = MAPE, MAD = MAD, MSE = MSE)
row.names(accuracy.table) = c("Moving Average", "Naive", "Seasonal Naive", "Drift")
kable(accuracy.table, caption ="Overall performance of the four forecasting methods")
| | MAPE | MAD | MSE |
|---|---|---|---|
| Moving Average | 18.559014 | 179.67987 | 127.10882 |
| Naive | 4.268278 | 41.00000 | 7.56250 |
| Seasonal Naive | 10.722856 | 103.48000 | 65.36945 |
| Drift | 5.818752 | 55.98117 | 13.55767 |
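The three measures can also be wrapped in one function so that each method is scored the same way. The sketch below is a hypothetical helper, not part of the original analysis; the name forecast_errors and the example values are assumptions. Unlike the code above, it takes the mean of the absolute errors for MAD.

```r
# Hypothetical helper (not part of the original analysis)
forecast_errors <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  e  <- actual - predicted     # forecast errors
  pe <- 100 * e / actual       # percentage errors
  c(MAPE = mean(abs(pe)),      # mean absolute percentage error
    MAD  = mean(abs(e)),       # mean absolute deviation
    MSE  = mean(e^2))          # mean squared error
}

# Illustrative values only
actual    <- c(100, 110, 120, 130)
predicted <- c(90, 110, 126, 130)
result <- forecast_errors(actual, predicted)
result
```

Applied to this analysis, a call such as forecast_errors(true.value, pred.naive) would return the MAPE and MSE from the table, while its MAD would be the tabulated value divided by the 16 test months (for the naive method, 41 / 16 = 2.5625).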
In summary, the naive method has the best performance on all three measures (MAPE of about 4.3%), with the drift method a close second (about 5.8%). The moving average has the worst performance (about 18.6%). This shows that even though the moving average method is usually the safest method to go with, it is not always the best.