1 Description of the Data Set

This data set contains monthly Brent crude oil spot prices from 1990 to 2022. The variables are the month and the price of oil per barrel.

Oil = read.csv("https://raw.githubusercontent.com/anafrance/STA-321/main/www/Time%20Series") [,-1]

2 Training and Testing Data

Next, the data is split into training and testing parts. This allows us to test our models against actual data and see how well they work. The training part is the first 380 observations, and the last 16 months form the testing part.

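# First 380 months for fitting, last 16 months held out for evaluation
training = Oil[1:380]
testing = Oil[381:396]

# NOTE: frequency = 10 produced the output shown in this report; monthly data
# would normally use frequency = 12, which changes the seasonal naive forecasts
Oil.ts = ts(training, frequency = 10, start = c(1990, 1))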
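
The frequency argument controls how R labels each observation and how long one seasonal cycle is. The following is a minimal sketch in base R, using a placeholder series of the same length as the training set (the names x, ts10, and ts12 are illustrative and not part of the original analysis), that compares the time index under frequency = 10 and frequency = 12.

```r
# Toy series with the same length as the training set (values are placeholders)
x <- seq_len(380)

ts10 <- ts(x, frequency = 10, start = c(1990, 1))  # setting used in this report
ts12 <- ts(x, frequency = 12, start = c(1990, 1))  # usual monthly setting

end10 <- end(ts10)  # last (cycle, position) with 10 observations per cycle
end12 <- end(ts12)  # last (year, month) with 12 observations per year
print(end10)
print(end12)
```

With frequency = 10, the 380 observations fill 38 complete cycles, so forecasts are labelled as starting in 2028. With frequency = 12, the same 380 observations end in August 2021.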

3 Forecasting Models

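library(forecast)  # meanf(), naive(), snaive(), rwf()
library(knitr)     # kable()

# 16-step-ahead point forecasts from the four benchmark methods
pred.mv = meanf(Oil.ts, h=16)$mean               # mean of all training values
pred.naive = naive(Oil.ts, h=16)$mean            # last observed value
pred.snaive = snaive(Oil.ts, h=16)$mean          # same position in the previous cycle
pred.rwf = rwf(Oil.ts, h=16, drift = TRUE)$mean  # last value plus average change
# Combine the forecasts into one table
pred.table = cbind( pred.mv = pred.mv,
                    pred.naive = pred.naive,
                    pred.snaive = pred.snaive,
                    pred.rwf = pred.rwf)
kable(pred.table, caption = "Forecasting Table")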
Forecasting Table

 pred.mv  pred.naive  pred.snaive  pred.rwf
49.20751          63        42.69  63.11016
49.20751          63        49.99  63.22031
49.20751          63        54.77  63.33047
49.20751          63        62.28  63.44062
49.20751          63        65.41  63.55078
49.20751          63        65.00  63.66093
49.20751          63        66.00  63.77109
49.20751          63        65.00  63.88125
49.20751          63        64.00  63.99140
49.20751          63        63.00  64.10156
49.20751          63        42.69  64.21171
49.20751          63        49.99  64.32187
49.20751          63        54.77  64.43202
49.20751          63        62.28  64.54218
49.20751          63        65.41  64.65234
49.20751          63        65.00  64.76249
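
To make clear what each column in the table represents, here is a minimal base R sketch of the four benchmark forecasts computed by hand. The helper benchmark_forecasts and the toy vector y are hypothetical and not part of the original analysis. The formulas follow the standard definitions of the mean, naive, seasonal naive, and drift methods, and the sketch is not meant to reproduce the forecast package's internals.

```r
# Hypothetical helper: benchmark point forecasts for a numeric vector y,
# forecast horizon h, and seasonal period m (all names are illustrative).
benchmark_forecasts <- function(y, h, m) {
  n <- length(y)
  k <- seq_len(h)                       # forecast steps 1, 2, ..., h
  slope <- (y[n] - y[1]) / (n - 1)      # average change per time step
  cbind(
    mean   = rep(mean(y), h),           # average of all historical values
    naive  = rep(y[n], h),              # last observed value, repeated
    snaive = y[n - m + ((k - 1) %% m) + 1],  # last full cycle, repeated
    drift  = y[n] + slope * k           # last value plus a linear trend
  )
}

y <- c(10, 12, 14, 13, 15, 17, 16, 18)  # toy data, not the oil series
fc <- benchmark_forecasts(y, h = 4, m = 4)
print(fc)
```

This mirrors the pattern in the table: the mean and naive columns are constant, the seasonal naive column repeats the last cycle of the training data, and the drift column rises by the same amount at every step.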

4 Visualization

The time series is plotted together with the forecast values. Note that the forecasts are based on models fitted to the 380 historical observations. The plot shows only observations #363 to #380 and the 16 forecast values.

plot(363:380, Oil[363:380], type="l", xlim=c(363,396), ylim=c(10, 80),
     xlab = "observation sequence",
     ylab = "Crude Oil Prices",
     main = "Monthly Crude Oil Prices and forecasting")
points(363:380, Oil[363:380],pch=20)

pred.mv
## Time Series:
## Start = c(2028, 1) 
## End = c(2029, 6) 
## Frequency = 10 
##  [1] 49.20751 49.20751 49.20751 49.20751 49.20751 49.20751 49.20751 49.20751
##  [9] 49.20751 49.20751 49.20751 49.20751 49.20751 49.20751 49.20751 49.20751
##
points(381:396, pred.mv, pch=15, col = "red")
points(381:396, pred.naive, pch=16, col = "blue")
points(381:396, pred.rwf, pch=18, col = "navy")
points(381:396, pred.snaive, pch=17, col = "purple")
##
lines(381:396, pred.mv, lty=2, col = "red")
lines(381:396, pred.snaive, lty=2, col = "purple")
lines(381:396, pred.naive, lty=2, col = "blue")
lines(381:396, pred.rwf, lty=2, col = "navy")
## 
# Legend order matches the plotting symbols used above (pch 15, 16, 17, 18)
legend("bottomright", c("moving average", "naive", "seasonal naive", "drift"),
       col=c("red", "blue", "purple", "navy"), pch=15:18, lty=rep(2,4),
       bty="n", cex = 0.8)

The drift and naive forecasts stay close to the most recent observed values, whereas the moving average method averages all of the historical values, so its forecast appears too low compared with the other methods. The naive and drift forecasts are close to each other, while the seasonal naive forecast has some lower values. It is hard to tell from the graph which method works best.

5 Accuracy Metrics

We will use the mean absolute percentage error (MAPE), together with the mean absolute deviation (MAD) and the mean squared error (MSE), to compare the performance of the four forecasting methods.

true.value = Oil[381:396]
PE.mv =  100*(true.value - pred.mv)/true.value
PE.naive =  100*(true.value - pred.naive)/true.value
PE.snaive =  100*(true.value - pred.snaive)/true.value
PE.rwf =  100*(true.value - pred.rwf)/true.value
##
MAPE.mv = mean(abs(PE.mv))
MAPE.naive = mean(abs(PE.naive))
MAPE.snaive = mean(abs(PE.snaive))
MAPE.rwf = mean(abs(PE.rwf))
##
MAPE = c(MAPE.mv, MAPE.naive, MAPE.snaive, MAPE.rwf)
## residual-based Error
e.mv = true.value - pred.mv
e.naive = true.value - pred.naive
e.snaive = true.value - pred.snaive
e.rwf = true.value - pred.rwf
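## MAD (computed here as the total absolute deviation over the 16 forecasts;
## dividing by 16 gives the mean and leaves the ranking of the methods unchanged)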
MAD.mv = sum(abs(e.mv))
MAD.naive = sum(abs(e.naive))
MAD.snaive = sum(abs(e.snaive))
MAD.rwf = sum(abs(e.rwf))
MAD = c(MAD.mv, MAD.naive, MAD.snaive, MAD.rwf)
## MSE
MSE.mv = mean((e.mv)^2)
MSE.naive = mean((e.naive)^2)
MSE.snaive = mean((e.snaive)^2)
MSE.rwf = mean((e.rwf)^2)
MSE = c(MSE.mv, MSE.naive, MSE.snaive, MSE.rwf)
##
accuracy.table = cbind(MAPE = MAPE, MAD = MAD, MSE = MSE)
row.names(accuracy.table) = c("Moving Average", "Naive", "Seasonal Naive", "Drift")
kable(accuracy.table, caption ="Overall performance of the four forecasting methods")
Overall performance of the four forecasting methods

                     MAPE        MAD        MSE
Moving Average  18.559014  179.67987  127.10882
Naive            4.268278   41.00000    7.56250
Seasonal Naive  10.722856  103.48000   65.36945
Drift            5.818752   55.98117   13.55767
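
The three measures can also be gathered into one small function. The sketch below is a hypothetical helper (forecast_accuracy is not part of the original analysis) applied to toy numbers rather than the oil data. It assumes the conventional definition of MAD as a mean, whereas the MAD column in the table above is a sum over the 16 forecasts.

```r
# Hypothetical helper (not from the report): accuracy of point forecasts
# against held-out actual values.
forecast_accuracy <- function(actual, predicted) {
  e <- actual - predicted                # forecast errors
  c(MAPE = mean(abs(100 * e / actual)),  # mean absolute percentage error
    MAD  = mean(abs(e)),                 # mean absolute deviation
    MSE  = mean(e^2))                    # mean squared error
}

actual    <- c(100, 200, 400)            # toy values, not the oil data
predicted <- c(110, 190, 400)
acc <- forecast_accuracy(actual, predicted)
print(acc)
```

Calling the same function once per method, for example with true.value and pred.naive, would give one row of the accuracy table, with the MAD entry on the mean scale.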

In summary, the naive method has the best performance on all three measures, with the drift method a close second. The method with the worst performance is the moving average. Although averaging all past values may seem like the safest choice, it is not always the best one: here the most recent prices sit well above the long-run average, so the moving average forecast falls far below the actual values.