1 Description of the Data Set

This data set contains monthly Brent crude oil spot prices from 1990 to 2022. The variables are the month and the price of oil per barrel.

library(forecast)  # meanf(), naive(), snaive(), rwf()
library(knitr)     # kable()
Oil = read.csv("https://raw.githubusercontent.com/anafrance/STA-321/main/www/Time%20Series")[, -1]

2 Training and Testing Data

Next, the data are split into training and testing sets so that the model's forecasts can be checked against actual values. The training set is the first 380 observations and the testing set is the last 16 months.

training = Oil[1:380]
testing = Oil[381:396]

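# Note: frequency = 10 produced the output shown below; monthly data would normally use frequency = 12
Oil.ts = ts(training, frequency = 10, start = c(1990, 1))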
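
As a small sketch of the split, the indices can be derived from the forecast horizon instead of being hard-coded. The series `y` below is a simulated stand-in for `Oil`, and the names `h`, `n`, `train`, `test`, `ts12`, and `ts10` are illustrative assumptions, not part of the original analysis. The sketch also shows how the `frequency` argument sets the calendar that `ts()` attaches to the same 380 observations.

```r
# Simulated stand-in for the 396 monthly prices (assumption: not the real data)
set.seed(1)
y <- 50 + cumsum(rnorm(396))

h <- 16                      # forecast horizon = size of the testing set
n <- length(y)
train <- y[1:(n - h)]        # first 380 observations
test  <- y[(n - h + 1):n]    # last 16 observations

# The frequency argument controls how observations map to calendar time
ts12 <- ts(train, frequency = 12, start = c(1990, 1))
ts10 <- ts(train, frequency = 10, start = c(1990, 1))

end(ts12)   # year 2021, period 8 (August 2021)
end(ts10)   # year 2027, period 10
```

With `frequency = 10`, the series is treated as having ten periods per year, which is why the forecasts printed later in this report start at `c(2028, 1)`.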

3 Forecasting Models

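# Four benchmark forecasts over a 16-step horizon
pred.mv = meanf(Oil.ts, h=16)$mean                 # average of all historical values
pred.naive = naive(Oil.ts, h=16)$mean              # last observed value
pred.snaive = snaive(Oil.ts, h=16)$mean            # value from the same period of the previous season
pred.rwf = rwf(Oil.ts, h=16, drift = TRUE)$mean    # last value plus average historical change
# Combine the forecasts into one table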
pred.table = cbind( pred.mv = pred.mv,
                    pred.naive = pred.naive,
                    pred.snaive = pred.snaive,
                    pred.rwf = pred.rwf)
kable(pred.table, caption = "Forecasting Table")

Forecasting Table
pred.mv	pred.naive	pred.snaive	pred.rwf
49.20751	63	42.69	63.11016
49.20751	63	49.99	63.22031
49.20751	63	54.77	63.33047
49.20751	63	62.28	63.44062
49.20751	63	65.41	63.55078
49.20751	63	65.00	63.66093
49.20751	63	66.00	63.77109
49.20751	63	65.00	63.88125
49.20751	63	64.00	63.99140
49.20751	63	63.00	64.10156
49.20751	63	42.69	64.21171
49.20751	63	49.99	64.32187
49.20751	63	54.77	64.43202
49.20751	63	62.28	64.54218
49.20751	63	65.41	64.65234
49.20751	63	65.00	64.76249
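
To make the four columns easier to interpret, the sketch below reproduces the standard definitions of the mean, naive, seasonal naive, and drift forecasts by hand in base R. The toy series `y`, the seasonal period `m`, and the `f.*` names are hypothetical and chosen only for illustration; this is not the code the `forecast` package runs internally.

```r
# Toy series with a seasonal period of 4 (hypothetical values, not the oil data)
y <- c(10, 12, 14, 13, 15, 17, 16, 18)
n <- length(y)   # number of observations
m <- 4           # seasonal period
h <- 3           # forecast horizon

# Mean method: every forecast is the average of all past values
f.mean <- rep(mean(y), h)

# Naive method: every forecast is the last observed value
f.naive <- rep(y[n], h)

# Seasonal naive: repeat the value from the same position in the last season
f.snaive <- y[n - m + ((seq_len(h) - 1) %% m) + 1]

# Drift: extend the straight line through the first and last observations
slope <- (y[n] - y[1]) / (n - 1)
f.drift <- y[n] + seq_len(h) * slope

rbind(f.mean, f.naive, f.snaive, f.drift)
```

The same logic explains the pattern in the table above: the seasonal naive column repeats after ten rows because the series was declared with `frequency = 10`, and the drift column rises by a constant amount (about 0.11) at each step.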

4 Visualization

The time series and the forecast values are plotted together. Note that the forecasts are based on the 380 historical observations in the training set. The plot shows only observations #363 to #380 and the 16 forecast values.

plot(363:380, Oil[363:380], type="l", xlim=c(363,396), ylim=c(10, 80),
     xlab = "observation sequence",
     ylab = "Crude Oil Prices",
     main = "Monthly Crude Oil Prices and forecasting")
points(363:380, Oil[363:380],pch=20)

pred.mv

## Time Series:
## Start = c(2028, 1) 
## End = c(2029, 6) 
## Frequency = 10 
##  [1] 49.20751 49.20751 49.20751 49.20751 49.20751 49.20751 49.20751 49.20751
##  [9] 49.20751 49.20751 49.20751 49.20751 49.20751 49.20751 49.20751 49.20751

##
points(381:396, pred.mv, pch=15, col = "red")
points(381:396, pred.naive, pch=16, col = "blue")
points(381:396, pred.rwf, pch=18, col = "navy")
points(381:396, pred.snaive, pch=17, col = "purple")
##
lines(381:396, pred.mv, lty=2, col = "red")
lines(381:396, pred.snaive, lty=2, col = "purple")
lines(381:396, pred.naive, lty=2, col = "blue")
lines(381:396, pred.rwf, lty=2, col = "navy")
## 
## pch order matches the points() calls above: drift uses 18, seasonal naive uses 17
legend("bottomright", c("moving average", "naive", "drift", "seasonal naive"),
       col=c("red", "blue", "navy", "purple"), pch=c(15, 16, 18, 17), lty=rep(2,4),
       bty="n", cex = 0.8)

The drift and naive methods follow the level of the most recent values, whereas the moving average method (computed here with meanf()) averages all of the historical values, which makes its forecast appear too low compared to the other methods. The naive and drift forecasts are close to each other, while the seasonal naive forecasts include some lower values. It’s hard to tell from the graph which method works best.

5 Accuracy Metrics

We will use the mean absolute percentage error (MAPE), together with the mean absolute deviation (MAD) and the mean squared error (MSE), to compare the performance of the four forecasting methods.
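
For reference, the usual textbook definitions of these measures over a test set of h observations are sketched below, where y_t is the actual value and \hat{y}_t is the forecast. The notation is an assumption introduced here for illustration. Note that the code in this section computes the MAD column as the sum of the absolute errors, which is h = 16 times the MAD defined below, so the ranking of the methods is the same either way.

```latex
\[
e_t = y_t - \hat{y}_t, \qquad
\mathrm{MAPE} = \frac{100}{h} \sum_{t=1}^{h} \left| \frac{e_t}{y_t} \right|, \qquad
\mathrm{MAD} = \frac{1}{h} \sum_{t=1}^{h} \left| e_t \right|, \qquad
\mathrm{MSE} = \frac{1}{h} \sum_{t=1}^{h} e_t^{2}
\]
```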

true.value = Oil[381:396]
PE.mv =  100*(true.value - pred.mv)/true.value
PE.naive =  100*(true.value - pred.naive)/true.value
PE.snaive =  100*(true.value - pred.snaive)/true.value
PE.rwf =  100*(true.value - pred.rwf)/true.value
##
MAPE.mv = mean(abs(PE.mv))
MAPE.naive = mean(abs(PE.naive))
MAPE.snaive = mean(abs(PE.snaive))
MAPE.rwf = mean(abs(PE.rwf))
##
MAPE = c(MAPE.mv, MAPE.naive, MAPE.snaive, MAPE.rwf)
## residual-based Error
e.mv = true.value - pred.mv
e.naive = true.value - pred.naive
e.snaive = true.value - pred.snaive
e.rwf = true.value - pred.rwf
## MAD (computed here as the sum, not the mean, of the absolute errors over the 16 test months)
MAD.mv = sum(abs(e.mv))
MAD.naive = sum(abs(e.naive))
MAD.snaive = sum(abs(e.snaive))
MAD.rwf = sum(abs(e.rwf))
MAD = c(MAD.mv, MAD.naive, MAD.snaive, MAD.rwf)
## MSE
MSE.mv = mean((e.mv)^2)
MSE.naive = mean((e.naive)^2)
MSE.snaive = mean((e.snaive)^2)
MSE.rwf = mean((e.rwf)^2)
MSE = c(MSE.mv, MSE.naive, MSE.snaive, MSE.rwf)
##
accuracy.table = cbind(MAPE = MAPE, MAD = MAD, MSE = MSE)
row.names(accuracy.table) = c("Moving Average", "Naive", "Seasonal Naive", "Drift")
kable(accuracy.table, caption ="Overall performance of the four forecasting methods")

Overall performance of the four forecasting methods
	MAPE	MAD	MSE
Moving Average	18.559014	179.67987	127.10882
Naive	4.268278	41.00000	7.56250
Seasonal Naive	10.722856	103.48000	65.36945
Drift	5.818752	55.98117	13.55767
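
The three error measures can also be wrapped in one helper so that each method is scored by the same code. This is a minimal sketch: `forecast_errors` is a hypothetical function name, the input values are made up for illustration, and MAD is computed here as a mean, so it differs from the summed MAD column in the table above by a factor of the test-set size.

```r
# Hypothetical helper: returns MAPE, MAD, and MSE for one set of forecasts
forecast_errors <- function(actual, predicted) {
  e <- actual - predicted                  # forecast errors
  c(MAPE = mean(abs(100 * e / actual)),    # mean absolute percentage error
    MAD  = mean(abs(e)),                   # mean absolute deviation
    MSE  = mean(e^2))                      # mean squared error
}

# Toy values (assumed for illustration only)
actual    <- c(100, 110, 120, 130)
predicted <- c( 90, 115, 120, 143)
res <- forecast_errors(actual, predicted)
res
```

Applied to this report, a call such as `forecast_errors(true.value, pred.naive)` would give the MAPE and MSE shown for the naive method, with MAD divided by 16.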

In summary, the naive method has the best performance on all three measures, with the drift method a close second. The method with the worst performance is the moving average. This shows that even though the moving average method is usually the safest method to go with, it isn’t always the best.

Time Series: Crude Oil Prices

Gianna LaFrance

2023-04-16
