This dataset describes a monthly count of the number of observed sunspots for just over 230 years (1749-1983).
The units are a count and there are 2,820 observations. The source of the dataset is credited to Andrews & Herzberg (1985).
sunSpots = read.csv("https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-sunspots.csv")[,-1]
## training and testing data: hold-up last 10 periods of data for calculating forecasting errors
training = sunSpots[1:2810]
testing = sunSpots[2811:2820]
##
sunSpots.ts = ts(training, frequency = 12, start = c(1749, 1))
We next use the four baseline forecasting methods and the first 30 data values to forecast the next 6 monthโs data values.
pred.mv = meanf(sunSpots.ts, h=15)$mean
pred.naive = naive(sunSpots.ts, h=15)$mean
pred.snaive = snaive(sunSpots.ts, h=15)$mean
pred.rwf = rwf(sunSpots.ts, h=15, drift = TRUE)$mean
###
###
pred.table = cbind( pred.mv = pred.mv,
pred.naive = pred.naive,
pred.snaive = pred.snaive,
pred.rwf = pred.rwf)
kable(pred.table, caption = "Forecasting Table")
| pred.mv | pred.naive | pred.snaive | pred.rwf |
|---|---|---|---|
| 51.21199 | 51 | 153.8 | 50.99751 |
| 51.21199 | 51 | 122.0 | 50.99502 |
| 51.21199 | 51 | 82.2 | 50.99252 |
| 51.21199 | 51 | 110.4 | 50.99003 |
| 51.21199 | 51 | 106.1 | 50.98754 |
| 51.21199 | 51 | 107.6 | 50.98505 |
| 51.21199 | 51 | 118.8 | 50.98256 |
| 51.21199 | 51 | 94.7 | 50.98006 |
| 51.21199 | 51 | 98.1 | 50.97757 |
| 51.21199 | 51 | 127.0 | 50.97508 |
| 51.21199 | 51 | 84.3 | 50.97259 |
| 51.21199 | 51 | 51.0 | 50.97010 |
| 51.21199 | 51 | 153.8 | 50.96760 |
| 51.21199 | 51 | 122.0 | 50.96511 |
| 51.21199 | 51 | 82.2 | 50.96262 |
We now make a time series plot and the predicted values. Note that, the forecast values were based on the model that uses 2810 historical data in the time series. The following only show observations #2806-#2820 and the 15 forecasted values.
plot(2800:2820, sunSpots[2800:2820], type="l", xlim=c(2800,2820), ylim=c(20, 175),
xlab = "observation sequence",
ylab = "Sun Spots Counts",
main = "Monthly sunspot counts and forecasting")
points(2800:2820, sunSpots[2800:2820],pch=20)
##
points(2806:2820, pred.mv, pch=15, col = "red")
points(2806:2820, pred.naive, pch=16, col = "blue")
points(2806:2820, pred.rwf, pch=18, col = "navy")
points(2806:2820, pred.snaive, pch=17, col = "purple")
##
lines(2806:2820, pred.mv, lty=2, col = "red")
lines(2806:2820, pred.snaive, lty=2, col = "purple")
lines(2806:2820, pred.naive, lty=2, col = "blue")
lines(2806:2820, pred.rwf, lty=2, col = "navy")
##
legend("topright", c("moving avergae", "naive", "drift", "seasonal naive"),
col=c("red", "blue", "navy", "purple"), pch=15:18, lty=rep(2,4),
bty="n", cex = 0.8)
We can see that the moving average method worked pretty well. The moving average predictive model is one which simply relies on the average of the historical data. The performance of naive and drift methods in this seasonal time series are close to each other. But naive, seasonal naive, and drift methods worked poorly compared to the moving average method.
We will use the mean absolute prediction error (MAPE) to compare the performance of the four forecasting methods.
true.value = sunSpots[2806:2820]
PE.mv = 100*(true.value - pred.mv)/true.value
PE.naive = 100*(true.value - pred.naive)/true.value
PE.snaive = 100*(true.value - pred.snaive)/true.value
PE.rwf = 100*(true.value - pred.rwf)/true.value
##
MAPE.mv = mean(abs(PE.mv))
MAPE.naive = mean(abs(PE.naive))
MAPE.snaive = mean(abs(PE.snaive))
MAPE.rwf = mean(abs(PE.rwf))
##
MAPE = c(MAPE.mv, MAPE.naive, MAPE.snaive, MAPE.rwf)
## residual-based Error
e.mv = true.value - pred.mv
e.naive = true.value - pred.naive
e.snaive = true.value - pred.snaive
e.rwf = true.value - pred.rwf
## MAD
MAD.mv = sum(abs(e.mv))
MAD.naive = sum(abs(e.naive))
MAD.snaive = sum(abs(e.snaive))
MAD.rwf = sum(abs(e.rwf))
MAD = c(MAD.mv, MAD.naive, MAD.snaive, MAD.rwf)
## MSE
MSE.mv = mean((e.mv)^2)
MSE.naive = mean((e.naive)^2)
MSE.snaive = mean((e.snaive)^2)
MSE.rwf = mean((e.rwf)^2)
MSE = c(MSE.mv, MSE.naive, MSE.snaive, MSE.rwf)
##
accuracy.table = cbind(MAPE = MAPE, MAD = MAD, MSE = MSE)
row.names(accuracy.table) = c("Moving Average", "Naive", "Seasonal Naive", "Drift")
kable(accuracy.table, caption ="Overall performance of the four forecasting methods")
| MAPE | MAD | MSE | |
|---|---|---|---|
| Moving Average | 35.21840 | 424.9160 | 1197.977 |
| Naive | 35.26579 | 426.4000 | 1207.949 |
| Seasonal Naive | 69.57910 | 593.2000 | 2334.351 |
| Drift | 35.26507 | 426.4947 | 1208.491 |
In summary, the moving average method has the best performance. Note that the methods introduced in this module are baseline forecasting. They are all descriptive since we did not use any statistical assumptions. In the next module, we will introduce a few formal non-parametric forecasting methods - exponential forecasting methods. We will use the same accuracy measures to compare different forecasting methods.