1 Introduction

This dataset describes a monthly count of the number of observed sunspots for just over 230 years (1749-1983).

The units are a count and there are 2,820 observations. The source of the dataset is credited to Andrews & Herzberg (1985).

1.1 Training and Testing Data

sunSpots = read.csv("https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-sunspots.csv")[,-1]

## training and testing data: hold-up last 10 periods of data for calculating forecasting errors
training = sunSpots[1:2810]
testing = sunSpots[2811:2820]
##
sunSpots.ts = ts(training, frequency = 12, start = c(1749, 1))

1.2 Building 4 Forecasting Models

We next use the four baseline forecasting methods and the first 30 data values to forecast the next 6 monthโ€™s data values.

pred.mv = meanf(sunSpots.ts, h=15)$mean
pred.naive = naive(sunSpots.ts, h=15)$mean
pred.snaive = snaive(sunSpots.ts, h=15)$mean
pred.rwf = rwf(sunSpots.ts, h=15, drift = TRUE)$mean
###
###
pred.table = cbind( pred.mv = pred.mv,
                    pred.naive = pred.naive,
                    pred.snaive = pred.snaive,
                    pred.rwf = pred.rwf)
kable(pred.table, caption = "Forecasting Table")
Forecasting Table
pred.mv pred.naive pred.snaive pred.rwf
51.21199 51 153.8 50.99751
51.21199 51 122.0 50.99502
51.21199 51 82.2 50.99252
51.21199 51 110.4 50.99003
51.21199 51 106.1 50.98754
51.21199 51 107.6 50.98505
51.21199 51 118.8 50.98256
51.21199 51 94.7 50.98006
51.21199 51 98.1 50.97757
51.21199 51 127.0 50.97508
51.21199 51 84.3 50.97259
51.21199 51 51.0 50.97010
51.21199 51 153.8 50.96760
51.21199 51 122.0 50.96511
51.21199 51 82.2 50.96262

1.3 Visualization

We now make a time series plot and the predicted values. Note that, the forecast values were based on the model that uses 2810 historical data in the time series. The following only show observations #2806-#2820 and the 15 forecasted values.

plot(2800:2820, sunSpots[2800:2820], type="l", xlim=c(2800,2820), ylim=c(20, 175),
     xlab = "observation sequence",
     ylab = "Sun Spots Counts",
     main = "Monthly sunspot counts and forecasting")
points(2800:2820, sunSpots[2800:2820],pch=20)
##
points(2806:2820, pred.mv, pch=15, col = "red")
points(2806:2820, pred.naive, pch=16, col = "blue")
points(2806:2820, pred.rwf, pch=18, col = "navy")
points(2806:2820, pred.snaive, pch=17, col = "purple")
##
lines(2806:2820, pred.mv, lty=2, col = "red")
lines(2806:2820, pred.snaive, lty=2, col = "purple")
lines(2806:2820, pred.naive, lty=2, col = "blue")
lines(2806:2820, pred.rwf, lty=2, col = "navy")
## 
legend("topright", c("moving avergae", "naive", "drift", "seasonal naive"),
       col=c("red", "blue", "navy", "purple"), pch=15:18, lty=rep(2,4),
       bty="n", cex = 0.8)

We can see that the moving average method worked pretty well. The moving average predictive model is one which simply relies on the average of the historical data. The performance of naive and drift methods in this seasonal time series are close to each other. But naive, seasonal naive, and drift methods worked poorly compared to the moving average method.

1.4 Accuracy Metrics

We will use the mean absolute prediction error (MAPE) to compare the performance of the four forecasting methods.

true.value = sunSpots[2806:2820]
PE.mv =  100*(true.value - pred.mv)/true.value
PE.naive =  100*(true.value - pred.naive)/true.value
PE.snaive =  100*(true.value - pred.snaive)/true.value
PE.rwf =  100*(true.value - pred.rwf)/true.value
##
MAPE.mv = mean(abs(PE.mv))
MAPE.naive = mean(abs(PE.naive))
MAPE.snaive = mean(abs(PE.snaive))
MAPE.rwf = mean(abs(PE.rwf))
##
MAPE = c(MAPE.mv, MAPE.naive, MAPE.snaive, MAPE.rwf)
## residual-based Error
e.mv = true.value - pred.mv
e.naive = true.value - pred.naive
e.snaive = true.value - pred.snaive
e.rwf = true.value - pred.rwf
## MAD
MAD.mv = sum(abs(e.mv))
MAD.naive = sum(abs(e.naive))
MAD.snaive = sum(abs(e.snaive))
MAD.rwf = sum(abs(e.rwf))
MAD = c(MAD.mv, MAD.naive, MAD.snaive, MAD.rwf)
## MSE
MSE.mv = mean((e.mv)^2)
MSE.naive = mean((e.naive)^2)
MSE.snaive = mean((e.snaive)^2)
MSE.rwf = mean((e.rwf)^2)
MSE = c(MSE.mv, MSE.naive, MSE.snaive, MSE.rwf)
##
accuracy.table = cbind(MAPE = MAPE, MAD = MAD, MSE = MSE)
row.names(accuracy.table) = c("Moving Average", "Naive", "Seasonal Naive", "Drift")
kable(accuracy.table, caption ="Overall performance of the four forecasting methods")
Overall performance of the four forecasting methods
MAPE MAD MSE
Moving Average 35.21840 424.9160 1197.977
Naive 35.26579 426.4000 1207.949
Seasonal Naive 69.57910 593.2000 2334.351
Drift 35.26507 426.4947 1208.491

In summary, the moving average method has the best performance. Note that the methods introduced in this module are baseline forecasting. They are all descriptive since we did not use any statistical assumptions. In the next module, we will introduce a few formal non-parametric forecasting methods - exponential forecasting methods. We will use the same accuracy measures to compare different forecasting methods.