1 Introduction

This dataset describes a monthly count of the number of observed sunspots for just over 230 years (1749-1983).

The units are a count and there are 2,820 observations. The source of the dataset is credited to Andrews & Herzberg (1985).

1.1 Training and Testing Data

sunSpots = read.csv("https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-sunspots.csv")[,-1]

## training and testing data: hold-up last 10 periods of data for calculating forecasting errors
training = sunSpots[1:2810]
testing = sunSpots[2811:2820]
##
sunSpots.ts = ts(training, frequency = 12, start = c(1749, 1))

1.2 Building 4 Forecasting Models

We next use the four baseline forecasting methods and the first 30 data values to forecast the next 6 month’s data values.

pred.mv = meanf(sunSpots.ts, h=15)$mean
pred.naive = naive(sunSpots.ts, h=15)$mean
pred.snaive = snaive(sunSpots.ts, h=15)$mean
pred.rwf = rwf(sunSpots.ts, h=15, drift = TRUE)$mean
###
###
pred.table = cbind( pred.mv = pred.mv,
                    pred.naive = pred.naive,
                    pred.snaive = pred.snaive,
                    pred.rwf = pred.rwf)
kable(pred.table, caption = "Forecasting Table")

Forecasting Table
pred.mv	pred.naive	pred.snaive	pred.rwf
51.21199	51	153.8	50.99751
51.21199	51	122.0	50.99502
51.21199	51	82.2	50.99252
51.21199	51	110.4	50.99003
51.21199	51	106.1	50.98754
51.21199	51	107.6	50.98505
51.21199	51	118.8	50.98256
51.21199	51	94.7	50.98006
51.21199	51	98.1	50.97757
51.21199	51	127.0	50.97508
51.21199	51	84.3	50.97259
51.21199	51	51.0	50.97010
51.21199	51	153.8	50.96760
51.21199	51	122.0	50.96511
51.21199	51	82.2	50.96262

1.3 Visualization

We now make a time series plot and the predicted values. Note that, the forecast values were based on the model that uses 2810 historical data in the time series. The following only show observations #2806-#2820 and the 15 forecasted values.

plot(2800:2820, sunSpots[2800:2820], type="l", xlim=c(2800,2820), ylim=c(20, 175),
     xlab = "observation sequence",
     ylab = "Sun Spots Counts",
     main = "Monthly sunspot counts and forecasting")
points(2800:2820, sunSpots[2800:2820],pch=20)
##
points(2806:2820, pred.mv, pch=15, col = "red")
points(2806:2820, pred.naive, pch=16, col = "blue")
points(2806:2820, pred.rwf, pch=18, col = "navy")
points(2806:2820, pred.snaive, pch=17, col = "purple")
##
lines(2806:2820, pred.mv, lty=2, col = "red")
lines(2806:2820, pred.snaive, lty=2, col = "purple")
lines(2806:2820, pred.naive, lty=2, col = "blue")
lines(2806:2820, pred.rwf, lty=2, col = "navy")
## 
legend("topright", c("moving avergae", "naive", "drift", "seasonal naive"),
       col=c("red", "blue", "navy", "purple"), pch=15:18, lty=rep(2,4),
       bty="n", cex = 0.8)

We can see that the moving average method worked pretty well. The moving average predictive model is one which simply relies on the average of the historical data. The performance of naive and drift methods in this seasonal time series are close to each other. But naive, seasonal naive, and drift methods worked poorly compared to the moving average method.

1.4 Accuracy Metrics

We will use the mean absolute prediction error (MAPE) to compare the performance of the four forecasting methods.

true.value = sunSpots[2806:2820]
PE.mv =  100*(true.value - pred.mv)/true.value
PE.naive =  100*(true.value - pred.naive)/true.value
PE.snaive =  100*(true.value - pred.snaive)/true.value
PE.rwf =  100*(true.value - pred.rwf)/true.value
##
MAPE.mv = mean(abs(PE.mv))
MAPE.naive = mean(abs(PE.naive))
MAPE.snaive = mean(abs(PE.snaive))
MAPE.rwf = mean(abs(PE.rwf))
##
MAPE = c(MAPE.mv, MAPE.naive, MAPE.snaive, MAPE.rwf)
## residual-based Error
e.mv = true.value - pred.mv
e.naive = true.value - pred.naive
e.snaive = true.value - pred.snaive
e.rwf = true.value - pred.rwf
## MAD
MAD.mv = sum(abs(e.mv))
MAD.naive = sum(abs(e.naive))
MAD.snaive = sum(abs(e.snaive))
MAD.rwf = sum(abs(e.rwf))
MAD = c(MAD.mv, MAD.naive, MAD.snaive, MAD.rwf)
## MSE
MSE.mv = mean((e.mv)^2)
MSE.naive = mean((e.naive)^2)
MSE.snaive = mean((e.snaive)^2)
MSE.rwf = mean((e.rwf)^2)
MSE = c(MSE.mv, MSE.naive, MSE.snaive, MSE.rwf)
##
accuracy.table = cbind(MAPE = MAPE, MAD = MAD, MSE = MSE)
row.names(accuracy.table) = c("Moving Average", "Naive", "Seasonal Naive", "Drift")
kable(accuracy.table, caption ="Overall performance of the four forecasting methods")

Overall performance of the four forecasting methods
	MAPE	MAD	MSE
Moving Average	35.21840	424.9160	1197.977
Naive	35.26579	426.4000	1207.949
Seasonal Naive	69.57910	593.2000	2334.351
Drift	35.26507	426.4947	1208.491

In summary, the moving average method has the best performance. Note that the methods introduced in this module are baseline forecasting. They are all descriptive since we did not use any statistical assumptions. In the next module, we will introduce a few formal non-parametric forecasting methods - exponential forecasting methods. We will use the same accuracy measures to compare different forecasting methods.

Time Series01 Assignment

Zack Shin

May 12, 2023

1 Introduction

1.1 Training and Testing Data

1.2 Building 4 Forecasting Models

1.3 Visualization

1.4 Accuracy Metrics