Data Training and Testing
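Next, we hold out the last seven observations of the data for testing
and define four training sets of decreasing length. All four training
sets end immediately before the test period and start at observations
1, 37, 73, and 97, respectively; they are labeled by the nominal sizes
144, 109, 73, and 48. The same seven-observation test set is used to
calculate the prediction error for every training set.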
ini.data = data.house[,2]
n0 = length(ini.data)
##
train.data01 = data.house[1:(n0-7), 2]
train.data02 = data.house[37:(n0-7), 2]
train.data03 = data.house[73:(n0-7), 2]
train.data04 = data.house[97:(n0-7), 2]
## last 7 observations
test.data = data.house[(n0-6):n0,2]
##
train01.ts = ts(train.data01, frequency = 12, start = c(2012, 1))
train02.ts = ts(train.data02, frequency = 12, start = c(2015, 1))
train03.ts = ts(train.data03, frequency = 12, start = c(2018, 1))
train04.ts = ts(train.data04, frequency = 12, start = c(2020, 1))
##
stl01 = stl(train01.ts, s.window = 12)
stl02 = stl(train02.ts, s.window = 12)
stl03 = stl(train03.ts, s.window = 12)
stl04 = stl(train04.ts, s.window = 12)
## Forecast with decomposing
h = length(test.data)  ## forecast horizon = number of test observations
fcst01 = forecast(stl01, h = h, method = "naive")
fcst02 = forecast(stl02, h = h, method = "naive")
fcst03 = forecast(stl03, h = h, method = "naive")
fcst04 = forecast(stl04, h = h, method = "naive")
Next, we perform error analysis.
## point forecasts as plain vectors, aligned with test.data
pred01 = as.numeric(fcst01$mean)
pred02 = as.numeric(fcst02$mean)
pred03 = as.numeric(fcst03$mean)
pred04 = as.numeric(fcst04$mean)
## percentage errors, relative to the forecast values
PE01 = (test.data - pred01)/pred01
PE02 = (test.data - pred02)/pred02
PE03 = (test.data - pred03)/pred03
PE04 = (test.data - pred04)/pred04
###
MAPE1 = mean(abs(PE01))
MAPE2 = mean(abs(PE02))
MAPE3 = mean(abs(PE03))
MAPE4 = mean(abs(PE04))
### forecast errors
E1 = test.data - pred01
E2 = test.data - pred02
E3 = test.data - pred03
E4 = test.data - pred04
##
MSE1 = mean(E1^2)
MSE2 = mean(E2^2)
MSE3 = mean(E3^2)
MSE4 = mean(E4^2)
###
MSE=c(MSE1, MSE2, MSE3, MSE4)
MAPE=c(MAPE1, MAPE2, MAPE3, MAPE4)
accuracy=cbind(MSE=MSE, MAPE=MAPE)
row.names(accuracy)=c("n.144", "n.109", "n. 73", "n. 48")
kable(accuracy, caption="Error comparison between forecast results with different sample sizes")
Error comparison between forecast results with different sample sizes

|       | MSE      | MAPE      |
|-------|----------|-----------|
| n.144 | 3528.487 | 0.0802875 |
| n.109 | 3491.738 | 0.0795013 |
| n. 73 | 3965.483 | 0.0852145 |
| n. 48 | 5327.657 | 0.1013145 |
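We can see from the table above that the training set labeled n = 109
performs best, with the lowest MSE (3491.738) and the lowest MAPE
(about 7.95%). The full training set (n = 144) is a close second, and
both errors grow as the training set shrinks to 73 and 48. The mean
absolute percentage errors look reasonable, at roughly 8% to 10%,
while the mean square errors are in the thousands. This is expected:
the MSE is measured in squared units of the series, and the
observations in our time series range from about 300 to a little over
one thousand, so an MSE near 3,500 corresponds to a typical error of
about 59 units. The time series also shows several patterns, including
seasonality, and its components appear to be additive. We will take a
closer look at the errors next by making one graph for the MSE and one
for the MAPE.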
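Before plotting, the difference in scale between the two metrics can be illustrated with a small sketch. The helper names `mse` and `mape` and the three toy values are hypothetical and are not taken from the housing data; `mape` divides by the forecast value, matching the percentage errors computed in this section (many references divide by the observed value instead).

```r
# Hypothetical helpers; the MAPE divides by the forecast, as in this section
mse  <- function(actual, pred) mean((actual - pred)^2)
mape <- function(actual, pred) mean(abs((actual - pred) / pred))

# Toy values on roughly the same scale as the series (about 300 to 1000)
actual <- c(300, 650, 1000)
pred   <- c(330, 600, 1080)

mse_raw  <- mse(actual, pred)    # in squared units of the series
mape_raw <- mape(actual, pred)   # unit-free proportion

# Expressing the same series in hundreds changes the MSE but not the MAPE
mse_scaled  <- mse(actual / 100, pred / 100)
mape_scaled <- mape(actual / 100, pred / 100)

print(c(mse_raw = mse_raw, mse_scaled = mse_scaled))
print(c(mape_raw = mape_raw, mape_scaled = mape_scaled))
```

Because the MSE scales with the square of the measurement unit, its magnitude says little on its own; the MAPE, or the square root of the MSE, is easier to read against the level of the series.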
par(mfrow=c(1,2))
labs=c("n=144", "n=109", "n=73", "n=48")
## left panel: MSE for each training set
plot(1:4, MSE, type="b", col="darkred", ylab="Error", xlab="",
     xlim=c(0.5,4.5), ylim=range(MSE)*c(0.98,1.05),
     main="MSE", axes=FALSE)
axis(1, at=1:4, labels=labs)
axis(2)
text(1:4, MSE, as.character(round(MSE,1)), col="darkred", cex=0.7, pos=3)
## right panel: MAPE for each training set
plot(1:4, MAPE, type="b", col="blue", ylab="Error", xlab="",
     xlim=c(0.5,4.5), ylim=range(MAPE)*c(0.98,1.05),
     main="MAPE", axes=FALSE)
axis(1, at=1:4, labels=labs)
axis(2)
text(1:4, MAPE, as.character(round(MAPE,4)), col="blue", cex=0.7, pos=3)
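The four-way comparison in this section repeats the same steps for each training set, so it can also be written as a loop. The following is a minimal sketch under stated assumptions, not the analysis itself: it uses the built-in `AirPassengers` series as a stand-in for the housing data, a hypothetical helper `score`, an odd seasonal window of 13, and it requires the `forecast` package. The start positions 1, 37, 73, and 97 and the seven-observation test set mirror the code above.

```r
library(forecast)  # provides forecast() for stl objects

y <- AirPassengers            # stand-in monthly series (144 observations)
h <- 7                        # size of the common test set
n <- length(y)
test <- as.numeric(y[(n - h + 1):n])
starts <- c(1, 37, 73, 97)    # first observation of each training set

# Hypothetical helper: fit STL on one training window and score its forecast
score <- function(s) {
  train <- window(y, start = time(y)[s], end = time(y)[n - h])
  fit   <- stl(train, s.window = 13)
  pred  <- as.numeric(forecast(fit, h = h, method = "naive")$mean)
  stopifnot(length(pred) == length(test))  # guard against length mismatch
  err <- test - pred
  c(n.train = length(train),
    MSE     = mean(err^2),
    MAPE    = mean(abs(err / pred)))       # relative to the forecast value
}

acc <- t(sapply(starts, score))
rownames(acc) <- paste0("start.", starts)
print(acc)
```

Deriving the horizon and the training-set sizes from the data, rather than typing them in separately, keeps the forecasts, the test set, and the row labels consistent with one another.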
