The goal of this topic is to build a predictive model on the beer data from the fpp package. The data are split into two samples: a training set covering 1991 to 1994, and a test set containing the data from 1995 onwards. Forecasts are produced with four benchmark methods (mean, naive, drift and seasonal naive) and with the tslm function. At the end we measure the accuracy of each method on both the training and the test set.
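library(fpp)    # beer data; also loads the forecast package
library(knitr)  # kable() for the accuracy tables
plot(beer, type="l", col="blue")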
summary(beer)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 119.0 135.5 145.5 149.3 156.2 192.0
hist(beer, col = "blue")
best_beer<-BoxCox.lambda(beer)
best<-BoxCox(beer,best_beer)
plot(best)
seasonplot(best, col = rainbow(10), year.labels = TRUE)
COMMENTS: Looking at the main building blocks of the data, there is no evident trend, but we suspect a seasonal structure: sales rise towards the end of the year (Christmas and New Year effects).
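One quick way to back up a suspicion of seasonality is to compare the average level of each calendar month. The sketch below is only an illustration in base R: the `toy` series and its December bump are made-up values, not the beer data.

```r
# Hypothetical monthly series (not the beer data):
# a flat level with a December bump, plus a little noise.
set.seed(1)
toy <- ts(100 + rep(c(rep(0, 11), 30), times = 4) + rnorm(48, sd = 2),
          start = c(1991, 1), frequency = 12)

# Average per calendar month; a seasonal series shows clearly different means.
month_means <- tapply(as.numeric(toy), as.integer(cycle(toy)), mean)
round(month_means, 1)

# Month with the highest average.
peak_month <- as.integer(names(which.max(month_means)))
peak_month
```

Replacing `toy` with the transformed series `best` should give the same kind of summary for the beer data.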
decomp<-stl(best, s.window = 15)
plot(decomp)
adjust<-seasadj(decomp)
plot(naive(adjust))
plot(snaive(adjust))
tsdisplay(best)
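b1<-window(best, end=c(1994, 12))   # training set: 1991 to 1994
b2<-window(best, start=c(1995, 1))  # test set: 1995 onwards
h<-length(b2)                       # forecast horizon = size of the test set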
plot(best, type = 'n')
lines(b1)
lines(b2, col = "red")
abline(v = tsp(b2)[1], lty = 2, lwd = 2)  # dashed line where the test set starts
# Mean (based on the overall mean value)
f1 <- meanf(b1, h = h)
lines(f1$mean, lwd = 2, col = "yellow")
# Naive (based on the last value)
f2 <- rwf(b1, h = h)
lines(f2$mean, lwd = 2, col = "green")
# Drift (based on 1st and last value)
f3 <- rwf(b1, drift = TRUE, h = h)
lines(f3$mean, lwd = 2, col = "orange")
# Seasonal naive forecast
f4 <- snaive(b1, h = h)
lines(f4$mean, lwd = 2, col = "blue")
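To make the four benchmarks concrete, here is a minimal base R sketch that computes each point forecast by hand. The quarterly `train` series is hypothetical, and the code only mirrors the usual textbook definitions of these methods; it is not taken from the forecast package internals.

```r
# Hypothetical quarterly training series (two full years), not the beer data.
train <- ts(c(10, 12, 14, 20, 11, 13, 15, 21), start = c(2000, 1), frequency = 4)
h <- 4                  # forecast horizon
n <- length(train)      # number of training observations
m <- frequency(train)   # seasonal period

# Mean method: every forecast equals the average of the training data.
mean_fc <- rep(mean(train), h)

# Naive method: every forecast equals the last observed value.
naive_fc <- rep(train[n], h)

# Drift method: extend the straight line through the first and last points.
slope <- (train[n] - train[1]) / (n - 1)
drift_fc <- train[n] + slope * seq_len(h)

# Seasonal naive: repeat the value of the same season from the last full cycle.
snaive_fc <- train[n - m + ((seq_len(h) - 1) %% m) + 1]

rbind(mean_fc, naive_fc, drift_fc, snaive_fc)
```

These values should line up with the `$mean` component returned by `meanf`, `rwf` (with and without `drift = TRUE`) and `snaive` on the same series.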
kable(accuracy(f1, b2)) # display accuracy for mean method
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U |
|---|---|---|---|---|---|---|---|---|
| Training set | 0.0000000 | 0.0008243 | 0.0006761 | -0.0000688 | 0.0680501 | 1.628164 | 0.4381281 | NA |
| Test set | -0.0005652 | 0.0008534 | 0.0006734 | -0.0569714 | 0.0678601 | 1.621680 | -0.4624799 | 0.7852037 |
kable(accuracy(f2, b2)) # display accuracy for naive method
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U |
|---|---|---|---|---|---|---|---|---|
| Training set | 0.0000128 | 0.0008598 | 0.0006630 | 0.0012535 | 0.0667417 | 1.596633 | -0.2099102 | NA |
| Test set | -0.0017945 | 0.0019050 | 0.0017945 | -0.1807940 | 0.1807940 | 4.321656 | -0.4624799 | 1.697863 |
kable(accuracy(f3, b2)) # display accuracy for drift method
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U |
|---|---|---|---|---|---|---|---|---|
| Training set | 0.0000000 | 0.0008597 | 0.0006616 | -0.0000387 | 0.0666047 | 1.593345 | -0.2099102 | NA |
| Test set | -0.0018522 | 0.0019607 | 0.0018522 | -0.1866123 | 0.1866123 | 4.460764 | -0.4343529 | 1.751872 |
kable(accuracy(f4, b2)) # display accuracy for seasonal method
| | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U |
|---|---|---|---|---|---|---|---|---|
| Training set | -0.0001523 | 0.0005277 | 0.0004152 | -0.0153448 | 0.0418132 | 1.000000 | -0.2479798 | NA |
| Test set | 0.0000392 | 0.0005297 | 0.0004480 | 0.0039347 | 0.0451176 | 1.078918 | -0.0906758 | 0.454126 |
COMMENT: The seasonal naive method is the one that minimizes the RMSE on the test set, with a value of 0.0005297 (measured on the Box-Cox transformed scale). The accuracy measures confirm our intuition about the seasonal structure of the data, so the seasonal naive method is the best of the four benchmarks for our prediction task.
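For readers who want to see what sits behind the RMSE, MAE and MASE columns, the following base R sketch computes them by hand. All numbers are hypothetical, and the MASE scaling shown (in-sample MAE of the seasonal naive method) is the convention for seasonal data as I understand it, so treat it as an assumption rather than a restatement of the package code.

```r
# Hypothetical quarterly data, not the beer series.
train  <- ts(c(10, 12, 14, 20, 11, 13, 15, 21), start = c(2000, 1), frequency = 4)
actual <- c(12, 14, 15, 23)   # held-out observations
fc     <- c(11, 13, 15, 21)   # seasonal naive forecasts for the same periods

err  <- actual - fc           # forecast errors
rmse <- sqrt(mean(err^2))     # root mean squared error
mae  <- mean(abs(err))        # mean absolute error

# MASE: test MAE divided by the in-sample MAE of the seasonal naive method.
scale_mae <- mean(abs(diff(train, lag = frequency(train))))
mase <- mae / scale_mae

c(RMSE = rmse, MAE = mae, MASE = mase)
```

Under this scaling the seasonal naive method has a training-set MASE of exactly 1, which is what the last table shows.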
res <- residuals(f4)
plot(res)
hist(res, breaks = "FD", col = "lightgreen")
acf(res, na.action = na.omit)
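The bell shape of the histogram suggests that the residuals are approximately normal. Whether they are uncorrelated is a separate question, answered by the ACF plot rather than the histogram: the residuals can be treated as uncorrelated if the spikes stay within the significance bounds.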
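Visual checks can be complemented with formal tests. The sketch below uses two functions from base R's stats package, `Box.test` (Ljung-Box test for autocorrelation) and `shapiro.test` (normality). The residuals here are simulated stand-ins, and the choice of `lag = 12` is an assumption suited to monthly data.

```r
# Simulated stand-in residuals (hypothetical); with the real data one would
# pass na.omit(res), since seasonal naive residuals start with missing values.
set.seed(42)
toy_res <- rnorm(48)

# Ljung-Box test: a large p-value gives no evidence of autocorrelation.
lb <- Box.test(toy_res, lag = 12, type = "Ljung-Box")

# Shapiro-Wilk test: a large p-value gives no evidence against normality.
sw <- shapiro.test(toy_res)

c(ljung_box_p = lb$p.value, shapiro_p = sw$p.value)
```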
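# Linear model with a trend term only
# (fitted on the full transformed series, so the forecast extends past the last observation)
fit<-tslm(best~trend)
f<-forecast(fit, h=h)
plot(f)
acf(residuals(f))
# Linear model with a trend term plus seasonal dummy variables
fit2<-tslm(best~trend+season)
f2<-forecast(fit2, h=h)   # note: this overwrites the naive forecast f2 from above
plot(f2)
acf(residuals(f2))
pacf(residuals(f2))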
The ACF and PACF plots of the residuals also suggest that, once the overall trend and the seasonal structure of the data are modelled, little autocorrelation remains.
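To show what a trend-plus-season regression amounts to, here is a rough base R equivalent using `lm`: a linear time index plus one dummy variable per month. The simulated series `y` is hypothetical, and the claim that `tslm` builds its `trend` and `season` terms this way reflects my understanding rather than anything stated in the source.

```r
# Hypothetical monthly series: linear trend, December bump, random noise.
set.seed(7)
n <- 48
y <- ts(50 + 0.5 * seq_len(n) + rep(c(rep(0, 11), 10), 4) + rnorm(n),
        start = c(1991, 1), frequency = 12)

trend  <- seq_along(y)                   # time index 1, 2, ..., n
season <- factor(as.integer(cycle(y)))   # calendar month as a categorical variable

# Ordinary least squares with a trend and eleven monthly dummies.
toy_fit <- lm(as.numeric(y) ~ trend + season)
coef(toy_fit)["trend"]

# Residual autocorrelations, without drawing the plot.
toy_acf <- acf(residuals(toy_fit), plot = FALSE)
```

The estimated trend coefficient should be close to the 0.5 used to simulate the series, and the residual autocorrelations can be inspected through `toy_acf$acf`.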