For this discussion I decided to use a data set containing the annual water use in New York City from 1898 to 1968, measured in liters per capita per day. My approach was to fit three models. The first is a simple exponential smoothing (SES) model that serves as a baseline. From there we evaluate a Holt model and then use the ets() function to select the model form and parameters for a final model.

library(readxl)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.5.3
library(forecast)
## Warning: package 'forecast' was built under R version 3.5.3
data <- read_excel("C:/Users/Kevin/Desktop/Data analytics cert/Predictive Analytics/Discussion 3/annual-water-use-in-new-york-cit.xlsx")

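The first thing we see is that the leading rows of the imported sheet (the first 12 here) are not observations, so we drop them and keep only the second column, which holds the water-use values. We then convert that column into a time series starting in 1898, which is the object we will use for smoothing.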

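# Drop the 12 leading non-data rows and keep only the second column (the values)
data <- data[-(1:12), 2]
# Annual series starting in 1898 (frequency defaults to 1)
waterdata <- ts(data, start = 1898)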
autoplot(waterdata) + 
  xlab("Time in Years") +
  ylab("Liters of Water Consumed Per Capita Per Day") +
  ggtitle("Water consumption in New York (1898 - 1968)")
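
The row-dropping step above depends on the layout of the spreadsheet, so it can help to make the clean-up explicit and check it. The sketch below is only an illustration under stated assumptions: `raw`, its column names, its two header rows, and its values are hypothetical stand-ins, not the actual file.

```r
# Hypothetical stand-in for an imported sheet: two metadata rows, then values stored as text.
# Column names and numbers are made up for illustration; they are not the water data.
raw <- data.frame(
  year  = c("Title", "Units", "1898", "1899", "1900"),
  value = c("Annual water use", "liters per capita per day", "402.1", "410.5", "415.0"),
  stringsAsFactors = FALSE
)

n_header <- 2                                         # assumption: number of non-data rows
values <- as.numeric(raw$value[-seq_len(n_header)])   # coerce the text column to numbers
stopifnot(!anyNA(values))                             # fail loudly if a header row slipped through

water_ts <- ts(values, start = 1898, frequency = 1)   # annual series
water_ts
```

Pulling the column out as a plain numeric vector before calling ts() gives a univariate series, and the stopifnot() check catches the case where too few rows were dropped.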

Simple Exponential Smoothing

simple <- ses(waterdata, h = 10)
round(accuracy(simple),6)
##                    ME     RMSE      MAE      MPE     MAPE     MASE
## Training set 2.645226 24.08614 16.20008 0.414135 3.337943 0.987551
##                   ACF1
## Training set -0.006994
autoplot(simple) +
  autolayer(fitted(simple), series = "Fitted") +
  xlab("Time in Years") +
  ylab("Liters of Water Consumed Per Capita Per Day") +
  ggtitle("Simple Exponential Smoothing")
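
For intuition, the recursion behind simple exponential smoothing can be sketched in a few lines of base R. This is a simplified illustration and not the forecast package's implementation: ses() estimates alpha and the initial level by optimization, whereas the hypothetical helper `ses_level()` below takes alpha as given and, as an assumption, starts the level at the first observation.

```r
# Hypothetical helper (not part of the forecast package).
# Returns the smoothed level after each observation.
ses_level <- function(y, alpha) {
  stopifnot(alpha > 0, alpha <= 1, length(y) >= 1)
  level <- numeric(length(y))
  level[1] <- y[1]  # assumption: initialize the level at the first observation
  for (t in seq_along(y)[-1]) {
    # New level = weighted average of the newest observation and the previous level
    level[t] <- alpha * y[t] + (1 - alpha) * level[t - 1]
  }
  level
}

y  <- c(10, 12, 11, 13)        # illustrative values, not from the water data
lv <- ses_level(y, alpha = 0.5)
lv
tail(lv, 1)                    # SES forecasts this last level for every future horizon
```

Because SES carries no trend term, its forecast is flat at the last level, which is why it works as a baseline for the trended models that follow.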


Holt Model

(We use Holt's method rather than Holt-Winters because our data set is yearly (a frequency of 1) and therefore has no seasonal component. Holt-Winters requires a seasonal period greater than one, so we would need observations at a sub-annual frequency, for example quarterly data with a frequency of 4. Note that the call below sets damped = TRUE, so this is the damped-trend variant of Holt's method.)
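
The frequency argument can be checked directly on a ts object. The short sketch below uses made-up values (not the water data) to contrast an annual series with a quarterly one.

```r
# Illustrative values only; these are not observations from the data set.
annual    <- ts(c(402, 410, 415, 421), start = 1898)            # frequency defaults to 1
quarterly <- ts(1:8, start = c(1898, 1), frequency = 4)         # four observations per year

frequency(annual)     # 1: no seasonal period, so a seasonal model does not apply
frequency(quarterly)  # 4: a seasonal period that a Holt-Winters model could use
```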

hlt <- holt(waterdata, damped = TRUE)
round(accuracy(hlt),6)
##                    ME     RMSE      MAE       MPE     MAPE     MASE
## Training set 0.038738 24.01302 16.44368 -0.153404 3.411013 1.002401
##                  ACF1
## Training set 0.004517
autoplot(hlt) +
  autolayer(fitted(hlt), series = "Fitted") +
  xlab("Time in Years") +
  ylab("Liters of Water Consumed Per Capita Per Day") +
  ggtitle("Holt Exponential Smoothing")

ETS Selected Model

et <- ets(waterdata, model = "ZZZ", damped = TRUE)
summary(et)
## ETS(M,Ad,N) 
## 
## Call:
##  ets(y = waterdata, model = "ZZZ", damped = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.8684 
##     beta  = 1e-04 
##     phi   = 0.98 
## 
##   Initial states:
##     l = 408.8602 
##     b = 4.4417 
## 
##   sigma:  0.049
## 
##      AIC     AICc      BIC 
## 760.0770 761.3895 773.6531 
## 
## Training set error measures:
##                        ME    RMSE      MAE        MPE     MAPE     MASE
## Training set -0.003134886 23.9971 16.42358 -0.1639564 3.404723 1.001176
##                   ACF1
## Training set 0.0370366

From the results of the ETS model, we can see that it selected an ETS(M,Ad,N) model. That is, it chose a model with multiplicative errors, an additive damped trend, and no seasonal component, which we expected given that the data have a frequency of one. Because the call sets damped = TRUE, the damping was imposed rather than selected; what ets() chose was the error and trend type.
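
To see what the damping does to the forecasts, recall that with an additive damped trend the h-step-ahead point forecast adds (phi + phi^2 + ... + phi^h) times the slope to the last level, instead of h times the slope. The sketch below computes only that multiplier, using the phi reported in the summary above; it does not use the fitted level or slope, so it is an illustration of the formula rather than a reproduction of the forecasts.

```r
phi <- 0.98                      # damping parameter reported in the ets() summary
h   <- 1:10                      # forecast horizons, matching h = 10 used in this analysis

damped_mult   <- cumsum(phi^h)   # phi + phi^2 + ... + phi^h
undamped_mult <- h               # an undamped trend would multiply the slope by h

data.frame(h, damped_mult = round(damped_mult, 3), undamped_mult)
```

With phi this close to 1 the damping is mild over a 10-year horizon, but the multiplier still grows more slowly than h at every step.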

et_fit <- forecast(et, h=10)
autoplot(et_fit) +
  autolayer(fitted(et_fit), series = "Fitted") +
  xlab("Time in Years") +
  ylab("Liters of Water Consumed Per Capita Per Day") +
  ggtitle("ETS Chosen Exponential Smoothing")

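Comparing the three smoothing models, the ETS model had the smallest training-set RMSE (23.997, versus 24.013 for Holt and 24.086 for SES) and a mean error closest to zero, so it would likely be our chosen model. It is important to note that the differences are small: the main gain over SES is in bias (ME falls from 2.65 to roughly zero), SES actually has slightly lower MAE, MAPE, and MASE, and the improvement over the Holt model is only marginal.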
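
One way to check the comparison is to put the reported training-set measures side by side. The values below are copied from the outputs above; the data frame `acc` and the row labels are names introduced here for illustration.

```r
# Training-set measures copied from the accuracy() and summary() outputs above.
acc <- data.frame(
  model = c("SES", "Holt (damped)", "ETS(M,Ad,N)"),
  ME    = c(2.645226, 0.038738, -0.003134886),
  RMSE  = c(24.08614, 24.01302, 23.9971),
  MAE   = c(16.20008, 16.44368, 16.42358),
  MAPE  = c(3.337943, 3.411013, 3.404723),
  MASE  = c(0.987551, 1.002401, 1.001176),
  stringsAsFactors = FALSE
)

# For each measure where smaller is better, report which model has the minimum.
measures <- c("RMSE", "MAE", "MAPE", "MASE")
best <- vapply(acc[measures], function(x) acc$model[which.min(x)], character(1))
best

# Percentage reduction in RMSE relative to the SES baseline.
rmse_gain_pct <- 100 * (acc$RMSE[1] - acc$RMSE) / acc$RMSE[1]
setNames(round(rmse_gain_pct, 2), acc$model)
```

With the fitted objects still in the session, the same table could presumably be assembled from accuracy(simple), accuracy(hlt), and accuracy(et) instead of retyping the numbers.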