The purpose of this document is to outline exponential smoothing techniques for time series data in operations research. It covers the basics of exponential smoothing, along with trending, seasonality, and error estimation, and re-works specific examples from the Georgia Tech MGMT-6203 course. The contents here are NOT part of a homework assignment or exam, but rather a “re-working” of the examples used in the course content. Georgia Tech is very much concerned with academic integrity and honesty; my goal is not to cheat, but to help others understand the lectures.
The original work was done by Bob Myers, Lecturer, Georgia Institute of Technology, MGMT-6203 Data Analytics for Business, Summer 2020.
Assume that we have a set of time series data, and we would like to perform exponential smoothing on it. For standard exponential smoothing, we need a single smoothing parameter \(\alpha\) that indicates how much smoothing to apply. To see how this parameter works, consider the following equation:
\[F_{t+1}=F_t + \alpha(A_{t}-F_{t})\] In this equation, the \(F\) values are the forecasts, the \(A\) values are the actual measurements, and \(\alpha\) is the smoothing coefficient, with \(0\leq{\alpha}\leq1\). The next forecast in the sequence is the previous forecast, plus some portion of the previous period's error. Expanding the recursion shows where the name comes from: \(F_{t+1}=\alpha A_t + \alpha(1-\alpha)A_{t-1} + \alpha(1-\alpha)^2 A_{t-2} + \cdots\), so past observations receive exponentially decaying weights. To see this in action, consider the following data:
rm(list=ls())
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.1 ✓ purrr 0.3.4
## ✓ tibble 3.0.1 ✓ dplyr 1.0.0
## ✓ tidyr 1.1.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(dplyr)
data <- read_csv('es_demand.csv')
## Parsed with column specification:
## cols(
## Week = col_double(),
## Actual = col_double(),
## Forecast = col_double()
## )
data
## # A tibble: 10 x 3
## Week Actual Forecast
## <dbl> <dbl> <dbl>
## 1 1 820 820
## 2 2 775 NA
## 3 3 680 NA
## 4 4 655 NA
## 5 5 750 NA
## 6 6 802 NA
## 7 7 798 NA
## 8 8 689 NA
## 9 9 775 NA
## 10 10 NA NA
In the above, the Actual column holds the actual demand values \(A_t\), and the Forecast column holds the forecast values \(F_t\).
To start with exponential smoothing, we need to establish an initial forecast. In this example, we just use the actual for that period. In other words, we set \(F_1=A_1=820\). If we have a defined \(\alpha\), we can calculate the sequence of forecasts. One way to do that is shown in the code below.
alpha <- 0.2
new_data <- data
get_es <- function(new_data, alpha){
  # Walk forward in time: each forecast is the previous forecast
  # plus alpha times the previous period's forecast error
  for(t in seq(2, nrow(new_data))){
    a <- new_data$Actual[t-1]
    f <- new_data$Forecast[t-1]
    nextf <- f + alpha * (a - f)
    new_data$Forecast[t] <- nextf
  }
  return(new_data)
}
get_es(new_data, alpha)
## # A tibble: 10 x 3
## Week Actual Forecast
## <dbl> <dbl> <dbl>
## 1 1 820 820
## 2 2 775 820
## 3 3 680 811
## 4 4 655 785.
## 5 5 750 759.
## 6 6 802 757.
## 7 7 798 766.
## 8 8 689 772.
## 9 9 775 756.
## 10 10 NA 760.
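To verify one step by hand: \(F_3 = F_2 + \alpha(A_2 - F_2) = 820 + 0.2(775-820) = 811\), which matches the week-3 forecast in the table above.
As an aside, base R can produce the same kind of smoothing. The following is a minimal sketch (not part of the course material): stats::HoltWinters with beta and gamma disabled performs simple exponential smoothing, and assuming its default level initialization uses the first observation, as ours does, its one-step-ahead forecasts should line up with the table above.
hw <- HoltWinters(ts(na.omit(data$Actual)), alpha = 0.2, beta = FALSE, gamma = FALSE)
fitted(hw)[, "xhat"]  # one-step-ahead forecasts for weeks 2 through 9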
Consider the following data for soup sales.
kroger <- read_csv('kroger.csv')
## Parsed with column specification:
## cols(
## Month = col_character(),
## Actual = col_double(),
## Forecast = col_double(),
## Trend = col_double(),
## Fit = col_double()
## )
kroger
## # A tibble: 7 x 5
## Month Actual Forecast Trend Fit
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Jan 1325 1370 0 1370
## 2 Feb 1353 NA NA NA
## 3 March 1305 NA NA NA
## 4 Apr 1275 NA NA NA
## 5 May 1210 NA NA NA
## 6 Jun 1195 NA NA NA
## 7 Jul NA NA NA NA
Trending allows us to account for a persistent upward or downward drift in the data, which plain exponential smoothing lags behind. By including a trend term, we can forecast trending time series much more accurately. To account for trending, we use the following equations:
\[F_t=FIT_{t-1}+\alpha(A_{t-1}-FIT_{t-1})\] \[ T_t=T_{t-1}+\delta(F_t-FIT_{t-1})\] \[FIT_t=F_t+T_t\]
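We apply the equations in the order listed above. To trace the first update by hand with \(\alpha=0.8\) and \(\delta=0.5\): starting from the January row (\(A=1325\), \(FIT=1370\), \(T=0\)), we get \(F_{Feb} = 1370 + 0.8(1325-1370) = 1334\), \(T_{Feb} = 0 + 0.5(1334-1370) = -18\), and \(FIT_{Feb} = 1334 - 18 = 1316\).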
The following R code implements this for the data set above.
library(ggplot2)
alpha <- 0.8
delta <- 0.5
get_trend <- function(data, alpha, delta) {
  new_data <- data
  for(t in seq(2, nrow(new_data))){
    Actual <- new_data$Actual[t-1]
    Forecast <- new_data$Forecast[t-1]
    Trend <- new_data$Trend[t-1]
    Fit <- new_data$Fit[t-1]
    # Apply the three update equations, in order
    Forecast1 <- Fit + alpha * (Actual - Fit)
    Trend1 <- Trend + delta * (Forecast1 - Fit)
    Fit1 <- Forecast1 + Trend1
    new_data$Forecast[t] <- Forecast1
    new_data$Trend[t] <- Trend1
    new_data$Fit[t] <- Fit1
  }
  return(new_data)
}
es0 <- get_es(kroger, alpha=0)
es2 <- get_es(kroger, alpha=0.2)
es8 <- get_es(kroger, alpha=0.8)
tnd <- get_trend(kroger, alpha=.8, delta=.5)
graph_data <- data.frame(idx=seq(1,nrow(es2)), Actual=es2$Actual, es0=es0$Forecast, es2=es2$Forecast, es8=es8$Forecast, tnd=tnd$Forecast)
ggplot(graph_data) + geom_line(aes(x=idx, y=Actual, color="Actual")) +
  geom_line(aes(x=idx, y=es0, color="ES alpha=0")) +
  geom_line(aes(x=idx, y=es2, color="ES alpha=.2")) +
  geom_line(aes(x=idx, y=es8, color="ES alpha=.8")) +
  geom_line(aes(x=idx, y=tnd, color="Trending alpha=.8, delta=.5")) +
  ylim(1100, 1400)
## Warning: Removed 1 row(s) containing missing values (geom_path).
In the above plot, we can see that the trending line follows the actual line more closely than any of the three exponential smoothing lines.
Consider the following data…
icecream <- read_csv('icecream.csv')
## Parsed with column specification:
## cols(
## Year = col_double(),
## Month = col_character(),
## Sales = col_double(),
## Forecast = col_double()
## )
icecream
## # A tibble: 24 x 4
## Year Month Sales Forecast
## <dbl> <chr> <dbl> <dbl>
## 1 2018 January 10 10
## 2 2018 February 10 NA
## 3 2018 March 10 NA
## 4 2018 April 50 NA
## 5 2018 May 150 NA
## 6 2018 June 400 NA
## 7 2018 July 600 NA
## 8 2018 August 700 NA
## 9 2018 September 350 NA
## 10 2018 October 100 NA
## # … with 14 more rows
In order to use seasonality, we need one complete season of data to start with. The above file has two years of ice cream sales data. We will start with just the 2018 data and compute a seasonality index.
icecream_2018 <- icecream %>% filter(Year==2018)
Here we will consider the column “Sales” to be \(X\). The seasonality index of sales for period \(t\) is:
\[SI_t = \frac{X_t}{\bar{X}}\]
where \(\bar{X}\) is the mean of \(X\) over the season.
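As a quick check: 2018 sales total 2400 over 12 months, so \(\bar{X}=200\); June's index is therefore \(400/200=2\), and January's is \(10/200=0.05\), matching the SI column computed below.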
The following code will compute \(SI_t\) and append it to the data frame:
icecream_2018 <- icecream_2018 %>% mutate(SI=Sales/mean(icecream_2018$Sales))
icecream_2018
## # A tibble: 12 x 5
## Year Month Sales Forecast SI
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 2018 January 10 10 0.05
## 2 2018 February 10 NA 0.05
## 3 2018 March 10 NA 0.05
## 4 2018 April 50 NA 0.25
## 5 2018 May 150 NA 0.75
## 6 2018 June 400 NA 2
## 7 2018 July 600 NA 3
## 8 2018 August 700 NA 3.5
## 9 2018 September 350 NA 1.75
## 10 2018 October 100 NA 0.5
## 11 2018 November 10 NA 0.05
## 12 2018 December 10 NA 0.05
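Note that by construction the indices average to 1 over the season, since \(\frac{1}{12}\sum_t X_t/\bar{X}=1\). A quick sanity check:
mean(icecream_2018$SI)  # should print 1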
We could, if we wanted, compute exponentially smoothed forecasts of the sales data over this one-year period. However, we cannot incorporate seasonality into our models until we have one complete cycle (a year, in this case). We will therefore skip exponential smoothing over this first year and turn our attention to computing the seasonally adjusted forecast for the next year.
Once we have computed the seasonality for a year, we can use that information to apply seasonality to future years.
We will now look at the 2019 data, and apply the seasonality index as follows:
icecream_2019 <- icecream %>% filter(Year==2019)
icecream_2019$SI <- icecream_2018$SI
icecream_2019
## # A tibble: 12 x 5
## Year Month Sales Forecast SI
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 2019 January 9 NA 0.05
## 2 2019 February 8 NA 0.05
## 3 2019 March 11 NA 0.05
## 4 2019 April 53 NA 0.25
## 5 2019 May 160 NA 0.75
## 6 2019 June 390 NA 2
## 7 2019 July 590 NA 3
## 8 2019 August 720 NA 3.5
## 9 2019 September 370 NA 1.75
## 10 2019 October 120 NA 0.5
## 11 2019 November 12 NA 0.05
## 12 2019 December 8 NA 0.05
Next, we will create two additional columns for de-seasonalized sales and de-seasonalized forecasts, as follows. We also need an initial forecast to seed the recursion; here the January 2019 forecast is given as 10. Finally, we rename the columns so that they agree with the convention used in the lectures.
icecream_2019$Sales_de = rep(NA,12)
icecream_2019$Forecast_de = rep(NA,12)
# Given
icecream_2019$Forecast[1] <- 10
names(icecream_2019)<-c('Year','Month','Actual','Forecast','SI','Actual_de','Forecast_de')
icecream_2019
## # A tibble: 12 x 7
## Year Month Actual Forecast SI Actual_de Forecast_de
## <dbl> <chr> <dbl> <dbl> <dbl> <lgl> <lgl>
## 1 2019 January 9 10 0.05 NA NA
## 2 2019 February 8 NA 0.05 NA NA
## 3 2019 March 11 NA 0.05 NA NA
## 4 2019 April 53 NA 0.25 NA NA
## 5 2019 May 160 NA 0.75 NA NA
## 6 2019 June 390 NA 2 NA NA
## 7 2019 July 590 NA 3 NA NA
## 8 2019 August 720 NA 3.5 NA NA
## 9 2019 September 370 NA 1.75 NA NA
## 10 2019 October 120 NA 0.5 NA NA
## 11 2019 November 12 NA 0.05 NA NA
## 12 2019 December 8 NA 0.05 NA NA
Now we apply the following, starting with the first row.
\[ Actual\_de_t = \frac{Actual_t}{SI_t}\] \[ Forecast\_de_t = \frac{Forecast_t}{SI_t} \]
Next, we compute the exponentially smoothed forecast based on these two values, and then re-seasonalize it using the SI of the next row:
\[ Forecast_{t+1}=\left(Forecast\_de_t + \alpha(Actual\_de_t-Forecast\_de_t)\right)\times SI_{t+1} \] We are now ready to compute the Actual_de and Forecast_de for the next row.
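To trace the first step by hand with \(\alpha=0.5\): for January 2019, \(Actual\_de = 9/0.05 = 180\) and \(Forecast\_de = 10/0.05 = 200\); the smoothed value is \(200 + 0.5(180-200) = 190\), and re-seasonalizing gives the February forecast \(190 \times 0.05 = 9.5\).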
The following code implements this in R…
alpha <- 0.5
for (t in seq(1, nrow(icecream_2019))){
  # De-seasonalize
  deA <- icecream_2019$Actual[t]/icecream_2019$SI[t]
  icecream_2019$Actual_de[t] <- deA
  deF <- icecream_2019$Forecast[t]/icecream_2019$SI[t]
  icecream_2019$Forecast_de[t] <- deF
  # ES Step
  es <- deF + alpha * (deA - deF)
  # Re-seasonalize and assign to next row (if it exists)
  if(t < nrow(icecream_2019)){
    icecream_2019$Forecast[t+1] <- es * icecream_2019$SI[t+1]
  }
}
icecream_2019
## # A tibble: 12 x 7
## Year Month Actual Forecast SI Actual_de Forecast_de
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2019 January 9 10 0.05 180 200
## 2 2019 February 8 9.5 0.05 160 190
## 3 2019 March 11 8.75 0.05 220 175
## 4 2019 April 53 49.4 0.25 212 198.
## 5 2019 May 160 154. 0.75 213. 205.
## 6 2019 June 390 418. 2 195 209.
## 7 2019 July 590 606. 3 197. 202.
## 8 2019 August 720 698. 3.5 206. 199.
## 9 2019 September 370 354. 1.75 211. 203.
## 10 2019 October 120 103. 0.5 240 207.
## 11 2019 November 12 11.2 0.05 240 223.
## 12 2019 December 8 11.6 0.05 160 232.
Putting both datasets together (and restoring the original column names) we have the following:
icecream_forecast <- icecream_2018
icecream_forecast$Forecast <- rep(NA, 12)  # no forecasts during the seed year
icecream_forecast$Sales_de <- rep(NA, 12)
icecream_forecast$Forecast_de <- rep(NA, 12)
icecream_forecast <- icecream_forecast %>% dplyr::select(Year, Month, Sales, Forecast, SI, Sales_de, Forecast_de)
names(icecream_2019) <- c('Year','Month','Sales','Forecast','SI','Sales_de','Forecast_de')
icecream_forecast <- rbind(icecream_forecast, icecream_2019)
as.data.frame(icecream_forecast)
## Year Month Sales Forecast SI Sales_de Forecast_de
## 1 2018 January 10 NA 0.05 NA NA
## 2 2018 February 10 NA 0.05 NA NA
## 3 2018 March 10 NA 0.05 NA NA
## 4 2018 April 50 NA 0.25 NA NA
## 5 2018 May 150 NA 0.75 NA NA
## 6 2018 June 400 NA 2.00 NA NA
## 7 2018 July 600 NA 3.00 NA NA
## 8 2018 August 700 NA 3.50 NA NA
## 9 2018 September 350 NA 1.75 NA NA
## 10 2018 October 100 NA 0.50 NA NA
## 11 2018 November 10 NA 0.05 NA NA
## 12 2018 December 10 NA 0.05 NA NA
## 13 2019 January 9 10.00000 0.05 180.0000 200.0000
## 14 2019 February 8 9.50000 0.05 160.0000 190.0000
## 15 2019 March 11 8.75000 0.05 220.0000 175.0000
## 16 2019 April 53 49.37500 0.25 212.0000 197.5000
## 17 2019 May 160 153.56250 0.75 213.3333 204.7500
## 18 2019 June 390 418.08333 2.00 195.0000 209.0417
## 19 2019 July 590 606.06250 3.00 196.6667 202.0208
## 20 2019 August 720 697.70312 3.50 205.7143 199.3438
## 21 2019 September 370 354.42578 1.75 211.4286 202.5290
## 22 2019 October 120 103.48940 0.50 240.0000 206.9788
## 23 2019 November 12 11.17447 0.05 240.0000 223.4894
## 24 2019 December 8 11.58723 0.05 160.0000 231.7447
One objective in analyzing time series data is to measure forecast error. The simplest way to compute forecast error is with the formula:
\[e_t = A_t - F_t\] where \(e_t\) is the error, and \(A_t\) and \(F_t\) are the actual and forecast values for the time series.
In addition to this simple measure, we can also compute the following error metrics:
RSFE - Running Sum of Forecast Error
\(RSFE=\sum{(A_i-F_i)} = \sum{e_i}\)
MFE - Mean Forecast Error (Bias)
\(MFE=\frac{\sum(A_i-F_i)}{n}=\frac{RSFE}{n}\)
MAD - Mean Absolute Deviation
\(MAD=\frac{\sum{|A_i-F_i|}}{n}\)
TS - Tracking Signal
\(TS = \frac{RSFE}{MAD}\)
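As a small worked example using the first two rows of the trend-adjusted data below: the errors are \(1325-1370=-45\) and \(1353-1316=37\), so after two periods \(RSFE=-8\), \(MFE=-4\), \(MAD=(45+37)/2=41\), and \(TS=-8/41\approx-0.20\).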
We will start with the trend-adjusted forecast computed above.
data <- tnd %>% select('Month','Actual','Fit')
names(data) <- c('Month','Actual','Forecast')
data <- data[1:6,]
data
## # A tibble: 6 x 3
## Month Actual Forecast
## <chr> <dbl> <dbl>
## 1 Jan 1325 1370
## 2 Feb 1353 1316
## 3 March 1305 1342.
## 4 Apr 1275 1294.
## 5 May 1210 1253.
## 6 Jun 1195 1176.
The following function will calculate the various metrics for a given set of data.
calc_error <- function(data) {
  n <- nrow(data)
  error <- data$Actual - data$Forecast  # per-period forecast error e_t
  rsfe <- sum(error)                    # running sum of forecast error
  mfe <- rsfe / n                       # mean forecast error (bias)
  mad <- sum(abs(error)) / n            # mean absolute deviation
  ts <- rsfe / mad                      # tracking signal
  return(data.frame(rsfe=rsfe, mfe=mfe, mad=mad, ts=ts))
}
calc_error(data)
## rsfe mfe mad ts
## 1 -88.2128 -14.70213 33.52987 -2.630872
The general rule for these metrics is to investigate when the tracking signal falls outside of the range (-4,4). In this case, the TS is within the bounds.
However, if we look at the exponential smoothing model…
data <- es2 %>% select('Month','Actual','Forecast')
data <- data[1:6,]
new_data <- data
calc_error(new_data)
## rsfe mfe mad ts
## 1 -418.7888 -69.79813 69.79813 -6
Here we can see that the tracking signal falls outside the \((-4,4)\) band.
Note that these calculations can be done on an ongoing basis, as follows:
# Recompute the tracking signal cumulatively through each period
TS <- data.frame(ts=NA)
for(x in seq(2, nrow(new_data))){
  td <- new_data[1:x,]
  err <- calc_error(td)$ts
  TS <- rbind(TS, data.frame(ts=err))
}
new_data$ts <- TS$ts
new_data
## # A tibble: 6 x 4
## Month Actual Forecast ts
## <chr> <dbl> <dbl> <dbl>
## 1 Jan 1325 1370 NA
## 2 Feb 1353 1361 -2
## 3 March 1305 1359. -3
## 4 Apr 1275 1349. -4
## 5 May 1210 1334. -5
## 6 Jun 1195 1309. -6
In May the tracking signal clearly fails the \(|TS|<4\) test, with April already sitting at the \(-4\) boundary. As a result, the process should be investigated further.
This completes this write-up on time series analysis with exponential smoothing, trending, seasonality (with some hand-waving), and error estimation.