The purpose of this document is to outline exponential smoothing techniques for time series data in operations research. It covers the basics of exponential smoothing, along with trending, seasonality, and error estimation, and re-works specific examples from the Georgia Tech MGMT-6203 course. The contents here are NOT part of a homework assignment or exam, but rather a “re-working” of the examples used in the course content. Georgia Tech is very much concerned with academic integrity and honesty; my goal is not to cheat, but to help others understand the lectures.
The original work was done by Bob Myers, Lecturer, Georgia Institute of Technology, MGMT-6203 Data Analytics for Business, Summer 2020.
Assume that we have a set of time series data, and we would like to perform exponential smoothing on it. For standard exponential smoothing, we need a single smoothing parameter \(\alpha\) that indicates how much smoothing to apply. To see how this parameter works, consider the following equation:
\[F_{t+1}=F_t + \alpha(A_{t}-F_{t})\] In this equation, the \(F\) values are the forecasts, the \(A\) values are the actual measurements, and \(\alpha\) is the smoothing coefficient, with \(0\leq{\alpha}\leq1\). The next forecast in the sequence is the previous forecast, plus some portion of the previous period's error. Expanding the recursion shows where the name comes from: \(F_{t+1}=\alpha A_t + \alpha(1-\alpha)A_{t-1} + \alpha(1-\alpha)^2 A_{t-2} + \cdots\), so past observations receive exponentially decaying weights. To see this in action, consider the following data:
rm(list=ls())
library(tidyverse)
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.1 ✓ purrr 0.3.4
## ✓ tibble 3.0.1 ✓ dplyr 1.0.0
## ✓ tidyr 1.1.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(dplyr)
data <- read_csv('es_demand.csv')
## Parsed with column specification:
## cols(
## Week = col_double(),
## Actual = col_double(),
## Forecast = col_double()
## )
data
## # A tibble: 10 x 3
## Week Actual Forecast
## <dbl> <dbl> <dbl>
## 1 1 820 820
## 2 2 775 NA
## 3 3 680 NA
## 4 4 655 NA
## 5 5 750 NA
## 6 6 802 NA
## 7 7 798 NA
## 8 8 689 NA
## 9 9 775 NA
## 10 10 NA NA
In the above, the Actual column holds the actual demand values \(A_t\), and the Forecast column holds the forecast values \(F_t\).
To start with exponential smoothing, we need to establish an initial forecast. In this example, we just use the actual for that period. In other words, we set \(F_1=A_1=820\). If we have a defined \(\alpha\), we can calculate the sequence of forecasts. One way to do that is shown in the code below.
alpha <- 0.2
new_data <- data
get_es <- function(new_data, alpha){
  # Walk forward in time: each forecast is the previous forecast
  # plus alpha times the previous period's forecast error
  for(t in seq(2, nrow(new_data))){
    a <- new_data$Actual[t-1]
    f <- new_data$Forecast[t-1]
    nextf <- f + alpha * (a - f)
    new_data$Forecast[t] <- nextf
  }
  return(new_data)
}
get_es(new_data, alpha)
## # A tibble: 10 x 3
## Week Actual Forecast
## <dbl> <dbl> <dbl>
## 1 1 820 820
## 2 2 775 820
## 3 3 680 811
## 4 4 655 785.
## 5 5 750 759.
## 6 6 802 757.
## 7 7 798 766.
## 8 8 689 772.
## 9 9 775 756.
## 10 10 NA 760.
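To verify one step by hand: \(F_3 = F_2 + \alpha(A_2 - F_2) = 820 + 0.2(775-820) = 811\), which matches the week-3 forecast in the table above.
As an aside, base R can produce the same kind of smoothing. The following is a minimal sketch (not part of the course material): stats::HoltWinters with beta and gamma disabled performs simple exponential smoothing, and assuming its default level initialization uses the first observation, as ours does, its one-step-ahead forecasts should line up with the table above.
hw <- HoltWinters(ts(na.omit(data$Actual)), alpha = 0.2, beta = FALSE, gamma = FALSE)
fitted(hw)[, "xhat"]  # one-step-ahead forecasts for weeks 2 through 9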
Consider the following data for soup sales.
kroger <- read_csv('kroger.csv')
## Parsed with column specification:
## cols(
## Month = col_character(),
## Actual = col_double(),
## Forecast = col_double(),
## Trend = col_double(),
## Fit = col_double()
## )
kroger
## # A tibble: 7 x 5
## Month Actual Forecast Trend Fit
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Jan 1325 1370 0 1370
## 2 Feb 1353 NA NA NA
## 3 March 1305 NA NA NA
## 4 Apr 1275 NA NA NA
## 5 May 1210 NA NA NA
## 6 Jun 1195 NA NA NA
## 7 Jul NA NA NA NA
Trending allows us to account for a persistent upward or downward drift in the data, which plain exponential smoothing lags behind. By including a trend term, we can forecast trending time series much more accurately. To account for trending, we use the following equations:
\[F_t=FIT_{t-1}+\alpha(A_{t-1}-FIT_{t-1})\] \[ T_t=T_{t-1}+\delta(F_t-FIT_{t-1})\] \[FIT_t=F_t+T_t\]
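We apply the equations in the order listed above. To trace the first update by hand with \(\alpha=0.8\) and \(\delta=0.5\): starting from the January row (\(A=1325\), \(FIT=1370\), \(T=0\)), we get \(F_{Feb} = 1370 + 0.8(1325-1370) = 1334\), \(T_{Feb} = 0 + 0.5(1334-1370) = -18\), and \(FIT_{Feb} = 1334 - 18 = 1316\).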
The following R code implements this for the data set above.
library(ggplot2)
alpha <- 0.8
delta <- 0.5
get_trend <- function(data, alpha, delta) {
  new_data <- data
  for(t in seq(2, nrow(new_data))){
    Actual <- new_data$Actual[t-1]
    Forecast <- new_data$Forecast[t-1]
    Trend <- new_data$Trend[t-1]
    Fit <- new_data$Fit[t-1]
    # Apply the three update equations, in order
    Forecast1 <- Fit + alpha * (Actual - Fit)
    Trend1 <- Trend + delta * (Forecast1 - Fit)
    Fit1 <- Forecast1 + Trend1
    new_data$Forecast[t] <- Forecast1
    new_data$Trend[t] <- Trend1
    new_data$Fit[t] <- Fit1
  }
  return(new_data)
}
es0 <- get_es(kroger, alpha=0)
es2 <- get_es(kroger, alpha=0.2)
es8 <- get_es(kroger, alpha=0.8)
tnd <- get_trend(kroger, alpha=.8, delta=.5)
graph_data <- data.frame(idx=seq(1,nrow(es2)), Actual=es2$Actual, es0=es0$Forecast, es2=es2$Forecast, es8=es8$Forecast, tnd=tnd$Forecast)
ggplot(graph_data) + geom_line(aes(x=idx, y=Actual, color="Actual")) +
  geom_line(aes(x=idx, y=es0, color="ES alpha=0")) +
  geom_line(aes(x=idx, y=es2, color="ES alpha=.2")) +
  geom_line(aes(x=idx, y=es8, color="ES alpha=.8")) +
  geom_line(aes(x=idx, y=tnd, color="Trending alpha=.8, delta=.5")) +
  ylim(1100, 1400)
## Warning: Removed 1 row(s) containing missing values (geom_path).
In the above plot, we can see that the trending line follows the actual line more closely than any of the three exponential smoothing lines.
Consider the following data…
icecream <- read_csv('icecream.csv')
## Parsed with column specification:
## cols(
## Year = col_double(),
## Month = col_character(),
## Sales = col_double(),
## Forecast = col_double()
## )
icecream
## # A tibble: 24 x 4
## Year Month Sales Forecast
## <dbl> <chr> <dbl> <dbl>
## 1 2018 January 10 10
## 2 2018 February 10 NA
## 3 2018 March 10 NA
## 4 2018 April 50 NA
## 5 2018 May 150 NA
## 6 2018 June 400 NA
## 7 2018 July 600 NA
## 8 2018 August 700 NA
## 9 2018 September 350 NA
## 10 2018 October 100 NA
## # … with 14 more rows
In order to use seasonality, we need one complete season of data to start with. The above file has two years of ice cream sales data. We will start with just the 2018 data and compute a seasonality index.
icecream_2018 <- icecream %>% filter(Year==2018)
Here we will consider the column “Sales” to be \(X\). The seasonality index of sales for period \(t\) is:
\[SI_t = \frac{X_t}{\bar{X}}\]
where \(\bar{X}\) is the mean of \(X\) over the season.
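As a quick check: 2018 sales total 2400 over 12 months, so \(\bar{X}=200\); June's index is therefore \(400/200=2\), and January's is \(10/200=0.05\), matching the SI column computed below.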
The following code will compute \(SI_t\) and append it to the data frame:
icecream_2018 <- icecream_2018 %>% mutate(SI=Sales/mean(icecream_2018$Sales))
icecream_2018
## # A tibble: 12 x 5
## Year Month Sales Forecast SI
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 2018 January 10 10 0.05
## 2 2018 February 10 NA 0.05
## 3 2018 March 10 NA 0.05
## 4 2018 April 50 NA 0.25
## 5 2018 May 150 NA 0.75
## 6 2018 June 400 NA 2
## 7 2018 July 600 NA 3
## 8 2018 August 700 NA 3.5
## 9 2018 September 350 NA 1.75
## 10 2018 October 100 NA 0.5
## 11 2018 November 10 NA 0.05
## 12 2018 December 10 NA 0.05
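Note that by construction the indices average to 1 over the season, since \(\frac{1}{12}\sum_t X_t/\bar{X}=1\). A quick sanity check:
mean(icecream_2018$SI)  # should print 1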
We could, if we wanted, compute exponentially smoothed forecasts of the sales data over this one-year period. However, we cannot incorporate seasonality into our models until we have one complete cycle (a year, in this case). We will therefore skip exponential smoothing over this first year and turn our attention to computing the seasonally adjusted forecast for the next year.
Once we have computed the seasonality for a year, we can use that information to apply seasonality to future years.
We will now look at the 2019 data, and apply the seasonality index as follows:
icecream_2019 <- icecream %>% filter(Year==2019)
icecream_2019$SI <- icecream_2018$SI
icecream_2019
## # A tibble: 12 x 5
## Year Month Sales Forecast SI
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 2019 January 9 NA 0.05
## 2 2019 February 8 NA 0.05
## 3 2019 March 11 NA 0.05
## 4 2019 April 53 NA 0.25
## 5 2019 May 160 NA 0.75
## 6 2019 June 390 NA 2
## 7 2019 July 590 NA 3
## 8 2019 August 720 NA 3.5
## 9 2019 September 370 NA 1.75
## 10 2019 October 120 NA 0.5
## 11 2019 November 12 NA 0.05
## 12 2019 December 8 NA 0.05
Next, we will create two additional columns for de-seasonalized sales and de-seasonalized forecasts, as follows. We also need an initial forecast to seed the recursion; here the January 2019 forecast is given as 10. Finally, we rename the columns so that they agree with the convention used in the lectures.
icecream_2019$Sales_de = rep(NA,12)
icecream_2019$Forecast_de = rep(NA,12)
# Given
icecream_2019$Forecast[1] <- 10
names(icecream_2019)<-c('Year','Month','Actual','Forecast','SI','Actual_de','Forecast_de')
icecream_2019
## # A tibble: 12 x 7
## Year Month Actual Forecast SI Actual_de Forecast_de
## <dbl> <chr> <dbl> <dbl> <dbl> <lgl> <lgl>
## 1 2019 January 9 10 0.05 NA NA
## 2 2019 February 8 NA 0.05 NA NA
## 3 2019 March 11 NA 0.05 NA NA
## 4 2019 April 53 NA 0.25 NA NA
## 5 2019 May 160 NA 0.75 NA NA
## 6 2019 June 390 NA 2 NA NA
## 7 2019 July 590 NA 3 NA NA
## 8 2019 August 720 NA 3.5 NA NA
## 9 2019 September 370 NA 1.75 NA NA
## 10 2019 October 120 NA 0.5 NA NA
## 11 2019 November 12 NA 0.05 NA NA
## 12 2019 December 8 NA 0.05 NA NA
Now we apply the following, starting with the first row.
\[ Actual\_de_t = \frac{Actual_t}{SI_t}\] \[ Forecast\_de_t = \frac{Forecast_t}{SI_t} \]
Next, we compute the exponentially smoothed forecast based on these two values, and then re-seasonalize it using the SI of the next row:
\[ Forecast_{t+1}=\left(Forecast\_de_t + \alpha(Actual\_de_t-Forecast\_de_t)\right)\times SI_{t+1} \] We are now ready to compute the Actual_de and Forecast_de for the next row.
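To trace the first step by hand with \(\alpha=0.5\): for January 2019, \(Actual\_de = 9/0.05 = 180\) and \(Forecast\_de = 10/0.05 = 200\); the smoothed value is \(200 + 0.5(180-200) = 190\), and re-seasonalizing gives the February forecast \(190 \times 0.05 = 9.5\).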
The following code implements this in R…
alpha <- 0.5
for (t in seq(1, nrow(icecream_2019))){
  # De-seasonalize
  deA <- icecream_2019$Actual[t]/icecream_2019$SI[t]
  icecream_2019$Actual_de[t] <- deA
  deF <- icecream_2019$Forecast[t]/icecream_2019$SI[t]
  icecream_2019$Forecast_de[t] <- deF
  # ES Step
  es <- deF + alpha * (deA - deF)
  # Re-seasonalize and assign to next row (if it exists)
  if(t < nrow(icecream_2019)){
    icecream_2019$Forecast[t+1] <- es * icecream_2019$SI[t+1]
  }
}
icecream_2019
## # A tibble: 12 x 7
## Year Month Actual Forecast SI Actual_de Forecast_de
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2019 January 9 10 0.05 180 200
## 2 2019 February 8 9.5 0.05 160 190
## 3 2019 March 11 8.75 0.05 220 175
## 4 2019 April 53 49.4 0.25 212 198.
## 5 2019 May 160 154. 0.75 213. 205.
## 6 2019 June 390 418. 2 195 209.
## 7 2019 July 590 606. 3 197. 202.
## 8 2019 August 720 698. 3.5 206. 199.
## 9 2019 September 370 354. 1.75 211. 203.
## 10 2019 October 120 103. 0.5 240 207.
## 11 2019 November 12 11.2 0.05 240 223.
## 12 2019 December 8 11.6 0.05 160 232.
Putting both datasets together (and restoring the original column names) we have the following:
icecream_forecast <- icecream_2018
icecream_forecast$Forecast <- rep(NA, 12)  # no forecasts during the seed year
icecream_forecast$Sales_de <- rep(NA, 12)
icecream_forecast$Forecast_de <- rep(NA, 12)
icecream_forecast <- icecream_forecast %>% dplyr::select(Year, Month, Sales, Forecast, SI, Sales_de, Forecast_de)
names(icecream_2019) <- c('Year','Month','Sales','Forecast','SI','Sales_de','Forecast_de')
icecream_forecast <- rbind(icecream_forecast, icecream_2019)
as.data.frame(icecream_forecast)
## Year Month Sales Forecast SI Sales_de Forecast_de
## 1 2018 January 10 NA 0.05 NA NA
## 2 2018 February 10 NA 0.05 NA NA
## 3 2018 March 10 NA 0.05 NA NA
## 4 2018 April 50 NA 0.25 NA NA
## 5 2018 May 150 NA 0.75 NA NA
## 6 2018 June 400 NA 2.00 NA NA
## 7 2018 July 600 NA 3.00 NA NA
## 8 2018 August 700 NA 3.50 NA NA
## 9 2018 September 350 NA 1.75 NA NA
## 10 2018 October 100 NA 0.50 NA NA
## 11 2018 November 10 NA 0.05 NA NA
## 12 2018 December 10 NA 0.05 NA NA
## 13 2019 January 9 10.00000 0.05 180.0000 200.0000
## 14 2019 February 8 9.50000 0.05 160.0000 190.0000
## 15 2019 March 11 8.75000 0.05 220.0000 175.0000
## 16 2019 April 53 49.37500 0.25 212.0000 197.5000
## 17 2019 May 160 153.56250 0.75 213.3333 204.7500
## 18 2019 June 390 418.08333 2.00 195.0000 209.0417
## 19 2019 July 590 606.06250 3.00 196.6667 202.0208
## 20 2019 August 720 697.70312 3.50 205.7143 199.3438
## 21 2019 September 370 354.42578 1.75 211.4286 202.5290
## 22 2019 October 120 103.48940 0.50 240.0000 206.9788
## 23 2019 November 12 11.17447 0.05 240.0000 223.4894
## 24 2019 December 8 11.58723 0.05 160.0000 231.7447
One objective in analyzing time series data is to measure forecast error. The simplest way to compute forecast error is with the formula:
\[e_t = A_t - F_t\] where \(e_t\) is the error, and \(A_t\) and \(F_t\) are the actual and forecast values for the time series.
In addition to this simple measure, we can also compute the following error metrics:
RSFE - Running Sum of Forecast Error
\(RSFE=\sum{(A_i-F_i)} = \sum{e_i}\)
MFE - Mean Forecast Error (Bias)
\(MFE=\frac{\sum(A_i-F_i)}{n}=\frac{RSFE}{n}\)
MAD - Mean Absolute Deviation
\(MAD=\frac{\sum{|A_i-F_i|}}{n}\)
TS - Tracking Signal
\(TS = \frac{RSFE}{MAD}\)
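As a small worked example using the first two rows of the trend-adjusted data below: the errors are \(1325-1370=-45\) and \(1353-1316=37\), so after two periods \(RSFE=-8\), \(MFE=-4\), \(MAD=(45+37)/2=41\), and \(TS=-8/41\approx-0.20\).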
We will start with the trend-adjusted forecast computed above.
data <- tnd %>% select('Month','Actual','Fit')
names(data) <- c('Month','Actual','Forecast')
data <- data[1:6,]
data
## # A tibble: 6 x 3
## Month Actual Forecast
## <chr> <dbl> <dbl>
## 1 Jan 1325 1370
## 2 Feb 1353 1316
## 3 March 1305 1342.
## 4 Apr 1275 1294.
## 5 May 1210 1253.
## 6 Jun 1195 1176.
The following function will calculate the various metrics for a given set of data.
calc_error <- function(data) {
  n <- nrow(data)
  error <- data$Actual - data$Forecast  # per-period forecast error e_t
  rsfe <- sum(error)                    # running sum of forecast error
  mfe <- rsfe / n                       # mean forecast error (bias)
  mad <- sum(abs(error)) / n            # mean absolute deviation
  ts <- rsfe / mad                      # tracking signal
  return(data.frame(rsfe=rsfe, mfe=mfe, mad=mad, ts=ts))
}
calc_error(data)
## rsfe mfe mad ts
## 1 -88.2128 -14.70213 33.52987 -2.630872
The general rule for these metrics is to investigate when the tracking signal falls outside of the range (-4,4). In this case, the TS is within the bounds.
However, if we look at the exponential smoothing model…
data <- es2 %>% select('Month','Actual','Forecast')
data <- data[1:6,]
new_data <- data
calc_error(new_data)
## rsfe mfe mad ts
## 1 -418.7888 -69.79813 69.79813 -6
Here we can see that the tracking signal falls outside the \((-4,4)\) band.
Note that these calculations can be done on an ongoing basis, as follows:
# Recompute the tracking signal cumulatively through each period
TS <- data.frame(ts=NA)
for(x in seq(2, nrow(new_data))){
  td <- new_data[1:x,]
  err <- calc_error(td)$ts
  TS <- rbind(TS, data.frame(ts=err))
}
new_data$ts <- TS$ts
new_data
## # A tibble: 6 x 4
## Month Actual Forecast ts
## <chr> <dbl> <dbl> <dbl>
## 1 Jan 1325 1370 NA
## 2 Feb 1353 1361 -2
## 3 March 1305 1359. -3
## 4 Apr 1275 1349. -4
## 5 May 1210 1334. -5
## 6 Jun 1195 1309. -6
In May the tracking signal clearly fails the \(|TS|<4\) test, with April already sitting at the \(-4\) boundary. As a result, the process should be investigated further.
This completes this write-up on time series analysis with exponential smoothing, trending, seasonality (with some hand-waving), and error estimation.