In this session, we will learn about time series analysis and perform forecasting using several exponential smoothing methods, including:
- Simple Exponential Smoothing
- Holt’s Linear Trend Method
- Holt-Winters Method
The dataset contains annual sales data from 2000 to 2016. Based on this historical data, the objective of the analysis is to forecast sales for the next two years, 2017 and 2018.
# import libs
library(tidyverse)
library(lubridate)
library(forecast)
library(TTR)
library(fpp)
library(tseries)
library(TSstudio)
library(padr)
year <- c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007,
          2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015,
          2016)
sales <- c(156, 161, 189, 182, 224, 258, 283, 325,
           332, 388, 475, 502, 537, 584, 631,
           704, 689)
df <- data.frame(year, sales)
df
# create ts object
df_ts <- ts(data = df$sales, start = 2000, frequency = 1)
df_ts
#> Time Series:
#> Start = 2000
#> End = 2016
#> Frequency = 1
#> [1] 156 161 189 182 224 258 283 325 332 388 475 502 537 584 631 704 689
# visualize the data
df_ts %>%
  autoplot()
Between 2000 and 2015, sales followed a clear upward trend, rising from 156 to 704 with one small dip in 2003. In 2016, sales fell slightly to 689; a single lower year is not enough to establish a declining trend.
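Simple Exponential Smoothing (SES) is suited to data with no trend or seasonality. Because this series has a clear trend, SES serves here as a baseline: its forecasts are flat at the last estimated level.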
# Apply SES
ses_model <- ses(df_ts, h = 2)
ses_model
#> Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
#> 2017 689.0015 632.7974 745.2056 603.0448 774.9582
#> 2018 689.0015 609.5209 768.4821 567.4464 810.5566
plot(ses_model)
accuracy(ses_model)
#> ME RMSE MAE MPE MAPE MASE ACF1
#> Training set 31.35529 41.19581 33.94409 8.12971 8.839146 0.9412573 -0.0117723
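The SES recursion itself is short enough to write out. The following is a minimal sketch, not what `ses()` does internally: `ses_by_hand` is a hypothetical helper, `alpha` is fixed by hand rather than estimated, and the level is simply initialised at the first observation.

```r
# Hypothetical helper: SES with a fixed smoothing parameter alpha.
# The level starts at the first observation (ses() estimates both instead).
ses_by_hand <- function(y, alpha) {
  level <- y[1]
  for (t in seq_along(y)[-1]) {
    # New level = weighted average of the new observation and the old level
    level <- alpha * y[t] + (1 - alpha) * level
  }
  level  # SES forecasts are flat: the same value for every horizon
}

sales <- c(156, 161, 189, 182, 224, 258, 283, 325,
           332, 388, 475, 502, 537, 584, 631, 704, 689)

ses_by_hand(sales, alpha = 1)    # 689: alpha = 1 is the naive forecast
ses_by_hand(sales, alpha = 0.5)  # smoother level, lags behind the trend
```

With `alpha = 1` the forecast is simply the last observation (689). The `ses()` forecast of 689.0015 is almost identical, which suggests that the estimated smoothing parameter is close to 1 for this series.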
Holt’s linear trend method is used when the data has a trend but no seasonality, which matches this annual series.
holt_model <- holt(df_ts, h = 2)
holt_model
#> Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
#> 2017 731.895 694.600 769.1900 674.8572 788.9328
#> 2018 766.887 718.803 814.9711 693.3488 840.4253
plot(holt_model)
accuracy(holt_model)
#> ME RMSE MAE MPE MAPE MASE
#> Training set -0.3203326 25.44848 19.17602 -1.728756 6.103914 0.531744
#> ACF1
#> Training set -0.003815317
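Holt's method adds a slope term to the SES recursion. The sketch below uses a hypothetical helper, `holt_by_hand`, with hand-picked `alpha` and `beta` and a simple initialisation (first observation and first difference). `holt()` instead estimates the parameters and initial states, so the numbers will not match its output.

```r
# Hypothetical helper: Holt's linear trend method with fixed parameters.
holt_by_hand <- function(y, alpha, beta, h = 2) {
  level <- y[1]
  trend <- y[2] - y[1]
  for (t in seq_along(y)[-1]) {
    prev_level <- level
    # Level: blend the new observation with the previous one-step forecast
    level <- alpha * y[t] + (1 - alpha) * (prev_level + trend)
    # Trend: blend the latest change in level with the previous slope
    trend <- beta * (level - prev_level) + (1 - beta) * trend
  }
  # h-step forecast: last level plus h times the last slope
  level + seq_len(h) * trend
}

sales <- c(156, 161, 189, 182, 224, 258, 283, 325,
           332, 388, 475, 502, 537, 584, 631, 704, 689)

holt_by_hand(sales, alpha = 0.8, beta = 0.2, h = 2)
```

In the `holt()` output above, the gap between the two point forecasts (766.887 - 731.895 = 34.992) is the estimated slope at the end of the sample, roughly 35 units per year.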
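The Holt-Winters method is used when the data has both trend and seasonality. Annual data has no seasonal period, so the method cannot be applied to this series directly; the code below is an illustration only.
# Illustration only: Holt-Winters needs a seasonal period (frequency > 1),
# so the 17 annual values are relabeled here as monthly observations
# (Jan 2000 to May 2001). The forecasts below therefore cover
# Jun 2001 to May 2002, not 2017 and 2018.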
df_monthly <- ts(sales, start = 2000, frequency = 12)
hw_model <- hw(df_monthly, seasonal = "additive", h = 12)
hw_model
#> Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
#> Jun 2001 756.9330 724.8649 789.0012 707.88909 805.9770
#> Jul 2001 791.0206 737.5729 844.4683 709.27940 872.7618
#> Aug 2001 825.1081 733.1534 917.0629 684.47544 965.7409
#> Sep 2001 859.1957 718.1915 1000.1999 643.54844 1074.8429
#> Oct 2001 893.2832 695.3114 1091.2551 590.51142 1196.0551
#> Nov 2001 927.3708 665.7519 1188.9897 527.25919 1327.4824
#> Dec 2001 961.4583 630.2579 1292.6587 454.93103 1467.9856
#> Jan 2002 995.5459 589.3472 1401.7445 374.31868 1616.7731
#> Feb 2002 1029.6334 543.4112 1515.8557 286.02069 1773.2462
#> Mar 2002 1063.7210 492.7616 1634.6804 190.51395 1936.9280
#> Apr 2002 1097.8085 437.6557 1757.9614 88.19191 2107.4251
#> May 2002 1131.8961 378.3112 1885.4810 -20.61253 2284.4047
plot(hw_model)
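The date labels in the Holt-Winters output follow from how `ts()` interprets `frequency`. A quick check with base R, using the same 17 values (the names `as_annual` and `as_monthly` are only for illustration):

```r
sales <- c(156, 161, 189, 182, 224, 258, 283, 325,
           332, 388, 475, 502, 537, 584, 631, 704, 689)

# Same values, two different time indexes
as_annual  <- ts(sales, start = 2000, frequency = 1)
as_monthly <- ts(sales, start = 2000, frequency = 12)

end(as_annual)   # 2016 1: the last observation is the year 2016
end(as_monthly)  # 2001 5: the last observation is May 2001
```

With `frequency = 12`, the series ends in May 2001, which is why the 12-step forecast runs from Jun 2001 to May 2002 rather than covering 2017 and 2018.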
# Moving average (e.g., 2-period)
ma2 <- SMA(df_ts, n = 2)
ma2
#> Time Series:
#> Start = 2000
#> End = 2016
#> Frequency = 1
#> [1] NA 158.5 175.0 185.5 203.0 241.0 270.5 304.0 328.5 360.0 431.5 488.5
#> [13] 519.5 560.5 607.5 667.5 696.5
plot(df_ts, type = "l")
lines(ma2, col = "blue", lwd = 2)
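The same trailing moving average can be reproduced without `TTR`, which makes the calculation explicit. This sketch uses `stats::filter()`, written with the `stats::` prefix because `dplyr` (loaded by `tidyverse`) masks `filter`; the name `ma2_base` is only for illustration.

```r
sales <- c(156, 161, 189, 182, 224, 258, 283, 325,
           332, 388, 475, 502, 537, 584, 631, 704, 689)

# sides = 1 averages the current value and the one before it,
# so the first element has no predecessor and is NA
ma2_base <- stats::filter(sales, rep(1 / 2, 2), sides = 1)

head(as.numeric(ma2_base), 4)  # NA 158.5 175.0 185.5
```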
# Fit a linear trend (sales regressed on year) as a benchmark
model_linear <- lm(sales ~ year)
summary(model_linear)
#>
#> Call:
#> lm(formula = sales ~ year)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -57.412 -21.652 1.132 18.676 63.804
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -74211.725 3328.981 -22.29 0.000000000000651 ***
#> year 37.152 1.658 22.41 0.000000000000603 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 33.49 on 15 degrees of freedom
#> Multiple R-squared: 0.971, Adjusted R-squared: 0.9691
#> F-statistic: 502.2 on 1 and 15 DF, p-value: 0.0000000000006034
# Predict future years
future <- data.frame(year = c(2017, 2018))
predict(model_linear, newdata = future)
#> 1 2
#> 723.7794 760.9314
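The `ses()` and `holt()` outputs include prediction intervals, while `predict()` returns only point forecasts by default. As a sketch, the `interval = "prediction"` argument of `predict()` for `lm` objects adds comparable 95% bounds (the name `pred` is only for illustration):

```r
year  <- 2000:2016
sales <- c(156, 161, 189, 182, 224, 258, 283, 325,
           332, 388, 475, 502, 537, 584, 631, 704, 689)

model_linear <- lm(sales ~ year)
future <- data.frame(year = c(2017, 2018))

# Returns a matrix with columns fit (point forecast), lwr and upr
pred <- predict(model_linear, newdata = future,
                interval = "prediction", level = 0.95)
pred
```

The `fit` column reproduces the point forecasts above. The bounds assume independent, normally distributed errors, which is a strong assumption for time series data.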
Use the autocorrelation function (ACF) to check for patterns such as trend or seasonality in the series.
acf(df_ts, main = "ACF Plot of Time Series")
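Interpretation: for a series with a strong trend such as this one, the autocorrelations are typically large and positive at short lags and decay slowly as the lag increases. Because the data are annual, no seasonal spikes are expected.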
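The values behind the plot can also be inspected directly. A sketch using base R's `acf()` with `plot = FALSE`, together with the approximate 95% significance bounds that the plot draws as dashed lines (the names `acf_values` and `bound` are only for illustration):

```r
sales <- c(156, 161, 189, 182, 224, 258, 283, 325,
           332, 388, 475, 502, 537, 584, 631, 704, 689)
df_ts <- ts(sales, start = 2000, frequency = 1)

# Compute the autocorrelations without drawing the plot
acf_values <- acf(df_ts, plot = FALSE)
acf_values$acf[1]  # lag 0 is always 1

# Approximate 95% bounds under the white-noise assumption
bound <- qnorm(0.975) / sqrt(length(df_ts))
bound              # about 0.475 for n = 17

# Lags (1, 2, ...) whose autocorrelation lies outside the bounds
which(abs(acf_values$acf[-1]) > bound)
```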
# Compare training-set accuracy across the fitted models
accuracy(model_linear)
#> ME RMSE MAE MPE MAPE
#> Training set 0.000000000000002507632 31.45557 25.26759 0.7846039 9.524381
#> MASE
#> Training set 0.153837
accuracy(ses_model)
#> ME RMSE MAE MPE MAPE MASE ACF1
#> Training set 31.35529 41.19581 33.94409 8.12971 8.839146 0.9412573 -0.0117723
accuracy(holt_model)
#> ME RMSE MAE MPE MAPE MASE
#> Training set -0.3203326 25.44848 19.17602 -1.728756 6.103914 0.531744
#> ACF1
#> Training set -0.003815317
accuracy(hw_model)
#> ME RMSE MAE MPE MAPE MASE
#> Training set 5.132598 25.02288 19.90842 2.400402 5.609929 0.04457775
#> ACF1
#> Training set 0.009726304
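Based on the training-set accuracy measures above, the Holt-Winters model has the lowest RMSE (25.02), only marginally below Holt’s linear trend method (25.45) and well below the linear regression (31.46) and SES (41.20). However, the Holt-Winters model was fitted to the annual values treated as monthly data, so its forecasts (Jun 2001 to May 2002) do not correspond to 2017 and 2018. Among the models fitted to the annual series, Holt’s linear trend method has the lowest RMSE and is therefore the most suitable for forecasting 2017 and 2018, with point forecasts of 731.9 and 766.9.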
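All accuracy figures above are measured on the training set, which tends to favour more flexible models. One way to check the ranking is a holdout test. The sketch below is an assumption about how such a check could look, not part of the original analysis: it refits SES and Holt's method on 2000 to 2014 and scores the forecasts against the held-out years 2015 and 2016. The names `train`, `test`, `ses_holdout` and `holt_holdout` are only for illustration.

```r
library(forecast)

sales <- c(156, 161, 189, 182, 224, 258, 283, 325,
           332, 388, 475, 502, 537, 584, 631, 704, 689)
df_ts <- ts(sales, start = 2000, frequency = 1)

# Hold out the last two years for evaluation
train <- window(df_ts, end = 2014)
test  <- window(df_ts, start = 2015)

ses_holdout  <- ses(train, h = length(test))
holt_holdout <- holt(train, h = length(test))

# The "Test set" row reports out-of-sample errors
accuracy(ses_holdout, test)
accuracy(holt_holdout, test)
```

With only two held-out observations, the result is indicative at best, but it measures forecast error rather than goodness of fit.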