In this post, we are going to take the proposal model by Facebook S.J.Taylor & Letham, 2018, which was first introduced to produce forecast of time series which expose daily, weekly, yearly seasonality together with the presence of holiday effects. Thus, Additionally, noticeable features of Prophet model could be listed as follows: Accurate and fast, Automatic, Easy to be tunable and Available on both R and Python. To recap this brief introduction, we want to emphasize that Prophet was developed to . That is, Prophet will provide a framework to diminish the cost of entry for those people who are struck with an “in-the-loop” comprehension of the problems they are trying to solve — with automated of time series forecasting. (See the diagram below !)
Prophet model ultilizes a decomposition of a time series model with a classical components (trend, seasonality, and holidays). The following equation is its simplest form: \[\text{y}_{t} = g(t) + s(t) + h(t) + \epsilon_{t}, \]
where \(g(t)\) describes a non-periodic changes (i.e. growth over time), \(s(t)\) illustrates different seasonal patterns (weekly, yearly, …), \(h(t)\) is the holiday effects, and \(\epsilon_{t}\) captures the white-noise pattern in error term. Further detail of the behind mathematics could be revised at the original paper S.J.Taylor & Letham, 2018
Transferring data preprocessing from the previous post, we want to retrieve the food sales in California in the end (we store it at food.CA variable !!!). Since Prophet is efficient for forecasting short horizon, we will restrict this post to forecast 28-days ahead.
Load Package
# Time series decomposition
library("feasts") # Using STL
library('tidyverse')
library('tsibble')
library('fable')
library('stR') # Using Seasonal Trend regression model
library('bsts')
# Date + forecast
library('lubridate') # date and time
library('prophet') # time series analysis
library('timetk') # time series analysis
library('timeDate') # time series analysis
# general data manipulation
library('dplyr') # data manipulation
library('readr') # input/output
library('vroom') # input/output
library('tibble') # data wrangling
library('tidyr') # data wrangling
# Prophet model
library(fable.prophet)
Load data
rm(list = ls())
# Preventing clash from Dplyr and MASS package for select function
select <- dplyr::select
train_df <- vroom("D:\\Dataset\\sales_train_validation.csv", delim = ",", col_types = cols())
price_df <- vroom("D:\\Dataset\\sell_prices.csv", delim = ",",
col_types = cols())
calendar_df <- vroom("D:\\Dataset\\calendar.csv", col_types = cols())
test_df <- vroom("D:\\Dataset\\sales_train_evaluation.csv", delim = ",",
col_types = cols())
# Retrieving special holidays
holidays <- c(USThanksgivingDay(2011:2016), USChristmasDay(2011:2016))@Data
holidays <- as.Date(holidays)
holidays_df <- holidays %>%
as.data.frame() %>% bind_cols(as.data.frame(rep(c("ThanksGiving", "Christmas"), each = 6)))
names(holidays_df) <- c("date", "holiday")
holidays_df <- holidays_df %>%
as_tsibble(index = date)
Function to transform dataframe from wide to long format
wide_to_long <- function(data){
start_date <- date("2011-01-29")
data %>%
select(id, starts_with("d_")) %>%
pivot_longer(starts_with("d_"), names_to = "dates",
values_to = "sales") %>% # group days columns with values equal to number of sales
mutate(dates = as.integer(str_remove(dates, "d_"))) %>% # remove the prefix "d_"
mutate(dates = start_date + dates - 1) %>% # operation to transform into date format
mutate(dates = ymd(dates)) %>% # transform to date class
mutate(id = str_remove(id, "_validation")) # remove suffix "_validation" for each type of product
}
Prepare data
start_date = date("2011-01-29")
temp <- train_df %>%
group_by(cat_id, state_id) %>%
summarise_at(vars(starts_with("d_")), sum) %>%
ungroup() %>%
select(ends_with("id"), starts_with("d_")) %>%
pivot_longer(starts_with("d_"), names_to = "dates", values_to = "sales") %>%
mutate(dates = as.integer(str_remove(dates, "d_"))) %>%
mutate(dates = start_date + dates - 1) %>%
mutate(dates = ymd(dates)) %>%
mutate(dummy = if_else(dates %in% holidays, 1, 0)) %>%
mutate(dow = wday(dates, label = TRUE))
# filter(!str_detect(as.character(dates), '..-12-25'))
# replace_na(list(sales = 0))
temp_ts <- temp %>%
as_tsibble(key = c(cat_id, state_id), index = dates)
food.CA <- temp_ts %>%
filter(cat_id == "FOODS" & state_id == "CA") %>%
select(dates, sales, dummy, dow)
Forecast 1 Week and Compare with other models (ARIMA & ETS)
train <- food.CA %>% filter(dates < "2016-04-18")
fit <- train %>%
model(additive_ets = ETS(sales ~ error("A") + trend("A") + season("A")), multiplicative_ets = ETS(sales ~ error("M") + trend("A") + season("M")),
arima = ARIMA(sales),
prophet = prophet(sales ~ growth(type = "linear") + holiday(holidays = holidays_df) + dow + season(period = "year", order = 10, type = "additive") + season(period = "week", order = 3, type = "multiplicative")))
fc_xreg <- food.CA %>%
slice(tail(row_number(), 7)) %>% select(dow)
fc <- fit %>%
forecast(new_data = fc_xreg)
fc %>%
autoplot(food.CA %>% filter(dates <= "2016-04-24" & dates >= "2016-03-24"),
level = 95) +
labs(y = "Sale Volumes",
title = "Food Sale in California")
fc_accuracy <- accuracy(fc, food.CA, measures = list(
point_accuracy_measures,
interval_accuracy_measures,
distribution_accuracy_measures))
fc_accuracy %>%
group_by(.model) %>%
summarise(RMSE = mean(RMSE),
MAE = mean(MAE),
MASE = mean(MASE),
MAPE = mean(MAPE),
Winkler = mean(winkler),
CRPS = mean(CRPS)
) %>%
arrange(MAPE)
## # A tibble: 4 x 7
## .model RMSE MAE MASE MAPE Winkler CRPS
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 multiplicative_ets 246. 162. 0.162 1.40 5300. 332.
## 2 prophet 324. 238. 0.239 2.08 3686. 264.
## 3 additive_ets 332. 239. 0.240 2.14 4594. 309.
## 4 arima 354. 310. 0.311 2.59 4388. 304.
# Fit prophet alone
fit_prophet <- train %>%
model(prophet(sales ~ growth(type = "linear") + holiday(holidays = holidays_df) + dow + season(period = "year", order = 10, type = "additive") + season(period = "week", order = 3, type = "multiplicative")))
# Extract the components
## Prophet model
fit_prophet %>%
fabletools::components() %>%
autoplot()
## Additive ETS
fit %>% select(additive_ets) %>%
fabletools::components() %>%
autoplot()
## Warning: Removed 7 row(s) containing missing values (geom_path).
## Multiplicative ETS
fit %>% select(multiplicative_ets) %>%
fabletools::components() %>%
autoplot()
## Warning: Removed 7 row(s) containing missing values (geom_path).
## Checking residuals of Prophet model
fit_prophet %>% gg_tsresiduals()
The model assumes to have a linear growth trend, which is suitable with our sales level. There is also substantial remaining autocorrelation in the residuals. Although Prophet has the advantage of being much faster to estimate other classical methods such as ETS or ARIMA and it is completely automated, it rarely gives better forecast accuracy than the alternative approaches, as the example above illustrated.