An Intro to Timeseries Modeling with Prophet

CUNY Data 621 - Spring 2022

Author

Jeff Parks

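```r
# One-time installs (uncomment if needed)
# install.packages("devtools")
# devtools::install_github("FinYang/tsdl")  # tsdl is installed from GitHub
# install.packages("prophet")

library(tsdl)        # Time Series Data Library datasets
library(prophet)     # forecasting
library(kableExtra)  # table formatting
```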

Timeseries Forecasting

The data science ecosystem abounds with solutions for timeseries forecasting, and depending on the need, these projects can become extremely complicated.

One point to consider is the trade-off between speed and accuracy. If you’re forecasting potential flu transmission rates from public health data, accuracy is probably far more important than speed. If you’re buying advertising for carbonated soft drinks, however, “fast and fairly accurate” is probably more valuable than “slow and extremely precise.”

In my work with media agencies and brands, we are often asked to help forecast media budgets from historical sales or survey data with seasonal patterns, and most of these requests fall into the “fast and fairly accurate” category. In these cases, Prophet is a great starting point for producing forecasts quickly.

Prophet

Prophet is a powerful but easy-to-implement package for forecasting timeseries data. It is an open-source project released by Facebook’s (now Meta’s) Core Data Science team, and it is available for both R and Python. According to the documentation, Prophet works best on timeseries data with “strong seasonal effects and several seasons of historical data.”

When these conditions are met, Prophet can produce viable forecasts right out of the box, detecting seasonal patterns in the data automatically. It is also flexible enough to be fine-tuned for specific purposes.

Tasty Cola Sales Forecast

To demonstrate how quickly one can work up a viable forecast, we’ll examine a timeseries dataset of the monthly sales, in millions of units, of a soft drink brand called “Tasty Cola.” We’ll load three years of historical sales data and then ask Prophet for a sales forecast for Year 4.

1. Load and Transform Data

The source of our data is the Time Series Data Library (TSDL) package, which includes nearly 650 sample timeseries datasets from a variety of industries and disciplines.

```r
# Monthly Sales of Tasty Cola (Bowerman & O'Connell, 1993)
# subset() keeps the TSDL series tagged "Sales"; the sixth of those is Tasty Cola
cola_monthly_sales <- subset(tsdl, "Sales")[6][[1]]
```

The input to a Prophet model is simple: a dataframe with two columns, ds holding the dates (Date values, or POSIXct date-times for timestamps) and y holding the numeric observation for each date.
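To make the required shape concrete, here is a small base R check one might run before calling prophet(). The helper `check_prophet_input()` is hypothetical and not part of the Prophet package; it only tests the two-column ds/y convention described above, and it is deliberately stricter than Prophet itself, which can also parse date strings such as "1990-01-01".

```r
# Hypothetical helper (not part of Prophet): checks the ds/y input convention
check_prophet_input <- function(df) {
  # both required columns must be present
  if (!all(c("ds", "y") %in% names(df))) {
    stop("input needs columns named 'ds' and 'y'")
  }
  # ds should hold dates (Date) or date-times (POSIXct)
  if (!inherits(df$ds, c("Date", "POSIXct"))) {
    stop("'ds' should be a Date or POSIXct column")
  }
  # y should hold numeric observations
  if (!is.numeric(df$y)) {
    stop("'y' should be numeric")
  }
  invisible(TRUE)
}

# Usage: the first three months of the Tasty Cola series
example_df <- data.frame(
  ds = as.Date(c("1990-01-01", "1990-02-01", "1990-03-01")),
  y  = c(189, 229, 249)
)
check_prophet_input(example_df)
```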

TSDL, however, returns a base R ts object with a frequency of 12. When printed, it shows month abbreviations across the top and the years, numbered 1 to 3, down the side. With a little transformation, we’ll turn it into a dataframe that Prophet can read.

```
   Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1  189  229  249  289  260  431  660  777  915  613  485  277
2  244  296  319  370  313  556  831  960 1152  759  607  371
3  298  378  373  443  374  660 1004 1153 1388  904  715  441
```
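```r
# Convert the ts object to a dataframe.
# TSDL labels the years only as 1-3, so we assume the series starts in January 1990.
start_date <- as.Date("1990-01-01")
n_months <- length(cola_monthly_sales)  # number of monthly observations (36)

# one date per observation, spaced a month apart
dates <- seq.Date(from = start_date, by = "month", length.out = n_months)

df <- data.frame(ds = dates, y = as.vector(cola_monthly_sales))
head(df)
```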
```
ds          y
1990-01-01  189
1990-02-01  229
1990-03-01  249
1990-04-01  289
1990-05-01  260
1990-06-01  431
```
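The conversion above hard-codes 1990 as the first year because the TSDL series is indexed only as years 1 to 3, so the calendar year is an assumption. To reuse this step on other monthly series, one option is a small helper that reads the starting month from the ts object itself. The function `ts_to_prophet_df()` and its `first_year` argument are hypothetical; the values below are the 36 observations printed above.

```r
# Hypothetical helper: convert a monthly ts object into a Prophet-style dataframe
ts_to_prophet_df <- function(x, first_year = start(x)[1]) {
  stopifnot(frequency(x) == 12)   # this sketch only handles monthly data
  first_month <- start(x)[2]      # month of the first observation
  first_date <- as.Date(sprintf("%d-%02d-01", first_year, first_month))
  data.frame(
    ds = seq(first_date, by = "month", length.out = length(x)),
    y  = as.numeric(x)
  )
}

# Rebuild the Tasty Cola series from the printed table (years 1 to 3)
cola <- ts(c(189, 229, 249, 289, 260, 431,  660,  777,  915, 613, 485, 277,
             244, 296, 319, 370, 313, 556,  831,  960, 1152, 759, 607, 371,
             298, 378, 373, 443, 374, 660, 1004, 1153, 1388, 904, 715, 441),
           start = c(1, 1), frequency = 12)

# Map "year 1" onto 1990, matching the assumption used above
cola_df <- ts_to_prophet_df(cola, first_year = 1990)
head(cola_df, 3)
```

Keeping the year mapping as an explicit argument makes the assumption visible instead of burying it in a hard-coded date.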

2. Fit the Model

That was the hard part! Now all we have to do is fit the model, decide how far out we wish to forecast, and then make predictions.

We fit the model by passing our dataframe to the prophet() function.

```r
model <- prophet(df)
```

```
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
```

(Note from the messages that Prophet automatically disables daily and weekly seasonality for this monthly dataset: with observations a month apart, there is no within-week or within-day pattern to learn.)
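Prophet’s documentation says it fits weekly and yearly seasonality by default when the series is more than two cycles long, and daily seasonality only for sub-daily data. The sketch below approximates that decision from the span and spacing of the dates, using thresholds as I understand them from Prophet’s behavior (about two years of history for yearly; at least two weeks of history with observations less than a week apart for weekly; at least two days with observations less than a day apart for daily). `auto_seasonality()` is a hypothetical illustration that works on Date values only, not a Prophet function, and the exact thresholds are worth confirming against the Prophet source.

```r
# Hypothetical approximation of Prophet's default seasonality detection
auto_seasonality <- function(ds) {
  ds <- sort(as.Date(ds))
  span_days <- as.numeric(max(ds) - min(ds))      # length of the history in days
  min_gap_days <- min(as.numeric(diff(ds)))       # smallest spacing between observations
  c(
    yearly = span_days >= 730,
    weekly = span_days >= 14 && min_gap_days < 7,
    daily  = span_days >= 2 && min_gap_days < 1
  )
}

# Monthly dates matching the Tasty Cola dataframe (Jan 1990 to Dec 1992)
monthly_dates <- seq(as.Date("1990-01-01"), by = "month", length.out = 36)
auto_seasonality(monthly_dates)
```

For the monthly data this keeps yearly seasonality and drops the weekly and daily terms, which agrees with the two messages Prophet printed.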

3. Make Predictions

With our model trained on three years of historical data (1990, 1991, 1992), we’ll predict sales for 1993, which here means twelve monthly periods.

The function make_future_dataframe() creates a dataframe of dates to predict on. By default it includes the historical dates as well as the requested number of future periods.

```r
# extend the dates 12 months past the end of the history
df_1993 <- make_future_dataframe(model, periods = 12, freq = "month")
```
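Because make_future_dataframe() keeps the historical dates by default (its include_history argument defaults to TRUE), df_1993 has 48 rows: the 36 historical months followed by the 12 new ones. The base R sketch below builds an equivalent set of calendar dates by hand, assuming the same January 1990 start used earlier. The name `df_1993_sketch` is hypothetical, and the sketch is only meant to show what the function returns, not to replace it.

```r
# Historical dates, as in the dataframe built earlier (Jan 1990 to Dec 1992)
history_ds <- seq(as.Date("1990-01-01"), by = "month", length.out = 36)

# Twelve future months after the last observation; the sequence starts at the
# last historical date, so its first element is dropped
future_ds <- seq(max(history_ds), by = "month", length.out = 13)[-1]

# History plus future, mirroring make_future_dataframe(model, periods = 12, freq = "month")
df_1993_sketch <- data.frame(ds = c(history_ds, future_ds))
nrow(df_1993_sketch)
df_1993_sketch$ds[37]
```

This also explains the row numbers in the prediction output: the 1993 forecasts occupy rows 37 through 48.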

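Next we generate the predictions with the predict() function. It returns a new dataframe containing, for every date in df_1993, the point forecast yhat, its uncertainty interval (yhat_lower and yhat_upper), and the fitted trend and seasonal components. The last twelve rows hold the 1993 forecast: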

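```r
# predict() returns a new dataframe of forecasts for every date in df_1993
predict_1993 <- predict(model, df_1993)

# show the 12 forecast months (rows 37-48)
tail(predict_1993[c("ds", "yhat")], 12)
```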
```
           ds      yhat
37 1993-01-01  622.9739
38 1993-02-01  686.0834
39 1993-03-01  595.7136
40 1993-04-01  659.9119
41 1993-05-01  607.3526
42 1993-06-01  855.8760
43 1993-07-01 1156.5733
44 1993-08-01 1292.3222
45 1993-09-01 1492.1747
46 1993-10-01 1072.2187
47 1993-11-01  908.0323
48 1993-12-01  663.1476
```
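For budgeting, monthly figures are often rolled up to a yearly total. A quick base R sketch, using the yhat values from the table above and the actuals from the printed TSDL table (with the calendar years 1990 to 1992 assumed in the conversion step), compares the 1993 forecast total with each historical year:

```r
# Actual monthly sales (rows of the printed TSDL table, mapped to 1990-1992)
actuals <- matrix(c(189, 229, 249, 289, 260, 431,  660,  777,  915, 613, 485, 277,
                    244, 296, 319, 370, 313, 556,  831,  960, 1152, 759, 607, 371,
                    298, 378, 373, 443, 374, 660, 1004, 1153, 1388, 904, 715, 441),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("1990", "1991", "1992"), month.abb))

# Prophet's 1993 point forecasts (yhat) from the prediction table
yhat_1993 <- c(622.9739, 686.0834, 595.7136, 659.9119, 607.3526, 855.8760,
               1156.5733, 1292.3222, 1492.1747, 1072.2187, 908.0323, 663.1476)

# Annual totals: three actual years plus the forecast year
annual <- c(rowSums(actuals), "1993" = sum(yhat_1993))
round(annual)

# Year-over-year growth in percent
round(100 * diff(annual) / head(annual, -1), 1)
```

By this arithmetic the 1993 forecast totals about 10,612, roughly 30% above 1992, compared with growth of about 26% and 20% in the two preceding years. A roll-up like this is a cheap sanity check before a monthly forecast feeds into a budget.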

Prophet’s built-in plot() method displays the results. This looks like a pretty viable forecast for a first run: we can clearly see a strong yearly seasonal pattern, with sales peaking in late summer (September, in every year of the data).

```r
plot(model, predict_1993)
```

The prophet_plot_components() function breaks the forecast into its parts. Here we can see the underlying, mostly linear upward trend in sales, as well as the yearly seasonality component that produces the steep swings from month to month:

```r
prophet_plot_components(model, predict_1993)
```
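The components above come from Prophet’s default additive seasonality, which treats the seasonal swing as a fixed number of units added to the trend. In this series, though, the swings appear to grow along with overall sales. A quick base R check on the printed TSDL values compares September (the peak month) with each year’s average: the gap in units widens every year (about 467, 587, and 710), while the ratio stays nearly constant (about 2.04 to 2.05). That pattern is what the seasonality.mode = "multiplicative" option of prophet() is designed for; whether it improves this particular forecast is an assumption worth testing.

```r
# Monthly sales for years 1-3 (rows) from the printed TSDL table
sales <- matrix(c(189, 229, 249, 289, 260, 431,  660,  777,  915, 613, 485, 277,
                  244, 296, 319, 370, 313, 556,  831,  960, 1152, 759, 607, 371,
                  298, 378, 373, 443, 374, 660, 1004, 1153, 1388, 904, 715, 441),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste("Year", 1:3), month.abb))

yearly_mean <- rowMeans(sales)

# Additive view: how many units September sits above the yearly average
sep_gap <- sales[, "Sep"] - yearly_mean

# Multiplicative view: September as a multiple of the yearly average
sep_ratio <- sales[, "Sep"] / yearly_mean

round(cbind(mean = yearly_mean, sep_gap, sep_ratio), 2)

# A possible next step (requires the prophet package; not run here):
# model_mult <- prophet(df, seasonality.mode = "multiplicative")
```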

Moving Ahead

As you can see, Prophet is a pretty powerful tool when you need to get a quick read on well-defined seasonal data. If you haven’t worked with much timeseries data yet, the Prophet + TSDL combination is a great way to start exploring!

More Reading