#install.packages("devtools")
#devtools::install_github("FinYang/tsdl")
#install.packages("prophet")
library(tsdl)
library(prophet)
library(kableExtra)

CUNY Data 621 - Spring 2022
The data science ecosystem abounds with solutions for timeseries forecasting, and depending on the need, these projects can become extremely complicated.
One point to consider is the trade-off between speed and accuracy for your analysis. If you’re forecasting potential flu transmission rates based on public health data, accuracy is probably going to be a lot more important than speed. If you’re in the business of buying advertising for carbonated soft drinks, however, then “fast and fairly accurate” is probably going to be more valuable than “slow and extremely precise.”
In my work with media agencies and brands, we are often asked to help forecast media budgets based on historical sales or survey data with seasonal patterns – and most of the time, it falls into the “fast and fairly accurate” category. In these cases, Prophet is a great starting point to work up forecasts quickly.
Prophet is a powerful but easy-to-use package for forecasting timeseries data. It is an open-source project created by the Facebook/Meta data science team, and is available for both R and Python. According to the documentation, Prophet works best on timeseries data with “strong seasonal effects and several seasons of historical data.”
As long as these conditions are met, Prophet is great at producing viable forecasts right out of the box, automatically detecting seasonality patterns within the data. It is also flexible enough to be fine-tuned for specific purposes.
To demonstrate how quickly one can work up a viable forecast, we’ll examine a timeseries dataset representing the monthly sales of a soft drink brand “Tasty Cola” in millions of units. We’ll load in three years of historical sales data, and then ask Prophet to provide a sales forecast for Year 4.
The source of our data is the Timeseries Data Library Package (TSDL), which includes nearly 650 sample timeseries datasets from a variety of industries and disciplines.
# Monthly Sales of Tasty Cola (Bowerman & O'Connell, 1993)
cola_monthly_sales <- subset(tsdl,"Sales")[6][[1]]

The input to the Prophet model is very simple. It should be a dataframe with two columns: ds for the dates (in R, either a Date or a POSIXct timestamp) and y containing the numeric value for each date’s observation.
In this case, however, TSDL produces a ts (time series) object that prints with monthly text labels as columns and numeric year labels (1, 2, 3) as rows. With a little transformation, we’ll turn this into a dataframe that Prophet can read.
  Jan Feb Mar Apr May Jun  Jul  Aug  Sep Oct Nov Dec
1 189 229 249 289 260 431  660  777  915 613 485 277
2 244 296 319 370 313 556  831  960 1152 759 607 371
3 298 378 373 443 374 660 1004 1153 1388 904 715 441
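# convert ts to df
# the ts object labels its years 1-3; we treat year 1 as 1990
start_date <- as.Date("1990-01-01")
# number of monthly observations in the series
n_months <- length(cola_monthly_sales)
# one date per observation, on the first day of each month
dates <- seq.Date(from = start_date, by = "month", length.out = n_months)
df <- data.frame(ds = dates, y = as.vector(cola_monthly_sales))

| ds | y |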
|---|---|
| 1990-01-01 | 189 |
| 1990-02-01 | 229 |
| 1990-03-01 | 249 |
| 1990-04-01 | 289 |
| 1990-05-01 | 260 |
| 1990-06-01 | 431 |
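Before fitting, it can be worth confirming that the dataframe really has the shape described above. The sketch below rebuilds the dataframe from the values printed earlier so that it runs on its own, using only base R. The helper name check_prophet_input is hypothetical and not part of the prophet package, and the "one row per calendar month" rule is an assumption that fits this dataset rather than a general Prophet requirement.

```r
# Values copied from the printed series above; the 1990 start follows the article
y <- c(189, 229, 249, 289, 260, 431, 660, 777, 915, 613, 485, 277,
       244, 296, 319, 370, 313, 556, 831, 960, 1152, 759, 607, 371,
       298, 378, 373, 443, 374, 660, 1004, 1153, 1388, 904, 715, 441)
df <- data.frame(
  ds = seq(as.Date("1990-01-01"), by = "month", length.out = length(y)),
  y  = y
)

# Hypothetical helper (not part of prophet): basic checks on a monthly input frame
check_prophet_input <- function(df) {
  stopifnot(
    all(c("ds", "y") %in% names(df)),       # the two column names Prophet looks for
    inherits(df$ds, c("Date", "POSIXct")),  # dates stored as real date types, not text
    is.numeric(df$y),                       # observations are numeric
    !anyDuplicated(df$ds)                   # no date appears twice
  )
  # Assumption for this dataset: rows are exactly one calendar month apart
  expected <- seq(df$ds[1], by = "month", length.out = nrow(df))
  stopifnot(all(df$ds == expected))
  invisible(TRUE)
}

check_prophet_input(df)
```

If any check fails, stopifnot() raises an error naming the failed condition, which is usually quicker to act on than a problem surfacing later during model fitting.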
That was the hard part! Now all we have to do is fit the model, decide how far out we wish to forecast, and then make predictions.
We fit the model by passing our dataframe to the prophet() function.
model <- prophet(df)

Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
(Note from the messages that the daily and weekly seasonality are automatically disabled by Prophet for this monthly dataset.)
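The same prophet() call accepts arguments that make these choices explicit. The following is a minimal sketch, not the configuration used in this article: it sets the seasonality flags by hand and switches seasonality.mode to "multiplicative", on the assumption that seasonal swings which grow with the overall sales level (September sales of 915, 1152, then 1388) may suit that mode. The names model_mult, future, and forecast_mult are illustrative, and the data is rebuilt from the values printed earlier so the block runs on its own.

```r
library(prophet)

# Values copied from the printed series above; the 1990 start follows the article
y <- c(189, 229, 249, 289, 260, 431, 660, 777, 915, 613, 485, 277,
       244, 296, 319, 370, 313, 556, 831, 960, 1152, 759, 607, 371,
       298, 378, 373, 443, 374, 660, 1004, 1153, 1388, 904, 715, 441)
df <- data.frame(
  ds = seq(as.Date("1990-01-01"), by = "month", length.out = length(y)),
  y  = y
)

# Illustrative settings (an assumption, not the article's configuration)
model_mult <- prophet(
  df,
  yearly.seasonality = TRUE,             # keep the yearly cycle
  weekly.seasonality = FALSE,            # monthly data cannot resolve weekly effects
  daily.seasonality  = FALSE,            # nor daily effects
  seasonality.mode   = "multiplicative"  # seasonal effect scales with the trend
)

# Forecast twelve further months, in the same way as the default model
future <- make_future_dataframe(model_mult, periods = 12, freq = "month")
forecast_mult <- predict(model_mult, future)
tail(forecast_mult[c("ds", "yhat")], 12)
```

Whether this improves on the default additive model is something to judge from the resulting plots and forecasts rather than assume in advance.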
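With our model trained on three years of historical data (1990, 1991, 1992), we’ll predict sales for 1993, which means forecasting twelve monthly periods.
The function make_future_dataframe() creates a new dataframe of dates to receive our predictions. By default it includes the historical dates as well, so the result has 48 rows: 36 observed months plus 12 future ones.
df_1993 <- make_future_dataframe(model, periods=12, freq='month')

Next we make the predictions with the predict() function. It returns a new dataframe containing a predicted value (yhat) for every date in df_1993; the last twelve rows are the 1993 forecast.
predict_1993 <- predict(model, df_1993)

| | ds | yhat |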
|---|---|---|
| 37 | 1993-01-01 | 622.9739 |
| 38 | 1993-02-01 | 686.0834 |
| 39 | 1993-03-01 | 595.7136 |
| 40 | 1993-04-01 | 659.9119 |
| 41 | 1993-05-01 | 607.3526 |
| 42 | 1993-06-01 | 855.8760 |
| 43 | 1993-07-01 | 1156.5733 |
| 44 | 1993-08-01 | 1292.3222 |
| 45 | 1993-09-01 | 1492.1747 |
| 46 | 1993-10-01 | 1072.2187 |
| 47 | 1993-11-01 | 908.0323 |
| 48 | 1993-12-01 | 663.1476 |
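One quick way to sanity-check these numbers is to compare each forecast with the same month in the last observed year. The sketch below uses only base R and the values shown in the tables above; the column names are illustrative. For this forecast the ratio runs from about 1.08 in September to about 2.09 in January. A spread that wide may suggest the default additive seasonality is not scaling with the rising sales level, in which case Prophet's seasonality.mode argument is one setting to experiment with.

```r
# 1992 observations and 1993 forecasts, copied from the tables above
y_1992    <- c(298, 378, 373, 443, 374, 660, 1004, 1153, 1388, 904, 715, 441)
yhat_1993 <- c(622.9739, 686.0834, 595.7136, 659.9119, 607.3526, 855.8760,
               1156.5733, 1292.3222, 1492.1747, 1072.2187, 908.0323, 663.1476)

# Month-by-month comparison: forecast relative to the same month one year earlier
comparison <- data.frame(
  month     = month.abb,
  y_1992    = y_1992,
  yhat_1993 = round(yhat_1993),
  ratio     = round(yhat_1993 / y_1992, 2)
)
print(comparison)

# Implied growth in total annual sales, for comparison with earlier years
total_growth <- sum(yhat_1993) / sum(y_1992)
```

Comparing total_growth with the year-over-year growth in the historical totals gives a second, coarser check on whether the forecast level looks plausible.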
Prophet’s built-in plot() function displays the results. This looks like a pretty viable forecast for a first run: we can clearly see a yearly seasonal pattern, with sales peaking in late summer (July through September):
plot(model, predict_1993)

The prophet_plot_components() function breaks the forecast into its components. In this case we can see the underlying rising trend in sales (a mostly linear trend), but also the steep seasonal fluctuations in sales across individual months:
prophet_plot_components(model, predict_1993)

As you can see, Prophet is a pretty powerful tool when you need to get a quick read on well-defined seasonal data. If you haven’t worked with much timeseries data yet, the Prophet + TSDL combination is a great way to start exploring!
Time Series Data Library (TSDL)
https://pkg.yangzhuoranyang.com/tsdl/articles/tsdl.html