Fit and assess a Facebook Prophet model to the time series you have been assigned. Briefly discuss the methodology behind the Prophet model as it applies to time-series forecasting. An initial model can be specified using the default parameters.
Train Test Split
Code
# Set the seed for reproducibility
set.seed(123)

# Determine the number of rows for training (80%) and testing (20%)
n_rows <- nrow(vehicle_sales_tbl_ts)
train_rows <- round(0.8 * n_rows)

# Split the data into training and testing sets
vehicle_train <- vehicle_sales_tbl_ts %>% slice(1:train_rows)
vehicle_test  <- vehicle_sales_tbl_ts %>% slice((train_rows + 1):n_rows)

# Visualize the training and test sets
ggplot() +
  geom_line(data = vehicle_train, aes(x = date, y = vehicle_sales, color = "Training"), linewidth = 1) +
  geom_line(data = vehicle_test, aes(x = date, y = vehicle_sales, color = "Test"), linewidth = 1) +
  scale_color_manual(values = c("Training" = "blue", "Test" = "red")) +
  labs(title = "Training and Test Sets", x = "Date", y = "vehicle sales") +
  theme_minimal()
Widely used in business/data science for a number of different forecasting problems
Key benefits:
Handles multiple types of seasonality
Easy to implement
Very flexible
Fast
Works best on daily data, but suitable for all types of time series
Not especially sensitive to outliers
The algorithm draws from time-series decomposition, breaking down the time-series into:
Seasonal components: Daily, weekly, monthly, yearly, etc.
Holidays: For daily data
Trend: Estimated as a piecewise trend across the data, with distinct slopes whose change locations are identified using changepoint detection
The basic model is fit as:
y_t = g_t + s_t + h_t + e_t
where g_t is the trend, s_t is the seasonality, h_t is the holiday effect, and e_t is the error term.
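As a point of reference, the sketch below shows how each term of this equation maps onto a special in the fable.prophet model formula: growth() for the trend g_t, season() for the seasonal terms s_t, and holiday() for h_t (only relevant for daily data, so it is omitted here). This is an illustrative sketch rather than the model fitted in this report; the specials and arguments mirror those used in later sections.
Code
library(fable)
library(fable.prophet)

# Each term of y_t = g_t + s_t + h_t + e_t corresponds to a special in the
# prophet() model formula:
#   growth()  -> g_t : piecewise trend with changepoint detection
#   season()  -> s_t : yearly (and, for daily data, weekly/daily) seasonality
#   holiday() -> h_t : holiday effects, only relevant for daily data
vehicle_train %>%
  model(
    prophet_default  = fable.prophet::prophet(vehicle_sales),
    prophet_explicit = fable.prophet::prophet(
      vehicle_sales ~ growth(type = "linear") +
        season(period = "year", type = "additive")
    )
  )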
Section 2
Decompose and visualize the elements of the time-series (trend, seasonality, etc.) as identified by your initial model. Examine the changepoints identified for the “trend” part of the time series.
Code
model = vehicle_train %>%
  model(prophet = fable.prophet::prophet(vehicle_sales))

model %>%
  components() %>%
  autoplot()
The decomposition of the time series reveals an upward trend until the early 2000s, followed by a downward slope thereafter. The seasonal component appears additive, with no indication of multiplicative seasonality.
Do these detected changepoints make sense for your time-series? If not, adjust the hyperparameters of the model by specifying more or fewer changepoints (n_changepoints), or by changing the proportion of the time-series through which changepoints can be identified (changepoint_range), or the prior scale (changepoint_prior_scale). You should also assess whether a linear or logistic trend makes sense for your time-series.
Code
changepoints = model %>%
  glance() %>%
  pull(changepoints) %>%
  bind_rows() %>%
  .$changepoints

vehicle_train %>%
  ggplot() +
  geom_line(aes(date, vehicle_sales)) +
  geom_vline(xintercept = as.Date(changepoints), color = 'red', linetype = 'dashed')
The changepoints_less_flexible specification produces better forecasts than the other changepoint options, indicating that a less flexible trend better captures the underlying patterns in the data. This suggests that simpler models with fewer, less flexible changepoints provide more reliable forecasts here.
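The code for these candidate specifications is not shown above; the sketch below illustrates one way such variants could be set up, assuming the growth() special accepts the n_changepoints, changepoint_range, and changepoint_prior_scale arguments named in the prompt. The model names (including changepoints_less_flexible) and the specific values are illustrative.
Code
# Candidate changepoint specifications (values are illustrative)
changepoint_models <- vehicle_train %>%
  model(
    default = fable.prophet::prophet(vehicle_sales ~ growth()),
    changepoints_less_flexible = fable.prophet::prophet(
      vehicle_sales ~ growth(n_changepoints = 10,
                             changepoint_prior_scale = 0.01)
    ),
    changepoints_more_flexible = fable.prophet::prophet(
      vehicle_sales ~ growth(n_changepoints = 50,
                             changepoint_range = 0.9,
                             changepoint_prior_scale = 0.5)
    )
  )

# Compare the variants on the held-out test period
changepoint_models %>%
  forecast(h = nrow(vehicle_test)) %>%
  accuracy(vehicle_test)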
Finally, assess whether the model should take into account a saturating minimum/maximum point - if so, specify a floor and cap.
Despite adjusting the floor and cap (capacity) parameters substantially, the resulting forecasts deviate only minimally from the base Prophet model. This indicates that while these parameters can in principle change the model's behaviour, their practical impact on forecasting accuracy is limited for this particular dataset. Further experimentation might be warranted to better understand their effect.
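For completeness, a saturating forecast can be sketched by switching to logistic growth, assuming growth() accepts capacity and floor arguments; the cap and floor values below are placeholders rather than estimates from the data.
Code
# Logistic growth with an assumed saturating maximum and minimum (sketch)
cap_value   <- max(vehicle_train$vehicle_sales) * 1.1  # placeholder ceiling
floor_value <- 0                                       # sales cannot be negative

vehicle_train %>%
  model(
    prophet_saturating = fable.prophet::prophet(
      vehicle_sales ~ growth(type = "logistic",
                             capacity = cap_value,
                             floor = floor_value) +
        season(period = "year", type = "additive")
    )
  ) %>%
  forecast(h = nrow(vehicle_test)) %>%
  autoplot(vehicle_train)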
Section 3
Discuss any seasonality identified by the model (daily, weekly, yearly, etc.). Does your time series seem to contain additive or multiplicative seasonality? If appropriate, and examining daily data, assess whether holidays should be included in your model. If there is no seasonality, ensure that your model is specified without seasonality.
The decomposition graph and table indicate the presence of additive seasonality in the time series, as opposed to multiplicative seasonality. This suggests that the seasonal variations in the data have a constant magnitude throughout the series, rather than growing or shrinking proportionally with the trend.
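One way to check this conclusion, using the same specials as the model fitted later in this report, is to fit both seasonality types and compare their accuracy on the test period; the sketch below is illustrative rather than part of the fitted pipeline.
Code
# Compare additive and multiplicative yearly seasonality (sketch)
seasonality_models <- vehicle_train %>%
  model(
    additive = fable.prophet::prophet(
      vehicle_sales ~ growth() + season(period = "year", type = "additive")
    ),
    multiplicative = fable.prophet::prophet(
      vehicle_sales ~ growth() + season(period = "year", type = "multiplicative")
    )
  )

seasonality_models %>%
  forecast(h = nrow(vehicle_test)) %>%
  accuracy(vehicle_test)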
Section 4
Validate your model using the techniques discussed in Assignment 4, including dividing your sample into a training and test set. You should conduct a rolling window cross-validation to assess performance of the model at meaningful thresholds. Use tables/visualizations as appropriate to assess the model across various metrics (RMSE, MAE, MAPE, etc.). You are encouraged to compare models with different parameters (e.g. additive/multiplicative seasonality, linear vs logistic growth, number of changepoints, etc.).
Finally, assess the performance of the best-performing model on the test set. How does the model perform? Does it seem to overfit the training data? If so, how might you adjust the model to improve performance?
Code
vehicle_train %>%
  bind_rows(vehicle_test) %>%
  ggplot() +
  geom_line(aes(date, vehicle_sales)) +
  geom_vline(aes(xintercept = ymd('2014-05-01')), color = 'red') +
  annotate("text", x = as.Date("1976-01-01"),
           y = max(vehicle_sales_tbl_ts$vehicle_sales) * 0.9,
           label = "Train", color = "blue", size = 4) +
  annotate("text", x = as.Date("2023-11-01"),
           y = max(vehicle_sales_tbl_ts$vehicle_sales) * 0.9,
           label = "Test", color = "blue", size = 4) +
  labs(x = "Date", y = "vehicle sales", title = "Training and Testing Dataset") +
  theme_bw()
Code
vehicle_train_data <- vehicle_train %>%
  stretch_tsibble(.init = 22 * 12, .step = 12)

vehicle_train_data %>%
  ggplot() +
  geom_point(aes(date, factor(.id), color = factor(.id))) +
  ylab('Iteration') +
  ggtitle('Samples included in each CV Iteration')
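The code that produced the accuracy comparison below is not shown; the following sketch illustrates one way the cross-validated forecasts and error metrics could be generated, assuming fable's ARIMA() and NAIVE() models as comparators (only the model names are taken from the table below; the specifications are assumptions).
Code
# Fit candidate models on each CV slice and score them on held-out data (sketch)
cv_models <- vehicle_train_data %>%
  model(
    arima   = ARIMA(vehicle_sales),
    naive   = NAIVE(vehicle_sales),
    prophet = fable.prophet::prophet(vehicle_sales)
  )

# Forecast 12 months ahead from the end of each slice
cv_forecasts <- cv_models %>%
  forecast(h = 12)

# Average the accuracy metrics across CV iterations for each model
cv_forecasts %>%
  accuracy(vehicle_sales_tbl_ts) %>%
  group_by(.model) %>%
  summarise(across(c(RMSE, MAE, MAPE), \(x) mean(x, na.rm = TRUE)))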
# A tibble: 3 × 10
.model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 arima Test -9.43 117. 87.7 -1.61 7.20 0.784 0.794 0.463
2 naive Test -42.5 192. 149. -5.41 12.4 1.33 1.30 0.444
3 prophet Test -25.5 219. 171. -4.15 14.9 1.53 1.48 0.804
While the Prophet forecast captures the variation and trend of the data, it falls short of matching its magnitude: the model tracks the overall pattern well but tends to underestimate the actual values, which hurts its performance on the test set. Of the three models compared, ARIMA has the lowest error, whether measured by RMSE or MAPE.
Code
vehicle_train_model <- vehicle_train %>%
  model(
    prophet = fable.prophet::prophet(
      vehicle_sales ~ growth() + season(period = 'year', type = 'additive')
    )
  )
# Produce forecasts for the training set
train_forecast <- vehicle_train_model %>%
  forecast(new_data = vehicle_train)

# Visualize actual versus forecasted values for the training set
autoplot(train_forecast) +
  autolayer(vehicle_train) +
  labs(title = "Actual vs Forecasted Values for Training Set",
       y = "vehicle sales", x = "Date") +
  theme_minimal()
When the fitted values are plotted against the actual values, the model exhibits low bias, suggesting a good fit to the training data. However, comparing the forecast with the test data reveals substantially larger errors, pointing to high variance and a potential overfitting problem. While the model captures the underlying patterns in the training data well, it struggles to generalize to unseen data, warranting further investigation and adjustments, such as reducing the number of changepoints or the changepoint prior scale, to address the overfitting.
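The test-set assessment itself is not shown above; a minimal sketch of how the tuned model's forecasts could be scored against the held-out data, using the standard fable forecast() and accuracy() workflow, is given below.
Code
# Forecast the held-out test period with the tuned model and score it (sketch)
test_forecast <- vehicle_train_model %>%
  forecast(h = nrow(vehicle_test))

test_forecast %>%
  accuracy(vehicle_test)

# Visual check of the forecast against both training and test data
test_forecast %>%
  autoplot(bind_rows(vehicle_train, vehicle_test)) +
  labs(title = "Prophet forecast vs held-out test set",
       x = "Date", y = "vehicle sales")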