SF Zillow Weekly Sales Forecasting Assignment 3

Section 1

Initial Prophet Model

Code
suppressMessages({
  model = prophet(sales_train)
  future = make_future_dataframe(model,periods = 27, freq = 'weeks')
  forecast = predict(model,future)

  plot(model,forecast)+
  ggtitle("Default Facebook Prophet Model")+
  ylab("SF Housing Sales Prices")+xlab("Date")+theme_bw()
})

Interactive Visualization

Code
dyplot.prophet(model,forecast)

Prophet Model Methodology

Facebook Prophet is an open-source algorithm for generating time-series models. It is particularly good at modeling time series that have multiple seasonality and doesn’t face some of the above drawbacks of other algorithms. At its core is the sum of three functions of time plus an error term: growth g(t), seasonality s(t), holidays h(t) , and error e_t

  1. The Growth Function(Change Points)

    • Models the overall trend of the data.
    • Trend can be present at all points in the data and can be altered.
    • New concept of Changepoints was added by prophet model.
    • Changepoints are the moments in the data where the data shifts direction.
  2. The Seasonality Function

    • Fourier Series as a function of time
    • Prophet will automatically detect the Fourier order/manually also can be added.
    • Seasonal component can be daily, weekly, yearly e.t.c
  3. The Holiday/Event Function

    The holiday function allows Facebook Prophet to adjust forecasting when a holiday or major event may change the forecast. It takes a list of dates and when each date is present in the forecast adds or subtracts value from the forecast from the growth and seasonality terms based on historical data on the identified holiday dates.

Section 2

Prophet Decomposition

Code
prophet_plot_components(model,forecast)

The sales price has exhibited a steady upward trend since mid-2021, which is mainly attributed to the post-Covid-19 ease in the housing market. Additionally, there appears to be a seasonal pattern in the sales price, with prices rising during the summer months, specifically in June, and declining during the fall and end of the year.

Automated Changepoints

Code
plot(model,forecast)+add_changepoints_to_plot(model)+theme_bw()+xlab("Date")+ylab("SF Housing Sales Prices")

Default change points are not creating the appropriate model. We can try changing the hyperparameters and check the plot.

Multiple Changepoints

By adjusting the hyper parameters and trying multiple combinations we have got a realistic fitted line.

Code
suppressMessages({
model_new = prophet(sales_train,changepoint.range = 0.90,n.changepoints=15,changepoint.prior.scale=0.1)

forecast1 = predict(model_new,future)

plot(model_new,forecast1)+ add_changepoints_to_plot(model_new)+theme_bw()+xlab("Date")+ylab("SF Housing Sales Prices")
})

Decomposing the new model and comparing it with the original model. We can observe that the yearly seasonality remains the same and trend is also close to original model.

Code
prophet_plot_components(model_new,forecast1)

Saturating Minimum/Maximum point

Defining the min and max point help the users help check the wild values in the plot and for will be very useful in stock price forecasting when the decision(sell/buy) need to be taken on price value. In our case its not very revelant.

Code
suppressMessages({
sixmnths_future_df = make_future_dataframe(model,periods = 27)
sixmnths_forecast_df = predict(model,sixmnths_future_df)

# Set "floor" in training data
sales_train$floor = 850000
sales_train$cap = 1500000
future$floor = 850000
future$cap = 1200000

# Set floor in forecsat data
sixmnths_future_df$floor = 850000
sixmnths_future_df$cap = 1200000
sat_model = prophet(sales_train,growth='logistic')

sat_six_mnths_forecast = predict(sat_model,sixmnths_future_df)

plot(sat_model,sat_six_mnths_forecast)+ylim(850000,1200000)+theme_bw()+xlab("Date")+ylab("SF Housing Price")
})

Section 3

Seasonality Identification

Code
prophet_plot_components(model,forecast)

The sales price has exhibited a steady upward trend since mid-2021, which is mainly attributed to the post-Covid-19 ease in the housing market. Additionally, there appears to be a seasonal pattern in the sales price, with prices rising during the summer months, specifically in June, and declining during the fall and end of the year.

From the plot we can infer that it shows multiplicative seasonality.The amplitude of the seasonal fluctuations changes over time, the seasonality is multiplicative.

Magnitude of Fluctuations

Below code will illustrate that the seasonality is multiplicative. We can see that the graphs plotted are similar.

Code
suppressMessages({
multi_model = prophet(sales_train,seasonality.mode = 'multiplicative')
multi_fcst = predict(multi_model,future)

plot(multi_model,multi_fcst)+ ylim(850000,1200000)

prophet_plot_components(multi_model, multi_fcst)
})

Holiday Impact

As its San Francisco sales data, lets check it by using the US holiday calender with the model.

Code
suppressMessages({
model = prophet(sales_train,fit=FALSE)
model = add_country_holidays(model,country_name = 'US')
model = fit.prophet(model,sales_train)
forecast2 = predict(model,future)
prophet_plot_components(model,forecast2)
})

We don’t see much holiday impact in the plot, couple of towers are present but its not something that is present in all years. We should not consider the holiday component for our time series.

Section 4

Cross Validation

Code
df.cv <- cross_validation(model, initial = 1000, period = 180, horizon = 180, units = 'days')
Making 3 forecasts with cutoffs between 2021-01-10 and 2022-01-05
Code
head(df.cv)
# A tibble: 6 × 6
       y ds                      yhat yhat_lower yhat_upper cutoff             
   <dbl> <dttm>                 <dbl>      <dbl>      <dbl> <dttm>             
1 993236 2021-01-11 00:00:00  993947.    973338.   1016181. 2021-01-10 00:00:00
2 994000 2021-01-18 00:00:00 1002823.    981728.   1023199. 2021-01-10 00:00:00
3 994000 2021-01-25 00:00:00 1007902.    985763.   1029551. 2021-01-10 00:00:00
4 996500 2021-02-01 00:00:00 1023878.   1002269.   1045932. 2021-01-10 00:00:00
5 996500 2021-02-08 00:00:00 1041166.   1019004.   1062896. 2021-01-10 00:00:00
6 994375 2021-02-15 00:00:00 1053745.   1031116.   1075358. 2021-01-10 00:00:00

We have done cross validation to check the prediction performance on the horizon of 180 days, starting with 1000 days and first cut at 180 days.

Performance Metric

Code
df.p <- performance_metrics(df.cv)
head(df.p)
  horizon        mse     rmse      mae       mape       mdape      smape
1 15 days  415540829 20384.82 15242.41 0.01554442 0.009684152 0.01570656
2 17 days  839639351 28976.53 22925.15 0.02326756 0.013985943 0.02365010
3 19 days  891404068 29856.39 23754.56 0.02450393 0.013985943 0.02492794
4 22 days  988194621 31435.56 26453.45 0.02704532 0.027473832 0.02742280
5 24 days 1524492495 39044.75 34036.21 0.03471883 0.037035648 0.03539079
6 26 days 1511706869 38880.67 33881.00 0.03493991 0.037035648 0.03562161
   coverage
1 0.7142857
2 0.5714286
3 0.5714286
4 0.4285714
5 0.2857143
6 0.2857143

These metrics can be used to evaluate the performance of a Prophet model in a cross-validation context. For example, rmse and mae are commonly used to evaluate the accuracy of point forecasts, while mape is used to evaluate the accuracy of percentage error forecasts.

Checking the Accuracy of the model

Code
plot_cross_validation_metric(df.cv, metric = 'mape')

The model is showing > 90% accuracy for 1st 100 days and after that its gradually increasing.

Model Comparision

Code
suppressMessages({
mod1 = prophet(sales_train,seasonality.mode='additive')
})
forecast1 = predict(mod1)

df_cv1 <- cross_validation(mod1, initial = 1000, period = 180, horizon = 180, units = 'days')
Making 3 forecasts with cutoffs between 2021-01-10 and 2022-01-05
Code
metrics1 = performance_metrics(df_cv1) %>% 
mutate(model = 'mod1')

suppressMessages({
mod2 = prophet(sales_train,seasonality.mode='multiplicative')
})
forecast2 = predict(mod2)

df_cv2 <- cross_validation(mod2, initial = 1000, period = 180, horizon = 180, units = 'days')
Making 3 forecasts with cutoffs between 2021-01-10 and 2022-01-05
Code
metrics2 = performance_metrics(df_cv2) %>% 
  mutate(model = "mod2")

metrics1 %>% 
  bind_rows(metrics2) %>% 
  ggplot()+
  geom_line(aes(horizon,rmse,color=model))
Don't know how to automatically pick scale for object of type <difftime>.
Defaulting to continuous.

Final Forecast

Code
suppressMessages({
model = prophet(sales_tbl_ts)

future = make_future_dataframe(model, periods = 6, freq = 'weeks')

forecast = predict(model,future) # Get forecast


  plot(model,forecast) +
  ylab("SF Housing Prices")+xlab("Date")+theme_bw()
})