5.2.1 Trend and Seasonality
- Amount of Successful Pledged US Dollars
According to the time plot, the amount of successful pledges varies considerably over time. The correlogram shows strong weekly seasonality, and the subseries plot shows a slightly increasing trend over time. Yearly seasonality is not clearly visible. The series contains several outliers; the tsclean() function replaces them with interpolated values, which reduces the variance of the series.
# Raw series, ACF, weekly subseries plot, and the outlier-cleaned series
p1 <- autoplot(ts(df_pledged$x)) + ggtitle("Time Plot of Successful Pledged US Dollars on Kickstarter") +
  xlab("Days") + ylab("US Dollars")
p2 <- ggAcf(tsclean(ts(df_pledged$x))) + ggtitle("ACF Plot of Successful Pledged US Dollars")
p3 <- ggsubseriesplot(tsclean(ts(df_pledged$x, frequency = 7))) + ggtitle("Subseries Plot of Successful Pledged US Dollars")
p4 <- autoplot(tsclean(ts(df_pledged$x))) + ggtitle("Time Plot of Successful Pledged US Dollars (with outlier replacements)") +
  xlab("Days") + ylab("US Dollars")
# arrangeGrob() builds the bottom row without drawing it; a nested grid.arrange() would render it separately
grid.arrange(p1, p4, arrangeGrob(p2, p3, ncol = 2), nrow = 3)
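
To illustrate what tsclean() does, the following minimal sketch applies it to a synthetic daily series with a weekly pattern and two injected spikes. The series, the spike positions, and all variable names are assumptions made for illustration; they are not taken from the Kickstarter data.

```r
library(forecast)

set.seed(42)
n <- 210                                   # 30 weeks of synthetic daily data
weekly <- rep(c(10, 12, 15, 14, 13, 6, 5), length.out = n)
raw <- ts(100 + weekly + rnorm(n, sd = 2), frequency = 7)

# Inject two artificial outliers (positions chosen arbitrarily)
spikes <- c(50, 120)
raw[spikes] <- raw[spikes] + 500

# tsclean() identifies outliers and replaces them by interpolation
cleaned <- tsclean(raw)

# Compare the spread before and after cleaning
c(var_raw = var(raw), var_cleaned = var(cleaned))
```

On a series like this, the cleaned variance should be far smaller than the raw variance, which is the effect described for the pledged series.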

5.2.2 Decomposition
The decomposition shows an increasing trend. The weekly seasonal pattern changes considerably over time, while the yearly seasonal pattern changes only slightly. Both seasonal components have a narrow range compared to the trend component. The remainders are small, which suggests that the decomposition captures most of the structure in the series.
pledged <- tsclean(msts(df_pledged$x, seasonal.periods=c(7,365.25)))
autoplot(mstl(pledged)) + xlab("Days") + ggtitle("Decomposition of Successful Pledged US Dollars")
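
The comparison of component ranges can also be made numerically. The sketch below is one possible way to do this on a synthetic series with an assumed trend, weekly cycle, and yearly cycle; the data and names (y_msts, component_range) are hypothetical and only mirror the structure of the analysis above.

```r
library(forecast)

set.seed(1)
n <- 1100                                  # about three years of synthetic daily data
day <- seq_len(n)
y <- 200 + 0.3 * day +                     # upward trend
  10 * sin(2 * pi * day / 7) +             # weekly cycle
  20 * sin(2 * pi * day / 365.25) +        # yearly cycle
  rnorm(n, sd = 5)

y_msts <- msts(y, seasonal.periods = c(7, 365.25))
dec <- mstl(y_msts)

# Range (max - min) of every column: data, trend, seasonal terms, remainder
component_range <- apply(dec, 2, function(v) diff(range(v)))
round(component_range, 1)
```

A remainder range that is small relative to the trend and seasonal ranges is consistent with a decomposition that leaves little structure unexplained.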

5.2.3 Covariance
It is expected that as the number of successful projects increases, more money is pledged on Kickstarter. The same holds for the number of backers: with more supporters, projects collect more money.
According to the scatter plots, the number of successful projects and the pledged US Dollars are related, but the correlation is not very strong. In contrast, there is a strong correlation between the number of backers and the pledged amount. Therefore, we use the number of backers as a predictor in the forecast.
p1 <- ggplot(df_pledged, aes(x = proj, y = x)) + geom_point() +
xlab("Number of Successful Projects") +
ylab("Pledged US Dollars")
p2 <- ggplot(df_pledged, aes(x = backers, y = x)) + geom_point() +
xlab("Number of Backers") +
ylab("Pledged US Dollars")
grid.arrange(p1, p2, ncol = 2)
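
The visual impression from the scatter plots can be backed by correlation coefficients. The sketch below uses a synthetic data frame (toy) whose columns only imitate proj, backers, and x; the generating process is an assumption chosen so that backers drive the pledged amount much more strongly than the project count does.

```r
set.seed(7)
n <- 300
backers <- round(runif(n, 100, 1000))
toy <- data.frame(
  backers = backers,
  proj = pmax(0, round(backers / 50 + rnorm(n, sd = 10))),  # weakly tied to backers
  x = 80 * backers + rnorm(n, sd = 2000)                    # pledged amount, driven by backers
)

# Pearson correlation of each candidate predictor with the pledged amount
cor_proj <- cor(toy$proj, toy$x)
cor_backers <- cor(toy$backers, toy$x)
c(projects = cor_proj, backers = cor_backers)
```

Applying the same two cor() calls to the actual columns would give a number to accompany each scatter plot.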

5.2.4 Forecast
First, we build a dynamic regression model with the number of backers as a predictor. For forecasting, we use the forecasting results of successful projects from the previous section.
According to the forecast plot, the forecasts reproduce the seasonality. However, their variance is too small compared to the original time series, because the seasonality changes over time. Moreover, the prediction interval is too wide, and the residual plots and the Ljung-Box test (p-value < 2.2e-16) show that much information remains unexplained. Thus, the dynamic regression model is not a good fit.
fit1 <- auto.arima(pledged, xreg = df_pledged$backers)
fcast1 <- forecast(fit1, xreg = backers_fc)
autoplot(fcast1, include = length(pledged)-1500) +
ggtitle("Dynamic Regression Model with Predictor") +
xlab("Days") + ylab("Pledged US Dollars")
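
As a minimal sketch of the mechanics of dynamic regression, the example below fits auto.arima() with an external regressor on synthetic data and forecasts with future regressor values. All data and names (pledged_toy, xreg_train, xreg_future) are hypothetical, and the future predictor values are simply assumed to be known here, whereas in practice they would themselves have to be forecast.

```r
library(forecast)

set.seed(123)
n <- 200
h <- 14                                        # forecast horizon in days

# Synthetic predictor and response (stand-ins, not the report's data)
backers <- 500 + as.numeric(arima.sim(list(ar = 0.7), n = n + h, sd = 30))
pledged_toy <- ts(50 * backers[1:n] + rnorm(n, sd = 500))

# Named one-column matrices keep training and future regressors aligned
xreg_train <- cbind(backers = backers[1:n])
xreg_future <- cbind(backers = backers[(n + 1):(n + h)])

fit <- auto.arima(pledged_toy, xreg = xreg_train)

# The horizon is implied by the number of rows in the future xreg
fc <- forecast(fit, xreg = xreg_future)
length(fc$mean)
```

Because the horizon follows from the future regressor, the quality of the forecast depends directly on how well the predictor itself can be forecast.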

checkresiduals(fit1)

##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(4,1,1) errors
## Q* = 3605.2, df = 607.4, p-value < 2.2e-16
##
## Model df: 6. Total lags used: 613.4
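
To show how a Ljung-Box result like the one above is read, the following sketch runs Box.test() from base R on two synthetic series. The series and the lag of 20 are assumptions for illustration; a very small p-value indicates that autocorrelation remains, which is the situation reported for the residuals of the dynamic regression model.

```r
set.seed(2024)

white_noise <- rnorm(500)                                     # no autocorrelation by construction
correlated <- as.numeric(arima.sim(list(ar = 0.8), n = 500))  # strong autocorrelation

# fitdf = 0 because these are raw series, not residuals of a fitted model
lb_noise <- Box.test(white_noise, lag = 20, type = "Ljung-Box", fitdf = 0)
lb_corr <- Box.test(correlated, lag = 20, type = "Ljung-Box", fitdf = 0)

c(white_noise = lb_noise$p.value, correlated = lb_corr$p.value)
```

When the test is applied to model residuals, fitdf would be set to the number of estimated model parameters, which corresponds to the "Model df" line in the output above.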
Since adding a predictor to a forecasting model does not necessarily improve its performance, in this section we also build two forecasting models that use only the pledged US Dollars series.
# STL decomposition with ETS on the seasonally adjusted series (stlf default)
fcast2 <- stlf(pledged)
p1 <- autoplot(fcast2, include = length(pledged)-1500) +
  xlab("Days") + ylab("Pledged US Dollars")
# TBATS fitted to roughly the last 1500 observations
fit3 <- tbats(subset(pledged, start = length(pledged) - 1500))
fcast3 <- forecast(fit3)
p2 <- autoplot(fcast3, include = length(pledged)-1500) +
  xlab("Days") + ylab("Pledged US Dollars")
grid.arrange(p1, p2, nrow = 2)

checkresiduals(fcast2)
## Warning in checkresiduals(fcast2): The fitted degrees of freedom is based
## on the model used for the seasonally adjusted data.

##
## Ljung-Box test
##
## data: Residuals from STL + ETS(A,N,N)
## Q* = 1428.1, df = 611.4, p-value < 2.2e-16
##
## Model df: 2. Total lags used: 613.4
According to the forecast plots above, both the STL + ETS and TBATS models reproduce the weekly and yearly seasonality, and the forecasts are reasonable. The STL + ETS model better represents the original patterns, and its prediction interval is relatively narrow. The TBATS forecasts are smoother, but the prediction interval is much wider. The residual plots and the Ljung-Box test show that some information remains unexplained in the residuals, especially a yearly seasonality, but the residual diagnostics are otherwise acceptable. Therefore, we consider the STL + ETS model a good fit.
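
The comparison of prediction interval widths can be made numeric. The sketch below is one possible approach on a synthetic weekly-seasonal series; the data and the helper mean_width() are hypothetical and not part of the original analysis.

```r
library(forecast)

set.seed(99)
n <- 210
weekly <- rep(c(10, 12, 15, 14, 13, 6, 5), length.out = n)
y <- ts(100 + weekly + rnorm(n, sd = 3), frequency = 7)
h <- 28                                    # four weeks ahead

# Mean width of the widest interval stored in a forecast object
# (with the default levels this is the 95% interval, the last column)
mean_width <- function(fc) {
  k <- ncol(fc$upper)
  mean(fc$upper[, k] - fc$lower[, k])
}

fc_stl <- stlf(y, h = h)
fc_tbats <- forecast(tbats(y, use.parallel = FALSE), h = h)

c(stl_ets = mean_width(fc_stl), tbats = mean_width(fc_tbats))
```

Applied to fcast2 and fcast3, the same helper would give a single number per model to support the visual comparison.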