This post follows on from UMM Kaggle: EDA.
- For the pairs (visitStartTime, visitNumber), (visitStartTime, hits), (visitStartTime, pageviews), and (pageviews, hits), we need to add interaction terms.
- When isTrueDirect = TRUE, the target value gets bigger.
- hits, visitNumber, and pageviews are right-skewed.
- On the dates where revenueSum peaks:
  - isMobile tends to be FALSE
  - adwordsClickInfo.isVideoAd tends to be TRUE
  - adwordsClickInfo.adNetworkType tends to be 0
- isTrueDirect strongly affects transactionRevenue.
#model1 <- train(transactionRevenue~hits+pageviews+visitNumber+visitNumber*visitStartTime, data=newtrain1, preProcess="scale", method="nb")
Let’s forecast the log-transformed daily transactionRevenue as a time series. We divided transactionRevenue by 1E06 because its mean is 1.7042728 × 10^6, which is on the order of 1E06.
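As a small illustration of this transform, the sketch below applies the same scaling and `log1p` to made-up daily means and inverts it with `expm1`. The numbers and the names `daily_mean` and `scale_factor` are invented for the example; only the division by 1E06 and the use of `log1p` come from the post's code.

```r
# Toy daily mean revenues (invented values, not from the competition data)
daily_mean <- c(0, 5e5, 1.7e6, 4e6)
scale_factor <- 1e6  # same scaling as transactionRevenue/(1E06) in the post

# Forward transform: scale, then log(1 + x); zero-revenue days map to 0
y <- log1p(daily_mean / scale_factor)

# Inverse transform: expm1 undoes log1p, then the scaling is undone
recovered <- expm1(y) * scale_factor

print(round(y, 4))
print(all.equal(recovered, daily_mean))
```

A model fitted to `y` produces forecasts on this log scale, so `expm1(forecast) * scale_factor` maps a point forecast back to the original revenue units.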
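library(dplyr)     # group_by, summarise, %>%
library(zoo)       # zoo() series indexed by date
library(forecast)  # auto.arima, forecast, autoplot
library(ggplot2)   # theme_minimal

# Daily series: log1p of the mean revenue (in units of 1E06) per date
timeS <- newtrain %>%
  group_by(date) %>%
  summarise(revenueMean = log1p(mean(transactionRevenue/(1E06)))) %>%
  ungroup() %>%
  with(zoo(revenueMean, order.by=date))
# Forecast horizon: number of days covered by the test set
timeRange <- difftime(max(newtest$date), min(newtest$date)) + 1
target_arima <- auto.arima(timeS)
summary(target_arima)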
## Series: timeS
## ARIMA(4,0,3) with non-zero mean
##
## Coefficients:
##           ar1      ar2      ar3      ar4      ma1      ma2      ma3     mean
##       -0.2547  -0.4445   0.3578  -0.0995   0.6029   0.5490  -0.3260   0.8924
## s.e.   0.3520   0.2285   0.2290   0.0782   0.3492   0.3354   0.3007   0.0249
##
## sigma^2 estimated as 0.1451: log likelihood=-162.27
## AIC=342.54 AICc=343.04 BIC=377.66
##
## Training set error measures:
##                         ME      RMSE       MAE  MPE MAPE     MASE
## Training set -0.0003557683 0.3766821 0.2882868 -Inf  Inf 0.790288
##                      ACF1
## Training set -0.004423992
forecast(target_arima, h=timeRange) %>%
  autoplot() +
  theme_minimal()
In this model, the forecast converges to a single value. This result is not informative, so we decided to add regression terms.
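This flattening is expected for a stationary ARMA model with a non-zero mean and no regressors: far enough ahead, the forecast settles on the estimated mean. The sketch below shows it with base R's `arima` on simulated data, not the `auto.arima` fit from the post; the series, its parameters, and the horizon of 200 steps are all invented for the illustration.

```r
set.seed(1)
# Simulate a stationary AR(1) series around a mean of 0.9 (illustrative values)
y <- 0.9 + arima.sim(model = list(ar = 0.6), n = 300, sd = 0.4)

# Fit an AR(1) with a non-zero mean using stats::arima
fit <- arima(y, order = c(1, 0, 0))

# Forecast far ahead; no regressors are involved
fc <- predict(fit, n.ahead = 200)$pred

# Early forecasts still react to the last observations,
# late forecasts sit on the estimated mean (reported as "intercept")
print(head(fc, 3))
print(tail(fc, 1))
print(coef(fit)["intercept"])
```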
We built regression terms from the daily mean values of pageviews, hits, isTrueDirect, and adwordsClickInfo.isVideoAd.
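The post does not show how `mean_pageviews_train` and the other regressors were built. One plausible construction, assuming they are per-date means aligned with `timeS`, is sketched below with base R on a toy data frame. The data and the `*_toy` names are hypothetical; only the column names follow the post.

```r
# Toy session-level data (invented); column names follow the post
toy <- data.frame(
  date = as.Date(c("2017-01-01", "2017-01-01", "2017-01-02",
                   "2017-01-02", "2017-01-03")),
  pageviews = c(1, 3, 2, 6, 5),
  hits = c(2, 4, 2, 8, 7),
  isTrueDirect = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# One row per date, each column replaced by its daily mean;
# a logical column averages to the share of TRUE values
daily <- aggregate(cbind(pageviews, hits, isTrueDirect) ~ date,
                   data = toy, FUN = mean)

# aggregate() returns the dates in ascending order,
# which matches a series indexed by date
mean_pageviews_toy <- daily$pageviews
mean_isTrueDirect_toy <- daily$isTrueDirect
print(daily)
```

Whatever construction is used, each regressor needs exactly one value per date in the series, in the same order.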
pageviews
target_arima_PV <- auto.arima(timeS, xreg=mean_pageviews_train)
summary(target_arima_PV)
## Series: timeS
## Regression with ARIMA(3,1,2) errors
##
## Coefficients:
##           ar1      ar2      ar3      ma1      ma2     xreg
##        0.7053  -0.1867  -0.0618  -1.3535   0.4445   4.2972
## s.e.   0.2955   0.0954   0.0700   0.2922   0.2539   0.5438
##
## sigma^2 estimated as 0.1203: log likelihood=-129.08
## AIC=272.15 AICc=272.47 BIC=299.45
##
## Training set error measures:
##                       ME      RMSE       MAE  MPE MAPE      MASE
## Training set 0.009628873 0.3435698 0.2641214 -Inf  Inf 0.7240426
##                      ACF1
## Training set -0.002529003
RMSE on newtrain: 0.344
hits
target_arima_HT <- auto.arima(timeS, xreg=mean_hits_train)
summary(target_arima_HT)
## Series: timeS
## Regression with ARIMA(3,1,2) errors
##
## Coefficients:
##           ar1      ar2      ar3      ma1      ma2     xreg
##        0.7101  -0.1896  -0.0587  -1.3525   0.4437   3.9559
## s.e.   0.3000   0.0972   0.0703   0.2966   0.2573   0.5028
##
## sigma^2 estimated as 0.1201: log likelihood=-128.76
## AIC=271.52 AICc=271.84 BIC=298.82
##
## Training set error measures:
##                      ME      RMSE       MAE  MPE MAPE      MASE
## Training set 0.01096295 0.3432809 0.2642414 -Inf  Inf 0.7243716
##                      ACF1
## Training set -0.002888984
RMSE on newtrain: 0.343
isTrueDirect
target_arima_ITD <- auto.arima(timeS, xreg=mean_isTrueDirect_train)
summary(target_arima_ITD)
## Series: timeS
## Regression with ARIMA(1,0,0) errors
##
## Coefficients:
##           ar1     xreg
##        0.3260   2.8899
## s.e.   0.0497   0.0866
##
## sigma^2 estimated as 0.1268: log likelihood=-140.52
## AIC=287.03 AICc=287.1 BIC=298.74
##
## Training set error measures:
##                       ME      RMSE       MAE  MPE MAPE      MASE
## Training set 0.001884742 0.3551658 0.2695036 -Inf  Inf 0.7387969
##                     ACF1
## Training set 0.002840936
RMSE on newtrain: 0.355
adwordsClickInfo.isVideoAd
target_arima_VDO <- auto.arima(timeS, xreg=mean_Video_train)
summary(target_arima_VDO)
## Series: timeS
## Regression with ARIMA(4,1,4) errors
##
## Coefficients:
##           ar1      ar2      ar3      ar4      ma1      ma2      ma3      ma4
##       -0.1930  -0.4152   0.3746  -0.1505  -0.4497  -0.0277  -0.8383   0.4240
## s.e.   0.2162   0.1562   0.1521   0.0906   0.2089   0.0835   0.0858   0.1848
##          xreg
##        4.3452
## s.e.   2.3091
##
## sigma^2 estimated as 0.1441: log likelihood=-161
## AIC=341.99 AICc=342.61 BIC=380.99
##
## Training set error measures:
##                       ME      RMSE       MAE  MPE MAPE      MASE
## Training set 0.004220124 0.3743871 0.2850392 -Inf  Inf 0.7813851
##                      ACF1
## Training set -0.004242756
RMSE on newtrain: 0.374
Among the four ARIMA models with a regression term above, the model with hits gives the best RMSE on the newtrain set.
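To make the comparison easier to scan, the sketch below collects the in-sample figures reported in the summaries above into one table and sorts it by RMSE. The numbers are copied from those outputs; the data frame `fits` and its column names are introduced here only for illustration.

```r
# In-sample figures copied from the summary() outputs in this post
fits <- data.frame(
  xreg = c("none", "pageviews", "hits", "isTrueDirect", "isVideoAd"),
  d    = c(0, 1, 1, 0, 1),  # differencing order chosen by auto.arima
  rmse = c(0.3766821, 0.3435698, 0.3432809, 0.3551658, 0.3743871),
  aicc = c(343.04, 272.47, 271.84, 287.10, 342.61)
)

# Sort by training RMSE, smallest first
fits <- fits[order(fits$rmse), ]
print(fits, row.names = FALSE)

# Regressor of the model with the lowest training RMSE
best <- fits$xreg[1]
print(best)
```

The gap between hits and pageviews is about 0.0003 in RMSE, so the ranking between those two is not decisive. AICc values are generally not comparable across models with different orders of differencing, which is why the `d` column is kept in view.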
The forecasts from the hits model and the pageviews model have similar shapes, but those shapes are too vague to say much about stationarity. The adwordsClickInfo.isVideoAd model performs poorly. The isTrueDirect model's forecast looks quite reasonable, except for one peak.
hits and isTrueDirect
So we used a combined regressor: the mean of hits and isTrueDirect.
target_arima_INTERACT <- auto.arima(timeS,
                                    xreg=(mean_hits_train+mean_isTrueDirect_train)/2)
summary(target_arima_INTERACT)
## Series: timeS
## Regression with ARIMA(1,0,2) errors
##
## Coefficients:
##           ar1      ma1      ma2  intercept     xreg
##        0.9328  -0.5821  -0.1668    -2.9901   4.9080
## s.e.   0.0355   0.0655   0.0570     0.4622   0.5757
##
## sigma^2 estimated as 0.1147: log likelihood=-120.73
## AIC=253.46 AICc=253.69 BIC=276.87
##
## Training set error measures:
##                       ME      RMSE       MAE  MPE MAPE      MASE
## Training set 0.005025823 0.3363246 0.2545888 -Inf  Inf 0.6979108
##                     ACF1
## Training set 0.005533488
This gives an RMSE of 0.336 on the newtrain set, which is better than the model with hits alone (0.343) or isTrueDirect alone (0.355).
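Averaging the two regressors forces them to share one coefficient. An alternative that could be tried is passing both as a two-column matrix so each gets its own coefficient. The sketch below compares the two on simulated data with base R's `arima`; it is not the `auto.arima` fit from the post, and the data, the AR(1) order, and the names `x_hits` and `x_direct` are all invented.

```r
set.seed(42)
n <- 200
# Two invented regressors standing in for daily mean hits and isTrueDirect
x_hits   <- rnorm(n, mean = 4, sd = 1)
x_direct <- runif(n, min = 0, max = 1)

# Response with a different true effect per regressor plus AR(1) noise
noise <- arima.sim(model = list(ar = 0.5), n = n, sd = 0.3)
y <- 0.3 * x_hits + 3 * x_direct + as.numeric(noise)

# (a) Averaged regressor, as in the post: one shared coefficient
fit_avg <- arima(y, order = c(1, 0, 0), xreg = (x_hits + x_direct) / 2)

# (b) Two-column matrix: one coefficient per regressor
fit_both <- arima(y, order = c(1, 0, 0),
                  xreg = cbind(hits = x_hits, direct = x_direct))

print(coef(fit_both))
print(c(avg = fit_avg$aic, both = fit_both$aic))
```

On this toy data the two-column fit can recover the separate effects, at the cost of one extra parameter. Whether that helps on the real series would need to be checked, ideally on held-out dates.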
TSINTERACT <- forecast(target_arima_INTERACT, h=timeRange,
                       xreg=(mean_hits_test+mean_isTrueDirect_test)/2) %>%
  autoplot()
TSINTERACT
The forecast also appears to be stationary.
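A model with a regression term needs regressor values for every future step, which is why the forecast call above passes test-period values through `xreg`. The base R analogue is the `newxreg` argument of `predict()`, sketched below on simulated data. The series, the AR(1) order, and the names `x_train`, `x_test`, and `h` are assumptions made for the example.

```r
set.seed(7)
n_train <- 120
h <- 30  # forecast horizon, playing the role of timeRange

# Invented regressor for the training and the forecast period
x_train <- rnorm(n_train, mean = 1, sd = 0.3)
x_test  <- rnorm(h, mean = 1, sd = 0.3)
ar_noise <- arima.sim(model = list(ar = 0.4), n = n_train, sd = 0.2)
y_train <- 0.8 * x_train + as.numeric(ar_noise)

fit <- arima(y_train, order = c(1, 0, 0), xreg = x_train)

# newxreg must supply one value per forecast step
fc <- predict(fit, n.ahead = h, newxreg = x_test)

# Unlike the regressor-free model, these forecasts keep moving with x_test
print(round(head(fc$pred, 5), 3))
print(sd(fc$pred) > 0)
```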