Last Updated - 2016-07-20

Introduction

The previous model, for July 2016 Round 1, predicted a premium of 55013 (using H2O).

The actual July Round 1 premium was 52301, a difference of 2712.

The model now adds the new data and attempts to forecast the July 2016 COE Premium for Category A, Round 2.

Data source is from here.

Determine COE Premium for NEW vehicle bid

## Warning in TentativeRoughFix(boruta.train): There are no Tentative attributes! Returning original
## object.
##          meanImp medianImp    minImp    maxImp normHits  decision
## PQP   40.2373159  40.49373 37.852762 42.456425        1 Confirmed
## QUOTA 20.2489458  20.40463 18.702087 21.346293        1 Confirmed
## BIDS  16.0881681  15.99586 15.192990 17.227743        1 Confirmed
## DIFF  -0.8036625  -1.23619 -1.993202  1.905156        0  Rejected
## Warning: package 'plyr' was built under R version 3.3.1

## [1] "PQP"   "QUOTA"

We fit three linear regression models to forecast PREMIUM from these variables: PQP + QUOTA, PQP alone, and all four variables.

## 
## Call:
## lm(formula = PREMIUM ~ PQP + QUOTA, data = traindata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18651.3  -3518.4   -502.2   3800.7  17251.2 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.088e+04  2.054e+03   5.298 3.99e-07 ***
## PQP          8.580e-01  2.988e-02  28.714  < 2e-16 ***
## QUOTA       -2.638e+00  9.347e-01  -2.822   0.0054 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5964 on 154 degrees of freedom
## Multiple R-squared:  0.8587, Adjusted R-squared:  0.8568 
## F-statistic: 467.8 on 2 and 154 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = PREMIUM ~ PQP, data = traindata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -17996.2  -3390.7   -595.4   3895.6  17841.6 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.566e+03  1.722e+03   4.393 2.06e-05 ***
## PQP         8.796e-01  2.952e-02  29.796  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6097 on 155 degrees of freedom
## Multiple R-squared:  0.8514, Adjusted R-squared:  0.8504 
## F-statistic: 887.8 on 1 and 155 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = PREMIUM ~ PQP + QUOTA + BIDS + DIFF, data = traindata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -17967.7  -3125.2     54.8   3452.2  15910.1 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7663.90013 1887.91528   4.059 7.85e-05 ***
## PQP            0.90748    0.02773  32.723  < 2e-16 ***
## QUOTA          1.71157    2.57356   0.665    0.507    
## BIDS          -2.58791    1.59498  -1.623    0.107    
## DIFF           0.73555    0.11128   6.610 6.13e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5290 on 152 degrees of freedom
## Multiple R-squared:  0.8903, Adjusted R-squared:  0.8874 
## F-statistic: 308.3 on 4 and 152 DF,  p-value: < 2.2e-16
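
For readers who want to reproduce the shape of the three summaries above, here is a minimal sketch in base R. The original `traindata` is not included in this page, so the data frame below is synthetic and its values are made up for illustration only; the 157 rows are an assumption chosen to match the residual degrees of freedom printed above. Only the three formulas are taken from the output.

```r
# Synthetic stand-in for `traindata` (assumption: the real data is not shown here).
set.seed(42)
n <- 157
traindata <- data.frame(
  PQP   = runif(n, 20000, 80000),
  QUOTA = runif(n, 300, 2500),
  BIDS  = runif(n, 500, 4000),
  DIFF  = rnorm(n, 0, 3000)
)
traindata$PREMIUM <- 8000 + 0.9 * traindata$PQP - 2 * traindata$QUOTA +
  rnorm(n, 0, 5000)

# The three formulas that appear in the summaries above.
formulas <- list(
  pqp_quota = PREMIUM ~ PQP + QUOTA,
  pqp_only  = PREMIUM ~ PQP,
  all_vars  = PREMIUM ~ PQP + QUOTA + BIDS + DIFF
)
models <- lapply(formulas, lm, data = traindata)

# Adjusted R-squared for each fit, as reported by summary().
adj_r2 <- sapply(models, function(m) summary(m)$adj.r.squared)
print(round(adj_r2, 4))

# Forecast from the PQP-only model for a given PQP value.
pred <- predict(models$pqp_only, newdata = data.frame(PQP = 49519))
print(unname(pred))
```

Because the data is synthetic, the printed numbers will not match the summaries above; the sketch only shows how the fits and a forecast can be obtained.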

The PQP (Prevailing Quota Premium) is a 3-month moving average of the COE premium. For July 2016 it is 49519, as shown here: http://www.aas.com.sg/?show=content&showview=12&val=175

* Adjusted R Square is 85.68% with PQP + QUOTA as predictors. Predicted COE Premium is 47505.

* Adjusted R Square is 85.04% with PQP as the only predictor. Predicted COE Premium is 51124.

* Adjusted R Square is 88.74% with all four predictors. Predicted COE Premium is 52601.
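
The PQP-only forecast can be checked by hand from the printed coefficients. The sketch below plugs the rounded intercept and slope from the `PREMIUM ~ PQP` summary into the regression equation. The variable names are illustrative, and because the coefficients are rounded as printed, the result lands within a couple of dollars of the reported 51124 rather than matching it exactly.

```r
# Coefficients copied from the PREMIUM ~ PQP summary (rounded as printed).
intercept <- 7566      # 7.566e+03
slope_pqp <- 0.8796    # 8.796e-01
pqp_july  <- 49519     # PQP for July 2016

# Linear model: PREMIUM = intercept + slope * PQP
predicted <- intercept + slope_pqp * pqp_july
print(round(predicted))
```

The other two forecasts also need the QUOTA, BIDS and DIFF inputs, which are not stated on this page, so they are not reproduced here.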

Using the H2O GBM algorithm (for data scientists only)

library(h2o)
## Warning: package 'h2o' was built under R version 3.3.1
## Loading required package: statmod
## 
## ----------------------------------------------------------------------
## 
## Your next step is to start H2O:
##     > h2o.init()
## 
## For H2O package documentation, ask for help:
##     > ??h2o
## 
## After starting H2O, you can use the Web UI at http://localhost:54321
## For more information visit http://docs.h2o.ai
## 
## ----------------------------------------------------------------------
## 
## Attaching package: 'h2o'
## The following objects are masked from 'package:stats':
## 
##     cor, sd, var
## The following objects are masked from 'package:base':
## 
##     %*%, %in%, &&, ||, apply, as.factor, as.numeric, colnames, colnames<-, ifelse,
##     is.character, is.factor, is.numeric, log, log10, log1p, log2, round, signif, trunc
localH2O <- h2o.init(nthreads = -1)
## 
## H2O is not running yet, starting it now...
## 
## Note:  In case of errors look at the following log files:
##     C:\Users\admin\AppData\Local\Temp\RtmpG0kj4t/h2o_admin_started_from_r.out
##     C:\Users\admin\AppData\Local\Temp\RtmpG0kj4t/h2o_admin_started_from_r.err
## 
## 
## Starting H2O JVM and connecting:  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         1 seconds 405 milliseconds 
##     H2O cluster version:        3.8.3.3 
##     H2O cluster name:           H2O_started_from_R_admin_sfd121 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   7.10 GB 
##     H2O cluster total cores:    8 
##     H2O cluster allowed cores:  8 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     R Version:                  R version 3.3.0 (2016-05-03)
h2o.init()
##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         1 seconds 655 milliseconds 
##     H2O cluster version:        3.8.3.3 
##     H2O cluster name:           H2O_started_from_R_admin_sfd121 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   7.10 GB 
##     H2O cluster total cores:    8 
##     H2O cluster allowed cores:  8 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     R Version:                  R version 3.3.0 (2016-05-03)
# split data into training (70%) and testing (30%) data frames
samp <- sample(nrow(traindata), 0.7 * nrow(traindata))
training <- traindata[samp, ]
testing <- traindata[-samp, ]


# convert to H2O frames
# note: train.h2o is built from the full traindata, not the 70% training split
train.h2o <- as.h2o(traindata)
test.h2o  <- as.h2o(testing)
### column indices used below
y.dep <- 4      # dependent variable: the PREMIUM column
x.indep <- 5:8  # independent variables: the PQP, BIDS, QUOTA and DIFF columns

#GBM

gbm.model <- h2o.gbm(y=y.dep, x=x.indep, training_frame = train.h2o, ntrees = 1000, max_depth = 4, learn_rate = 0.01, seed = 1122)
h2o.varimp(gbm.model)
## Variable Importances: 
##   variable  relative_importance scaled_importance percentage
## 1      PQP 1629809147904.000000          1.000000   0.854395
## 2     BIDS  174556348416.000000          0.107102   0.091508
## 3    QUOTA   56772468736.000000          0.034834   0.029762
## 4     DIFF   46420484096.000000          0.028482   0.024335
#h2o.performance(gbm.model)

# predict against test data
#predict.gbm <- as.data.frame(h2o.predict(gbm.model, test.h2o))


###############################################################
# Predict from my own input figure; only PQP is supplied here
###############################################################

mypqpdata <- data.frame(PQP=49519)

# convert to H2O frame
result_premium <- as.h2o(mypqpdata)
predict.gbm <- as.data.frame(h2o.predict(gbm.model, result_premium))

* Adjusted R Square is 97.94% with the GBM model. Predicted COE Premium is 53092.
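
One way to sanity-check a fit figure like this is to compute R-squared on rows the model never saw, using the same 70/30 split idea as the code above. The sketch below is an assumption-laden illustration in base R: the data is synthetic, and `lm` is used as a stand-in model so the example runs without an H2O cluster. It is not the GBM fitted in this section.

```r
# Synthetic data (made up for illustration); lm is a stand-in for the GBM.
set.seed(1122)
n <- 157
dat <- data.frame(PQP = runif(n, 20000, 80000))
dat$PREMIUM <- 7500 + 0.88 * dat$PQP + rnorm(n, 0, 6000)

# 70/30 split into training and testing rows.
samp     <- sample(nrow(dat), floor(0.7 * nrow(dat)))
training <- dat[samp, ]
testing  <- dat[-samp, ]

# Fit on the training rows only, then predict the held-out rows.
fit  <- lm(PREMIUM ~ PQP, data = training)
pred <- predict(fit, newdata = testing)

# R-squared on held-out rows: 1 - SSE / SST.
sse <- sum((testing$PREMIUM - pred)^2)
sst <- sum((testing$PREMIUM - mean(testing$PREMIUM))^2)
holdout_r2 <- 1 - sse / sst
print(round(holdout_r2, 4))
```

A held-out figure is usually lower than the one computed on the training rows, which makes it a more conservative guide to forecast quality.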