Last Updated - 2016-06-15
The previous April model came close to the actual April premium of 47300, with a difference of 680. The model now adds in new data and attempts to forecast the May 2016 COE Premium for Category A.
## Warning in TentativeRoughFix(boruta.train): There are no Tentative attributes! Returning original
## object.
## [1] "PQP" "QUOTA"
We will fit several linear regression models to forecast PREMIUM from these variables.
##
## Call:
## lm(formula = PREMIUM ~ PQP + QUOTA, data = traindata)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -18975.7  -3372.9   -460.4   3720.6  17013.9
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.186e+04  2.056e+03   5.770 4.44e-08 ***
## PQP          8.559e-01  2.949e-02  29.027  < 2e-16 ***
## QUOTA       -4.067e+00  1.053e+00  -3.861 0.000168 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5882 on 149 degrees of freedom
## Multiple R-squared: 0.8663, Adjusted R-squared: 0.8645
## F-statistic: 482.5 on 2 and 149 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = PREMIUM ~ PQP, data = traindata)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -17987.8  -3462.3   -681.6   4009.0  17867.6
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.292e+03  1.757e+03    4.15 5.56e-05 ***
## PQP         8.829e-01  2.994e-02   29.49  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6149 on 150 degrees of freedom
## Multiple R-squared: 0.8529, Adjusted R-squared: 0.8519
## F-statistic: 869.5 on 1 and 150 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = PREMIUM ~ PQP + BIDS, data = traindata)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -18094.8  -3543.7    -75.7   3704.4  16564.0
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.165e+04  1.991e+03   5.851 3.00e-08 ***
## PQP          8.625e-01  2.896e-02  29.782  < 2e-16 ***
## BIDS        -2.713e+00  6.713e-01  -4.041 8.49e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5857 on 149 degrees of freedom
## Multiple R-squared: 0.8674, Adjusted R-squared: 0.8656
## F-statistic: 487.4 on 2 and 149 DF, p-value: < 2.2e-16
The PQP is a 3-month moving average and is a known number, published here, which is 45578 for May 2016 Category A. I will plug this value into the three equations.
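To illustrate how the PQP is plugged into the fitted equations, here is a minimal Python sketch (it is not the R code that produced the output above). The function names are my own hypothetical labels, and the coefficients are copied from the three model summaries. The May 2016 QUOTA and BIDS values are not stated in this section, so they are left as arguments, and only the PQP-only equation is actually evaluated.

```python
# Coefficients copied from the three lm() summaries above.
# Function names are illustrative labels, not part of the original analysis.

PQP_MAY_2016 = 45578  # published PQP for May 2016, Category A


def predict_pqp_quota(pqp, quota):
    """PREMIUM ~ PQP + QUOTA (quota must be supplied by the reader)."""
    return 1.186e4 + 0.8559 * pqp - 4.067 * quota


def predict_pqp_only(pqp):
    """PREMIUM ~ PQP."""
    return 7.292e3 + 0.8829 * pqp


def predict_pqp_bids(pqp, bids):
    """PREMIUM ~ PQP + BIDS (bids must be supplied by the reader)."""
    return 1.165e4 + 0.8625 * pqp - 2.713 * bids


# Only the PQP-only model can be evaluated from the figures given here.
print(round(predict_pqp_only(PQP_MAY_2016)))  # -> 47533
```

With PQP = 45578, the PQP-only equation gives roughly 47533. The other two equations need the May 2016 quota and bid counts, for example `predict_pqp_quota(PQP_MAY_2016, quota)`.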
From a statistical standpoint, the usual choice is the equation with the highest R-squared value. However, the reason I came up with three figures is to attempt to mirror the equation used by the government agency for its actual COE figures. At the same time, I also wish to approximate the linear regression equation used. Below, I extend the linear regression to quadratic and polynomial functions to cover as many data points as possible, and use those functions for prediction.
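Because the three models do not all have the same number of predictors, the adjusted R-squared is the fairer figure to compare. The short Python sketch below recomputes it from the multiple R-squared values reported in the summaries. The sample size of 152 is inferred from the residual degrees of freedom (149 + 3 coefficients, or 150 + 2), and the helper name `adjusted_r2` is my own.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Inferred from the summaries: residual df + number of estimated coefficients.
N_OBS = 152

# model formula -> (multiple R-squared, number of predictors)
models = {
    "PQP + QUOTA": (0.8663, 2),
    "PQP": (0.8529, 1),
    "PQP + BIDS": (0.8674, 2),
}

adjusted = {
    name: round(adjusted_r2(r2, N_OBS, k), 4)
    for name, (r2, k) in models.items()
}
best = max(adjusted, key=adjusted.get)

print(adjusted)  # -> {'PQP + QUOTA': 0.8645, 'PQP': 0.8519, 'PQP + BIDS': 0.8656}
print(best)      # -> PQP + BIDS
```

The recomputed values match the adjusted R-squared figures in the output above, and on that criterion alone the PQP + BIDS equation scores highest.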
I assume that when a person bids for a new COE, they rely on the latest available information, such as the current COE premium, PQP and quota provided by the government agency. The bidder does not know how many other people are bidding. In my opinion, therefore, the bidder is unlikely to consider the equation with the PQP + BIDS terms. The last equation is thus of little practical use, but from a mathematical standpoint I need to consider all situations.