Last Updated - 2016-05-22
I am doing this for a friend who ask me to forecast COE prices. There is no way to predict ACUTAL COE prices but what we want to achieve is an equation that plugs in the variables and then determine the forecasted COE prices. A good model is when the equation determine the projected COE price should NOT deviate much from actual COE price.
Both Premium and PQP can be a response, so what are the independent variables that them ? We will perform 2 algoritm and Linear tests for PREMIUM and PQP for new bids and renewal COE respectively. The data is for Cat A car from year 2010 to year 2016 latest.
Data source is from here.
## Warning in TentativeRoughFix(boruta.train): There are no Tentative attributes! Returning original
## object.
## meanImp medianImp minImp maxImp normHits decision
## PQP 48.05860 47.99839 46.41306 49.54351 1 Confirmed
## QUOTA 22.06913 22.28534 20.33577 23.09287 1 Confirmed
## BIDS 16.38491 16.18743 15.63004 17.28816 1 Confirmed
Blue boxplots correspond to minimal, average and maximum Z score of a shadow attribute.
Red represent Z scores of rejected attributes.
yellow represent Z scores of tentative attributes.
Green represent Z scores of confirmed attributes.
##
## Recursive feature selection
##
## Outer resampling method: Cross-Validated (10 fold)
##
## Resampling performance over subset size:
##
## Variables RMSE Rsquared RMSESD RsquaredSD Selected
## 1 4899 0.9060 1969.9 0.06821
## 2 4466 0.9235 923.5 0.03214 *
## 3 4546 0.9247 1049.5 0.03222
##
## The top 2 variables (out of 2):
## PQP, QUOTA
## [1] "PQP" "QUOTA"
fit1 <- lm(PREMIUM ~ PQP, data=traindata)
summary(fit1)
##
## Call:
## lm(formula = PREMIUM ~ PQP, data = traindata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17988.2 -3504.7 -696.1 4007.8 17866.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.306e+03 1.768e+03 4.133 5.95e-05 ***
## PQP 8.827e-01 3.008e-02 29.348 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6169 on 149 degrees of freedom
## Multiple R-squared: 0.8525, Adjusted R-squared: 0.8515
## F-statistic: 861.3 on 1 and 149 DF, p-value: < 2.2e-16
25 E-01 = 25 * 10^(-1) = 2.5
25E2 = 25 * 10^2 = 2500
7.8E-02 = 7.8 * 10^(-2) = 0.078
7.8E3 = 7.8 * 10^3 = 7800
# interested in the 1st row of traindata, at column 5 which is PQP
newdata <- data.frame(PQP = traindata[1,5]) # wrap the parameter
results_premium <- predict(fit1, newdata) # apply predict
## Warning in TentativeRoughFix(boruta.train): There are no Tentative attributes! Returning original
## object.
## meanImp medianImp minImp maxImp normHits decision
## PREMIUM 46.62102 46.65102 45.13697 48.34065 1 Confirmed
## QUOTA 23.61932 23.40635 22.37639 25.05192 1 Confirmed
## BIDS 14.66363 14.54352 13.85459 16.07888 1 Confirmed
##
## Recursive feature selection
##
## Outer resampling method: Cross-Validated (10 fold)
##
## Resampling performance over subset size:
##
## Variables RMSE Rsquared RMSESD RsquaredSD Selected
## 1 6765 0.8371 1355.2 0.06955
## 2 4713 0.9246 877.5 0.03012 *
## 3 5065 0.9151 980.1 0.03533
##
## The top 2 variables (out of 2):
## PREMIUM, QUOTA
## [1] "PREMIUM" "QUOTA"
##
## Call:
## lm(formula = PQP ~ . - YEAR - MONTH - ROUND, data = traindata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13701.3 -3470.7 -458.4 3116.1 21357.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.549e+03 2.437e+03 -1.046 0.2973
## PREMIUM 9.880e-01 3.416e-02 28.927 <2e-16 ***
## QUOTA -2.299e+00 3.148e+00 -0.730 0.4663
## BIDS 3.627e+00 2.013e+00 1.802 0.0737 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6299 on 147 degrees of freedom
## Multiple R-squared: 0.8614, Adjusted R-squared: 0.8585
## F-statistic: 304.4 on 3 and 147 DF, p-value: < 2.2e-16
# interested in the 1st row of traindata, at column 4 which is PREMIUM
newdata <- data.frame(PREMIUM = traindata[1,4]) # wrap the parameter
results_pqp <- predict(fit4, newdata) # apply predict