Ching Yin Goh
Wed May 04 21:59:22 2016
My application is a calculator that estimates the price of a diamond from the 4Cs: carat, cut, color, and clarity.
The 'diamonds' dataset, with 53,940 observations, is used for the prediction:
library(ggplot2)  # the 'diamonds' dataset ships with ggplot2
data(diamonds)
summary(diamonds)
     carat               cut        color        clarity
 Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065
 1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258
 Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194
 Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171
 3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066
 Max.   :5.0100                     I: 5422   VVS1   : 3655
                                    J: 2808   (Other): 2531
     depth           table           price             x
 Min.   :43.00   Min.   :43.00   Min.   :  326   Min.   : 0.000
 1st Qu.:61.00   1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710
 Median :61.80   Median :57.00   Median : 2401   Median : 5.700
 Mean   :61.75   Mean   :57.46   Mean   : 3933   Mean   : 5.731
 3rd Qu.:62.50   3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540
 Max.   :79.00   Max.   :95.00   Max.   :18823   Max.   :10.740
       y                z
 Min.   : 0.000   Min.   : 0.000
 1st Qu.: 4.720   1st Qu.: 2.910
 Median : 5.710   Median : 3.530
 Mean   : 5.735   Mean   : 3.539
 3rd Qu.: 6.540   3rd Qu.: 4.040
 Max.   :58.900   Max.   :31.800
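The price estimate described at the top could be sketched roughly as follows. This is only an illustration, not the application's actual code: the names `fit4c` and `estimate_price` are hypothetical, the model is a plain linear regression on the 4Cs, and the dataset is assumed to come from ggplot2.

```r
library(ggplot2)  # assumption: 'diamonds' is taken from ggplot2

# Hypothetical model: linear regression of price on the 4Cs only
fit4c <- lm(price ~ carat + cut + color + clarity, data = diamonds)

# Hypothetical helper: return a price estimate for a single diamond
estimate_price <- function(carat, cut, color, clarity, model = fit4c) {
  # One-row data frame whose factor levels match the training data
  newdata <- data.frame(
    carat   = carat,
    cut     = factor(cut,     levels = levels(diamonds$cut),     ordered = TRUE),
    color   = factor(color,   levels = levels(diamonds$color),   ordered = TRUE),
    clarity = factor(clarity, levels = levels(diamonds$clarity), ordered = TRUE)
  )
  unname(predict(model, newdata = newdata))
}

# Example call for a 0.7 carat, Ideal cut, color G, clarity VS1 diamond
estimate_price(carat = 0.7, cut = "Ideal", color = "G", clarity = "VS1")
```

Because the model is a straight line in carat, it can return unrealistically low (even negative) estimates for very small stones, so a real calculator would likely need a lower bound or a transformed response.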
caret's variable importance function, which for a linear model reports the absolute t-statistic of each term, shows that 'carat' is the most important predictor of diamond price:
library(caret)
fitm <- lm(price ~ ., data = diamonds)
varImp(fitm)
              Overall
carat     231.4940348
cut.L      26.0011290
cut.Q      16.7783441
cut.C       9.5609097
cut^4       1.6801098
color.L   112.5698421
color.Q    42.5970601
color.C    11.2247022
color^4     2.8237221
color^5     7.4978145
color^6     4.1731348
clarity.L 135.4137965
clarity.Q  68.1967102
clarity.C  40.6684433
clarity^4  18.9223900
clarity^5  14.8278029
clarity^6   0.5018915
clarity^7   7.4887321
depth      14.0710870
table       9.0924516
x          30.6483316
y           0.4970226
z           1.4966983
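For a linear model, the importance scores above are the absolute t-statistics of the fitted coefficients, so they can be reproduced without caret. The sketch below assumes the dataset comes from ggplot2 and uses the hypothetical name `tstat` for the result.

```r
library(ggplot2)  # assumption: 'diamonds' is taken from ggplot2

# Same full model as in the text
fitm <- lm(price ~ ., data = diamonds)

# Absolute t-statistic of every term except the intercept
tstat <- abs(summary(fitm)$coefficients[-1, "t value"])

# Strongest terms first
head(sort(tstat, decreasing = TRUE), 5)
```

These scores measure how precisely each coefficient is estimated given all the other terms. Since carat is strongly related to the dimensions x, y and z, the ranking is best read as a rough guide rather than a strict ordering of influence on price.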