Diamond Price Estimator

Ching Yin Goh
Wed May 04 21:59:22 2016

Executive Summary

My application is a calculator that estimates the price of a diamond from the 4Cs:

  • Carats
  • Color
  • Clarity
  • Cut

The 'diamonds' dataset, with 53,940 observations of 10 variables, is used for the prediction.

Exploratory Analysis

library(ggplot2)  # the diamonds dataset ships with ggplot2
data(diamonds)
summary(diamonds)
     carat               cut        color        clarity     
 Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065  
 1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258  
 Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194  
 Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171  
 3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066  
 Max.   :5.0100                     I: 5422   VVS1   : 3655  
                                    J: 2808   (Other): 2531  
     depth           table           price             x         
 Min.   :43.00   Min.   :43.00   Min.   :  326   Min.   : 0.000  
 1st Qu.:61.00   1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710  
 Median :61.80   Median :57.00   Median : 2401   Median : 5.700  
 Mean   :61.75   Mean   :57.46   Mean   : 3933   Mean   : 5.731  
 3rd Qu.:62.50   3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540  
 Max.   :79.00   Max.   :95.00   Max.   :18823   Max.   :10.740  

       y                z         
 Min.   : 0.000   Min.   : 0.000  
 1st Qu.: 4.720   1st Qu.: 2.910  
 Median : 5.710   Median : 3.530  
 Mean   : 5.735   Mean   : 3.539  
 3rd Qu.: 6.540   3rd Qu.: 4.040  
 Max.   :58.900   Max.   :31.800  

Predict Using Linear Regression Model

[Figure: plot of chunk unnamed-chunk-1]
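The calculator's model is not shown on this slide, so the following is only a minimal sketch of how a linear regression restricted to the 4Cs could produce a price estimate. The formula, the object name `fit_4c`, and the helper `estimate_price()` are assumptions made for illustration, not the application's actual code.

```r
library(ggplot2)  # provides the diamonds dataset

# Assumption: the estimator uses only the 4Cs as predictors.
fit_4c <- lm(price ~ carat + cut + color + clarity, data = diamonds)

# Hypothetical helper: estimate the price of a single diamond.
# The factors are rebuilt with the same ordered levels as the training data
# so that predict() encodes them exactly as lm() did.
estimate_price <- function(model, carat, cut, color, clarity) {
  new_diamond <- data.frame(
    carat   = carat,
    cut     = factor(cut,     levels = levels(diamonds$cut),     ordered = TRUE),
    color   = factor(color,   levels = levels(diamonds$color),   ordered = TRUE),
    clarity = factor(clarity, levels = levels(diamonds$clarity), ordered = TRUE)
  )
  unname(predict(model, newdata = new_diamond))
}

# Example query: a 1-carat, Ideal-cut, color G, clarity VS2 diamond.
price_1ct <- estimate_price(fit_4c, carat = 1.0, cut = "Ideal",
                            color = "G", clarity = "VS2")
price_2ct <- estimate_price(fit_4c, carat = 2.0, cut = "Ideal",
                            color = "G", clarity = "VS2")
print(c(one_carat = price_1ct, two_carat = price_2ct))
```

Because the model is purely linear, estimates for very small stones can come out implausibly low; a log-transformed price is a common alternative, but that would be a different model from the one named in the slide title.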

Find Important Variables

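The varImp() function from caret ranks 'carat' (score 231.5) as the most important predictor of diamond price, followed by the linear terms of clarity (135.4) and color (112.6). Because cut, color and clarity are ordered factors, lm() encodes them with polynomial contrasts, which is why they appear as .L (linear), .Q (quadratic), .C (cubic) and higher-order terms.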

library(caret)
fitm <- lm(price ~ ., data = diamonds)
varImp(fitm)
              Overall
carat     231.4940348
cut.L      26.0011290
cut.Q      16.7783441
cut.C       9.5609097
cut^4       1.6801098
color.L   112.5698421
color.Q    42.5970601
color.C    11.2247022
color^4     2.8237221
color^5     7.4978145
color^6     4.1731348
clarity.L 135.4137965
clarity.Q  68.1967102
clarity.C  40.6684433
clarity^4  18.9223900
clarity^5  14.8278029
clarity^6   0.5018915
clarity^7   7.4887321
depth      14.0710870
table       9.0924516
x          30.6483316
y           0.4970226
z           1.4966983
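For a linear model, caret's varImp() reports the absolute value of each coefficient's t-statistic, so the ranking above can be cross-checked with base R alone. The sketch below refits the same model and sorts the scores; the names `t_stats` and `importance` are illustrative only.

```r
library(ggplot2)  # provides the diamonds dataset

# Same full model as on the slide.
fitm <- lm(price ~ ., data = diamonds)

# t-statistics of all coefficients, dropping the intercept row.
t_stats <- summary(fitm)$coefficients[-1, "t value"]

# Importance score = |t|, sorted from most to least important.
importance <- sort(abs(t_stats), decreasing = TRUE)
print(head(importance, 5))
```

The sorted vector should list the same scores as the Overall column, with carat first. A large t-statistic reflects how precisely a coefficient is estimated as well as its effect size, so this ranking is best read as a rough guide rather than a measure of how much each variable changes the price.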