Diamond Evaluator

Coursera Data Science - Developing Data Products


21-December-2014

Assignment Background

This application estimates the price of a diamond from characteristics selected by the user. The demonstration app presents the predicted price together with a plot of the linear regression.

Dataset details

Data from 308 diamond sales were analysed. The app uses a multivariate linear regression model fitted to this data set to predict diamond prices. The model has an R-squared of 0.95, suggesting that about 95% of the variability in price is explained by the model.

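library(caret)  # provides train()
# With a numeric outcome, method="glm" fits a gaussian GLM, i.e. a linear regression
modelFit <- train(price ~ ., data=Diamond, method="glm")
modelFit$results  # resampled RMSE and R-squared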
##   parameter     RMSE  Rsquared   RMSESD  RsquaredSD
## 1      none 773.8633 0.9501985 88.43621 0.009795948
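
As a rough illustration of what the RMSE and Rsquared columns measure, the sketch below computes both by hand in base R. The `toy` data frame and its columns `size` and `value` are invented for this example and are not the diamond data. The numbers are also computed in-sample, whereas `train()` by default averages them over resamples, so this is only an approximation of what caret reports.

```r
# Toy data, invented purely for illustration (not the diamond data set).
toy <- data.frame(
  size  = c(0.3, 0.5, 0.7, 0.9, 1.1, 1.3),
  value = c(1.1, 2.3, 2.8, 4.2, 4.9, 6.1)
)

# Ordinary linear regression of value on size.
fit  <- lm(value ~ size, data = toy)
pred <- predict(fit, newdata = toy)

# RMSE: typical size of a prediction error, in the units of the outcome.
rmse <- sqrt(mean((toy$value - pred)^2))

# R-squared as the squared correlation between predictions and observations.
rsq <- cor(pred, toy$value)^2

print(c(RMSE = rmse, Rsquared = rsq))
```

For a least-squares fit with an intercept, this squared correlation equals the R-squared shown by `summary(fit)`.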

How it works

The prediction is based on the multivariate linear regression model described above. Not surprisingly, the size of the diamond (carat) is the major factor that increases the price. The price also falls as the colour grade moves further from the baseline grade (E through I), and it falls as the clarity decreases (from VVS1 down to VS2). The certification coefficients are small and not statistically significant.

##                     Estimate Std. Error     t value      Pr(>|t|)
## (Intercept)        169.17604   255.0156   0.6633948  5.075958e-01
## carat            12766.39597   190.0244  67.1829167 7.621696e-181
## colourE          -1439.08534   207.9816  -6.9192930  2.828086e-11
## colourF          -1841.69055   195.2316  -9.4333617  1.233886e-18
## colourG          -2176.67219   200.3933 -10.8619998  2.420359e-23
## colourH          -2747.14998   202.9140 -13.5384947  8.256237e-33
## colourI          -3313.10240   212.7145 -15.5753449  2.481556e-40
## clarityVS1       -1474.56615   159.6750  -9.2347968  5.243264e-18
## clarityVS2       -1792.01092   171.1855 -10.4682428  5.145015e-22
## clarityVVS1       -689.29044   159.9250  -4.3100847  2.226508e-05
## clarityVVS2      -1191.16426   148.7582  -8.0073873  2.725972e-14
## certificationHRD    15.22673   107.2475   0.1419775  8.871947e-01
## certificationIGI   141.26245   128.2585   1.1013889  2.716252e-01
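
To show how these estimates combine into a prediction, here is a minimal worked sketch in base R. The example diamond (1.0 carat, colour G, clarity VS1, HRD certificate) is hypothetical, as are the variable names `coefs`, `x` and `predicted_price`. The coefficient values are copied from the table above.

```r
# Coefficients copied from the estimate column of the table above.
coefs <- c(
  "(Intercept)"      = 169.17604,
  "carat"            = 12766.39597,
  "colourG"          = -2176.67219,
  "clarityVS1"       = -1474.56615,
  "certificationHRD" = 15.22673
)

# Hypothetical diamond: 1.0 carat, colour G, clarity VS1, HRD certificate.
# Each dummy variable is 1 when its level applies; levels not listed are 0.
x <- c(
  "(Intercept)"      = 1,
  "carat"            = 1.0,
  "colourG"          = 1,
  "clarityVS1"       = 1,
  "certificationHRD" = 1
)

# A linear prediction is the sum of coefficient times predictor value.
predicted_price <- sum(coefs * x[names(coefs)])
print(round(predicted_price, 2))  # 9299.56
```

A diamond at the baseline colour, clarity and certification would use only the intercept and the carat term.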

Remarks

The model depends on carat size, colour, and clarity. These characteristics, especially colour and clarity, are determined by a third-party certification process. This application was developed to demonstrate elements and techniques learned in the Coursera Developing Data Products class and has no utility beyond that demonstrative purpose.

Thank you!