Explore how a diamond's features affect its price.

Diamond Features

Feature Summary

##      carat               cut        color        clarity     
##  Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065  
##  1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258  
##  Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194  
##  Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171  
##  3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066  
##  Max.   :5.0100                     I: 5422   VVS1   : 3655  
##                                     J: 2808   (Other): 2531  
##      depth           table           price             x         
##  Min.   :43.00   Min.   :43.00   Min.   :  326   Min.   : 0.000  
##  1st Qu.:61.00   1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710  
##  Median :61.80   Median :57.00   Median : 2401   Median : 5.700  
##  Mean   :61.75   Mean   :57.46   Mean   : 3933   Mean   : 5.731  
##  3rd Qu.:62.50   3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540  
##  Max.   :79.00   Max.   :95.00   Max.   :18823   Max.   :10.740  
##                                                                  
##        y                z         
##  Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 4.720   1st Qu.: 2.910  
##  Median : 5.710   Median : 3.530  
##  Mean   : 5.735   Mean   : 3.539  
##  3rd Qu.: 6.540   3rd Qu.: 4.040  
##  Max.   :58.900   Max.   :31.800  
## 
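
A table like the one above is what `summary()` prints for a data frame. The sketch below assumes the data is the `diamonds` data set shipped with the ggplot2 package; the variable name `feature_summary` is illustrative.

```r
library(ggplot2)  # assumption: ggplot2 is installed; it ships the diamonds data set

data(diamonds)

# Quartiles for numeric columns, level counts for the factors cut, color, clarity
feature_summary <- summary(diamonds)
print(feature_summary)
```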

Price Exploration

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     326     950    2401    3933    5324   18823

Price by Cut Exploration

## diamonds$cut: Fair
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     337    2050    3282    4359    5206   18574 
## -------------------------------------------------------- 
## diamonds$cut: Good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     327    1145    3050    3929    5028   18788 
## -------------------------------------------------------- 
## diamonds$cut: Very Good
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     336     912    2648    3982    5373   18818 
## -------------------------------------------------------- 
## diamonds$cut: Premium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     326    1046    3185    4584    6296   18823 
## -------------------------------------------------------- 
## diamonds$cut: Ideal
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     326     878    1810    3458    4678   18806
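
The `diamonds$cut: Fair` headers above match what base R's `by()` prints. A minimal sketch, assuming the ggplot2 `diamonds` data set; `price_by_cut` is an illustrative name. The same call works for any other factor column, such as `color` or `clarity`.

```r
library(ggplot2)  # assumption: source of the diamonds data set

# Apply summary() to the prices within each level of cut
price_by_cut <- by(diamonds$price, diamonds$cut, summary)
print(price_by_cut)
```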

Price by Carat Exploration

The values below appear to be price per carat (price divided by carat), not raw price.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1051    2478    3495    4008    4950   17829
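
The range of this summary (1051 to 17829) differs from the raw price range, which suggests it was computed on price divided by carat. A sketch under that assumption, using the ggplot2 `diamonds` data set; `price_per_carat` is an illustrative name.

```r
library(ggplot2)  # assumption: source of the diamonds data set

# Dollars per carat for every diamond (assumed to be the quantity summarized above)
price_per_carat <- diamonds$price / diamonds$carat
print(summary(price_per_carat))
```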

Price by Color Exploration

## diamonds$color: D
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     357     911    1838    3170    4214   18693 
## -------------------------------------------------------- 
## diamonds$color: E
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     326     882    1739    3077    4003   18731 
## -------------------------------------------------------- 
## diamonds$color: F
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     342     982    2344    3725    4868   18791 
## -------------------------------------------------------- 
## diamonds$color: G
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     354     931    2242    3999    6048   18818 
## -------------------------------------------------------- 
## diamonds$color: H
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     337     984    3460    4487    5980   18803 
## -------------------------------------------------------- 
## diamonds$color: I
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     334    1120    3730    5092    7202   18823 
## -------------------------------------------------------- 
## diamonds$color: J
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     335    1860    4234    5324    7695   18710

Price by Clarity Exploration

Price by x, y, z (Volume) Exploration

## 
##  Pearson's product-moment correlation
## 
## data:  price and x
## t = 440.16, df = 53938, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8825835 0.8862594
## sample estimates:
##       cor 
## 0.8844352
## 
##  Pearson's product-moment correlation
## 
## data:  price and y
## t = 401.14, df = 53938, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8632867 0.8675241
## sample estimates:
##       cor 
## 0.8654209
## 
##  Pearson's product-moment correlation
## 
## data:  price and z
## t = 393.6, df = 53938, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8590541 0.8634131
## sample estimates:
##       cor 
## 0.8612494
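
The `data:  price and x` labels above are what `cor.test()` reports when it is called inside `with()`. A sketch assuming the ggplot2 `diamonds` data set; `dimension_cor` is an illustrative helper that collects just the three estimates.

```r
library(ggplot2)  # assumption: source of the diamonds data set

# Full Pearson test for one dimension; repeat with y and z for the other two
print(with(diamonds, cor.test(price, x)))

# Compact view: only the correlation estimates for x, y and z
dimension_cor <- sapply(c("x", "y", "z"),
                        function(d) cor(diamonds$price, diamonds[[d]]))
print(round(dimension_cor, 4))
```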

Correlation - Price Vs. Volume

## 
##  Pearson's product-moment correlation
## 
## data:  volume_df$price and volume_df$volume
## t = 485.41, df = 53930, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9004947 0.9036387
## sample estimates:
##       cor 
## 0.9020786
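
The report does not show how `volume_df` was built. One plausible construction, given the section title, multiplies the three dimensions and drops rows with a zero dimension. The filter below is an assumption: the reported df of 53930 implies 53,932 rows were kept, so the original filter was different and the numbers from this sketch will not match exactly.

```r
library(ggplot2)  # assumption: source of the diamonds data set

# Approximate volume as x * y * z (a bounding box, not the true stone volume)
volume_df <- transform(as.data.frame(diamonds), volume = x * y * z)

# Assumed filter: drop rows where a zero dimension yields a zero volume
volume_df <- subset(volume_df, volume > 0)

print(cor.test(volume_df$price, volume_df$volume))
```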

Price by Depth Exploration

## 
##  Pearson's product-moment correlation
## 
## data:  price and depth
## t = -2.473, df = 53938, p-value = 0.0134
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.019084756 -0.002208537
## sample estimates:
##        cor 
## -0.0106474
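
The small p-value here reflects the sample size rather than a strong relationship. The arithmetic below uses only the numbers reported above: it recovers the t statistic from the correlation and shows that depth is linearly associated with roughly 0.01% of the variance in price.

```r
r  <- -0.0106474  # reported correlation between price and depth
df <- 53938       # reported degrees of freedom (n - 2)

# t statistic implied by a Pearson correlation and its degrees of freedom
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Squared correlation: share of variance linearly associated with depth
r_squared <- r^2

print(c(t = t_stat, r_squared = r_squared))
```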

Price by Table Exploration

Building the Linear Model for Price

## 
## Calls:
## m1: lm(formula = I(log(price)) ~ I(carat^(1/3)), data = diamonds)
## m2: lm(formula = I(log(price)) ~ I(carat^(1/3)) + carat, data = diamonds)
## m3: lm(formula = I(log(price)) ~ I(carat^(1/3)) + carat + cut, data = diamonds)
## m4: lm(formula = I(log(price)) ~ I(carat^(1/3)) + carat + cut + color, 
##     data = diamonds)
## m5: lm(formula = I(log(price)) ~ I(carat^(1/3)) + carat + cut + color + 
##     clarity, data = diamonds)
## 
## =============================================================================
##                       m1          m2          m3         m4          m5      
## -----------------------------------------------------------------------------
##   (Intercept)      2.821***    1.039***    0.874***    0.932***   0.415***   
##                   (0.006)     (0.019)     (0.019)     (0.017)    (0.010)     
##   I(carat^(1/3))   5.558***    8.568***    8.703***    8.438***   9.144***   
##                   (0.007)     (0.032)     (0.031)     (0.028)    (0.016)     
##   carat                       -1.137***   -1.163***   -0.992***  -1.093***   
##                               (0.012)     (0.011)     (0.010)    (0.006)     
##   cut: .L                                  0.224***    0.224***   0.120***   
##                                           (0.004)     (0.004)    (0.002)     
##   cut: .Q                                 -0.062***   -0.062***  -0.031***   
##                                           (0.004)     (0.003)    (0.002)     
##   cut: .C                                  0.051***    0.052***   0.014***   
##                                           (0.003)     (0.003)    (0.002)     
##   cut: ^4                                  0.018***    0.018***  -0.002      
##                                           (0.003)     (0.002)    (0.001)     
##   color: .L                                           -0.373***  -0.441***   
##                                                       (0.003)    (0.002)     
##   color: .Q                                           -0.129***  -0.093***   
##                                                       (0.003)    (0.002)     
##   color: .C                                            0.001     -0.013***   
##                                                       (0.003)    (0.002)     
##   color: ^4                                            0.029***   0.012***   
##                                                       (0.003)    (0.002)     
##   color: ^5                                           -0.016***  -0.003*     
##                                                       (0.003)    (0.001)     
##   color: ^6                                           -0.023***   0.001      
##                                                       (0.002)    (0.001)     
##   clarity: .L                                                     0.907***   
##                                                                  (0.003)     
##   clarity: .Q                                                    -0.240***   
##                                                                  (0.003)     
##   clarity: .C                                                     0.131***   
##                                                                  (0.003)     
##   clarity: ^4                                                    -0.063***   
##                                                                  (0.002)     
##   clarity: ^5                                                     0.026***   
##                                                                  (0.002)     
##   clarity: ^6                                                    -0.002      
##                                                                  (0.002)     
##   clarity: ^7                                                     0.032***   
##                                                                  (0.001)     
## -----------------------------------------------------------------------------
##   R-squared            0.924       0.935       0.939      0.951       0.984  
##   adj. R-squared       0.924       0.935       0.939      0.951       0.984  
##   sigma                0.280       0.259       0.250      0.224       0.129  
##   F               652012.063  387489.366  138654.523  87959.467  173791.084  
##   p                    0.000       0.000       0.000      0.000       0.000  
##   Log-likelihood   -7962.499   -3631.319   -1837.416   4235.240   34091.272  
##   Deviance          4242.831    3613.360    3380.837   2699.212     892.214  
##   AIC              15930.999    7270.637    3690.832  -8442.481  -68140.544  
##   BIC              15957.685    7306.220    3761.997  -8317.942  -67953.736  
##   N                53940       53940       53940      53940       53940      
## =============================================================================
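
The five calls listed at the top of the table can be fitted as nested models. This sketch assumes the ggplot2 `diamonds` data set and builds each model from the previous one with `update()`; the side-by-side table is assumed to come from `mtable()` in the memisc package, which is only called if that package is installed.

```r
library(ggplot2)  # assumption: source of the diamonds data set

# Log price against the cube root of carat, then add one predictor at a time
m1 <- lm(I(log(price)) ~ I(carat^(1/3)), data = diamonds)
m2 <- update(m1, ~ . + carat)
m3 <- update(m2, ~ . + cut)
m4 <- update(m3, ~ . + color)
m5 <- update(m4, ~ . + clarity)

# Side-by-side comparison (assumption: memisc produced the table in the report)
if (requireNamespace("memisc", quietly = TRUE)) {
  print(memisc::mtable(m1, m2, m3, m4, m5))
} else {
  print(sapply(list(m1 = m1, m2 = m2, m3 = m3, m4 = m4, m5 = m5),
               function(m) summary(m)$r.squared))
}
```

Because `cut`, `color` and `clarity` are ordered factors, R encodes them with polynomial contrasts, which is why the rows are labelled `.L`, `.Q`, `.C`, `^4` and so on instead of by level name.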

Prediction

Diamond Features:

  • Carat = 1.00
  • Cut = “Very Good”
  • Color = “I”
  • Clarity = “VS1”

Price Prediction:

##        fit      lwr      upr
## 1 4537.822 3526.389 5839.352
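
The model predicts log price, so the values above must have been transformed back to dollars with `exp()`. The interval is symmetric on the log scale with a half-width of about 1.96 times the model's sigma, which is consistent with a 95% prediction interval. A sketch under those assumptions, using the ggplot2 `diamonds` data set and the full five-predictor model; `new_diamond` and `price_pred` are illustrative names.

```r
library(ggplot2)  # assumption: source of the diamonds data set

# Full model from the table above (m5)
m5 <- lm(I(log(price)) ~ I(carat^(1/3)) + carat + cut + color + clarity,
         data = diamonds)

# The diamond to price, with factor levels taken from the training data
new_diamond <- data.frame(
  carat   = 1.00,
  cut     = factor("Very Good", levels = levels(diamonds$cut), ordered = TRUE),
  color   = factor("I", levels = levels(diamonds$color), ordered = TRUE),
  clarity = factor("VS1", levels = levels(diamonds$clarity), ordered = TRUE)
)

# Predict on the log scale, then convert back to dollars
log_pred <- predict(m5, newdata = new_diamond,
                    interval = "prediction", level = 0.95)
price_pred <- exp(log_pred)
print(price_pred)
```

After the back-transformation the interval is no longer symmetric around the fitted price: the upper bound sits further from the fit than the lower bound does.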