Prediction of Diamond Prices

Nille30
04/19/2020

Summary

This presentation documents the diamond price prediction application. You can find the application here.

This application builds a linear regression model on the diamonds data set and predicts the price of a diamond from its characteristics. The user can select the following:

  • Weight in carats
  • Cut of the diamond
  • Color of the diamond
  • Clarity of the diamond

From this selection, the application builds a plot and gives a predicted price for the diamond.

Data Set Used

The data used for this application is the diamonds data set, which is part of the ggplot2 package. It contains information about 53,940 diamonds, described by 10 variables:

     carat               cut        color        clarity     
 Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065  
 1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258  
 Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194  
 Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171  
 3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066  
 Max.   :5.0100                     I: 5422   VVS1   : 3655  
                                    J: 2808   (Other): 2531  
     depth           table           price             x         
 Min.   :43.00   Min.   :43.00   Min.   :  326   Min.   : 0.000  
 1st Qu.:61.00   1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710  
 Median :61.80   Median :57.00   Median : 2401   Median : 5.700  
 Mean   :61.75   Mean   :57.46   Mean   : 3933   Mean   : 5.731  
 3rd Qu.:62.50   3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540  
 Max.   :79.00   Max.   :95.00   Max.   :18823   Max.   :10.740  

       y                z         
 Min.   : 0.000   Min.   : 0.000  
 1st Qu.: 4.720   1st Qu.: 2.910  
 Median : 5.710   Median : 3.530  
 Mean   : 5.735   Mean   : 3.539  
 3rd Qu.: 6.540   3rd Qu.: 4.040  
 Max.   :58.900   Max.   :31.800  
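
The summary above can be reproduced in an R session. This is a minimal sketch, assuming only that the ggplot2 package is installed; it is not taken from the application's source code.

```r
library(ggplot2)  # provides the diamonds data set

# Number of rows (diamonds) and columns (variables)
dims <- dim(diamonds)
print(dims)

# Per-variable summary, as shown in the table above
print(summary(diamonds))
```

The exact column wrapping of the printed summary depends on the console width.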

Files

The application is built with the Shiny package, and the source code is stored in two files:

  • ui.R
  • server.R

You can find both files here: GitHub repository

Application functionality (1/2)

The application draws a plot of the diamonds in the diamonds data set by their weight (carat) and price ($). The plot also shows a linear regression line.
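
A plot of this kind could be drawn with ggplot2 roughly as follows. This is a sketch only: the styling choices (point transparency, line colour, axis labels) are assumptions and may differ from what server.R actually does.

```r
library(ggplot2)

# Scatter plot of price against weight, with a fitted straight line on top.
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.2) +                        # assumed styling
  geom_smooth(method = "lm", formula = y ~ x,      # least-squares line
              se = FALSE, colour = "red") +
  labs(x = "Weight (carat)", y = "Price ($)")

print(p)
```

Here geom_smooth(method = "lm") fits the regression itself, so the drawn line matches an lm(price ~ carat) fit on whatever data is passed to ggplot().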

By selecting specific characteristics of the diamond (carat, cut, clarity, color), the user narrows the data to a subset, and the regression is recalculated using only the diamonds that share those characteristics. If no characteristics are selected, the regression model uses all diamonds in the data set.
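
The subset-and-refit step might look like the sketch below. The helper subset_diamonds() is hypothetical and not taken from the application's source. It assumes that cut, color and clarity act as filters, that price is regressed on carat alone, and that NULL stands for "nothing selected".

```r
library(ggplot2)  # provides the diamonds data set

# Hypothetical helper: keep only diamonds matching the selected levels.
# A NULL argument means "no selection" for that characteristic.
subset_diamonds <- function(cut = NULL, color = NULL, clarity = NULL) {
  keep <- rep(TRUE, nrow(diamonds))
  if (!is.null(cut))     keep <- keep & diamonds$cut == cut
  if (!is.null(color))   keep <- keep & diamonds$color == color
  if (!is.null(clarity)) keep <- keep & diamonds$clarity == clarity
  diamonds[keep, ]
}

# No selection: the model uses every diamond in the data set.
fit_all <- lm(price ~ carat, data = subset_diamonds())

# Example selection: the model is refit on the matching diamonds only.
fit_sel <- lm(price ~ carat, data = subset_diamonds(cut = "Ideal", color = "E"))

print(coef(fit_all))
print(coef(fit_sel))
```

Refitting on a subset generally changes both the intercept and the slope, which is why the regression line moves when the selection changes.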

After the selection, the predicted price of the diamond is shown below the graph.
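
A predicted price can be obtained from a fitted model with predict(). The sketch below assumes a simple price ~ carat model on the full data set and an illustrative weight of 1 carat; the application's own model and inputs may differ.

```r
library(ggplot2)  # provides the diamonds data set

# Fit price on weight using all diamonds (no characteristics selected).
fit <- lm(price ~ carat, data = diamonds)

# Predict the price for a hypothetical 1-carat diamond.
predicted <- predict(fit, newdata = data.frame(carat = 1.0))

print(round(unname(predicted)))
```

The column name in newdata has to match the predictor name used in the model formula, here carat.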

Application functionality (2/2)

[Plot: diamonds by weight (carat) and price ($), with the linear regression line]