Cvetan Veljanovski
February 22, 2021
This presentation describes the Diamond price prediction application. The application is available at the following link: [insert your link]
This application builds a linear regression model on the diamonds dataset and predicts the price of a diamond from its properties, such as carat, cut, color and clarity.
It then draws a plot and reports the predicted price of the diamond.
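The exact model formula is not stated here. The following is a minimal R sketch that assumes price is regressed on carat alone. It fits the model and predicts the price of a hypothetical 1-carat diamond with the standard lm() and predict() functions.

```r
library(ggplot2)  # provides the diamonds dataset

# Assumption: price is modelled as a linear function of carat only
fit <- lm(price ~ carat, data = diamonds)

# Predict the price of a hypothetical 1-carat diamond
new_diamond <- data.frame(carat = 1)
predicted_price <- predict(fit, newdata = new_diamond)
print(round(predicted_price, 2))
```

A richer model could add the categorical features to the formula (for example price ~ carat + cut + color + clarity). The single-predictor version matches the one-line regression shown on the plot.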
The data used for this application is the diamonds dataset, which is part of the ggplot2 package.
This dataset contains information on 53,940 diamonds, described by 10 variables:
     carat             cut           color       clarity          depth
 Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065   Min.   :43.00
 1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258   1st Qu.:61.00
 Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194   Median :61.80
 Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171   Mean   :61.75
 3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066   3rd Qu.:62.50
 Max.   :5.0100                     I: 5422   VVS1   : 3655   Max.   :79.00
                                    J: 2808   (Other): 2531
     table           price             x                y
 Min.   :43.00   Min.   :  326   Min.   : 0.000   Min.   : 0.000
 1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710   1st Qu.: 4.720
 Median :57.00   Median : 2401   Median : 5.700   Median : 5.710
 Mean   :57.46   Mean   : 3933   Mean   : 5.731   Mean   : 5.735
 3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540   3rd Qu.: 6.540
 Max.   :95.00   Max.   :18823   Max.   :10.740   Max.   :58.900
       z
 Min.   : 0.000
 1st Qu.: 2.910
 Median : 3.530
 Mean   : 3.539
 3rd Qu.: 4.040
 Max.   :31.800
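A summary like the one above can be reproduced in R with a few standard calls. This sketch only loads the dataset that ships with ggplot2 and inspects it.

```r
library(ggplot2)  # the diamonds dataset ships with ggplot2

data(diamonds, package = "ggplot2")  # make the dataset explicit in the workspace
dim(diamonds)      # 53940 rows, 10 columns
summary(diamonds)  # per-variable summary: quartiles for numeric columns, counts for factors
```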
The application is built with the Shiny package, and its source code is split across two files:
ui.R
server.R
Both files are available at the following link:
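The source files are not reproduced in this presentation. The following is a rough, hypothetical sketch of how a two-file Shiny app like this one is commonly split. The ui object would live in ui.R and the server function in server.R. The input names (carat, cut), output names (pricePlot, prediction) and the price ~ carat model are all assumptions, not the author's code.

```r
library(shiny)
library(ggplot2)  # provides the diamonds dataset

# Hypothetical ui.R: input controls on the left, plot and prediction on the right
ui <- fluidPage(
  titlePanel("Diamond price prediction"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("carat", "Carat", min = 0.2, max = 5.01, value = 1, step = 0.01),
      selectInput("cut", "Cut", choices = c("All", levels(diamonds$cut)))
    ),
    mainPanel(
      plotOutput("pricePlot"),
      textOutput("prediction")
    )
  )
)

# Hypothetical server.R: filter the data, refit the model, draw the plot
server <- function(input, output) {
  # Subset of diamonds matching the selected cut ("All" keeps every row)
  selected <- reactive({
    if (input$cut == "All") diamonds else diamonds[diamonds$cut %in% input$cut, ]
  })
  # Linear regression of price on carat, recomputed when the subset changes
  fit <- reactive(lm(price ~ carat, data = selected()))

  output$pricePlot <- renderPlot({
    plot(price ~ carat, data = selected(), pch = ".")
    abline(fit(), col = "blue")  # regression line in blue
  })
  output$prediction <- renderText({
    p <- predict(fit(), newdata = data.frame(carat = input$carat))
    paste0("Predicted price: $", round(p))
  })
}

# With ui.R and server.R in one directory, launch with shiny::runApp("path/to/app");
# as a single file, shinyApp(ui, server) would start the same app.
```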
The application draws a plot of the diamonds in the diamonds dataset by size (carat) and price ($). The regression line is also drawn on the plot, in blue.
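The plotting code is not shown, so the library used for the chart is an assumption. A minimal ggplot2 sketch of the same kind of chart puts carat on the x axis (also an assumption) and overlays a least-squares line in blue.

```r
library(ggplot2)

# Assumption: carat on the x axis, price on the y axis
price_plot <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +                      # one point per diamond
  geom_smooth(method = "lm", formula = y ~ x,    # fitted linear regression line
              se = FALSE, colour = "blue") +
  labs(x = "Size (carat)", y = "Price ($)")

print(price_plot)
```

The low alpha keeps the roughly 54,000 overlapping points readable. The geom_smooth(method = "lm") layer fits the same price ~ carat regression on whatever data the plot receives.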
By selecting specific features of the diamond (carat, cut, clarity, color), the user can filter the dataset; the regression line is then recalculated using only the diamonds that share the selected features. If no features are selected, the regression model uses all diamonds in the dataset.
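The filter-then-refit behaviour can be sketched as two small R functions. The helper names filter_diamonds and fit_selection are hypothetical, and treating carat as a range filter is an assumption. A NULL argument stands for "nothing selected" for that feature.

```r
library(ggplot2)  # provides the diamonds dataset

# Hypothetical helper: keep only diamonds matching the selected features.
# A NULL argument means "no selection" for that feature.
filter_diamonds <- function(data, cut = NULL, color = NULL, clarity = NULL,
                            carat_range = NULL) {
  keep <- rep(TRUE, nrow(data))
  if (!is.null(cut))     keep <- keep & data$cut %in% cut
  if (!is.null(color))   keep <- keep & data$color %in% color
  if (!is.null(clarity)) keep <- keep & data$clarity %in% clarity
  if (!is.null(carat_range)) {
    # Assumption: carat is selected as an inclusive [min, max] range
    keep <- keep & data$carat >= carat_range[1] & data$carat <= carat_range[2]
  }
  data[keep, ]
}

# Hypothetical helper: refit price ~ carat on the currently selected subset
fit_selection <- function(...) {
  subset_data <- filter_diamonds(diamonds, ...)
  lm(price ~ carat, data = subset_data)
}

fit_all   <- fit_selection()               # no features selected: all diamonds
fit_ideal <- fit_selection(cut = "Ideal")  # only Ideal-cut diamonds
nobs(fit_all)    # 53940
nobs(fit_ideal)  # 21551
```

Building one logical mask feature by feature lets any combination of selections apply. An empty selection leaves the mask all TRUE, so the model falls back to the full dataset.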