Diamond Price Predictor

Shiny Application for Predicting Diamond Prices (Project for Developing Data Products)

Sarajit Poddar
Data Science Specialization (Coursera)

Objective

  1. The goal is to present a Shiny application that predicts the price of a diamond from its specifications.
  2. The application uses the diamonds dataset supplied with the ggplot2 package.
  3. The application has two parts:

    a. Part 1: This allows the user to select a subset of the large Diamonds 
    dataset based on Price range and number of observations.
    
    b. Part 2: This allows the user to specify parameters such as Carat, 
    Clarity, Cut etc. based on which the price can be predicted.
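
The two parts described above could be wired together in Shiny roughly as follows. This is a minimal sketch, not the application's actual source: every input and output ID (`pricerange`, `obs`, `carat`, `cut`, `caratPlot`, `prediction`) is hypothetical, the tab layout is an assumption, and only two of the diamond specifications are included to keep it short.

```r
library(shiny)
library(ggplot2)  # provides the diamonds dataset

# All input and output IDs below are hypothetical.
ui <- fluidPage(
  titlePanel("Diamond Price Predictor"),
  sidebarLayout(
    sidebarPanel(
      # Part 1: choose the subset
      sliderInput("pricerange", "Price range (USD)",
                  min = min(diamonds$price), max = max(diamonds$price),
                  value = c(1000, 5000)),
      numericInput("obs", "Number of observations", value = 500, min = 10),
      # Part 2: diamond specification (only two fields shown for brevity)
      numericInput("carat", "Carat", value = 1, min = 0.2, step = 0.1),
      selectInput("cut", "Cut", choices = levels(diamonds$cut))
    ),
    mainPanel(
      tabsetPanel(
        tabPanel("Price vs. carat", plotOutput("caratPlot")),
        tabPanel("Prediction", textOutput("prediction"))
      )
    )
  )
)

server <- function(input, output) {
  # Part 1: filter by price range, then sample at most input$obs rows
  sampled <- reactive({
    d <- subset(diamonds, price >= input$pricerange[1] &
                          price <= input$pricerange[2])
    d[sample(nrow(d), min(input$obs, nrow(d))), ]
  })

  output$caratPlot <- renderPlot({
    ggplot(sampled(), aes(carat, price)) + geom_point()
  })

  # Part 2: fit on the sample and predict for the entered specification
  output$prediction <- renderText({
    fit <- lm(price ~ carat + cut, data = sampled())
    new <- data.frame(carat = input$carat, cut = input$cut)
    paste("Predicted price: USD", round(predict(fit, new)))
  })
}

app <- shinyApp(ui, server)  # call runApp(app) to launch interactively
```

Keeping the sample in a single `reactive()` means the plot and the prediction both see the same subset whenever the user changes the price range or sample size.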
    

Part 1(a): Subsetting the data

  1. The user provides a price range and a number of observations; the data is filtered to that range and then randomly sampled.
  2. Plots of the sampled data are created to visualize how price varies.
library(ggplot2)
data(diamonds)
# Suppose the input price range is 1000 to 5000 and the number of observations is 500
input.pricerange.low <- 1000; input.pricerange.high <- 5000; input.obs <- 500
# Subset the data by price range, then sample without replacement
data.sample <- subset(diamonds, price >= input.pricerange.low & 
                              price <= input.pricerange.high) 
# Cap the sample size at the number of rows left after filtering
data.sample <- data.sample[sample(nrow(data.sample),
                                  min(input.obs, nrow(data.sample)),
                                  replace=FALSE),]
# Suppose the following diamond specification is captured for prediction
input.carat <- 1; input.cut <- "Fair"; input.color <- "D"; input.clarity <- "VS2"
input.depth <- 50; input.table <- 60; input.x <- 5
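
The filter-then-sample step above could also be wrapped in a small function so that edge cases are handled in one place. The sketch below is illustrative only: the name `sample_diamonds` and its `seed` argument are hypothetical and not part of the application, and the seed value 42 is arbitrary.

```r
library(ggplot2)  # provides the diamonds dataset

# Hypothetical helper: keep rows within [low, high], then draw at most
# n_obs of them without replacement. An optional seed makes the draw
# reproducible.
sample_diamonds <- function(data, low, high, n_obs, seed = NULL) {
  stopifnot(low <= high, n_obs >= 1)
  in_range <- data[data$price >= low & data$price <= high, ]
  if (!is.null(seed)) set.seed(seed)
  n <- min(n_obs, nrow(in_range))  # never ask for more rows than exist
  in_range[sample.int(nrow(in_range), n), ]
}

s <- sample_diamonds(diamonds, 1000, 5000, 500, seed = 42)
nrow(s)
range(s$price)
```

Requesting more observations than the price range contains simply returns every row in range instead of raising an error.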

Part 1(b): Plotting the data

  1. Three plots are generated from the sampled data.
  2. The plots are shown in three separate tabs for ease of navigation.
  3. One of the three plots shows how price varies with carat.
     [Plot: price versus carat for the sampled data]
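
A price-versus-carat plot of this kind could be produced roughly as follows. This is a sketch under assumptions: the original plotting code is not shown, so the choice of a scatter plot, the colouring by cut, and the axis labels are all illustrative.

```r
library(ggplot2)

# Same illustrative inputs as in Part 1(a): prices 1000 to 5000, 500 rows
in_range <- subset(diamonds, price >= 1000 & price <= 5000)
data.sample <- in_range[sample(nrow(in_range), 500), ]

# Scatter plot of price against carat; colouring by cut is an assumption
p <- ggplot(data.sample, aes(x = carat, y = price, colour = cut)) +
  geom_point(alpha = 0.6) +
  labs(x = "Carat", y = "Price (USD)",
       title = "Price vs. carat (sampled data)")
print(p)
```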

Part 2: Predicting the Diamond Price

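  1. A linear regression model is fitted with lm(), using carat, cut, color, clarity, depth, table and x as predictors.
  2. The linear model is fitted on the sample drawn from the full dataset and is then used to predict the price.
  3. For the best prediction, the user should select a price range that is representative of the diamond being priced.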
fitted.model   <- lm(price ~ carat + cut + color + clarity + depth + table + x, 
                     data = data.sample)
predict.df <- data.frame(carat=input.carat, 
                         cut=input.cut,
                         color=input.color,
                         clarity=input.clarity,
                         depth=input.depth,
                         table=input.table,
                         x=input.x)
price <- round(predict(fitted.model, predict.df),0)

The predicted price of the diamond with the given specification is USD 3390.
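
A single predicted value says nothing about its uncertainty. One way to convey that, sketched below, is to ask `predict()` for a prediction interval alongside the point estimate. The seed is arbitrary and chosen only to make the sketch repeatable, so the numbers it prints are not expected to match the figure quoted above.

```r
library(ggplot2)  # provides the diamonds dataset

set.seed(1)  # arbitrary seed, an assumption for repeatability
in_range <- subset(diamonds, price >= 1000 & price <= 5000)
data.sample <- in_range[sample(nrow(in_range), 500), ]

fitted.model <- lm(price ~ carat + cut + color + clarity + depth + table + x,
                   data = data.sample)

predict.df <- data.frame(carat = 1, cut = "Fair", color = "D",
                         clarity = "VS2", depth = 50, table = 60, x = 5)

# Returns a one-row matrix with columns fit (point estimate),
# lwr and upr (bounds of the 95% prediction interval)
pred <- predict(fitted.model, predict.df,
                interval = "prediction", level = 0.95)
round(pred)
```

A wide interval is a hint that the selected price range or sample size does not pin down the price well for that specification.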