Project - Diamond Price Estimation

Martin Rico

2023-08-01

Description

The project has three parts that demonstrate the skills covered in Developing Data Products.

Exploratory Analysis

The project uses the diamonds data frame from ggplot2. The goal is to let the user explore four variables (Carat, Cut, Color, Clarity) and plot them against price with ggplot.

The code that draws the plots with ggplot and shiny is:

output$distPlot <- renderPlot({

  # Title and axis labels shared by both plot types
  plotTitle <- paste("Diamond prices using", input$xaxis, "and", input$coloraxis)
  p <- ggplot(data = diamonds) +
    labs(title = plotTitle, y = "Price (US dollars)", x = input$xaxis)

  if (is.numeric(diamonds[[input$xaxis]])) {
    # Numeric x variable (e.g. carat): scatter plot coloured by the second variable
    p <- p +
      aes(x = .data[[input$xaxis]], y = price, colour = .data[[input$coloraxis]]) +
      labs(colour = input$coloraxis) +
      geom_point()
  } else {
    # Categorical x variable: stacked bars of price, filled by the second variable
    p <- p +
      aes(x = .data[[input$xaxis]], y = price, fill = .data[[input$coloraxis]]) +
      labs(fill = input$coloraxis) +
      # theme() centres the title for this plot only; theme_update() changes
      # the global theme and is not meant to be added to a plot
      theme(plot.title = element_text(hjust = 0.5)) +
      geom_col()
  }
  p
})
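The branching in the renderPlot() code can be tried outside of shiny. The following is a minimal sketch, assuming ggplot2 is installed; the helper name buildPricePlot and its arguments are hypothetical and are not part of the application.

```r
library(ggplot2)

# Hypothetical helper: same branching as the renderPlot() body, but the
# variable names arrive as plain strings instead of input$... values.
buildPricePlot <- function(xvar, colourvar) {
  p <- ggplot(diamonds) +
    labs(title = paste("Diamond prices using", xvar, "and", colourvar),
         x = xvar, y = "Price (US dollars)")

  if (is.numeric(diamonds[[xvar]])) {
    # Numeric x variable: scatter plot
    p + aes(x = .data[[xvar]], y = price, colour = .data[[colourvar]]) +
      labs(colour = colourvar) +
      geom_point()
  } else {
    # Categorical x variable: stacked bars
    p + aes(x = .data[[xvar]], y = price, fill = .data[[colourvar]]) +
      labs(fill = colourvar) +
      geom_col()
  }
}

scatter <- buildPricePlot("carat", "cut")    # numeric x, scatter plot
bars    <- buildPricePlot("cut", "clarity")  # categorical x, bar plot
```

Printing scatter or bars at the console draws the corresponding plot.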

Estimation

This part estimates the diamond price from the Carat and Cut variables.

The linear regression model was built with:

library(ggplot2)  # provides the diamonds data frame
modelFit <- lm(price ~ carat + cut, data = diamonds)
summary(modelFit)$coefficients
##                Estimate Std. Error     t value      Pr(>|t|)
## (Intercept) -2701.37602   15.43108 -175.060759  0.000000e+00
## carat        7871.08213   13.97963  563.039505  0.000000e+00
## cut.L        1239.80045   26.10004   47.501852  0.000000e+00
## cut.Q        -528.59779   23.13239  -22.850983 5.040203e-115
## cut.C         367.90995   20.21416   18.200609  8.496080e-74
## cut^4          74.59427   16.23958    4.593361  4.371486e-06
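The rows cut.L, cut.Q, cut.C and cut^4 appear because cut is an ordered factor, which lm() encodes by default with orthogonal polynomial contrasts (linear, quadratic, cubic and fourth-degree terms) instead of one coefficient per level. A short sketch to check this, assuming only ggplot2 and base R:

```r
library(ggplot2)

# cut is stored as an ordered factor with five levels
cutIsOrdered <- is.ordered(diamonds$cut)
cutLevels <- levels(diamonds$cut)

# Contrast matrix used for a five-level ordered factor:
# one row per level, one column per polynomial term (.L, .Q, .C, ^4)
cutContrasts <- contr.poly(length(cutLevels))
rownames(cutContrasts) <- cutLevels
cutContrasts
```

Each row of the matrix holds the weights applied to the four cut coefficients for that level.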

The estimate is computed with the function predict(). The application lets the user change the carat value to obtain a new price estimate.

newDataForPrediction<-data.frame(carat=3,cut="Ideal")
predictedPrice<-predict(modelFit,newDataForPrediction)
as.double(predictedPrice)
## [1] 21538.7

For simplicity, the cut value is always “Ideal”.
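The prediction step can be wrapped in a small function and checked by hand against the coefficient table. This is a sketch under stated assumptions: the name estimatePrice is hypothetical, and the manual calculation relies on “Ideal” being the last level of cut, so its weights are the last row of the polynomial contrast matrix.

```r
library(ggplot2)

modelFit <- lm(price ~ carat + cut, data = diamonds)

# Hypothetical helper: price estimate for a given carat, cut fixed to "Ideal"
estimatePrice <- function(carat) {
  newData <- data.frame(carat = carat, cut = "Ideal")
  as.double(predict(modelFit, newData))
}

# Manual check for carat = 3:
# intercept + carat slope * 3 + cut coefficients weighted by the "Ideal" row
b <- coef(modelFit)
nCut <- nlevels(diamonds$cut)
idealRow <- contr.poly(nCut)[nCut, ]
manualPrice <- b[["(Intercept)"]] + b[["carat"]] * 3 + sum(b[3:6] * idealRow)

isSame <- isTRUE(all.equal(manualPrice, estimatePrice(3)))
```

Because the helper passes its argument straight to data.frame(), a call such as estimatePrice(c(1, 2, 3)) returns one estimate per carat value.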

Documentation

The documentation is provided by loading two .txt files that describe how to use the application.

The code that loads the .txt files is:

tabPanel("Documentation",

  titlePanel("Information about how to use the functions and more."),

  # Two equal-width columns, each showing one text file verbatim
  fluidRow(
    column(6,
      br(),
      pre(includeText("include.txt"))
    ),
    column(6,
      br(),
      pre(includeText("diamond.txt"))
    )
  )

)