Understanding Linear Regression Using Diamonds data

Ashutosh

6 September 2017

Understanding Linear Regression Using Diamonds data

  1. Plot the price attribute against a selectable predictor variable
  2. Color the scatter points in the plot using a selectable color variable
  3. Fit a linear regression model between price and the selected predictor variable
  4. Display the coefficients of the fit
  5. Display the R squared value of the fit, a measure of the ratio of explained variation to total variation
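
The ratio in the last item can be checked by hand. The following is a minimal sketch, assuming an ordinary least squares fit of price on carat; the names `ss_total`, `ss_explained`, and `r_squared` are illustrative and do not come from the original slides.

```r
library(ggplot2)  # provides the diamonds dataset

# Assumed model for illustration: price regressed on carat
fit <- lm(price ~ carat, data = diamonds)

# Total variation: squared deviations of price from its mean
ss_total <- sum((diamonds$price - mean(diamonds$price))^2)

# Explained variation: squared deviations of the fitted values from the same mean
ss_explained <- sum((fitted(fit) - mean(diamonds$price))^2)

# R squared is the share of total variation explained by the model
r_squared <- ss_explained / ss_total
r_squared
```

For a linear model with an intercept, this ratio should agree with `summary(fit)$r.squared`.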

About Dataset

library(ggplot2)
head(diamonds)
## # A tibble: 6 × 10
##   carat       cut color clarity depth table price     x     y     z
##   <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23     Ideal     E     SI2  61.5    55   326  3.95  3.98  2.43
## 2  0.21   Premium     E     SI1  59.8    61   326  3.89  3.84  2.31
## 3  0.23      Good     E     VS1  56.9    65   327  4.05  4.07  2.31
## 4  0.29   Premium     I     VS2  62.4    58   334  4.20  4.23  2.63
## 5  0.31      Good     J     SI2  63.3    58   335  4.34  4.35  2.75
## 6  0.24 Very Good     J    VVS2  62.8    57   336  3.94  3.96  2.48

Plot of Price Vs Predictor

predictor_var <- "carat"  # selectable predictor variable
color_var <- "cut"        # selectable color variable
ggplot(data = diamonds, aes_string(x = predictor_var, y = "price", color = color_var)) +
  geom_point() +
  geom_smooth(method = "lm")

Regression Model and R Squared Value

## 
## Call:
## lm(formula = price ~ carat, data = diamonds)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18585.3   -804.8    -18.9    537.4  12731.7 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2256.36      13.06  -172.8   <2e-16 ***
## carat        7756.43      14.07   551.4   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1549 on 53938 degrees of freedom
## Multiple R-squared:  0.8493, Adjusted R-squared:  0.8493 
## F-statistic: 3.041e+05 on 1 and 53938 DF,  p-value: < 2.2e-16
## [1] 0.8493305
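
The slides show only the output, not the code that produced it. Output of this kind could be generated with something like the sketch below, which assumes the predictor is held in a string as in the plotting step; `model_formula` and `fit` are hypothetical names, and building the formula with `reformulate()` is one possible approach, not necessarily the one used originally.

```r
library(ggplot2)  # provides the diamonds dataset

predictor_var <- "carat"  # selectable predictor variable

# Build the formula price ~ <predictor> from the selected variable name
model_formula <- reformulate(predictor_var, response = "price")

# Fit the linear regression model
fit <- lm(model_formula, data = diamonds)

summary(fit)            # coefficients, residuals, and fit statistics
summary(fit)$r.squared  # the R squared value on its own
```

With this approach the `Call:` line in the summary would read `lm(formula = model_formula, data = diamonds)` instead of showing the expanded formula, while the coefficients and R squared value are unaffected.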