BLOG4: Multiple Linear Regression

Multiple Regression summary using R

Here, we fit multiple linear regression models to the diamonds dataset. The first (full) model uses all nine predictors. The second model keeps only carat and depth, dropping clarity along with cut, color, table, x, y and z, to see how the fit changes. R-squared falls from 0.9198 to 0.8507 once significant variables such as clarity are removed from the full model.

Then we draw a normal QQ plot of the residuals from model2 to check how close they are to a normal distribution. The points deviate markedly from the reference line at both tails, which indicates that the residuals are not normally distributed and that model2 is not a good fit for the data.

library(ggplot2)

model <- lm(price ~ carat + cut + color + clarity + depth + table + x + y + z, data = diamonds)
model

Call:
lm(formula = price ~ carat + cut + color + clarity + depth + 
    table + x + y + z, data = diamonds)

Coefficients:
(Intercept)        carat        cut.L        cut.Q        cut.C        cut^4  
   5753.762    11256.978      584.457     -301.908      148.035      -20.794  
    color.L      color.Q      color.C      color^4      color^5      color^6  
  -1952.160     -672.054     -165.283       38.195      -95.793      -48.466  
  clarity.L    clarity.Q    clarity.C    clarity^4    clarity^5    clarity^6  
   4097.431    -1925.004      982.205     -364.918      233.563        6.883  
  clarity^7        depth        table            x            y            z  
     90.640      -63.806      -26.474    -1008.261        9.609      -50.119  

summary(model)

Call:
lm(formula = price ~ carat + cut + color + clarity + depth + 
    table + x + y + z, data = diamonds)

Residuals:
     Min       1Q   Median       3Q      Max 
-21376.0   -592.4   -183.5    376.4  10694.2 

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)    
(Intercept)  5753.762    396.630   14.507  < 2e-16 ***
carat       11256.978     48.628  231.494  < 2e-16 ***
cut.L         584.457     22.478   26.001  < 2e-16 ***
cut.Q        -301.908     17.994  -16.778  < 2e-16 ***
cut.C         148.035     15.483    9.561  < 2e-16 ***
cut^4         -20.794     12.377   -1.680  0.09294 .  
color.L     -1952.160     17.342 -112.570  < 2e-16 ***
color.Q      -672.054     15.777  -42.597  < 2e-16 ***
color.C      -165.283     14.725  -11.225  < 2e-16 ***
color^4        38.195     13.527    2.824  0.00475 ** 
color^5       -95.793     12.776   -7.498 6.59e-14 ***
color^6       -48.466     11.614   -4.173 3.01e-05 ***
clarity.L    4097.431     30.259  135.414  < 2e-16 ***
clarity.Q   -1925.004     28.227  -68.197  < 2e-16 ***
clarity.C     982.205     24.152   40.668  < 2e-16 ***
clarity^4    -364.918     19.285  -18.922  < 2e-16 ***
clarity^5     233.563     15.752   14.828  < 2e-16 ***
clarity^6       6.883     13.715    0.502  0.61575    
clarity^7      90.640     12.103    7.489 7.06e-14 ***
depth         -63.806      4.535  -14.071  < 2e-16 ***
table         -26.474      2.912   -9.092  < 2e-16 ***
x           -1008.261     32.898  -30.648  < 2e-16 ***
y               9.609     19.333    0.497  0.61918    
z             -50.119     33.486   -1.497  0.13448    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1130 on 53916 degrees of freedom
Multiple R-squared:  0.9198,    Adjusted R-squared:  0.9198 
F-statistic: 2.688e+04 on 23 and 53916 DF,  p-value: < 2.2e-16

model2 <- lm(price ~ carat + depth, data = diamonds)
model2

Call:
lm(formula = price ~ carat + depth, data = diamonds)

Coefficients:
(Intercept)        carat        depth  
     4045.3       7765.1       -102.2  

summary(model2)

Call:
lm(formula = price ~ carat + depth, data = diamonds)

Residuals:
     Min       1Q   Median       3Q      Max 
-18238.9   -801.6    -19.6    546.3  12683.7 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 4045.333    286.205   14.13   <2e-16 ***
carat       7765.141     14.009  554.28   <2e-16 ***
depth       -102.165      4.635  -22.04   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1542 on 53937 degrees of freedom
Multiple R-squared:  0.8507,    Adjusted R-squared:  0.8507 
F-statistic: 1.536e+05 on 2 and 53937 DF,  p-value: < 2.2e-16
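
The drop in R-squared between the two summaries can also be checked formally. Because model2 is nested inside the full model, one common option is a partial F-test with `anova()`. The sketch below is only an illustration: it refits both models so it runs on its own, and the names `full`, `reduced`, `adj_r2` and `comparison` are introduced here and are not part of the original analysis.

```r
library(ggplot2)  # provides the diamonds dataset

# Refit both models so this block is self-contained
# (full corresponds to model, reduced corresponds to model2 above)
full    <- lm(price ~ carat + cut + color + clarity + depth + table + x + y + z,
              data = diamonds)
reduced <- lm(price ~ carat + depth, data = diamonds)

# Adjusted R-squared of each model, side by side
adj_r2 <- c(full    = summary(full)$adj.r.squared,
            reduced = summary(reduced)$adj.r.squared)
print(round(adj_r2, 4))

# Partial F-test: do the dropped predictors jointly improve the fit?
comparison <- anova(reduced, full)
print(comparison)
```

A small p-value in the second row of the `anova()` table would suggest that the dropped predictors, taken together, explain a meaningful share of the variation in price, which is consistent with the lower R-squared of model2.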

# Normal QQ plot of the model2 residuals, with a reference line
qqnorm(model2$residuals)
qqline(model2$residuals)
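
To put numbers on the tail behaviour seen in the QQ plot, one possible approach is to compare quantiles of the standardized residuals with the matching quantiles of a standard normal distribution. This is a rough sketch rather than a formal test: the probability levels in `probs` are an arbitrary choice, and the names `std_res` and `tails` are introduced here for illustration.

```r
library(ggplot2)  # provides the diamonds dataset

# Refit model2 so this block is self-contained
model2 <- lm(price ~ carat + depth, data = diamonds)

# Standardized residuals should be roughly N(0, 1) if the errors are normal
std_res <- rstandard(model2)

# Compare sample quantiles with theoretical normal quantiles,
# focusing on the extreme tails and the median
probs <- c(0.001, 0.01, 0.5, 0.99, 0.999)
tails <- data.frame(
  prob        = probs,
  sample      = quantile(std_res, probs, names = FALSE),
  theoretical = qnorm(probs)
)
print(tails)
```

If the residuals were close to normal, the `sample` and `theoretical` columns would roughly agree. The summary of model2 already hints at heavy tails: the most negative residual (-18238.9) is almost 12 times the residual standard error (1542).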