BLOG4: Multiple Linear Regression
Multiple Regression summary using R
library(ggplot2)
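Here we fit two multiple linear regression models to the diamonds dataset. The first (full) model predicts price from carat, cut, color, clarity, depth, table, x, y and z. The second model keeps only carat and depth, dropping clarity along with the other predictors, to see how the model performs without them. R-squared falls from 0.9198 for the full model to 0.8507 for the reduced model, which shows the cost of removing significant variables such as clarity.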
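We then draw a normal Q-Q plot of the residuals of model2 to check how close they are to a normal distribution. The points deviate markedly from the reference line at both tails, which indicates that the residuals are not normally distributed and that model2 is not a good fit for the data.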
model <- lm(price ~ carat + cut + color + clarity + depth + table + x + y + z, data = diamonds)
model

Call:
lm(formula = price ~ carat + cut + color + clarity + depth +
    table + x + y + z, data = diamonds)

Coefficients:
(Intercept)        carat        cut.L        cut.Q        cut.C        cut^4
   5753.762    11256.978      584.457     -301.908      148.035      -20.794
    color.L      color.Q      color.C      color^4      color^5      color^6
  -1952.160     -672.054     -165.283       38.195      -95.793      -48.466
  clarity.L    clarity.Q    clarity.C    clarity^4    clarity^5    clarity^6
   4097.431    -1925.004      982.205     -364.918      233.563        6.883
  clarity^7        depth        table            x            y            z
     90.640      -63.806      -26.474    -1008.261        9.609      -50.119

summary(model)

Call:
lm(formula = price ~ carat + cut + color + clarity + depth +
    table + x + y + z, data = diamonds)

Residuals:
     Min       1Q   Median       3Q      Max
-21376.0   -592.4   -183.5    376.4  10694.2

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)
(Intercept)  5753.762    396.630   14.507  < 2e-16 ***
carat       11256.978     48.628  231.494  < 2e-16 ***
cut.L         584.457     22.478   26.001  < 2e-16 ***
cut.Q        -301.908     17.994  -16.778  < 2e-16 ***
cut.C         148.035     15.483    9.561  < 2e-16 ***
cut^4         -20.794     12.377   -1.680  0.09294 .
color.L     -1952.160     17.342 -112.570  < 2e-16 ***
color.Q      -672.054     15.777  -42.597  < 2e-16 ***
color.C      -165.283     14.725  -11.225  < 2e-16 ***
color^4        38.195     13.527    2.824  0.00475 **
color^5       -95.793     12.776   -7.498 6.59e-14 ***
color^6       -48.466     11.614   -4.173 3.01e-05 ***
clarity.L    4097.431     30.259  135.414  < 2e-16 ***
clarity.Q   -1925.004     28.227  -68.197  < 2e-16 ***
clarity.C     982.205     24.152   40.668  < 2e-16 ***
clarity^4    -364.918     19.285  -18.922  < 2e-16 ***
clarity^5     233.563     15.752   14.828  < 2e-16 ***
clarity^6       6.883     13.715    0.502  0.61575
clarity^7      90.640     12.103    7.489 7.06e-14 ***
depth         -63.806      4.535  -14.071  < 2e-16 ***
table         -26.474      2.912   -9.092  < 2e-16 ***
x           -1008.261     32.898  -30.648  < 2e-16 ***
y               9.609     19.333    0.497  0.61918
z             -50.119     33.486   -1.497  0.13448
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1130 on 53916 degrees of freedom
Multiple R-squared: 0.9198, Adjusted R-squared: 0.9198
F-statistic: 2.688e+04 on 23 and 53916 DF, p-value: < 2.2e-16

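The multiple and adjusted R-squared values above agree to four decimal places. A short worked calculation suggests why, assuming the standard adjusted R-squared formula with n = 53940 observations (53916 residual degrees of freedom plus 23 predictors plus the intercept) and p = 23 predictors. The symbols n and p are introduced here for illustration, and the input 0.9198 is already rounded, so the result is approximate.

```latex
\begin{aligned}
\bar{R}^2 &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \\
          &= 1 - (1 - 0.9198)\,\frac{53939}{53916} \\
          &\approx 1 - 0.0802 \times 1.0004 \\
          &\approx 0.9198
\end{aligned}
```

With this many observations the penalty factor is almost exactly 1, so the adjustment for 23 predictors is too small to show at this precision.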
model2 <- lm(price ~ carat + depth, data = diamonds)
model2

Call:
lm(formula = price ~ carat + depth, data = diamonds)

Coefficients:
(Intercept)        carat        depth
     4045.3       7765.1       -102.2

summary(model2)

Call:
lm(formula = price ~ carat + depth, data = diamonds)

Residuals:
     Min       1Q   Median       3Q      Max
-18238.9   -801.6    -19.6    546.3  12683.7

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  4045.333    286.205   14.13   <2e-16 ***
carat        7765.141     14.009  554.28   <2e-16 ***
depth        -102.165      4.635  -22.04   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1542 on 53937 degrees of freedom
Multiple R-squared: 0.8507, Adjusted R-squared: 0.8507
F-statistic: 1.536e+05 on 2 and 53937 DF, p-value: < 2.2e-16

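The two summaries can also be compared directly rather than by reading the printouts. The sketch below is one possible way to do that, not part of the original analysis: it refits both models under the illustrative names full and reduced, collects the R-squared values in a small table, and runs a partial F-test with anova(), which applies here because the reduced model is nested in the full one.

```r
library(ggplot2)  # provides the diamonds dataset

# Refit both models so the snippet is self-contained.
# The names `full` and `reduced` are illustrative stand-ins for model and model2.
full    <- lm(price ~ carat + cut + color + clarity + depth + table + x + y + z,
              data = diamonds)
reduced <- lm(price ~ carat + depth, data = diamonds)

# Pull R-squared and adjusted R-squared out of the summary objects
fit_stats <- data.frame(
  model         = c("full", "reduced"),
  r_squared     = c(summary(full)$r.squared, summary(reduced)$r.squared),
  adj_r_squared = c(summary(full)$adj.r.squared, summary(reduced)$adj.r.squared)
)
print(fit_stats)

# Partial F-test: do the dropped predictors jointly improve the fit?
# This is valid because every term in `reduced` also appears in `full`.
nested_test <- anova(reduced, full)
print(nested_test)
```

A small p-value in the anova table would suggest that the dropped predictors, clarity among them, jointly explain variation in price that carat and depth alone do not.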
qqnorm(model2$residuals)
qqline(model2$residuals)
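The tail behaviour described for this plot can be checked numerically as well. The following is a minimal sketch, assuming standardized residuals are an acceptable substitute for the raw ones; the names std_res, probs and tail_check are introduced here for illustration. It redraws the Q-Q plot on a unit-variance scale, compares a few extreme sample quantiles with the matching normal quantiles, and adds a residuals-versus-fitted plot as a second diagnostic.

```r
library(ggplot2)  # provides the diamonds dataset

model2 <- lm(price ~ carat + depth, data = diamonds)

# Standardized residuals put the Q-Q plot on a unit-variance scale
std_res <- rstandard(model2)
qqnorm(std_res, main = "Normal Q-Q plot, model2 standardized residuals")
qqline(std_res)

# Compare extreme sample quantiles with the normal quantiles they should match
probs <- c(0.001, 0.01, 0.99, 0.999)
tail_check <- data.frame(
  prob        = probs,
  sample      = as.numeric(quantile(std_res, probs)),
  theoretical = qnorm(probs)
)
print(tail_check)

# Residuals vs fitted values: curvature or a funnel shape points to misfit
plot(fitted(model2), resid(model2),
     xlab = "Fitted price", ylab = "Residual", pch = ".")
abline(h = 0, lty = 2)
```

Sample quantiles that lie well outside the theoretical ones at both ends correspond to the tail deviations seen in the Q-Q plot.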