You may use this document as a starting point for this assignment. Your submission should be written in R and should address all of the points below.
To keep this assignment simple, we use the built-in diamonds dataset. Note that it is not part of base R: it ships with the ggplot2 package, which must be loaded before the data can be used.
library(ggplot2)
summary(diamonds)
##      carat               cut        color        clarity          depth
##  Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065   Min.   :43.00
##  1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258   1st Qu.:61.00
##  Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194   Median :61.80
##  Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171   Mean   :61.75
##  3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066   3rd Qu.:62.50
##  Max.   :5.0100                     I: 5422   VVS1   : 3655   Max.   :79.00
##                                     J: 2808   (Other): 2531
##      table           price             x                y
##  Min.   :43.00   Min.   :  326   Min.   : 0.000   Min.   : 0.000
##  1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710   1st Qu.: 4.720
##  Median :57.00   Median : 2401   Median : 5.700   Median : 5.710
##  Mean   :57.46   Mean   : 3933   Mean   : 5.731   Mean   : 5.735
##  3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540   3rd Qu.: 6.540
##  Max.   :95.00   Max.   :18823   Max.   :10.740   Max.   :58.900
##
##        z
##  Min.   : 0.000
##  1st Qu.: 2.910
##  Median : 3.530
##  Mean   : 3.539
##  3rd Qu.: 4.040
##  Max.   :31.800
##
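The summary shows a minimum of 0.000 mm for x, y, and z, which cannot describe a real stone, and maxima of 58.9 mm for y and 31.8 mm for z that sit far above their third quartiles. A quick check along these lines can confirm the size of the data and count the zero-dimension rows before any modeling. This is a minimal sketch that assumes ggplot2 is installed; the name zero_dim is illustrative and not part of the assignment.

```r
# Assumes the ggplot2 package is installed; it provides the diamonds data.
library(ggplot2)

# Number of rows (one per diamond) and columns (variables).
dim(diamonds)  # 53940 rows, 10 columns

# Flag rows where any physical dimension is recorded as 0 mm; these are
# most likely missing measurements rather than real values.
zero_dim <- with(diamonds, x == 0 | y == 0 | z == 0)
sum(zero_dim)

# Look at a few of the flagged rows.
head(diamonds[zero_dim, c("carat", "x", "y", "z", "price")])
```

Whether to drop or keep these rows is a modeling decision; the regression that follows uses only carat, depth, and table, so it is not directly affected by the zero dimensions.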
Below, we fit a multiple linear regression that predicts price from carat, depth, and table.
fit <- lm(price ~ carat + depth + table, data = diamonds)
summary(fit)
##
## Call:
## lm(formula = price ~ carat + depth + table, data = diamonds)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -18288.0   -785.9    -33.2    527.2  12486.7
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13003.441    390.918   33.26   <2e-16 ***
## carat        7858.771     14.151  555.36   <2e-16 ***
## depth        -151.236      4.820  -31.38   <2e-16 ***
## table        -104.473      3.141  -33.26   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1526 on 53936 degrees of freedom
## Multiple R-squared:  0.8537,  Adjusted R-squared:  0.8537
## F-statistic: 1.049e+05 on 3 and 53936 DF,  p-value: < 2.2e-16
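Each coefficient is the expected change in price (in US dollars) for a one-unit change in that predictor while the other two are held fixed. As a worked check, plugging a hypothetical 1-carat diamond with the median depth (61.8) and median table (57) from the summary into the printed estimates gives 13003.441 + 7858.771(1) - 151.236(61.8) - 104.473(57), or about 5561 dollars. The sketch below, which again assumes ggplot2 is installed, reproduces that number with predict(), recomputes it by hand from coef(fit), and confirms two other lines of the output: the residual degrees of freedom (53940 rows minus 4 estimated coefficients = 53936) and R-squared computed as 1 - RSS/TSS. The object names new_diamond, manual, and r2_manual are illustrative only.

```r
# Assumes the ggplot2 package is installed; it provides the diamonds data.
library(ggplot2)

fit <- lm(price ~ carat + depth + table, data = diamonds)

# Hypothetical diamond: 1 carat, with the median depth (61.8) and table (57).
new_diamond <- data.frame(carat = 1, depth = 61.8, table = 57)
pred <- unname(predict(fit, newdata = new_diamond))

# The same prediction by hand: intercept plus each coefficient times its value.
b <- coef(fit)
manual <- unname(b["(Intercept)"] + b["carat"] * 1 +
                 b["depth"] * 61.8 + b["table"] * 57)
c(predict = pred, by_hand = manual)  # both about 5561

# Residual degrees of freedom: n minus the number of estimated coefficients
# (3 slopes + 1 intercept).
nrow(diamonds) - length(b)  # 53936
df.residual(fit)

# R-squared as 1 - RSS / TSS, compared with the value summary() reports.
rss <- sum(residuals(fit)^2)
tss <- sum((diamonds$price - mean(diamonds$price))^2)
r2_manual <- 1 - rss / tss
c(by_hand = r2_manual, summary = summary(fit)$r.squared)
```

Working through one prediction by hand is a simple way to check that the signs and sizes of the coefficients are being read correctly, for example that a higher table value lowers the predicted price when carat and depth are held constant.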