1) The boxplot shows an unexpected relationship between price and clarity: the internally flawless (IF) diamonds have the lowest typical price, while the very slightly imperfect (VS) diamonds have the highest.
2) A) Internally flawless (IF) diamonds have the lowest predicted price, $2,694.80, and clarity VS2 diamonds have the highest, about $5,858 (the intercept, 2,695, plus the clarityVS2 coefficient, 3,163). B) These results are surprising because the more flawed diamonds have higher estimated prices, whereas one would expect the IF diamonds to have the highest price.
3) A) The intercept and the clarity coefficients are different from those in the previous model. The intercept is now negative, and the clarity coefficients are negative rather than positive. B) Holding clarity constant, each additional carat increases the expected price of a diamond by $12,226.40. C) Holding carat constant, a clarity VS2 diamond is expected to cost $1,561.90 less than an internally flawless (IF) diamond.
4) When controlling for carat (holding it constant), IF diamonds have the highest expected price. For a 1-carat diamond, that is 12,226.40 - 1,851.20 = $10,375.20. Clarity VS2 diamonds have the lowest expected price: 12,226.40 - 1,851.20 - 1,561.90 = $8,813.30.
5) The relationship between price and clarity changes once carat is considered, which demonstrates that size affects the price of diamonds. Although IF diamonds cost more than other diamonds of the same size, they are generally smaller. Larger diamonds tend to be more flawed but, because of their size, are more likely to have a high price. The table (table(diam$carat, diam$clarity)) shows that smaller-carat diamonds are less likely to be flawed and larger-carat diamonds are more likely to be flawed. Without taking carat into account, the ordering of average price by clarity appears reversed.
library(mosaic)
## Loading required package: grid
## Loading required package: lattice
##
## Attaching package: 'mosaic'
##
## The following objects are masked from 'package:stats':
##
## D, IQR, binom.test, cor, cov, fivenum, median, prop.test, sd, t.test, var
##
## The following objects are masked from 'package:base':
##
## max, mean, min, print, prod, range, sample, sum
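# Read the diamond data
diam = read.csv("http://www.macalester.edu/~ajohns24/data/Diamonds.csv")
# Side-by-side boxplots of price for each clarity level
boxplot(price ~ clarity, diam)
# Model 1: price explained by clarity alone (IF is the reference level)
mod = lm(price ~ clarity, diam)
summary(mod)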
##
## Call:
## lm(formula = price ~ clarity, data = diam)
##
## Residuals:
##    Min     1Q Median     3Q    Max
##  -5220  -1940   -991   2063  11218
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)     2695        494    5.45  1.0e-07 ***
## clarityVS1      2362        614    3.85  0.00015 ***
## clarityVS2      3163        668    4.73  3.4e-06 ***
## clarityVVS1     2873        671    4.28  2.5e-05 ***
## clarityVVS2     2662        618    4.31  2.2e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3280 on 303 degrees of freedom
## Multiple R-squared: 0.0843, Adjusted R-squared: 0.0722
## F-statistic: 6.97 on 4 and 303 DF, p-value: 2.22e-05
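Because clarity is the only predictor in this model, each clarity level's predicted price is the intercept plus that level's coefficient. A minimal sketch of that arithmetic, using the rounded coefficients printed above (the names `intercept`, `offsets`, and `pred_price` are made up for illustration, and totals may differ slightly from the unrounded fitted values):

```r
# Coefficients copied (rounded) from the summary(mod) output above.
intercept <- 2695                        # predicted price for IF, the reference level
offsets <- c(IF = 0, VS1 = 2362, VS2 = 3163, VVS1 = 2873, VVS2 = 2662)

# Each level's predicted price is the intercept plus that level's coefficient.
pred_price <- intercept + offsets
print(sort(pred_price))

lowest  <- names(which.min(pred_price))  # clarity level with the lowest prediction
highest <- names(which.max(pred_price))  # clarity level with the highest prediction
cat("lowest:", lowest, " highest:", highest, "\n")
```

With the fitted model available, `predict(mod, data.frame(clarity = "VS2"))` should give the unrounded version of the same number.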
# Model 2: add carat so clarity levels are compared at a fixed size
mod2 = lm(price ~ clarity + carat, diam)
summary(mod2)
##
## Call:
## lm(formula = price ~ clarity + carat, data = diam)
##
## Residuals:
##    Min     1Q Median     3Q    Max
##  -1962   -584    -63    435   5914
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    -1851        178  -10.43  < 2e-16 ***
## clarityVS1     -1001        203   -4.93  1.3e-06 ***
## clarityVS2     -1562        228   -6.84  4.3e-11 ***
## clarityVVS1     -404        220   -1.84    0.067 .
## clarityVVS2     -959        206   -4.66  4.8e-06 ***
## carat          12226        232   52.66  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1030 on 302 degrees of freedom
## Multiple R-squared: 0.91, Adjusted R-squared: 0.909
## F-statistic: 611 on 5 and 302 DF, p-value: <2e-16
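The comparison in answer 4 can be reproduced by plugging the same carat value into the fitted equation for every clarity level. The sketch below assumes the coefficients printed above; `expected_price` is a hypothetical helper, not part of the original analysis, and the VS1, VVS1, and VVS2 offsets are only as precise as the rounded output.

```r
# Coefficients from the summary(mod2) output above. The intercept, carat slope,
# and VS2 offset use the values quoted in answers 3 and 4; the rest are rounded.
intercept   <- -1851.20
carat_slope <- 12226.40
offsets <- c(IF = 0, VS1 = -1001, VS2 = -1561.90, VVS1 = -404, VVS2 = -959)

# Hypothetical helper: expected price for a given clarity level and carat size.
expected_price <- function(clarity, carat) {
  unname(intercept + offsets[clarity] + carat_slope * carat)
}

# Compare all clarity levels at the same size (1 carat).
at_one_carat <- sapply(names(offsets), expected_price, carat = 1)
print(sort(at_one_carat, decreasing = TRUE))
```

Because the model has no interaction term, the ordering of clarity levels is the same at any carat value; only the overall level of the prices shifts.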
# Cross-tabulate carat against clarity to see which sizes each clarity level covers
table(diam$carat, diam$clarity)
##
##        IF VS1 VS2 VVS1 VVS2
##   0.18  3   0   0    1    2
##   0.19  4   0   0    2    2
##   0.2   1   2   1    0    0
##   0.21  3   1   0    0    0
##   0.22  1   0   0    0    0
##   0.23  3   0   0    0    0
##   0.24  1   0   0    0    0
##   0.25  4   0   0    0    0
##   0.26  2   0   0    1    1
##   0.27  2   0   0    0    0
##   0.28  1   0   0    0    0
##   0.29  2   0   0    0    0
##   0.3   1   2   1    2    3
##   0.31  1   4   1    1    2
##   0.32  1   1   1    0    0
##   0.33  1   0   2    0    0
##   0.34  0   4   2    1    1
##   0.35  0   3   1    1    1
##   0.36  0   1   0    0    1
##   0.37  0   1   1    0    0
##   0.4   1   2   0    0    0
##   0.41  0   1   0    1    1
##   0.43  0   0   0    0    1
##   0.45  0   1   0    0    0
##   0.46  0   0   0    0    1
##   0.47  0   0   0    0    1
##   0.48  0   1   0    0    1
##   0.5   1   5   0    3    1
##   0.51  0   1   0    3    2
##   0.52  1   2   2    1    1
##   0.53  0   2   0    1    3
##   0.54  0   1   0    1    0
##   0.55  1   1   0    0    4
##   0.56  0   3   1    0    3
##   0.57  0   0   1    2    1
##   0.58  1   0   0    3    0
##   0.59  0   0   0    0    1
##   0.6   1   2   1    1    0
##   0.61  0   0   0    0    1
##   0.62  0   0   0    1    1
##   0.63  1   0   0    0    1
##   0.64  0   0   0    1    1
##   0.65  0   0   0    0    1
##   0.66  0   0   0    2    0
##   0.7   0   4   4    4    5
##   0.71  1   4   1    0    4
##   0.72  0   1   1    2    0
##   0.73  0   3   2    2    0
##   0.74  0   0   1    1    1
##   0.75  0   1   0    0    2
##   0.76  1   0   0    0    1
##   0.77  0   0   0    1    0
##   0.78  0   1   0    0    1
##   0.8   1   2   1    0    3
##   0.81  1   2   1    1    0
##   0.82  0   0   2    0    1
##   0.83  0   0   1    0    0
##   0.84  0   0   1    0    0
##   0.85  0   0   2    2    0
##   0.86  0   0   1    0    1
##   0.89  0   1   0    0    0
##   0.9   0   1   0    0    1
##   1     1   8   9    6    8
##   1.01  0   9   6    4    5
##   1.02  0   1   2    0    2
##   1.03  0   1   0    0    0
##   1.04  1   1   0    0    0
##   1.05  0   0   0    0    1
##   1.06  0   0   2    0    1
##   1.07  0   0   0    0    1
##   1.09  0   0   0    0    1
##   1.1   0   0   1    0    0
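The full carat-by-clarity table is long, so the pattern is easier to see after grouping carat into a few size bands. The following is only a sketch: `size_by_clarity` is a hypothetical helper, the cut points (0.5 and 1 carat) are an arbitrary choice, and the `toy` data frame is made up to show the call, not taken from the diamond data.

```r
# Hypothetical helper: bin carat into three size bands and cross-tabulate with clarity.
# The cut points (0.5 and 1 carat) are an arbitrary choice for illustration.
size_by_clarity <- function(d) {
  size <- cut(d$carat,
              breaks = c(0, 0.5, 1, Inf),
              labels = c("under 0.5", "0.5 to 0.99", "1 or more"),
              right = FALSE)             # intervals are [0, 0.5), [0.5, 1), [1, Inf)
  table(size, d$clarity)
}

# Toy data (made up, NOT the real diamonds) just to show the call.
toy <- data.frame(carat   = c(0.20, 0.25, 0.30, 0.70, 1.00, 1.01, 1.02),
                  clarity = c("IF", "IF", "IF", "VS1", "VS2", "VS1", "VS2"))
tab <- size_by_clarity(toy)
print(tab)

# Mean carat within each clarity level, a one-number summary of the same pattern.
mean_carat <- tapply(toy$carat, toy$clarity, mean)
print(mean_carat)
```

On the real data the call would be `size_by_clarity(diam)`; if IF diamonds are concentrated in the smallest band, that supports the explanation in answer 5.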