Exercise 3.25:
Get data and fit complete second order model:
setwd("/Users/traves/Dropbox/SM339/day22")
data = read.csv("Diamonds.csv")
head(data)
## Carat Color Clarity Depth PricePerCt TotalPrice
## 1 1.08 E VS1 68.6 6693 7228.8
## 2 0.31 F VVS1 61.9 3159 979.3
## 3 0.31 H VS1 62.1 1755 544.1
## 4 0.32 F VVS1 60.8 3159 1010.9
## 5 0.33 D IF 60.8 4759 1570.4
## 6 0.33 G VVS1 61.5 2896 955.6
attach(data)
C2 = lm(TotalPrice ~ 1 + Carat + Depth + I(Carat^2) + I(Depth^2) + I(Carat *
Depth))
C2woD = lm(TotalPrice ~ 1 + Carat + I(Carat^2))
anova(C2woD, C2)
## Analysis of Variance Table
##
## Model 1: TotalPrice ~ 1 + Carat + I(Carat^2)
## Model 2: TotalPrice ~ 1 + Carat + Depth + I(Carat^2) + I(Depth^2) + I(Carat *
## Depth)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 348 1.57e+09
## 2 345 1.45e+09 3 1.19e+08 9.43 5.2e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The hypothesis test determines whether the terms involving Depth, taken together, help the model fit better. The null hypothesis is that the coefficients of Depth, Depth^2, and Carat * Depth are all zero (the extra terms don't help); the alternative hypothesis is that at least one of them is nonzero (the extra terms do help). Since the p-value is tiny (5.2e-06), we reject the null hypothesis and conclude that removing the Depth terms significantly impairs the complete model's effectiveness. It is harder to see this in the adjusted \( R^2 \) statistics since both are so high (0.93 for the complete model and 0.925 for the model without Depth):
summary(C2)
##
## Call:
## lm(formula = TotalPrice ~ 1 + Carat + Depth + I(Carat^2) + I(Depth^2) +
## I(Carat * Depth))
##
## Residuals:
## Min 1Q Median 3Q Max
## -12196 -653 -39 486 10582
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24338.82 30297.91 0.80 0.422
## Carat 7573.62 3040.79 2.49 0.013 *
## Depth -728.70 904.44 -0.81 0.421
## I(Carat^2) 4761.59 330.25 14.42 <2e-16 ***
## I(Depth^2) 5.28 6.73 0.78 0.433
## I(Carat * Depth) -83.89 53.53 -1.57 0.118
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2050 on 345 degrees of freedom
## Multiple R-squared: 0.931, Adjusted R-squared: 0.93
## F-statistic: 936 on 5 and 345 DF, p-value: <2e-16
summary(C2woD)
##
## Call:
## lm(formula = TotalPrice ~ 1 + Carat + I(Carat^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -10207 -712 -168 355 12147
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -523 466 -1.12 0.2631
## Carat 2386 752 3.17 0.0017 **
## I(Carat^2) 4498 263 17.10 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2130 on 348 degrees of freedom
## Multiple R-squared: 0.926, Adjusted R-squared: 0.925
## F-statistic: 2.17e+03 on 2 and 348 DF, p-value: <2e-16
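The F statistic in the anova table for this exercise can be checked by hand from the residual sums of squares. The sketch below uses only the rounded values printed in that table (an RSS of 1.45e+09 on 345 degrees of freedom for the complete model, and an extra sum of squares of 1.19e+08 on 3 degrees of freedom), so it should agree with the table only up to rounding. The variable names are illustrative and are not part of the original solution.

```r
# Nested F-test by hand, using the rounded values printed by anova(C2woD, C2).
rss_full <- 1.45e9   # RSS of the complete second-order model
df_full  <- 345      # residual degrees of freedom of the complete model
ss_extra <- 1.19e8   # drop in RSS from adding the three Depth terms
df_extra <- 3        # number of extra terms being tested

# F = (extra SS per extra term) / (mean squared error of the complete model)
f_stat <- (ss_extra / df_extra) / (rss_full / df_full)

# Upper-tail probability under the F distribution with (3, 345) df
p_val <- pf(f_stat, df_extra, df_full, lower.tail = FALSE)

print(c(F = f_stat, p = p_val))
```

The result should land close to the F value of 9.43 reported in the table, with a p-value of the same small order of magnitude.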
Exercise 3.26:
Fit the quadratic model in Carat:
quad = lm(TotalPrice ~ 1 + Carat + I(Carat^2))
summary(quad)
##
## Call:
## lm(formula = TotalPrice ~ 1 + Carat + I(Carat^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -10207 -712 -168 355 12147
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -523 466 -1.12 0.2631
## Carat 2386 752 3.17 0.0017 **
## I(Carat^2) 4498 263 17.10 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2130 on 348 degrees of freedom
## Multiple R-squared: 0.926, Adjusted R-squared: 0.925
## F-statistic: 2.17e+03 on 2 and 348 DF, p-value: <2e-16
a. Fit:
new = data.frame(Carat = c(0.5), Depth = 62)
predict(quad, newdata = new)
## 1
## 1795
The model predicts a total price of $1,794.84.
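As a sanity check, the fitted value can be reproduced directly from the quadratic equation. The sketch below plugs Carat = 0.5 into the rounded coefficients printed by summary(quad). Because those coefficients are rounded, the result differs slightly from the $1,794.84 returned by predict(). The names b0, b1, and b2 are illustrative.

```r
# Rounded coefficients as printed by summary(quad)
b0 <- -523    # intercept
b1 <- 2386    # coefficient of Carat
b2 <- 4498    # coefficient of Carat^2

carat <- 0.5

# Fitted total price: b0 + b1 * Carat + b2 * Carat^2
fit_by_hand <- b0 + b1 * carat + b2 * carat^2
print(fit_by_hand)
```

With these rounded inputs the hand calculation gives 1794.5, which matches the predict() output to within rounding.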
b. Predict mean total price (confidence interval):
new = data.frame(Carat = c(0.5), Depth = 62)
predict(quad, newdata = new, interval = "confidence", level = 0.95)
## fit lwr upr
## 1 1795 1424 2165
The model gives a 95% confidence interval for the mean total price of such diamonds of $1,424.30 to $2,165.40. That is, we are 95% confident that the average total price for all such diamonds lies in this range.
c. Predict total price of an individual diamond (prediction interval):
new = data.frame(Carat = c(0.5), Depth = 62)
predict(quad, newdata = new, interval = "prediction", level = 0.95)
## fit lwr upr
## 1 1795 -2404 5994
The model gives a 95% prediction interval for the total price of an individual such diamond of -$2,404.46 to $5,994.15. Since a price cannot be negative, the lower bound is not meaningful; in practice, we are 95% confident that the price for such a diamond will be less than $5,994.15.
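The interval for an individual diamond is so much wider than the interval for the mean in part b because it adds the residual variation of a single diamond to the uncertainty in the fitted mean. The sketch below illustrates this using only the rounded numbers printed above: it backs out the standard error of the fit from the confidence interval, combines it with the residual standard error (2130 on 348 degrees of freedom), and rebuilds an approximate prediction interval. The variable names are illustrative, and the result agrees with predict() only up to rounding.

```r
# Rounded values taken from the printed output above
fit       <- 1795   # fitted total price at Carat = 0.5
ci_lwr    <- 1424   # lower end of the 95% confidence interval
ci_upr    <- 2165   # upper end of the 95% confidence interval
sigma_hat <- 2130   # residual standard error of quad
df_resid  <- 348    # residual degrees of freedom of quad

t_crit <- qt(0.975, df_resid)

# Standard error of the fitted mean, recovered from the CI half-width
se_fit <- (ci_upr - ci_lwr) / (2 * t_crit)

# Standard error for a single new observation: fit uncertainty plus noise
se_pred <- sqrt(se_fit^2 + sigma_hat^2)

pi_by_hand <- c(lwr = fit - t_crit * se_pred, upr = fit + t_crit * se_pred)
print(round(pi_by_hand))
```

The residual standard error dominates the standard error of the fit here, which is why the prediction interval is many times wider than the confidence interval.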
d. First fit the log scale model:
logm = lm(log(TotalPrice) ~ 1 + Carat + Depth + I(Carat^2) + I(Depth^2) + I(Carat *
Depth))
new = data.frame(Carat = c(0.5), Depth = 62)
p1 = predict(logm, newdata = new, interval = "confidence", level = 0.95)
exp(p1)
## fit lwr upr
## 1 1856 1781 1934
p2 = predict(logm, newdata = new, interval = "prediction", level = 0.95)
exp(p2)
## fit lwr upr
## 1 1856 1177 2926
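Using the log model, we are 95% confident that the average total price of such diamonds is between $1,781 and $1,934. (Strictly speaking, exponentiating an interval for the mean of log(TotalPrice) gives an interval for the median, or geometric mean, price rather than the arithmetic mean.) We are also 95% confident that the total price of an individual such diamond is between $1,177 and $2,926. Note how much more helpful this second interval is, compared with the interval in part c: it is narrower and cannot include negative prices.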
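The back-transformation used in part d can be tried without Diamonds.csv. The sketch below is a minimal illustration on simulated data, not the diamond data: the data frame sim, its coefficients, and the noise level are all made up for the example. It fits a model for log(TotalPrice), builds a prediction interval on the log scale, and exponentiates it, which keeps both endpoints positive.

```r
# Simulated (hypothetical) data: prices with multiplicative noise
set.seed(1)
n <- 200
carat <- runif(n, 0.3, 2)
price <- exp(6 + 1.5 * carat + rnorm(n, sd = 0.3))
sim <- data.frame(Carat = carat, TotalPrice = price)

# Fit on the log scale, as in part d (Carat terms only, for simplicity)
fit_log <- lm(log(TotalPrice) ~ Carat + I(Carat^2), data = sim)

# Prediction interval on the log scale, then back-transform to dollars
new <- data.frame(Carat = 0.5)
pi_log <- predict(fit_log, newdata = new, interval = "prediction", level = 0.95)
pi_price <- exp(pi_log)
print(pi_price)
```

Because exp() is positive and increasing, the back-transformed interval keeps its ordering and can never dip below zero, unlike the interval from the untransformed model in part c.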