Exercise 5.4.1
Using the trees data frame that comes pre-installed in R, fit the regression model that uses the tree Height to explain the Volume of wood harvested from the tree.
(a) Graph the data
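A minimal sketch of the scatter plot, assuming the ggplot2 package is installed (the exercise does not name a plotting package; base R's `plot(Volume ~ Height, data = trees)` would serve equally well).

```r
library(ggplot2)

# Scatter plot of harvested volume against tree height
p <- ggplot(trees, aes(x = Height, y = Volume)) +
  geom_point() +
  labs(x = "Height (ft)", y = "Volume (cubic ft)")
print(p)
```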

(b) Fit an lm model
##
## Call:
## lm(formula = Volume ~ Height, data = trees)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -21.274  -9.894  -2.894  12.068  29.852
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -87.1236    29.2731  -2.976 0.005835 **
## Height        1.5433     0.3839   4.021 0.000378 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.4 on 29 degrees of freedom
## Multiple R-squared: 0.3579, Adjusted R-squared: 0.3358
## F-statistic: 16.16 on 1 and 29 DF, p-value: 0.0003784
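The summary above can be reproduced with a call along these lines. The object name `model` is an arbitrary choice, not something the exercise prescribes.

```r
# Simple linear regression of Volume on Height
model <- lm(Volume ~ Height, data = trees)
summary(model)
```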
(c) Print the table of coefficients with the term names, estimated values, standard errors, and lower and upper 95% confidence limits.
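|  | Estimate | Std. Error | t value | Pr(>\|t\|) |
| --- | --- | --- | --- | --- |
| (Intercept) | -87.12361 | 29.2731221 | -2.976232 | 0.0058347 |
| Height | 1.54335 | 0.3838693 | 4.020509 | 0.0003784 |

|  | 2.5 % | 97.5 % |
| --- | --- | --- |
| (Intercept) | -146.993871 | -27.253356 |
| Height | 0.758249 | 2.328451 |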
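One way to build both pieces with base R is sketched below. Binding them into a single table with `cbind()` is a choice made here for illustration; the output above shows them as two separate tables.

```r
model <- lm(Volume ~ Height, data = trees)

# Estimates, standard errors, t statistics and p-values
coef_table <- coef(summary(model))

# Lower and upper 95% confidence limits for each coefficient
ci <- confint(model, level = 0.95)

# One row per term, with the confidence limits appended as extra columns
coef_ci <- cbind(coef_table, ci)
coef_ci
```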
(d) Add the model fitted values to the trees data frame along with the regression model confidence intervals.
| Girth | Height | Volume | fit | lwr | upr |
| --- | --- | --- | --- | --- | --- |
| 8.3 | 70 | 10.3 | 20.91087 | 14.098550 | 27.72319 |
| 8.6 | 65 | 10.3 | 13.19412 | 3.254288 | 23.13395 |
| 8.8 | 63 | 10.2 | 10.10742 | -1.223363 | 21.43821 |
| 10.5 | 72 | 16.4 | 23.99757 | 18.159758 | 29.83538 |
| 10.7 | 81 | 18.8 | 37.88772 | 31.592680 | 44.18275 |
| 10.8 | 83 | 19.7 | 40.97442 | 33.597379 | 48.35145 |
| 11.0 | 66 | 15.6 | 14.73747 | 5.471607 | 24.00333 |
| 11.0 | 75 | 18.2 | 28.62762 | 23.644217 | 33.61102 |
| 11.1 | 80 | 22.6 | 36.34437 | 30.506556 | 42.18218 |
| 11.2 | 75 | 19.9 | 28.62762 | 23.644217 | 33.61102 |
| 11.3 | 79 | 24.2 | 34.80102 | 29.345254 | 40.25678 |
| 11.4 | 76 | 21.0 | 30.17097 | 25.249799 | 35.09214 |
| 11.4 | 76 | 21.4 | 30.17097 | 25.249799 | 35.09214 |
| 11.7 | 69 | 21.3 | 19.36752 | 11.990482 | 26.74456 |
| 12.0 | 75 | 19.1 | 28.62762 | 23.644217 | 33.61102 |
| 12.9 | 74 | 22.2 | 27.08427 | 21.918668 | 32.24987 |
| 12.9 | 85 | 33.8 | 44.06112 | 35.450370 | 52.67186 |
| 13.3 | 86 | 27.4 | 45.60447 | 36.338602 | 54.87033 |
| 13.7 | 71 | 25.7 | 22.45422 | 16.159183 | 28.74926 |
| 13.8 | 64 | 24.9 | 11.65077 | 1.021703 | 22.27984 |
| 14.0 | 78 | 34.5 | 33.25767 | 28.092067 | 38.42327 |
| 14.2 | 80 | 31.7 | 36.34437 | 30.506556 | 42.18218 |
| 14.5 | 74 | 36.3 | 27.08427 | 21.918668 | 32.24987 |
| 16.0 | 72 | 38.3 | 23.99757 | 18.159758 | 29.83538 |
| 16.3 | 77 | 42.6 | 31.71432 | 26.730917 | 36.69772 |
| 17.3 | 81 | 55.4 | 37.88772 | 31.592680 | 44.18275 |
| 17.5 | 82 | 55.7 | 39.43107 | 32.618747 | 46.24339 |
| 17.9 | 80 | 58.3 | 36.34437 | 30.506556 | 42.18218 |
| 18.0 | 80 | 51.5 | 36.34437 | 30.506556 | 42.18218 |
| 18.0 | 80 | 51.0 | 36.34437 | 30.506556 | 42.18218 |
| 20.6 | 87 | 77.0 | 47.14782 | 37.207982 | 57.08765 |
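A table like the one above can be produced with `predict()`, as in this sketch. The name `trees_fit` is hypothetical; overwriting `trees` itself would also work.

```r
model <- lm(Volume ~ Height, data = trees)

# With interval = "confidence", predict() returns the columns fit, lwr and upr
trees_fit <- cbind(trees, predict(model, interval = "confidence"))
head(trees_fit)
```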
(e) Graph the data and fitted regression line and uncertainty ribbon.
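A possible version of this graph, assuming ggplot2 and the `fit`, `lwr`, and `upr` columns from part (d). The colour and transparency settings are arbitrary.

```r
library(ggplot2)

model <- lm(Volume ~ Height, data = trees)
trees_fit <- cbind(trees, predict(model, interval = "confidence"))

p <- ggplot(trees_fit, aes(x = Height)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.3) +  # confidence band
  geom_line(aes(y = fit), colour = "blue") +               # fitted line
  geom_point(aes(y = Volume))                              # observed data
print(p)
```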

(f) Add the R-squared value as an annotation to the graph.
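One way to add the annotation is sketched below, again assuming ggplot2. The label position (`x = 67`, `y = 70`) is an arbitrary spot in the upper-left corner of the plot, chosen only for illustration.

```r
library(ggplot2)

model <- lm(Volume ~ Height, data = trees)
trees_fit <- cbind(trees, predict(model, interval = "confidence"))

# Pull R-squared out of the model summary and format it as a plotmath label
r2 <- summary(model)$r.squared
r2_label <- paste0("R^2 == ", round(r2, 3))

p <- ggplot(trees_fit, aes(x = Height)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.3) +
  geom_line(aes(y = fit), colour = "blue") +
  geom_point(aes(y = Volume)) +
  annotate("text", x = 67, y = 70, label = r2_label, parse = TRUE)
print(p)
```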

Exercise 5.4.2
The data set phbirths from the faraway package contains information on birth weight, gestational length, and the mother's smoking status. We’ll fit a quadratic model to predict infant birth weight from gestational length.
a. Create two scatter plots of gestational length and birth weight, one for each smoking status.
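A sketch of the two panels, assuming the faraway and ggplot2 packages are installed. In `phbirths` the `smoke` column is stored as a logical, so it is relabelled here to match the "Smoke" / "No Smoke" labels used in the tables of this exercise.

```r
library(ggplot2)
data(phbirths, package = "faraway")

# Relabel the logical smoke indicator so the panel titles are readable
phbirths$smoke <- factor(ifelse(phbirths$smoke, "Smoke", "No Smoke"))

# One panel per smoking status
p <- ggplot(phbirths, aes(x = gestate, y = grams)) +
  geom_point() +
  facet_wrap(~ smoke)
print(p)
```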

b. Remove all the observations that are premature (less than 36 weeks). For the remainder of the problem, only use these full-term babies.
|  | black | educ | smoke | gestate | grams |
| --- | --- | --- | --- | --- | --- |
| 1 | FALSE | 0 | Smoke | 40 | 2898 |
| 3 | FALSE | 2 | No Smoke | 38 | 3977 |
| 4 | FALSE | 2 | Smoke | 37 | 3040 |
| 5 | FALSE | 2 | No Smoke | 38 | 3523 |
| 6 | FALSE | 5 | Smoke | 40 | 3100 |
| 7 | TRUE | 6 | No Smoke | 40 | 3670 |
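The filtering step might look like the following in base R. The name `phbirths2` is taken from the model call in part (c); reading "less than 36 weeks" as keeping `gestate >= 36` is an interpretation of the prompt.

```r
data(phbirths, package = "faraway")
phbirths$smoke <- factor(ifelse(phbirths$smoke, "Smoke", "No Smoke"))

# Keep only births at 36 weeks of gestation or later
phbirths2 <- subset(phbirths, gestate >= 36)
head(phbirths2)
```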
c. Fit the quadratic model
##
## Call:
## lm(formula = grams ~ poly(gestate, 2) * smoke, data = phbirths2)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1433.51  -296.25   -12.25   291.68  1464.49
##
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)
## (Intercept)                   3364.02      15.45 217.751  < 2e-16 ***
## poly(gestate, 2)1             5770.90     504.11  11.448  < 2e-16 ***
## poly(gestate, 2)2            -2287.74     512.46  -4.464 8.92e-06 ***
## smokeSmoke                    -202.81      32.68  -6.206 7.85e-10 ***
## poly(gestate, 2)1:smokeSmoke  1813.07    1027.39   1.765 0.077904 .
## poly(gestate, 2)2:smokeSmoke  3654.80     988.42   3.698 0.000229 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 437.7 on 1033 degrees of freedom
## Multiple R-squared: 0.2108, Adjusted R-squared: 0.207
## F-statistic: 55.19 on 5 and 1033 DF, p-value: < 2.2e-16
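The formula in the call above lets each smoking group have its own quadratic curve. A sketch of the fit, with the data preparation repeated so the block runs on its own (the object name `model` is arbitrary):

```r
data(phbirths, package = "faraway")
phbirths$smoke <- factor(ifelse(phbirths$smoke, "Smoke", "No Smoke"))
phbirths2 <- subset(phbirths, gestate >= 36)

# poly(gestate, 2) supplies orthogonal linear and quadratic terms;
# the * smoke interaction gives each smoking status its own curve
model <- lm(grams ~ poly(gestate, 2) * smoke, data = phbirths2)
summary(model)
```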
d. Add the model fitted values to the phbirths data frame along with the regression model confidence intervals.
|  | black | educ | smoke | gestate | grams | fit | lwr | upr |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | FALSE | 0 | Smoke | 40 | 2898 | 3243.132 | 3174.250 | 3312.013 |
| 3 | FALSE | 2 | No Smoke | 38 | 3977 | 3200.173 | 3156.090 | 3244.256 |
| 4 | FALSE | 2 | Smoke | 37 | 3040 | 2804.668 | 2692.140 | 2917.196 |
| 5 | FALSE | 2 | No Smoke | 38 | 3523 | 3200.173 | 3156.090 | 3244.256 |
| 6 | FALSE | 5 | Smoke | 40 | 3100 | 3243.132 | 3174.250 | 3312.013 |
| 7 | TRUE | 6 | No Smoke | 40 | 3670 | 3478.249 | 3441.992 | 3514.507 |
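As in Exercise 5.4.1, `predict()` can supply the fitted values and confidence limits. This sketch appends them to the filtered data frame `phbirths2`, since those are the rows the model was fit to.

```r
data(phbirths, package = "faraway")
phbirths$smoke <- factor(ifelse(phbirths$smoke, "Smoke", "No Smoke"))
phbirths2 <- subset(phbirths, gestate >= 36)
model <- lm(grams ~ poly(gestate, 2) * smoke, data = phbirths2)

# Append fit, lwr and upr (95% confidence limits for the mean response)
phbirths2 <- cbind(phbirths2, predict(model, interval = "confidence"))
head(phbirths2)
```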
e. On your two scatterplots from part (a), add layers for the model fits and ribbon of uncertainty for the model fits.
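A possible layered plot, assuming ggplot2 and the `fit`, `lwr`, and `upr` columns from part (d). Colours and transparency are arbitrary choices.

```r
library(ggplot2)
data(phbirths, package = "faraway")
phbirths$smoke <- factor(ifelse(phbirths$smoke, "Smoke", "No Smoke"))
phbirths2 <- subset(phbirths, gestate >= 36)
model <- lm(grams ~ poly(gestate, 2) * smoke, data = phbirths2)
phbirths2 <- cbind(phbirths2, predict(model, interval = "confidence"))

p <- ggplot(phbirths2, aes(x = gestate)) +
  geom_point(aes(y = grams)) +                                            # data
  geom_ribbon(aes(ymin = lwr, ymax = upr), fill = "salmon", alpha = 0.5) + # uncertainty
  geom_line(aes(y = fit), colour = "red") +                               # model fit
  facet_wrap(~ smoke)
print(p)
```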

f. Create a column for the residuals in the phbirths data set using any of the following:
|  | black | educ | smoke | gestate | grams | fit | lwr | upr | resid |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | FALSE | 0 | Smoke | 40 | 2898 | 3243.132 | 3174.250 | 3312.013 | -345.1315 |
| 3 | FALSE | 2 | No Smoke | 38 | 3977 | 3200.173 | 3156.090 | 3244.256 | 776.8272 |
| 4 | FALSE | 2 | Smoke | 37 | 3040 | 2804.668 | 2692.140 | 2917.196 | 235.3318 |
| 5 | FALSE | 2 | No Smoke | 38 | 3523 | 3200.173 | 3156.090 | 3244.256 | 322.8272 |
| 6 | FALSE | 5 | Smoke | 40 | 3100 | 3243.132 | 3174.250 | 3312.013 | -143.1315 |
| 7 | TRUE | 6 | No Smoke | 40 | 3670 | 3478.249 | 3441.992 | 3514.507 | 191.7509 |
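Two equivalent ways to compute the residual column are sketched here. The column name `resid` is an assumption made for illustration.

```r
data(phbirths, package = "faraway")
phbirths$smoke <- factor(ifelse(phbirths$smoke, "Smoke", "No Smoke"))
phbirths2 <- subset(phbirths, gestate >= 36)
model <- lm(grams ~ poly(gestate, 2) * smoke, data = phbirths2)

# Option 1: ask the model object for its residuals
phbirths2$resid <- resid(model)

# Option 2: observed value minus fitted value gives the same numbers
by_hand <- phbirths2$grams - fitted(model)

head(phbirths2)
```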
g. Create a histogram of the residuals.
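A short sketch of the histogram, assuming ggplot2. The choice of 30 bins is arbitrary, and base R's `hist(resid(model))` is an equally valid alternative.

```r
library(ggplot2)
data(phbirths, package = "faraway")
phbirths$smoke <- factor(ifelse(phbirths$smoke, "Smoke", "No Smoke"))
phbirths2 <- subset(phbirths, gestate >= 36)
model <- lm(grams ~ poly(gestate, 2) * smoke, data = phbirths2)

# Collect the residuals in a data frame so ggplot() can use them
resid_df <- data.frame(resid = resid(model))

p <- ggplot(resid_df, aes(x = resid)) +
  geom_histogram(bins = 30)
print(p)
```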
