Exercise 5.4.1

Using the trees data frame that comes pre-installed in R, fit a regression model that uses tree Height to explain the Volume of wood harvested from the tree.

(a) Graph the data
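The original does not show its plotting code. Below is a minimal sketch that assumes the ggplot2 package is installed. The axis units (feet and cubic feet) come from the trees help page.

```r
library(ggplot2)

# Scatterplot of timber volume against tree height (built-in trees data)
p_raw <- ggplot(trees, aes(x = Height, y = Volume)) +
  geom_point() +
  labs(x = "Height (ft)", y = "Volume (cubic ft)")
p_raw
```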

(b) Fit an lm model

## 
## Call:
## lm(formula = Volume ~ Height, data = trees)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.274  -9.894  -2.894  12.068  29.852 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.1236    29.2731  -2.976 0.005835 ** 
## Height        1.5433     0.3839   4.021 0.000378 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.4 on 29 degrees of freedom
## Multiple R-squared:  0.3579, Adjusted R-squared:  0.3358 
## F-statistic: 16.16 on 1 and 29 DF,  p-value: 0.0003784
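The Call: line above shows the model formula. A minimal sketch that should reproduce this summary in base R:

```r
# Simple linear regression of Volume on Height
model <- lm(Volume ~ Height, data = trees)
summary(model)
```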

(c) Print the table of coefficients with the term names, estimated values, standard errors, and lower and upper 95% confidence limits.

             Estimate Std. Error   t value  Pr(>|t|)
(Intercept) -87.12361 29.2731221 -2.976232 0.0058347
Height        1.54335  0.3838693  4.020509 0.0003784

                  2.5 %     97.5 %
(Intercept) -146.993871 -27.253356
Height         0.758249   2.328451
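The two tables above can be merged into the single table the exercise asks for. A minimal base-R sketch, using `coef_tidy` as an illustrative name:

```r
model <- lm(Volume ~ Height, data = trees)

# Coefficient table (Estimate, Std. Error, t value, Pr(>|t|)) plus 95% CIs
coef_table <- cbind(summary(model)$coefficients, confint(model, level = 0.95))

# Keep only the requested columns, with the term names as a column
coef_tidy <- data.frame(
  term      = rownames(coef_table),
  estimate  = coef_table[, "Estimate"],
  std_error = coef_table[, "Std. Error"],
  lower     = coef_table[, "2.5 %"],
  upper     = coef_table[, "97.5 %"],
  row.names = NULL
)
coef_tidy
```

If the broom package is available, `broom::tidy(model, conf.int = TRUE)` returns a similar data frame.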

(d) Add the model fitted values to the trees data frame along with the regression model confidence intervals.

Girth Height Volume      fit       lwr      upr
  8.3     70   10.3 20.91087 14.098550 27.72319
  8.6     65   10.3 13.19412  3.254288 23.13395
  8.8     63   10.2 10.10742 -1.223363 21.43821
 10.5     72   16.4 23.99757 18.159758 29.83538
 10.7     81   18.8 37.88772 31.592680 44.18275
 10.8     83   19.7 40.97442 33.597379 48.35145
 11.0     66   15.6 14.73747  5.471607 24.00333
 11.0     75   18.2 28.62762 23.644217 33.61102
 11.1     80   22.6 36.34437 30.506556 42.18218
 11.2     75   19.9 28.62762 23.644217 33.61102
 11.3     79   24.2 34.80102 29.345254 40.25678
 11.4     76   21.0 30.17097 25.249799 35.09214
 11.4     76   21.4 30.17097 25.249799 35.09214
 11.7     69   21.3 19.36752 11.990482 26.74456
 12.0     75   19.1 28.62762 23.644217 33.61102
 12.9     74   22.2 27.08427 21.918668 32.24987
 12.9     85   33.8 44.06112 35.450370 52.67186
 13.3     86   27.4 45.60447 36.338602 54.87033
 13.7     71   25.7 22.45422 16.159183 28.74926
 13.8     64   24.9 11.65077  1.021703 22.27984
 14.0     78   34.5 33.25767 28.092067 38.42327
 14.2     80   31.7 36.34437 30.506556 42.18218
 14.5     74   36.3 27.08427 21.918668 32.24987
 16.0     72   38.3 23.99757 18.159758 29.83538
 16.3     77   42.6 31.71432 26.730917 36.69772
 17.3     81   55.4 37.88772 31.592680 44.18275
 17.5     82   55.7 39.43107 32.618747 46.24339
 17.9     80   58.3 36.34437 30.506556 42.18218
 18.0     80   51.5 36.34437 30.506556 42.18218
 18.0     80   51.0 36.34437 30.506556 42.18218
 20.6     87   77.0 47.14782 37.207982 57.08765
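The fit, lwr and upr columns match what `predict()` returns for a linear model when a confidence interval is requested. A minimal sketch, with `trees_fit` as an illustrative name:

```r
model <- lm(Volume ~ Height, data = trees)

# interval = "confidence" returns a matrix with columns fit, lwr, upr
ci <- predict(model, interval = "confidence", level = 0.95)
trees_fit <- cbind(trees, ci)
head(trees_fit)
```

Using `interval = "prediction"` instead gives wider intervals for the volume of an individual new tree, rather than for the mean volume at a given height.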

(e) Graph the data along with the fitted regression line and its uncertainty ribbon.
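Here is one way to layer the ribbon and the fitted line over the points, assuming ggplot2 is installed. The block refits the model so that it runs on its own. The colours are arbitrary choices.

```r
library(ggplot2)

model <- lm(Volume ~ Height, data = trees)
trees_fit <- cbind(trees, predict(model, interval = "confidence"))

p_fit <- ggplot(trees_fit, aes(x = Height)) +
  # 95% confidence band for the mean Volume
  geom_ribbon(aes(ymin = lwr, ymax = upr), fill = "grey70", alpha = 0.5) +
  # fitted regression line
  geom_line(aes(y = fit), colour = "blue") +
  # observed data
  geom_point(aes(y = Volume)) +
  labs(x = "Height (ft)", y = "Volume (cubic ft)")
p_fit
```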

(f) Add the R-squared value as an annotation to the graph.
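A minimal sketch that pulls R-squared from `summary()` and places it with `annotate()`, assuming ggplot2. The text position (x = 66, y = 70) is a hypothetical hand-picked spot in the empty upper-left corner. `parse = TRUE` renders the label as the plotmath expression R^2 == value.

```r
library(ggplot2)

model <- lm(Volume ~ Height, data = trees)
trees_fit <- cbind(trees, predict(model, interval = "confidence"))
r2 <- summary(model)$r.squared

p_r2 <- ggplot(trees_fit, aes(x = Height)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.3) +
  geom_line(aes(y = fit)) +
  geom_point(aes(y = Volume)) +
  # hypothetical placement; adjust x and y to taste
  annotate("text", x = 66, y = 70,
           label = paste("R^2 ==", round(r2, 3)), parse = TRUE)
p_r2
```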

Exercise 5.4.2

The data set phbirths from the faraway package contains information on birth weight, gestational length, and the smoking status of the mother. We’ll fit a quadratic model that predicts infant birth weight from gestational length, with a separate curve for each smoking status.

a. Create two scatter plots of gestational length and birth weight, one for each smoking status.
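A minimal sketch, assuming ggplot2 and faraway are both installed. `facet_wrap(~ smoke)` produces one panel per smoking status.

```r
library(ggplot2)
# Load phbirths from the faraway package (must be installed)
data(phbirths, package = "faraway")

p_births <- ggplot(phbirths, aes(x = gestate, y = grams)) +
  geom_point(alpha = 0.4) +
  facet_wrap(~ smoke) +   # one panel per smoking status
  labs(x = "Gestational length (weeks)", y = "Birth weight (grams)")
p_births
```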

b. Remove all the observations that are premature (less than 36 weeks). For the remainder of the problem, only use these full-term babies.

  black educ    smoke gestate grams
1 FALSE    0    Smoke      40  2898
3 FALSE    2 No Smoke      38  3977
4 FALSE    2    Smoke      37  3040
5 FALSE    2 No Smoke      38  3523
6 FALSE    5    Smoke      40  3100
7  TRUE    6 No Smoke      40  3670
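This sketch uses base R's `subset()`, which keeps the original row names (note the missing row 2 above). The name `phbirths2` matches the data argument in the model call below. It assumes "premature" means `gestate < 36`, as stated.

```r
# Load phbirths from the faraway package (must be installed)
data(phbirths, package = "faraway")

# Premature births are under 36 weeks, so keep gestate >= 36
phbirths2 <- subset(phbirths, gestate >= 36)
head(phbirths2)
```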

c. Fit the quadratic model

## 
## Call:
## lm(formula = grams ~ poly(gestate, 2) * smoke, data = phbirths2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1433.51  -296.25   -12.25   291.68  1464.49 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   3364.02      15.45 217.751  < 2e-16 ***
## poly(gestate, 2)1             5770.90     504.11  11.448  < 2e-16 ***
## poly(gestate, 2)2            -2287.74     512.46  -4.464 8.92e-06 ***
## smokeSmoke                    -202.81      32.68  -6.206 7.85e-10 ***
## poly(gestate, 2)1:smokeSmoke  1813.07    1027.39   1.765 0.077904 .  
## poly(gestate, 2)2:smokeSmoke  3654.80     988.42   3.698 0.000229 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 437.7 on 1033 degrees of freedom
## Multiple R-squared:  0.2108, Adjusted R-squared:  0.207 
## F-statistic: 55.19 on 5 and 1033 DF,  p-value: < 2.2e-16
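The Call: line above gives the model formula. The `* smoke` term lets both the linear and the quadratic parts differ between smokers and non-smokers. A minimal sketch that refits it:

```r
# Load phbirths from the faraway package (must be installed)
data(phbirths, package = "faraway")
phbirths2 <- subset(phbirths, gestate >= 36)

# Quadratic in gestational length, with separate curves for each smoking status
model2 <- lm(grams ~ poly(gestate, 2) * smoke, data = phbirths2)
summary(model2)
```

`poly()` builds orthogonal polynomials, so the `poly(gestate, 2)1` and `poly(gestate, 2)2` estimates are not the coefficients on gestate and gestate². `poly(gestate, 2, raw = TRUE)` uses raw powers and gives identical fitted values.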

d. Add the model fitted values to the phbirths data frame along with the regression model confidence intervals.

  black educ    smoke gestate grams      fit      lwr      upr
1 FALSE    0    Smoke      40  2898 3243.132 3174.250 3312.013
3 FALSE    2 No Smoke      38  3977 3200.173 3156.090 3244.256
4 FALSE    2    Smoke      37  3040 2804.668 2692.140 2917.196
5 FALSE    2 No Smoke      38  3523 3200.173 3156.090 3244.256
6 FALSE    5    Smoke      40  3100 3243.132 3174.250 3312.013
7  TRUE    6 No Smoke      40  3670 3478.249 3441.992 3514.507
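This sketch follows the same approach used for the trees model. It passes `newdata` explicitly so the predictions line up row-for-row with `phbirths2`, even if `lm()` had dropped rows with missing values. `predict()` handles the `poly()` term correctly on new data.

```r
# Load phbirths from the faraway package (must be installed)
data(phbirths, package = "faraway")
phbirths2 <- subset(phbirths, gestate >= 36)
model2 <- lm(grams ~ poly(gestate, 2) * smoke, data = phbirths2)

# Append fit, lwr, upr (95% confidence interval for the mean birth weight)
phbirths2 <- cbind(phbirths2,
                   predict(model2, newdata = phbirths2, interval = "confidence"))
head(phbirths2)
```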

e. On your two scatterplots from part (a), add layers for the model fits and a ribbon of uncertainty around them.
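A minimal sketch, assuming ggplot2 and faraway are installed. To stay close to part (a), the points come from the full data. The line and ribbon come from the full-term subset the model was fitted to, so they cover only gestational lengths of 36 weeks or more. Each layer supplies its own data, and the faceting on `smoke` applies to all of them.

```r
library(ggplot2)
data(phbirths, package = "faraway")

phbirths2 <- subset(phbirths, gestate >= 36)
model2 <- lm(grams ~ poly(gestate, 2) * smoke, data = phbirths2)
phbirths2 <- cbind(phbirths2,
                   predict(model2, newdata = phbirths2, interval = "confidence"))

p_births_fit <- ggplot(mapping = aes(x = gestate)) +
  geom_point(data = phbirths, aes(y = grams), alpha = 0.4) +
  geom_ribbon(data = phbirths2, aes(ymin = lwr, ymax = upr),
              fill = "steelblue", alpha = 0.3) +
  geom_line(data = phbirths2, aes(y = fit), colour = "steelblue") +
  facet_wrap(~ smoke) +
  labs(x = "Gestational length (weeks)", y = "Birth weight (grams)")
p_births_fit
```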

f. Create a column for the residuals in the phbirths data set.

  black educ    smoke gestate grams      fit      lwr      upr residuals
1 FALSE    0    Smoke      40  2898 3243.132 3174.250 3312.013 -345.1315
3 FALSE    2 No Smoke      38  3977 3200.173 3156.090 3244.256  776.8272
4 FALSE    2    Smoke      37  3040 2804.668 2692.140 2917.196  235.3318
5 FALSE    2 No Smoke      38  3523 3200.173 3156.090 3244.256  322.8272
6 FALSE    5    Smoke      40  3100 3243.132 3174.250 3312.013 -143.1315
7  TRUE    6 No Smoke      40  3670 3478.249 3441.992 3514.507  191.7509
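The residuals column above is observed minus fitted birth weight (for example, 2898 − 3243.132 = −345.132). Below is a minimal sketch of two common ways to build it. They give the same values when `lm()` dropped no rows for missing data.

```r
data(phbirths, package = "faraway")
phbirths2 <- subset(phbirths, gestate >= 36)
model2 <- lm(grams ~ poly(gestate, 2) * smoke, data = phbirths2)
phbirths2$fit <- predict(model2, newdata = phbirths2)

# Option 1: observed minus fitted
phbirths2$residuals <- phbirths2$grams - phbirths2$fit

# Option 2 (same values if no rows were dropped): extract from the model
# phbirths2$residuals <- resid(model2)
head(phbirths2)
```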

g. Create a histogram of the residuals.
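A minimal ggplot2 sketch. The choice of 30 bins is arbitrary and worth adjusting.

```r
library(ggplot2)
data(phbirths, package = "faraway")
phbirths2 <- subset(phbirths, gestate >= 36)
model2 <- lm(grams ~ poly(gestate, 2) * smoke, data = phbirths2)

res_df <- data.frame(residuals = resid(model2))
p_hist <- ggplot(res_df, aes(x = residuals)) +
  geom_histogram(bins = 30) +
  labs(x = "Residual (grams)", y = "Count")
p_hist
```

For a quick look without ggplot2, `hist(resid(model2))` does the same job in base R.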