606 Chapter 8 KLS

Chapter 8 Homework Problems: 8.2, 8.4, 8.8, 8.16, 8.18

8.2: Baby Weights, Part II

\[ \widehat{\text{baby weight}} = 120.07 - 1.93 \times \text{parity} \]
The estimated weight of babies who are not first born is 1.93 ounces lower than that of first-born babies.

First born: 120.07 - 1.93 * 0 = 120.07 ounces

Not first born: 120.07 - 1.93 * 1 = 118.14 ounces
The test statistic for the parity slope is T = -1.62, with p = 0.1052. This p-value is larger than the usual 0.05 significance level, so we do not reject the null hypothesis that the slope is zero. There is not enough evidence to conclude that birth weight and parity are associated.
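As a rough check on the numbers above, the two predictions and the p-value can be recomputed in R. This is only a sketch: the variable names are illustrative, and the p-value uses a large-sample normal approximation to the t distribution, which is an assumption because the sample size is not stated here.

```r
# Coefficients from the fitted line above
b0 <- 120.07
b1 <- -1.93

# parity = 0 for first born, 1 otherwise
parity <- c(first_born = 0, not_first_born = 1)
predicted <- b0 + b1 * parity

# Two-sided p-value for T = -1.62, approximating the t distribution
# with a standard normal (ASSUMPTION: the sample is large)
t_stat <- -1.62
p_value <- 2 * pnorm(-abs(t_stat))

print(predicted)
print(round(p_value, 4))
```

Under that approximation the p-value agrees with the reported 0.1052 to about three decimal places.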

8.4: Absenteeism

\[ \widehat{\text{days absent}} = 18.93 - 9.11 \times \text{eth} + 3.10 \times \text{sex} + 2.15 \times \text{lrn} \]
The model predicts that non-aboriginal students miss 9.11 fewer days of school than aboriginal students on average, all other things held constant.

The model predicts that males miss 3.10 more days of school than females, all other things held constant.

The model predicts that slow learners miss an average of 2.15 more days of school than average learners, all other things held constant.

\[ \widehat{\text{days absent}} = 18.93 - 9.11 \times 0 + 3.10 \times 1 + 2.15 \times 1 = 24.18 \]

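The residual is the observed value minus the predicted value. This student missed 2 days of school (from the exercise statement), so the residual is 2 - 24.18 = -22.18. The model overestimates this student's absences.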
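The sign of a residual depends on the order of subtraction, so it helps to write the step out. The sketch below uses the convention residual = observed - predicted, and assumes the observed value of 2 days absent comes from the exercise statement (it does not appear elsewhere in this write-up).

```r
# ASSUMPTION: the student actually missed 2 days (from the exercise statement)
observed <- 2

# Prediction for an aboriginal (eth = 0), male (sex = 1), slow learner (lrn = 1)
predicted <- 18.93 - 9.11 * 0 + 3.10 * 1 + 2.15 * 1

# Residual = observed - predicted; negative means the model over-predicts
residual <- observed - predicted
print(residual)
```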

\[ R^2 = 1 - (240.57 / 264.17) = 0.08933641 \]

\[ R^2_{adj} = 1 - (240.57 / 264.17) \times ((146-1)/(146-3-1)) = 0.07009704 \]

18.93 - (9.11 * 0) + (3.10 * 1) + (2.15 * 1)

## [1] 24.18

1 - (240.57/264.17)

## [1] 0.08933641

1 - (240.57/264.17) * ((146-1)/(146-3-1))

## [1] 0.07009704
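The same two calculations can be wrapped in a small helper so the roles of the sample size and the number of predictors are explicit. This is a sketch only; `adj_r_squared` and its argument names are hypothetical and not part of any package.

```r
# Hypothetical helper: adjusted R^2 from the residual variance, the
# variance of the outcome, the sample size n, and the number of predictors k
adj_r_squared <- function(var_resid, var_total, n, k) {
  1 - (var_resid / var_total) * ((n - 1) / (n - k - 1))
}

# Values used in the calculation above: n = 146 students, k = 3 predictors
r_squared <- 1 - 240.57 / 264.17
r_squared_adj <- adj_r_squared(240.57, 264.17, n = 146, k = 3)

print(r_squared)
print(r_squared_adj)
```

Because (n - 1) / (n - k - 1) is greater than 1 whenever k > 0, the adjusted value is always below the unadjusted one.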

8.8: Absenteeism, Part II

Among the candidate models, the one without learner status has the highest adjusted \(R^2\), 0.0723, which also exceeds the full model's adjusted \(R^2\) of 0.0701. Therefore, learner status is the variable that should be removed first.
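The backward-elimination step amounts to comparing adjusted \(R^2\) values and keeping the largest. The sketch below includes only the two values that appear in this write-up (the full model and the model without learner status); the other candidate models are left out rather than guessed, and the names in the vector are illustrative.

```r
# Adjusted R^2 values stated in this write-up (other candidates omitted)
adj_r2 <- c(full = 0.0701, no_lrn = 0.0723)

# The model with the highest adjusted R^2 is the one to keep
best <- names(which.max(adj_r2))

# Dropping lrn is worthwhile only if it improves on the full model
drop_lrn <- adj_r2[["no_lrn"]] > adj_r2[["full"]]

print(best)
print(drop_lrn)
```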

8.16: Challenger disaster, Part I

Temperature <- c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)

Damaged <- c(5,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0)

Undamaged <- c(1,5,5,5,6,6,6,6,6,6,5,6,5,6,6,6,6,5,6,6,6,6,6)

D.Challenger <- data.frame(Temperature, Damaged)
summary(D.Challenger)

##   Temperature       Damaged      
##  Min.   :53.00   Min.   :0.0000  
##  1st Qu.:67.00   1st Qu.:0.0000  
##  Median :70.00   Median :0.0000  
##  Mean   :69.57   Mean   :0.4783  
##  3rd Qu.:75.00   3rd Qu.:1.0000  
##  Max.   :81.00   Max.   :5.0000

U.Challenger <- data.frame(Temperature, Undamaged)
summary(U.Challenger)

##   Temperature      Undamaged    
##  Min.   :53.00   Min.   :1.000  
##  1st Qu.:67.00   1st Qu.:5.000  
##  Median :70.00   Median :6.000  
##  Mean   :69.57   Mean   :5.522  
##  3rd Qu.:75.00   3rd Qu.:6.000  
##  Max.   :81.00   Max.   :6.000

plot(Temperature ~ Damaged)

plot(Temperature ~ Undamaged)

plot(D.Challenger)

plot(U.Challenger)

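Because each mission has six O-rings, Undamaged = 6 - Damaged, so the two sets of plots are mirror images of each other. The damage is concentrated at the coldest launches: the 53-degree mission had 5 damaged O-rings, and all four launches below 65 degrees had at least one damaged O-ring, while only 3 of the 19 warmer launches did.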
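This is a logistic regression model, so the temperature coefficient acts on the log-odds scale: each additional degree of temperature is associated with a decrease of 0.2162 in the log-odds that an O-ring is damaged (the odds are multiplied by about exp(-0.2162), roughly 0.81). Here \(\hat{p}\) is the estimated probability that an O-ring is damaged.
\[ \log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = 11.6630 - 0.2162 \times \text{Temperature} \]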
With a p-value this low, yes, these concerns are justified: the data provide strong evidence that lower temperatures are associated with a higher probability of O-ring damage. The association only showed up as statistically significant once each O-ring was coded as damaged/undamaged and modeled with logistic regression.
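The coefficients 11.6630 and -0.2162 are taken from the exercise, but a fit of the same form can be sketched with `glm()`. This assumes each mission contributes six independent O-rings, so the response is a two-column matrix of damaged and undamaged counts; the fitted values should come out close to the textbook coefficients but are not guaranteed to match to every digit.

```r
# Data from the exercise: launch temperature and damaged O-rings (out of 6)
Temperature <- c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)
Damaged <- c(5,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0)
Undamaged <- 6 - Damaged

# Binomial logistic regression: successes = damaged, failures = undamaged
# (ASSUMPTION: the six O-rings on a mission are independent trials)
fit <- glm(cbind(Damaged, Undamaged) ~ Temperature, family = binomial)
coefs <- coef(fit)

# Multiplicative change in the odds of damage per additional degree
odds_ratio <- exp(coefs[["Temperature"]])

print(round(coefs, 4))
print(round(odds_ratio, 3))
```

Calling `summary(fit)` on the same object would show the standard errors and p-values behind the significance statement above.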

8.18: Challenger disaster, Part II

#51 degrees
fiftyonedeg <- exp(11.6630 - 0.2162 * 51) / (1 + exp(11.6630 - 0.2162 * 51))

#53 degrees
fiftythreedeg <- exp(11.6630 - 0.2162 * 53) / (1 + exp(11.6630 - 0.2162 * 53))

#55 degrees
fiftyfivedeg <- exp(11.6630 - 0.2162 * 55) / (1 + exp(11.6630 - 0.2162 * 55))

fiftyonedeg

## [1] 0.6540297

fiftythreedeg

## [1] 0.5509228

fiftyfivedeg

## [1] 0.4432456

a)
\[
\hat{p}_{51} = 0.6540297
\]

\[
\hat{p}_{53} = 0.5509228
\]

\[
\hat{p}_{55} = 0.4432456
\]
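The three probabilities can also be computed in one vectorized call. The sketch below uses base R's `plogis()`, which is the inverse logit exp(x) / (1 + exp(x)); the variable names are illustrative.

```r
# Coefficients of the logistic model from 8.16
b0 <- 11.6630
b1 <- -0.2162

# Inverse logit applied to all three temperatures at once
temps <- c(51, 53, 55)
p_hat <- plogis(b0 + b1 * temps)
names(p_hat) <- temps

print(round(p_hat, 7))
```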

b)

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 3.3.3

temps <- c(51,53,55,57,59,61,63,65,67,69,71)
probs <- c(0.6540297,0.5509228,0.4432456,0.341,0.251,0.179,0.124,0.084,0.056,0.037,0.024)

tempsandprobs <- data.frame(temps, probs)

#probability line by itself
ggplot(tempsandprobs, aes(x = temps, y = probs)) + geom_line()

#combined with original plot
combined <- ggplot() +
  geom_line(data=tempsandprobs, aes(x = temps, y = probs)) +
  geom_smooth(data = tempsandprobs, aes(x = temps, y = probs), fill = "blue", colour = "darkblue", size = 1) +
  # divide by 6 so the observed points are proportions on the same 0-1 scale as probs
  geom_point(data = D.Challenger, aes(x = Temperature, y = Damaged / 6))

print(combined)

## `geom_smooth()` using method = 'loess'

b) (continued)
As you can see, the fitted line stops at 71 degrees because the probabilities were only entered up to that temperature, while the observed points extend to 81 degrees.
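The curve does not have to stop at 71 degrees, since the model equation gives a probability for any temperature. The sketch below computes the fitted probabilities over 51 to 81 degrees and puts the observed data on the same 0 to 1 scale; the data frame and column names are illustrative, and the range 51:81 is a choice made here to cover the observed launches.

```r
# Coefficients of the logistic model from 8.16
b0 <- 11.6630
b1 <- -0.2162

# Observed data from the exercise
Temperature <- c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)
Damaged <- c(5,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0)

# Model-estimated probability of damage at every degree from 51 to 81
curve_df <- data.frame(temps = 51:81)
curve_df$probs <- plogis(b0 + b1 * curve_df$temps)

# Observed proportion of damaged O-rings per mission (6 O-rings each)
observed_df <- data.frame(Temperature, prop_damaged = Damaged / 6)

print(head(curve_df))
print(tail(curve_df))
```

These two data frames could be passed to `geom_line()` and `geom_point()` in place of `tempsandprobs` and `D.Challenger`, so that the curve and the points share both axes.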

c)
Concerns:
Lack of data is a concern for me. There are only 23 missions, few of them at low temperatures, and temperature is the only predictor in the model.

Assumptions:
Each outcome is independent of the other outcomes.
Each predictor is linearly related to logit(p) when all other predictors are held constant.
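The concern about limited data can be put into numbers with a few counts from the data set. This is a sketch using the vectors from 8.16; the 65-degree cutoff is an arbitrary choice made here to separate the colder launches, not a value from the exercise.

```r
# Observed data from the exercise
Temperature <- c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)
Damaged <- c(5,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0)

n_missions <- length(Temperature)     # number of launches
n_with_damage <- sum(Damaged > 0)     # launches with at least one damaged O-ring
total_damaged <- sum(Damaged)         # damaged O-rings across all launches
n_cold <- sum(Temperature < 65)       # launches below 65 degrees (arbitrary cutoff)

print(c(missions = n_missions, with_damage = n_with_damage,
        damaged_rings = total_damaged, cold_launches = n_cold))
```

With so few cold launches, predictions near 51 degrees rely on extrapolation from a handful of observations.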