8.2 Baby weights, Part II.

Exercise 8.1 introduces a data set on birth weight of babies. Another variable we consider is parity, which is 0 if the child is the first born, and 1 otherwise. The summary table below shows the results of a linear regression model for predicting the average birth weight of babies, measured in ounces, from parity.

a. avg_birth_hat = 120.07 - 1.93*parity
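b. The slope means that babies who are not first born are predicted to weigh 1.93 ounces less, on average, than first-born babies. The intercept, 120.07 ounces, is the predicted average birth weight of a first born; parity adjusts this base value.

   First born (parity 0)     = 120.07 - 1.93*0 = 120.07
   Not first born (parity 1) = 120.07 - 1.93*1 = 118.14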
c. No. We fail to reject the null hypothesis at the 5% level, so the relationship between parity and average birth weight is not statistically significant.
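The part b predictions can be reproduced with a few lines of R. This is a minimal sketch that hard-codes the summary-table coefficients (intercept 120.07, parity slope -1.93); the names b0, b1 and pred are illustrative only.

```r
# Coefficients from the 8.2 summary table (assumed: intercept 120.07, slope -1.93)
b0 <- 120.07
b1 <- -1.93

# Predicted average birth weight in ounces for first borns (0) and others (1)
parity <- c(first_born = 0, not_first_born = 1)
pred <- b0 + b1 * parity
print(pred)
```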

8.4 Absenteeism, Part I.

a. avg_days_absent = 18.93 - 9.11(eth) + 3.10(sex) + 2.15(lrn)
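b.
eth: non-aboriginal students are predicted to be absent 9.11 fewer days than aboriginal students, all else held constant.

sex: male students are predicted to be absent 3.10 more days than female students, all else held constant.

lrn: slow learners are predicted to be absent 2.15 more days than other students, all else held constant.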
c. Given eth = 0, sex = 1 and lrn = 1, the predicted value is 18.93 - 9.11*0 + 3.10*1 + 2.15*1 = 24.18 days.

Residual = 2 - 24.18 = -22.18 days.
d.
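# Variance of the residuals and variance of the outcome (days absent)
var_resid = 240.57
var_result = 264.17
n = 146  # number of students
k = 3    # number of predictors

# R^2 and adjusted R^2 (the adjusted version penalizes for the number of predictors)
R_Sqr = 1 - (var_resid/var_result)
Adj_R_Sqr = 1 - (var_resid/var_result) * ((n-1)/(n-k-1))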
R_Sqr
## [1] 0.08933641
Adj_R_Sqr
## [1] 0.07009704
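The part c prediction and residual can be checked the same way. This is a minimal sketch, assuming the coefficients from part a and coding the first student as eth = 0, sex = 1, lrn = 1 with 2 days absent; coefs, x and observed are illustrative names, not part of the original analysis.

```r
# Coefficients from the 8.4 regression output
coefs <- c(intercept = 18.93, eth = -9.11, sex = 3.10, lrn = 2.15)

# First student: aboriginal (eth = 0), male (sex = 1), slow learner (lrn = 1)
x <- c(intercept = 1, eth = 0, sex = 1, lrn = 1)
observed <- 2  # days absent

predicted <- sum(coefs * x)   # same form as the regression equation
residual <- observed - predicted
c(predicted = predicted, residual = residual)
```

Writing the prediction as sum(coefs * x) mirrors the regression equation, so another student can be scored by changing x.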

8.8 Absenteeism, Part II.

a. Learner status should be removed first: dropping it raises the adjusted R^2 from 0.0701 (full model) to 0.0723.

8.16 Challenger disaster, Part I.

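a. It appears that O-ring damage is more likely at lower temperatures.
b. The summary table confirms this: the temperature coefficient is negative, so the estimated probability of O-ring damage falls as temperature rises. Its p-value is near zero, so the relationship is statistically significant.
c. log(p_hat/(1 - p_hat)) = 11.663 - 0.2162 x temp
d. Yes. The p-value for temperature is near zero, so the model supports the concern that O-ring damage becomes more likely at lower temperatures, although more data would be helpful.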
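Solving the fitted equation in part c for p_hat gives the probability form of the model, which is what plugging a temperature into the equation amounts to:

```latex
\log\!\left(\frac{\hat{p}}{1-\hat{p}}\right) = 11.663 - 0.2162\,\text{temp}
\quad\Longrightarrow\quad
\hat{p} = \frac{e^{11.663 - 0.2162\,\text{temp}}}{1 + e^{11.663 - 0.2162\,\text{temp}}}
        = \frac{1}{1 + e^{-(11.663 - 0.2162\,\text{temp})}}
```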

8.18 Challenger disaster, Part II.

a. Plugging each temperature into the fitted equation and converting the log-odds to a probability gives:

p51 = 0.6536
p53 = 0.5504
p55 = 0.4427
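The three probabilities in part a can also be computed directly from the fitted coefficients, without refitting the model. This is a minimal sketch, assuming the unrounded estimates printed in the glm summary (11.66299 and -0.21623); base R's plogis() is the logistic distribution function, i.e. the inverse logit exp(x) / (1 + exp(x)).

```r
# Coefficients assumed from the glm summary output
b0 <- 11.66299
b1 <- -0.21623

temps <- c(51, 53, 55)        # degrees Fahrenheit
log_odds <- b0 + b1 * temps   # linear predictor on the log-odds scale
p_hat <- plogis(log_odds)     # inverse logit: probability of O-ring damage
round(p_hat, 4)
```

Because the coefficients are themselves rounded, the results can differ from the values above in the fourth decimal place.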
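b. Refitting the model in R and plotting the estimated probability of damage for temperatures from 50 to 85 degrees Fahrenheit:

# Temperature for each of the 138 O-ring observations: the 23 mission temperatures repeated 6 times
temp <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81,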
          53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81,
          53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81,
          53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81,
          53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81,
          53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)

# Damage indicator for each observation, in the same order as temp (1 = damaged, 0 = undamaged)
failure <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
             1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
             1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
             1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
             1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
             0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)

# Logistic regression of O-ring damage on temperature
logitmodel <- glm(failure ~ temp, family = binomial)

summary(logitmodel)
## 
## Call:
## glm(formula = failure ~ temp, family = binomial)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2646  -0.3395  -0.2472  -0.1299   3.0216  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 11.66299    3.29616   3.538 0.000403 ***
## temp        -0.21623    0.05318  -4.066 4.77e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 76.745  on 137  degrees of freedom
## Residual deviance: 54.759  on 136  degrees of freedom
## AIC: 58.759
## 
## Number of Fisher Scoring iterations: 6
# Predicted probability of O-ring damage for each whole temperature from 50 to 85 F
probability <- data.frame(temp = 50:85)
probability$prob <- predict(logitmodel, newdata = probability, type = "response")

probability[probability$temp == 51, "prob"]
## [1] 0.6536388
# Plot the predicted probabilities and overlay the smooth fitted curve
plot(probability$temp, probability$prob, xlab = "Temperature", ylab = "Probability of Damage")
curve(predict(logitmodel, data.frame(temp = x), type = "response"), add = TRUE)

c. My concerns include the small sample (23 missions) and the imbalance between damaged and undamaged O-rings; the model would be more trustworthy if the two groups were more evenly represented. I would also be concerned about choosing the right probability cut-off for classifying an O-ring as damaged versus undamaged, since the stakes are so high.