Exercise 8.1 introduces a data set on birth weight of babies. Another variable we consider is parity, which is 0 if the child is the first born, and 1 otherwise. The summary table below shows the results of a linear regression model for predicting the average birth weight of babies, measured in ounces, from parity.
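a. avg_birth_hat = 120.07 - 1.93*parity
b. The intercept is the predicted average birth weight for first-born babies (parity = 0). The slope means that babies who are not first born are predicted to weigh 1.93 ounces less on average.
First born (parity = 0) = 120.07 - 1.93*0 = 120.07
Not first born (parity = 1) = 120.07 - 1.93*1 = 118.14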
c. No. We fail to reject the null hypothesis at the 5% level, so the relationship between parity and average birth weight is not statistically significant.
a. avg_days_absent = 18.93 - 9.11(eth) + 3.10(sex) + 2.15(lrn)
b.
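eth - non-aboriginal students are predicted to have 9.11 fewer absent days than aboriginal students, all else held constant.
sex - male students are predicted to have 3.10 more absent days than female students, all else held constant.
lrn - slow learners are predicted to have 2.15 more absent days than average learners, all else held constant.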
c. Given eth = 0, sex = 1 and lrn = 1, the predicted number of absent days is: 18.93 - 9.11*0 + 3.10*1 + 2.15*1 = 24.18
Residual = observed - predicted = 2 - 24.18 = -22.18
d.
var_resid = 240.57
var_result = 264.17
n = 146
k = 3
R_Sqr = 1- (var_resid/var_result)
Adj_R_Sqr = 1- (var_resid/var_result) * ((n-1)/(n-k-1))
R_Sqr
## [1] 0.08933641
Adj_R_Sqr
## [1] 0.07009704
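The prediction and residual from part (c) can be checked with a short sketch. The names `coefs`, `student`, `predicted`, `observed` and `resid_c` are illustrative choices, not part of the original solution; the coefficients come from part (a) and the observed value of 2 absent days is the one used in the residual calculation.

```r
# Coefficients of the fitted model from part (a)
coefs <- c(intercept = 18.93, eth = -9.11, sex = 3.10, lrn = 2.15)

# Student described in part (c): eth = 0, sex = 1, lrn = 1
# (the leading 1 multiplies the intercept)
student <- c(intercept = 1, eth = 0, sex = 1, lrn = 1)

predicted <- sum(coefs * student)  # predicted absent days
observed  <- 2                     # observed absent days, as used in part (c)
resid_c   <- observed - predicted  # residual = observed - predicted

predicted
resid_c
```

A negative residual means the model overestimates this student's absences.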
a. The learner status variable (lrn) should be removed first. Without it, the adjusted R^2 is 0.0723, which is higher than the full model's 0.0701.
a. It appears that o-ring damage is more likely at lower temperatures.
b. The table confirms the negative relationship between temperature and O-ring failures. The p-value for temperature is near zero, so the relationship is statistically significant.
c. log(p_hat/(1 - p_hat)) = 11.663 - 0.2162 * temp
d. Yes, the concerns are justified: the temperature coefficient is negative and its p-value is near zero. More data would still be welcome.
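The slope in part (c) is on the log-odds scale, so exponentiating it gives a multiplicative change in the odds of damage. The sketch below uses the rounded coefficient from part (c); the variable names are illustrative.

```r
b1 <- -0.2162                  # temperature coefficient from part (c)

# Multiplicative change in the odds of damage for a 1-degree increase
odds_ratio <- exp(b1)

# Same idea for a 10-degree increase
odds_ratio_10 <- exp(10 * b1)

odds_ratio
odds_ratio_10
```

An odds ratio below 1 means each additional degree lowers the odds of damage, which matches the negative relationship noted in parts (a) and (b).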
a. Plugging the temperatures into the fitted equation gives the following probabilities of damage:
p51 = 0.6536
p53 = 0.5504
p55 = 0.4427
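As a quick check, these probabilities can be approximated directly from the rounded equation log(p_hat/(1 - p_hat)) = 11.663 - 0.2162 * temp by applying the inverse logit. This is a minimal sketch; `b0`, `b1`, `temps` and `p_hat` are illustrative names, and because the coefficients are rounded the results agree with the values above only to about three decimal places.

```r
b0 <- 11.663     # intercept from the fitted logistic model (rounded)
b1 <- -0.2162    # temperature coefficient (rounded)

temps <- c(51, 53, 55)

# plogis() is the inverse logit: exp(x) / (1 + exp(x))
p_hat <- plogis(b0 + b1 * temps)
names(p_hat) <- paste0("p", temps)

round(p_hat, 4)
```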
temp <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81,
53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81,
53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81,
53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81,
53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81,
53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)
failure <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
logitmodel <- glm(failure ~ temp, family = binomial)
summary(logitmodel)
##
## Call:
## glm(formula = failure ~ temp, family = binomial)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.2646  -0.3395  -0.2472  -0.1299   3.0216
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) 11.66299    3.29616   3.538 0.000403 ***
## temp        -0.21623    0.05318  -4.066 4.77e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 76.745 on 137 degrees of freedom
## Residual deviance: 54.759 on 136 degrees of freedom
## AIC: 58.759
##
## Number of Fisher Scoring iterations: 6
probability <- data.frame(temp = c(50:85), prob = rep(0, length(c(50:85))))
probability[,2] <- predict(logitmodel, newdata = data.frame(temp = probability[, 1]), type = "response")
probability[probability$temp == 51, 2]
## [1] 0.6536388
plot(probability$temp, probability$prob, xlab = "Temperature", ylab = "Probability of Damage")
curve(predict(logitmodel, data.frame(temp = x), type = "response"), add = TRUE)
c. My concerns include the small number of samples and the imbalance between damaged and undamaged cases; it would be better if the two groups were more evenly represented. I would also be concerned about choosing the right probability cut-off for classifying an O-ring as damaged versus undamaged, because the stakes are high.