8.2 Baby weights, Part II.

  1. The equation of the regression line is:

\[\widehat{\text{birthweight}} = 120.07 - 1.93 \times \text{parity}\]

  1. The slope means that a baby who is not first born (parity = 1) is predicted to weigh, on average, 1.93 ounces less at birth than a first-born baby (parity = 0). The model therefore predicts an average birth weight of 120.07 ounces for a first-born baby and 118.14 ounces for a baby who is not first born.

```r
(not_first_born = 120.07 - 1.93 * 1)
## [1] 118.14
(first_born = 120.07)
## [1] 120.07
```
  1. There is no statistically significant relationship between average birth weight and parity. The p-value for the parity coefficient, 0.1052, is larger than the 0.05 significance level, so we fail to reject the null hypothesis that the true slope is zero.

8.4 Absenteeism

  1. The equation of the regression model is:

\[\hat{y} = 18.93 - 9.11 \times \text{eth} + 3.10 \times \text{sex} + 2.15 \times \text{lrn}\]

  1. The slopes of the regression model may be interpreted as follows:

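eth: holding sex and learner status constant, a student who is not aboriginal is predicted to have 9.11 fewer days absent than an aboriginal student.

sex: holding ethnicity and learner status constant, a male student is predicted to have 3.10 more days absent than a female student.

lrn: holding ethnicity and sex constant, a slow learner is predicted to have 2.15 more days absent than a student who is not a slow learner.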
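All three predictors are binary (0/1) and enter the model additively. Each slope is therefore the gap in predicted days absent between two students who differ only in that one variable. The sketch below tabulates the predictions for all eight combinations using a hypothetical helper, `predict_days()`, built from the fitted equation. The 0/1 codings are assumptions read off the interpretations above: eth = 1 not aboriginal, sex = 1 male, lrn = 1 slow learner.

```r
# Hypothetical helper: predicted days absent from the fitted 8.4 equation.
# Assumed coding: eth 1 = not aboriginal, sex 1 = male, lrn 1 = slow learner.
predict_days <- function(eth, sex, lrn) {
  18.93 - 9.11 * eth + 3.10 * sex + 2.15 * lrn
}

# All eight combinations of the three binary predictors
profiles <- expand.grid(eth = 0:1, sex = 0:1, lrn = 0:1)
profiles$predicted_days <- with(profiles, predict_days(eth, sex, lrn))
print(profiles)

# Two profiles that differ only in eth: the gap equals the eth slope (-9.11,
# up to floating-point rounding)
with(profiles, predicted_days[eth == 1 & sex == 0 & lrn == 0] -
               predicted_days[eth == 0 & sex == 0 & lrn == 0])
```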

  1. The residual for the first observation (eth = 0, sex = 1, lrn = 1, days = 2) is:

```r
eth = 0
sex = 1
lrn = 1
days = 2

(fitted_value = 18.93 - 9.11 * eth + 3.10 * sex + 2.15 * lrn)
## [1] 24.18
(residual = days - fitted_value)
## [1] -22.18
```

The residual is -22.18 days. That is, the model predicts far more days absent for this student (24.18) than were actually observed (2).

  1. As shown below, \(R^2 = 8.93\)% and the adjusted \(R^2\) equals 7.01%.

```r
(R_squared = 1 - (240.57/264.17))
## [1] 0.08933641
n = 146
k = 3
(R_squared_adj = 1 - (240.57/264.17) * (n - 1)/(n - k - 1))
## [1] 0.07009704
```
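The same arithmetic can be written out with the formulas behind the code. Here \(n = 146\) students and \(k = 3\) predictors. The value 240.57 is the variance of the residuals \(e_i\), and 264.17 is the variance of the number of days absent \(y_i\):

\[R^2 = 1 - \frac{\operatorname{Var}(e_i)}{\operatorname{Var}(y_i)} = 1 - \frac{240.57}{264.17} \approx 0.0893\]

\[R^2_{\text{adj}} = 1 - \frac{\operatorname{Var}(e_i)}{\operatorname{Var}(y_i)} \times \frac{n - 1}{n - k - 1} = 1 - \frac{240.57}{264.17} \times \frac{145}{142} \approx 0.0701\]

The factor \((n - 1)/(n - k - 1) > 1\) penalizes each additional predictor, which is why the adjusted value is smaller than \(R^2\).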

8.8 Absenteeism, Part II.

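Using backward elimination with adjusted R-squared, we drop the variable whose removal gives the highest adjusted R-squared, provided that value exceeds the full model's adjusted R-squared of 7.01%. This is model 4, which has no learner status: its adjusted R-squared of 7.23% is higher than 7.01%. So we should remove the lrn (learner status) variable.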
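The decision rule above amounts to one step of backward elimination with adjusted R-squared. It can be written as a small hypothetical helper, `drop_candidate()`, which is our own name and not from the textbook. The sketch is fed only the two values quoted in this answer (7.01% for the full model, 7.23% without lrn). The other rows of the textbook's model comparison are not reproduced here.

```r
# Hypothetical helper for one backward-elimination step using adjusted R^2.
# full_adj:    adjusted R^2 of the current (full) model
# reduced_adj: named vector; names are the removed variables, values are the
#              adjusted R^2 of the model without that variable
drop_candidate <- function(full_adj, reduced_adj) {
  best <- which.max(reduced_adj)
  # Drop a variable only if doing so improves adjusted R^2
  if (reduced_adj[best] > full_adj) names(reduced_adj)[best] else NA_character_
}

# Only the values quoted in this answer are used here
drop_candidate(full_adj = 0.0701, reduced_adj = c(lrn = 0.0723))
```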

8.16 Challenger disaster, Part I.

  1. The data suggests that lower temperatures are associated with more damaged O-rings.

  2. As the temperature increases, the probability of O-ring damage decreases, because the coefficient of the temp predictor (-0.2162) is negative. The intercept, 11.663, is the predicted log odds of O-ring damage at a temperature of 0 °F. This is far below any observed launch temperature, so the intercept has no practical interpretation on its own.

  3. The logistic model equation has the form:

\[\log\left(\frac{p}{1-p}\right) = 11.663 - 0.2162 \times \text{temp}\]

This is equivalent to:

\[p = \frac{\exp(11.663 - 0.2162 \times \text{temp})}{1 + \exp(11.663 - 0.2162 \times \text{temp})}\]
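The two forms above are the logit and its inverse. Base R already provides both: `qlogis(p)` computes \(\log(p/(1-p))\), and `plogis(x)` computes \(\exp(x)/(1 + \exp(x))\), the logistic distribution function. The short sketch below checks that the hand-written formula matches `plogis()` and that `qlogis()` recovers the log odds. The function names `p_manual` and `p_plogis` are ours.

```r
# Inverse-logit form from 8.16(c), written out by hand
p_manual <- function(temp) {
  eta <- 11.663 - 0.2162 * temp  # log odds (linear predictor)
  exp(eta) / (1 + exp(eta))
}

# Same model via base R's inverse logit, plogis()
p_plogis <- function(temp) plogis(11.663 - 0.2162 * temp)

temps <- c(51, 53, 55)
all.equal(p_manual(temps), p_plogis(temps))  # TRUE

# qlogis() is the logit, so it undoes plogis() and returns the log odds
qlogis(p_plogis(53))  # about 0.2044, i.e. 11.663 - 0.2162 * 53
```

Using `plogis()` also avoids computing `exp()` of a large linear predictor directly, which can overflow for extreme inputs.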

  1. Based on the model, concerns about the O-rings are justified. The predicted probability of O-ring damage rises steeply as the temperature falls, which is consistent with cold temperature being a primary factor in the failure of the Challenger mission. For example, at 53 °F the logistic model predicts a probability of O-ring damage of 55.1%.

```r
t = 53

(prob = exp(11.663 - 0.2162 * t) / (1 + exp(11.663 - 0.2162 * t)))
## [1] 0.5509228
```
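Because the log odds are linear in temperature, two further readings follow directly from the coefficients. First, each 1 °F increase multiplies the odds of damage by \(\exp(-0.2162) \approx 0.81\). Second, the predicted probability is exactly 0.5 where the log odds equal zero, at \(11.663 / 0.2162 \approx 53.9\) °F. This is consistent with the 55.1% predicted at 53 °F. The R sketch below uses only the two coefficients above; the names `b0` and `b1` are ours.

```r
b0 <- 11.663   # intercept: log odds of damage at temp = 0
b1 <- -0.2162  # change in log odds per degree F

# Multiplicative change in the odds of damage for a 1-degree increase
odds_ratio_per_degree <- exp(b1)

# Temperature where b0 + b1 * temp = 0, i.e. predicted probability = 0.5
temp_at_half <- -b0 / b1

round(c(odds_ratio = odds_ratio_per_degree, temp_at_half = temp_at_half), 3)
```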

8.18 Challenger disaster, Part II.

  1. To calculate the probability of O-ring damage at a given temperature, we write the fitted logistic model as an R function. The model predicts a probability of damage of 65.4% at 51 °F, 55.1% at 53 °F, and 44.3% at 55 °F.

```r
# Fitted probability of O-ring damage at temperature t (degrees F), from 8.16
model_prob <- function(t) {
  exp(11.663 - 0.2162 * t) / (1 + exp(11.663 - 0.2162 * t))
}

(model_prob(51))
## [1] 0.6540297
(model_prob(53))
## [1] 0.5509228
(model_prob(55))
## [1] 0.4432456
```
  1. First, we reproduce the empirical data in a plot, then we add the fitted line of the logistic regression model.
```r
library(knitr)
library(tidyverse)
library(kableExtra)

# Number of damaged O-rings (out of 6) for each of the 23 missions in the data
raw_data <- data.frame(mission = 1:23,
                       temp = c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                                70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81),
                       damage = c(5, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0,
                                  1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0))

# Observed fraction of damaged O-rings and the model's fitted probability
raw_data <- raw_data %>% mutate(freq = damage / 6, fitted = model_prob(temp))
```

Display the raw data from actual missions first.

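```r
knitr::kable(raw_data, digits = 4) %>% kable_styling(bootstrap_options = c("striped", "hover"))
```

| mission | temp | damage | freq | fitted |
|---:|---:|---:|---:|---:|
| 1 | 53 | 5 | 0.8333 | 0.5509 |
| 2 | 57 | 1 | 0.1667 | 0.3406 |
| 3 | 58 | 1 | 0.1667 | 0.2939 |
| 4 | 63 | 1 | 0.1667 | 0.1237 |
| 5 | 66 | 0 | 0.0000 | 0.0687 |
| 6 | 67 | 0 | 0.0000 | 0.0561 |
| 7 | 67 | 0 | 0.0000 | 0.0561 |
| 8 | 67 | 0 | 0.0000 | 0.0561 |
| 9 | 68 | 0 | 0.0000 | 0.0457 |
| 10 | 69 | 0 | 0.0000 | 0.0372 |
| 11 | 70 | 1 | 0.1667 | 0.0301 |
| 12 | 70 | 0 | 0.0000 | 0.0301 |
| 13 | 70 | 1 | 0.1667 | 0.0301 |
| 14 | 70 | 0 | 0.0000 | 0.0301 |
| 15 | 72 | 0 | 0.0000 | 0.0198 |
| 16 | 73 | 0 | 0.0000 | 0.0160 |
| 17 | 75 | 0 | 0.0000 | 0.0104 |
| 18 | 75 | 1 | 0.1667 | 0.0104 |
| 19 | 76 | 0 | 0.0000 | 0.0084 |
| 20 | 76 | 0 | 0.0000 | 0.0084 |
| 21 | 78 | 0 | 0.0000 | 0.0055 |
| 22 | 79 | 0 | 0.0000 | 0.0044 |
| 23 | 81 | 0 | 0.0000 | 0.0029 |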
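One way to read the table: `fitted` is a probability per O-ring, so 6 × `fitted` is the expected number of damaged O-rings on a mission. The base-R sketch below compares these expected counts with the observed `damage` column. It re-enters the data so that it runs on its own and uses `plogis()` for the same model. For a maximum-likelihood logistic fit with an intercept, the expected and observed totals agree exactly. With the rounded coefficients used here, and assuming they are the maximum-likelihood estimates for these data, the totals should agree only approximately.

```r
# Data re-entered from 8.18(b): damaged O-rings (out of 6) per mission
temp <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70, 70,
          72, 73, 75, 75, 76, 76, 78, 79, 81)
damage <- c(5, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
            0, 0, 0, 1, 0, 0, 0, 0, 0)

# Fitted probability of damage per O-ring (coefficients from 8.16)
p_fit <- plogis(11.663 - 0.2162 * temp)

# Expected number of damaged O-rings per mission under the model
expected <- 6 * p_fit

comparison <- data.frame(temp, observed = damage, expected = round(expected, 2))
print(head(comparison, 4))

# Totals over all 23 missions
print(c(observed = sum(damage), expected = round(sum(expected), 2)))
```

At 53 °F the model expects about 3.3 damaged O-rings, against 5 observed.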

To draw a smooth fitted curve, evaluate the model on a grid of temperatures in a separate data frame. The grid includes 51 °F and 55 °F, where no missions occurred. This keeps the plotted points limited to the observed damage frequencies.

```r
# Fitted probability on a 1-degree grid from 51 to 81 degrees F
curve_data <- tibble(temp = seq(51, 81, by = 1)) %>%
  mutate(fitted = model_prob(temp))

# Observed damage frequencies as points, fitted model as a red line
ggplot(raw_data, aes(x = temp, y = freq)) +
  geom_point() +
  geom_line(data = curve_data, aes(x = temp, y = fitted), color = "red") +
  ggtitle("Challenger O-ring probability of damage with logistic fitted model in red")
```

  1. The textbook states two conditions as key to the validity of logistic regression: each predictor must be linearly related to the logit of the outcome, \(\log(p/(1-p))\), and the observations must be independent. The independence of each mission seems reasonable. However, the model treats the six O-rings on a mission as separate observations, and O-rings on the same flight share the same launch conditions. The linear relationship with the logit may break down as temperatures approach freezing, well below the coldest launch in the data (53 °F). At some point, O-ring damage is no longer in doubt. Within the observed range of 53 to 81 °F, the logistic model seems plausible. At the lower temperature of the Challenger disaster, the probability of damage seems very high.
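As a consistency check on the coefficients, the model can be refit with base R's `glm()`. The sketch uses a binomial response of damaged versus undamaged O-rings out of 6 per mission. This specification treats every O-ring as an independent trial, so it relies on the independence condition. We have not seen the textbook's fitting code. If it was fit to these same counts, the estimates should come out close to 11.663 and -0.2162; treat that as something to check, not a given.

```r
# Data as entered in 8.18(b): damaged O-rings (out of 6) per mission
temp <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70, 70,
          72, 73, 75, 75, 76, 76, 78, 79, 81)
damage <- c(5, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
            0, 0, 0, 1, 0, 0, 0, 0, 0)

# Binomial logistic regression: cbind(successes, failures) ~ predictor.
# Each of the 6 O-rings per mission is treated as an independent trial.
fit <- glm(cbind(damage, 6 - damage) ~ temp, family = binomial)

print(round(coef(fit), 4))
print(summary(fit)$coefficients)
```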