The equation of the regression line is: \[ \widehat{weight} = 120.07 - 1.93 \cdot \text{parity} \]
The slope is \(\beta_\text{parity} = -1.93\), which means that baby weights observed when \(\text{parity} = 1\) (non-first born) are, on average, 1.93 oz. less than when \(\text{parity} = 0\) (first born).
First born, \(\text{parity} = 0\): \[\widehat{weight} = 120.07\]
Non-first born, \(\text{parity} = 1\): \[\widehat{weight} = 120.07 - 1.93 = 118.14\]
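As a quick check, these fitted values can be computed directly in R from the coefficients above (the object names are illustrative):

b0 <- 120.07  # intercept: predicted weight of first borns (parity = 0)
b1 <- -1.93   # slope for parity
b0 + b1 * c(0, 1)  # predicted weights for parity = 0 and parity = 1
## [1] 120.07 118.14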
Null hypothesis \(H_0\): \(\beta_\text{parity} = 0\)
Alternative hypothesis \(H_A\): \(\beta_\text{parity} \neq 0\)
From the table, the estimate for \(\beta_\text{parity}\) has a p-value of 10.5%, which is greater than the significance level of \(\alpha = 0.05\). We therefore fail to reject \(H_0\) and conclude that the relationship between birth weight and parity is not statistically significant.
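If the underlying data were available, the same test could be read directly off a fitted model. A minimal sketch, assuming a data frame babies with columns bwt (weight in oz.) and parity (0 = first born, 1 = non-first born):

# assumes a data frame `babies` with columns bwt and parity is loaded
fit <- lm(bwt ~ parity, data = babies)
summary(fit)$coefficients["parity", ]  # estimate, SE, t value, p-value (~0.105)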
The equation of the regression line is: \[ \widehat{\text{days_absent}} = 18.93 - 9.11 \cdot \text{eth} + 3.10 \cdot \text{sex} + 2.15 \cdot \text{lrn} \]
The interpretation of each slope is:
\(\beta_\text{eth} = -9.11\): holding all else constant, \(\text{eth} = 1\) (not aboriginal) is associated with 9.11 fewer days absent, on average, than \(\text{eth} = 0\) (aboriginal).
\(\beta_\text{sex} = 3.10\): holding all else constant, \(\text{sex} = 1\) (male) is associated with 3.10 more days absent, on average, than \(\text{sex} = 0\) (female).
\(\beta_\text{lrn} = 2.15\): holding all else constant, \(\text{lrn} = 1\) (slow learner) is associated with 2.15 more days absent, on average, than \(\text{lrn} = 0\) (average learner).
Observation: \(y_1 = 2\)
Prediction: \(\hat{y}_1 = 18.93 - 9.11 \cdot 0 + 3.10 \cdot 1 + 2.15 \cdot 1 = 24.18\)
Residual: \(e_1 = y_1 - \hat{y}_1 = 2 - 24.18 = -22.18\)
The model over-predicts the days absent for the first observation.
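The same calculation can be done in R from the fitted coefficients (the object names are illustrative):

coefs <- c(18.93, -9.11, 3.10, 2.15)  # intercept, eth, sex, lrn
x1 <- c(1, 0, 1, 1)  # first observation: eth = 0, sex = 1, lrn = 1
y1 <- 2              # observed days absent
(y1_hat <- sum(coefs * x1))  # prediction
## [1] 24.18
y1 - y1_hat                  # residual
## [1] -22.18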
\(n = 146\)
\(k = 3\)
\(n - 1 = 145\)
\(n - k - 1 = 142\)
\(Var(e_i) = 240.57\)
\(Var(y_i) = 264.17\)
\[R^2 = 1 - \frac{Var(e_i)}{Var(y_i)} = 1 - 240.57 / 264.17 = 0.0893\] \[R^2_{adj} = 1 - \frac{Var(e_i)}{Var(y_i)} \cdot \frac{n-1}{n-k-1} = 0.0701\]
1 - 240.57 / 264.17  # R-squared
## [1] 0.08933641
1 - 240.57 / 264.17 * 145 / 142  # adjusted R-squared
## [1] 0.07009704

The variable \(\text{lrn}\) (learner status) should be removed. This improves the model \(R^2_{adj}\) from 0.0701 to 0.0723.
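If the raw data were at hand, this comparison could be reproduced directly. A sketch, assuming these are the quine absenteeism data from the MASS package, with Eth, Sex, and Lrn coded as in the table above:

library(MASS)  # assumes the quine data set matches the model in this exercise
full <- lm(Days ~ Eth + Sex + Lrn, data = quine)
no_lrn <- lm(Days ~ Eth + Sex, data = quine)
summary(full)$adj.r.squared    # ~0.0701
summary(no_lrn)$adj.r.squared  # ~0.0723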
The data suggest that lower temperatures are associated with a greater frequency of damaged O-rings. For instance, when the temperature was 53 degrees F, 5 out of 6 O-rings were damaged in mission 1.
Key components of the table are the coefficient estimates for the intercept (11.6630) and the temperature slope (-0.2162), along with their standard errors and p-values.
The equation for the logistic model is:
\[\text{logit}(p_i) = \log \left(\frac{p_i}{1-p_i}\right) = 11.6630 - 0.2162 \cdot \text{temp} \] This is equivalent to:
\[p_i = \frac{e^{11.6630 - 0.2162 \cdot \text{temp}}}{1 + e^{11.6630 - 0.2162 \cdot \text{temp}}}\]
Yes, the concerns are justified, assuming that the conditions for logistic regression are satisfied. The p-value for the temperature coefficient is approximately 0, which indicates that the relationship between \(\text{logit}(p_i)\) and temperature is statistically significant. Furthermore, the negative sign of the temperature coefficient indicates that lower temperatures are associated with higher probabilities of damage.
For instance, for temperatures in the mid-50s and below, the predicted probability of O-ring damage is substantial (\(p_i > 44\%\)):
(x <- 11.663 - 0.2162 * 80)  # logit at 80 degrees F
## [1] -5.633
exp(x) / (1 + exp(x))  # probability of damage at 80 degrees F
## [1] 0.003565071
(x <- 11.663 - 0.2162 * 55)  # logit at 55 degrees F
## [1] -0.228
exp(x) / (1 + exp(x))  # probability of damage at 55 degrees F
## [1] 0.4432456

The equation for the logistic model is:
\[\log \left(\frac{\hat{p}_i}{1-\hat{p}_i}\right) = 11.6630 - 0.2162 \cdot \text{temp}\] which is equivalent to:
\[\hat{p}_i = \frac{e^{11.6630 - 0.2162 \cdot \text{temp}}}{1 + e^{11.6630 - 0.2162 \cdot \text{temp}}}\]
# probability of O-ring damage as a function of temperature (degrees F)
prob <- function(t) {
  p <- exp(11.6630 - 0.2162 * t) / (1 + exp(11.6630 - 0.2162 * t))
  return(p)
}
round(prob(seq(51, 71, 2)), 3)
## [1] 0.654 0.551 0.443 0.341 0.251 0.179 0.124 0.084 0.056 0.037 0.024

See graph below.
library(ggplot2)
xrange <- seq(51, 71, 2)
df <- data.frame(
  temp = xrange,
  prob = prob(xrange)
)
ggplot(df, aes(x = temp, y = prob)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  labs(x = "Temperature (degrees F)", y = "Probability of damage",
       title = "Probability of O-ring damage vs. temperature")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
One concern is the limited size of the dataset: it includes only 23 observations (shuttle missions), with 138 O-ring outcomes in total (6 O-rings per mission). Given the limited size of the dataset, one must be careful in drawing conclusions and taking action on that basis.
In terms of the conditions for logistic regression, it is assumed here that both are satisfied:
Each outcome \(Y_i\) is independent of the other outcomes.
Each predictor is linearly related to \(\text{logit}(p_i)\); here, that the relationship between temperature and \(\text{logit}(p_i)\) is linear.
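A rough empirical check of the linearity condition, assuming a data frame orings with per-mission columns temp and damaged (out of 6 O-rings; the names are illustrative):

# empirical logits per mission, with a 0.5 adjustment to avoid log(0)
p_hat <- (orings$damaged + 0.5) / (6 + 1)
plot(orings$temp, log(p_hat / (1 - p_hat)),
     xlab = "Temperature (degrees F)", ylab = "Empirical logit")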