Avg. Birth Weight = -1.93 × Parity + 120.07
If the baby is first born (parity = 1 under the coding used here), this line predicts a birth weight of -1.93(1) + 120.07 = 118.14.
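A quick check of that arithmetic in R, using the coefficients read off the fitted line above (parity = 1 follows the coding used in the sentence above):
b0 <- 120.07            # intercept of the fitted line
b1 <- -1.93             # slope on parity
b0 + b1 * 1             # predicted birth weight for parity = 1
## [1] 118.14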
Days Absent = -9.11 × eth + 3.10 × sex + 2.15 × lrn + 18.93
Here each slope is the coefficient on its indicator variable: eth (ethnic background), sex (gender), and lrn (learner classification).
The residual is calculated as Actual - Predicted, where the prediction includes the intercept.
Actual = 2
Predicted = 18.93 + (-9.11 × 0) + (3.10 × 1) + (2.15 × 1) = 24.18
Residual = 2 - 24.18 = -22.18
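The same residual calculation can be checked in R with a short sketch using the coefficients above:
b <- c(intercept = 18.93, eth = -9.11, sex = 3.10, lrn = 2.15)
x <- c(1, 0, 1, 1)       # intercept term, eth = 0, sex = 1, lrn = 1
predicted <- sum(b * x)
actual <- 2
actual - predicted       # residual
## [1] -22.18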
R-Squared
var_e <- 240.57              # variance of the residuals
var_y <- 264.17              # variance of the outcome (days absent)
R2 <- 1 - (var_e / var_y)    # R-squared: share of variance explained by the model
R2
## [1] 0.08933641
R-Squared (Adjusted):
var_e <- 240.57   # variance of the residuals
var_y <- 264.17   # variance of the outcome (days absent)
n <- 146          # number of observations
k <- 3            # number of predictors
R2_adj <- 1 - (var_e / var_y) * ((n - 1) / (n - k - 1))
R2_adj
## [1] 0.07009704
In backward elimination with adjusted R-squared, we drop the variable whose removal leaves the highest adjusted R-squared. The “No Ethnicity” model has the lowest adjusted R-squared, which means ethnicity is the predictor the model can least afford to lose and should be kept; the first variable to remove is the one whose exclusion leaves adjusted R-squared highest.
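A sketch of how that comparison could be made, assuming the model was fit to the quine absenteeism data from the MASS package with 0/1 codings matching the equation above (the data source and codings are assumptions):
library(MASS)                                  # provides the quine data (assumed source)
d <- quine
d$eth <- ifelse(d$Eth == "N", 1, 0)            # assumed coding: 1 = not Aboriginal
d$sex <- ifelse(d$Sex == "M", 1, 0)            # assumed coding: 1 = male
d$lrn <- ifelse(d$Lrn == "SL", 1, 0)           # assumed coding: 1 = slow learner
models <- list(full   = lm(Days ~ eth + sex + lrn, data = d),
               no_eth = lm(Days ~ sex + lrn, data = d),
               no_sex = lm(Days ~ eth + lrn, data = d),
               no_lrn = lm(Days ~ eth + sex, data = d))
sapply(models, function(m) summary(m)$adj.r.squared)   # drop the variable whose removal gives the highest value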
At first glance, none of the O-rings at the highest temperatures (greater than 75 degrees) are damaged. Below 57 degrees, the majority of O-rings are damaged. In between, zero or only a minority of the O-rings are damaged.
The “Estimate” column lists the coefficients of the predictor variables, as well as the intercept (a constant added to the final equation rather than multiplied by a variable).
The “Std. Error” column is the standard error of the corresponding point estimate (the standard deviation of its sampling distribution).
The z value is the number of standard errors the point estimate lies from 0, the null value.
The Pr(>|z|) column is the p-value: the probability of observing a point estimate at least this far from 0 purely by chance if the true coefficient were 0. The higher the p-value, the weaker the evidence that the variable is associated with the outcome.
Logistic model formula: logit(p) = 11.6630 - 0.2162 × Temperature
Based on the model, temperature does look strongly associated with O-ring failure: its p-value is essentially 0.
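For reference, a minimal sketch of how a model and summary table like this could be produced, assuming a data frame orings with one row per launch, a temperature column, and a damaged column counting damaged O-rings out of 6 (the object and column names are assumptions):
fit <- glm(cbind(damaged, 6 - damaged) ~ temperature,
           data = orings, family = binomial)   # binomial logistic regression on damage counts
summary(fit)                                   # Estimate, Std. Error, z value, Pr(>|z|) columns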
q_temp <- c(51, 53, 55)                       # temperatures of interest (degrees F)
q_exps <- 11.6630 - 0.2162 * q_temp           # linear predictor: logit of the estimated probability
q_phats <- exp(q_exps) / (1 + exp(q_exps))    # invert the logit to get probabilities
q_phats
## [1] 0.6540297 0.5509228 0.4432456
temps <- seq(51, 71, 2)                       # temperatures from 51 to 71 in steps of 2
p_hats <- c(q_phats, 0.341, 0.251, 0.179, 0.124, 0.084, 0.056, 0.037, 0.024)
plot(x = temps, y = p_hats)                   # model-estimated probability of damage vs temperature
lines(x = temps, y = p_hats)
I don’t have concerns about using logistic regression itself, but there are very few data points. We have only one observation below 57 degrees and just a few at the higher temperatures. I would like to see more data at those temperatures.
Key factors in validating logistic regression models are:
Verify that each predictor variable is linearly related to the corresponding logit(p) values. This can’t be checked reliably with so few data points, so we will have to assume it holds; a rough binned check is sketched after this list.
Verify that each observation is independent. Given our limited information, we will have to assume this is true as well.
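One rough way to eyeball the linearity condition, assuming a data frame shuttle with a temperature column and a 0/1 damaged indicator (the names are hypothetical), is to bin the temperatures and plot the empirical logits against temperature:
bins <- cut(shuttle$temperature, breaks = 4)                 # group launches into temperature bins
p <- tapply(shuttle$damaged, bins, mean)                     # proportion damaged per bin
temp <- tapply(shuttle$temperature, bins, mean)              # mean temperature per bin
emp_logit <- log((p + 0.025) / (1 - p + 0.025))              # small offset keeps proportions of 0 and 1 finite
plot(temp, emp_logit, xlab = "Mean temperature", ylab = "Empirical logit")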