Multiple/Logistic Regression

8.2

  1. Equation of the Regression Line:

Avg. Birth Weight = -1.93 x Parity + 120.07

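  2. The slope is the coefficient of the “parity” variable, an indicator that is 0 if the baby is first born and 1 otherwise. Babies who are not first born are predicted to weigh 1.93 ounces less, on average, than first-born babies.

For a first-born baby (parity = 0), the line predicts a birth weight of 120.07. For a baby who is not first born (parity = 1), it predicts -1.93 + 120.07 = 118.14.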

  3. No. The p-value for the parity coefficient is above 10%, so at the usual 5% significance level we cannot reject the hypothesis that the true slope is zero. A difference this size could plausibly arise by chance alone.
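As a quick check of the arithmetic, the fitted line can be evaluated at both values of the parity indicator. This is a minimal sketch using only the coefficients above; the helper name `predict_weight` is made up for illustration.

```r
# Coefficients taken from the regression line above
intercept <- 120.07
slope_parity <- -1.93

# Hypothetical helper: predicted average birth weight for a parity value
predict_weight <- function(parity) intercept + slope_parity * parity

# Evaluate the line at parity = 0 and parity = 1
predictions <- predict_weight(c(0, 1))
predictions
```

The two predictions differ by exactly the slope, which is what the coefficient of an indicator variable measures.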

8.4

  1. Equation of the regression line:

Days Absent = -9.11 x eth + 3.10 x sex + 2.15 x lrn + 18.93

  2. Each slope is the estimated difference in average days absent between the two levels of the corresponding indicator variable (ethnic background, gender, or learner classification), holding the other two variables constant.

  3. The residual is calculated as Actual - Predicted.

Actual = 2

Predicted = 18.93 + 0 x -9.11 + 1 x 3.10 + 1 x 2.15 = 24.18

Residual = 2 - 24.18 = -22.18
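The residual can also be computed in R by evaluating the full regression equation, intercept included, at the indicator values used in this example (eth = 0, sex = 1, lrn = 1). This is only a sketch; the variable names are chosen here for illustration.

```r
# Coefficients from the regression equation above
b0 <- 18.93
b_eth <- -9.11
b_sex <- 3.10
b_lrn <- 2.15

# Indicator values from the worked example: eth = 0, sex = 1, lrn = 1
predicted <- b0 + b_eth * 0 + b_sex * 1 + b_lrn * 1

# Residual = actual - predicted, with 2 days actually absent
actual <- 2
residual <- actual - predicted

c(predicted = predicted, residual = residual)
```

A negative residual means the model overestimates the number of days absent for this student.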

R-Squared

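# Variance of the residuals and variance of the outcome (days absent)
var_e <- 240.57
var_y <- 264.17

# R-squared: share of the outcome's variance explained by the model
R2 <- 1 - (var_e / var_y)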

R2
## [1] 0.08933641

R-Squared (Adjusted):

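var_e <- 240.57
var_y <- 264.17

# n = number of observations, k = number of predictors
n <- 146
k <- 3

# Adjusted R-squared penalizes R-squared for the number of predictors
R2_adj <- 1 - (var_e / var_y) * ((n - 1)/(n - k - 1))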

R2_adj
## [1] 0.07009704

8.8

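The first variable we should remove is learner status, not ethnicity. In backward elimination we drop the variable whose removal produces the highest adjusted R-squared, as long as it beats the full model's 0.0701. The “No Ethnicity” model has the lowest adjusted R-squared, which means ethnicity is the predictor we most want to keep.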

8.16

  1. At first glance, no o-rings were damaged at the highest temperatures (above 75 degrees). Below 57 degrees, the majority of o-rings were damaged. In between, either none or a minority of o-rings were damaged.

  2. The “Estimate” column lists the coefficient of each predictor variable, as well as the intercept (the constant term in the model, which has no corresponding variable).

The “Std. Error” column is the standard error of the corresponding point estimate, i.e. the standard deviation of its sampling distribution.

The z-value is the estimate divided by its standard error: the number of standard errors the point estimate lies from 0, the value under the null hypothesis.

The Pr(>|z|) column is the probability of observing a z-value at least this extreme if the true coefficient were 0. The higher the p-value, the weaker the evidence that the variable is associated with the outcome.

  3. Logistic model formula: logit(p) = -0.2162 x Temperature + 11.6630, where p is the probability that an o-ring is damaged.

  4. Based on the model, temperature does appear to be strongly associated with failing o-rings: the coefficient is negative and its p-value is essentially 0.
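One way to read the temperature coefficient is on the odds scale: exponentiating the slope gives the factor by which the estimated odds of o-ring damage change for each one-degree increase. A minimal sketch using only the coefficient above (the variable names are illustrative):

```r
# Temperature coefficient from the logistic model above
b_temp <- -0.2162

# Multiplicative change in the odds of damage per 1-degree increase
odds_ratio <- exp(b_temp)
round(odds_ratio, 3)
## [1] 0.806
```

So each additional degree lowers the estimated odds of damage by roughly 19%.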

8.18

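# Temperatures to predict at
q_temp <- c(51,53,55)

# Linear predictor (log-odds) from the fitted model
q_exps <- 11.6630 - 0.2162*q_temp

# Inverse logit converts log-odds to probabilities
q_phats <- exp(q_exps) / (1 + exp(q_exps))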

q_phats
## [1] 0.6540297 0.5509228 0.4432456
temps <- seq(51,71,2)

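# Model probabilities for 51-55 degrees, then hard-coded values for 57-71 degrees
p_hats <- c(q_phats,0.341,0.251,0.179,0.124, 0.084,0.056,0.037,0.024)

# Points joined by line segments, with labeled axes
plot(x = temps, y = p_hats, type = "o",
     xlab = "Temperature", ylab = "Predicted probability of damage")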
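The hard-coded probabilities can be cross-checked against the fitted model. The sketch below assumes those values came from the same model as in 8.16; `plogis()` is base R's inverse logit, and the names `model_p` and `given_p` are chosen here for illustration.

```r
# Coefficients from the logistic model in 8.16
b0 <- 11.6630
b_temp <- -0.2162

temps <- seq(51, 71, 2)

# plogis(x) is the inverse logit: exp(x) / (1 + exp(x))
model_p <- plogis(b0 + b_temp * temps)

# Hard-coded probabilities for 57 through 71 degrees
given_p <- c(0.341, 0.251, 0.179, 0.124, 0.084, 0.056, 0.037, 0.024)

# Largest gap between the model and the hard-coded values
max_gap <- max(abs(model_p[4:11] - given_p))
max_gap
```

If the gap is no larger than rounding error at three decimals, the whole curve could be generated from the model directly instead of mixing computed and typed-in values.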

I don’t have concerns about using logistic regression itself, but there are very few data points. We have only one observation of a flight below 57 degrees, and just a few at the higher temperatures. I would like to see more data at those temperatures.

Key factors in validating logistic regression models are:

  1. Verify that each predictor variable is linearly related to logit(p). This is hard to check without a lot of data, and we only have a few data points, so we will have to assume it holds.

  2. Verify that each observation is independent of the others. Given our limited information, we will have to assume this as well.