8.2

Write the equation of the regression line.

\(weight = 120.07 - 1.93(paritynot\_first\_born)\)

Interpret the slope in this context, and calculate the predicted birth weight of first borns and others.

If a baby is not the first born, the predicted average birth weight is 1.93 ounces lower than that of a first-born baby.

\(Firstborn = 120.07 - 1.93(0) = 120.07\)

\(Not\_Firstborn = 120.07 - 1.93(1) = 118.14\)
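As a quick check on the arithmetic, the two predictions can be reproduced in R. This is a minimal sketch; the names `b0`, `b1`, and `predict_weight` are illustrative and not part of the original exercise.

```r
# Coefficients taken from the fitted regression line above
b0 <- 120.07   # intercept: predicted weight for first borns
b1 <- -1.93    # slope for the not-first-born indicator

# Hypothetical helper: not_first_born is 0 (first born) or 1 (otherwise)
predict_weight <- function(not_first_born) {
  b0 + b1 * not_first_born
}

first_born <- predict_weight(0)
not_first_born <- predict_weight(1)
print(c(first_born = first_born, not_first_born = not_first_born))
```

Because the predictor is a 0/1 indicator, the slope is simply the difference between the two group predictions.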

Is there a statistically significant relationship between the average birth weight and parity?

No. The p-value is not small enough to reject the null hypothesis, so the data do not provide convincing evidence that birth weight is associated with whether the baby is first born.

8.4

Write the equation of the regression line.

\(days\_absent = 18.93 - 9.11(eth) + 3.10(sex) + 2.15(lrn)\)

Interpret each one of the slopes in this context.

eth: If the child is not aboriginal and everything else stays the same, the predicted average number of days absent is 9.11 days lower.

sex: If the child is male and everything else stays the same, the predicted average number of days absent is 3.10 days higher.

lrn: If the child is a slow learner and everything else stays the same, the predicted average number of days absent is 2.15 days higher.
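The "everything else stays the same" reading can be illustrated by changing one indicator at a time and comparing predictions. This is a minimal sketch; `coefs` and `predict_days` are hypothetical names, and the 0/1 coding (eth: 1 = not aboriginal, sex: 1 = male, lrn: 1 = slow learner) is assumed from the interpretations above.

```r
# Coefficients from the regression line above
coefs <- c(intercept = 18.93, eth = -9.11, sex = 3.10, lrn = 2.15)

# Hypothetical helper: each argument is a 0/1 indicator
predict_days <- function(eth, sex, lrn) {
  unname(coefs["intercept"] + coefs["eth"] * eth +
           coefs["sex"] * sex + coefs["lrn"] * lrn)
}

# Flip one indicator from 0 to 1 while holding the other two at 0
diff_eth <- predict_days(1, 0, 0) - predict_days(0, 0, 0)
diff_sex <- predict_days(0, 1, 0) - predict_days(0, 0, 0)
diff_lrn <- predict_days(0, 0, 1) - predict_days(0, 0, 0)
print(c(eth = diff_eth, sex = diff_sex, lrn = diff_lrn))
```

Each difference equals the corresponding slope, whatever values the other two indicators are held at.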

Calculate the residual for the first observation in the data set: a student who is aboriginal, male, a slow learner, and missed 2 days of school.

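The residual is the observed value minus the predicted value, so it is -22.18 days.

eth <- 0  # aboriginal
sex <- 1  # male
lrn <- 1  # slow learner

# Predicted days absent for this student
x <- 18.93 - 9.11*eth + 3.10*sex + 2.15*lrn
# Residual = observed - predicted
2 - x
## [1] -22.18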

The variance of the residuals is 240.57, and the variance of the number of absent days for all students in the data set is 264.17. Calculate the \(R^2\) and the adjusted \(R^2\). Note that there are 146 observations in the data set.

\(R^2 = 1 - \frac{240.57}{264.17}\)

\(R^2 = 0.0893\)

\(R^2_{adj} = 1 - \frac{240.57}{264.17} \times \frac{146 -1}{146-3-1}\)

\(R^2_{adj} = 0.0701\)
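The same calculation can be sketched in R using the standard definitions: \(R^2\) is one minus the ratio of residual variance to total variance, and the adjusted version scales that ratio by \(\frac{n-1}{n-k-1}\). The variable names below are illustrative only.

```r
var_resid <- 240.57  # variance of the residuals (given)
var_total <- 264.17  # variance of days absent (given)
n <- 146             # number of observations
k <- 3               # number of predictors: eth, sex, lrn

r2 <- 1 - var_resid / var_total
# The adjustment penalizes the variance ratio for the number of predictors
r2_adj <- 1 - (var_resid / var_total) * (n - 1) / (n - k - 1)

print(round(c(r2 = r2, r2_adj = r2_adj), 3))
```

The adjusted value is always a little smaller than the unadjusted one because the penalty factor is greater than 1.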

8.8

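Ethnicity should not be the variable that is eliminated. The model without ethnicity has the smallest adjusted \(R^2\), which means that dropping ethnicity hurts the fit the most. Backward elimination removes the variable whose omission gives the highest adjusted \(R^2\), provided that value is higher than the adjusted \(R^2\) of the full model.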

8.16

Each column of the table above represents a different shuttle mission. Examine these data and describe what you observe with respect to the relationship between temperatures and damaged O-rings.

From looking at the data, damaged O-rings appear to be more likely when the temperature is below 63 degrees.

Failures have been coded as 1 for a damaged O-ring and 0 for an undamaged O-ring, and a logistic regression model was fit to these data. A summary of this model is given below. Describe the key components of this summary table in words.

The key components of the summary table are the point estimates for the intercept and for temperature. The temperature estimate is negative, so the higher the temperature, the lower the chance of a damaged O-ring. The p-values show that both the intercept and the temperature coefficient are statistically significant.
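One way to read the temperature estimate is as an odds ratio. The sketch below assumes the temperature estimate of -0.2162 from the summary table; the variable names are illustrative.

```r
b_temp <- -0.2162  # point estimate for temperature from the summary table

# Exponentiating a logistic coefficient gives the multiplicative change
# in the odds of damage for a one-degree increase in temperature
odds_ratio <- exp(b_temp)
pct_change <- (odds_ratio - 1) * 100

print(round(odds_ratio, 3))
print(round(pct_change, 1))
```

An odds ratio below 1 means each additional degree lowers the odds of a damaged O-ring.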

Write out the logistic model using the point estimates of the model parameters.

\(log_e(\frac{p_i}{1-p_i}) = 11.6630 - 0.2162(temp)\)

Based on the model, do you think concerns regarding O-rings are justified? Explain.

I believe the concerns are justified. Below 63 degrees the probability of having a damaged O-ring is clearly greater. The summary does not include a goodness-of-fit measure, so I'm not sure how well the model fits, but the temperature effect alone seems to be enough to justify the concern.
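To put numbers on the concern, the fitted model can be used to find the temperature at which the estimated probability of damage reaches 50%, and the estimated probability at 63 degrees. This is a sketch under the point estimates above; `plogis` is base R's inverse logit, and the variable names are illustrative.

```r
b0 <- 11.6630  # intercept estimate
b1 <- -0.2162  # temperature estimate

# The log odds are 0 (probability 0.5) where b0 + b1 * temp = 0
temp_half <- -b0 / b1

# Estimated probability of damage at 63 degrees
p_63 <- plogis(b0 + b1 * 63)

print(round(temp_half, 2))
print(round(p_63, 3))
```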

8.18

Use the model to calculate the probability that an O-ring will become damaged at each of the following ambient temperatures: 51, 53, and 55 degrees Fahrenheit.

temp <- 51
temp2 <- 53
temp3 <- 55

x <- 11.663 - 0.2162*temp
x2 <- 11.663 - 0.2162*temp2
x3 <- 11.663 - 0.2162*temp3

p1 <- exp(x)/(1+exp(x))
p2 <- exp(x2)/(1+exp(x2))
p3 <- exp(x3)/(1+exp(x3))

print(paste0("Probability of damaging an O-ring at 51 degrees is ", round(p1 * 100, 2), "%"))
## [1] "Probability of damaging an O-ring at 51 degrees is 65.4%"
print(paste0("Probability of damaging an O-ring at 53 degrees is ", round(p2 * 100, 2), "%"))
## [1] "Probability of damaging an O-ring at 53 degrees is 55.09%"
print(paste0("Probability of damaging an O-ring at 55 degrees is ", round(p3 * 100, 2), "%"))
## [1] "Probability of damaging an O-ring at 55 degrees is 44.32%"
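The same three probabilities can be computed in one vectorized step. This is an alternative sketch, not the method used above; it relies on base R's `plogis`, which evaluates \(\frac{e^x}{1+e^x}\).

```r
temps <- c(51, 53, 55)

# Log odds of damage from the fitted model
log_odds <- 11.663 - 0.2162 * temps

# plogis() is the inverse logit
probs <- plogis(log_odds)

print(round(probs * 100, 2))
```

Working with a vector avoids repeating the formula once per temperature and extends directly to the longer temperature grid used for the plot.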

Add the model-estimated probabilities from part (a) on the plot, then connect these dots using a smooth curve to represent the model-estimated probabilities.

# Temperatures and model-estimated probabilities of damage
# (named temps/probs to avoid masking base R's t())
temps <- c(51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71)
probs <- c(.654, .5509, .4432, .341, .251, .179, .124, .084, .056, .037, .024)

# Logistic curve implied by the fitted model
logi <- function(x) {
  exp(11.663 - 0.2162*x)/(1+exp(11.663 - 0.2162*x))
}

plot(temps, probs, xlab = "temp", ylab = "Probability of damage", xlim = c(50,71))
curve(logi, from = min(temps), to = max(temps), add=TRUE)

Describe any concerns you may have regarding applying logistic regression in this application, and note any assumptions that are required to accept the model’s validity.

We are assuming that each outcome is independent of the other outcomes, for example that new O-rings were used for each launch. The log odds of damage must be linearly related to temperature. Another concern is that the data set is small, so there is limited evidence that the model is the best fit; more observations would help.