8.2 Baby weights, Part II.

a. Write the equation of the regression line.

\[\hat{y} = 120.07 - 1.93\, x_{parity}\]

b. Interpret the slope in this context, and calculate the predicted birth weight of first-borns and others.

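b0 <- 120.07              # intercept
b1 <- -1.93               # slope for parity (0 = first born, 1 = not first born)
first <- b0 + b1 * 0      # predicted weight of first-born babies (oz)
nonfirst <- b0 + b1 * 1   # predicted weight of all other babies (oz)
first
## [1] 120.07
nonfirst
## [1] 118.14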
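The slope is -1.93: babies who are not first born (parity = 1) are predicted to weigh 1.93 ounces less, on average, than first-born babies (parity = 0).
The predicted birth weight is 120.07 oz for first-born babies and 118.14 oz for all other babies.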

c. Is there a statistically significant relationship between the average birth weight and parity?

The p-value for parity is larger than 0.05, so we fail to reject the null hypothesis that the true slope is 0.
There is no strong evidence of an association between average birth weight and parity.

8.4 Absenteeism, Part I.

a. Write the equation of the regression line.

\[\hat{y} = 18.93 - 9.11\, x_{eth} + 3.10\, x_{sex} + 2.15\, x_{lrn}\]

b. Interpret each one of the slopes in this context.

eth: Holding sex and learner status constant, non-aboriginal students (eth = 1) are predicted to miss 9.11 fewer days of school than aboriginal students (eth = 0).

sex: Holding ethnicity and learner status constant, male students (sex = 1) are predicted to miss 3.10 more days than female students (sex = 0).

lrn: Holding ethnicity and sex constant, slow learners (lrn = 1) are predicted to miss 2.15 more days than students who are not slow learners (lrn = 0).
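Because all three predictors are 0/1 indicators, each slope is simply the change in the prediction when that indicator switches from 0 to 1 with the others held fixed. The sketch below illustrates this by evaluating the equation from part (a) for all eight combinations; it assumes the coding used in part (c) (eth = 0 aboriginal, sex = 1 male, lrn = 1 slow learner), and the helper name `pred_abs` is hypothetical.

```r
# Hypothetical helper: predicted days absent from the fitted equation in part (a)
pred_abs <- function(eth, sex, lrn) {
  18.93 - 9.11 * eth + 3.10 * sex + 2.15 * lrn
}

# All eight combinations of the three 0/1 indicators
grid <- expand.grid(eth = 0:1, sex = 0:1, lrn = 0:1)
grid$pred <- with(grid, pred_abs(eth, sex, lrn))
print(grid)

# Switching one indicator from 0 to 1 while holding the others fixed
# changes the prediction by that variable's slope (up to rounding error)
eth_effect <- pred_abs(1, 1, 1) - pred_abs(0, 1, 1)
sex_effect <- pred_abs(0, 1, 0) - pred_abs(0, 0, 0)
lrn_effect <- pred_abs(0, 0, 1) - pred_abs(0, 0, 0)
c(eth = eth_effect, sex = sex_effect, lrn = lrn_effect)
```

The same differences appear no matter which fixed values are chosen for the other two indicators, which is what "holding the other variables constant" means in an additive model.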

c. Calculate the residual for the first observation in the data set: a student who is aboriginal, male, a slow learner, and missed 2 days of school.

eth <- 0
sex <- 1
lrn <- 1
actualDayMissed <- 2
absDaysPred <- 18.93 - 9.11 * eth + 3.1 * sex + 2.15 * lrn
absDaysPred
## [1] 24.18
residual <- actualDayMissed - absDaysPred
residual
## [1] -22.18

The residual is -22.18 days.

d. The variance of the residuals is 240.57, and the variance of the number of absent days for all students in the data set is 264.17. Calculate $R^2$ and the adjusted $R^2$. Note that there are 146 observations in the data set.

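n <- 146   # number of observations
k <- 3     # number of predictors (eth, sex, lrn)
R2 <- 1 - (240.57 / 264.17)
R2
## [1] 0.08933641
R2_adj <- 1 - (240.57 / 264.17) * (n - 1) / (n - k - 1)
R2_adj
## [1] 0.07009704

So $R^2 \approx 0.0893$ and the adjusted $R^2 \approx 0.0701$.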
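The two formulas can be wrapped in a small reusable function. This is a minimal sketch; the name `r2_stats` and its argument names are hypothetical, and it simply restates the computation above with n observations and k predictors.

```r
# Hypothetical helper: R^2 and adjusted R^2 from variances, as in part (d)
# var_resid: variance of the residuals
# var_y:     variance of the outcome
# n:         number of observations
# k:         number of predictors in the model
r2_stats <- function(var_resid, var_y, n, k) {
  r2 <- 1 - var_resid / var_y
  r2_adj <- 1 - (var_resid / var_y) * (n - 1) / (n - k - 1)
  c(R2 = r2, R2_adj = r2_adj)
}

stats <- r2_stats(var_resid = 240.57, var_y = 264.17, n = 146, k = 3)
print(stats)
```

The factor (n - 1) / (n - k - 1) is greater than 1 whenever k is at least 1, so the adjusted value is always below $R^2$; the penalty grows as more predictors are added relative to the sample size.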

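8.8 Absenteeism, Part II.

The learner status variable (lrn) should be removed first: the model without it ("No learner status") has the highest adjusted $R^2$ of the candidate models, and that value is higher than the full model's adjusted $R^2$ of about 0.0701 from 8.4(d).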

8.16 Challenger disaster, Part I.

a. Each column of the table above represents a different shuttle mission. Examine these data and describe what you observe with respect to the relationship between temperatures and damaged O-rings.

Missions launched at temperatures below 66 degrees tended to have more damaged O-rings; overall, O-ring damage appears more common at lower temperatures.

b. Failures have been coded as 1 for a damaged O-ring and 0 for an undamaged O-ring, and a logistic regression model was fit to these data. A summary of this model is given below. Describe the key components of this summary table in words.

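The summary table has one row for the intercept and one for temperature. For each, it reports the point estimate, its standard error, the z value (the estimate divided by its standard error), and the p-value for testing whether that coefficient equals 0. The temperature estimate is negative (-0.2162), so higher temperatures are associated with lower log-odds of O-ring damage, and its small p-value indicates that temperature is a significant predictor.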

c. Write out the logistic model using the point estimates of the model parameters.

\[\log_e(\frac{p_i}{1-p_i}) = 11.6630 - 0.2162 x_{temp}\]

d. Based on the model, do you think concerns regarding O-rings are justified? Explain.

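Yes. The temperature coefficient is negative and its p-value is very close to 0, which is strong evidence that the probability of O-ring damage increases as the temperature decreases. The concerns are therefore justified for launches at low temperatures.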
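One way to make the size of the temperature effect concrete: in a logistic model, exp(slope) is the factor by which the odds of damage are multiplied for each 1-degree increase in temperature. This is a minimal sketch using only the point estimate from part (c); it ignores the uncertainty in that estimate, and the variable names are illustrative.

```r
# Temperature coefficient from the logistic model in part (c)
b_temp <- -0.2162

# Multiplicative change in the odds of O-ring damage per 1-degree increase
odds_ratio_1deg <- exp(b_temp)
odds_ratio_1deg    # about 0.806

# Multiplicative change in the odds for a 10-degree increase
odds_ratio_10deg <- exp(10 * b_temp)
odds_ratio_10deg   # about 0.115
```

Read in the other direction, each 1-degree drop multiplies the odds of damage by about 1 / 0.806, or roughly 1.24, so colder launches compound the risk quickly.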

8.18 Challenger disaster, Part II.

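a. Use the model to calculate the probability that an O-ring will become damaged at 51, 53, and 55 degrees Fahrenheit.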

# Logistic model from 8.16(c); plogis() is the inverse logit, exp(x) / (1 + exp(x))
prob_damage <- function(temp) plogis(11.6630 - 0.2162 * temp)
p <- prob_damage(51)
p
## [1] 0.6540297
p2 <- prob_damage(53)
p2
## [1] 0.5509228
p3 <- prob_damage(55)
p3
## [1] 0.4432456

b. Add the model-estimated probabilities from part (a) on the plot, then connect these dots using a smooth curve to represent the model-estimated probabilities.

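library(ggplot2)
# Temperatures 51, 53, ..., 71 degrees F
temps <- seq(51, 71, by = 2)
# p, p2, p3 come from part (a); the remaining probabilities are given in the exercise
probs <- c(p, p2, p3, 0.341, 0.251, 0.179, 0.124, 0.084, 0.056, 0.037, 0.024)
df_prob <- data.frame(Temperature = temps, Prob_Damage = probs)
g1 <- ggplot(df_prob, aes(x = Temperature, y = Prob_Damage)) +
  geom_point() +
  # smooth curve of the fitted logistic model through the estimated points
  stat_function(fun = function(t) plogis(11.6630 - 0.2162 * t)) +
  labs(x = "Temperature (F)", y = "Probability of O-ring damage")
g1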
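As a consistency check, the sketch below recomputes the model probability at every temperature in the plot with base R's `plogis()` and compares the rounded values with the eleven probabilities listed in `probs`. The names `temps_all`, `p_model`, and `provided` are illustrative.

```r
# Model-estimated probability of O-ring damage at 51, 53, ..., 71 degrees F,
# using the point estimates from 8.16(c)
temps_all <- seq(51, 71, by = 2)
p_model <- plogis(11.6630 - 0.2162 * temps_all)

# Probabilities used for the plot: three from part (a), the rest from the exercise
provided <- c(0.654, 0.551, 0.443, 0.341, 0.251, 0.179,
              0.124, 0.084, 0.056, 0.037, 0.024)

round(p_model, 3)
all.equal(round(p_model, 3), provided)   # TRUE
```

Agreement to three decimals confirms that the supplied values and the part (a) calculations come from the same fitted curve.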

c. Describe any concerns you may have regarding applying logistic regression in this application, and note any assumptions that are required to accept the model’s validity.

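The data set is small, so the coefficient estimates are uncertain, and more data would be needed to trust the model's predictions. Predictions at temperatures outside the range of the observed missions are extrapolations. In addition, O-rings on the same mission share the same launch conditions, so treating them as independent outcomes may not be fully justified.

Two conditions are required to accept the logistic model. First, each outcome $Y_i$ must be independent of the other outcomes. Second, the predictor (temperature) must be linearly related to $\text{logit}(p_i)$, where

\[\text{logit}(p_i) = \log_e\left(\frac{p_i}{1-p_i}\right)\]

The data show a higher chance of damage at lower temperatures, which is consistent with the model, but with so few missions it is hard to verify that the relationship is linear on the logit scale.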
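To see what the linearity condition refers to, note that under the fitted model the logit of the damage probability changes by a constant amount per degree, while the probability itself does not. This is a minimal sketch using base R's `plogis()` (inverse logit) and `qlogis()` (logit); it only illustrates the model's own structure and cannot check the condition against the observed missions, which would require the mission data. Variable names are illustrative.

```r
# Model probabilities at 51, 53, ..., 71 degrees F from 8.16(c)
temps_all <- seq(51, 71, by = 2)
p_model <- plogis(11.6630 - 0.2162 * temps_all)

# qlogis() is the logit, log(p / (1 - p)); its consecutive differences are constant
logit_steps <- diff(qlogis(p_model))
logit_steps   # each step equals -0.2162 * 2 = -0.4324 (up to rounding error)

# Consecutive differences on the probability scale are not constant
prob_steps <- diff(p_model)
prob_steps
```

This is why the condition is stated in terms of $\text{logit}(p_i)$ rather than the probability of damage: a straight line on the logit scale becomes an S-shaped curve on the probability scale.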