DATA 606 Chapter 8 Assignment

Chapter 8 - Multiple and Logistic Regression

Graded: 8.2, 8.4, 8.8, 8.16, 8.18

8.2

  1. \[ \hat{y} = 120.07 - 1.93x \]
  2. The intercept, 120.07 ounces, is the predicted birth weight of a first-born baby. The slope means that a baby who is not first born is predicted to weigh 1.93 ounces less, on average, than a first-born baby.
  3. No. The p-value for the slope is greater than .05, so we cannot reject the null hypothesis that birth order does not influence birth weight.
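The slope interpretation in this exercise can be checked by evaluating the fitted line at both values of the predictor. This is only a minimal sketch: the helper name `predict_weight` is hypothetical, and the coding x = 0 (first born), x = 1 (not first born) is assumed from the answer's wording.

```r
# Hypothetical helper: predicted birth weight (ounces) from the fitted line.
# Assumes x = 0 for a first-born baby and x = 1 otherwise.
predict_weight <- function(x) 120.07 - 1.93 * x

predicted <- predict_weight(c(first_born = 0, not_first_born = 1))
predicted
```

The difference between the two predictions is the slope itself, which is why the slope reads as the average gap between the two groups.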

8.4

  1. \[ \hat{y} = 18.93 - 9.11 \times eth + 3.10 \times sex + 2.15 \times lrn \]
  2. Holding the other variables constant: for ethnicity, students who are not aboriginal are predicted to be absent 9.11 fewer days than aboriginal students. For sex, male students are predicted to be absent 3.10 more days than female students. For learner status, slow learners are predicted to be absent 2.15 more days than students who are not slow learners.
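# Student in question: aboriginal (eth = 0), male (sex = 1), slow learner (lrn = 1)
eth <- 0
sex <- 1
lrn <- 1

# Predicted days absent from the fitted model
absenteeism <- 18.93 + (-9.11 * eth) + (3.10 * sex) + (2.15 * lrn)
actual_absence <- 2
# Residual = observed - predicted
residual_absence <- actual_absence - absenteeism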
residual_absence
## [1] -22.18
# Variance of the residuals and of the response (days absent)
var_residual <- 240.57
var_absent <- 264.17
n <- 146   # number of students
k <- 3     # number of predictors
R2 <- 1 - (var_residual / var_absent)
Adj_R2 <- 1 - ((var_residual * (n - 1)) / (var_absent * (n - k - 1)))

R2
## [1] 0.08933641
Adj_R2
## [1] 0.07009704

8.8

Backward elimination begins with the largest model and eliminates variables one by one until we are satisfied that all remaining variables are important to the model. In this context, learner status would be removed first, because it is the variable that contributes the least to the model.
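The procedure described above can be sketched in R using adjusted R-squared as the criterion. This is an illustration only: `backward_adj_r2` is a hypothetical helper, not a function from any package, and the built-in `mtcars` data set stands in for the absenteeism data, which are not reproduced in this assignment.

```r
# Backward elimination using adjusted R^2 as the criterion.
# `backward_adj_r2` is a hypothetical helper written for illustration.
backward_adj_r2 <- function(data, response, predictors) {
  # Adjusted R^2 of the linear model using the given predictors
  adj_r2 <- function(vars) {
    rhs <- if (length(vars) == 0) "1" else paste(vars, collapse = " + ")
    fit <- lm(as.formula(paste(response, "~", rhs)), data = data)
    summary(fit)$adj.r.squared
  }

  current <- predictors
  best <- adj_r2(current)
  while (length(current) > 0) {
    # Adjusted R^2 of each model with exactly one variable removed
    candidates <- sapply(current, function(v) adj_r2(setdiff(current, v)))
    if (max(candidates) <= best) break  # no removal improves the model
    best <- max(candidates)
    current <- setdiff(current, names(which.max(candidates)))
  }
  list(predictors = current, adj_r2 = best)
}

# Stand-in example on mtcars (not the data from the exercise)
result <- backward_adj_r2(mtcars, "mpg", c("wt", "hp", "qsec", "drat"))
result$predictors
```

At each step the variable whose removal raises adjusted R-squared the most is dropped, and the loop stops as soon as no single removal helps.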

8.16

temperature <- c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)

damaged <- c(5,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0)

undamaged <- c(1,5,5,5,6,6,6,6,6,6,5,6,5,6,6,6,6,5,6,6,6,6,6)

ShuttleMission <- data.frame(temperature, damaged, undamaged)

plot(ShuttleMission)

  1. Failures have been coded as 1 for a damaged O-ring and 0 for an undamaged O-ring, and a logistic regression model was fit to these data. A summary of this model is given below. Describe the key components of this summary table in words.

\[ \log_e\left(\frac{p_i}{1 - p_i}\right) = 11.6630 - 0.2162 \times Temperature \]

  1. Yes, concerns regarding O-rings are justified. The negative coefficient on Temperature indicates that the probability of O-ring damage rises as the temperature falls. Damaged O-rings were found to be the cause of the disaster.
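For reference, a model of this form can be fit to the mission data with `glm()`. The sketch below assumes a binomial model with the counts of damaged and undamaged O-rings as a two-column response; the exercise does not state how its summary table was produced, so this is one plausible way to obtain it, and the estimates should come out close to the coefficients quoted above.

```r
# O-ring data from the exercise, one entry per shuttle mission
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)
damaged <- c(5, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0,
             1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0)
undamaged <- 6 - damaged  # six O-rings per mission, matching the data above

# Binomial logistic regression with (successes, failures) as the response
fit <- glm(cbind(damaged, undamaged) ~ temperature, family = binomial)
coef(fit)
```

A negative temperature coefficient here means the log-odds of damage fall as temperature rises, which is the pattern the answer relies on.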

8.18

  1. Solve this formula \[ \log_e\left(\frac{p_i}{1 - p_i}\right) = 11.6630 - 0.2162 \times Temperature \] in terms of p to get: \[\hat{p} = \frac{e^{11.6630 - 0.2162 \times Temperature}}{1 + e^{11.6630 - 0.2162 \times Temperature}}\]
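The algebra behind that rearrangement, written out step by step. The symbol η is shorthand introduced here for the right-hand side of the model; it is not notation from the exercise.

\[
\begin{aligned}
\eta &= 11.6630 - 0.2162 \times Temperature \\
\frac{p}{1 - p} &= e^{\eta} \\
p &= e^{\eta}\,(1 - p) \\
p\,(1 + e^{\eta}) &= e^{\eta} \\
p &= \frac{e^{\eta}}{1 + e^{\eta}}
\end{aligned}
\]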
temperatures = c(51,53,55)
probabilities = exp(11.6630-0.2162*temperatures)/(1+exp(11.6630-0.2162*temperatures))
probabilities
## [1] 0.6540297 0.5509228 0.4432456
#checking the results

round(log((probabilities) / (1-probabilities)),2) == round((11.6630 - (0.2162*temperatures)),2)
## [1] TRUE TRUE TRUE
temperatures2 <- c(temperatures, 57,59,61,63,65,67,69,71)
probabilities2 <- c(probabilities, 0.341,0.251,0.179, 0.124,0.084,0.056,0.037,0.024)

Shuttle_Data <- data.frame(temperatures2, probabilities2)
Shuttle_Data 
##    temperatures2 probabilities2
## 1             51      0.6540297
## 2             53      0.5509228
## 3             55      0.4432456
## 4             57      0.3410000
## 5             59      0.2510000
## 6             61      0.1790000
## 7             63      0.1240000
## 8             65      0.0840000
## 9             67      0.0560000
## 10            69      0.0370000
## 11            71      0.0240000
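The eleven probabilities in this table can also be computed in one step rather than partly typed in by hand. The sketch below uses base R's `plogis()`, the inverse logit; the names `b0`, `b1`, `temps`, and `p_hat` are chosen here for illustration.

```r
# Model coefficients quoted in the exercise
b0 <- 11.6630
b1 <- -0.2162

# Temperatures 51, 53, ..., 71
temps <- seq(51, 71, by = 2)

# plogis(x) is the inverse logit, exp(x) / (1 + exp(x))
p_hat <- plogis(b0 + b1 * temps)
data.frame(temperature = temps, probability = round(p_hat, 3))
```

Rounded to three decimals, these agree with the values listed in the table.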
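library(ggplot2)
# Plot the model-estimated probabilities and overlay the logistic curve from the model
ggplot(Shuttle_Data, aes(x = temperatures2, y = probabilities2)) + geom_point() +
    stat_function(fun = function(x) plogis(11.6630 - 0.2162 * x))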

(c) Logistic regression conditions. There are two key conditions for fitting a logistic regression model:

Each predictor x_i is linearly related to logit(p_i) if all other predictors are held constant.

Each outcome Y_i is independent of the other outcomes.

For this model, we don’t have enough data to verify the first condition. The data cover only 23 shuttle missions, which is too few to check this relationship reliably.

We can assume independence of the observations so the second condition has been met.
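One informal way to look at the linearity condition is to plot empirical logits against temperature. This is a rough sketch rather than a formal test: missions at the same temperature are pooled, and 0.5 is added to each count (an assumption made here so that temperatures with zero damaged O-rings do not produce log(0)).

```r
# O-ring data from exercise 8.16, one entry per shuttle mission
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)
damaged <- c(5, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0,
             1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0)
undamaged <- 6 - damaged  # six O-rings per mission

missions <- data.frame(temperature, damaged, undamaged)

# Pool missions flown at the same temperature
by_temp <- aggregate(cbind(damaged, undamaged) ~ temperature,
                     data = missions, FUN = sum)

# Empirical logit; the 0.5 offset avoids log(0) and is an assumption
by_temp$emp_logit <- log((by_temp$damaged + 0.5) / (by_temp$undamaged + 0.5))

plot(by_temp$temperature, by_temp$emp_logit,
     xlab = "Temperature", ylab = "Empirical logit of damage")
# Model line on the logit scale, using the coefficients from the exercise
abline(a = 11.6630, b = -0.2162, lty = 2)
```

With so few missions the points are noisy, so the plot can suggest a gross departure from linearity but cannot confirm that the condition holds.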

Corey Arnouts