8.2, 8.4, 8.8, 8.16, 8.18
8.2 Baby weights, Part II. Exercise 8.1 introduces a data set on the birth weight of babies. Another variable we consider is parity, which is 0 if the child is the first born, and 1 otherwise. The summary table below shows the results of a linear regression model for predicting the average birth weight of babies, measured in ounces, from parity.
| _ | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 120.07 | 0.60 | 199.94 | 0.0000 |
| parity | -1.93 | 1.19 | -1.62 | 0.1052 |
Answer: The regression equation has the form y = b0 + b1 * x. Here: predicted weight = 120.07 - 1.93 * parity
Answer: The slope says that a baby who is not a first born is predicted to weigh 1.93 ounces less, on average, than a first-born baby. The predicted weight of a first-born baby (parity = 0) is 120.07 ounces, and the predicted weight of any other baby (parity = 1) is 120.07 - 1.93 * 1 = 118.14 ounces.
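As a quick check of the arithmetic, the fitted line can be evaluated for both parity values. This is only a minimal sketch: the names `b0`, `b1`, and `predict_weight` are illustrative and do not come from the exercise.

```r
# Coefficients taken from the regression table above
b0 <- 120.07   # intercept: predicted weight (oz) when parity = 0
b1 <- -1.93    # slope: change in predicted weight when parity = 1

# Hypothetical helper: predicted birth weight for a given parity (0 or 1)
predict_weight <- function(parity) b0 + b1 * parity

predicted <- predict_weight(c(0, 1))
names(predicted) <- c("first born", "not first born")
predicted
```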
Answer
The p-value for parity is 0.1052, which is greater than 0.05. So the relationship between average birth weight and parity is not statistically significant at the 5% level.
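The reported test statistic and p-value can be approximately recovered from the estimate and its standard error. The sketch below uses a standard normal approximation to the t distribution. That is an assumption made here because the sample size is not restated in this section, so the exact degrees of freedom are not used.

```r
# Values from the parity row of the regression table
estimate  <- -1.93
std_error <- 1.19

# Test statistic for H0: slope = 0
t_value <- estimate / std_error

# Two-sided p-value, normal approximation (assumes a large sample)
p_approx <- 2 * pnorm(-abs(t_value))

round(c(t = t_value, p = p_approx), 4)
p_approx > 0.05   # TRUE means: not significant at the 5% level
```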
8.4 Absenteeism. Researchers interested in the relationship between absenteeism from school and certain demographic characteristics of children collected data from 146 randomly sampled students in rural New South Wales, Australia, in a particular school year. Below are three observations from this data set.
| _ | eth | sex | lrn | days |
|---|---|---|---|---|
| 1 | 0 | 1 | 1 | 2 |
| 2 | 0 | 1 | 1 | 11 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 146 | 1 | 0 | 0 | 37 |
The summary table below shows the results of a linear regression model for predicting the average number of days absent based on ethnic background (eth: 0 - aboriginal, 1 - not aboriginal), sex (sex: 0 - female, 1 - male), and learner status (lrn: 0 - average learner, 1 - slow learner).
| _ | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 18.93 | 2.57 | 7.37 | 0.0000 |
| eth | -9.11 | 2.60 | -3.51 | 0.0000 |
| sex | 3.10 | 2.64 | 1.18 | 0.2411 |
| lrn | 2.15 | 2.65 | 0.81 | 0.4177 |
Answer: The equation has the form y = b0 + b1 * eth + b2 * sex + b3 * lrn, so predicted days absent = 18.93 - 9.11 * eth + 3.10 * sex + 2.15 * lrn
Ethnic background: All else held constant, the model predicts that non-aboriginal students are absent 9.11 fewer days than aboriginal students.
Sex: All else held constant, the model predicts that males are absent 3.10 more days than females.
Learner status: All else held constant, the model predicts that slow learners are absent 2.15 more days than average learners.
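Answer: The residual is the observed value minus the predicted value. For the first student (eth = 0, sex = 1, lrn = 1, days = 2):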
Absent <- 18.93 - 9.11 * (0) + 3.10 * (1) + 2.15 * (1)
Residual <- 2 - Absent
paste("Residual for this student: ", Residual)
## [1] "Residual for this student: -22.18"
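The same calculation extends to the other observations shown in the data table. The following sketch applies the fitted equation to the three listed students; `predict_days` and `students` are hypothetical names introduced only for illustration.

```r
# The three observations shown in the data table (rows 1, 2 and 146)
students <- data.frame(
  id   = c(1, 2, 146),
  eth  = c(0, 0, 1),
  sex  = c(1, 1, 0),
  lrn  = c(1, 1, 0),
  days = c(2, 11, 37)
)

# Hypothetical helper applying the coefficients from the summary table
predict_days <- function(eth, sex, lrn) {
  18.93 - 9.11 * eth + 3.10 * sex + 2.15 * lrn
}

students$predicted <- with(students, predict_days(eth, sex, lrn))
students$residual  <- students$days - students$predicted  # observed - predicted
students
```

A negative residual means the model overestimates the number of days absent for that student; a positive residual means it underestimates.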
# R-squared = 1 - (variance of residuals) / (variance of outcome)
# Adjusted R-squared = 1 - (variance of residuals) / (variance of outcome) * (n - 1) / (n - k - 1), where n is the number of observations and k is the number of predictors in the model.
R2.Ab <- 1 - (240.57)/(264.17)
R2.Ab.adj <- 1 - (240.57/264.17)*((146-1)/(146-3-1))
paste("R-squared: ", round(R2.Ab,4)) ;paste("R-squared adjusted: ", round(R2.Ab.adj,4))
## [1] "R-squared: 0.0893"
## [1] "R-squared adjusted: 0.0701"
8.8 Absenteeism, Part II. Exercise 8.4 considers a model that predicts the number of days absent using three predictors: ethnic background (eth), gender (sex), and learner status (lrn). The table below shows the adjusted R-squared for the model as well as adjusted R-squared values for all models we evaluate in the first step of the backwards elimination process.
| _ | Model | Adjusted R2 |
|---|---|---|
| 1 | Full model | 0.0701 |
| 2 | No ethnicity | -0.0033 |
| 3 | No sex | 0.0676 |
| 4 | No learner status | 0.0723 |
Which, if any, variable should be removed from the model first?
Answer: Removing learner status raises the adjusted R2 from 0.0701 to 0.0723, and it is the only removal that improves on the full model, so learner status should be removed first.
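One way to sketch this backward-elimination step in R is to compare each candidate's adjusted R-squared with that of the full model. The vector name `adj_r2` and the labels are illustrative; the values are taken from the table above.

```r
# Adjusted R-squared for the full model and each one-variable-removed model
adj_r2 <- c(
  "Full model"        = 0.0701,
  "No ethnicity"      = -0.0033,
  "No sex"            = 0.0676,
  "No learner status" = 0.0723
)

# Model with the highest adjusted R-squared
best <- names(which.max(adj_r2))

# Gain over the full model; a positive value justifies dropping the variable
improvement <- unname(adj_r2[best] - adj_r2["Full model"])

best
improvement
```

If the full model had the highest adjusted R-squared, no variable would be removed and the elimination would stop.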
8.16 Challenger disaster, Part I. On January 28, 1986, a routine launch was anticipated for the Challenger space shuttle. Seventy-three seconds into the flight, disaster happened: the shuttle broke apart, killing all seven crew members on board. An investigation into the cause of the disaster focused on a critical seal called an O-ring, and it is believed that damage to these O-rings during a shuttle launch may be related to the ambient temperature during the launch. The table below summarizes observational data on O-rings for 23 shuttle missions, where the mission order is based on the temperature at the time of the launch. Temp gives the temperature in Fahrenheit, Damaged represents the number of damaged O-rings, and Undamaged represents the number of O-rings that were not damaged.
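| Shuttle Mission | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Temperature | 53 | 57 | 58 | 63 | 66 | 67 | 67 | 67 | 68 | 69 | 70 | 70 |
| Damaged | 5 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| Undamaged | 1 | 5 | 5 | 5 | 6 | 6 | 6 | 6 | 6 | 6 | 5 | 6 |

| Shuttle Mission | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Temperature | 70 | 70 | 72 | 73 | 75 | 75 | 76 | 76 | 78 | 79 | 81 |
| Damaged | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| Undamaged | 5 | 6 | 6 | 6 | 6 | 5 | 6 | 6 | 6 | 6 | 6 |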
Answer: There are 11 damaged O-rings in total. Eight of them occurred on the four missions launched at 63°F or below, and only 3 occurred on the 19 missions launched above that temperature. Low temperatures do appear to be associated with O-ring damage.
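The counts quoted in this answer can be checked directly from the mission data. The sketch below repeats the data vectors so that it runs on its own; the 63°F cut-off is the one used in the answer, and `cold` is an illustrative name.

```r
temperature <- c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)
damaged     <- c(5,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0)

# Missions launched at 63 degrees F or below
cold <- temperature <= 63

# Damaged O-rings in each group
damaged_cold <- sum(damaged[cold])
damaged_warm <- sum(damaged[!cold])

# Share of O-rings damaged in each group (6 O-rings per mission)
share_cold <- damaged_cold / (6 * sum(cold))
share_warm <- damaged_warm / (6 * sum(!cold))

c(total = sum(damaged), cold = damaged_cold, warm = damaged_warm)
round(c(share_cold = share_cold, share_warm = share_warm), 3)
```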
library(ggplot2)
temperature <- c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)
damaged <- c(5,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0)
undamaged <- c(1,5,5,5,6,6,6,6,6,6,5,6,5,6,6,6,6,5,6,6,6,6,6)
data <- data.frame(temperature = temperature, damaged = damaged, undamaged = undamaged)
ggplot(data,aes(x=temperature,y=damaged)) + geom_point()
| _ | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | 11.6630 | 3.2963 | 3.54 | 0.0004 |
| Temperature | -0.2162 | 0.0532 | -4.07 | 0.0000 |
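A summary like this one can be obtained by fitting a binomial logistic regression to the mission counts. The sketch below assumes the summary was produced from exactly the 23 missions listed in the exercise, using the damaged and undamaged counts as a two-column response; the object names `fit` and `coef_table` are illustrative.

```r
temperature <- c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)
damaged     <- c(5,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0)
undamaged   <- c(1,5,5,5,6,6,6,6,6,6,5,6,5,6,6,6,6,5,6,6,6,6,6)

# Two-column response: damaged and undamaged O-rings per mission
fit <- glm(cbind(damaged, undamaged) ~ temperature, family = binomial)

# Estimates, standard errors, z values and p-values
coef_table <- coef(summary(fit))
round(coef_table, 4)
```

The two-column form `cbind(damaged, undamaged)` lets `glm` treat each mission as six O-ring trials rather than as a single observation.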
Answer: In a logistic regression, the coefficients describe the log odds of the outcome rather than the probability itself. The slope of -0.2162 means that for each additional 1°F in temperature, the log odds that an O-ring is damaged decrease by 0.2162. Equivalently, the odds of damage are multiplied by exp(-0.2162), about 0.81, for each 1°F increase. The result must be transformed back from the log-odds scale to obtain a probability.
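Because the slope acts on the log-odds scale, exponentiating it gives the multiplicative change in the odds of damage. A minimal sketch follows; the 10°F comparison is added only as an illustration and is not part of the exercise.

```r
slope <- -0.2162   # temperature coefficient from the logistic model

# Multiplicative change in the odds of damage per 1 degree F increase
odds_ratio_1F <- exp(slope)

# The same for a 10 degree F increase (illustrative)
odds_ratio_10F <- exp(10 * slope)

round(c(per_1F = odds_ratio_1F, per_10F = odds_ratio_10F), 3)
```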
Answer: $\log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = 11.6630 - 0.2162 \times T$
where $\hat{p}$ is the probability of a damaged O-ring and $T$ is the temperature (°F).
8.18 Challenger disaster, Part II. Exercise 8.16 introduced us to O-rings that were identified as a plausible explanation for the breakup of the Challenger space shuttle 73 seconds into takeoff in 1986. The investigation found that the ambient temperature at the time of the shuttle launch was closely related to the damage of O-rings, which are a critical component of the shuttle. See this earlier exercise if you would like to browse the original data.
$$\log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = 11.6630 - 0.2162 \times \text{Temperature}$$
where $\hat{p}$ is the model-estimated probability that an O-ring will become damaged. Use the model to calculate the probability that an O-ring will become damaged at each of the following ambient temperatures: 51, 53, and 55 degrees Fahrenheit. The model-estimated probabilities for several additional ambient temperatures are provided below, where subscripts indicate the temperature: $\hat{p}_{57} = 0.341$, $\hat{p}_{59} = 0.251$, $\hat{p}_{61} = 0.179$, $\hat{p}_{63} = 0.124$, $\hat{p}_{65} = 0.084$, $\hat{p}_{67} = 0.056$, $\hat{p}_{69} = 0.037$, $\hat{p}_{71} = 0.024$
P_hat51 = exp(11.663-0.2162*51)/(1+exp(11.663-0.2162*51))
P_hat51
## [1] 0.6540297
P_hat53 = exp(11.663-0.2162*53)/(1+exp(11.663-0.2162*53))
P_hat53
## [1] 0.5509228
P_hat55 = exp(11.663-0.2162*55)/(1+exp(11.663-0.2162*55))
P_hat55
## [1] 0.4432456
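The three calculations above apply the inverse logit by hand. A more compact sketch uses base R's `plogis`, which computes exp(x) / (1 + exp(x)), and evaluates every temperature at once; `p_damage` is a hypothetical helper name.

```r
# Hypothetical helper: model-estimated probability of O-ring damage
# at ambient temperature temp_f (degrees Fahrenheit)
p_damage <- function(temp_f) plogis(11.6630 - 0.2162 * temp_f)

temps <- seq(51, 71, by = 2)
p_hat <- p_damage(temps)

data.frame(temp_f = temps, p_hat = round(p_hat, 3))
```

The values for 57°F and above can be compared with the probabilities provided in the exercise as a check on the helper.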
Answer
# Model-estimated probabilities: 51-55 computed above, 57-71 given in the exercise
T_F1 <- c(51,53,55,57,59,61,63,65,67,69,71)
P_model <- c(0.654,0.551,0.443,0.341,0.251,0.179,0.124,0.084,0.056,0.037,0.024)
# Observed proportion of damaged O-rings (out of 6) on each of the 23 missions
P_meas <- c((5/6), (1/6), (1/6), (1/6), (0/6),(0/6),(0/6),(0/6),(0/6),(0/6),(1/6),(0/6),(1/6),(0/6),(0/6),(0/6),(0/6),(1/6),(0/6),(0/6),(0/6),(0/6),(0/6))
length(P_meas)
## [1] 23
T_F2 <- c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)
length(T_F2)
## [1] 23
logistic_df <- data.frame(Temp = T_F1, P= P_model)
head(logistic_df)
## Temp P
## 1 51 0.654
## 2 53 0.551
## 3 55 0.443
## 4 57 0.341
## 5 59 0.251
## 6 61 0.179
meas_df <- data.frame(Temp = T_F2, P=P_meas)
head(meas_df)
## Temp P
## 1 53 0.8333333
## 2 57 0.1666667
## 3 58 0.1666667
## 4 63 0.1666667
## 5 66 0.0000000
## 6 67 0.0000000
suppressMessages(suppressWarnings(library(ggplot2)))
ggplot(NULL, aes(x=Temp,y=P)) + geom_line(data = logistic_df, colour = 'red')+geom_point(data=meas_df, colour='lightblue')
Answer: Two conditions must be met for logistic regression: (a) each predictor x_i must be linearly related to logit(p_i) if all other predictors are held constant, and (b) each outcome Y_i must be independent of the other outcomes.
A sample of 23 missions is fairly small for verifying the first condition. Regarding the second condition, a shuttle mission is very complicated, so O-ring damage could have other causes, and more investigation would be required. Here it is assumed that the outcomes are independent.