Chapter 8 - Multiple and Logistic Regression
Practice: 8.1, 8.3, 8.7, 8.15, 8.17
Graded: 8.2, 8.4, 8.8, 8.16, 8.18
Presentation 8.1
\[\widehat{body\_weight} = 123.05 -8.94 * smoke\]
\[\widehat{body\_weight\_smoker} = 123.05 -8.94 * 1 = 114.11\]
\[\widehat{body\_weight\_non\_smoker} = 123.05 -8.94 * 0 = 123.05\]
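The two predictions above can be reproduced with a small helper. This is a minimal sketch: the function name `predict_weight` is illustrative, and the coefficients are copied from the fitted equation.

```r
# Fitted line from the equation above; smoke is 1 for a smoker, 0 otherwise
predict_weight <- function(smoke) {
  123.05 - 8.94 * smoke
}

# Predicted weight for each group
weights <- predict_weight(c(smoker = 1, non_smoker = 0))
weights
```

The difference between the two predictions equals the slope, which is why the slope can be read as the estimated gap between the groups.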
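The intercept, 123.05 ounces, is the predicted weight when smoke is zero, that is, for babies born to non-smoking mothers. We can use a hypothesis test to check whether average birth weight differs between the two groups.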
\[H_0: \beta_1 = 0\]
Under the null hypothesis, the true difference in average birth weight between babies born to smoking and non-smoking mothers is zero.
\[H_a: \beta_1 \neq 0\]
Under the alternative hypothesis, the average birth weights of the two groups differ.
The p-value corresponds to a two-sided test. Because it is very small, we reject the null hypothesis that babies born to smoking and non-smoking mothers have the same average birth weight. The data provide strong evidence that the slope is not zero, so there is a statistically significant association between smoking and birth weight. The association is negative: babies born to mothers who smoke weigh less on average.
\[\widehat{body\_weight} = 120.07 - 1.93 * parity\]
The estimated birth weight of babies who were not first born is 1.93 ounces lower than that of first-born babies.
The t-value is -1.62 and the p-value is 0.1052. Because the p-value is greater than 0.05, we fail to reject the null hypothesis: the data do not show a statistically significant relationship between average birth weight and parity.
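As a quick consistency check, the test statistic implied by a two-sided p-value of 0.1052 can be recovered from a normal quantile. This sketch assumes the sample is large enough for the t distribution to be close to the standard normal; `p_value`, `t_implied`, and `p_check` are illustrative names.

```r
p_value <- 0.1052

# Lower-tail quantile for half the two-sided p-value
# (negative, matching the negative slope estimate)
t_implied <- qnorm(p_value / 2)
round(t_implied, 2)

# Going the other way: two-sided p-value from the implied statistic
p_check <- 2 * pnorm(-abs(t_implied))
```

A statistic of about 1.62 in absolute value falls short of the usual 1.96 cutoff, which agrees with a p-value above 0.05.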
\[\widehat{absent\_days} = 18.93 - 9.11 * eth + 3.10 * sex + 2.15 * lrn\]
The eth slope predicts 9.11 fewer absent days for non-aboriginal children, all else held constant.
The sex slope predicts 3.10 more absent days for male students, all else held constant.
The lrn slope predicts 2.15 more absent days for slow learners, all else held constant.
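The slope interpretations can be checked numerically by switching one indicator from 0 to 1 while the others stay fixed. This is a minimal sketch; `predict_absent` is an illustrative helper built from the coefficients in the equation above.

```r
# Fitted equation for predicted absent days (coefficients from above)
predict_absent <- function(eth, sex, lrn) {
  18.93 - 9.11 * eth + 3.10 * sex + 2.15 * lrn
}

# Change one indicator at a time; the other two stay at 0
eth_effect <- predict_absent(1, 0, 0) - predict_absent(0, 0, 0)
sex_effect <- predict_absent(0, 1, 0) - predict_absent(0, 0, 0)
lrn_effect <- predict_absent(0, 0, 1) - predict_absent(0, 0, 0)

c(eth = eth_effect, sex = sex_effect, lrn = lrn_effect)
```

Each difference equals the corresponding slope, whatever values the other two indicators are held at, because the model has no interaction terms.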
\[\widehat{absent\_days} = 18.93 - 9.11 * eth + 3.10 * sex + 2.15 * lrn\]
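# Predicted absent days for a student with eth = 0, sex = 1, lrn = 1
absent <- 18.93 - 9.11 * (0) + 3.10 * (1) + 2.15 * (1)
# Residual = observed - predicted, where the observed value is 2 absent days
residual <- 2 - absent
residual
## [1] -22.18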
# R-squared = 1 - (variance of residuals) / (variance of outcome)
# Adjusted R-squared = 1 - (variance of residuals) / (variance of outcome) * (n - 1) / (n - k - 1), where n is the sample size and k is the number of predictors in the model.
r2 <- 1 - (240.57/264.17)
r2.adjusted <- 1 - (240.57/264.17)*((146-1)/(146-3-1))
r2
## [1] 0.08933641
r2.adjusted
## [1] 0.07009704
We should remove learner status (lrn) from the model first: dropping it raises the adjusted R-squared from 0.0701 to 0.0723.
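The adjusted R-squared formula used above can be wrapped in a reusable function. This is a sketch under the same inputs as the calculation above (residual variance 240.57, outcome variance 264.17, n = 146, k = 3); the function name `adjusted_r2` is illustrative.

```r
# Adjusted R-squared from residual variance, outcome variance,
# sample size n, and number of predictors k
adjusted_r2 <- function(var_resid, var_outcome, n, k) {
  1 - (var_resid / var_outcome) * (n - 1) / (n - k - 1)
}

# Full model with three predictors
full_model <- adjusted_r2(240.57, 264.17, n = 146, k = 3)
round(full_model, 4)

# With k = 0 the penalty factor is 1, so the result equals plain R-squared
no_penalty <- adjusted_r2(240.57, 264.17, n = 146, k = 0)
```

Because the penalty factor (n - 1) / (n - k - 1) grows with k, a predictor that barely reduces the residual variance can lower the adjusted value, which is the basis for comparing models with and without a variable.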
Missions launched at colder temperatures appear to have had more damaged O-rings than missions launched at average or higher temperatures. This is most noticeable around the 50-degree range, particularly at 53 degrees. The colder the ambient temperature, the more likely the O-rings were to be damaged. Damage appears much less common at temperatures above 57 degrees.
The slope of temperature is negative and statistically significant, since its p-value is reported as approximately zero. Because this is a logistic model, the slope describes the log-odds: each 1-degree increase in temperature lowers the estimated log-odds of O-ring damage by 0.2162. The intercept is not meaningfully interpretable in this context, because there are no observations near a temperature of 0 degrees.
\[\log\left(\frac{\hat{p}}{1 - \hat{p}}\right) = 11.6630 - 0.2162 * temp\]
Here \(\hat{p}\) is the estimated probability that an O-ring is damaged.
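Reading the fitted equation as a model for the log-odds of damage (an assumption consistent with the inverse-logit step used in the calculation that follows), the slope can be converted to an odds ratio. The variable names in this sketch are illustrative.

```r
slope <- -0.2162

# Multiplicative change in the odds of damage per 1-degree increase
odds_ratio_1 <- exp(slope)

# Multiplicative change in the odds of damage per 10-degree increase
odds_ratio_10 <- exp(10 * slope)

round(c(per_1_degree = odds_ratio_1, per_10_degrees = odds_ratio_10), 3)
```

An odds ratio of roughly 0.81 per degree means the odds of damage fall by about 19% for each additional degree.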
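# Temperatures at which to estimate the probability of O-ring damage
temp <- c(51, 53, 55)
logit <- 11.6630 - 0.2162 * temp
# Inverse logit: convert log-odds to probabilities
answer <- exp(logit) / (1 + exp(logit))
At 51, 53, and 55 degrees, the estimated probabilities of damage are 0.6540297, 0.5509228, and 0.4432456, respectively.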
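The same conversion can be written with `plogis()`, the logistic distribution function from base R's stats package, which computes exp(x) / (1 + exp(x)). This is a minimal sketch; the function name `damage_prob` is illustrative.

```r
# Estimated probability of O-ring damage at a given temperature,
# using the coefficients from the fitted model above
damage_prob <- function(temp) {
  plogis(11.6630 - 0.2162 * temp)
}

probs <- damage_prob(c(51, 53, 55))
round(probs, 3)
```

Using `plogis()` avoids writing the inverse logit by hand and stays numerically stable for large positive or negative log-odds.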
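library(ggplot2)
# Model-estimated probabilities of O-ring damage at each temperature
data <- data.frame(prob = c(0.341, 0.251, 0.179, 0.124, 0.084, 0.056, 0.037, 0.024, 0.654, 0.551, 0.443),
                   temp = c(57, 59, 61, 63, 65, 67, 69, 71, 51, 53, 55))
# Plot the probabilities and connect them with a smooth curve
ggplot(data, aes(temp, prob)) +
  geom_point() +
  geom_smooth(se = FALSE, method = 'loess')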
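The loess line above is a smoother fitted through the plotted points. Since those points come from the logistic model itself, the model curve can also be drawn directly. The sketch below uses base R graphics so that it needs no extra packages; `damage_prob` and `temps` are illustrative names.

```r
# Model-estimated probability of O-ring damage at a given temperature
damage_prob <- function(temp) {
  plogis(11.6630 - 0.2162 * temp)
}

# Temperatures shown in the plot above
temps <- seq(51, 71, by = 2)

# Draw the fitted logistic curve, then overlay the individual estimates
curve(damage_prob(x), from = 50, to = 72, xlab = "temp", ylab = "prob")
points(temps, damage_prob(temps))
```

Drawing the curve from the equation shows the model's own shape rather than a second smoother layered on top of it.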
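Both conditions for logistic regression are difficult to verify here: that each outcome is independent of the others, and that temperature is linearly related to logit(p).
To conclude, it is uncertain from the given information whether logistic regression is appropriate for these data.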