if (Sys.info()["sysname"] == "Windows") {
setwd("~/Masters/DATA606/Week8/Homework")
} else {
setwd("~/Documents/Masters/DATA606/Week8/Homework")
}
require(ggplot2)
## Loading required package: ggplot2
Answer:
\[\widehat{\text{birth weight}} = 123.05 - 1.93 \times \text{parity}\]
Answer:
A non-first-born child is expected to weigh 1.93 ounces less than a first-born child. The expected birth weight of first-born children is 123.05 ounces, and the expected birth weight of non-first-born children is 121.12 ounces.
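The two expected weights follow from plugging the indicator values into the fitted line. A minimal sketch, assuming parity is coded 0 for first-born and 1 otherwise (the names `b0`, `b1`, and `expected_weight` are illustrative, not from the original analysis):

```r
# Coefficients taken from the regression equation above
b0 <- 123.05   # intercept: expected weight (ounces) when parity = 0
b1 <- -1.93    # slope: change in expected weight when parity = 1

# Assumed coding: 0 = first-born, 1 = not first-born
parity <- c(first_born = 0, not_first_born = 1)

# Expected birth weight for each group
expected_weight <- b0 + b1 * parity
print(expected_weight)
```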
Answer:
\[H_0: \beta_1 = 0 \\ H_A: \beta_1 \neq 0\]
Since a significance level has not been provided, we will use \(\alpha = 0.05\). The p-value of 0.1052 is greater than the significance level; therefore, we fail to reject the null hypothesis. There is not sufficient evidence of an association between parity and birth weight.
Answer:
\[\widehat{\text{days absent}} = 18.93 - 9.11 \times \text{eth} + 3.10 \times \text{sex} + 2.15 \times \text{lrn}\]
Answer:
ethnic background: The model predicts a non-aboriginal child to be absent 9.11 fewer days than an aboriginal child, all else held constant.
sex: The model predicts a male child to be absent 3.10 more days than a female child, all else held constant.
learner status: The model predicts a slow learner to be absent 2.15 more days than an average learner, all else held constant.
Answer:
y_predict <- 18.93 + 0 * -9.11 + 1 * 3.1 + 1 * 2.15
y_observed <- 2
y_observed - y_predict
## [1] -22.18
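The same residual calculation can be wrapped in small helpers so that other students can be checked without retyping the equation. This is only a sketch: `predict_absent` and `absent_residual` are hypothetical names, and the 0/1 coding of `eth`, `sex`, and `lrn` is assumed to match the calculation above.

```r
# Coefficients copied from the fitted model above
coefs <- c(intercept = 18.93, eth = -9.11, sex = 3.10, lrn = 2.15)

# Hypothetical helper: predicted days absent for 0/1 indicator values
predict_absent <- function(eth, sex, lrn) {
  coefs[["intercept"]] + coefs[["eth"]] * eth +
    coefs[["sex"]] * sex + coefs[["lrn"]] * lrn
}

# Residual = observed value minus predicted value
absent_residual <- function(observed, eth, sex, lrn) {
  observed - predict_absent(eth, sex, lrn)
}

# The student from the answer above: eth = 0, sex = 1, lrn = 1, 2 days absent
res <- absent_residual(2, eth = 0, sex = 1, lrn = 1)
print(res)
```

The negative residual means the model overestimates this student's absences.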
Answer:
r-squared value:
var_resid <- 240.57
var_outcome <- 264.17
1 - var_resid/var_outcome
## [1] 0.08933641
r-squared adj
n_val <- 146
k_val <- 3
1 - (var_resid/var_outcome) * ((n_val - 1)/(n_val - k_val - 1))
## [1] 0.07009704
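Both quantities use the same variance ratio, so they can be written as two short functions. A sketch with illustrative function names (`r_squared`, `adj_r_squared`), using the values given above:

```r
# R-squared: share of outcome variance not left in the residuals
r_squared <- function(var_resid, var_outcome) {
  1 - var_resid / var_outcome
}

# Adjusted R-squared: penalizes the variance ratio for k predictors
# fitted on n observations
adj_r_squared <- function(var_resid, var_outcome, n, k) {
  1 - (var_resid / var_outcome) * ((n - 1) / (n - k - 1))
}

r2 <- r_squared(240.57, 264.17)
r2_adj <- adj_r_squared(240.57, 264.17, n = 146, k = 3)
print(c(r2 = r2, r2_adj = r2_adj))
```

Because the factor (n - 1)/(n - k - 1) is greater than 1 whenever k is at least 1, the adjusted value is always smaller than the unadjusted one.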
Answer:
Learner status should be removed first, since dropping it results in a higher adjusted R-squared than the full model.
Answer:
First, I load the data that is available from openintro.org:
ch_data <- read.table("orings.txt", header = TRUE)
names(ch_data) <- c("temperature", "damaged")
ch_data$mission <- seq_len(nrow(ch_data))
# Assumes six o-rings per launch, as in the OpenIntro exercise
ch_data$undamaged <- 6 - ch_data$damaged
Plots for temperature vs orings:
ggplot(ch_data, aes(y = damaged, x = temperature)) + geom_point()
It appears that lower temperatures may be associated with a higher probability of o-ring failure.
Answer:
The predictor variable has a p-value less than 0.05; therefore, temperature appears to be a significant predictor of o-ring damage. The 95% confidence interval for the log odds ratio (the temperature coefficient) is:
(ci_logoddratio_lower <- -0.2162 - 1.96 * 0.0532)
## [1] -0.320472
(ci_logoddratio_higher <- -0.2162 + 1.96 * 0.0532)
## [1] -0.111928
The corresponding interval for the odds ratio is:
(ci_oddratio_lower <- exp(ci_logoddratio_lower))
## [1] 0.7258064
(ci_oddratio_higher <- exp(ci_logoddratio_higher))
## [1] 0.8941086
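The interval calculation generalizes to any coefficient and confidence level. The sketch below uses a hypothetical helper, `wald_ci`, and takes the critical value from `qnorm` rather than the rounded 1.96, so its results differ from the numbers above only in the sixth decimal place.

```r
# Hypothetical helper: Wald interval, estimate +/- z * standard error
wald_ci <- function(estimate, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
  c(lower = estimate - z * se, upper = estimate + z * se)
}

# Temperature coefficient and standard error from the model summary
ci_log_odds <- wald_ci(-0.2162, 0.0532)

# Exponentiate both endpoints to move from log odds ratio to odds ratio
ci_odds <- exp(ci_log_odds)
print(ci_log_odds)
print(ci_odds)
```

Since the whole odds ratio interval lies below 1, each additional degree is associated with lower odds of damage.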
For every one-degree (Fahrenheit) increase in temperature, the log odds of o-ring failure decrease by 0.2162. At a temperature of zero, the log odds of failure are 11.6630.
Answer:
\[\log\left(\frac{\widehat{p}}{1-\widehat{p}}\right) = 11.6630 - 0.2162 \times \text{Temperature}\]
Answer:
(p53 <- exp(11.663 + 53 * -0.2162)/(1 + exp(11.663 + 53 * -0.2162)))
## [1] 0.5509228
(p81 <- exp(11.663 + 81 * -0.2162)/(1 + exp(11.663 + 81 * -0.2162)))
## [1] 0.002873921
Yes, the p-value shows that temperature is a statistically significant predictor of o-ring failure. Additionally, the predicted probability of failure at the minimum temperature in the data set (53F) is about 0.55, while at the maximum temperature (81F) it is about 0.003, which is a practically significant difference.
Answer:
The probability that there will be an o-ring failure is summarized for the following temperatures:
# 51F:
(p51 <- exp(11.663 + 51 * -0.2162)/(1 + exp(11.663 + 51 * -0.2162)))
## [1] 0.6540297
# 53F
(p53 <- exp(11.663 + 53 * -0.2162)/(1 + exp(11.663 + 53 * -0.2162)))
## [1] 0.5509228
# 55F
(p55 <- exp(11.663 + 55 * -0.2162)/(1 + exp(11.663 + 55 * -0.2162)))
## [1] 0.4432456
# 57F
(p57 <- exp(11.663 + 57 * -0.2162)/(1 + exp(11.663 + 57 * -0.2162)))
## [1] 0.3406498
# 59F
(p59 <- exp(11.663 + 59 * -0.2162)/(1 + exp(11.663 + 59 * -0.2162)))
## [1] 0.2510914
# 61F
(p61 <- exp(11.663 + 61 * -0.2162)/(1 + exp(11.663 + 61 * -0.2162)))
## [1] 0.1786971
# 63F
(p63 <- exp(11.663 + 63 * -0.2162)/(1 + exp(11.663 + 63 * -0.2162)))
## [1] 0.123727
# 65F
(p65 <- exp(11.663 + 65 * -0.2162)/(1 + exp(11.663 + 65 * -0.2162)))
## [1] 0.08393843
# 67F
(p67 <- exp(11.663 + 67 * -0.2162)/(1 + exp(11.663 + 67 * -0.2162)))
## [1] 0.05612566
# 69F
(p69 <- exp(11.663 + 69 * -0.2162)/(1 + exp(11.663 + 69 * -0.2162)))
## [1] 0.03715479
# 71F
(p71 <- exp(11.663 + 71 * -0.2162)/(1 + exp(11.663 + 71 * -0.2162)))
## [1] 0.02443024
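The eleven calculations above all apply the inverse logit to the same linear predictor, so they can be done in one vectorized step. A sketch using base R's `plogis()`, which computes exp(x)/(1 + exp(x)); the names `temps`, `log_odds`, and `p_damage` are illustrative:

```r
# Temperatures (F) evaluated above
temps <- seq(51, 71, by = 2)

# Linear predictor (log odds) from the fitted model
log_odds <- 11.663 - 0.2162 * temps

# plogis() is the inverse logit: exp(x) / (1 + exp(x))
p_damage <- plogis(log_odds)

print(data.frame(temp = temps, p_damage = round(p_damage, 4)))
```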
Answer:
ch_fit <- function(temp) {
  # Inverse logit of the fitted linear predictor
  exp(11.663 + temp * -0.2162)/(1 + exp(11.663 + temp * -0.2162))
}
ch_df <- data.frame(temp = seq(51, 71, by = 2))
ch_df$predicteddamage <- ch_fit(ch_df$temp)
ggplot(ch_df, aes(y = predicteddamage, x = temp)) + geom_point() +
  stat_smooth(method = "glm", method.args = list(family = "binomial"),
    se = FALSE)
## Warning: non-integer #successes in a binomial glm!
Answer:
My main concern is that the outcomes may not be independent of one another (assumption 2 below). Because there are so few launches, it is likely that most of the launch equipment was upgraded or modified between missions; therefore, we cannot rule out other time-dependent factors that affected the results. Additionally, we should check that the predictor is linearly related to the \(\text{logit}(p_i)\) values.
The assumptions that must be met are the following:
1. Each predictor \(x_i\) is linearly related to \(\text{logit}(p_i)\) if all other predictors are held constant.
2. Each outcome \(Y_i\) is independent of the other outcomes.