(a) Write the equation of the regression line.
\(y=120.07 - 1.93 x_{parity}\)
(b) Interpret the slope in this context, and calculate the predicted birth weight of first borns and others.
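The predicted weights asked for in (b) can be sketched from the fitted line in (a). This assumes parity is coded 0 for a first-born child and 1 otherwise, a coding that is not stated in this write-up; the names `predict_weight`, `first_born`, and `other` are illustrative only.

```r
# Fitted line from part (a).
# ASSUMPTION: parity = 0 for first borns, parity = 1 for all other children.
predict_weight <- function(parity) 120.07 - 1.93 * parity

first_born <- predict_weight(0)  # prediction when parity = 0
other <- predict_weight(1)       # prediction when parity = 1
c(first_born = first_born, other = other)
```

Under that coding, the slope is the predicted difference between the two groups: children who are not first born are predicted to weigh 1.93 units less than first borns.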
(c) Is there a statistically significant relationship between the average birth weight and parity?
(a) Write the equation of the regression line.
\(y = 18.93 - 9.11 x_{eth} + 3.10 x_{sex} + 2.15 x_{lrn}\)
(b) Interpret each one of the slopes in this context.
The slope of eth indicates that, all else being equal, there is a 9.11 day reduction in the predicted absenteeism when the subject is not aboriginal.
The slope of sex indicates that, all else being equal, there is a 3.10 day increase in the predicted absenteeism when the subject is male.
The slope of lrn indicates that, all else being equal, there is a 2.15 day increase in the predicted absenteeism when the subject is a slow learner.
(c) Calculate the residual for the first observation in the data set: a student who is aboriginal, male, a slow learner, and missed 2 days of school.
eth <- 0
sex <- 1
lrn <- 1
predictdays <- 18.93 - 9.11*eth + 3.1*sex + 2.15*lrn
days <- 2
resid <- days - predictdays
resid
## [1] -22.18
(d) The variance of the residuals is 240.57, and the variance of the number of absent days for all students in the data set is 264.17. Calculate the \(R^2\) and the adjusted \(R^2\). Note that there are 146 observations in the data set.
varresid <- 240.57
varabs <- 264.17
n <- 146
k <- 3
R2 <- 1-(varresid/varabs)
R2a <- 1 - ((varresid/varabs)*((n-1)/(n-k-1)))
R2
## [1] 0.08933641
R2a
## [1] 0.07009704
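Written out as formulas, the calculation above looks roughly as follows. The notation is an illustrative choice and not from the exercise: \(e\) denotes the residuals, \(y\) the number of absent days, \(n = 146\) the number of observations, and \(k = 3\) the number of predictors.

\(R^2 = 1 - \frac{Var(e)}{Var(y)} = 1 - \frac{240.57}{264.17} \approx 0.0893\)

\(R^2_{adj} = 1 - \frac{Var(e)}{Var(y)} \times \frac{n-1}{n-k-1} = 1 - \frac{240.57}{264.17} \times \frac{145}{142} \approx 0.0701\)

The adjusted value is smaller because the factor \(\frac{n-1}{n-k-1}\) exceeds 1 whenever \(k \geq 1\), which penalizes each additional predictor.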
(a) Each column of the table above represents a different shuttle mission. Examine these data and describe what you observe with respect to the relationship between temperature and damaged O-rings.
library(ggplot2)
temperature <- c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,
                 78,79,81)
damaged <- c(5,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0)
undamaged <- c(1,5,5,5,6,6,6,6,6,6,5,6,5,6,6,6,6,5,6,6,6,6,6)
data <- data.frame(temperature = temperature, damaged = damaged,
                   undamaged = undamaged)
ggplot(data,aes(x=temperature,y=damaged)) + geom_point()
(b) Failures have been coded as 1 for a damaged O-ring and 0 for an undamaged O-ring, and a logistic regression model was fit to these data. A summary of this model is given below. Describe the key components of this summary table in words.
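The summary table referred to in (b) is not reproduced in this write-up. As a rough sketch of how such a model could be fit, the block below uses base R's `glm` with a binomial family on the per-mission counts listed earlier (six O-rings per mission). The object names `oring` and `fit` are illustrative, and this is not necessarily how the original table was produced.

```r
# Per-mission counts copied from the data listed earlier in this write-up
temperature <- c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,
                 78,79,81)
damaged <- c(5,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0)
undamaged <- c(1,5,5,5,6,6,6,6,6,6,5,6,5,6,6,6,6,5,6,6,6,6,6)
oring <- data.frame(temperature, damaged, undamaged)

# Grouped binomial logistic regression: each row contributes
# `damaged` failures and `undamaged` non-failures
fit <- glm(cbind(damaged, undamaged) ~ temperature,
           family = binomial, data = oring)

# One row per term: estimate, standard error, z value, p-value
summary(fit)$coefficients
```

In a table of this kind, the estimates are on the log-odds scale, and the z value and p-value for temperature indicate whether its slope differs detectably from zero.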
(c) Write out the logistic model using the point estimates of the model parameters.
\(\log_e(\frac{p_i}{1-p_i}) = 11.6630 - 0.2162 x_{temp}\)
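Solving this equation for \(p_i\) gives the predicted probability of damage directly. This is a worked rearrangement of the model above, not an additional result:

\(p_i = \frac{e^{11.6630 - 0.2162 x_{temp}}}{1 + e^{11.6630 - 0.2162 x_{temp}}}\)

Because the coefficient on \(x_{temp}\) is negative, \(p_i\) decreases as temperature rises.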
(d) Based on the model, do you think concerns regarding O-rings are justified? Explain.
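# Predicted probability of O-ring damage at a given temperature,
# using the point estimates from part (c)
oringModel <- function(temp)
{
  right <- 11.6630 - 0.2162 * temp        # linear predictor (log-odds)
  prob <- exp(right) / (1 + exp(right))   # inverse logit
  return (prob)
}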
temps <- seq(32, 85)
dfProbDamage <- data.frame(Temperature=temps, ProbDamage=oringModel(temps))
g1 <- ggplot(dfProbDamage) + geom_line(aes(x=Temperature, y=ProbDamage ))
g1
temps <- c(51,53,55)
dfProbDamage <- data.frame(Temperature=temps, ProbDamage=oringModel(temps))
dfProbDamage
## Temperature ProbDamage
## 1 51 0.6540297
## 2 53 0.5509228
## 3 55 0.4432456
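As a cross-check on the values above, base R's `plogis` is the inverse logit, so it should reproduce the same probabilities. The sketch below restates the coefficients locally so that it runs on its own; the names `b0`, `b1`, `check_temps`, and `check_probs` are illustrative.

```r
# Point estimates from part (c)
b0 <- 11.6630
b1 <- -0.2162

# plogis(x) computes 1 / (1 + exp(-x)), i.e. exp(x) / (1 + exp(x))
check_temps <- c(51, 53, 55)
check_probs <- plogis(b0 + b1 * check_temps)
round(check_probs, 7)
```

Note that 51 lies below the coldest launch in the data (53), so that prediction is an extrapolation.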
dfRaw <- data.frame(Missing=seq(1, 23),
                    Temp=c(53,57,58,63,66,67,67,67,68,69,70,70,70,
                           70,72,73,75,75,76,76,78,79,81),
                    Damaged=c(5,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0),
                    Undamaged=c(1,5,5,5,6,6,6,6,6,6,5,6,5,6,6,6,6,5,6,6,6,6,6))
dfRaw$ProbDamage <- dfRaw$Damaged / (dfRaw$Damaged + dfRaw$Undamaged)
head(dfRaw)
## Missing Temp Damaged Undamaged ProbDamage
## 1 1 53 5 1 0.8333333
## 2 2 57 1 5 0.1666667
## 3 3 58 1 5 0.1666667
## 4 4 63 1 5 0.1666667
## 5 5 66 0 6 0.0000000
## 6 6 67 0 6 0.0000000
temps <- seq(51, 71, by=2)
dfProbDamage <- data.frame(Temperature=temps, ProbDamage=oringModel(temps))
g1 <- ggplot(dfRaw) +
  geom_point(aes(x=Temp, y=ProbDamage), alpha=0.5, colour="blue") +
  geom_line(data=dfProbDamage, aes(x=Temperature, y=ProbDamage), colour="red") +
  geom_point(data=dfProbDamage, aes(x=Temperature, y=ProbDamage), colour="red") +
  labs(x="Temperature", y="Probability of damage") +
  ylim(0, 1)
g1
(c) Describe any concerns you may have regarding applying logistic regression in this application, and note any assumptions that are required to accept the model’s validity.