Graded: 8.2, 8.4, 8.8, 8.16, 8.18
Exercise 8.1 introduces a data set on birth weight of babies. Another variable we consider is parity, which is 0 if the child is the first born, and 1 otherwise. The summary table below shows the results of a linear regression model for predicting the average birth weight of babies, measured in ounces, from parity.
Write the equation of the regression line.
avg_birth_weight <- 120.07 - 1.93 * parity
Interpret the slope in this context, and calculate the predicted birth weight of first borns and others.
The slope indicates that babies who are not first born are predicted to weigh, on average, 1.93 ounces less than first-born babies.
First born: parity = 0; the predicted birth weight is 120.07 ounces.
Others: parity = 1; the predicted birth weight is 120.07 - 1.93 = 118.14 ounces.
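As a quick check, the two predictions can be reproduced in R from the coefficients in the regression equation. This is only a minimal sketch; the names intercept, slope, and predicted_weight are introduced here for illustration and do not come from the exercise.

```r
# Coefficients copied from the regression equation above
intercept <- 120.07
slope <- -1.93

# parity = 0 for first borns, 1 for all other babies
parity <- c(first_born = 0, other = 1)

# Predicted average birth weight (ounces) for each group
predicted_weight <- intercept + slope * parity
predicted_weight
```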
Researchers interested in the relationship between absenteeism from school and certain demographic characteristics of children collected data from 146 randomly sampled students in rural New South Wales, Australia, in a particular school year. Below are three observations from this data set.
The summary table below shows the results of a linear regression model for predicting the average number of days absent based on ethnic background (eth: 0 - aboriginal, 1 - not aboriginal), sex (sex: 0 - female, 1 - male), and learner status (lrn: 0 - average learner, 1 - slow learner).
Write the equation of the regression line.
Absenteeism = 18.93 - 9.11 * (ethnic background) + 3.10 * (sex) + 2.15 * (learner status)
Interpret each one of the slopes in this context.
eth: Holding sex and learner status constant, the model predicts 9.11 fewer days absent, on average, for children who are not aboriginal than for children who are aboriginal.
sex: Holding ethnic background and learner status constant, the model predicts 3.10 more days absent, on average, for male children than for female children.
lrn: Holding ethnic background and sex constant, the model predicts 2.15 more days absent, on average, for slow learners than for average learners.
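Each slope is the change in the predicted number of days absent when one indicator switches from 0 to 1 and the other two stay fixed. The sketch below illustrates this with the fitted equation; predict_absent is a hypothetical helper written for this illustration, not part of the exercise.

```r
# Hypothetical helper: predicted days absent from the fitted equation
predict_absent <- function(eth, sex, lrn) {
  18.93 - 9.11 * eth + 3.10 * sex + 2.15 * lrn
}

# Baseline group: aboriginal, female, average learner (all indicators 0)
baseline <- predict_absent(0, 0, 0)

# Switch one indicator at a time and compare against the baseline
slope_check <- c(
  eth = predict_absent(1, 0, 0) - baseline,
  sex = predict_absent(0, 1, 0) - baseline,
  lrn = predict_absent(0, 0, 1) - baseline
)
round(slope_check, 2)
```

Each difference should recover the corresponding coefficient, which is why the slopes are read as group differences with the other predictors held constant.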
eth <- 0 # aboriginal
sex <- 1 # male
lrn <- 1 # slow learner
Absenteeism <- 18.93 - 9.11 * eth + 3.1 * sex + 2.15 * lrn
residual <- 2 - Absenteeism
residual
## [1] -22.18
n <- 146 #observations
k <- 3 #predictor variable
var_residual <- 240.57 #variance of the residuals
var_students <- 264.17 #variance for all students
R2 <- 1 - (var_residual / var_students) #R2
adjusted_R2 <- 1 - (var_residual / var_students) * ( (n-1) / (n-k-1) ) #Adjusted R2
adjusted_R2
## [1] 0.07009704
Exercise 8.4 considers a model that predicts the number of days absent using three predictors: ethnic background (eth), gender (sex), and learner status (lrn). The table below shows the adjusted R-squared for the model as well as adjusted R-squared values for all models we evaluate in the first step of the backwards elimination process. Which, if any, variable should be removed from the model first?
Removing lrn raises the adjusted R2 to 0.0723, which is above the full model's adjusted R2 of about 0.0701, so the lrn variable should be removed from the model first.
On January 28, 1986, a routine launch was anticipated for the Challenger space shuttle. Seventy-three seconds into the flight, disaster happened: the shuttle broke apart, killing all seven crew members on board. An investigation into the cause of the disaster focused on a critical seal called an O-ring, and it is believed that damage to these O-rings during a shuttle launch may be related to the ambient temperature during the launch. The table below summarizes observational data on O-rings for 23 shuttle missions, where the mission order is based on the temperature at the time of the launch. Temp gives the temperature in Fahrenheit, Damaged represents the number of damaged O-rings, and Undamaged represents the number of O-rings that were not damaged.
temperature <- c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)
damaged <- c(5,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0)
undamaged <- c(1,5,5,5,6,6,6,6,6,6,5,6,5,6,6,6,6,5,6,6,6,6,6)
shuttle_mission <- data.frame(temperature, damaged, undamaged)
plot(shuttle_mission)
It looks like missions launched at lower temperatures had more damaged O-rings.
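The pattern can also be summarized numerically. The sketch below computes the proportion of damaged O-rings per mission and compares two temperature groups. The 65 degree cutoff is an assumption chosen only for illustration; it is not part of the exercise.

```r
# Data copied from the exercise table
temperature <- c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)
damaged <- c(5,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0)
undamaged <- c(1,5,5,5,6,6,6,6,6,6,5,6,5,6,6,6,6,5,6,6,6,6,6)

# Proportion of O-rings damaged on each mission
prop_damaged <- damaged / (damaged + undamaged)

# Assumed cutoff of 65 degrees Fahrenheit, used only for this comparison
group <- ifelse(temperature < 65, "below 65F", "65F or above")

# Mean proportion damaged within each temperature group
mean_by_group <- tapply(prop_damaged, group, mean)
round(mean_by_group, 3)
```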
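The key components of the summary table are the intercept and the temperature coefficient. The intercept (11.6630) is the predicted log-odds of O-ring damage when the temperature is 0 degrees Fahrenheit. The temperature coefficient (-0.2162) is the slope: the predicted change in the log-odds of damage for each one-degree increase in temperature.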
$\log_e\left(\frac{p}{1-p}\right) = 11.6630 - 0.2162 \times \text{Temperature}$
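The p-value for temperature is approximately 0, so temperature is a statistically significant predictor of O-ring damage. Because the estimated coefficient is negative, lower launch temperatures are associated with a higher probability of damage, so concerns regarding O-rings are justified.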
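For reference, a model of this form can be fitted in R with glm() by passing the counts of damaged and undamaged O-rings as a two-column response. This is a sketch under the assumption that the summary table in the exercise came from a binomial logistic regression on these 23 missions; if so, the estimates should be close to the coefficients in the equation above.

```r
# Data copied from the exercise table
temperature <- c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)
damaged <- c(5,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0)
undamaged <- c(1,5,5,5,6,6,6,6,6,6,5,6,5,6,6,6,6,5,6,6,6,6,6)

# Binomial logistic regression: successes = damaged, failures = undamaged
fit <- glm(cbind(damaged, undamaged) ~ temperature, family = binomial)

# Estimates, standard errors, z values, and p-values
round(summary(fit)$coefficients, 4)
```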
Exercise 8.16 introduced us to O-rings that were identified as a plausible explanation for the breakup of the Challenger space shuttle 73 seconds into takeoff in 1986. The investigation found that the ambient temperature at the time of the shuttle launch was closely related to the damage of O-rings, which are a critical component of the shuttle. See this earlier exercise if you would like to browse the original data.
temp <- c(51,53,55)
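model <- function(temp)
{
  # Linear predictor: log-odds of damage from the fitted model
  damage <- 11.6630 - 0.2162 * temp
  # Inverse logit converts log-odds to a probability
  p <- exp(damage) / (1 + exp(damage))
  # Return the probability as a percentage, rounded to 2 decimals
  return (round(p*100,2))
}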
model(temp)
## [1] 65.40 55.09 44.32
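The same probabilities can be cross-checked with plogis(), the logistic distribution function in base R's stats package, which computes the inverse logit directly. The names log_odds and prob_percent below are introduced only for this sketch.

```r
# Temperatures of interest (degrees Fahrenheit)
temp <- c(51, 53, 55)

# Log-odds of damage from the fitted model
log_odds <- 11.6630 - 0.2162 * temp

# plogis(x) returns exp(x) / (1 + exp(x)); express the result as a percentage
prob_percent <- round(100 * plogis(log_odds), 2)
prob_percent
```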
library(ggplot2)
temp <- 51:71
model_estimate <- data.frame(temp, probability = model(temp))
ggplot(model_estimate, aes(x = temp, y = probability)) + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
(c) Describe any concerns you may have regarding applying logistic regression in this application, and note any assumptions that are required to accept the model’s validity.
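Logistic regression assumes that the log-odds of damage are linearly related to temperature, but the plotted data do not clearly show such a relationship, and with only 23 missions this assumption is hard to check. The model also assumes that each O-ring outcome is independent of the others; both assumptions must hold for the model to be valid.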