Practice: 8.1, 8.3, 8.7, 8.15, 8.17
Graded: 8.2, 8.4, 8.8, 8.16, 8.18

8.1

  1. predicted baby_weight = 123.05 - 8.94 * smoke
  2. The slope is the estimated difference in average birth weight between babies born to mothers who smoke and babies born to mothers who do not: babies of smokers are predicted to weigh 8.94 oz less. Smokers: 123.05 - 8.94(1) = 114.11 oz. Nonsmokers: 123.05 - 8.94(0) = 123.05 oz.
  3. H0: β1 = 0; HA: β1 ≠ 0. T = -8.65, p-value ≈ 0. The p-value is very small, so we reject H0. The data provide strong evidence that smoking is associated with lower birth weight.
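The fitted model in part 1 can be written as a small R function. The name `predict_bwt_smoke()` is a hypothetical helper introduced only for illustration. It reproduces the two predicted weights above and shows that their difference is exactly the slope.

```r
# Hypothetical helper: predicted birth weight (oz) from the fitted model
# predicted baby_weight = 123.05 - 8.94 * smoke (smoke = 1 for smokers, 0 otherwise)
predict_bwt_smoke <- function(smoke) {
  123.05 - 8.94 * smoke
}

nonsmoker <- predict_bwt_smoke(0)  # 123.05
smoker <- predict_bwt_smoke(1)     # 114.11

# The gap between the two groups equals the slope coefficient
smoker - nonsmoker                 # -8.94
```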

8.2

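  1. predicted birth_weight = 120.07 - 1.93 * parity
  2. The slope is the estimated difference in average birth weight between babies who are not first-born and first-born babies: babies who are not first-born are predicted to weigh 1.93 oz less. First-born: 120.07 - 1.93(0) = 120.07 oz. Not first-born: 120.07 - 1.93(1) = 118.14 oz.
  3. H0: β1 = 0; HA: β1 ≠ 0. T = -1.62, p-value = 0.1052.

The p-value is not small (0.1052 > 0.05), so we fail to reject H0. The data do not provide statistically significant evidence of a relationship between parity and average birth weight.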
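Here is a minimal R sketch of part 2 and of the decision in part 3. `predict_bwt_parity()` is a hypothetical helper. The 5% significance level is an assumption, since it is the conventional choice and the exercise does not state one.

```r
# Hypothetical helper for the fitted model: 120.07 - 1.93 * parity
# parity = 0 for first-born babies, 1 otherwise
predict_bwt_parity <- function(parity) 120.07 - 1.93 * parity

first_born <- predict_bwt_parity(0)  # 120.07
later_born <- predict_bwt_parity(1)  # 118.14

# Decision rule, assuming the conventional 5% significance level
p_value <- 0.1052
alpha <- 0.05
reject_h0 <- p_value < alpha         # FALSE: fail to reject H0
```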

8.3

  1. predicted baby_weight = -80.41 + 0.44 * gestation - 3.33 * parity - 0.01 * age + 1.15 * height + 0.05 * weight - 8.40 * smoke
  2. β_gestation: holding the other variables constant, the model predicts a 0.44 oz increase in birth weight for each additional day of pregnancy. β_age: holding the other variables constant, the model predicts a 0.01 oz decrease in birth weight for each additional year of the mother's age.
  3. Parity is likely correlated with one or more of the other predictors in the new model (collinearity), so its estimated effect changes once those variables are accounted for.
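The "holding the other variables constant" reading in part 2 can be checked directly. The sketch below compares two predictions that differ in only one input. `predict_bwt_full()` is a hypothetical helper wrapping the fitted equation from part 1. The other inputs are borrowed from the first observation used below.

```r
# Hypothetical helper: prediction (oz) from the full model in part 1
predict_bwt_full <- function(gestation, parity, age, height, weight, smoke) {
  -80.41 + 0.44 * gestation - 3.33 * parity - 0.01 * age +
    1.15 * height + 0.05 * weight - 8.40 * smoke
}

bwt_base <- predict_bwt_full(284, 0, 27, 62, 100, 0)       # 120.58

# Same mother except one extra day of gestation
plus_one_day <- predict_bwt_full(285, 0, 27, 62, 100, 0)
gestation_effect <- plus_one_day - bwt_base                # 0.44 oz

# Same mother except one year older
one_year_older <- predict_bwt_full(284, 0, 28, 62, 100, 0)
age_effect <- one_year_older - bwt_base                    # -0.01 oz
```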
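# First observation in the data set
gestation <- 284
parity <- 0
age <- 27
height <- 62
weight <- 100
smoke <- 0
bwt <- 120   # observed birth weight (oz)

# Predicted birth weight from the full model
bwt_hat <- -80.41 + 0.44 * gestation - 3.33 * parity - 0.01 * age + 1.15 * height + 0.05 * weight - 8.40 * smoke

# Residual = observed - predicted
residual <- bwt - bwt_hat
residual
## [1] -0.58

The residual is negative, so the model overpredicts this baby's weight by about 0.58 oz.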

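n <- 1236                # number of observations
k <- 6                   # number of predictors
varresiduals <- 249.28   # variance of the residuals
varbabies <- 332.57      # variance of birth weight
r2 <- 1 - (varresiduals / varbabies)
adjr2 <- 1 - (varresiduals / varbabies) * ((n - 1) / (n - k - 1))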
r2
## [1] 0.2504435
adjr2
## [1] 0.2467842
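The same two formulas are used again in 8.4, so they can be wrapped in one function. `r2_from_variances()` is a hypothetical helper name. It takes the residual variance, the outcome variance, the sample size n and the number of predictors k.

```r
# Hypothetical helper: R^2 and adjusted R^2 from summary variances
r2_from_variances <- function(var_resid, var_y, n, k) {
  r2 <- 1 - var_resid / var_y
  adj_r2 <- 1 - (var_resid / var_y) * (n - 1) / (n - k - 1)
  c(r2 = r2, adj_r2 = adj_r2)
}

r2_from_variances(249.28, 332.57, n = 1236, k = 6)  # 8.3: baby weights
r2_from_variances(240.57, 264.17, n = 146, k = 3)   # 8.4: absenteeism
```

The adjustment factor (n - 1) / (n - k - 1) matters little when n is large relative to k, as in 8.3. It matters more with the smaller sample in 8.4.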

8.4

  1. Ŷ = 18.93 - 9.11 * eth + 3.10 * sex + 2.15 * lrn
  2. β_eth: the model predicts that non-aboriginal students miss 9.11 fewer days than aboriginal students, all else being equal. β_sex: the model predicts that male students miss 3.10 more days than female students, all else being equal. β_lrn: the model predicts that slow learners miss 2.15 more days than average learners, all else being equal.
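Because all three predictors are 0/1 indicators, the model produces only eight distinct predictions. The sketch below lists them with `expand.grid()`. The variable coding follows the interpretations in part 2: eth = 1 for non-aboriginal, sex = 1 for male, lrn = 1 for slow learner.

```r
# All eight combinations of the three indicator variables
groups <- expand.grid(eth = 0:1, sex = 0:1, lrn = 0:1)

# Predicted days absent from the fitted model
groups$days_hat <- 18.93 - 9.11 * groups$eth + 3.10 * groups$sex +
  2.15 * groups$lrn
groups
```

The largest prediction, 24.18 days, belongs to an aboriginal male slow learner, the same profile as the first observation below.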
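# First observation in the data set
eth <- 0
sex <- 1
lrn <- 1
days <- 2   # observed days absent

# Predicted days absent
Y <- 18.93 - 9.11 * eth + 3.10 * sex + 2.15 * lrn
Y
## [1] 24.18

# Residual = observed - predicted
residual <- days - Y
residual
## [1] -22.18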
n <- 146
k <- 3
varresidual <- 240.57
varstudent <- 264.17
r2 <- 1 - (varresidual/varstudent)
adjr2 <- 1 - (varresidual / varstudent) * ( (n-1) / (n-k-1) )
r2
## [1] 0.08933641
adjr2
## [1] 0.07009704

8.7

Remove age. Dropping it gives the model with the highest adjusted R², an improvement over the full model.

8.8

Remove learner status. Dropping it gives the model with the highest adjusted R², an improvement over the full model.
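The rule used in 8.7 and 8.8 is one step of backward elimination by adjusted R². Here is a sketch of that step with `lm()`. The data are simulated and hypothetical, because the exercise data are not reproduced here. `backward_step()` is also a hypothetical helper name.

```r
# Hypothetical simulated data, for illustration only:
# y depends on x1 and x2, while x3 is pure noise
set.seed(1)
n_sim <- 200
sim <- data.frame(x1 = rnorm(n_sim), x2 = rnorm(n_sim), x3 = rnorm(n_sim))
sim$y <- 2 + 1.5 * sim$x1 - 0.8 * sim$x2 + rnorm(n_sim)

# Hypothetical helper: refit the model without each predictor in turn and
# suggest dropping the one whose removal gives the highest adjusted R^2,
# but only if that beats the full model; otherwise return NA (stop)
backward_step <- function(data, response, predictors) {
  adj_r2_of <- function(terms) {
    fit <- lm(reformulate(terms, response = response), data = data)
    summary(fit)$adj.r.squared
  }
  full <- adj_r2_of(predictors)
  reduced <- sapply(predictors, function(dropped) {
    adj_r2_of(setdiff(predictors, dropped))
  })
  best <- names(which.max(reduced))
  list(full = full, reduced = reduced,
       drop = if (reduced[[best]] > full) best else NA)
}

step1 <- backward_step(sim, "y", c("x1", "x2", "x3"))
step1
```

Repeating the step on the remaining predictors until `drop` is `NA` gives the full backward-elimination procedure.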

8.15

  1. total_length has some outliers.
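One common way to flag candidate outliers is the 1.5 × IQR rule. The sketch below applies it to made-up numbers, since the possum data are not reproduced here. `flag_outliers()` is a hypothetical helper. It could be applied to each predictor in turn, for example a `total_length` column. `boxplot()` computes its fences from `fivenum()` hinges, so its flags can differ slightly from this quantile-based version.

```r
# Hypothetical helper: values more than 1.5 * IQR beyond the quartiles
flag_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x[x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr]
}

flag_outliers(c(1:10, 100))  # 100
```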

8.16

  1. There appears to be an association between colder temperatures and damaged O-rings. At 53 degrees, 5 O-rings were damaged, the most of any mission.
  2. The temperature coefficient is -0.2162. Each 1 degree increase in temperature is associated with a 0.2162 decrease in the log-odds of an O-ring being damaged, which means the odds are multiplied by about e^(-0.2162) ≈ 0.81.
  3. logit(p) = log(p / (1 - p)) = 11.6630 - 0.2162 * temperature
  4. The p-value for temperature is very close to zero, so temperature is a statistically significant predictor of O-ring damage. The concerns about O-rings at low temperatures were justified.
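A logistic slope acts multiplicatively on the odds rather than additively on the probability. This short R sketch converts the temperature coefficient from the model above into odds ratios.

```r
# Temperature slope of the fitted logistic model (log-odds scale)
b_temp <- -0.2162

# Multiplicative change in the odds of O-ring damage per 1 degree F warmer
odds_ratio_1deg <- exp(b_temp)        # about 0.806

# ... and per 10 degrees F warmer
odds_ratio_10deg <- exp(10 * b_temp)  # about 0.115
```

So each additional degree lowers the odds of damage by roughly 19%. A 10 degree drop multiplies the odds by about 1 / 0.115 ≈ 8.7.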

8.17

  1. logit(p) = log(p / (1 - p)) = 33.5095 - 1.4207 * sex_male - 0.2787 * skull_width + 0.5687 * total_length - 1.8057 * tail_length. Total length is the only variable positively associated with p.
sex_male <- 1
skull_width <- 63
tail_length <- 37
total_length <- 83

# Linear predictor (log-odds) from the fitted model
logit_p <- 33.5095 - 1.4207 * sex_male - 0.2787 * skull_width + 0.5687 * total_length - 1.8057 * tail_length
logit_p
## [1] -5.0781

# Convert log-odds to a probability
p <- exp(logit_p) / (1 + exp(logit_p))
p
## [1] 0.006193144

The model's estimated probability is very low, about 0.0062 (less than 1%).
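The hand-written inverse logit above is the same as R's `plogis()`. `plogis()` also avoids the `Inf / Inf` problem that `exp(x) / (1 + exp(x))` hits for very large x. `possum_prob()` is a hypothetical helper wrapping the model from part 1.

```r
# Hypothetical helper: estimated probability from the fitted possum model
possum_prob <- function(sex_male, skull_width, total_length, tail_length) {
  eta <- 33.5095 - 1.4207 * sex_male - 0.2787 * skull_width +
    0.5687 * total_length - 1.8057 * tail_length
  plogis(eta)  # equals exp(eta) / (1 + exp(eta))
}

possum_prob(sex_male = 1, skull_width = 63, total_length = 83, tail_length = 37)
```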

8.18

  1. The model gives the estimated failure probability at temperature t as p = exp(11.6630 - 0.2162 * t) / (1 + exp(11.6630 - 0.2162 * t)). Evaluating it at 51, 53 and 55 degrees:
temperature51 <- 51
log51 <- 11.6630 - 0.2162 * temperature51
failprob51 <- exp(log51)/(1 + exp(log51))
failprob51
## [1] 0.6540297
temperature53 <- 53
log53 <- 11.6630 - 0.2162 * temperature53
failprob53 <- exp(log53)/(1 + exp(log53))
failprob53
## [1] 0.5509228
temperature55 <- 55
log55 <- 11.6630 - 0.2162 * temperature55
failprob55 <- exp(log55)/(1 + exp(log55))
failprob55
## [1] 0.4432456
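# Estimated failure probability across the range of temperatures
temps <- 51:81
predicted_prob <- exp(11.6630 - 0.2162 * temps) / (1 + exp(11.6630 - 0.2162 * temps))
dtemp <- data.frame(temps, predicted_prob)
plot(dtemp$temps, dtemp$predicted_prob, type = "b",
     xlab = "Temperature (degrees F)", ylab = "Estimated probability of O-ring damage")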
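The estimated probability crosses 0.5 exactly where the log-odds are zero. That point is t = 11.6630 / 0.2162 ≈ 53.9 degrees F. This agrees with part 1: the probability is about 0.55 at 53 degrees and about 0.44 at 55 degrees. A minimal R check:

```r
b0 <- 11.6630
b1 <- -0.2162

# logit(p) = b0 + b1 * t equals 0 (p = 0.5) when t = -b0 / b1
t_half <- -b0 / b1           # about 53.9 degrees F
plogis(b0 + b1 * t_half)     # 0.5
```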

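c. The sample size is very small, only 23 missions. The outcomes appear to be independent. With so few observations, however, I do not believe the conditions for logistic regression (independent outcomes and a linear relationship between temperature and logit(p)) are convincingly met.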