Assignment #8: Multiple and Logistic Regression


Import Libraries

library(DATA606)


Exercise 8.2: Baby Weights, Part II

a.  The regression line equation is:

\(\widehat{babyweight} = 120.07 - 1.93 \times parity\)

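b. The slope, -1.93, means that babies who are not first born (parity = 1) are predicted to weigh 1.93 ounces less than first borns (parity = 0). The predicted birth weight is 120.07 ounces for first borns and 120.07 - 1.93 = 118.14 ounces for others.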
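The two predictions come from plugging parity = 0 (first born) and parity = 1 (not first born) into the fitted line. A minimal R sketch, using only the coefficients quoted above:

```r
# Coefficients of the fitted line: weight = 120.07 - 1.93 * parity
b0 <- 120.07   # intercept (ounces)
b1 <- -1.93    # slope for parity

# parity = 0 for first borns, 1 for all other babies
first_born <- b0 + b1 * 0
others     <- b0 + b1 * 1

first_born   # 120.07
others       # 118.14
```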

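c. No. The p-value for parity is 0.1052, which is greater than 0.05, so we fail to reject the null hypothesis that the slope is zero. The data do not provide convincing evidence of a relationship between parity and average birth weight.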

Exercise 8.4: Absenteeism, Part I

a.  Regression Equation: 

\(\widehat{daysabsent} = 18.93 - 9.11 \times eth + 3.10 \times sex + 2.15 \times lrn\)

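b. 
    1. Slope of *eth*: the model predicts that non-aboriginal students miss 9.11 fewer days of school than aboriginal students, holding sex and learner status constant.
    2. Slope of *sex*: the model predicts that male students miss 3.10 more days of school than female students (that is, lower attendance), holding the other variables constant.
    3. Slope of *lrn*: the model predicts that slow learners miss 2.15 more days of school than average learners, holding the other variables constant.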
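Each slope is the change in predicted days absent when that indicator switches from 0 to 1 while the other two stay fixed. A small R sketch; the helper `predict_days()` is a hypothetical name introduced only for illustration, and the 0/1 codings are the ones implied by the interpretations above:

```r
# Hypothetical helper: predicted days absent from the fitted model
# eth: 0 = aboriginal, 1 = not aboriginal
# sex: 0 = female,     1 = male
# lrn: 0 = average,    1 = slow learner
predict_days <- function(eth, sex, lrn) {
  18.93 - 9.11 * eth + 3.10 * sex + 2.15 * lrn
}

# Switching one indicator while holding the others fixed recovers its slope
predict_days(1, 0, 0) - predict_days(0, 0, 0)  # -9.11 (eth)
predict_days(0, 1, 0) - predict_days(0, 0, 0)  #  3.10 (sex)
predict_days(0, 0, 1) - predict_days(0, 0, 0)  #  2.15 (lrn)
```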

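c. 
    The first observation is an aboriginal (eth = 0), male (sex = 1), slow-learning (lrn = 1) student who missed 2 days of school. Its residual is observed minus predicted, 2 - 24.18 = -22.18, where the predicted value is: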

\(\widehat{daysabsent} = 18.93 - (9.11 \times 0) + (3.10 \times 1) + (2.15 \times 1) = 24.18\)

# Predicted days absent for eth = 0, sex = 1, lrn = 1
daysabsent <- 18.93 - (9.11 * 0) + (3.10 * 1) + (2.15 * 1)
daysabsent
## [1] 24.18
# Residual = observed (2 days) - predicted
residual <- 2 - daysabsent
residual
## [1] -22.18
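d.  R^2 and adjusted R^2, using the residual variance, the variance of days absent, n = 146 students and k = 3 predictors:
variance_e <- 240.57   # variance of the residuals
variance_y <- 264.17   # variance of the outcome (days absent)
n <- 146               # number of observations
k <- 3                 # number of predictors (eth, sex, lrn)

r2 <- 1 - (variance_e / variance_y)
r2
## [1] 0.08933641
adjR2 <- 1 - ((variance_e / variance_y) * (n - 1) / (n - k - 1))
adjR2
## [1] 0.07009704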
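As a cross-check, adjusted R^2 can also be written in terms of R^2 itself, \(R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-k-1}\), because \(1 - R^2\) is exactly the variance ratio used above. A short R sketch comparing the two forms:

```r
variance_e <- 240.57   # variance of the residuals
variance_y <- 264.17   # variance of days absent
n <- 146               # number of students
k <- 3                 # number of predictors (eth, sex, lrn)

r2 <- 1 - variance_e / variance_y

# Form used above: variance ratio scaled by the degrees-of-freedom ratio
adj_direct  <- 1 - (variance_e / variance_y) * (n - 1) / (n - k - 1)
# Equivalent form written in terms of R^2
adj_from_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

all.equal(adj_direct, adj_from_r2)  # TRUE
```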

Exercise 8.8: Absenteeism, Part II

a.  Removing learner status (*lrn*) gives the highest adjusted R^2, 0.0723, which is also higher than the full model's 0.0701, so *lrn* should be removed first.
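The decision rule used here, one step of backward elimination by adjusted R^2, can be sketched in R. The function name `backward_step()` is hypothetical, and the example uses only the full-model value (0.0701) and the no-*lrn* value (0.0723) quoted in this answer; the adjusted R^2 values for dropping the other variables come from the textbook table and are not reproduced here.

```r
# Hypothetical helper: one step of backward elimination by adjusted R^2.
# full_adj:    adjusted R^2 of the current (full) model
# dropped_adj: named vector of adjusted R^2 values, one per model with
#              that variable removed
backward_step <- function(full_adj, dropped_adj) {
  best <- which.max(dropped_adj)
  # Drop a variable only if doing so improves adjusted R^2
  if (dropped_adj[[best]] > full_adj) names(dropped_adj)[best] else NA_character_
}

backward_step(0.0701, c(lrn = 0.0723))  # "lrn"
```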

Exercise 8.16: Challenger disaster, Part I

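a.  Missions launched at lower temperatures tended to have more damaged O-rings: the coldest launch (53 degrees) had 5 of its 6 O-rings damaged, while most launches at 66 degrees or warmer had none.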
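# Launch temperature for each of the 23 missions
temperature <- c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)

# Number of the 6 O-rings on each mission that were damaged
damaged <- c(5,1,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0)

# Number of the 6 O-rings on each mission that were undamaged
undamaged <- c(1,5,5,5,6,6,6,6,6,6,5,6,5,6,6,6,6,5,6,6,6,6,6)

shuttleMission <- data.frame(temperature, damaged, undamaged)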

summary(shuttleMission)
##   temperature       damaged         undamaged    
##  Min.   :53.00   Min.   :0.0000   Min.   :1.000  
##  1st Qu.:67.00   1st Qu.:0.0000   1st Qu.:5.000  
##  Median :70.00   Median :0.0000   Median :6.000  
##  Mean   :69.57   Mean   :0.4783   Mean   :5.522  
##  3rd Qu.:75.00   3rd Qu.:1.0000   3rd Qu.:6.000  
##  Max.   :81.00   Max.   :5.0000   Max.   :6.000
plot(shuttleMission)  # scatterplot matrix; the temperature vs. damaged panel shows the pattern

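b. The intercept, 11.6630, is the estimated log-odds (not the probability) that an O-ring is damaged when the temperature is 0 degrees, an extrapolation far below the coldest observed launch (53 degrees). The slope, -0.2162, means that each 1-degree increase in temperature lowers the log-odds of O-ring damage by 0.2162; equivalently, each 1-degree drop raises the log-odds by 0.2162.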
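Because the model is on the log-odds scale, exponentiating the slope gives the multiplicative change in the odds of damage per degree. A short R sketch using the coefficient above:

```r
slope <- -0.2162   # temperature coefficient (change in log-odds per degree)

# Odds of damage are multiplied by exp(slope) for each 1-degree increase
exp(slope)          # about 0.806
# A 10-degree drop multiplies the odds of damage by exp(-10 * slope)
exp(-10 * slope)    # about 8.69
```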

c. 

\(\log_e\left(\frac{p}{1-p}\right) = 11.6630 - 0.2162 \times temperature\), where \(p\) is the probability that an O-ring is damaged.
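The coefficients in this equation can be reproduced with `glm()` by treating each mission as 6 binomial trials (damaged vs. undamaged O-rings). A sketch, assuming the exercise's table was fitted on exactly the 23 missions entered above; the rounded coefficients should come out close to 11.6630 and -0.2162:

```r
# Challenger data: launch temperature and O-ring counts (6 per mission)
shuttleMission <- data.frame(
  temperature = c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                  70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81),
  damaged     = c(5, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0,
                  1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0)
)
shuttleMission$undamaged <- 6 - shuttleMission$damaged

# Binomial logistic regression: response is (damaged, undamaged) per mission
fit <- glm(cbind(damaged, undamaged) ~ temperature,
           family = binomial, data = shuttleMission)

round(coef(fit), 4)
```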

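d.  Yes. The temperature coefficient has a very small p-value, so there is strong evidence that lower launch temperatures are associated with a higher probability of O-ring damage. Because O-rings are critical components of the shuttle, concerns about launching at low temperatures are justified.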

Exercise 8.18: Challenger disaster, Part II

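a.  

If \(\hat{p}\) is the probability that an O-ring is damaged, then \(\log_e\left(\frac{\hat{p}}{1-\hat{p}}\right) = 11.6630 - 0.2162 \times temperature\), which solves to \(\hat{p} = \frac{e^{11.6630 - 0.2162 \times temperature}}{1 + e^{11.6630 - 0.2162 \times temperature}}\).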

# General form: phat(temp) = exp(11.6630 - 0.2162 * temp) / (1 + exp(11.6630 - 0.2162 * temp))

phat51 <- exp(11.6630 - 0.2162 * 51) / (1 + exp(11.6630 - 0.2162 * 51))

phat53 <- exp(11.6630 - 0.2162 * 53) / (1 + exp(11.6630 - 0.2162 * 53))

phat55 <- exp(11.6630 - 0.2162 * 55) / (1 + exp(11.6630 - 0.2162 * 55))

phat51
## [1] 0.6540297
phat53
## [1] 0.5509228
phat55
## [1] 0.4432456

\(\widehat{p_{51}}\) = 0.6540297

\(\widehat{p_{53}}\) = 0.5509228

\(\widehat{p_{55}}\) = 0.4432456
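The inverse-logit step above can be written more compactly with base R's `plogis()`, the logistic CDF, which computes \(e^x/(1+e^x)\) and is vectorized over temperatures. A sketch that reproduces the three probabilities:

```r
# Linear predictor (log-odds of damage) from the fitted model
log_odds <- function(temp) 11.6630 - 0.2162 * temp

temps <- c(51, 53, 55)
# plogis() is the inverse of the logit, mapping log-odds to probabilities
phat <- plogis(log_odds(temps))
round(phat, 4)  # 0.6540 0.5509 0.4432
```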

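b. 
library(ggplot2)
# Proportion of each mission's O-rings damaged, with a fitted logistic curve; the binomial
# family goes through method.args and the weight aesthetic is the number of O-rings per mission
ggplot(shuttleMission, aes(x = temperature, y = damaged / (damaged + undamaged))) +
  geom_point() +
  stat_smooth(aes(weight = damaged + undamaged),
              method = "glm", method.args = list(family = "binomial")) +
  labs(y = "Proportion of O-rings damaged")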

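c. 
    Logistic regression assumes that (1) each outcome is independent of the other outcomes and (2) the predictor, temperature, is linearly related to the log-odds of damage. Independence is questionable here because the six O-rings on a mission share the same launch conditions, so their outcomes may be related. The data are also limited to 23 missions, and predictions at 51 degrees, as in part (a), extrapolate below the coldest observed launch (53 degrees).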