Exercise 1:

liver <- read.csv("C:/Users/lacyb/Documents/2021 Fall/STA 6443 - DA Algorithms I/liver.csv", header = TRUE)
str(liver)

## 'data.frame':    558 obs. of  10 variables:
##  $ Age         : int  65 62 62 58 72 46 26 29 55 57 ...
##  $ Gender      : chr  "Female" "Male" "Male" "Male" ...
##  $ TB          : num  0.7 10.9 7.3 1 3.9 1.8 0.9 0.9 0.7 0.6 ...
##  $ DB          : num  0.1 5.5 4.1 0.4 2 0.7 0.2 0.3 0.2 0.1 ...
##  $ Alkphos     : int  187 699 490 182 195 208 154 202 290 210 ...
##  $ Alamine     : int  16 64 60 14 27 19 16 14 53 51 ...
##  $ Aspartate   : int  18 100 68 20 59 14 12 11 58 59 ...
##  $ TP          : num  6.8 7.5 7 6.8 7.3 7.6 7 6.7 6.8 5.9 ...
##  $ ALB         : num  3.3 3.2 3.3 3.4 2.4 4.4 3.5 3.6 3.4 2.7 ...
##  $ LiverPatient: int  1 1 1 1 1 1 1 1 1 1 ...

Ex 1 Part A

Female: Fit Logistic Regression Model

liver_Female = liver[which(liver$Gender == "Female"),]
glm.null.F = glm(LiverPatient ~ 1, data = liver_Female, family = "binomial")
glm.full.F = glm(LiverPatient ~ Age + TB + DB + Alkphos + Alamine + Aspartate + TP + ALB, data = liver_Female, family = "binomial")

Stepwise Selection with AIC Criteria

step.1 = step(glm.null.F, scope = list(upper = glm.full.F), direction = "both", test = "Chisq", trace = F)
summary(step.1)

## 
## Call:
## glm(formula = LiverPatient ~ DB + Aspartate, family = "binomial", 
##     data = liver_Female)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8178  -1.2223   0.4402   1.1091   1.2049  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -0.32480    0.31013  -1.047   0.2950  
## DB           0.94479    0.55808   1.693   0.0905 .
## Aspartate    0.01106    0.00616   1.796   0.0726 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 175.72  on 134  degrees of freedom
## Residual deviance: 154.27  on 132  degrees of freedom
## AIC: 160.27
## 
## Number of Fisher Scoring iterations: 7

Based on Results

DB & Aspartate are sinificant predictors since they have p-values lower than 0.1

Ex 1 Part B

Estimates:

DB has p-value of 0.0905 (below 0.1), so there is a significant relationship between DB and whether a female patient has active liver disease.Aspartate has p-value of 0.726 (below 0.1), so there is a significant relationship between Aspartate and whether a female patient has active liver disease.

Hosmer-Lemeshow Test

hoslem.test(step.1$y, fitted(step.1), g = 10)

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  step.1$y, fitted(step.1)
## X-squared = 7.7535, df = 8, p-value = 0.4579

Based on Results

The Hosmer-Lemeshow test gave a p-value of 0.4579 which is above 0.1 Therefore, we do not reject the null and the model is adequate

Residual Plots

resid.d = residuals(step.1, type = "deviance")
resid.p = residuals(step.1, type = "pearson")
std.resid.d = residuals(step.1, type = "deviance")/sqrt(1 - hatvalues(step.1))
std.resid.p = residuals(step.1, type = "pearson")/sqrt(1 - hatvalues(step.1))

par(mfrow = c(1,2))
plot(std.resid.d[step.1$model$LiverPatient==0], col = "blue", 
     ylim = c(-3.5,3.5), ylab = "Standard Deviation Residuals", xlab = "ID")
points(std.resid.d[step.1$model$LiverPatient==1], col = "red")

plot(std.resid.p[step.1$model$LiverPatient==0], col = "red", 
     ylim = c(-3.5,3.5), ylab = "Standard Pearson Residuals", xlab = "ID")
points(std.resid.p[step.1$model$LiverPatient==1], col = "blue")

Based on Results

Residual plots show a parallel pattern - this means there are similar estimated probabilities. This pattern is due to data feature - not a violation of assumptions. Plot points fall between 0 and 1 for Red and -2 and -1 for Blue. There are no significantly large values, therefore the Bernoulli assumption is valid - there also is no systematic pattern meaning it does not violate the linearity assumption.

Cook’s Distance - Influence Diagnostics

plot(step.1, which = 4, id.n = 5)

inf.id.1 = which(cooks.distance(step.1)>0.25)
inf.id.1

## named integer(0)

Based on Results

Identified Issue: There are no observations with a Cook’s Distance value greater than 0.25

Ex 1 Part C

Final Model

summary(step.1)

## 
## Call:
## glm(formula = LiverPatient ~ DB + Aspartate, family = "binomial", 
##     data = liver_Female)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8178  -1.2223   0.4402   1.1091   1.2049  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -0.32480    0.31013  -1.047   0.2950  
## DB           0.94479    0.55808   1.693   0.0905 .
## Aspartate    0.01106    0.00616   1.796   0.0726 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 175.72  on 134  degrees of freedom
## Residual deviance: 154.27  on 132  degrees of freedom
## AIC: 160.27
## 
## Number of Fisher Scoring iterations: 7

Based on Results

Final Model: Log(p/1-p) = -0.32480 + 0.94479 * DB + 0.01106 * Aspartate

Odds Ratio

round(exp(step.1$coefficients), 3)

Based on Results

Odds of an adult female as a liver patient with active liver disease: Increases by exp(0.94479) = 2.572 with a one-unit increase in DB (Aspartate held constant) Same odds: Increase by exp(0.01106) = 1.011 with a one-unit increase in Aspartate (DB held constant) Conclusion: An adult female with high levels of DB and Aspartate is more likely to be a liver patient with active liver disease.

Exercise 2:

Alpha = 0.1

Ex 2 Part A

Male: Fit Logistic Regression Model

liver_Male = liver[which(liver$Gender == "Male"),]
glm.null.M = glm(LiverPatient ~ 1, data = liver_Male, family = "binomial")
glm.full.M = glm(LiverPatient ~ Age + TB + DB + Alkphos + Alamine + Aspartate + TP + ALB, data = liver_Male, family = "binomial")

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Stepwise Selection with AIC Criteria

step.2 = step(glm.null.M, scope = list(upper = glm.full.M), direction = "both", test = "Chisq", trace = F)

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

summary(step.2)

## 
## Call:
## glm(formula = LiverPatient ~ DB + Alamine + Age + Alkphos, family = "binomial", 
##     data = liver_Male)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.3405  -0.5170   0.3978   0.8614   1.3756  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -1.476570   0.481336  -3.068  0.00216 **
## DB           0.512503   0.176066   2.911  0.00360 **
## Alamine      0.016218   0.005239   3.095  0.00197 **
## Age          0.020616   0.008095   2.547  0.01087 * 
## Alkphos      0.001740   0.001058   1.645  0.09992 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 476.28  on 422  degrees of freedom
## Residual deviance: 395.05  on 418  degrees of freedom
## AIC: 405.05
## 
## Number of Fisher Scoring iterations: 7

Based on Results

DB, Alamine, Age, & Alkphos are significant predictors since they have p-values lower than 0.1

Ex 2 Part B

Estimates

All the p-values for DB, Alamine, Age, & Alkphos are below 0.1, so they all have a siginficant relationship with whether a male patient has active liver disease.

Hosmer-Lemeshow Test

hoslem.test(step.2$y, fitted(step.2), g = 10)

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  step.2$y, fitted(step.2)
## X-squared = 7.043, df = 8, p-value = 0.532

Based on Results

The above test resulted in a p-value of 0.532 (above 0.1), so we do NOT reject the null.This means the model is adequate.

Residual Plots

resid.d.2 = residuals(step.2, type = "deviance")
resid.p.2 = residuals(step.2, type = "pearson")
std.resid.d.2 = residuals(step.2, type = "deviance")/sqrt(1 - hatvalues(step.2))
std.resid.p.2 = residuals(step.2, type = "pearson")/sqrt(1 - hatvalues(step.2))

par(mfrow = c(1,2))
plot(std.resid.d.2[step.2$model$LiverPatient==0], col = "blue",
     ylim = c(-3.5,3.5), ylab = "Standard Deviation Residuals", xlab = "ID")
points(std.resid.d.2[step.2$model$LiverPatient==1], col = "red")

plot(std.resid.p.2[step.2$model$LiverPatient==0], col = "blue",
     ylim = c(-3.5,3.5), ylab = "Standard Pearson Residuals", xlab = "ID")
points(std.resid.d.2[step.2$model$LiverPatient==1], col = "red")

Based on Results

Residual plots show a parallel pattern - this means there are similar estimated probabilities. This pattern is due to data feature - not a violation of assumptions. Plot points fall between 0 and 1 for Red and -2 and -1 for Blue. There are a set of Blue points outside the range. There are no significantly large values, therefore the Bernoulli assumption is valid - there also is no systematic pattern meaning it does not violate the linearity assumption.

Cook’s Distance - Influence Diagnostics

plot(step.2, which = 4, id.n = 5)

inf.id.2 = which(cooks.distance(step.2)>0.25)
inf.id.2

## 111 
##  86

Based on Results

Identified Issue: Observations 111 and 86 have a value larger than 0.25

Refitted Model (Excluding Obs 111 & 86)

glm.liver.final.2 = glm(LiverPatient ~ DB + Alamine + Age + Alkphos, data = liver_Male[-inf.id.2, ], family = "binomial")

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Ex 2 Part C

Final Model

summary(glm.liver.final.2)

## 
## Call:
## glm(formula = LiverPatient ~ DB + Alamine + Age + Alkphos, family = "binomial", 
##     data = liver_Male[-inf.id.2, ])
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.5166   0.0000   0.3301   0.8648   1.4696  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.902754   0.527386  -3.608 0.000309 ***
## DB           0.573104   0.198893   2.881 0.003958 ** 
## Alamine      0.015850   0.005466   2.900 0.003737 ** 
## Age          0.020418   0.008210   2.487 0.012883 *  
## Alkphos      0.003744   0.001477   2.534 0.011262 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 473.51  on 421  degrees of freedom
## Residual deviance: 381.31  on 417  degrees of freedom
## AIC: 391.31
## 
## Number of Fisher Scoring iterations: 8

Based on Results

Final Model: Log(p/1-p) = -1.902754 + 0.573104 * DB + 0.015850 * Alamine + 0.020418 * Age +0.003744 * Alkphos

Odds Ratio

round(exp(glm.liver.final.2$coefficients), 3)

## (Intercept)          DB     Alamine         Age     Alkphos 
##       0.149       1.774       1.016       1.021       1.004

Based on Results

Odds of an adult male as a liver patient with active liver disease: Increase by exp(0.573104) = 1.774 with a one-unit increase in DB (all others constant) Same odds: Increase by exp(0.015850) = 1.016 with a one-unit increase in Alamine (all others constant) Same odds: Increase by exp(0.020418) = 1.021 with a one-unit increase in Age (all others constant) Same odds: Increase by exp(0.003744) = 1.004 with a one-unit increase in Alkphos (all others constant) Conclusion: Ad adult male with high levels of DB, Alamine, Alkphos, and an older Age is more likely to be a liver patient with active liver disease.

Ex 2 Part D

How Models Differ

Females have fewer significant predictors (2) while Males have 4 significant predictors. Both of these increase the chance of the disease but the chances are further increased with twice the predictors.

Exercise 3:

Alpha = 0.1

Data Structure

sleep <- read.csv("C:/Users/lacyb/Documents/2021 Fall/STA 6443 - DA Algorithms I/sleep.csv", header = TRUE)
str(sleep)

## 'data.frame':    51 obs. of  8 variables:
##  $ species           : chr  "African" "African" "Arctic F" "Asian el" ...
##  $ bodyweight        : num  6654 1 3.38 2547 10.55 ...
##  $ brainweight       : num  5712 6.6 44.5 4603 179.5 ...
##  $ totalsleep        : num  3.3 8.3 12.5 3.9 9.8 19.7 6.2 14.5 9.7 12.5 ...
##  $ gestationtime     : num  645 42 60 624 180 35 392 63 230 112 ...
##  $ predationindex    : int  3 3 1 3 4 1 4 1 1 5 ...
##  $ sleepexposureindex: int  5 1 1 5 4 1 5 2 1 4 ...
##  $ maxlife10         : int  1 0 1 1 1 1 1 1 1 0 ...

Ex 3 Part A

Fit Logistic Regression Model

glm.null.sleep = glm(maxlife10 ~ 1, data = sleep, family = "binomial")
glm.full.sleep = glm(maxlife10 ~ bodyweight + brainweight + totalsleep + gestationtime + as.factor(predationindex) + as.factor(sleepexposureindex),
                     data = sleep, family = "binomial")

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Stepwise Selection with AIC Criteria

step.sleep = step(glm.null.sleep, scope = list(upper=glm.full.sleep),
                  direction = "both", test = "Chisq", trace = F)

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

summary(step.sleep)

## 
## Call:
## glm(formula = maxlife10 ~ brainweight + totalsleep + as.factor(sleepexposureindex) + 
##     as.factor(predationindex), family = "binomial", data = sleep)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.42528  -0.00004   0.00000   0.00013   2.37523  
## 
## Coefficients:
##                                  Estimate Std. Error z value Pr(>|z|)  
## (Intercept)                    -6.602e+00  4.864e+00  -1.357   0.1747  
## brainweight                     5.101e-02  5.084e-02   1.003   0.3157  
## totalsleep                      4.230e-01  2.647e-01   1.598   0.1100  
## as.factor(sleepexposureindex)2  4.998e+00  2.559e+00   1.953   0.0508 .
## as.factor(sleepexposureindex)3  3.636e+01  9.624e+03   0.004   0.9970  
## as.factor(sleepexposureindex)4  3.370e+01  1.037e+04   0.003   0.9974  
## as.factor(sleepexposureindex)5  7.341e+01  1.262e+04   0.006   0.9954  
## as.factor(predationindex)2     -2.535e+00  1.960e+00  -1.293   0.1960  
## as.factor(predationindex)3     -2.512e+01  1.253e+04  -0.002   0.9984  
## as.factor(predationindex)4     -1.826e+01  6.795e+03  -0.003   0.9979  
## as.factor(predationindex)5     -5.264e+01  1.143e+04  -0.005   0.9963  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 68.31  on 50  degrees of freedom
## Residual deviance: 15.88  on 40  degrees of freedom
## AIC: 37.88
## 
## Number of Fisher Scoring iterations: 20

Based on Results

sleepexposureindex is the only significant predictor - it has a p-value lower than 0.1

Ex 3 Part B

Estimates

sleepexposureindex Levels 1,3,4, and 5 have no effect on the probability of an event. Only Level 2 has a significant different probability of an event - even so, sleepexposureindex is a sigificant predictor.

Hosmer-Lemeshow Test

hoslem.test(step.sleep$y, fitted(step.sleep), g=10)

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  step.sleep$y, fitted(step.sleep)
## X-squared = 7.0397, df = 8, p-value = 0.5324

Based on Results

The above test produced a p-value greater than 0.1 - therefore, we do NOT reject the null & the model is adequate.

Residual Plots

resid.d.sleep = residuals(step.sleep, type = "deviance")
resid.p.sleep = residuals(step.sleep, type = "pearson")
std.resid.d.sleep = residuals(step.sleep, type = "deviance")/sqrt(1 - hatvalues(step.sleep))
std.resid.p.sleep = residuals(step.sleep, type = "pearson")/sqrt(1 - hatvalues(step.sleep))

par(mfrow = c(1,2))
plot(std.resid.d.sleep[step.sleep$model$maxlife10==0], col = "blue", 
     ylim = c(-3.5,3.5), ylab = "Standard Deviation Residuals", xlab = "ID")
points(std.resid.d.sleep[step.sleep$model$maxlife10==1], col = "red")

plot(std.resid.p.sleep[step.sleep$model$maxlife10==0], col = "blue", 
     ylim = c(-3.5,3.5), ylab = "Standard Pearson Residuals", xlab = "ID")
points(std.resid.p.sleep[step.sleep$model$maxlife10==1], col = "red")

Based on Results

All Red points are between 0 and 1; all Blue points are between -2 and 0; some do overlap There are no significantly large values, therefore the Bernoulli assumption is valid - there also is no systematic pattern meaning it does not violate the linearity assumption.

Cook’s Distance - Influence Diagnostics

plot(step.sleep, which = 4, id.n = 5)

inf.id.3 = which(cooks.distance(step.sleep)>0.25)
inf.id.3

## 35 40 
## 35 40

Based on Results

Identified Issue: Observations 35 and 40 have a value larger than 0.25

Refitted Model (Excluding Obs 35 & 40)

glm.sleep.final = glm(maxlife10 ~ brainweight + totalsleep + as.factor(predationindex) + as.factor(sleepexposureindex),
                     data = sleep[-inf.id.3], family = "binomial")

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Ex 3 Part C

Final Model

summary(glm.sleep.final)

## 
## Call:
## glm(formula = maxlife10 ~ brainweight + totalsleep + as.factor(predationindex) + 
##     as.factor(sleepexposureindex), family = "binomial", data = sleep[-inf.id.3])
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.42528  -0.00004   0.00000   0.00013   2.37523  
## 
## Coefficients:
##                                  Estimate Std. Error z value Pr(>|z|)  
## (Intercept)                    -6.602e+00  4.864e+00  -1.357   0.1747  
## brainweight                     5.101e-02  5.084e-02   1.003   0.3157  
## totalsleep                      4.230e-01  2.647e-01   1.598   0.1100  
## as.factor(predationindex)2     -2.535e+00  1.960e+00  -1.293   0.1960  
## as.factor(predationindex)3     -2.512e+01  1.253e+04  -0.002   0.9984  
## as.factor(predationindex)4     -1.826e+01  6.795e+03  -0.003   0.9979  
## as.factor(predationindex)5     -5.264e+01  1.143e+04  -0.005   0.9963  
## as.factor(sleepexposureindex)2  4.998e+00  2.559e+00   1.953   0.0508 .
## as.factor(sleepexposureindex)3  3.636e+01  9.624e+03   0.004   0.9970  
## as.factor(sleepexposureindex)4  3.370e+01  1.037e+04   0.003   0.9974  
## as.factor(sleepexposureindex)5  7.341e+01  1.262e+04   0.006   0.9954  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 68.31  on 50  degrees of freedom
## Residual deviance: 15.88  on 40  degrees of freedom
## AIC: 37.88
## 
## Number of Fisher Scoring iterations: 20

Odds Ratio

round(exp(glm.sleep.final$coefficients), 3)

##                    (Intercept)                    brainweight 
##                   1.000000e-03                   1.052000e+00 
##                     totalsleep     as.factor(predationindex)2 
##                   1.527000e+00                   7.900000e-02 
##     as.factor(predationindex)3     as.factor(predationindex)4 
##                   0.000000e+00                   0.000000e+00 
##     as.factor(predationindex)5 as.factor(sleepexposureindex)2 
##                   0.000000e+00                   1.480500e+02 
## as.factor(sleepexposureindex)3 as.factor(sleepexposureindex)4 
##                   6.173141e+15                   4.332708e+14 
## as.factor(sleepexposureindex)5 
##                   7.603846e+31

Based on Results

odds(sleepexposureindex = 2)/odds(sleepexposureindex = 1) = exp(4.998e+00) = 1.480500e+02 No need to evaluation insignificant levels (1,3,4,5) Odds that a species’ max life span would be at least 10 years is exp(4.998e+00) = 1.480500e+02 times for the sleepexposureindex Level 2 group Therefore, animals in the second-best well-protected den have a higher probability of living more than 10 years.

Exercise 4:

Alpha = 0.1

Ex 4 Part A

Fit Logistic Regression Model

glm.null.sleep.2 = glm(maxlife10 ~ 1, data = sleep, family = "binomial")
glm.full.sleep.2 = glm(maxlife10 ~ bodyweight + brainweight + totalsleep + gestationtime + predationindex + sleepexposureindex, data = sleep, family = "binomial")

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Stepwise Selection with AIC Criteria

step.sleep.2 = step(glm.null.sleep.2, scope = list(upper=glm.full.sleep.2),
                    direction = "both", test = "Chisq", trace = F)

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

summary(step.sleep.2)

## 
## Call:
## glm(formula = maxlife10 ~ brainweight + totalsleep + sleepexposureindex + 
##     predationindex, family = "binomial", data = sleep)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.82148  -0.04746   0.00000   0.05811   2.41681  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)  
## (Intercept)        -6.16387    3.59301  -1.716   0.0863 .
## brainweight         0.06018    0.03544   1.698   0.0895 .
## totalsleep          0.35985    0.20995   1.714   0.0865 .
## sleepexposureindex  4.42111    1.97540   2.238   0.0252 *
## predationindex     -3.36917    1.51823  -2.219   0.0265 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 68.310  on 50  degrees of freedom
## Residual deviance: 19.212  on 46  degrees of freedom
## AIC: 29.212
## 
## Number of Fisher Scoring iterations: 11

Based on Results

Based on the results from above, the significant predictors are brainweight, totalsleep, sleepexposureindex, and predationindex because they have a p-value less than 0.1

Ex 4 Part B

Estimates

All four significant predictors have p-values less than 0.1 therefore there is significant relationship between them and and whether a species’ max life span execeds 10 years

Hosmer-Lemeshow Test

hoslem.test(step.sleep.2$y, fitted(step.sleep.2), g=10)

## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  step.sleep.2$y, fitted(step.sleep.2)
## X-squared = 1.4406, df = 8, p-value = 0.9937

Based on Results

The above test produced a p-value greater than 0.1. Therefore we do NOT reject the null and the model is adequate.

Residual Plots

resid.d.sleep2 = residuals(step.sleep.2, type = "deviance")
resid.p.sleep2 = residuals(step.sleep.2, type = "pearson")
std.resid.d.sleep2 = residuals(step.sleep.2, type = "deviance")/sqrt(1 - hatvalues(step.sleep.2))
std.resid.p.sleep2 = residuals(step.sleep.2, type = "pearson")/sqrt(1 - hatvalues(step.sleep.2))

par(mfrow=c(1,2))
plot(std.resid.d.sleep2[step.sleep.2$model$maxlife10==0], col = "blue",
     ylim = c(-3.5,3.5), ylab = "Standard Deviation Residuals", xlab = "ID")
points(std.resid.d.sleep2[step.sleep.2$model$maxlife10==1], col = "red")

plot(std.resid.p.sleep2[step.sleep.2$model$maxlife10==0], col = "blue",
     ylim = c(-3.5,3.5), ylab = "Standard Pearson Residuals", xlab = "ID")
points(std.resid.p.sleep2[step.sleep.2$model$maxlife10==1], col = "red")

Based on Results

Blue plots fall between 0 and 1 | Red plots fall between -2 and 0. There is one red point outside the range.There are some overlap between the points. There are no significantly large values, therefore the Bernoulli assumption is valid - there also is no systematic pattern meaning it does not violate the linearity assumption.

Cook’s Distance - Influence Diagnostics

plot(step.sleep.2, which = 4, id.n = 5)

inf.id.4 = which(cooks.distance(step.sleep.2)>0.25)
inf.id.4

## 10 35 40 50 
## 10 35 40 50

Based on Results

Identified Issue: Observations 10, 35, 40, and 50 have a value larger than 0.25

Refitted Model (Excluding Obs 10, 35, 40, & 50)

glm.sleep.final2 = glm(maxlife10 ~ brainweight + totalsleep + predationindex + sleepexposureindex,
                       data = sleep[-inf.id.4], family = "binomial")

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

Ex 4 Part C

Final Model

summary(glm.sleep.final2)

## 
## Call:
## glm(formula = maxlife10 ~ brainweight + totalsleep + predationindex + 
##     sleepexposureindex, family = "binomial", data = sleep[-inf.id.4])
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.82148  -0.04746   0.00000   0.05811   2.41681  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)  
## (Intercept)        -6.16387    3.59301  -1.716   0.0863 .
## brainweight         0.06018    0.03544   1.698   0.0895 .
## totalsleep          0.35985    0.20995   1.714   0.0865 .
## predationindex     -3.36917    1.51823  -2.219   0.0265 *
## sleepexposureindex  4.42111    1.97540   2.238   0.0252 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 68.310  on 50  degrees of freedom
## Residual deviance: 19.212  on 46  degrees of freedom
## AIC: 29.212
## 
## Number of Fisher Scoring iterations: 11

Based on Results

Final Model: Log(p/1-p) = -6.16387 + 0.06018 * brainweight + 0.35985 * totalsleep + 4.42111 * sleepexposureindex + -3.36917 * predationindex

Odds Ratio

round(exp(glm.sleep.final2$coefficients),3)

##        (Intercept)        brainweight         totalsleep     predationindex 
##              0.002              1.062              1.433              0.034 
## sleepexposureindex 
##             83.188

Based on Results

Odds that a species’ lifespan would exceed 10 years: Increase by exp(0.06018) = 1.062 with a one-unit increase in brainweight (all others constant) Same odds: Increase by exp(0.35985) = 1.433 with a one-unit increase in totalsleep (all others constant) Same odds: Increase by exp(-3.36917) = 0.034 with a one-unit increase in predationindex (all others constant) Same odds: Increase by exp(4.42111) = 83.188 with a one-unit increase in sleepexposureindex (all others constant) Conclusion: A species that has a higher brainweight, totalsleep, lower predationindex, and sleepexposureindex is more likely to live at least 10 years.

STA 6443 HW 4

Lacy Burke

12/2/2021

Exercise 1:

Ex 1 Part A

Female: Fit Logistic Regression Model

Stepwise Selection with AIC Criteria

Based on Results

Ex 1 Part B

Estimates:

Hosmer-Lemeshow Test

Based on Results

Residual Plots

Based on Results

Cook’s Distance - Influence Diagnostics

Based on Results

Ex 1 Part C

Final Model

Based on Results

Odds Ratio

Based on Results

Exercise 2:

Alpha = 0.1

Ex 2 Part A

Male: Fit Logistic Regression Model

Stepwise Selection with AIC Criteria

Based on Results

Ex 2 Part B

Estimates

Hosmer-Lemeshow Test

Based on Results

Residual Plots

Based on Results

Cook’s Distance - Influence Diagnostics

Based on Results

Refitted Model (Excluding Obs 111 & 86)

Ex 2 Part C

Final Model

Based on Results

Odds Ratio

Based on Results

Ex 2 Part D

How Models Differ

Exercise 3:

Alpha = 0.1

Data Structure

Ex 3 Part A

Fit Logistic Regression Model

Stepwise Selection with AIC Criteria

Based on Results

Ex 3 Part B

Estimates

Hosmer-Lemeshow Test

Based on Results

Residual Plots

Based on Results

Cook’s Distance - Influence Diagnostics

Based on Results

Refitted Model (Excluding Obs 35 & 40)

Ex 3 Part C

Final Model

Odds Ratio

Based on Results

Exercise 4:

Alpha = 0.1

Ex 4 Part A

Fit Logistic Regression Model

Stepwise Selection with AIC Criteria

Based on Results

Ex 4 Part B

Estimates

Hosmer-Lemeshow Test

Based on Results

Residual Plots

Based on Results

Cook’s Distance - Influence Diagnostics

Based on Results

Refitted Model (Excluding Obs 10, 35, 40, & 50)

Ex 4 Part C

Final Model