liver <- read.csv("C:/Users/lacyb/Documents/2021 Fall/STA 6443 - DA Algorithms I/liver.csv", header = TRUE)
str(liver)
## 'data.frame': 558 obs. of 10 variables:
## $ Age : int 65 62 62 58 72 46 26 29 55 57 ...
## $ Gender : chr "Female" "Male" "Male" "Male" ...
## $ TB : num 0.7 10.9 7.3 1 3.9 1.8 0.9 0.9 0.7 0.6 ...
## $ DB : num 0.1 5.5 4.1 0.4 2 0.7 0.2 0.3 0.2 0.1 ...
## $ Alkphos : int 187 699 490 182 195 208 154 202 290 210 ...
## $ Alamine : int 16 64 60 14 27 19 16 14 53 51 ...
## $ Aspartate : int 18 100 68 20 59 14 12 11 58 59 ...
## $ TP : num 6.8 7.5 7 6.8 7.3 7.6 7 6.7 6.8 5.9 ...
## $ ALB : num 3.3 3.2 3.3 3.4 2.4 4.4 3.5 3.6 3.4 2.7 ...
## $ LiverPatient: int 1 1 1 1 1 1 1 1 1 1 ...
liver_Female = liver[which(liver$Gender == "Female"),]
glm.null.F = glm(LiverPatient ~ 1, data = liver_Female, family = "binomial")
glm.full.F = glm(LiverPatient ~ Age + TB + DB + Alkphos + Alamine + Aspartate + TP + ALB, data = liver_Female, family = "binomial")
step.1 = step(glm.null.F, scope = list(upper = glm.full.F), direction = "both", test = "Chisq", trace = F)
summary(step.1)
##
## Call:
## glm(formula = LiverPatient ~ DB + Aspartate, family = "binomial",
## data = liver_Female)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8178 -1.2223 0.4402 1.1091 1.2049
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.32480 0.31013 -1.047 0.2950
## DB 0.94479 0.55808 1.693 0.0905 .
## Aspartate 0.01106 0.00616 1.796 0.0726 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 175.72 on 134 degrees of freedom
## Residual deviance: 154.27 on 132 degrees of freedom
## AIC: 160.27
##
## Number of Fisher Scoring iterations: 7
DB & Aspartate are sinificant predictors since they have p-values lower than 0.1
DB has p-value of 0.0905 (below 0.1), so there is a significant relationship between DB and whether a female patient has active liver disease.Aspartate has p-value of 0.726 (below 0.1), so there is a significant relationship between Aspartate and whether a female patient has active liver disease.
hoslem.test(step.1$y, fitted(step.1), g = 10)
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: step.1$y, fitted(step.1)
## X-squared = 7.7535, df = 8, p-value = 0.4579
The Hosmer-Lemeshow test gave a p-value of 0.4579 which is above 0.1 Therefore, we do not reject the null and the model is adequate
resid.d = residuals(step.1, type = "deviance")
resid.p = residuals(step.1, type = "pearson")
std.resid.d = residuals(step.1, type = "deviance")/sqrt(1 - hatvalues(step.1))
std.resid.p = residuals(step.1, type = "pearson")/sqrt(1 - hatvalues(step.1))
par(mfrow = c(1,2))
plot(std.resid.d[step.1$model$LiverPatient==0], col = "blue",
ylim = c(-3.5,3.5), ylab = "Standard Deviation Residuals", xlab = "ID")
points(std.resid.d[step.1$model$LiverPatient==1], col = "red")
plot(std.resid.p[step.1$model$LiverPatient==0], col = "red",
ylim = c(-3.5,3.5), ylab = "Standard Pearson Residuals", xlab = "ID")
points(std.resid.p[step.1$model$LiverPatient==1], col = "blue")
Residual plots show a parallel pattern - this means there are similar estimated probabilities. This pattern is due to data feature - not a violation of assumptions. Plot points fall between 0 and 1 for Red and -2 and -1 for Blue. There are no significantly large values, therefore the Bernoulli assumption is valid - there also is no systematic pattern meaning it does not violate the linearity assumption.
plot(step.1, which = 4, id.n = 5)
inf.id.1 = which(cooks.distance(step.1)>0.25)
inf.id.1
## named integer(0)
Identified Issue: There are no observations with a Cook’s Distance value greater than 0.25
summary(step.1)
##
## Call:
## glm(formula = LiverPatient ~ DB + Aspartate, family = "binomial",
## data = liver_Female)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8178 -1.2223 0.4402 1.1091 1.2049
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.32480 0.31013 -1.047 0.2950
## DB 0.94479 0.55808 1.693 0.0905 .
## Aspartate 0.01106 0.00616 1.796 0.0726 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 175.72 on 134 degrees of freedom
## Residual deviance: 154.27 on 132 degrees of freedom
## AIC: 160.27
##
## Number of Fisher Scoring iterations: 7
Final Model: Log(p/1-p) = -0.32480 + 0.94479 * DB + 0.01106 * Aspartate
round(exp(step.1$coefficients), 3)
Odds of an adult female as a liver patient with active liver disease: Increases by exp(0.94479) = 2.572 with a one-unit increase in DB (Aspartate held constant) Same odds: Increase by exp(0.01106) = 1.011 with a one-unit increase in Aspartate (DB held constant) Conclusion: An adult female with high levels of DB and Aspartate is more likely to be a liver patient with active liver disease.
liver_Male = liver[which(liver$Gender == "Male"),]
glm.null.M = glm(LiverPatient ~ 1, data = liver_Male, family = "binomial")
glm.full.M = glm(LiverPatient ~ Age + TB + DB + Alkphos + Alamine + Aspartate + TP + ALB, data = liver_Male, family = "binomial")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
step.2 = step(glm.null.M, scope = list(upper = glm.full.M), direction = "both", test = "Chisq", trace = F)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(step.2)
##
## Call:
## glm(formula = LiverPatient ~ DB + Alamine + Age + Alkphos, family = "binomial",
## data = liver_Male)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.3405 -0.5170 0.3978 0.8614 1.3756
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.476570 0.481336 -3.068 0.00216 **
## DB 0.512503 0.176066 2.911 0.00360 **
## Alamine 0.016218 0.005239 3.095 0.00197 **
## Age 0.020616 0.008095 2.547 0.01087 *
## Alkphos 0.001740 0.001058 1.645 0.09992 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 476.28 on 422 degrees of freedom
## Residual deviance: 395.05 on 418 degrees of freedom
## AIC: 405.05
##
## Number of Fisher Scoring iterations: 7
DB, Alamine, Age, & Alkphos are significant predictors since they have p-values lower than 0.1
All the p-values for DB, Alamine, Age, & Alkphos are below 0.1, so they all have a siginficant relationship with whether a male patient has active liver disease.
hoslem.test(step.2$y, fitted(step.2), g = 10)
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: step.2$y, fitted(step.2)
## X-squared = 7.043, df = 8, p-value = 0.532
The above test resulted in a p-value of 0.532 (above 0.1), so we do NOT reject the null.This means the model is adequate.
resid.d.2 = residuals(step.2, type = "deviance")
resid.p.2 = residuals(step.2, type = "pearson")
std.resid.d.2 = residuals(step.2, type = "deviance")/sqrt(1 - hatvalues(step.2))
std.resid.p.2 = residuals(step.2, type = "pearson")/sqrt(1 - hatvalues(step.2))
par(mfrow = c(1,2))
plot(std.resid.d.2[step.2$model$LiverPatient==0], col = "blue",
ylim = c(-3.5,3.5), ylab = "Standard Deviation Residuals", xlab = "ID")
points(std.resid.d.2[step.2$model$LiverPatient==1], col = "red")
plot(std.resid.p.2[step.2$model$LiverPatient==0], col = "blue",
ylim = c(-3.5,3.5), ylab = "Standard Pearson Residuals", xlab = "ID")
points(std.resid.d.2[step.2$model$LiverPatient==1], col = "red")
Residual plots show a parallel pattern - this means there are similar estimated probabilities. This pattern is due to data feature - not a violation of assumptions. Plot points fall between 0 and 1 for Red and -2 and -1 for Blue. There are a set of Blue points outside the range. There are no significantly large values, therefore the Bernoulli assumption is valid - there also is no systematic pattern meaning it does not violate the linearity assumption.
plot(step.2, which = 4, id.n = 5)
inf.id.2 = which(cooks.distance(step.2)>0.25)
inf.id.2
## 111
## 86
Identified Issue: Observations 111 and 86 have a value larger than 0.25
glm.liver.final.2 = glm(LiverPatient ~ DB + Alamine + Age + Alkphos, data = liver_Male[-inf.id.2, ], family = "binomial")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(glm.liver.final.2)
##
## Call:
## glm(formula = LiverPatient ~ DB + Alamine + Age + Alkphos, family = "binomial",
## data = liver_Male[-inf.id.2, ])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.5166 0.0000 0.3301 0.8648 1.4696
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.902754 0.527386 -3.608 0.000309 ***
## DB 0.573104 0.198893 2.881 0.003958 **
## Alamine 0.015850 0.005466 2.900 0.003737 **
## Age 0.020418 0.008210 2.487 0.012883 *
## Alkphos 0.003744 0.001477 2.534 0.011262 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 473.51 on 421 degrees of freedom
## Residual deviance: 381.31 on 417 degrees of freedom
## AIC: 391.31
##
## Number of Fisher Scoring iterations: 8
Final Model: Log(p/1-p) = -1.902754 + 0.573104 * DB + 0.015850 * Alamine + 0.020418 * Age +0.003744 * Alkphos
round(exp(glm.liver.final.2$coefficients), 3)
## (Intercept) DB Alamine Age Alkphos
## 0.149 1.774 1.016 1.021 1.004
Odds of an adult male as a liver patient with active liver disease: Increase by exp(0.573104) = 1.774 with a one-unit increase in DB (all others constant) Same odds: Increase by exp(0.015850) = 1.016 with a one-unit increase in Alamine (all others constant) Same odds: Increase by exp(0.020418) = 1.021 with a one-unit increase in Age (all others constant) Same odds: Increase by exp(0.003744) = 1.004 with a one-unit increase in Alkphos (all others constant) Conclusion: Ad adult male with high levels of DB, Alamine, Alkphos, and an older Age is more likely to be a liver patient with active liver disease.
Females have fewer significant predictors (2) while Males have 4 significant predictors. Both of these increase the chance of the disease but the chances are further increased with twice the predictors.
sleep <- read.csv("C:/Users/lacyb/Documents/2021 Fall/STA 6443 - DA Algorithms I/sleep.csv", header = TRUE)
str(sleep)
## 'data.frame': 51 obs. of 8 variables:
## $ species : chr "African" "African" "Arctic F" "Asian el" ...
## $ bodyweight : num 6654 1 3.38 2547 10.55 ...
## $ brainweight : num 5712 6.6 44.5 4603 179.5 ...
## $ totalsleep : num 3.3 8.3 12.5 3.9 9.8 19.7 6.2 14.5 9.7 12.5 ...
## $ gestationtime : num 645 42 60 624 180 35 392 63 230 112 ...
## $ predationindex : int 3 3 1 3 4 1 4 1 1 5 ...
## $ sleepexposureindex: int 5 1 1 5 4 1 5 2 1 4 ...
## $ maxlife10 : int 1 0 1 1 1 1 1 1 1 0 ...
glm.null.sleep = glm(maxlife10 ~ 1, data = sleep, family = "binomial")
glm.full.sleep = glm(maxlife10 ~ bodyweight + brainweight + totalsleep + gestationtime + as.factor(predationindex) + as.factor(sleepexposureindex),
data = sleep, family = "binomial")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
step.sleep = step(glm.null.sleep, scope = list(upper=glm.full.sleep),
direction = "both", test = "Chisq", trace = F)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(step.sleep)
##
## Call:
## glm(formula = maxlife10 ~ brainweight + totalsleep + as.factor(sleepexposureindex) +
## as.factor(predationindex), family = "binomial", data = sleep)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.42528 -0.00004 0.00000 0.00013 2.37523
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.602e+00 4.864e+00 -1.357 0.1747
## brainweight 5.101e-02 5.084e-02 1.003 0.3157
## totalsleep 4.230e-01 2.647e-01 1.598 0.1100
## as.factor(sleepexposureindex)2 4.998e+00 2.559e+00 1.953 0.0508 .
## as.factor(sleepexposureindex)3 3.636e+01 9.624e+03 0.004 0.9970
## as.factor(sleepexposureindex)4 3.370e+01 1.037e+04 0.003 0.9974
## as.factor(sleepexposureindex)5 7.341e+01 1.262e+04 0.006 0.9954
## as.factor(predationindex)2 -2.535e+00 1.960e+00 -1.293 0.1960
## as.factor(predationindex)3 -2.512e+01 1.253e+04 -0.002 0.9984
## as.factor(predationindex)4 -1.826e+01 6.795e+03 -0.003 0.9979
## as.factor(predationindex)5 -5.264e+01 1.143e+04 -0.005 0.9963
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 68.31 on 50 degrees of freedom
## Residual deviance: 15.88 on 40 degrees of freedom
## AIC: 37.88
##
## Number of Fisher Scoring iterations: 20
sleepexposureindex is the only significant predictor - it has a p-value lower than 0.1
sleepexposureindex Levels 1,3,4, and 5 have no effect on the probability of an event. Only Level 2 has a significant different probability of an event - even so, sleepexposureindex is a sigificant predictor.
hoslem.test(step.sleep$y, fitted(step.sleep), g=10)
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: step.sleep$y, fitted(step.sleep)
## X-squared = 7.0397, df = 8, p-value = 0.5324
The above test produced a p-value greater than 0.1 - therefore, we do NOT reject the null & the model is adequate.
resid.d.sleep = residuals(step.sleep, type = "deviance")
resid.p.sleep = residuals(step.sleep, type = "pearson")
std.resid.d.sleep = residuals(step.sleep, type = "deviance")/sqrt(1 - hatvalues(step.sleep))
std.resid.p.sleep = residuals(step.sleep, type = "pearson")/sqrt(1 - hatvalues(step.sleep))
par(mfrow = c(1,2))
plot(std.resid.d.sleep[step.sleep$model$maxlife10==0], col = "blue",
ylim = c(-3.5,3.5), ylab = "Standard Deviation Residuals", xlab = "ID")
points(std.resid.d.sleep[step.sleep$model$maxlife10==1], col = "red")
plot(std.resid.p.sleep[step.sleep$model$maxlife10==0], col = "blue",
ylim = c(-3.5,3.5), ylab = "Standard Pearson Residuals", xlab = "ID")
points(std.resid.p.sleep[step.sleep$model$maxlife10==1], col = "red")
All Red points are between 0 and 1; all Blue points are between -2 and 0; some do overlap There are no significantly large values, therefore the Bernoulli assumption is valid - there also is no systematic pattern meaning it does not violate the linearity assumption.
plot(step.sleep, which = 4, id.n = 5)
inf.id.3 = which(cooks.distance(step.sleep)>0.25)
inf.id.3
## 35 40
## 35 40
Identified Issue: Observations 35 and 40 have a value larger than 0.25
glm.sleep.final = glm(maxlife10 ~ brainweight + totalsleep + as.factor(predationindex) + as.factor(sleepexposureindex),
data = sleep[-inf.id.3], family = "binomial")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(glm.sleep.final)
##
## Call:
## glm(formula = maxlife10 ~ brainweight + totalsleep + as.factor(predationindex) +
## as.factor(sleepexposureindex), family = "binomial", data = sleep[-inf.id.3])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.42528 -0.00004 0.00000 0.00013 2.37523
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.602e+00 4.864e+00 -1.357 0.1747
## brainweight 5.101e-02 5.084e-02 1.003 0.3157
## totalsleep 4.230e-01 2.647e-01 1.598 0.1100
## as.factor(predationindex)2 -2.535e+00 1.960e+00 -1.293 0.1960
## as.factor(predationindex)3 -2.512e+01 1.253e+04 -0.002 0.9984
## as.factor(predationindex)4 -1.826e+01 6.795e+03 -0.003 0.9979
## as.factor(predationindex)5 -5.264e+01 1.143e+04 -0.005 0.9963
## as.factor(sleepexposureindex)2 4.998e+00 2.559e+00 1.953 0.0508 .
## as.factor(sleepexposureindex)3 3.636e+01 9.624e+03 0.004 0.9970
## as.factor(sleepexposureindex)4 3.370e+01 1.037e+04 0.003 0.9974
## as.factor(sleepexposureindex)5 7.341e+01 1.262e+04 0.006 0.9954
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 68.31 on 50 degrees of freedom
## Residual deviance: 15.88 on 40 degrees of freedom
## AIC: 37.88
##
## Number of Fisher Scoring iterations: 20
round(exp(glm.sleep.final$coefficients), 3)
## (Intercept) brainweight
## 1.000000e-03 1.052000e+00
## totalsleep as.factor(predationindex)2
## 1.527000e+00 7.900000e-02
## as.factor(predationindex)3 as.factor(predationindex)4
## 0.000000e+00 0.000000e+00
## as.factor(predationindex)5 as.factor(sleepexposureindex)2
## 0.000000e+00 1.480500e+02
## as.factor(sleepexposureindex)3 as.factor(sleepexposureindex)4
## 6.173141e+15 4.332708e+14
## as.factor(sleepexposureindex)5
## 7.603846e+31
odds(sleepexposureindex = 2)/odds(sleepexposureindex = 1) = exp(4.998e+00) = 1.480500e+02 No need to evaluation insignificant levels (1,3,4,5) Odds that a species’ max life span would be at least 10 years is exp(4.998e+00) = 1.480500e+02 times for the sleepexposureindex Level 2 group Therefore, animals in the second-best well-protected den have a higher probability of living more than 10 years.
glm.null.sleep.2 = glm(maxlife10 ~ 1, data = sleep, family = "binomial")
glm.full.sleep.2 = glm(maxlife10 ~ bodyweight + brainweight + totalsleep + gestationtime + predationindex + sleepexposureindex, data = sleep, family = "binomial")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
step.sleep.2 = step(glm.null.sleep.2, scope = list(upper=glm.full.sleep.2),
direction = "both", test = "Chisq", trace = F)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(step.sleep.2)
##
## Call:
## glm(formula = maxlife10 ~ brainweight + totalsleep + sleepexposureindex +
## predationindex, family = "binomial", data = sleep)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.82148 -0.04746 0.00000 0.05811 2.41681
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.16387 3.59301 -1.716 0.0863 .
## brainweight 0.06018 0.03544 1.698 0.0895 .
## totalsleep 0.35985 0.20995 1.714 0.0865 .
## sleepexposureindex 4.42111 1.97540 2.238 0.0252 *
## predationindex -3.36917 1.51823 -2.219 0.0265 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 68.310 on 50 degrees of freedom
## Residual deviance: 19.212 on 46 degrees of freedom
## AIC: 29.212
##
## Number of Fisher Scoring iterations: 11
Based on the results from above, the significant predictors are brainweight, totalsleep, sleepexposureindex, and predationindex because they have a p-value less than 0.1
All four significant predictors have p-values less than 0.1 therefore there is significant relationship between them and and whether a species’ max life span execeds 10 years
hoslem.test(step.sleep.2$y, fitted(step.sleep.2), g=10)
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: step.sleep.2$y, fitted(step.sleep.2)
## X-squared = 1.4406, df = 8, p-value = 0.9937
The above test produced a p-value greater than 0.1. Therefore we do NOT reject the null and the model is adequate.
resid.d.sleep2 = residuals(step.sleep.2, type = "deviance")
resid.p.sleep2 = residuals(step.sleep.2, type = "pearson")
std.resid.d.sleep2 = residuals(step.sleep.2, type = "deviance")/sqrt(1 - hatvalues(step.sleep.2))
std.resid.p.sleep2 = residuals(step.sleep.2, type = "pearson")/sqrt(1 - hatvalues(step.sleep.2))
par(mfrow=c(1,2))
plot(std.resid.d.sleep2[step.sleep.2$model$maxlife10==0], col = "blue",
ylim = c(-3.5,3.5), ylab = "Standard Deviation Residuals", xlab = "ID")
points(std.resid.d.sleep2[step.sleep.2$model$maxlife10==1], col = "red")
plot(std.resid.p.sleep2[step.sleep.2$model$maxlife10==0], col = "blue",
ylim = c(-3.5,3.5), ylab = "Standard Pearson Residuals", xlab = "ID")
points(std.resid.p.sleep2[step.sleep.2$model$maxlife10==1], col = "red")
Blue plots fall between 0 and 1 | Red plots fall between -2 and 0. There is one red point outside the range.There are some overlap between the points. There are no significantly large values, therefore the Bernoulli assumption is valid - there also is no systematic pattern meaning it does not violate the linearity assumption.
plot(step.sleep.2, which = 4, id.n = 5)
inf.id.4 = which(cooks.distance(step.sleep.2)>0.25)
inf.id.4
## 10 35 40 50
## 10 35 40 50
Identified Issue: Observations 10, 35, 40, and 50 have a value larger than 0.25
glm.sleep.final2 = glm(maxlife10 ~ brainweight + totalsleep + predationindex + sleepexposureindex,
data = sleep[-inf.id.4], family = "binomial")
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(glm.sleep.final2)
##
## Call:
## glm(formula = maxlife10 ~ brainweight + totalsleep + predationindex +
## sleepexposureindex, family = "binomial", data = sleep[-inf.id.4])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.82148 -0.04746 0.00000 0.05811 2.41681
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.16387 3.59301 -1.716 0.0863 .
## brainweight 0.06018 0.03544 1.698 0.0895 .
## totalsleep 0.35985 0.20995 1.714 0.0865 .
## predationindex -3.36917 1.51823 -2.219 0.0265 *
## sleepexposureindex 4.42111 1.97540 2.238 0.0252 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 68.310 on 50 degrees of freedom
## Residual deviance: 19.212 on 46 degrees of freedom
## AIC: 29.212
##
## Number of Fisher Scoring iterations: 11
Final Model: Log(p/1-p) = -6.16387 + 0.06018 * brainweight + 0.35985 * totalsleep + 4.42111 * sleepexposureindex + -3.36917 * predationindex
round(exp(glm.sleep.final2$coefficients),3)
## (Intercept) brainweight totalsleep predationindex
## 0.002 1.062 1.433 0.034
## sleepexposureindex
## 83.188
Odds that a species’ lifespan would exceed 10 years: Increase by exp(0.06018) = 1.062 with a one-unit increase in brainweight (all others constant) Same odds: Increase by exp(0.35985) = 1.433 with a one-unit increase in totalsleep (all others constant) Same odds: Increase by exp(-3.36917) = 0.034 with a one-unit increase in predationindex (all others constant) Same odds: Increase by exp(4.42111) = 83.188 with a one-unit increase in sleepexposureindex (all others constant) Conclusion: A species that has a higher brainweight, totalsleep, lower predationindex, and sleepexposureindex is more likely to live at least 10 years.