## 'data.frame': 558 obs. of 10 variables:
## $ Age : int 65 62 62 58 72 46 26 29 55 57 ...
## $ Gender : chr "Female" "Male" "Male" "Male" ...
## $ TB : num 0.7 10.9 7.3 1 3.9 1.8 0.9 0.9 0.7 0.6 ...
## $ DB : num 0.1 5.5 4.1 0.4 2 0.7 0.2 0.3 0.2 0.1 ...
## $ Alkphos : int 187 699 490 182 195 208 154 202 290 210 ...
## $ Alamine : int 16 64 60 14 27 19 16 14 53 51 ...
## $ Aspartate : int 18 100 68 20 59 14 12 11 58 59 ...
## $ TP : num 6.8 7.5 7 6.8 7.3 7.6 7 6.7 6.8 5.9 ...
## $ ALB : num 3.3 3.2 3.3 3.4 2.4 4.4 3.5 3.6 3.4 2.7 ...
## $ LiverPatient: int 1 1 1 1 1 1 1 1 1 1 ...
For the female subset, stepwise selection with the AIC criterion, run from both the full model and the null model, selects the same two predictors: DB and Aspartate.
##
## Call:
## glm(formula = LiverPatient ~ DB + Aspartate, family = "binomial",
## data = liverFemale)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8178 -1.2223 0.4402 1.1091 1.2049
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.32480 0.31013 -1.047 0.2950
## DB 0.94479 0.55808 1.693 0.0905 .
## Aspartate 0.01106 0.00616 1.796 0.0726 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 175.72 on 134 degrees of freedom
## Residual deviance: 154.27 on 132 degrees of freedom
## AIC: 160.27
##
## Number of Fisher Scoring iterations: 7
##
## Call:
## glm(formula = LiverPatient ~ DB + Aspartate, family = binomial,
## data = liverFemale)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8178 -1.2223 0.4402 1.1091 1.2049
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.32480 0.31013 -1.047 0.2950
## DB 0.94479 0.55808 1.693 0.0905 .
## Aspartate 0.01106 0.00616 1.796 0.0726 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 175.72 on 134 degrees of freedom
## Residual deviance: 154.27 on 132 degrees of freedom
## AIC: 160.27
##
## Number of Fisher Scoring iterations: 7
Both parameters are significant, since their p-values (0.0905 for DB and 0.0726 for Aspartate) are below our significance level of 0.1. To check goodness of fit we run the Hosmer-Lemeshow test, which returns a p-value of 0.46. This is larger than our significance level, so we fail to reject the null hypothesis and conclude that the model fit is adequate. Checking Cook's distance, no observation is flagged, so there are no obvious influential points in the new dataset. The residual plot shows no clear pattern, only a few outliers. I don’t think they are severe enough to matter, so we can still assume that the model is a good fit.
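The significance check can be reproduced by hand from the estimates and standard errors in the coefficient table. The following is a minimal sketch, written in Python purely as a calculator; the helper name `wald_p_value` is illustrative and not part of the original analysis. It assumes the reported p-values come from a two-sided Wald z test, which is what the `z value` and `Pr(>|z|)` columns indicate.

```python
import math

def wald_p_value(estimate, std_error):
    """Return the z statistic and two-sided p-value for one coefficient."""
    z = estimate / std_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal
    return z, math.erfc(abs(z) / math.sqrt(2))

# Estimates and standard errors copied from the female model summary
coefficients = {"DB": (0.94479, 0.55808), "Aspartate": (0.01106, 0.00616)}
alpha = 0.1

results = {}
for name, (est, se) in coefficients.items():
    z, p = wald_p_value(est, se)
    results[name] = (z, p)
    print(f"{name}: z = {z:.3f}, p = {p:.4f}, significant at {alpha}: {p < alpha}")
```

Both p-values land below 0.1 but above 0.05, so the conclusion depends on the chosen significance level.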
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: glm.New.Female$y, fitted(glm.New.Female)
## X-squared = 7.7535, df = 8, p-value = 0.4579
## [1] Age Gender TB DB Alkphos
## [6] Alamine Aspartate TP ALB LiverPatient
## <0 rows> (or 0-length row.names)
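The Hosmer-Lemeshow p-value is the upper-tail probability of a chi-squared distribution evaluated at the reported statistic. As a rough check, the sketch below recomputes it from `X-squared = 7.7535` and `df = 8`, using the closed form that exists for even degrees of freedom. The function name `chi2_sf_even_df` is an assumption made for this illustration, not something taken from the original code.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable with even df.

    For even df the tail has the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))

# Statistic and df copied from the Hosmer-Lemeshow output for the female model
p_value = chi2_sf_even_df(7.7535, 8)
print(round(p_value, 4))
```

The same helper can be applied to any other Hosmer-Lemeshow statistic with 8 degrees of freedom.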
The odds ratios for DB (2.57) and Aspartate (1.01) are both greater than one, which gives us evidence that an increase in DB or Aspartate increases the odds of an adult female being a liver patient.
## (Intercept) DB Aspartate
## 0.723 2.572 1.011
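These odds ratios are the exponentiated coefficients of the fitted model. A small sketch of that step, using the estimates printed in the female model summary (the dictionary names are illustrative):

```python
import math

# Coefficient estimates copied from the female model summary (log-odds scale)
log_odds = {"(Intercept)": -0.32480, "DB": 0.94479, "Aspartate": 0.01106}

# Exponentiating a log-odds coefficient gives the odds ratio per one-unit increase
odds_ratios = {name: round(math.exp(beta), 3) for name, beta in log_odds.items()}
print(odds_ratios)
```

Each value is the multiplicative change in the odds for a one-unit increase in that predictor, holding the other predictor fixed.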
For the male subset, the best set of predictors after stepwise selection with the AIC criterion is DB, Alamine, Age, and Alkphos.
##
## Call:
## glm(formula = LiverPatient ~ DB + Alamine + Age + Alkphos, family = "binomial",
## data = liverMale)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.3405 -0.5170 0.3978 0.8614 1.3756
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.476570 0.481336 -3.068 0.00216 **
## DB 0.512503 0.176066 2.911 0.00360 **
## Alamine 0.016218 0.005239 3.095 0.00197 **
## Age 0.020616 0.008095 2.547 0.01087 *
## Alkphos 0.001740 0.001058 1.645 0.09992 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 476.28 on 422 degrees of freedom
## Residual deviance: 395.05 on 418 degrees of freedom
## AIC: 405.05
##
## Number of Fisher Scoring iterations: 7
At a significance level of 0.1, all four predictors are significant, since all of their p-values are below 0.1 (Alkphos only barely, at 0.09992). The Hosmer-Lemeshow goodness-of-fit test returns a p-value of 0.53, which is greater than our significance level and tells us that the model is adequate. Checking Cook's distance, one observation (row 111) exceeds our cutoff of 0.25, so this observation will ultimately have to be removed before fitting the final model. The residual plot shows no clear pattern, only a few outliers. I don’t think they are severe enough to matter, so we can still assume that the model is a good fit.
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: glm.N.Male$y, fitted(glm.N.Male)
## X-squared = 7.043, df = 8, p-value = 0.532
## Age Gender TB DB Alkphos Alamine Aspartate TP ALB LiverPatient
## 111 50 Male 7.3 3.6 1580 88 64 5.6 2.3 0
The odds ratios for DB (1.774), Alamine (1.016), Age (1.021), and Alkphos (1.004) are all greater than one, which gives us evidence that an increase in any of these four variables increases the odds of an adult male being a liver patient.
## (Intercept) DB Alamine Age Alkphos
## 0.149 1.774 1.016 1.021 1.004
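Odds ratios this close to one are per-unit effects, so they can look negligible for variables measured on a fine scale. As an illustration only, the sketch below rescales two of the male model coefficients to larger steps. The step sizes of 10 years and 100 Alkphos units are arbitrary choices for the example, and `odds_ratio_for_change` is a hypothetical helper.

```python
import math

def odds_ratio_for_change(beta, delta):
    """Odds ratio implied by changing a predictor by `delta` units."""
    return math.exp(beta * delta)

# Estimates copied from the male model summary; the step sizes are illustrative
age_10_years = odds_ratio_for_change(0.020616, 10)
alkphos_100_units = odds_ratio_for_change(0.001740, 100)
print(f"Age +10: {age_10_years:.3f}, Alkphos +100: {alkphos_100_units:.3f}")
```

On these scales the effects are no longer negligible, even though the per-unit ratios are only slightly above one.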
## 'data.frame': 51 obs. of 8 variables:
## $ species : chr "African" "African" "Arctic F" "Asian el" ...
## $ bodyweight : num 6654 1 3.38 2547 10.55 ...
## $ brainweight : num 5712 6.6 44.5 4603 179.5 ...
## $ totalsleep : num 3.3 8.3 12.5 3.9 9.8 19.7 6.2 14.5 9.7 12.5 ...
## $ gestationtime : num 645 42 60 624 180 35 392 63 230 112 ...
## $ predationindex : int 3 3 1 3 4 1 4 1 1 5 ...
## $ sleepexposureindex: int 5 1 1 5 4 1 5 2 1 4 ...
## $ maxlife10 : int 1 0 1 1 1 1 1 1 1 0 ...
After running stepwise selection with the AIC criterion, the best set of predictors is brainweight, totalsleep, as.factor(sleepexposureindex), and as.factor(predationindex). Each of the two factors enters the model as four dummy terms, one for each of levels 2 through 5.
##
## Call:
## glm(formula = maxlife10 ~ as.factor(sleepexposureindex) + as.factor(predationindex) +
## gestationtime + totalsleep + brainweight + bodyweight, family = binomial,
## data = sleep)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.61399 -0.00002 0.00000 0.00013 2.23584
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.686e+00 5.194e+00 -1.095 0.2736
## as.factor(sleepexposureindex)2 5.156e+00 2.482e+00 2.077 0.0378 *
## as.factor(sleepexposureindex)3 3.955e+01 1.519e+04 0.003 0.9979
## as.factor(sleepexposureindex)4 3.715e+01 1.581e+04 0.002 0.9981
## as.factor(sleepexposureindex)5 7.779e+01 1.917e+04 0.004 0.9968
## as.factor(predationindex)2 -2.467e+00 1.981e+00 -1.245 0.2130
## as.factor(predationindex)3 -2.637e+01 2.204e+04 -0.001 0.9990
## as.factor(predationindex)4 -1.945e+01 1.080e+04 -0.002 0.9986
## as.factor(predationindex)5 -5.674e+01 1.799e+04 -0.003 0.9975
## gestationtime -7.872e-03 1.444e-02 -0.545 0.5857
## totalsleep 3.873e-01 2.700e-01 1.434 0.1515
## brainweight 5.221e-02 6.207e-02 0.841 0.4003
## bodyweight -1.696e-02 1.254e-01 -0.135 0.8924
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 68.310 on 50 degrees of freedom
## Residual deviance: 15.518 on 38 degrees of freedom
## AIC: 41.518
##
## Number of Fisher Scoring iterations: 21
##
## Call:
## glm(formula = maxlife10 ~ brainweight + totalsleep + as.factor(sleepexposureindex) +
## as.factor(predationindex), family = "binomial", data = sleep)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.42528 -0.00004 0.00000 0.00013 2.37523
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.602e+00 4.864e+00 -1.357 0.1747
## brainweight 5.101e-02 5.084e-02 1.003 0.3157
## totalsleep 4.230e-01 2.647e-01 1.598 0.1100
## as.factor(sleepexposureindex)2 4.998e+00 2.559e+00 1.953 0.0508 .
## as.factor(sleepexposureindex)3 3.636e+01 9.624e+03 0.004 0.9970
## as.factor(sleepexposureindex)4 3.370e+01 1.037e+04 0.003 0.9974
## as.factor(sleepexposureindex)5 7.341e+01 1.262e+04 0.006 0.9954
## as.factor(predationindex)2 -2.535e+00 1.960e+00 -1.293 0.1960
## as.factor(predationindex)3 -2.512e+01 1.253e+04 -0.002 0.9984
## as.factor(predationindex)4 -1.826e+01 6.795e+03 -0.003 0.9979
## as.factor(predationindex)5 -5.264e+01 1.143e+04 -0.005 0.9963
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 68.31 on 50 degrees of freedom
## Residual deviance: 15.88 on 40 degrees of freedom
## AIC: 37.88
##
## Number of Fisher Scoring iterations: 20
Examining the significance of the parameters, only one term, as.factor(sleepexposureindex)2 (p = 0.0508), is significant at the 0.1 level, and its positive coefficient means that level 2 of sleepexposureindex is the only term shown to increase the odds of a maximum life span of at least 10 years. All of the other terms have p-values greater than 0.1 and are therefore not significant. To check goodness of fit we run the Hosmer-Lemeshow test, which returns a p-value of 0.53 and tells us that the model is adequate. Checking Cook's distance, two observations (rows 35 and 40) fall outside our cutoff of 0.25. The residual plot follows a pattern, suggesting a less than perfect fit.
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: glm.New.Sleep$y, fitted(glm.New.Sleep)
## X-squared = 7.0397, df = 8, p-value = 0.5324
## species bodyweight brainweight totalsleep gestationtime predationindex
## 35 Phanlang 1.620 11.4 13.7 17 2
## 40 Red fox 4.235 50.4 9.8 52 1
## sleepexposureindex maxlife10
## 35 1 1
## 40 1 0
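Calculating the odds ratios, the results are extreme rather than informative: the sleepexposureindex dummies have odds ratios from 1.48e+02 up to 7.60e+31, while the predationindex dummies for levels 3 to 5 print as 0.00e+00. Since only one term in this model is significant, these values give no reliable evidence that an increase in any of the predictors increases the odds of a species' life span being at least 10 years.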
## (Intercept) as.factor(sleepexposureindex)2
## 6.150000e-01 1.950000e+00
## as.factor(sleepexposureindex)3 as.factor(sleepexposureindex)4
## 4.875000e+00 6.500000e+00
## as.factor(sleepexposureindex)5
## 1.879293e+08
## (Intercept) brainweight
## "1.00e-03" "1.05e+00"
## totalsleep as.factor(sleepexposureindex)2
## "1.53e+00" "1.48e+02"
## as.factor(sleepexposureindex)3 as.factor(sleepexposureindex)4
## "6.17e+15" "4.33e+14"
## as.factor(sleepexposureindex)5 as.factor(predationindex)2
## "7.60e+31" "7.90e-02"
## as.factor(predationindex)3 as.factor(predationindex)4
## "0.00e+00" "0.00e+00"
## as.factor(predationindex)5
## "0.00e+00"
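The `0.00e+00` entries are most likely a rounding effect rather than true zeros. The sketch below assumes the odds ratios were rounded to three decimals before being formatted, as the earlier three-decimal outputs suggest; that assumption is not confirmed by the source. It exponentiates the predationindex coefficients from the stepwise model summary with and without rounding.

```python
import math

# predationindex dummy coefficients copied from the stepwise model summary
predation_log_odds = {"level 3": -25.12, "level 4": -18.26, "level 5": -52.64}

for level, beta in predation_log_odds.items():
    ratio = math.exp(beta)
    # The exact ratio is tiny but positive; rounding to 3 decimals gives 0.0
    print(f"{level}: exact = {ratio:.2e}, rounded = {round(ratio, 3)}")
```

An exponentiated coefficient can never be exactly zero, so these entries should be read as "extremely small", not as zero odds.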
Treating sleepexposureindex and predationindex as numeric rather than as factors, AIC stepwise selection picks brainweight, totalsleep, sleepexposureindex, and predationindex as the best set of predictors.
##
## Call:
## glm(formula = maxlife10 ~ sleepexposureindex + predationindex +
## gestationtime + totalsleep + brainweight + bodyweight, family = binomial,
## data = sleep)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.86102 -0.05003 0.00001 0.07291 2.39011
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.0737241 3.8232822 -1.589 0.1121
## sleepexposureindex 4.3918846 1.9754558 2.223 0.0262 *
## predationindex -3.3345335 1.5455267 -2.158 0.0310 *
## gestationtime -0.0009856 0.0109710 -0.090 0.9284
## totalsleep 0.3568800 0.2222444 1.606 0.1083
## brainweight 0.0643799 0.0404442 1.592 0.1114
## bodyweight -0.0324958 0.1153870 -0.282 0.7782
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 68.310 on 50 degrees of freedom
## Residual deviance: 19.174 on 44 degrees of freedom
## AIC: 33.174
##
## Number of Fisher Scoring iterations: 13
##
## Call:
## glm(formula = maxlife10 ~ brainweight + totalsleep + sleepexposureindex +
## predationindex, family = "binomial", data = sleep)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.82148 -0.04746 0.00000 0.05811 2.41681
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.16387 3.59301 -1.716 0.0863 .
## brainweight 0.06018 0.03544 1.698 0.0895 .
## totalsleep 0.35985 0.20995 1.714 0.0865 .
## sleepexposureindex 4.42111 1.97540 2.238 0.0252 *
## predationindex -3.36917 1.51823 -2.219 0.0265 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 68.310 on 50 degrees of freedom
## Residual deviance: 19.212 on 46 degrees of freedom
## AIC: 29.212
##
## Number of Fisher Scoring iterations: 11
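The AIC values that drive the stepwise selection can be cross-checked against the residual deviances in the summaries. For a 0/1 response the residual deviance equals minus twice the log-likelihood, so AIC is the residual deviance plus twice the number of estimated coefficients. A minimal sketch using the figures reported for the four selected models (the labels and the helper name are illustrative):

```python
# (residual deviance, number of estimated coefficients, reported AIC)
models = {
    "female": (154.27, 3, 160.27),
    "male": (395.05, 5, 405.05),
    "sleep, factor indices": (15.88, 11, 37.88),
    "sleep, numeric indices": (19.212, 5, 29.212),
}

def aic_from_deviance(residual_deviance, n_coefficients):
    """AIC for a 0/1-response binomial GLM: deviance + 2 * parameters."""
    return residual_deviance + 2 * n_coefficients

for name, (deviance, k, reported) in models.items():
    print(f"{name}: computed {aic_from_deviance(deviance, k):.3f}, reported {reported}")
```

The coefficient counts include the intercept, and each five-level factor contributes four dummy terms.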
After running AIC stepwise selection, all of the selected predictors have p-values below 0.1, so all of them are significant at that level. To check goodness of fit we run the Hosmer-Lemeshow test, which returns a p-value of 0.99 and tells us that the model is adequate. Looking at Cook's distance, four observations (rows 10, 35, 40, and 50) exceed our cutoff criterion of 0.25. The residual plot has a clear shape and several larger outliers, suggesting heteroscedasticity.
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: glm.Sleep4$y, fitted(glm.Sleep4)
## X-squared = 1.4406, df = 8, p-value = 0.9937
## species bodyweight brainweight totalsleep gestationtime predationindex
## 10 Chinchil 0.425 6.4 12.5 112 5
## 35 Phanlang 1.620 11.4 13.7 17 2
## 40 Red fox 4.235 50.4 9.8 52 1
## 50 Vervet 4.190 58.0 10.3 210 4
## sleepexposureindex maxlife10
## 10 4 0
## 35 1 1
## 40 1 0
## 50 3 1
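One way to see why rows such as 35 and 40 are flagged is to compare the model's predicted probability with the observed outcome. The sketch below is illustrative only: it plugs the printed predictor values for Red fox and Phanlang into the coefficients of the final model (numeric indices), and the helper `predicted_probability` is hypothetical.

```python
import math

# Coefficients copied from the final sleep model summary
coef = {"(Intercept)": -6.16387, "brainweight": 0.06018, "totalsleep": 0.35985,
        "sleepexposureindex": 4.42111, "predationindex": -3.36917}

def predicted_probability(row):
    """Inverse-logit of the linear predictor for one observation."""
    eta = coef["(Intercept)"] + sum(coef[name] * value for name, value in row.items())
    return 1.0 / (1.0 + math.exp(-eta))

# Predictor values copied from the rows flagged by Cook's distance
red_fox = {"brainweight": 50.4, "totalsleep": 9.8,
           "sleepexposureindex": 1, "predationindex": 1}
phanlang = {"brainweight": 11.4, "totalsleep": 13.7,
            "sleepexposureindex": 1, "predationindex": 2}

p_red_fox = predicted_probability(red_fox)    # observed maxlife10 = 0
p_phanlang = predicted_probability(phanlang)  # observed maxlife10 = 1
print(f"Red fox: {p_red_fox:.3f}, Phanlang: {p_phanlang:.3f}")
```

Red fox gets a high predicted probability despite an observed 0, and Phanlang gets a low one despite an observed 1, which is consistent with both rows being influential.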
The odds ratios for sleepexposureindex (83.2), totalsleep (1.43), and brainweight (1.062) are greater than one, which gives us evidence that an increase in any of these three variables increases the odds of a species’ maximum lifespan being at least 10 years. The odds ratio for predationindex is 0.034, which is below one and tells us that an increase in predation index decreases those odds.
## (Intercept) sleepexposureindex predationindex totalsleep
## 0.002 83.188 0.034 1.433
## brainweight
## 1.062