Binary outcome: Territorial (yes/no)
Predictor variables: diet, social organization (used last week) Additional variable added to model: group size
LifeHistory_territorial<-read_xlsx("/Users/alliesheldon/Documents/UTSA/Classes/Demography/LifeHistory_territorial.xlsx")
library(caret)
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
## The following object is masked from 'package:survival':
##
## cluster
set.seed(1004)
training<-createDataPartition(y=LifeHistory_territorial$territorial,p=.80,list=F)
model.training<-LifeHistory_territorial[training,]
model.testing<-LifeHistory_territorial[-training,]
table(model.training$territorial)
##
## 0 1
## 58 146
prop.table(table(model.training$territorial))
##
## 0 1
## 0.2843137 0.7156863
summary(model.training)
## Common Name Species Name Genus Species
## Length:204 Length:204 Length:204 Length:204
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## malecare alloparent bodymass socialorg
## Min. :0.0000 Min. :0.0000 Min. : 62.3 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.: 756.9 1st Qu.:2.750
## Median :1.0000 Median :1.0000 Median : 3625.0 Median :4.000
## Mean :0.5159 Mean :0.9444 Mean : 8275.0 Mean :3.882
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.: 6723.2 3rd Qu.:5.000
## Max. :1.0000 Max. :1.0000 Max. :117549.5 Max. :7.000
## NA's :78 NA's :60 NA's :90
## reproductive groupsize territorial diet
## Length:204 Min. : 1.000 Min. :0.0000 Length:204
## Class :character 1st Qu.: 5.327 1st Qu.:0.0000 Class :character
## Mode :character Median : 9.545 Median :1.0000 Mode :character
## Mean : 22.666 Mean :0.7157
## 3rd Qu.: 22.667 3rd Qu.:1.0000
## Max. :425.000 Max. :1.0000
##
glmtrain<-glm(territorial~socialorg+diet+scale(groupsize),data=model.training,family=binomial)
summary(glmtrain)
##
## Call:
## glm(formula = territorial ~ socialorg + diet + scale(groupsize),
## family = binomial, data = model.training)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9622 -1.1570 0.6584 0.8692 1.2704
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.02064 0.48074 0.043 0.9658
## socialorg 0.06910 0.10230 0.675 0.4994
## dietfrugivore 0.82788 0.34186 2.422 0.0154 *
## dietgum -0.54278 1.45618 -0.373 0.7093
## dietinsectivore 17.01192 883.89937 0.019 0.9846
## scale(groupsize) -0.97218 0.39674 -2.450 0.0143 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 243.57 on 203 degrees of freedom
## Residual deviance: 209.07 on 198 degrees of freedom
## AIC: 221.07
##
## Number of Fisher Scoring iterations: 16
train.predict<-predict(glmtrain,newdata=model.training,type = "response")
head(train.predict)
## 1 2 3 4 5 6
## 1.0000000 0.7945900 0.8056388 0.8162307 0.8541353 0.7935138
#.5 decision rule
decisionrule5<-factor(ifelse(train.predict>.5,1,0))
#mean of the outcome decision rule
decisionrulemean<-factor(ifelse(train.predict>mean(I(model.training$territorial==1)),1,0))
The function for the confusion matrix accuracy was giving me an error with all thresholds and mean that I tried.
#cmtrain<-confusionMatrix(data=decisionrule5,reference = model.training$territorial)
So I calculated the proportions divided by the total N to get a measure of the accuracy for the .5 and mean decision rules.
#.5 decision rule
table(decisionrule5,model.training$territorial)
##
## decisionrule5 0 1
## 0 9 3
## 1 49 143
(9+143)/255 #59.61%
## [1] 0.5960784
(3+49)/255 #20.39%
## [1] 0.2039216
#confusion matrix mean decision rule
table(decisionrulemean,model.training$territorial)
##
## decisionrulemean 0 1
## 0 38 54
## 1 20 92
(38+92)/255 #50.99%
## [1] 0.5098039
(20+54)/255 #29.02%
## [1] 0.2901961
Yes, changing the decision rule threshold affected the classification accuracy. The accuracy of the mean decision rule was lower for the observed outcome that matched the predicted outcome (mean decision rule accuracy was 51.0% compared to the .5 decision rule accuracy which was 59.6%), and higher for the outcomes that did not match the prediction (29.0% accuracy for the mean decision rule and 20.4% accuracy for the .5 decision rule).
glmtest<-glm(territorial~socialorg+diet+scale(groupsize),data=model.testing,family=binomial)
summary(glmtest)
##
## Call:
## glm(formula = territorial ~ socialorg + diet + scale(groupsize),
## family = binomial, data = model.testing)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7171 -1.0397 0.7067 0.7788 1.6222
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.30414 0.96953 -0.314 0.7537
## socialorg -0.04362 0.20282 -0.215 0.8297
## dietfrugivore 1.29286 0.65487 1.974 0.0484 *
## dietinsectivore 17.74223 1765.44670 0.010 0.9920
## scale(groupsize) -0.59307 0.47894 -1.238 0.2156
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 68.310 on 50 degrees of freedom
## Residual deviance: 55.475 on 46 degrees of freedom
## AIC: 65.475
##
## Number of Fisher Scoring iterations: 16
test.predict<-predict(glmtrain,newdata=model.testing,type = "response")
head(test.predict)
## 1 2 3 4 5 6
## 0.8030094 0.8337591 0.8030094 0.6357761 1.0000000 1.0000000
#.5 decision rule
decisionrule5.test<-factor(ifelse(test.predict>.5,1,0))
#mean of the outcome decision rule
decisionrulemean.test<-factor(ifelse(test.predict>mean(I(model.testing$territorial==1)),1,0))
#.5 decision rule
table(decisionrule5.test,model.testing$territorial)
##
## decisionrule5.test 0 1
## 0 2 0
## 1 18 31
#mean decision rule
table(decisionrulemean.test,model.testing$territorial)
##
## decisionrulemean.test 0 1
## 0 6 3
## 1 14 28
In the case of the test data, again changing the classification rule threshold affected the classification accuracy. The accuracy of the .5 decision rule was lower for the observed outcome that matched the predicted outcome (.5 decision rule accuracy was 12.9% compared to the mean decision rule accuracy which was 13.3%), and higher for the outcomes that did not match the prediction (7.1%% accuracy for the .5 decision rule and 6.7% accuracy for the mean decision rule).