Research Question

What life history variables influence whether a population of primates will be territorial?

Binary outcome: Territorial (yes/no)

Predictor variables: diet, social organization (used last week) Additional variable added to model: group size

LifeHistory_territorial<-read_xlsx("/Users/alliesheldon/Documents/UTSA/Classes/Demography/LifeHistory_territorial.xlsx")
library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
## The following object is masked from 'package:survival':
## 
##     cluster
set.seed(1004)
training<-createDataPartition(y=LifeHistory_territorial$territorial,p=.80,list=F)
model.training<-LifeHistory_territorial[training,]
model.testing<-LifeHistory_territorial[-training,]

Training Data

table(model.training$territorial)
## 
##   0   1 
##  58 146
prop.table(table(model.training$territorial))
## 
##         0         1 
## 0.2843137 0.7156863
summary(model.training)
##  Common Name        Species Name          Genus             Species         
##  Length:204         Length:204         Length:204         Length:204        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##     malecare        alloparent        bodymass          socialorg    
##  Min.   :0.0000   Min.   :0.0000   Min.   :    62.3   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:1.0000   1st Qu.:   756.9   1st Qu.:2.750  
##  Median :1.0000   Median :1.0000   Median :  3625.0   Median :4.000  
##  Mean   :0.5159   Mean   :0.9444   Mean   :  8275.0   Mean   :3.882  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:  6723.2   3rd Qu.:5.000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :117549.5   Max.   :7.000  
##  NA's   :78       NA's   :60       NA's   :90                        
##  reproductive         groupsize        territorial         diet          
##  Length:204         Min.   :  1.000   Min.   :0.0000   Length:204        
##  Class :character   1st Qu.:  5.327   1st Qu.:0.0000   Class :character  
##  Mode  :character   Median :  9.545   Median :1.0000   Mode  :character  
##                     Mean   : 22.666   Mean   :0.7157                     
##                     3rd Qu.: 22.667   3rd Qu.:1.0000                     
##                     Max.   :425.000   Max.   :1.0000                     
## 
glmtrain<-glm(territorial~socialorg+diet+scale(groupsize),data=model.training,family=binomial)
summary(glmtrain)
## 
## Call:
## glm(formula = territorial ~ socialorg + diet + scale(groupsize), 
##     family = binomial, data = model.training)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9622  -1.1570   0.6584   0.8692   1.2704  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)  
## (Intercept)        0.02064    0.48074   0.043   0.9658  
## socialorg          0.06910    0.10230   0.675   0.4994  
## dietfrugivore      0.82788    0.34186   2.422   0.0154 *
## dietgum           -0.54278    1.45618  -0.373   0.7093  
## dietinsectivore   17.01192  883.89937   0.019   0.9846  
## scale(groupsize)  -0.97218    0.39674  -2.450   0.0143 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 243.57  on 203  degrees of freedom
## Residual deviance: 209.07  on 198  degrees of freedom
## AIC: 221.07
## 
## Number of Fisher Scoring iterations: 16
train.predict<-predict(glmtrain,newdata=model.training,type = "response")
head(train.predict)
##         1         2         3         4         5         6 
## 1.0000000 0.7945900 0.8056388 0.8162307 0.8541353 0.7935138
#.5 decision rule
decisionrule5<-factor(ifelse(train.predict>.5,1,0))

#mean of the outcome decision rule
decisionrulemean<-factor(ifelse(train.predict>mean(I(model.training$territorial==1)),1,0))

The function for the confusion matrix accuracy was giving me an error with all thresholds and mean that I tried.

#cmtrain<-confusionMatrix(data=decisionrule5,reference = model.training$territorial)

So I calculated the proportions divided by the total N to get a measure of the accuracy for the .5 and mean decision rules.

#.5 decision rule
table(decisionrule5,model.training$territorial)
##              
## decisionrule5   0   1
##             0   9   3
##             1  49 143
(9+143)/255 #59.61%
## [1] 0.5960784
(3+49)/255 #20.39%
## [1] 0.2039216
#confusion matrix mean decision rule
table(decisionrulemean,model.training$territorial)
##                 
## decisionrulemean  0  1
##                0 38 54
##                1 20 92
(38+92)/255 #50.99%
## [1] 0.5098039
(20+54)/255 #29.02%
## [1] 0.2901961

Yes, changing the decision rule threshold affected the classification accuracy. The accuracy of the mean decision rule was lower for the observed outcome that matched the predicted outcome (mean decision rule accuracy was 51.0% compared to the .5 decision rule accuracy which was 59.6%), and higher for the outcomes that did not match the prediction (29.0% accuracy for the mean decision rule and 20.4% accuracy for the .5 decision rule).

Test Data

glmtest<-glm(territorial~socialorg+diet+scale(groupsize),data=model.testing,family=binomial)
summary(glmtest)
## 
## Call:
## glm(formula = territorial ~ socialorg + diet + scale(groupsize), 
##     family = binomial, data = model.testing)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7171  -1.0397   0.7067   0.7788   1.6222  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)  
## (Intercept)        -0.30414    0.96953  -0.314   0.7537  
## socialorg          -0.04362    0.20282  -0.215   0.8297  
## dietfrugivore       1.29286    0.65487   1.974   0.0484 *
## dietinsectivore    17.74223 1765.44670   0.010   0.9920  
## scale(groupsize)   -0.59307    0.47894  -1.238   0.2156  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 68.310  on 50  degrees of freedom
## Residual deviance: 55.475  on 46  degrees of freedom
## AIC: 65.475
## 
## Number of Fisher Scoring iterations: 16
test.predict<-predict(glmtrain,newdata=model.testing,type = "response")
head(test.predict)
##         1         2         3         4         5         6 
## 0.8030094 0.8337591 0.8030094 0.6357761 1.0000000 1.0000000
#.5 decision rule
decisionrule5.test<-factor(ifelse(test.predict>.5,1,0))

#mean of the outcome decision rule
decisionrulemean.test<-factor(ifelse(test.predict>mean(I(model.testing$territorial==1)),1,0))
#.5 decision rule
table(decisionrule5.test,model.testing$territorial)
##                   
## decisionrule5.test  0  1
##                  0  2  0
##                  1 18 31
#mean decision rule
table(decisionrulemean.test,model.testing$territorial)
##                      
## decisionrulemean.test  0  1
##                     0  6  3
##                     1 14 28

In the case of the test data, again changing the classification rule threshold affected the classification accuracy. The accuracy of the .5 decision rule was lower for the observed outcome that matched the predicted outcome (.5 decision rule accuracy was 12.9% compared to the mean decision rule accuracy which was 13.3%), and higher for the outcomes that did not match the prediction (7.1%% accuracy for the .5 decision rule and 6.7% accuracy for the mean decision rule).