Load libraries and data

library(here)
library(tidyverse)
library(GGally)
library(gridExtra)
library(arm)
library(ggeffects)
library(car)
library(ggResidpanel)
library(patchwork)

pox <- read.csv(here("Week_05", "Data", "Pox_Data_revised.csv")) 

Question 1

1. The twelve sites have been coded as low/medium/high elevation. First, summarize the number of observations of each bird species at each elevation. This will be important context for what we can and cannot ask about avian pox patterns. Also, what is the prevalence of chronic malaria in the bird species at the difference elevations? Note that some birds do not have a known malaria status (code = 2). Only use birds with code = 0 or 1 to quantify the prevalence of malaria.

Observations of each bird species at each elevation

# number of observations of each bird species at each elevation
# we want to group by elevation and species

pox %>%
  group_by(Species, Elev) %>%
  tally()
## # A tibble: 11 × 3
## # Groups:   Species [4]
##    Species Elev      n
##    <chr>   <chr> <int>
##  1 APAP    high    839
##  2 APAP    low       7
##  3 APAP    mid    1308
##  4 HAAM    high    623
##  5 HAAM    low    1539
##  6 HAAM    mid    3037
##  7 IIWI    high   1100
##  8 IIWI    mid      20
##  9 JAWE    high    283
## 10 JAWE    low     400
## 11 JAWE    mid    2587

Prevalence of chronic malaria

pox %>%
  mutate(Elev = factor(Elev, levels = (c("low", "mid", "high")))) %>%
  filter(Malaria %in% c('0','1')) %>%
  group_by(Species, Elev, Malaria) %>%
  tally() %>%
  mutate(Malaria_Prevalence = n / sum(n)) %>%
  filter(Malaria == 1)
## # A tibble: 8 × 5
## # Groups:   Species, Elev [8]
##   Species Elev  Malaria     n Malaria_Prevalence
##   <chr>   <fct>   <int> <int>              <dbl>
## 1 APAP    low         1     6             0.857 
## 2 APAP    mid         1   715             0.588 
## 3 APAP    high        1    58             0.0789
## 4 HAAM    low         1  1138             0.856 
## 5 HAAM    mid         1   486             0.176 
## 6 HAAM    high        1    10             0.0194
## 7 IIWI    mid         1     5             0.263 
## 8 IIWI    high        1    23             0.0243

We observe that malaria prevalence is the highest for ʻApapane and Hawaiʻi ʻAmakihi in low elevation. ʻIʻiwi has a higher prevalence of malaria in mid elevation compared to high elevation. There are no occurrences of Japanese White-eye cases with malaria. These findings are congruent with the study’s findings that pox infection rates were typically highest in low-elevation forests, followed by mid-elevation forests, and lowest in high-elevation forests.

Question 2

2. Create a model that tests whether pox prevalence depends on elevation, and whether it differs between species, and whether species differences depend on elevation. Evaluate the model results using the general methods we use in this course. Provide a verbal explanation of what the model tells us. If you look at summary() of the model there is an NA for one of the coefficients. Why do you think this is?

Pox prevalence model

# binomial GLM to look at presence/absence of pox

poxESmodel = glm(Activepox ~ Elev*Species, 
                 data = pox,
                 family = binomial)
summary(poxESmodel)
## 
## Call:
## glm(formula = Activepox ~ Elev * Species, family = binomial, 
##     data = pox)
## 
## Coefficients: (1 not defined because of singularities)
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -4.6432     0.3553 -13.070  < 2e-16 ***
## Elevlow               3.7269     0.9090   4.100 4.13e-05 ***
## Elevmid               3.4182     0.3613   9.460  < 2e-16 ***
## SpeciesHAAM           0.8704     0.4464   1.950   0.0512 .  
## SpeciesIIWI           0.8820     0.4088   2.157   0.0310 *  
## SpeciesJAWE          -0.9987     1.0629  -0.940   0.3474    
## Elevlow:SpeciesHAAM  -2.9684     0.9560  -3.105   0.0019 ** 
## Elevmid:SpeciesHAAM  -3.0784     0.4631  -6.647 2.99e-11 ***
## Elevlow:SpeciesIIWI       NA         NA      NA       NA    
## Elevmid:SpeciesIIWI  -1.3916     0.7508  -1.854   0.0638 .  
## Elevlow:SpeciesJAWE  -4.0740     1.6828  -2.421   0.0155 *  
## Elevmid:SpeciesJAWE  -2.6828     1.0895  -2.462   0.0138 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4362.4  on 11742  degrees of freedom
## Residual deviance: 3568.3  on 11732  degrees of freedom
## AIC: 3590.3
## 
## Number of Fisher Scoring iterations: 8

Likelihood ratio test

Anova(poxESmodel)
## Analysis of Deviance Table (Type II tests)
## 
## Response: Activepox
##              LR Chisq Df Pr(>Chisq)    
## Elev           235.43  2  < 2.2e-16 ***
## Species        645.45  3  < 2.2e-16 ***
## Elev:Species    47.32  5  4.884e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The likelihood ratio test yields low p values for elevation, species, and the interaction between elevation and species which indicates that elevation level, species type, and both elevation and species type together are significant predictors of cases of pox.

Residuals plot

resid_panel(poxESmodel, plots = c('resid', 'qq', 'lev', 'hist')) #residuals plot

The model does not fall under a normal distribution as we can interpret from the histogram and Q-Q plot. The residual plot shows the binomial nature of the residuals and is heteroscedastic. We can see that this is not a good fit for the model.

Effects plot

plot(ggeffect(poxESmodel, terms = c('Elev', 'Species'))) #effects plot

We see at low elevation, ʻApapane have the highest cases of active pox. It appears that collectively, the lowest average cases of active pox acrooss the bird species occurs at high elevation. There is likely an NA value at low elevation for ʻIʻiwi in the summary output because there are no observed ʻIʻiwi at low elevation.

Question 3

3. Create a subset of the data that only includes birds with known malaria status (0 or 1). Also, you’ll probably want to exclude ʻIʻiwi (look at your answer to #1 to see why). Create a model that tests whether malaria status and elevation affect pox prevalence (and whether those effects differ between species). Evaluate the model results using the general methods we use in this course. Provide a verbal explanation of what the model tells us.

Malaria status & elevation

# subset of data
malariadata <- pox %>% 
#I get very different answers if I leave this line of code in:  mutate(Elev = factor(Elev, levels = (c("low", "mid", "high")))). When would I want to have this as factors?
  filter(Malaria != 2, #filter for all values not equal to 2
         Species != "IIWI") #filter for all values not equal to IIWI

# model that tests whether malaria status and elevation affect pox prevalence
malariamodel = glm(Activepox ~ Species*Malaria + Species*Elev,
                   data = malariadata,
                   family = binomial)
summary(malariamodel)
## 
## Call:
## glm(formula = Activepox ~ Species * Malaria + Species * Elev, 
##     family = binomial, data = malariadata)
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -4.9800     0.3862 -12.895  < 2e-16 ***
## SpeciesHAAM           1.2972     0.4777   2.715  0.00662 ** 
## Malaria               1.8277     0.1809  10.106  < 2e-16 ***
## Elevlow               2.4055     0.9410   2.556  0.01058 *  
## Elevmid               2.4864     0.3980   6.247 4.18e-10 ***
## SpeciesHAAM:Malaria  -0.8783     0.2797  -3.140  0.00169 ** 
## SpeciesHAAM:Elevlow  -2.5614     1.0085  -2.540  0.01109 *  
## SpeciesHAAM:Elevmid  -2.5313     0.5040  -5.023 5.09e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3276.3  on 6570  degrees of freedom
## Residual deviance: 2624.3  on 6563  degrees of freedom
## AIC: 2640.3
## 
## Number of Fisher Scoring iterations: 7

From the summary, we can interpret that the positive slope of malaria, low elevation, and mid elevation indicate that there cases of active pox increase with these three predictors for ʻApapane. It appears that malaria and mid elevation are significant predictors with much smaller p-values than low elevation, although low elevation still appears to be statistically significant.

The negative slopes of Hawaiʻi ʻAmakihi for malaria, low elevation, and mid elevation indicate that this species is less sensitive than ʻApapane to these three predictors.

Likelihood ratio test

Anova(malariamodel)
## Analysis of Deviance Table (Type II tests)
## 
## Response: Activepox
##                 LR Chisq Df Pr(>Chisq)    
## Species          151.175  1  < 2.2e-16 ***
## Malaria          141.429  1  < 2.2e-16 ***
## Elev              46.717  2  7.169e-11 ***
## Species:Malaria   10.184  1   0.001416 ** 
## Species:Elev      25.947  2  2.322e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residuals plot

resid_panel(malariamodel, plots = c('resid', 'qq', 'lev', 'hist')) #residuals plot

The model does not fall under a normal distribution as we can interpret from the histogram and Q-Q plot. The residual plot shows the binomial nature of the residuals and is heteroscedastic. We can see that this is not a good fit for the model. Given the disclaimer, it can be assumed that the a generalized linear mixed model would be a better fit because there may be large differences between sites in the same elevation group.

Effects plot

plot(ggeffect(malariamodel, terms = c('Elev', 'Species', 'Malaria'))) #effects plot

We see at low and mid elevation, ʻApapane have the highest cases of active pox and more of them they will have pox if they have malaria. It appears that there is a significant difference between Hawaiʻi ʻAmakihi and ʻApapane as elevation levels change. Hawaiʻi ʻAmakihi appear to have nearly the same level of active pox whether or not they have malaria at any elevation level.