library(here)
library(tidyverse)
library(GGally)
library(gridExtra)
library(arm)
library(ggeffects)
library(car)
library(ggResidpanel)
library(patchwork)
pox <- read.csv(here("Week_05", "Data", "Pox_Data_revised.csv"))
1. The twelve sites have been coded as low/medium/high elevation. First, summarize the number of observations of each bird species at each elevation. This will be important context for what we can and cannot ask about avian pox patterns. Also, what is the prevalence of chronic malaria in the bird species at the difference elevations? Note that some birds do not have a known malaria status (code = 2). Only use birds with code = 0 or 1 to quantify the prevalence of malaria.
# number of observations of each bird species at each elevation
# we want to group by elevation and species
pox %>%
group_by(Species, Elev) %>%
tally()
## # A tibble: 11 × 3
## # Groups: Species [4]
## Species Elev n
## <chr> <chr> <int>
## 1 APAP high 839
## 2 APAP low 7
## 3 APAP mid 1308
## 4 HAAM high 623
## 5 HAAM low 1539
## 6 HAAM mid 3037
## 7 IIWI high 1100
## 8 IIWI mid 20
## 9 JAWE high 283
## 10 JAWE low 400
## 11 JAWE mid 2587
pox %>%
mutate(Elev = factor(Elev, levels = (c("low", "mid", "high")))) %>%
filter(Malaria %in% c('0','1')) %>%
group_by(Species, Elev, Malaria) %>%
tally() %>%
mutate(Malaria_Prevalence = n / sum(n)) %>%
filter(Malaria == 1)
## # A tibble: 8 × 5
## # Groups: Species, Elev [8]
## Species Elev Malaria n Malaria_Prevalence
## <chr> <fct> <int> <int> <dbl>
## 1 APAP low 1 6 0.857
## 2 APAP mid 1 715 0.588
## 3 APAP high 1 58 0.0789
## 4 HAAM low 1 1138 0.856
## 5 HAAM mid 1 486 0.176
## 6 HAAM high 1 10 0.0194
## 7 IIWI mid 1 5 0.263
## 8 IIWI high 1 23 0.0243
We observe that malaria prevalence is the highest for ʻApapane and Hawaiʻi ʻAmakihi in low elevation. ʻIʻiwi has a higher prevalence of malaria in mid elevation compared to high elevation. There are no occurrences of Japanese White-eye cases with malaria. These findings are congruent with the study’s findings that pox infection rates were typically highest in low-elevation forests, followed by mid-elevation forests, and lowest in high-elevation forests.
2. Create a model that tests whether pox prevalence depends on elevation, and whether it differs between species, and whether species differences depend on elevation. Evaluate the model results using the general methods we use in this course. Provide a verbal explanation of what the model tells us. If you look at summary() of the model there is an NA for one of the coefficients. Why do you think this is?
# binomial GLM to look at presence/absence of pox
poxESmodel = glm(Activepox ~ Elev*Species,
data = pox,
family = binomial)
summary(poxESmodel)
##
## Call:
## glm(formula = Activepox ~ Elev * Species, family = binomial,
## data = pox)
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.6432 0.3553 -13.070 < 2e-16 ***
## Elevlow 3.7269 0.9090 4.100 4.13e-05 ***
## Elevmid 3.4182 0.3613 9.460 < 2e-16 ***
## SpeciesHAAM 0.8704 0.4464 1.950 0.0512 .
## SpeciesIIWI 0.8820 0.4088 2.157 0.0310 *
## SpeciesJAWE -0.9987 1.0629 -0.940 0.3474
## Elevlow:SpeciesHAAM -2.9684 0.9560 -3.105 0.0019 **
## Elevmid:SpeciesHAAM -3.0784 0.4631 -6.647 2.99e-11 ***
## Elevlow:SpeciesIIWI NA NA NA NA
## Elevmid:SpeciesIIWI -1.3916 0.7508 -1.854 0.0638 .
## Elevlow:SpeciesJAWE -4.0740 1.6828 -2.421 0.0155 *
## Elevmid:SpeciesJAWE -2.6828 1.0895 -2.462 0.0138 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4362.4 on 11742 degrees of freedom
## Residual deviance: 3568.3 on 11732 degrees of freedom
## AIC: 3590.3
##
## Number of Fisher Scoring iterations: 8
Anova(poxESmodel)
## Analysis of Deviance Table (Type II tests)
##
## Response: Activepox
## LR Chisq Df Pr(>Chisq)
## Elev 235.43 2 < 2.2e-16 ***
## Species 645.45 3 < 2.2e-16 ***
## Elev:Species 47.32 5 4.884e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The likelihood ratio test yields low p values for elevation, species, and the interaction between elevation and species which indicates that elevation level, species type, and both elevation and species type together are significant predictors of cases of pox.
resid_panel(poxESmodel, plots = c('resid', 'qq', 'lev', 'hist')) #residuals plot
The model does not fall under a normal distribution as we can interpret from the histogram and Q-Q plot. The residual plot shows the binomial nature of the residuals and is heteroscedastic. We can see that this is not a good fit for the model.
plot(ggeffect(poxESmodel, terms = c('Elev', 'Species'))) #effects plot
We see at low elevation, ʻApapane have the highest cases of active pox. It appears that collectively, the lowest average cases of active pox acrooss the bird species occurs at high elevation. There is likely an NA value at low elevation for ʻIʻiwi in the summary output because there are no observed ʻIʻiwi at low elevation.
3. Create a subset of the data that only includes birds with known malaria status (0 or 1). Also, you’ll probably want to exclude ʻIʻiwi (look at your answer to #1 to see why). Create a model that tests whether malaria status and elevation affect pox prevalence (and whether those effects differ between species). Evaluate the model results using the general methods we use in this course. Provide a verbal explanation of what the model tells us.
# subset of data
malariadata <- pox %>%
#I get very different answers if I leave this line of code in: mutate(Elev = factor(Elev, levels = (c("low", "mid", "high")))). When would I want to have this as factors?
filter(Malaria != 2, #filter for all values not equal to 2
Species != "IIWI") #filter for all values not equal to IIWI
# model that tests whether malaria status and elevation affect pox prevalence
malariamodel = glm(Activepox ~ Species*Malaria + Species*Elev,
data = malariadata,
family = binomial)
summary(malariamodel)
##
## Call:
## glm(formula = Activepox ~ Species * Malaria + Species * Elev,
## family = binomial, data = malariadata)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.9800 0.3862 -12.895 < 2e-16 ***
## SpeciesHAAM 1.2972 0.4777 2.715 0.00662 **
## Malaria 1.8277 0.1809 10.106 < 2e-16 ***
## Elevlow 2.4055 0.9410 2.556 0.01058 *
## Elevmid 2.4864 0.3980 6.247 4.18e-10 ***
## SpeciesHAAM:Malaria -0.8783 0.2797 -3.140 0.00169 **
## SpeciesHAAM:Elevlow -2.5614 1.0085 -2.540 0.01109 *
## SpeciesHAAM:Elevmid -2.5313 0.5040 -5.023 5.09e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3276.3 on 6570 degrees of freedom
## Residual deviance: 2624.3 on 6563 degrees of freedom
## AIC: 2640.3
##
## Number of Fisher Scoring iterations: 7
From the summary, we can interpret that the positive slope of malaria, low elevation, and mid elevation indicate that there cases of active pox increase with these three predictors for ʻApapane. It appears that malaria and mid elevation are significant predictors with much smaller p-values than low elevation, although low elevation still appears to be statistically significant.
The negative slopes of Hawaiʻi ʻAmakihi for malaria, low elevation, and mid elevation indicate that this species is less sensitive than ʻApapane to these three predictors.
Anova(malariamodel)
## Analysis of Deviance Table (Type II tests)
##
## Response: Activepox
## LR Chisq Df Pr(>Chisq)
## Species 151.175 1 < 2.2e-16 ***
## Malaria 141.429 1 < 2.2e-16 ***
## Elev 46.717 2 7.169e-11 ***
## Species:Malaria 10.184 1 0.001416 **
## Species:Elev 25.947 2 2.322e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
resid_panel(malariamodel, plots = c('resid', 'qq', 'lev', 'hist')) #residuals plot
The model does not fall under a normal distribution as we can interpret from the histogram and Q-Q plot. The residual plot shows the binomial nature of the residuals and is heteroscedastic. We can see that this is not a good fit for the model. Given the disclaimer, it can be assumed that the a generalized linear mixed model would be a better fit because there may be large differences between sites in the same elevation group.
plot(ggeffect(malariamodel, terms = c('Elev', 'Species', 'Malaria'))) #effects plot
We see at low and mid elevation, ʻApapane have the highest cases of active pox and more of them they will have pox if they have malaria. It appears that there is a significant difference between Hawaiʻi ʻAmakihi and ʻApapane as elevation levels change. Hawaiʻi ʻAmakihi appear to have nearly the same level of active pox whether or not they have malaria at any elevation level.