Sarah Pennington
In the attached paper the researchers investigated the prevalence of
the disease at twelve sites in windward rainforests, over a range of
altitudes, in four bird species – three endemic (ʻApapane, Hawaiʻi
ʻAmakihi, ʻIʻiwi) and one introduced (Japanese White-eye). Individual
birds were assayed for active pox infection (lesions), as well as
chronic malaria infection (birds that survive malaria become chronically
infected). ʻIʻiwi is highly susceptible to malaria, while ʻApapane and
Hawaiʻi ʻAmakihi are moderately susceptible, and the White-eye is
resistant.
birds <- read_csv("Pox_Data_revised.csv",
col_types = cols(...1 = col_skip()))
## New names:
## • `` -> `...1`
#Each row= 1 bird
#Condition = binary = infected or not infected with pox
#Malaria: 1= infected, 0= not infected, 2= unknown
str(birds)
## spc_tbl_ [11,743 × 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Site : chr [1:11743] "AIN" "AIN" "AIN" "AIN" ...
## $ Species : chr [1:11743] "APAP" "APAP" "APAP" "APAP" ...
## $ Bandno : chr [1:11743] "159174793" "159174964" "159174965" "159174966" ...
## $ Date : chr [1:11743] "7/22/04" "7/6/04" "7/7/04" "7/7/04" ...
## $ Sex : chr [1:11743] "U" "M" "F" "F" ...
## $ Age : chr [1:11743] "H" "H" "H" "H" ...
## $ Oldpox : num [1:11743] 0 0 0 0 0 0 0 0 0 0 ...
## $ Activepox: num [1:11743] 0 1 0 0 0 0 0 0 0 1 ...
## $ Malaria : num [1:11743] 2 2 2 2 2 2 2 2 2 2 ...
## $ Elev : chr [1:11743] "mid" "mid" "mid" "mid" ...
## - attr(*, "spec")=
## .. cols(
## .. ...1 = col_skip(),
## .. Site = col_character(),
## .. Species = col_character(),
## .. Bandno = col_character(),
## .. Date = col_character(),
## .. Sex = col_character(),
## .. Age = col_character(),
## .. Oldpox = col_double(),
## .. Activepox = col_double(),
## .. Malaria = col_double(),
## .. Elev = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
birds$Elev <- as.factor(birds$Elev)
birds$Species <- as.factor(birds$Species)
(1) 1. The twelve sites have been coded as low/medium/high
elevation. First, summarize the number of observations of each bird
species at each elevation. This will be important context for what we
can and cannot ask about avian pox patterns. Also, what is the
prevalence of chronic malaria in the bird species at the difference
elevations? Note that some birds do not have a known malaria status
(code = 2). Only use birds with code = 0 or 1 to quantify the prevalence
of malaria.
#First, summarize the number of observations of each bird species at each elevation.
birds_elev <- birds %>%
group_by(Species, Elev) %>%
summarise(
n=n())
## `summarise()` has grouped output by 'Species'. You can override using the
## `.groups` argument.
print(birds_elev)
## # A tibble: 11 × 3
## # Groups: Species [4]
## Species Elev n
## <fct> <fct> <int>
## 1 APAP high 839
## 2 APAP low 7
## 3 APAP mid 1308
## 4 HAAM high 623
## 5 HAAM low 1539
## 6 HAAM mid 3037
## 7 IIWI high 1100
## 8 IIWI mid 20
## 9 JAWE high 283
## 10 JAWE low 400
## 11 JAWE mid 2587
###APAP at low elevation only has 7 entries, and IIWI at mid elevation only has 20 entries. We should be careful drawing conclusions about these groups given the small sample sizes.
#Also, what is the prevalence of chronic malaria in the bird species at the difference elevations? Note that some birds do not have a known malaria status (code = 2). Only use birds with code = 0 or 1 to quantify the prevalence of malaria.
filtered <- birds %>% filter(birds$Malaria != "2")
filtered <- filtered %>%
group_by(Species, Elev) %>%
summarise(
n=n(),
malaria_count = sum(Malaria, na.rm = TRUE ),
prevalence = malaria_count/n)
## `summarise()` has grouped output by 'Species'. You can override using the
## `.groups` argument.
print(filtered) #Lowest and mid elevations have the highest instances of avian malaria.
## # A tibble: 8 × 5
## # Groups: Species [3]
## Species Elev n malaria_count prevalence
## <fct> <fct> <int> <dbl> <dbl>
## 1 APAP high 735 58 0.0789
## 2 APAP low 7 6 0.857
## 3 APAP mid 1217 715 0.588
## 4 HAAM high 515 10 0.0194
## 5 HAAM low 1330 1138 0.856
## 6 HAAM mid 2767 486 0.176
## 7 IIWI high 948 23 0.0243
## 8 IIWI mid 19 5 0.263
##No JAWE observations have malaria status
(2) Create a model that tests whether pox prevalence depends on
elevation, and whether it differs between species, and whether species
differences depend on elevation. Evaluate the model results using the
general methods we use in this course. Provide a verbal explanation of
what the model tells us.If you look at summary() of the model there is
an NA for one of the coefficients. Why do you think this is?
model <- glm(Activepox ~ Elev * Species, data = birds, family= binomial)
summary(model)
##
## Call:
## glm(formula = Activepox ~ Elev * Species, family = binomial,
## data = birds)
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.6432 0.3553 -13.070 < 2e-16 ***
## Elevlow 3.7269 0.9090 4.100 4.13e-05 ***
## Elevmid 3.4182 0.3613 9.460 < 2e-16 ***
## SpeciesHAAM 0.8704 0.4464 1.950 0.0512 .
## SpeciesIIWI 0.8820 0.4088 2.157 0.0310 *
## SpeciesJAWE -0.9987 1.0629 -0.940 0.3474
## Elevlow:SpeciesHAAM -2.9684 0.9560 -3.105 0.0019 **
## Elevmid:SpeciesHAAM -3.0784 0.4631 -6.647 2.99e-11 ***
## Elevlow:SpeciesIIWI NA NA NA NA
## Elevmid:SpeciesIIWI -1.3916 0.7508 -1.854 0.0638 .
## Elevlow:SpeciesJAWE -4.0740 1.6828 -2.421 0.0155 *
## Elevmid:SpeciesJAWE -2.6828 1.0895 -2.462 0.0138 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 4362.4 on 11742 degrees of freedom
## Residual deviance: 3568.3 on 11732 degrees of freedom
## AIC: 3590.3
##
## Number of Fisher Scoring iterations: 8
#If you look at summary() of the model there is an NA for one of the coefficients. Why do you think this is?
###There were no IIWI species at lower elevation, so there is no output for this group
#Evaluate the model results using the general methods we use in this course.
Anova(model)
## Analysis of Deviance Table (Type II tests)
##
## Response: Activepox
## LR Chisq Df Pr(>Chisq)
## Elev 235.43 2 < 2.2e-16 ***
## Species 645.45 3 < 2.2e-16 ***
## Elev:Species 47.32 5 4.884e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##p values
##Elev 2.2e-16 **
##Species 2.2e-16 ***
##Elev:Species: 4.884e-09 ***
#Looking at residuals
resid_panel(model)
## Warning in helper_plotly_label(model): NAs introduced by coercion
## Warning in helper_plotly_label(model): NAs introduced by coercion
## Warning in helper_plotly_label(model): NAs introduced by coercion
## Warning in helper_plotly_label(model): NAs introduced by coercion

#Pretty crazy looking honestly but binomial residuals aren't normally distributed
plot(ggeffect(model, terms = c('Elev', 'Species')))
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

#Provide a verbal explanation of what the model tells us.
#First, at lower sample numbers, the variation is a lot higher in the effects plot. APAP has higher active pox at mid elevation than HAAM and JAWE.
(3) Create a subset of the data that only includes birds with known
malaria status (0 or 1). Also, you’ll probably want to exclude ʻIʻiwi
(look at your answer to #1 to see why). Create a model that tests
whether malaria status and elevation affect pox prevalence (and whether
those effects differ between species). Evaluate the model results using
the general methods we use in this course. Provide a verbal explanation
of what the model tells us.
#Create a subset of the data that only includes birds with known malaria status and exclude IIWI.
filtered <- birds %>% filter(birds$Malaria != "2") ##Got rid of all the J
filtered <- filtered %>% filter(filtered$Species != "IIWI")
#Create a model that tests whether malaria status and elevation affect pox prevalence (and whether those effects differ between species)
model2 <- glm(Activepox ~ Species*Malaria + Species*Elev, data = filtered, family= binomial)
summary(model2)
##
## Call:
## glm(formula = Activepox ~ Species * Malaria + Species * Elev,
## family = binomial, data = filtered)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.9800 0.3862 -12.895 < 2e-16 ***
## SpeciesHAAM 1.2972 0.4777 2.715 0.00662 **
## Malaria 1.8277 0.1809 10.106 < 2e-16 ***
## Elevlow 2.4055 0.9410 2.556 0.01058 *
## Elevmid 2.4864 0.3980 6.247 4.18e-10 ***
## SpeciesHAAM:Malaria -0.8783 0.2797 -3.140 0.00169 **
## SpeciesHAAM:Elevlow -2.5614 1.0085 -2.540 0.01109 *
## SpeciesHAAM:Elevmid -2.5313 0.5040 -5.023 5.09e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3276.3 on 6570 degrees of freedom
## Residual deviance: 2624.3 on 6563 degrees of freedom
## AIC: 2640.3
##
## Number of Fisher Scoring iterations: 7
#Effects plots
plot(ggeffect(model2, terms = c('Elev', 'Species','Malaria')))

###Active Malaria at mid-elevation there was a lot of Activepox in APAP
library(effects)
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
plot(allEffects(model2))

#Evaluate the models
## From the effects plot we can see that in APAP in low and mid elevations there is more active pox than in high elevation. In HAAM, theres no major difference in active pox across elevations. More malaria prevalence is also associated with more active pox in both species, but the effect is more pronounced in APAP.