OCN 683 - Homework 6

Sarah Pennington

In the attached paper the researchers investigated the prevalence of the disease at twelve sites in windward rainforests, over a range of altitudes, in four bird species – three endemic (ʻApapane, Hawaiʻi ʻAmakihi, ʻIʻiwi) and one introduced (Japanese White-eye). Individual birds were assayed for active pox infection (lesions), as well as chronic malaria infection (birds that survive malaria become chronically infected). ʻIʻiwi is highly susceptible to malaria, while ʻApapane and Hawaiʻi ʻAmakihi are moderately susceptible, and the White-eye is resistant.
birds <- read_csv("Pox_Data_revised.csv", 
    col_types = cols(...1 = col_skip()))
## New names:
## • `` -> `...1`
#Each row= 1 bird 
#Condition = binary = infected or not infected with pox 
#Malaria: 1= infected, 0= not infected, 2= unknown 

str(birds)
## spc_tbl_ [11,743 × 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Site     : chr [1:11743] "AIN" "AIN" "AIN" "AIN" ...
##  $ Species  : chr [1:11743] "APAP" "APAP" "APAP" "APAP" ...
##  $ Bandno   : chr [1:11743] "159174793" "159174964" "159174965" "159174966" ...
##  $ Date     : chr [1:11743] "7/22/04" "7/6/04" "7/7/04" "7/7/04" ...
##  $ Sex      : chr [1:11743] "U" "M" "F" "F" ...
##  $ Age      : chr [1:11743] "H" "H" "H" "H" ...
##  $ Oldpox   : num [1:11743] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Activepox: num [1:11743] 0 1 0 0 0 0 0 0 0 1 ...
##  $ Malaria  : num [1:11743] 2 2 2 2 2 2 2 2 2 2 ...
##  $ Elev     : chr [1:11743] "mid" "mid" "mid" "mid" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   ...1 = col_skip(),
##   ..   Site = col_character(),
##   ..   Species = col_character(),
##   ..   Bandno = col_character(),
##   ..   Date = col_character(),
##   ..   Sex = col_character(),
##   ..   Age = col_character(),
##   ..   Oldpox = col_double(),
##   ..   Activepox = col_double(),
##   ..   Malaria = col_double(),
##   ..   Elev = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
birds$Elev <- as.factor(birds$Elev)
birds$Species <- as.factor(birds$Species)
(1) 1. The twelve sites have been coded as low/medium/high elevation. First, summarize the number of observations of each bird species at each elevation. This will be important context for what we can and cannot ask about avian pox patterns. Also, what is the prevalence of chronic malaria in the bird species at the difference elevations? Note that some birds do not have a known malaria status (code = 2). Only use birds with code = 0 or 1 to quantify the prevalence of malaria.
#First, summarize the number of observations of each bird species at each elevation.

birds_elev <- birds %>% 
  group_by(Species, Elev) %>%
  summarise( 
    n=n())
## `summarise()` has grouped output by 'Species'. You can override using the
## `.groups` argument.
print(birds_elev)
## # A tibble: 11 × 3
## # Groups:   Species [4]
##    Species Elev      n
##    <fct>   <fct> <int>
##  1 APAP    high    839
##  2 APAP    low       7
##  3 APAP    mid    1308
##  4 HAAM    high    623
##  5 HAAM    low    1539
##  6 HAAM    mid    3037
##  7 IIWI    high   1100
##  8 IIWI    mid      20
##  9 JAWE    high    283
## 10 JAWE    low     400
## 11 JAWE    mid    2587
  ###APAP at low elevation only has 7 entries, and IIWI at mid elevation only has 20 entries. We should be careful drawing conclusions about these groups given the small sample sizes. 


#Also, what is the prevalence of chronic malaria in the bird species at the difference elevations? Note that some birds do not have a known malaria status (code = 2). Only use birds with code = 0 or 1 to quantify the prevalence of malaria. 
filtered <- birds %>% filter(birds$Malaria != "2")
filtered <- filtered %>% 
  group_by(Species, Elev) %>%
  summarise( 
    n=n(), 
   malaria_count = sum(Malaria, na.rm = TRUE ),
   prevalence = malaria_count/n) 
## `summarise()` has grouped output by 'Species'. You can override using the
## `.groups` argument.
print(filtered) #Lowest and mid elevations have the highest instances of avian malaria. 
## # A tibble: 8 × 5
## # Groups:   Species [3]
##   Species Elev      n malaria_count prevalence
##   <fct>   <fct> <int>         <dbl>      <dbl>
## 1 APAP    high    735            58     0.0789
## 2 APAP    low       7             6     0.857 
## 3 APAP    mid    1217           715     0.588 
## 4 HAAM    high    515            10     0.0194
## 5 HAAM    low    1330          1138     0.856 
## 6 HAAM    mid    2767           486     0.176 
## 7 IIWI    high    948            23     0.0243
## 8 IIWI    mid      19             5     0.263
    ##No JAWE observations have malaria status 
(2) Create a model that tests whether pox prevalence depends on elevation, and whether it differs between species, and whether species differences depend on elevation. Evaluate the model results using the general methods we use in this course. Provide a verbal explanation of what the model tells us.If you look at summary() of the model there is an NA for one of the coefficients. Why do you think this is?
model <- glm(Activepox ~ Elev * Species, data = birds, family= binomial)
summary(model)
## 
## Call:
## glm(formula = Activepox ~ Elev * Species, family = binomial, 
##     data = birds)
## 
## Coefficients: (1 not defined because of singularities)
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -4.6432     0.3553 -13.070  < 2e-16 ***
## Elevlow               3.7269     0.9090   4.100 4.13e-05 ***
## Elevmid               3.4182     0.3613   9.460  < 2e-16 ***
## SpeciesHAAM           0.8704     0.4464   1.950   0.0512 .  
## SpeciesIIWI           0.8820     0.4088   2.157   0.0310 *  
## SpeciesJAWE          -0.9987     1.0629  -0.940   0.3474    
## Elevlow:SpeciesHAAM  -2.9684     0.9560  -3.105   0.0019 ** 
## Elevmid:SpeciesHAAM  -3.0784     0.4631  -6.647 2.99e-11 ***
## Elevlow:SpeciesIIWI       NA         NA      NA       NA    
## Elevmid:SpeciesIIWI  -1.3916     0.7508  -1.854   0.0638 .  
## Elevlow:SpeciesJAWE  -4.0740     1.6828  -2.421   0.0155 *  
## Elevmid:SpeciesJAWE  -2.6828     1.0895  -2.462   0.0138 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 4362.4  on 11742  degrees of freedom
## Residual deviance: 3568.3  on 11732  degrees of freedom
## AIC: 3590.3
## 
## Number of Fisher Scoring iterations: 8
#If you look at summary() of the model there is an NA for one of the coefficients. Why do you think this is? 
  ###There were no IIWI species at lower elevation, so there is no output for this group 

#Evaluate the model results using the general methods we use in this course.
Anova(model)
## Analysis of Deviance Table (Type II tests)
## 
## Response: Activepox
##              LR Chisq Df Pr(>Chisq)    
## Elev           235.43  2  < 2.2e-16 ***
## Species        645.45  3  < 2.2e-16 ***
## Elev:Species    47.32  5  4.884e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##p values 
##Elev  2.2e-16 ** 
##Species  2.2e-16 ***
##Elev:Species: 4.884e-09 ***

#Looking at residuals 
resid_panel(model)
## Warning in helper_plotly_label(model): NAs introduced by coercion
## Warning in helper_plotly_label(model): NAs introduced by coercion
## Warning in helper_plotly_label(model): NAs introduced by coercion
## Warning in helper_plotly_label(model): NAs introduced by coercion

   #Pretty crazy looking honestly but binomial residuals aren't normally distributed

plot(ggeffect(model, terms = c('Elev', 'Species')))
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

#Provide a verbal explanation of what the model tells us.   
    #First, at lower sample numbers, the variation is a lot higher in the effects plot. APAP has higher active pox at mid elevation than HAAM and JAWE. 
(3) Create a subset of the data that only includes birds with known malaria status (0 or 1). Also, you’ll probably want to exclude ʻIʻiwi (look at your answer to #1 to see why). Create a model that tests whether malaria status and elevation affect pox prevalence (and whether those effects differ between species). Evaluate the model results using the general methods we use in this course. Provide a verbal explanation of what the model tells us.
#Create a subset of the data that only includes birds with known malaria status and exclude IIWI. 
filtered <- birds %>% filter(birds$Malaria != "2") ##Got rid of all the J
filtered <- filtered %>% filter(filtered$Species != "IIWI")

#Create a model that tests whether malaria status and elevation affect pox prevalence (and whether those effects differ between species)

model2 <-  glm(Activepox ~ Species*Malaria + Species*Elev, data = filtered, family= binomial)
summary(model2)
## 
## Call:
## glm(formula = Activepox ~ Species * Malaria + Species * Elev, 
##     family = binomial, data = filtered)
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          -4.9800     0.3862 -12.895  < 2e-16 ***
## SpeciesHAAM           1.2972     0.4777   2.715  0.00662 ** 
## Malaria               1.8277     0.1809  10.106  < 2e-16 ***
## Elevlow               2.4055     0.9410   2.556  0.01058 *  
## Elevmid               2.4864     0.3980   6.247 4.18e-10 ***
## SpeciesHAAM:Malaria  -0.8783     0.2797  -3.140  0.00169 ** 
## SpeciesHAAM:Elevlow  -2.5614     1.0085  -2.540  0.01109 *  
## SpeciesHAAM:Elevmid  -2.5313     0.5040  -5.023 5.09e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3276.3  on 6570  degrees of freedom
## Residual deviance: 2624.3  on 6563  degrees of freedom
## AIC: 2640.3
## 
## Number of Fisher Scoring iterations: 7
#Effects plots   
plot(ggeffect(model2, terms = c('Elev', 'Species','Malaria')))

  ###Active Malaria at mid-elevation there was a lot of Activepox in APAP
library(effects)
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
plot(allEffects(model2))

#Evaluate the models 
    ## From the effects plot we can see that in APAP in low and mid elevations there is more active pox than in high elevation. In HAAM, theres no major difference in active pox across elevations. More malaria prevalence is also associated with more active pox in both species, but the effect is more pronounced in APAP.