EXAMPLE

Load packages

library(ggplot2)
library(tidyverse)
library(knitr)
library(MASS)
library(car)
library(MuMIn)
library(multcomp)

Load data

data <- read.csv("PAINTS_data_v3.csv")
data$site <- as.factor(data$site)

Plot data

ggplot(data, aes(site, flowering_species_richness)) +
  geom_boxplot() +
  theme_classic() +
  xlab("Site") +
  ylab("Flowering Species Richness")

Figure 1. Species richness at three different sites near the Chicago Botanic Garden. Data is from the Plant-Animal Interactions course project. Data collected by Sam Rosa, Andrew Davies, and myself.

What does the response variable look like?

hist(data$flowering_species_richness, data=data)

## Warning in plot.window(xlim, ylim, "", ...): "data" is not a graphical parameter

## Warning in title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...): "data"
## is not a graphical parameter

## Warning in axis(1, ...): "data" is not a graphical parameter

## Warning in axis(2, ...): "data" is not a graphical parameter

Figure 2. These data do not look like they follow a normal distribution. There appears to be a lot of data clustered at lower values, which is a result of this being count data.

Create a GLM with a Poisson error distribution and view the model summary
[code adapted from PAINTS salty nectar R activity by Paul]

sr.poisson <- glm(flowering_species_richness ~ site, family = poisson, data = data)
car::Anova(sr.poisson, type = 3, test.statistic = "F")

## Analysis of Deviance Table (Type III tests)
## 
## Response: flowering_species_richness
## Error estimate based on Pearson residuals 
## 
##            Sum Sq Df F values  Pr(>F)  
## site       6.4781  2   4.6738 0.01807 *
## Residuals 18.7116 27                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Create a GLM with a negative binomial error distribution and view the model summary
[code adapted from PAINTS salty nectar R activity by Paul]

sr.nb <- MASS::glm.nb(flowering_species_richness ~ site, data = data)

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

car::Anova(sr.nb, type = 3, test.statistic = "F") #view summary

## Analysis of Deviance Table (Type III tests)
## 
## Response: flowering_species_richness
## Error estimate based on Pearson residuals 
## 
##            Sum Sq Df F values  Pr(>F)  
## site       6.4779  2   4.6738 0.01807 *
## Residuals 18.7108 27                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Which model is best? Compare AIC values to find out.

AIC <- AICc(sr.poisson, sr.nb)
knitr::kable(AIC,
             align = "ccc",
             caption = "Table 1. AIC values of two GLMs with Poisson or negative binomial error distributions. The model with the Poisson distribution is approximately 3 values lower than the other, suggesting that the GLM with the Poisson distribution is a better model for the data.")

Table 1. AIC values of two GLMs with Poisson or negative binomial error distributions. The model with the Poisson distribution is approximately 3 values lower than the other, suggesting that the GLM with the Poisson distribution is a better model for the data.
	df	AICc
sr.poisson	3	87.84062
sr.nb	4	90.51802

Conduct a post-hoc test for the GLM with the Poisson distribution to detect which groups are different from one another.

tukey.sr.poisson <- glht(sr.poisson, mcp(site = "Tukey"))
summary(tukey.sr.poisson)

## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: glm(formula = flowering_species_richness ~ site, family = poisson, 
##     data = data)
## 
## Linear Hypotheses:
##                                      Estimate Std. Error z value Pr(>|z|)  
## McDonald Woods - Barbara Brown == 0    0.1112     0.3338   0.333   0.9400  
## Skokie Lagoons - Barbara Brown == 0   -0.8873     0.4491  -1.976   0.1162  
## Skokie Lagoons - McDonald Woods == 0  -0.9985     0.4421  -2.258   0.0607 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

It appears that none of the groups are technically signficantly different from one another, but there is likely a biologically significant difference in species richness between McDonald Woods and Skokie Lagoons.

II. Generalized Linear Models

Kathryn Dawdy

2022-06-10

INTRODUCTION

GLM Assumptions

EXAMPLE