PART 1: Poisson GLM for Species Richness

In this section, we analyze species richness, which represents count data (number of species). Therefore, a Poisson Generalized Linear Model (GLM) is the most appropriate approach.

Step 1: Load and Explore Data

First, we load the dataset and inspect its structure to understand the variables we are working with.

# Load the dataset from the CSV file
forest <- read.csv("forest.csv")

# Display the internal structure of the data frame (variable types)
str(forest)
## 'data.frame':    200 obs. of  4 variables:
##  $ forest       : num  71.9 20.9 83.2 37 74.5 ...
##  $ forest.clumpy: num  0.861 0.796 0.822 0.799 0.902 ...
##  $ sp.richness  : int  109 48 117 50 78 71 79 96 102 61 ...
##  $ Empenigr     : int  1 0 1 0 0 0 0 0 1 0 ...
# Provide a statistical summary (Min, Max, Mean) for all variables
summary(forest)
##      forest      forest.clumpy     sp.richness        Empenigr   
##  Min.   : 8.28   Min.   :0.7609   Min.   : 23.00   Min.   :0.00  
##  1st Qu.:39.09   1st Qu.:0.8457   1st Qu.: 72.00   1st Qu.:0.00  
##  Median :53.89   Median :0.8695   Median : 85.00   Median :0.00  
##  Mean   :53.69   Mean   :0.8637   Mean   : 85.17   Mean   :0.35  
##  3rd Qu.:68.40   3rd Qu.:0.8858   3rd Qu.:100.00   3rd Qu.:1.00  
##  Max.   :91.60   Max.   :0.9487   Max.   :127.00   Max.   :1.00

Step 2: Explore the Response Variable

Before modeling, it is crucial to visualize the distribution of our response variable (Species Richness). Count data typically follows a right-skewed Poisson distribution.

# Plot a histogram to visualize the frequency of species richness counts
hist(forest$sp.richness, 
     main = "Histogram of Species Richness", 
     xlab = "Species Richness (Count)",
     col = "lightblue")

Step 3: Full Model (With Interaction)

We fit a Poisson GLM using the log link function. The mathematical representation is:

\[ \log(\lambda) = \beta_0 + \beta_1(Forest) + \beta_2(Clumpy) + \beta_3(Forest \times Clumpy) \]

# Fit the full Poisson GLM including the interaction term (*) between predictors
model_full <- glm(sp.richness ~ forest * forest.clumpy, 
                  family = poisson(link = "log"), 
                  data = forest)

# Display the model summary to check the significance of the interaction term
summary(model_full)
## 
## Call:
## glm(formula = sp.richness ~ forest * forest.clumpy, family = poisson(link = "log"), 
##     data = forest)
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           1.903143   0.526995   3.611 0.000305 ***
## forest                0.025377   0.008490   2.989 0.002797 ** 
## forest.clumpy         2.502664   0.614149   4.075  4.6e-05 ***
## forest:forest.clumpy -0.021457   0.009995  -2.147 0.031813 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 887.57  on 199  degrees of freedom
## Residual deviance: 575.27  on 196  degrees of freedom
## AIC: 1834.5
## 
## Number of Fisher Scoring iterations: 4

Step 4: Final Additive Model

If the interaction is not significant, we simplify the model by removing it, leaving only the additive main effects (+).

# Fit the final additive model without the interaction term (+)
model_final <- glm(sp.richness ~ forest + forest.clumpy, 
                   family = poisson(link = "log"), 
                   data = forest)

# Display the summary to interpret the main effects
summary(model_final)
## 
## Call:
## glm(formula = sp.richness ~ forest + forest.clumpy, family = poisson(link = "log"), 
##     data = forest)
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   2.9556051  0.1892674  15.616  < 2e-16 ***
## forest        0.0071777  0.0004181  17.166  < 2e-16 ***
## forest.clumpy 1.2671322  0.2103598   6.024 1.71e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 887.57  on 199  degrees of freedom
## Residual deviance: 579.88  on 197  degrees of freedom
## AIC: 1837.1
## 
## Number of Fisher Scoring iterations: 4

Step 5: Check Overdispersion

A strict assumption of the Poisson model is that the variance equals the mean. We check for overdispersion by dividing the residual deviance by the residual degrees of freedom.

\[ Dispersion\_Ratio = \frac{Residual\_Deviance}{Residual\_df} \]

# Calculate the dispersion ratio
dispersion_ratio <- deviance(model_final) / df.residual(model_final)
print(dispersion_ratio)
## [1] 2.943544
# Decision Rule:
# If value is close to 1 -> Good (No overdispersion)
# If value > 1.5 or 2 -> Overdispersion problem exists

# If overdispersion exists, we must use the 'quasipoisson' family to correct standard errors
model_final_quasi <- glm(sp.richness ~ forest + forest.clumpy, 
                         family = quasipoisson(link = "log"), 
                         data = forest)

Step 6: Model Diagnostics

We use diagnostic plots to verify the model assumptions (e.g., residual patterns, influential outliers).

# Divide the plotting area into a 2x2 grid to show all 4 diagnostic plots together
par(mfrow = c(2, 2))

# Generate the diagnostic plots for the final model
plot(model_final)

# Reset the plotting layout to default
par(mfrow = c(1, 1))

Interpretation of Part 1:

  • A Poisson GLM was used because species richness is discrete count data.
  • Forest aggregation (forest.clumpy) had a significant positive effect, indicating that highly aggregated forests support higher species richness.

PART 2: Binomial GLM for Presence of Empetrum nigrum

In this section, we analyze the presence or absence of a specific plant species (Empetrum nigrum). Since the data is binary (0 = absent, 1 = present), a Binomial GLM (Logistic Regression) is required.

Step 1: Full Model

We use the logit link function to model the probability (\(p\)) of the species being present:

\[ \text{logit}(p) = \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \dots \]

# Fit the full binomial GLM including the interaction term (*)
model2_full <- glm(Empenigr ~ forest * forest.clumpy, 
                   family = binomial(link = "logit"), 
                   data = forest)

# Check the summary to evaluate predictors and their interaction
summary(model2_full)
## 
## Call:
## glm(formula = Empenigr ~ forest * forest.clumpy, family = binomial(link = "logit"), 
##     data = forest)
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)  
## (Intercept)          -30.7165    13.4065  -2.291   0.0220 *
## forest                 0.5084     0.2101   2.420   0.0155 *
## forest.clumpy         32.8914    15.4238   2.133   0.0330 *
## forest:forest.clumpy  -0.5558     0.2441  -2.276   0.0228 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 258.98  on 199  degrees of freedom
## Residual deviance: 235.90  on 196  degrees of freedom
## AIC: 243.9
## 
## Number of Fisher Scoring iterations: 4

Step 2: Final Additive Model

Depending on the significance of the interaction in the full model, we can fit an additive model for comparison or final interpretation.

# Fit the additive binomial model without the interaction term (+)
model2_final <- glm(Empenigr ~ forest + forest.clumpy, 
                    family = binomial(link = "logit"), 
                    data = forest)

summary(model2_final)
## 
## Call:
## glm(formula = Empenigr ~ forest + forest.clumpy, family = binomial(link = "logit"), 
##     data = forest)
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -2.306573   3.895496  -0.592 0.553775    
## forest         0.033682   0.008951   3.763 0.000168 ***
## forest.clumpy -0.214035   4.306505  -0.050 0.960361    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 258.98  on 199  degrees of freedom
## Residual deviance: 241.67  on 197  degrees of freedom
## AIC: 247.67
## 
## Number of Fisher Scoring iterations: 4

Step 3: Model Diagnostics

Just like the Poisson model, we check the residuals for the Binomial model.

# Set up a 2x2 grid for diagnostic plots
par(mfrow = c(2, 2))

# Generate plots to check for influential data points or severe deviations
plot(model2_final)

# Reset the plotting layout to default
par(mfrow = c(1, 1))

Interpretation of Part 2:

  • A Binomial GLM was used to analyze the binary presence/absence of Empetrum nigrum.
  • The interaction between forest amount and forest aggregation was significant.
  • From the negative estimate values, we can deduce that the probability of occurrence increases with forest amount, but this increase is weaker (dampened) in highly aggregated forests.