PART 1: Poisson GLM for Species Richness

In this section, we analyze species richness, which represents count data (number of species). Therefore, a Poisson Generalized Linear Model (GLM) is the most appropriate approach.

Step 1: Load and Explore Data

First, we load the dataset and inspect its structure to understand the variables we are working with.

# Load the dataset from the CSV file
forest <- read.csv("forest.csv")

# Display the internal structure of the data frame (variable types)
str(forest)

## 'data.frame':    200 obs. of  4 variables:
##  $ forest       : num  71.9 20.9 83.2 37 74.5 ...
##  $ forest.clumpy: num  0.861 0.796 0.822 0.799 0.902 ...
##  $ sp.richness  : int  109 48 117 50 78 71 79 96 102 61 ...
##  $ Empenigr     : int  1 0 1 0 0 0 0 0 1 0 ...

# Provide a statistical summary (Min, Max, Mean) for all variables
summary(forest)

##      forest      forest.clumpy     sp.richness        Empenigr   
##  Min.   : 8.28   Min.   :0.7609   Min.   : 23.00   Min.   :0.00  
##  1st Qu.:39.09   1st Qu.:0.8457   1st Qu.: 72.00   1st Qu.:0.00  
##  Median :53.89   Median :0.8695   Median : 85.00   Median :0.00  
##  Mean   :53.69   Mean   :0.8637   Mean   : 85.17   Mean   :0.35  
##  3rd Qu.:68.40   3rd Qu.:0.8858   3rd Qu.:100.00   3rd Qu.:1.00  
##  Max.   :91.60   Max.   :0.9487   Max.   :127.00   Max.   :1.00

Step 2: Explore the Response Variable

Before modeling, it is crucial to visualize the distribution of our response variable (Species Richness). Count data typically follows a right-skewed Poisson distribution.

# Plot a histogram to visualize the frequency of species richness counts
hist(forest$sp.richness, 
     main = "Histogram of Species Richness", 
     xlab = "Species Richness (Count)",
     col = "lightblue")

Step 3: Full Model (With Interaction)

We fit a Poisson GLM using the log link function. The mathematical representation is:

\[ \log(\lambda) = \beta_0 + \beta_1(Forest) + \beta_2(Clumpy) + \beta_3(Forest \times Clumpy) \]

# Fit the full Poisson GLM including the interaction term (*) between predictors
model_full <- glm(sp.richness ~ forest * forest.clumpy, 
                  family = poisson(link = "log"), 
                  data = forest)

# Display the model summary to check the significance of the interaction term
summary(model_full)

## 
## Call:
## glm(formula = sp.richness ~ forest * forest.clumpy, family = poisson(link = "log"), 
##     data = forest)
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           1.903143   0.526995   3.611 0.000305 ***
## forest                0.025377   0.008490   2.989 0.002797 ** 
## forest.clumpy         2.502664   0.614149   4.075  4.6e-05 ***
## forest:forest.clumpy -0.021457   0.009995  -2.147 0.031813 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 887.57  on 199  degrees of freedom
## Residual deviance: 575.27  on 196  degrees of freedom
## AIC: 1834.5
## 
## Number of Fisher Scoring iterations: 4

Step 4: Final Additive Model

If the interaction is not significant, we simplify the model by removing it, leaving only the additive main effects (+).

# Fit the final additive model without the interaction term (+)
model_final <- glm(sp.richness ~ forest + forest.clumpy, 
                   family = poisson(link = "log"), 
                   data = forest)

# Display the summary to interpret the main effects
summary(model_final)

## 
## Call:
## glm(formula = sp.richness ~ forest + forest.clumpy, family = poisson(link = "log"), 
##     data = forest)
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   2.9556051  0.1892674  15.616  < 2e-16 ***
## forest        0.0071777  0.0004181  17.166  < 2e-16 ***
## forest.clumpy 1.2671322  0.2103598   6.024 1.71e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 887.57  on 199  degrees of freedom
## Residual deviance: 579.88  on 197  degrees of freedom
## AIC: 1837.1
## 
## Number of Fisher Scoring iterations: 4

Step 5: Check Overdispersion

A strict assumption of the Poisson model is that the variance equals the mean. We check for overdispersion by dividing the residual deviance by the residual degrees of freedom.

\[ Dispersion\_Ratio = \frac{Residual\_Deviance}{Residual\_df} \]

# Calculate the dispersion ratio
dispersion_ratio <- deviance(model_final) / df.residual(model_final)
print(dispersion_ratio)

## [1] 2.943544

# Decision Rule:
# If value is close to 1 -> Good (No overdispersion)
# If value > 1.5 or 2 -> Overdispersion problem exists

# If overdispersion exists, we must use the 'quasipoisson' family to correct standard errors
model_final_quasi <- glm(sp.richness ~ forest + forest.clumpy, 
                         family = quasipoisson(link = "log"), 
                         data = forest)

Step 6: Model Diagnostics

We use diagnostic plots to verify the model assumptions (e.g., residual patterns, influential outliers).

# Divide the plotting area into a 2x2 grid to show all 4 diagnostic plots together
par(mfrow = c(2, 2))

# Generate the diagnostic plots for the final model
plot(model_final)

# Reset the plotting layout to default
par(mfrow = c(1, 1))

Interpretation of Part 1:

A Poisson GLM was used because species richness is discrete count data.
Forest aggregation (forest.clumpy) had a significant positive effect, indicating that highly aggregated forests support higher species richness.

PART 2: Binomial GLM for Presence of Empetrum nigrum

In this section, we analyze the presence or absence of a specific plant species (Empetrum nigrum). Since the data is binary (0 = absent, 1 = present), a Binomial GLM (Logistic Regression) is required.

Step 1: Full Model

We use the logit link function to model the probability (\(p\)) of the species being present:

\[ \text{logit}(p) = \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \dots \]

# Fit the full binomial GLM including the interaction term (*)
model2_full <- glm(Empenigr ~ forest * forest.clumpy, 
                   family = binomial(link = "logit"), 
                   data = forest)

# Check the summary to evaluate predictors and their interaction
summary(model2_full)

## 
## Call:
## glm(formula = Empenigr ~ forest * forest.clumpy, family = binomial(link = "logit"), 
##     data = forest)
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)  
## (Intercept)          -30.7165    13.4065  -2.291   0.0220 *
## forest                 0.5084     0.2101   2.420   0.0155 *
## forest.clumpy         32.8914    15.4238   2.133   0.0330 *
## forest:forest.clumpy  -0.5558     0.2441  -2.276   0.0228 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 258.98  on 199  degrees of freedom
## Residual deviance: 235.90  on 196  degrees of freedom
## AIC: 243.9
## 
## Number of Fisher Scoring iterations: 4

Step 2: Final Additive Model

Depending on the significance of the interaction in the full model, we can fit an additive model for comparison or final interpretation.

# Fit the additive binomial model without the interaction term (+)
model2_final <- glm(Empenigr ~ forest + forest.clumpy, 
                    family = binomial(link = "logit"), 
                    data = forest)

summary(model2_final)

## 
## Call:
## glm(formula = Empenigr ~ forest + forest.clumpy, family = binomial(link = "logit"), 
##     data = forest)
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -2.306573   3.895496  -0.592 0.553775    
## forest         0.033682   0.008951   3.763 0.000168 ***
## forest.clumpy -0.214035   4.306505  -0.050 0.960361    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 258.98  on 199  degrees of freedom
## Residual deviance: 241.67  on 197  degrees of freedom
## AIC: 247.67
## 
## Number of Fisher Scoring iterations: 4

Step 3: Model Diagnostics

Just like the Poisson model, we check the residuals for the Binomial model.

# Set up a 2x2 grid for diagnostic plots
par(mfrow = c(2, 2))

# Generate plots to check for influential data points or severe deviations
plot(model2_final)

# Reset the plotting layout to default
par(mfrow = c(1, 1))

Interpretation of Part 2:

A Binomial GLM was used to analyze the binary presence/absence of Empetrum nigrum.
The interaction between forest amount and forest aggregation was significant.
From the negative estimate values, we can deduce that the probability of occurrence increases with forest amount, but this increase is weaker (dampened) in highly aggregated forests.

Generalized Linear Models: Forest Data Analysis

Iman

2026-03-02