About the Dataset

For this activity, I worked with a dataset on White-browed Sparrow-Weaver nesting, applying logistic regression and data visualization in R.

##Loading the libraries

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(readxl)
library(corrplot)
## corrplot 0.95 loaded
library(tidyr)
library(GGally)
## Loading required package: ggplot2
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
library(ggplot2)

##Load Data

data <- read_excel("C:/Users/User/Downloads/Data rangu/Data rangu/data.xlsx")
View(data)
data
## # A tibble: 915 × 8
##    TS     `P/A`    TH    CD    CC    BA    DS    NN
##    <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Mopane     1  10.6  9.4   70.2  8.94     0    14
##  2 Mopane     1  11.1 10.2   23.3  0.7     10     1
##  3 Mopane     0  10.6  9.95  29.2  1.53    10     0
##  4 Mopane     0  14   11.7   48.3  1.43     0     0
##  5 Mopane     0  10.2  8.55  41.4  1.88    10     0
##  6 Mopane     1   8.5  7.27  48.1  1.76    50    10
##  7 Mopane     1  10.5  8.99  61.4  1.47    10     4
##  8 Mopane     0  11.5  9.19  47.1  1.72    10     0
##  9 Mopane     0  13   11.6   40.8  1.22    10     0
## 10 Mopane     0  10.6  9.75  64.1  1.03     0     0
## # ℹ 905 more rows

Steps Taken

1. Data Loading & Cleaning

Loaded the data using read_excel().

Converted the P/A (Presence/Absence) variable into a factor for binary classification.

data$`P/A` <- as.factor(data$`P/A`)
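
A quick check that can follow this conversion: with a factor response, glm() treats the first level as the "failure" class and models the probability of the other level, so it helps to confirm that 0 comes first. The vector below is a hypothetical stand-in, not the nesting data.

```r
# Hypothetical presence/absence values standing in for data$`P/A`
pa <- c(1, 1, 0, 0, 0, 1, NA)

pa_factor <- as.factor(pa)

# The first level is the reference ("failure") class in a binomial glm
levels(pa_factor)

# Class balance, including any missing values
table(pa_factor, useNA = "ifany")
```
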
2. Logistic Regression Modeling

Modeled nest presence using tree structural variables: Tree Height (TH), Canopy Depth (CD), Canopy Cover (CC), Basal Area (BA), and Damage Score (DS):

model <- glm(`P/A` ~ TH + CD + CC + BA + DS, data = data, family = binomial)
summary(model)
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = data)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.600539   0.331455  -4.829 1.37e-06 ***
## TH           0.151297   0.045991   3.290    0.001 ** 
## CD          -0.039035   0.044741  -0.872    0.383    
## CC           0.022358   0.005591   3.999 6.37e-05 ***
## BA           0.099349   0.085080   1.168    0.243    
## DS          -0.026155   0.004276  -6.117 9.55e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1163.2  on 912  degrees of freedom
## Residual deviance:  773.8  on 907  degrees of freedom
##   (2 observations deleted due to missingness)
## AIC: 785.8
## 
## Number of Fisher Scoring iterations: 5

##Checked model quality using AIC and interpreted odds ratios:

AIC(model)
## [1] 785.7988
exp(cbind(OR = coef(model), confint(model)))
## Waiting for profiling to be done...
##                    OR     2.5 %    97.5 %
## (Intercept) 0.2017877 0.1040817 0.3824559
## TH          1.1633423 1.0641990 1.2748632
## CD          0.9617168 0.8801620 1.0492515
## CC          1.0226101 1.0116467 1.0341130
## BA          1.1044513 0.9212414 1.3105013
## DS          0.9741841 0.9659280 0.9822820
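
As a rough guide to reading these odds ratios, the sketch below converts three of the reported coefficients into percent changes in the odds of nest presence, for a one-unit and a ten-unit increase. The coefficients are copied from the summary output above; the ten-unit step is only an illustrative choice.

```r
# Coefficients copied from the summary(model) output above (log-odds scale)
b <- c(TH = 0.151297, CC = 0.022358, DS = -0.026155)

# Odds ratio for a one-unit increase, and the implied percent change in odds
or_1 <- exp(b)
pct_change_1 <- (or_1 - 1) * 100

# Odds ratio for a ten-unit increase: multiply the coefficient by 10 first
or_10 <- exp(10 * b)

round(cbind(or_1, pct_change_1, or_10), 3)
```

An odds ratio close to 1 per unit can still be a large effect over the observed range of the variable, which is why the ten-unit column is worth checking.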

Interaction Effects

To examine whether combinations of tree features influence nesting more than individual ones, I added interaction terms:

interaction_model <- glm(`P/A` ~ TH * CD + BA * CC, data = data, family = binomial)
summary(interaction_model)
## 
## Call:
## glm(formula = `P/A` ~ TH * CD + BA * CC, family = binomial, data = data)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -4.730676   0.429975 -11.002  < 2e-16 ***
## TH           0.352388   0.057488   6.130 8.80e-10 ***
## CD           0.253930   0.084345   3.011  0.00261 ** 
## BA           0.286579   0.115433   2.483  0.01304 *  
## CC           0.042077   0.007171   5.868 4.42e-09 ***
## TH:CD       -0.026586   0.006452  -4.121 3.78e-05 ***
## BA:CC       -0.006998   0.002146  -3.262  0.00111 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1163.20  on 912  degrees of freedom
## Residual deviance:  782.96  on 906  degrees of freedom
##   (2 observations deleted due to missingness)
## AIC: 796.96
## 
## Number of Fisher Scoring iterations: 5

🌳 Both interaction terms were significant and negative, suggesting that the positive effect of tree height on nesting likelihood weakens as canopy depth increases, and likewise for basal area and canopy cover. Note that this model omits DS.
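
To see what the negative TH:CD coefficient implies, the slope of TH on the log-odds scale can be evaluated at different canopy depths. This is a small sketch using the two coefficients from the output above; the canopy depths are hypothetical values picked for illustration.

```r
# Coefficients copied from summary(interaction_model) above
b_TH    <- 0.352388
b_TH_CD <- -0.026586

# Slope of TH on the log-odds scale at a given canopy depth
th_slope <- function(cd) b_TH + b_TH_CD * cd

# Hypothetical canopy depths chosen only for illustration
cd_values <- c(5, 10, 15)
data.frame(CD = cd_values, TH_slope = th_slope(cd_values))

# Canopy depth at which the TH slope crosses zero
cd_zero <- -b_TH / b_TH_CD
cd_zero
```
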

Species-Specific Analysis

I ran stratified models to explore whether nesting predictors vary by tree species (TS):

levels(data$TS)
## NULL
by_species <- split(data, data$TS)

models_by_species <- lapply(by_species, function(df) {
  glm(`P/A` ~ TH + CD + CC + BA + DS, data = df, family = binomial)
})
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
lapply(models_by_species, summary)
## $Acacia
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = df)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.341709   1.636531  -0.820    0.412
## TH          -0.069889   0.391637  -0.178    0.858
## CD          -0.051594   0.413620  -0.125    0.901
## CC           0.053601   0.040191   1.334    0.182
## BA           0.639031   0.872354   0.733    0.464
## DS          -0.006525   0.022984  -0.284    0.776
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 30.316  on 21  degrees of freedom
## Residual deviance: 23.194  on 16  degrees of freedom
## AIC: 35.194
## 
## Number of Fisher Scoring iterations: 4
## 
## 
## $Combretum
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = df)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.478e+01  2.560e+05       0        1
## TH           4.509e+00  1.370e+05       0        1
## CD          -5.106e+00  9.998e+04       0        1
## CC           2.497e-01  1.988e+04       0        1
## BA           1.077e+00  2.563e+05       0        1
## DS           5.009e-02  3.648e+03       0        1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 7.3479e+00  on 14  degrees of freedom
## Residual deviance: 2.8238e-10  on  9  degrees of freedom
## AIC: 12
## 
## Number of Fisher Scoring iterations: 24
## 
## 
## $Comiphora
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = df)
## 
## Coefficients: (5 not defined because of singularities)
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -22.57   48196.14       0        1
## TH                NA         NA      NA       NA
## CD                NA         NA      NA       NA
## CC                NA         NA      NA       NA
## BA                NA         NA      NA       NA
## DS                NA         NA      NA       NA
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 0.0000e+00  on 0  degrees of freedom
## Residual deviance: 3.1675e-10  on 0  degrees of freedom
## AIC: 2
## 
## Number of Fisher Scoring iterations: 21
## 
## 
## $`Crocodile bark`
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = df)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)    -98.836 325354.505   0.000    1.000
## TH              30.738 153280.074   0.000    1.000
## CD             -12.008 156636.425   0.000    1.000
## CC               1.965   6435.641   0.000    1.000
## BA            -144.175 215198.814  -0.001    0.999
## DS              -1.778   5339.491   0.000    1.000
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2.0862e+01  on 21  degrees of freedom
## Residual deviance: 2.3929e-09  on 16  degrees of freedom
## AIC: 12
## 
## Number of Fisher Scoring iterations: 25
## 
## 
## $`Crocodile Bark`
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = df)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.457e+01  1.145e+06       0        1
## TH          -1.020e-13  1.077e+06       0        1
## CD           9.925e-14  9.871e+05       0        1
## CC          -2.251e-16  2.018e+04       0        1
## BA           4.714e-14  7.290e+05       0        1
## DS          -4.803e-16  4.578e+03       0        1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 0.0000e+00  on 6  degrees of freedom
## Residual deviance: 3.0007e-10  on 1  degrees of freedom
## AIC: 12
## 
## Number of Fisher Scoring iterations: 23
## 
## 
## $`Croton Megalabotis`
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = df)
## 
## Coefficients: (3 not defined because of singularities)
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.357e+01  7.648e+05       0        1
## TH           1.153e-13  3.512e+05       0        1
## CD          -9.391e-14  2.924e+05       0        1
## CC                  NA         NA      NA       NA
## BA                  NA         NA      NA       NA
## DS                  NA         NA      NA       NA
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 0.0000e+00  on 2  degrees of freedom
## Residual deviance: 3.4957e-10  on 0  degrees of freedom
## AIC: 6
## 
## Number of Fisher Scoring iterations: 22
## 
## 
## $`D. mespiliformus`
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = df)
## 
## Coefficients: (5 not defined because of singularities)
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -22.57   48196.14       0        1
## TH                NA         NA      NA       NA
## CD                NA         NA      NA       NA
## CC                NA         NA      NA       NA
## BA                NA         NA      NA       NA
## DS                NA         NA      NA       NA
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 0.0000e+00  on 0  degrees of freedom
## Residual deviance: 3.1675e-10  on 0  degrees of freedom
## AIC: 2
## 
## Number of Fisher Scoring iterations: 21
## 
## 
## $`D. quiloensis`
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = df)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)  1.106e+02  2.792e+05       0        1
## TH          -1.628e+01  1.009e+05       0        1
## CD           5.875e+00  1.129e+05       0        1
## CC          -5.322e+00  7.849e+04       0        1
## BA          -2.717e+02  2.413e+06       0        1
## DS          -6.999e-01  7.469e+03       0        1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1.3003e+01  on 19  degrees of freedom
## Residual deviance: 7.0792e-10  on 14  degrees of freedom
## AIC: 12
## 
## Number of Fisher Scoring iterations: 25
## 
## 
## $`E. zambesiacum`
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = df)
## 
## Coefficients: (4 not defined because of singularities)
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.357e+01  3.105e+05       0        1
## TH           1.149e-14  1.060e+05       0        1
## CD                  NA         NA      NA       NA
## CC                  NA         NA      NA       NA
## BA                  NA         NA      NA       NA
## DS                  NA         NA      NA       NA
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 0.0000e+00  on 1  degrees of freedom
## Residual deviance: 2.3305e-10  on 0  degrees of freedom
## AIC: 4
## 
## Number of Fisher Scoring iterations: 22
## 
## 
## $`Monkey Orange`
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = df)
## 
## Coefficients: (5 not defined because of singularities)
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -22.57   48196.14       0        1
## TH                NA         NA      NA       NA
## CD                NA         NA      NA       NA
## CC                NA         NA      NA       NA
## BA                NA         NA      NA       NA
## DS                NA         NA      NA       NA
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 0.0000e+00  on 0  degrees of freedom
## Residual deviance: 3.1675e-10  on 0  degrees of freedom
## AIC: 2
## 
## Number of Fisher Scoring iterations: 21
## 
## 
## $Mopane
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = df)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.550079   0.407576  -3.803 0.000143 ***
## TH           0.146310   0.050073   2.922 0.003478 ** 
## CD          -0.032853   0.046821  -0.702 0.482885    
## CC           0.021426   0.005827   3.677 0.000236 ***
## BA           0.100625   0.086038   1.170 0.242185    
## DS          -0.025620   0.004876  -5.254 1.49e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 999.26  on 768  degrees of freedom
## Residual deviance: 670.93  on 763  degrees of freedom
##   (2 observations deleted due to missingness)
## AIC: 682.93
## 
## Number of Fisher Scoring iterations: 5
## 
## 
## $`P. violacea`
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = df)
## 
## Coefficients: (3 not defined because of singularities)
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.357e+01  4.008e+05       0        1
## TH          -1.994e-15  2.170e+04       0        1
## CD           1.331e-15  2.687e+04       0        1
## CC                  NA         NA      NA       NA
## BA                  NA         NA      NA       NA
## DS                  NA         NA      NA       NA
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 0.0000e+00  on 2  degrees of freedom
## Residual deviance: 3.4957e-10  on 0  degrees of freedom
## AIC: 6
## 
## Number of Fisher Scoring iterations: 22
## 
## 
## $Strychnos
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = df)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.457e+01  9.933e+05       0        1
## TH          -8.149e-15  2.102e+05       0        1
## CD           1.471e-14  1.716e+05       0        1
## CC          -3.402e-15  3.632e+04       0        1
## BA           1.280e-13  9.784e+05       0        1
## DS           7.923e-16  9.036e+03       0        1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 0.0000e+00  on 6  degrees of freedom
## Residual deviance: 3.0007e-10  on 1  degrees of freedom
## AIC: 12
## 
## Number of Fisher Scoring iterations: 23
## 
## 
## $Terminalia
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = df)
## 
## Coefficients: (5 not defined because of singularities)
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -22.57   48196.14       0        1
## TH                NA         NA      NA       NA
## CD                NA         NA      NA       NA
## CC                NA         NA      NA       NA
## BA                NA         NA      NA       NA
## DS                NA         NA      NA       NA
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 0.0000e+00  on 0  degrees of freedom
## Residual deviance: 3.1675e-10  on 0  degrees of freedom
## AIC: 2
## 
## Number of Fisher Scoring iterations: 21
## 
## 
## $`Terminalia Maprunoid`
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = df)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.457e+01  2.973e+05       0        1
## TH          -1.803e-15  4.941e+04       0        1
## CD          -2.230e-16  7.908e+04       0        1
## CC           2.017e-16  3.835e+04       0        1
## BA           2.367e-14  3.949e+05       0        1
## DS          -6.705e-17  3.482e+03       0        1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 0.0000e+00  on 7  degrees of freedom
## Residual deviance: 3.4294e-10  on 2  degrees of freedom
## AIC: 12
## 
## Number of Fisher Scoring iterations: 23
## 
## 
## $`V. lanciflora`
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = df)
## 
## Coefficients: (5 not defined because of singularities)
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -22.57   48196.14       0        1
## TH                NA         NA      NA       NA
## CD                NA         NA      NA       NA
## CC                NA         NA      NA       NA
## BA                NA         NA      NA       NA
## DS                NA         NA      NA       NA
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 0.0000e+00  on 0  degrees of freedom
## Residual deviance: 3.1675e-10  on 0  degrees of freedom
## AIC: 2
## 
## Number of Fisher Scoring iterations: 21
## 
## 
## $`Vitex payos`
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = df)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -212.724 793958.349       0        1
## TH             113.077 365934.280       0        1
## CD            -128.447 450042.819       0        1
## CC              10.224  41549.781       0        1
## BA              31.464 277579.859       0        1
## DS               1.297   8597.725       0        1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 7.6382e+00  on 5  degrees of freedom
## Residual deviance: 2.5720e-10  on 0  degrees of freedom
## AIC: 12
## 
## Number of Fisher Scoring iterations: 23
## 
## 
## $`Wine Cup Tree`
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = df)
## 
## Coefficients: (2 not defined because of singularities)
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.457e+01  3.741e+05       0        1
## TH          -7.635e-15  7.990e+05       0        1
## CD           7.635e-15  8.495e+05       0        1
## CC                  NA         NA      NA       NA
## BA          -1.136e-30  7.219e+05       0        1
## DS                  NA         NA      NA       NA
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 0.0000e+00  on 4  degrees of freedom
## Residual deviance: 2.1434e-10  on 1  degrees of freedom
## AIC: 8
## 
## Number of Fisher Scoring iterations: 23
## 
## 
## $Z.Coca
## 
## Call:
## glm(formula = `P/A` ~ TH + CD + CC + BA + DS, family = binomial, 
##     data = df)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -1.78286    2.67927  -0.665   0.5058  
## TH           5.53253    3.23191   1.712   0.0869 .
## CD          -5.46988    3.24594  -1.685   0.0920 .
## CC          -0.26523    0.17907  -1.481   0.1386  
## BA           3.33066    2.67379   1.246   0.2129  
## DS          -0.07489    0.05154  -1.453   0.1462  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 26.287  on 18  degrees of freedom
## Residual deviance: 13.559  on 13  degrees of freedom
## AIC: 25.559
## 
## Number of Fisher Scoring iterations: 7
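
The list above contains both "Crocodile bark" and "Crocodile Bark", and several species with only one or two rows. One possible pre-processing step, sketched here on hypothetical toy data, is to normalise the labels and count presences and absences per species before fitting anything. The column name PA and the threshold min_n are assumptions for the example.

```r
# Hypothetical species labels and outcomes, not the real data
toy <- data.frame(
  TS = c("Crocodile bark", "Crocodile Bark", "Mopane", "Mopane ", "Mopane", "Acacia"),
  PA = c(1, 0, 1, 0, 1, 0)
)

# Trim whitespace and unify case so spelling variants collapse into one label
toy$TS_clean <- tolower(trimws(toy$TS))

# Presences and absences per species
counts <- table(toy$TS_clean, toy$PA)
counts

# Keep only species with at least min_n of each outcome (threshold is an assumption)
min_n <- 1
keep <- rownames(counts)[counts[, "0"] >= min_n & counts[, "1"] >= min_n]
keep
```

In practice a much larger min_n would be needed for a five-predictor model to be estimable.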

1. Overall Model (All Species Combined)

Key Findings:

TH (Tree Height): Positive and significant (OR = 1.16, p = 0.001). Nests are more likely to be present in taller trees.

CC (Canopy Cover): Positive and highly significant (OR = 1.02, p < 0.001). Higher canopy cover increases the probability of nest presence.

DS (Distance to Settlement): Negative and highly significant (OR = 0.97, p < 0.001). Higher DS values reduce the likelihood of nest presence.

Interpretation: The response here is nest presence, not tree presence. The results suggest that sparrow-weavers are more likely to nest in taller trees with denser canopy cover and in trees closer to settlements (possibly due to less competition or anthropogenic influence). This could point to selective nest-site choice in semi-natural landscapes.

2. Interaction Model: TH × CD and BA × CC

Key Findings:

TH and CD: Individually positive; their interaction (TH:CD) is negative and significant (p < 0.001).

BA and CC: Individually positive; their interaction (BA:CC) is negative and significant (p = 0.001).

Interpretation: The negative TH:CD interaction indicates diminishing returns: the positive effect of tree height on nesting weakens as canopy depth increases, so taller trees with deeper canopies are not always more likely to hold nests, perhaps due to crowding, wind exposure, or ecological trade-offs.

The BA:CC interaction implies that high basal area and high canopy cover dampen each other’s individual positive effects. Possibly, only certain species can support both without resource strain.

🌳 Species-Specific Models

🌳 Mopane

TH: Positive and significant (p = 0.003)

CC: Positive and significant (p < 0.001)

DS: Negative and highly significant (p < 0.001)

CD and BA: Not significant

Interpretation: In Mopane, nest presence is associated with taller trees, denser canopy, and closer proximity to settlements. Mopane is by far the most sampled species (769 trees used in the fit), so its model closely mirrors the overall model.

🌳 Acacia

None of the predictors were statistically significant.

The model converged, but the residual deviance (23.19) is only modestly below the null deviance (30.32), so explanatory power is low.

Interpretation: The model for Acacia shows weak relationships with the environmental variables. This might suggest:

Acacia distribution is driven by other unmeasured factors (e.g., soil chemistry, grazing).

Or the data sample size for Acacia is too small (only 22 observations) to detect significance.

🌳 Combretum, Comiphora, Crocodile Bark

Common Observations: Models failed to converge or had coefficients with massive standard errors.

Warnings: “fitted probabilities numerically 0 or 1 occurred”, “algorithm did not converge”.

Interpretation: Extreme separation or sparse data: Possibly too few presences or absences for reliable modeling.

Multicollinearity or perfect prediction: Some variables might perfectly predict the outcome, leading to infinite coefficients.

Recommendation: Consider reducing model complexity (e.g., remove interactions or collinear variables).

Possibly use Firth’s penalized logistic regression for rare events.

Consider combining rare species, or fitting a Poisson or zero-inflated model if the response is a count (for example, number of nests) or is highly zero-inflated.
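
The separation problem described above can be reproduced on a tiny example. The sketch below uses hypothetical toy data in which a single predictor splits the outcomes perfectly, and shows the same symptoms seen in the species-specific output: fitted probabilities at 0 or 1 and a residual deviance that is effectively zero.

```r
# Hypothetical toy data in which x perfectly separates the outcome
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(0, 0, 0, 0, 1, 1, 1, 1)
)

# glm() warns here; suppressWarnings() keeps the output readable
fit <- suppressWarnings(glm(y ~ x, data = toy, family = binomial))

# Symptoms of separation: fitted values at 0 or 1 and near-zero residual deviance
range(fitted(fit))
deviance(fit)

# A simple check: does a threshold on x split the outcomes completely?
separated <- max(toy$x[toy$y == 0]) < min(toy$x[toy$y == 1])
separated
```
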

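Predicted Probabilities

predicted <- predict(model, type = "response")
# Two rows with missing values were dropped by glm(), so take TH from the
# model frame to keep it the same length as the predictions
plot(model$model$TH, predicted, main = "Probability vs Tree Height", xlab = "Tree Height", ylab = "Predicted Probability")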
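
A scatter of fitted values mixes the effect of tree height with the other predictors. One alternative, sketched here on simulated stand-in data (the variable values and the two-predictor model are hypothetical), is to predict over a grid of TH while holding the other predictor at its mean.

```r
set.seed(1)

# Simulated stand-in for the nesting data (values are hypothetical)
n <- 200
sim <- data.frame(TH = runif(n, 5, 15), CC = runif(n, 10, 90))
sim$PA <- rbinom(n, 1, plogis(-3 + 0.15 * sim$TH + 0.02 * sim$CC))

fit <- glm(PA ~ TH + CC, data = sim, family = binomial)

# Vary TH over its observed range while holding CC at its mean
grid <- data.frame(
  TH = seq(min(sim$TH), max(sim$TH), length.out = 50),
  CC = mean(sim$CC)
)
grid$prob <- predict(fit, newdata = grid, type = "response")

plot(grid$TH, grid$prob, type = "l",
     xlab = "Tree Height", ylab = "Predicted Probability")
```
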

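📊 Model Diagnostics & Suggestions

1. Model Fit

Overall model AIC = 785.8 vs. interaction model AIC = 796.96. The lower AIC favors the main-effects model. The two models are not nested, because the interaction model omits DS, so the difference reflects the loss of DS as well as the added interaction terms.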

The Mopane model was the only species-specific model with a stable, interpretable fit, owing to its large sample size (769 trees); predictive performance itself was not evaluated.
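
The reported AIC values can be reproduced by hand from the two summaries, which makes the comparison easier to audit. This sketch relies on the fact that, for ungrouped binary data, the residual deviance equals minus twice the log-likelihood; the numbers are copied from the output above.

```r
# Residual deviances and parameter counts read from the two summaries above
dev_main <- 773.8;  k_main <- 6   # intercept + TH, CD, CC, BA, DS
dev_int  <- 782.96; k_int  <- 7   # intercept + TH, CD, BA, CC, TH:CD, BA:CC

# For ungrouped binary data, residual deviance equals -2 * log-likelihood
aic_main <- dev_main + 2 * k_main
aic_int  <- dev_int  + 2 * k_int

c(main = aic_main, interaction = aic_int, difference = aic_int - aic_main)
```
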

2. Model Issues

Many warnings in the species-specific models point to the need for:

Better sampling across species

Variable selection or transformation

Potential hierarchical models (e.g., GLMM with species as a random effect)
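
A minimal sketch of the hierarchical option follows, assuming the lme4 package is available (it is not loaded anywhere in the analysis above). The data are simulated, with hypothetical species labels and a single predictor, purely to show the shape of a random-intercept model.

```r
set.seed(42)

# Simulated stand-in data: 8 hypothetical species, 40 trees each
sim <- data.frame(TS = factor(rep(paste0("sp", 1:8), each = 40)),
                  TH = runif(320, 5, 15))
species_shift <- rnorm(8, sd = 0.8)
sim$PA <- rbinom(320, 1,
                 plogis(-2 + 0.2 * sim$TH + species_shift[as.integer(sim$TS)]))

# lme4 is an assumption here; the block is skipped if it is not installed
if (requireNamespace("lme4", quietly = TRUE)) {
  glmm <- lme4::glmer(PA ~ TH + (1 | TS), data = sim, family = binomial)
  print(lme4::fixef(glmm))    # population-level coefficients
  print(lme4::VarCorr(glmm))  # between-species variance of the intercept
}
```

A random intercept lets sparsely sampled species borrow strength from the rest instead of each needing its own model, though species with a single tree still contribute very little information.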

📌 Final Thoughts and Recommendations

| Model | Key Drivers | Notes |
|---|---|---|
| Overall | TH↑, CC↑, DS↓ | Strong predictors, interpretable |
| Mopane | Same as overall | Dominant species, reliable fit |
| Acacia | None | Weak signal, needs more data |
| Others | Not interpretable | Sparse data, consider alternative models |