Species-Area Relationship

Equilibrium Theory of Island Biogeography

Background

What is it about?

What is an island?
What is the expectation?
Why does it matter?
- theoretical interest (what determines diversity?)
- conservation (habitat fragmentation…)
How does it come about?

Explanations put forward

Habitat diversity: greater in larger islands
Equilibrium Theory of Island Biogeography:
- colonisation vs. extinction events
- speciation events
Simple sampling effect (think quadrats of different sizes)

The Class Exercise: Stone Turning

Working in pairs, you will

turn over stones,
count the number of species present, and
record the size of the stone.

Do you need to identify the species present?

Planning a Field Study

Be clear about the analysis before you start collecting data.

What is my independent variable (predictor), \(X\)?
What is my dependent variable (outcome), \(Y\)?

What kind of variable? – and how will I test \(Y\sim X\)?

Sampling Considerations

“Random sampling” and “avoiding bias” is drilled into you.

What are you sampling here (what’s your ‘unit of replication’)?
Is random sampling appropriate for this study?

The (in)famous Normal Distribution

What will happen if you sample stones at random?
Why is this bad?

Because of this…

You want this

Analysis

Task for today

Plot the data / relationship
Formal test: is there evidence for our hypothesis?
[What is the shape of the relationship?]

Shape of the relationship?

Beyond “it’s correlated,” what do we expect the relationship to look like?

linear? — \(S\sim A\):
\(S = \beta_0 + \beta_1 A\)
log-linear (exponential)? — \(S \sim \log A\):
\(S = \beta_0 + \beta_1\log A\)
log-log linear (power function)? — \(\log S \sim \log A\):
\(\log S=\beta_0+\beta_1\log A\)
sigmoidal?

Most common assumption

Number of species \(S\) follows a power relationship with area \(A\):

\[S = cA^z\]

which would give a straight line in a log-log plot

\[\log S = \log c + z\log A\]

Getting the data in

Turn class XLSX file into CSV

Done for you

dat.A <- read.csv("data 2026/A.csv");  dat.A$group <- "A"
dat.B <- read.csv("data 2026/B.csv");  dat.B$group <- "B"
dat.C <- read.csv("data 2026/C.csv");  dat.C$group <- "C"
dat.D <- read.csv("data 2026/D.csv");  dat.D$group <- "D"
dat.E <- read.csv("data 2026/E.csv");  dat.E$group <- "E"
dat.F <- read.csv("data 2026/F.csv");  dat.F$group <- "F"

SpA.data <- rbind(dat.A, dat.B, dat.C, dat.D, dat.E, dat.F)

# write out CSV file that you will be using
write.csv(SpA.data, file = "Species-Area class data 2026.csv", row.names = F)

Read in the CSV

All you need to do is read in the CSV file, which has the aggregated class data in a single ‘table’ (data frame).

SpA.data <- read.csv("Species-Area class data 2026.csv")

What’s our N…

nrow(SpA.data)

[1] 179

Great… we were shooting for \(N\sim 30\times 6 = 180\) 👍

Sample distribution of areas (\(A\))

# calculate new column for stone area = width * length
SpA.data$area <- with(SpA.data, width.cm*length.cm)

What are those huge stones?

    group group.anon size.S_M_L width.cm length.cm n.species ants.Y_N area
88      C          C          L       74        73         1        N 5402
114     D          D          L       73        74         1        N 5402
164     F      FALSE       <NA>      101        46         8        Y 4646
166     F      FALSE       <NA>       65        58         4        N 3770
43      B          B          L       57        45         3        Y 2565
118     D          D          L       45        51         5        N 2295

I am somewhat skeptical of the \(73\times 74\ \mathrm{cm^2}\) stones. Executive decision: throw them out.

Distraction

With log scale \(x\)-axis:

Sample distribution of counts (\(S\))

Linear plot: \(S \sim A\)

Log-linear plot: \(S \sim \log A\)

Log-log plot: \(\log S \sim \log A\)

\(\log 0\) is undefined: add 1 to all counts.

\(\log 1=0\) so zeros will be back to zeros…

Log-log plot: \(\log S \sim \log A\)

Spearman rank correlation

Recall that rank is unaffected by log-transformation: log is ‘rank-preserving’ i.e. doesn’t change the rank order of the values. The whole point of a rank correlation is that it doesn’t assume a linear relationship.

cor.test( ~ n.species + area, data = SpA.2, method = "spear")


    Spearman's rank correlation rho

data:  n.species and area
S = 494264, p-value = 6.873e-11
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.4651837

Fit log-log linear regression

As an alternative to Spearman rank correlation test.

fit.1 <- lm(log(n.species+1) ~ log(area), data = SpA.2)
summary(fit.1)


Call:
lm(formula = log(n.species + 1) ~ log(area), data = SpA.2)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.16326 -0.39571 -0.00392  0.34451  1.97410 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.82331    0.22118  -3.722 0.000266 ***
log(area)    0.28367    0.03875   7.320 8.63e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5307 on 175 degrees of freedom
Multiple R-squared:  0.2344,    Adjusted R-squared:   0.23 
F-statistic: 53.59 on 1 and 175 DF,  p-value: 8.634e-12

Correlation vs Regression

Unlike a correlation test, linear regression is modelling a relationship – we are estimating parameters. Recall that \(\log S = \log c + z\log A\); as per the model summary (previous slide),

(Intercept) is the intercept of the regression line, and gives the estimate \(\log c = -0.82\); therefore, \(c = \exp (-0.82) =0.44\);
log(area) is the slope of the regression line, and gives the estimate \(z = 0.28\).

So our model of the species-area relationship for stones in the Boquer valley is

\[S = 0.44 \times A^{0.28}\]

Do Ants matter?

Hmmm!

fit.1 <- lm(log(n.species+1) ~ log(area) + ants.Y_N, data = SpA.2)
summary(fit.1)


Call:
lm(formula = log(n.species + 1) ~ log(area) + ants.Y_N, data = SpA.2)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.00865 -0.36998 -0.05922  0.38067  1.58190 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.77807    0.20635  -3.771 0.000223 ***
log(area)    0.25513    0.03653   6.984 5.82e-11 ***
ants.Y_NY    0.45223    0.08636   5.236 4.68e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4947 on 174 degrees of freedom
Multiple R-squared:  0.3386,    Adjusted R-squared:  0.331 
F-statistic: 44.55 on 2 and 174 DF,  p-value: 2.388e-16

Our DV is counts…

We shouldn’t really be fitting a Linear Model (LM), because the residuals won’t be normally distributed. This is particularly so if the counts are low (small numbers of species). Count data are usually best modelled using a Poisson distribution, so we could try a Poisson GLM (Generalised Linear Model) on \(\log A\).

Poisson GLM

fit.3 <- glm(n.species ~ log10(area), SpA.2, family = poisson())
summary(fit.3)


Call:
glm(formula = n.species ~ log10(area), family = poisson(), data = SpA.2)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -2.1325     0.3603  -5.919 3.23e-09 ***
log10(area)   1.0231     0.1346   7.603 2.88e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 309.37  on 176  degrees of freedom
Residual deviance: 251.14  on 175  degrees of freedom
AIC: 575.17

Number of Fisher Scoring iterations: 5

Poisson fit line

The line shows the predictions from the Poisson model (GLM). Looks reasonable…