Species-Area Relationship

Equilibrium Theory of Island Biogeography

Background

What is it about?

  • What is an island?
  • What is the expectation?
  • Why does it matter?
    • theoretical interest (what determines diversity?)
    • conservation (habitat fragmentation…)
  • How does it come about?

Explanations put forward

  • Habitat diversity: greater in larger islands
  • Equilibrium Theory of Island Biogeography:
    • colonisation vs. extinction events
    • speciation events
  • Simple sampling effect (think quadrats of different sizes)

The Class Exercise: Stone Turning

Working in pairs, you will

  • turn over stones,
  • count the number of species present, and
  • record the size of the stone.

Do you need to identify the species present?

Planning a Field Study

Be clear about the analysis before you start collecting data.

  • What is my independent variable (predictor), \(X\)?
  • What is my dependent variable (outcome), \(Y\)?

What kind of variable? – and how will I test \(Y\sim X\)?

Sampling Considerations

“Random sampling” and “avoiding bias” is drilled into you.

  • What are you sampling here (what’s your ‘unit of replication’)?
  • Is random sampling appropriate for this study?

The (in)famous Normal Distribution

  • What will happen if you sample stones at random?
  • Why is this bad?

Because of this…

You want this

Analysis

Task for today

  • Plot the data / relationship
  • Formal test: is there evidence for our hypothesis?
  • [What is the shape of the relationship?]

Shape of the relationship?

Beyond “it’s correlated,” what do we expect the relationship to look like?

  • linear? — \(S\sim A\):
    \(S = \beta_0 + \beta_1 A\)
  • log-linear (exponential)? — \(S \sim \log A\):
    \(S = \beta_0 + \beta_1\log A\)
  • log-log linear (power function)? — \(\log S \sim \log A\):
    \(\log S=\beta_0+\beta_1\log A\)
  • sigmoidal?

Most common assumption

Number of species \(S\) follows a power relationship with area \(A\):

\[S = cA^z\]

which would give a straight line in a log-log plot

\[\log S = \log c + z\log A\]

Getting the data in

Turn class XLSX file into CSV

Done for you

dat.A <- read.csv("data 2026/A.csv");  dat.A$group <- "A"
dat.B <- read.csv("data 2026/B.csv");  dat.B$group <- "B"
dat.C <- read.csv("data 2026/C.csv");  dat.C$group <- "C"
dat.D <- read.csv("data 2026/D.csv");  dat.D$group <- "D"
dat.E <- read.csv("data 2026/E.csv");  dat.E$group <- "E"
dat.F <- read.csv("data 2026/F.csv");  dat.F$group <- "F"

SpA.data <- rbind(dat.A, dat.B, dat.C, dat.D, dat.E, dat.F)

# write out CSV file that you will be using
write.csv(SpA.data, file = "Species-Area class data 2026.csv", row.names = F)  

Read in the CSV

All you need to do is read in the CSV file, which has the aggregated class data in a single ‘table’ (data frame).

SpA.data <- read.csv("Species-Area class data 2026.csv")

What’s our N…

nrow(SpA.data)
[1] 179

Great… we were shooting for \(N\sim 30\times 6 = 180\) 👍

Sample distribution of areas (\(A\))

# calculate new column for stone area = width * length
SpA.data$area <- with(SpA.data, width.cm*length.cm)

What are those huge stones?

    group group.anon size.S_M_L width.cm length.cm n.species ants.Y_N area
88      C          C          L       74        73         1        N 5402
114     D          D          L       73        74         1        N 5402
164     F      FALSE       <NA>      101        46         8        Y 4646
166     F      FALSE       <NA>       65        58         4        N 3770
43      B          B          L       57        45         3        Y 2565
118     D          D          L       45        51         5        N 2295

I am somewhat skeptical of the \(73\times 74\ \mathrm{cm^2}\) stones. Executive decision: throw them out.

Distraction

With log scale \(x\)-axis:

Sample distribution of counts (\(S\))

Linear plot: \(S \sim A\)

Log-linear plot: \(S \sim \log A\)

Log-log plot: \(\log S \sim \log A\)

\(\log 0\) is undefined: add 1 to all counts.

\(\log 1=0\) so zeros will be back to zeros…

Log-log plot: \(\log S \sim \log A\)

Spearman rank correlation

Recall that rank is unaffected by log-transformation: log is ‘rank-preserving’ i.e. doesn’t change the rank order of the values. The whole point of a rank correlation is that it doesn’t assume a linear relationship.

cor.test( ~ n.species + area, data = SpA.2, method = "spear")

    Spearman's rank correlation rho

data:  n.species and area
S = 494264, p-value = 6.873e-11
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.4651837 

Fit log-log linear regression

As an alternative to Spearman rank correlation test.

fit.1 <- lm(log(n.species+1) ~ log(area), data = SpA.2)
summary(fit.1)

Call:
lm(formula = log(n.species + 1) ~ log(area), data = SpA.2)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.16326 -0.39571 -0.00392  0.34451  1.97410 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.82331    0.22118  -3.722 0.000266 ***
log(area)    0.28367    0.03875   7.320 8.63e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5307 on 175 degrees of freedom
Multiple R-squared:  0.2344,    Adjusted R-squared:   0.23 
F-statistic: 53.59 on 1 and 175 DF,  p-value: 8.634e-12

Correlation vs Regression

Unlike a correlation test, linear regression is modelling a relationship – we are estimating parameters. Recall that \(\log S = \log c + z\log A\); as per the model summary (previous slide),

  • (Intercept) is the intercept of the regression line, and gives the estimate \(\log c = -0.82\); therefore, \(c = \exp (-0.82) =0.44\);
  • log(area) is the slope of the regression line, and gives the estimate \(z = 0.28\).

So our model of the species-area relationship for stones in the Boquer valley is

\[S = 0.44 \times A^{0.28}\]

Do Ants matter?

Hmmm!

fit.1 <- lm(log(n.species+1) ~ log(area) + ants.Y_N, data = SpA.2)
summary(fit.1)

Call:
lm(formula = log(n.species + 1) ~ log(area) + ants.Y_N, data = SpA.2)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.00865 -0.36998 -0.05922  0.38067  1.58190 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.77807    0.20635  -3.771 0.000223 ***
log(area)    0.25513    0.03653   6.984 5.82e-11 ***
ants.Y_NY    0.45223    0.08636   5.236 4.68e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4947 on 174 degrees of freedom
Multiple R-squared:  0.3386,    Adjusted R-squared:  0.331 
F-statistic: 44.55 on 2 and 174 DF,  p-value: 2.388e-16

Our DV is counts…

We shouldn’t really be fitting a Linear Model (LM), because the residuals won’t be normally distributed. This is particularly so if the counts are low (small numbers of species). Count data are usually best modelled using a Poisson distribution, so we could try a Poisson GLM (Generalised Linear Model) on \(\log A\).

Poisson GLM

fit.3 <- glm(n.species ~ log10(area), SpA.2, family = poisson())
summary(fit.3)

Call:
glm(formula = n.species ~ log10(area), family = poisson(), data = SpA.2)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -2.1325     0.3603  -5.919 3.23e-09 ***
log10(area)   1.0231     0.1346   7.603 2.88e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 309.37  on 176  degrees of freedom
Residual deviance: 251.14  on 175  degrees of freedom
AIC: 575.17

Number of Fisher Scoring iterations: 5

Poisson fit line

The line shows the predictions from the Poisson model (GLM). Looks reasonable…