Species-Area Relationship

Equilibrium Theory of Island Biogeography

Background and Planning

What is it about?

  • What is an island?
  • What is the expectation?
  • Why does it matter?
    • theoretical interest (what determines diversity?)
    • conservation (habitat fragmentation…)
  • How does it come about?

Explanations put forward

  • Habitat diversity: greater on larger islands
  • Equilibrium Theory of Island Biogeography:
    • colonisation vs. extinction events
    • speciation events
  • Simple sampling effect (think quadrats of different sizes)

The Field Study: Stone Turning

Working in pairs, you’ll turn over stones, count the number of species present under each one, and record the size of the stone.

Easy, right?

Do you need to identify the species present?

Planning the Field Study

Be clear about the analysis before you start collecting data.

  • What is my independent variable (predictor), \(x\)?
  • What is my dependent variable (outcome), \(y\)?

What kind of variable is each one, and how will I test \(y\sim x\)?

Sampling Considerations

“Random sampling” and “avoiding bias” are drilled into you.

  • What are you sampling here (what’s your ‘unit of replication’)?
  • Is random sampling appropriate for this study?

The (in)famous Normal Distribution

  • What will happen if you sample stones at random?
  • Why is this bad?

Because of this…

You want this

Stone Sampling: Strategy

  • Ensure all stone sizes are evenly (equally) represented in your sample.

  • Each group to do \(N = 30\) stones

  • Survey (eyeball) the sizes available (and manageable)

  • Roughly call them small (S), medium (M) and large (L) and get data for \(N=10\) each, aiming for variation within.

  • Record number of species and size of the stone (not just S, M, L):

    • measure width and length in cm.
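
To see why the strategy stratifies by size, here is a minimal simulation sketch. The stone "population" is made up: the lognormal shape, its parameters, and the three equal-width classes on the log scale are all assumptions for illustration, not field data.

```r
set.seed(42)

# Hypothetical population of stone areas in cm^2 (lognormal shape is an assumption)
stones <- rlnorm(5000, meanlog = log(150), sdlog = 0.8)

# Split the log-size range into three equal-width classes: S, M, L
size.class <- cut(log(stones), breaks = 3, labels = c("S", "M", "L"))

# Strategy 1: pick 30 stones completely at random
random.pick <- sample(seq_along(stones), 30)
random.counts <- table(size.class[random.pick])

# Strategy 2: pick 10 stones at random within each size class
stratified.pick <- unlist(lapply(levels(size.class), function(k) {
  sample(which(size.class == k), 10)
}))
stratified.counts <- table(size.class[stratified.pick])

random.counts       # mostly M: few points at either end of the x-axis
stratified.counts   # 10 per class: the whole size range is covered
```

With few small and few large stones, a regression of species on area is pinned down almost entirely by the middle of the range, which is what the even coverage is meant to avoid.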

Data Format

  • Collect with pen in your notebook
  • Afterwards, you’ll enter the data in an Excel file (XLSX) on SharePoint.

Analysis

Task for today

  • Plot the data / relationship
  • Formal test: is there evidence for our hypothesis?
  • [What fits best?]

Shape of the relationship?

Beyond “it’s correlated,” what do we expect the relationship to look like?

  • linear? \(y \sim x\)
  • log-linear (semi-log; often called the “exponential” model in the species-area literature)? \(y \sim \log x\)
  • log-log linear (power function)? \(\log y \sim \log x\)
  • sigmoidal?

The most common assumption is that the number of species \(S\) follows a power relationship with area \(A\):

\[S = cA^z\]

which would give a straight line in a log-log plot.
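
The step behind that statement, spelled out: take logarithms of both sides (any base, as long as it is the same on both sides).

\[\log S = \log c + z \log A\]

This has the form of a straight line, with slope \(z\) and intercept \(\log c\), when \(\log S\) is plotted against \(\log A\).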

Getting the data in

Turn class XLSX file into CSV

Done for you: I used the readxl package (after renaming your sheets A, B, C…).

library(readxl)

xlsx.file <- "species-area data 2024 fin.xlsx"

# read one group's sheet; for sheet E keep only the first five columns
read.sheet <- function(s) {
  dat <- read_xlsx(xlsx.file, sheet = s)
  if (s == "E") dat <- dat[, 1:5]
  dat
}

# stack sheets A to J into a single data frame
SpA.data <- do.call(rbind, lapply(LETTERS[1:10], read.sheet))
# write out CSV file that you will be using
write.csv(SpA.data, file = "Species-Area class data 2024.csv", row.names = FALSE)

Read in the CSV

All you need to do is read in the CSV file, which has the aggregated class data in a single ‘table’ (data frame).

SpA.data <- read.csv("Species-Area class data 2024.csv")

What’s our N…

nrow(SpA.data)
[1] 273

Not too bad… we were hoping for \(N\sim 30\times 10 = 300\) :)

Sample distribution of areas (\(X\))

# calculate new column for stone area = width * length
SpA.data$area <- with(SpA.data, width.cm*length.cm)

Distraction

How do our S, M, L classes come out in terms of area?

More Distraction (1)

How do S, M, L compare between groups?

More Distraction (2)

With log scale \(x\)-axis:

Sample distribution of counts (\(Y\))

Linear plot: \(Y \sim X\)

Log-linear plot: \(Y \sim \log X\)

Log-log plot: \(\log Y \sim \log X\)

\(\log 0\) is undefined: add 1 to all counts.
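
A minimal sketch of the three views side by side. It uses simulated data rather than the class data; the power law with Poisson noise and every parameter value are assumptions for illustration only.

```r
set.seed(1)

# Simulated stand-in for the class data (assumed power law plus Poisson noise)
area <- exp(runif(100, log(10), log(2000)))        # stone areas in cm^2
n.species <- rpois(100, lambda = 0.3 * area^0.4)   # species counts, some of them 0

log(0)   # -Inf in R, so stones with zero species cannot go on a log axis as they are

op <- par(mfrow = c(1, 3))                         # three panels in one row
plot(area, n.species, main = "Linear")
plot(log(area), n.species, main = "Log-linear")
plot(log(area), log(n.species + 1), main = "Log-log (counts + 1)")
par(op)                                            # restore the previous layout
```

Adding 1 keeps the zero counts in the plot, at the price of distorting the low end of the scale slightly.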

Spearman rank correlation

Recall that rank correlation is unaffected by log transformation: the logarithm is ‘rank-preserving’, i.e., it does not change the rank order of the values. The whole point of using a rank correlation is that it does not assume a linear relationship, only a monotonic one.

cor.test( ~ n.species + area, data = SpA.data, method = "spearman")

    Spearman's rank correlation rho

data:  n.species and area
S = 1294789, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.6181716 
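
The rank-preserving point can be checked directly. This is a small sketch on simulated data (an assumed power law with Poisson noise, not the class data): Spearman's rho is identical before and after log transformation, while Pearson's r is not.

```r
set.seed(1)

# Simulated stand-in for the class data (all values are assumptions)
area <- exp(runif(100, log(10), log(2000)))
n.species <- rpois(100, lambda = 0.3 * area^0.4)

# Spearman: based on ranks, so a monotonic transformation changes nothing
rho.raw <- cor(area, n.species, method = "spearman")
rho.log <- cor(log(area), log(n.species + 1), method = "spearman")

# Pearson: based on the values themselves, so the transformation matters
r.raw <- cor(area, n.species)
r.log <- cor(log(area), log(n.species + 1))

c(rho.raw = rho.raw, rho.log = rho.log, r.raw = r.raw, r.log = r.log)
```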

Fit log-log linear regression

An alternative to the Spearman rank correlation test.

fit.1 <- lm(log(n.species+1) ~ log(area), data = SpA.data)
summary(fit.1)

Call:
lm(formula = log(n.species + 1) ~ log(area), data = SpA.data)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.40189 -0.30246 -0.00392  0.34005  1.31639 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.62283    0.11206  -5.558 6.53e-08 ***
log(area)    0.25679    0.01984  12.946  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4662 on 271 degrees of freedom
Multiple R-squared:  0.3821,    Adjusted R-squared:  0.3798 
F-statistic: 167.6 on 1 and 271 DF,  p-value: < 2.2e-16
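
One way to read these coefficients in terms of \(S = cA^z\), as a sketch. The two numbers are copied from the output above; the example areas are arbitrary. Because the model was fitted to `log(n.species + 1)`, the power law applies to \(S + 1\), and back-transformed values are closer to a typical (median-like) count than to an arithmetic mean.

```r
# Coefficients copied from the summary(fit.1) output
b0 <- -0.62283   # intercept, i.e. log(c)
z  <-  0.25679   # slope, i.e. the exponent z

c.hat <- exp(b0)            # multiplier c in (S + 1) = c * A^z
fold.per.doubling <- 2^z    # factor by which (S + 1) changes when area doubles

# Back-transformed predictions for a few arbitrary example areas (cm^2);
# subtract the 1 that was added to the counts before taking logs
area.new <- c(50, 200, 800)
S.pred <- c.hat * area.new^z - 1

data.frame(area = area.new, S.pred = round(S.pred, 2))
```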

For Geeks (i.e., BS2004)

Recall that for count data, we shouldn’t really be fitting a Linear Model (LM), because the residuals won’t be normally distributed. This is particularly so if the counts are low (small numbers of species). Count data are usually best modelled using a Poisson distribution, so we could try a Poisson Generalised Linear Model (GLM).

I’ll be fitting \(Y \sim \log_{10} X\), i.e., log-transforming area (base 10 in the model call) but not the species counts.
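
It is worth noting what this model implies on the original scale. In R, `poisson()` uses a log link by default, so the linear predictor describes the log of the expected count. With coefficients written as \(\beta_0\) and \(\beta_1\) (symbols introduced here for illustration) and area entered as \(\log_{10} X\):

\[\log E[Y] = \beta_0 + \beta_1 \log_{10} X \quad\Rightarrow\quad E[Y] = e^{\beta_0}\, X^{\beta_1 / \ln 10}\]

So the Poisson GLM is still a power relationship for the expected count, with \(c = e^{\beta_0}\) and \(z = \beta_1 / \ln 10\), and it needs no “+1” because the counts themselves are never logged.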

Poisson GLM

fit.3 <- glm(n.species ~ log10(area), SpA.data, family = poisson())
summary(fit.3)

Call:
glm(formula = n.species ~ log10(area), family = poisson(), data = SpA.data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.8671  -0.9467  -0.1739   0.5052   3.2596  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -2.34346    0.26994  -8.681   <2e-16 ***
log10(area)  1.09712    0.09794  11.202   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 433.22  on 272  degrees of freedom
Residual deviance: 285.12  on 271  degrees of freedom
AIC: 789.35

Number of Fisher Scoring iterations: 5
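
A few quantities can be read off this output by hand. The sketch below uses only numbers copied from the summary; the checks are rough rules of thumb rather than a full model diagnosis.

```r
# Values copied from the summary(fit.3) output
resid.dev <- 285.12; resid.df <- 271
null.dev  <- 433.22; null.df  <- 272
b1 <- 1.09712                      # coefficient on log10(area)

# Residual deviance per degree of freedom: values near 1 give no strong
# hint of overdispersion (a rough check only)
dispersion <- resid.dev / resid.df

# Exponent on area implied by the log link, after converting from base 10
z.glm <- b1 / log(10)

# Likelihood-ratio test of the area term: drop in deviance vs chi-squared
lr.stat <- null.dev - resid.dev
p.lr <- pchisq(lr.stat, df = null.df - resid.df, lower.tail = FALSE)

c(dispersion = dispersion, z.glm = z.glm, lr.stat = lr.stat, p.lr = p.lr)
```

The exponent here is not directly comparable with the slope of the log-log linear model, which was fitted to counts with 1 added.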

Poisson fit line

The line shows the predictions from the Poisson model (GLM). Looks reasonable…
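
A sketch of how such a fit line can be drawn. It runs on simulated data so that it is self-contained; the data frame `sim`, the model `fit.sim`, and the simulation parameters are stand-ins, not the class data or the fitted model above.

```r
set.seed(1)

# Simulated stand-in for the class data (all values are assumptions)
sim <- data.frame(area = exp(runif(100, log(10), log(2000))))
sim$n.species <- rpois(100, lambda = 0.3 * sim$area^0.4)

# Same model structure as in the slides
fit.sim <- glm(n.species ~ log10(area), data = sim, family = poisson())

# Predict over an evenly spaced grid on the log scale;
# type = "response" returns expected counts rather than log counts
new.dat <- data.frame(
  area = exp(seq(log(min(sim$area)), log(max(sim$area)), length.out = 200))
)
new.dat$fit <- predict(fit.sim, newdata = new.dat, type = "response")

# Points on a log-scaled x-axis, with the fitted curve on top
plot(n.species ~ area, data = sim, log = "x",
     xlab = "Stone area (cm^2)", ylab = "Number of species")
lines(new.dat$area, new.dat$fit, lwd = 2)
```

For the class data, the same steps would apply with `SpA.data` and `fit.3` in place of the simulated objects.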