Example 1

Load data and plot raw data

Please set the directory, unfold the file and load the CSV file “Lesser-Antilles.csv”. Visualize the data and understand the structure of the file.

1.

Plot a scatterplot for each taxon of your data (birds, bats, butterflies, amphibians). It is always crucial to visualize the raw data before running any model.

lesser_ants <- read.table("data_applied_island_biogeo_exercise/data/Lesser-Antilles.csv", header = TRUE, sep = ";")

#par(mfrow=c(2,2)) #to plot a grid 2x2 of plots

scatplot_birds <- plot(lesser_ants$Area_km2, lesser_ants$Birds)

Build the scatter plots also for bats, butterflies and herpetiles, compare them and see if any patterns occur. You could notice that different taxa have different sensitivity to area.

2.

Then build again the scatter plot but scaling the variables in a log scale. If the power model is reasonable the scatterplot should look like a line. These scatter plots help us to understand if the power model is a good description of the relationship without even looking at the numbers.

scatplot_birds_loglog <- plot(lesser_ants$Area_km2, lesser_ants$Birds, log = "xy")

Build a log scatterplot for each taxon and see what are the differences.

3.

Build a model lm(log(Species) ~ log(Area_km2)) for each taxon and look at the summary. We will notice 3 main important values:

  • Estimated intercept, that corresponds to log(c); exponentiating the value we will understand the number of species for unit of area.

  • Estimate of the coefficient of log(Area), that corresponds to z.

  • R squared, which percentage of the log(Species) is explained by log(Area).

model_birds_power <- lm(log(Birds) ~ log(Area_km2), data = lesser_ants)
summary(model_birds_power)
## 
## Call:
## lm(formula = log(Birds) ~ log(Area_km2), data = lesser_ants)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.46352 -0.16692  0.08539  0.20075  0.30686 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.10471    0.23813   8.839 9.16e-08 ***
## log(Area_km2)  0.20529    0.04535   4.526 0.000298 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2616 on 17 degrees of freedom
## Multiple R-squared:  0.5465, Adjusted R-squared:  0.5199 
## F-statistic: 20.49 on 1 and 17 DF,  p-value: 0.0002984

Build a model for each taxon and compare the values of the different models. Could you understand how different taxa could be differently influenced by area and why?

4.

Please fit also a linear and an exponential model to the data. Is any of those models a better representation of the species-area relationship for this dataset? Again, do it for the other taxa as well (butterflies, herpetiles).

So fit a linear model, without transformation of the variables, for each taxon. Then fit also an exponential model, transforming log just the dependent variable.

model_birds_linear <- lm(Birds ~ Area_km2, data = lesser_ants)
summary(model_birds_linear)
## 
## Call:
## lm(formula = Birds ~ Area_km2, data = lesser_ants)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.8794 -4.9552 -0.2617  4.4159 11.5431 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 19.519198   2.062778   9.463 3.46e-08 ***
## Area_km2     0.016133   0.004005   4.028 0.000873 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.856 on 17 degrees of freedom
## Multiple R-squared:  0.4883, Adjusted R-squared:  0.4582 
## F-statistic: 16.22 on 1 and 17 DF,  p-value: 0.0008726
model_birds_exponential <- lm(log(Birds) ~ Area_km2, data = lesser_ants)
summary(model_birds_exponential)
## 
## Call:
## lm(formula = log(Birds) ~ Area_km2, data = lesser_ants)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.43021 -0.21589  0.00855  0.21452  0.45029 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.9386705  0.0866700   33.91  < 2e-16 ***
## Area_km2    0.0006276  0.0001683    3.73  0.00167 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2881 on 17 degrees of freedom
## Multiple R-squared:   0.45,  Adjusted R-squared:  0.4177 
## F-statistic: 13.91 on 1 and 17 DF,  p-value: 0.001667

Compare the information that models built in different ways give to you.

5.

for the exponential and power model, we have to predict the model ourselves, so we first create a sequence with the minimum and maximum area values in steps of 10, and then predict the model to these area values and finally add these values to the plot using the lines() function.

## Birds
plot(lesser_ants$Area_km2, lesser_ants$Birds)

# add model trendline
# linear
abline(model_birds_linear) # with the abline() function, you can add a linear model prediction (here: from the linear model object you created above)


# exponential
areavalues <- seq(0, 1500, 10)
predicted_birds_exponential <- exp(predict(model_birds_exponential, data.frame(Area_km2 = areavalues)))
lines(areavalues, predicted_birds_exponential, lwd=2, col = "red", xlab = "Area [km ]", ylab = "# of Birds")

# Power
predicted_birds_power <- exp(predict(model_birds_power, data.frame(Area_km2 = areavalues)))
#lines(areavalues, predicted_birds_power, lwd=2, col = "blue", xlab = "Area [km ]", ylab = "# of Birds")
lines(areavalues, predicted_birds_power, lwd=2, col = "blue")

Use the function dev.off()to close the plots

Example 2

load data

Load the data for assessing the relationship of bird species richness and forest fragment size in Ghana

ghana <- read.table("data_applied_island_biogeo_exercise/data/Ghana.csv", header = TRUE, sep = ";")
ghana_unsurveyed <- read.table("data_applied_island_biogeo_exercise/data/Ghana_unsurveyed.csv", header = TRUE, sep = ";")

1.

Using these data, generate a scatterplot using log(area) as the independent variable and log(bird species richness) as the dependent variable. Fit the classic power model to the log-log scatterplots and assess the model summary.

scatplot_ghana_loglog <- plot(ghana$Area_ha, ghana$Species, log = "xy")

-> Try and Build the Power model (linear model with both variables logarithmized, see Example 1 in for an example)

2.

Generate a second scatterplot, but this time only the area variable should be logarithmized. Use this scatterplot to fit a semi-log model. How does the model fit vary when compared to the power model?

-> than build a Semi-log model (only the area variable is logarithmized)

3.

Now have a look at the table with the unsurveyed forest fragments. How many bird species can you expect to find there based on the log-log and semi-log models fitted above?

(Hint: use the predict() function to predict your two models to the areas of these sites.)

Number of species at unsurveyed sites for this, we can save the area values at unsurveyed sites as a vector, then we predict the model to this vector (and use round to avoid having too many digits) for semi-log model:

unsrv_areas <- data.frame(Area_ha = ghana_unsurveyed$Area_ha) 

round(predict(model_ghana_semilog, unsrv_areas), digits = 0)
##  1  2  3 
## 21 12 26

please do this also for the power model (remember that you need to exponentiate the resulting values because your model predicts the log(S), (i.e., use the exp() function)

4.

The Kumanin and South Fumansu FR areas are both partly located within gold mining concessions. Gold mining would lead to the destruction of 50% of the Kumanin and 90% of the South Fumansu FR sites. How many bird species can you predict to go locally extinct because of the mining operations?

We can use the same approach as in 3.: First, we create a vector with the areas we want to estimate (in this case,50% of the Kumanin and 10% of the South Fumansu FR forest patch)

mining_areas <- data.frame(Area_ha = c(ghana[ghana$Forest_Patch == "Kumanin", "Area_ha"] * 0.5,
                                       ghana[ghana$Forest_Patch == "South Fumansu FR", "Area_ha"] * 0.1))

Then we predict the number of species at the now smaller sites using the semilog model

species_at_smaller_sites <- round(predict(model_ghana_semilog, mining_areas), digits = 0)

And now calculate yourself, how many species would go locally extinct at each site, according to your model prediction (You can do this in R, or just use mental math (simple subtraction) :)