library(here)
library(dplyr)
library(lme4)
library(sjPlot)
library(ggplot2)
library(tidyr)
library(patchwork)
library(MuMIn)
library(lmerTest)
cabralarch <- read.csv(here("Week_11", "Data", "cabral_arch_data.csv"))
cabralisland <- read.csv(here("Week_11", "Data", "cabral_island_data.csv"))
To start, let’s consider that the data are naturally organized into groups (archipelagoes), and we should account for this structure (non-independence) in the model. In addition, some of the predictors are defined at the scale of the archipelago, which means we need an archipelago random effect in order to not pseudoreplicate when testing those predictors. Make a model with species richness as the response, and with a random effect for Archipelago. Species richness could be modeled as discrete count data (e.g., a negative binomial distribution), but we’ll finish covering GLMMs later; for now you can use log(Richness+1) to get a pretty normal looking response. What proportion of the variation in species richness occurs at the archipelago scale, and what proportion occurs within archipelagoes?
# species richness response, random effect for archipelago
cabral.richness = lmer(log(Species+1) ~ (1|Archipelago), data = cabralisland)
# fit an intercept, i.e. the mean of the data, and then let that intercept vary randomly between archipelago
summary(cabral.richness)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: log(Species + 1) ~ (1 | Archipelago)
## Data: cabralisland
##
## REML criterion at convergence: 501.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.1507 -0.5357 0.0796 0.5790 1.9455
##
## Random effects:
## Groups Name Variance Std.Dev.
## Archipelago (Intercept) 0.6408 0.8005
## Residual 0.8243 0.9079
## Number of obs: 174, groups: Archipelago, 23
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 4.7685 0.1865 18.9104 25.57 3.94e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# The random effects section will report the variance estimates for any random effects terms in the model. The fixed effects section will report the parameter estimates for any fixed effects terms in the model.
The proportion of variation in species richness that occurs at the archipelago scale is 64%, and the residual variance is 82%.
Which archipelagoes are particularly diverse, and which are depauperate?
ranef(cabral.richness)
## $Archipelago
## (Intercept)
## Aldabra 0.28121897
## Azores 0.36773210
## Balearic Islands 1.18183176
## Canaries 1.45904002
## Cape Verde -0.22621387
## Cook islands -0.22591288
## Dutch Caribbean 0.91678059
## Galapagos 0.09980580
## Hawaii -0.09227461
## Iles Crozet -1.16604580
## Inner Seychelles 0.06058546
## Juan Fernandez 0.06298814
## Kuril Islands 0.12453644
## Madeira 0.74721555
## Marianas -0.11819580
## Marquesas 0.10645794
## North. Cal. Channel Islands 0.67822271
## Phoenix Islands -1.35368520
## Pitcairn Islands -0.94759395
## Prince Edward Islands -0.90859247
## Revillagigedo Islands -0.71128117
## Society Islands -0.02387580
## Tristan da Cunha -0.31274390
##
## with conditional variances for "Archipelago"
plot_model(cabral.richness, type ="re")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## ℹ The deprecated feature was likely used in the sjPlot package.
## Please report the issue at <https://github.com/strengejacke/sjPlot/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Canaries and Balaeric Islands are particularly diverse with the highest positive values (they have high species richness in comparison to other islands), while the Phoenix Islands and Iles Crozet are depauperate.
Now let’s think about the six predictors. Make some exploratory plots of the effect of each variable on richness. You’ll need to merge the datasets. Think about which predictors might need to be transformed for use in a linear model.
cabralmerged <- left_join(cabralisland, cabralarch, by = "Archipelago")
# pivot long
cabrallong <- pivot_longer(cabralmerged, c(Area, Elev, Temp, number.islands, distance, age), names_to = "category", values_to = "values")
ggplot(cabrallong, aes(values, Species)) +
geom_point() +
facet_wrap(~ category, scales = "free")
logcabrallong <- cabrallong %>%
mutate(logvalues = log(values))
ggplot(logcabrallong, aes(logvalues, log(Species))) +
geom_point() +
facet_wrap(~ category, scales = "free")
Construct mixed model(s) that test the roles of the six predictors in explaining species richness. Plot fitted effects (fixed and random), plus model diagnostics. Calculate how much variation the predictors explain, at the two different scales in the data (island and archipelago). I.e., present R2 values for the two scales. Also, how much of the total variation have they explained, according to R2GLMM(m)?
# all variables richness model
cabral.richness.modall = lmer(log(Species+1) ~ log(Area) + log(Elev + 1) + log(Temp) + (1|Archipelago) + log(number.islands) + log(distance) + log(age), data = cabralmerged)
summary(cabral.richness.modall)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: log(Species + 1) ~ log(Area) + log(Elev + 1) + log(Temp) + (1 |
## Archipelago) + log(number.islands) + log(distance) + log(age)
## Data: cabralmerged
##
## REML criterion at convergence: 309.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.6881 -0.4890 0.0988 0.5617 2.0021
##
## Random effects:
## Groups Name Variance Std.Dev.
## Archipelago (Intercept) 0.2752 0.5246
## Residual 0.2390 0.4889
## Number of obs: 174, groups: Archipelago, 23
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 1.79175 0.71416 28.82931 2.509 0.01799 *
## log(Area) 0.32851 0.03223 161.03491 10.192 < 2e-16 ***
## log(Elev + 1) 0.10453 0.03968 166.66229 2.635 0.00922 **
## log(Temp) 0.87107 0.15552 85.47693 5.601 2.54e-07 ***
## log(number.islands) 0.09998 0.18204 18.59259 0.549 0.58939
## log(distance) -0.27510 0.07445 20.07228 -3.695 0.00143 **
## log(age) 0.10516 0.06422 19.02819 1.638 0.11795
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) lg(Ar) l(E+1) lg(Tm) lg(n.) lg(ds)
## log(Area) 0.123
## log(Elev+1) -0.314 -0.757
## log(Temp) -0.504 -0.260 0.344
## lg(nmbr.sl) -0.375 -0.031 0.032 0.029
## log(distnc) -0.534 0.132 -0.096 -0.267 -0.081
## log(age) -0.174 0.047 -0.064 -0.061 -0.315 0.252
plot_model(cabral.richness.modall, type ="re")
fixef(cabral.richness.modall)
## (Intercept) log(Area) log(Elev + 1) log(Temp)
## 1.79175009 0.32851415 0.10453193 0.87107414
## log(number.islands) log(distance) log(age)
## 0.09997955 -0.27510497 0.10515856
par(mfrow = c(3,2))
areap <- plot_model(cabral.richness.modall, type = "eff", terms = "Area")
elevp <- plot_model(cabral.richness.modall, type = "eff", terms = "Elev")
tempp <- plot_model(cabral.richness.modall, type = "eff", terms = "Temp")
islandsp <- plot_model(cabral.richness.modall, type = "eff", terms = "number.islands")
distancep <- plot_model(cabral.richness.modall, type = "eff", terms = "distance")
agep <- plot_model(cabral.richness.modall, type = "eff", terms = "age")
(areap + elevp + tempp) / (distancep + agep + islandsp)
pd1<- plot_model(cabral.richness.modall, type="diag")[[1]]
pd2<- plot_model(cabral.richness.modall, type="diag")[[2]]
pd1+ pd2
#R^2 = 1 - variance of full model/ variance of model with not predictors
1 - VarCorr(cabral.richness.modall)$Archipelago[1]/VarCorr(cabral.richness)$Archipelago[1]
## [1] 0.5705734
#sigma() extracts the residual standard deviation from the model.
1 - (sigma(cabral.richness.modall)^2)/(sigma(cabral.richness)^2)
## [1] 0.710061
#total variation have they explained, according to R2GLMM(m)
r.squaredGLMM(cabral.richness.modall)
## R2m R2c
## [1,] 0.6655305 0.8445342
57% of the variation between archipelagos is explained by the predictors. 71% of the variation within islands in an archipelago is explained by the predictors. The fixed effects explained 67% of the total variation and the random effects explained 84% of the variation.
Use appropriate hypothesis tests (as described in lecture) to test the significance of the predictors. How do you interpret the results of these tests, and the effects plots, in light of hypotheses for what controls species richness in islands and archipelagos? What are the denominator degrees of freedom for each predictor? This is essentially telling you how much replication there is for that predictor, minus the number of fitted parameters for that predictor. Do the denominator df make sense? Why or why not?
anova(cabral.richness.modall, ddf = "Satterthwaite", type = 2)
## Type II Analysis of Variance Table with Satterthwaite's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## log(Area) 24.8240 24.8240 1 161.035 103.8734 < 2.2e-16 ***
## log(Elev + 1) 1.6588 1.6588 1 166.662 6.9410 0.009218 **
## log(Temp) 7.4969 7.4969 1 85.477 31.3702 2.542e-07 ***
## log(number.islands) 0.0721 0.0721 1 18.593 0.3016 0.589391
## log(distance) 3.2634 3.2634 1 20.072 13.6553 0.001426 **
## log(age) 0.6408 0.6408 1 19.028 2.6815 0.117952
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Is the model we’ve used the best model? Often I just stick with one big model when the ratio of data to parameters is pretty good. But one might instead prefer to find the best model(s), while accounting for model selection uncertainty. Use AICc in some capacity to assess which predictors are important, what the ‘best’ model is, and how sure you are about what the best model is. The details of how you do it are up to you, as long as it seems justifiable. Remember to set REML=FALSE when comparing models with different sets of predictors.