1 Instructions:

R-scripts and files used to generate this report are available at: https://github.com/Bioshifts/MethodologicalAdjustment

Code demonstrating the entire process is below, including data processing and the decisions made along the way. Note that code is hidden by default, since most readers will be interested in the outputs themselves. If you wish to see the code, you can expand all code chunks by clicking the button at the top right corner of this page and selecting “Show All Code”. Alternatively, you can expand code chunks one at a time by clicking the Code button at the beginning of each chunk.

2 The corrected shift data:

The bioshifts v1 data with corrected range shifts, using the methods described below, can be found on the group’s Google Drive under: “BIOSHIFTS Working Group/DATA/Bioshifts_v1/biov1_method_corrected_shifts.csv”

3 The idea:

Previous explorations of the Bioshifts database have identified a strong influence of methods on observed range shift values, suggesting that methods can affect the potential for range shift detection. Here we propose a way of dealing with methodological variation, with the ultimate goal of generating range shift measurements that are free of this variation and therefore comparable across studies.

4 The approach:

Our approach consists of fitting a class- and study-level linear mixed-effect model, which we then use to correct observed range shift values (Figure 1).

Figure 1. Conceptual approach for accounting for methodological variation in species range shifts. Large dots represent study-level range shifts (e.g., average within class and study range shifts). Small dots represent species level range shifts (i.e., raw shifts extracted from studies). The thicker line represents the fit of a study-level linear mixed-effect model (fixed-effect). Thinner lines represent the class-level fit (random intercept).

Pre-processing
Before modeling, we removed studies with few species (e.g., < 3) and variables that are collinear with climate velocity (e.g., start year and duration). We do not include range position (leading edge, trailing edge, centroid) as a predictor because methods do not differ across range positions (i.e., we assume the effect of method is the same at all range positions). Note that the specific methodological variables considered may vary depending on the version of the Bioshifts database being studied.
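A minimal sketch of the low-species filter (assuming the species-level table rs and the study-by-class identifier ID that are constructed in the Data preparation section below; rs_kept is an illustrative name):

library(dplyr)

# drop study x gradient x class combinations represented by fewer than 3 species
rs_kept <- rs %>%
  group_by(ID) %>%   # ID = study x gradient x taxonomic class
  filter(n() >= 3) %>%
  ungroup()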

We expect methods to affect the magnitude of observed range shifts and to have a lesser influence on the direction of the shift (a detected southward range shift that is actually a northward shift should be rare). Therefore, we fitted models to absolute shift values.

1) Fixed effect structure of the models:

We fitted separate models for each gradient (latitude or elevation). We do not assign any particular weights to the data at this step. The fixed effect structure of these models, without weights, is as follows:

\[ StudyLevelShift \sim NtemporalUnits + Extent + GrainSize + PrAb + Quality \]

NtemporalUnits represents the number of sampling years. Extent represents the total latitudinal extent of the study area for latitudinal shifts, or the total elevational range of the study area for elevational shifts. GrainSize is an ordinal variable representing the spatial resolution of the data used to generate the range shift estimates; its levels are small (< 1 km2), moderate (1–10 km2), large (10–100 km2), and very large (> 100 km2). PrAb is a two-level variable indicating whether the observation is a shift in occupancy or a shift in abundance. Quality represents how the uncertainty arising from the geolocation of the sites was taken into account; its levels are raw (the raw data were used without attempting to account for spatial/temporal biases), resurveyed (the same sites were sampled through time), and balanced (a method was used to account for sampling biases).

Two different statistical models are used at this step:

2) Gamma vs Gaussian models:

Gamma Model: This model uses the glmmTMB package and is fitted with a Gamma distribution and a log link function. The Gamma distribution is suitable for positive-valued data, and the log link relates the mean of the response to the linear predictor.

Gaussian Model: This model uses both the glmmTMB and the nlme packages and is fitted with a Gaussian distribution and an identity link. The response variable in this case is the log-transformed absolute shift at the study level. We fitted Gaussian models with both packages because nlme allows greater flexibility in the weighting structure (see below).
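A minimal sketch of the two formulations (the actual calls, using the variable names of this report, appear in the Model fitting section below):

library(glmmTMB)

# Gamma distribution with a log link: absolute study-level shifts as the response
m_gamma <- glmmTMB(
  study_level_shift_abs ~ LogNtempUnits + LogExtent + ContinuousGrain + Quality + PrAb + (1 | Class),
  family = Gamma(link = "log"),
  data = rs_SL_lat)

# Gaussian distribution with an identity link: log-transformed absolute shifts as the response
m_gaussian <- glmmTMB(
  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain + Quality + PrAb + (1 | Class),
  data = rs_SL_lat)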

3) Random effect structure: Random intercept vs random slope

We included a random effect of Class on the intercept because average shifts, and the influence of methods, might differ across classes. We also constructed a model with a random slope to account for potential variability among taxonomic classes:

Random Intercept only (1|Class):
This model treats Class as a random intercept: the baseline shift is allowed to vary among taxonomic classes, accounting for differences between them.

Random Intercept and Random Slope (Extent|Class):
This model additionally includes a random slope of Extent (the geographical extent covered by a study) within each Class, so the effect of Extent is allowed to differ among taxonomic classes.

We selected the random structure with the lowest AIC, and a weighting structure was then applied to the selected model.
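A sketch of this selection step, assuming the Gaussian random-intercept fit m_gaussian from the sketch above (the random slope uses the log-transformed extent, as in the fitting code below):

# random-slope alternative: the effect of (log) extent varies among classes
m_slope <- glmmTMB(
  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain + Quality + PrAb + (LogExtent | Class),
  data = rs_SL_lat)

# retain the random structure with the lowest AIC before adding weights
AIC(m_gaussian, m_slope)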

4) Adding Weights to the Model:

Because study-level shifts differ in their variability, we explore multiple sources of variability in the data using weights. In this step, we contrast different weighting schemes. Overall, these models aim to identify the most appropriate statistical approach and weighting scheme for analyzing the range shift data and understanding the factors that influence these shifts.

Weighting by Precision Around the Mean: The weighting parameter is the inverse of the study-level shift variability, so observations with more precise (less variable) shift measurements receive higher weights during model fitting.

\[ weights = \frac{1}{shiftVar} \]

Weighting by Taxa Diversity: The weighting is based on taxa diversity, i.e., the species richness within a given class and study, so observations based on more species receive higher weights during model fitting.

\[ weights = NtaxaClass \]

For both the Precision Around the Mean and Taxa Diversity weightings, four variance functions (constant, fixed, power, and exponential) were applied and contrasted to determine the most suitable one based on AIC and inspection of the residuals.

Weighting Two Sources of Variability: Here, two sources of variability are combined as weights in the model. The first source is Taxa Diversity, and the second source is the number of temporal units (e.g., time periods) used in the analysis.

\[ weights = varComb(varExp(NtaxaClass), varExp(NtemporalUnits)) \]
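In nlme, these weighting schemes are expressed through variance functions; a sketch, assuming lme_fit is the selected study-level lme model (the actual fits appear in the Model fitting section below):

library(nlme)

# candidate variance functions for the precision-around-the-mean weighting
w_const <- varIdent(form = ~ study_level_var_abs)   # constant variance (reference)
w_fixed <- varFixed(~ study_level_var_abs)          # variance proportional to the covariate
w_power <- varPower(form = ~ study_level_var_abs)   # power of the covariate
w_exp   <- varExp(form = ~ study_level_var_abs)     # exponential of the covariate

# two sources of variability combined (taxa diversity and number of temporal units)
w_comb <- varComb(varExp(form = ~ NtaxaClass), varExp(form = ~ NtempUnits))

# refit the selected model under each scheme and compare by AIC
fits <- lapply(list(w_const, w_fixed, w_power, w_exp, w_comb),
               function(w) update(lme_fit, weights = w))
sapply(fits, AIC)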

5) Applying correction to species-level range shifts:

We used predicted shift values obtained from the study-level models. These represent the expected range shift given a set of methodological conditions. To interpret these predicted values, we standardized them to a reference level representing ideal methodological conditions, which serves as a benchmark for comparison.
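A sketch of how the two sets of predictions could be obtained; the reference values below are illustrative placeholders rather than the definitive ideal conditions, fit stands for the selected study-level nlme model (for a glmmTMB fit, re.form = NA plays the role of level = 0), and getmode is defined in the Setup section:

# fixed-effect (population-level) predictions for each study under its own methods
pred_study <- predict(fit, newdata = rs_SL_lat, level = 0)

# a single reference row describing "ideal" methodological conditions (placeholder values)
ref <- data.frame(
  LogNtempUnits   = max(rs_SL_lat$LogNtempUnits),
  LogExtent       = max(rs_SL_lat$LogExtent),
  ContinuousGrain = min(rs_SL_lat$ContinuousGrain),
  Quality         = "BALANCED",
  PrAb            = getmode(rs_SL_lat$PrAb))

pred_ideal <- predict(fit, newdata = ref, level = 0)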

Two different methods were used for standardization:

𝚫Difference: In this method, we calculated the difference between the predicted study-level shift and the reference value. Subtracting the reference value from the predicted study-level shift gives the standardized value (𝚫), which tells us how far the shift expected under the study’s methods deviates from the ideal condition.

\[ 𝚫Difference = PredictedStudyLevel - PredictedIdeal \]

𝚫Ratio: In this method, we calculated the ratio between the predicted study-level shift and the reference value. Dividing the predicted study-level shift by the reference value gives the ratio of the shift expected under the study’s methods to that expected under ideal conditions. Standardized values close to 1 indicate that the model’s predictions are similar to those under ideal conditions.

\[ 𝚫Ratio = \frac{PredictedStudyLevel}{PredictedIdeal} \]

Finally, we calculated the corrected species-level shifts using the two methods as:

\[ CorrShiftDiff = ObservedShift - 𝚫Difference \]

\[ CorrShiftRatio = ObservedShift * 𝚫Ratio \]

After applying the correction to species range shifts using the 𝚫Difference method, the corrected absolute values were back-transformed to negative values when appropriate (i.e., the original direction of the shift was restored).
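A minimal sketch of the correction itself, assuming pred_study and pred_ideal as above (back-transformed with exp() when the model was fitted to log shifts, so both are on the shift scale) and the species-level shifts in rs_lat; using sign() is one way to restore the original direction after working with absolute values:

# study-level deviation from the ideal conditions
delta_diff  <- pred_study - pred_ideal   # 𝚫Difference
delta_ratio <- pred_study / pred_ideal   # 𝚫Ratio

# map the study-level deltas onto species-level rows via the study/class ID
# (NA for studies not retained at the study level)
idx <- match(rs_lat$ID, rs_SL_lat$ID)

# corrected shifts
CorrShiftDiff  <- sign(rs_lat$SHIFT) * (abs(rs_lat$SHIFT) - delta_diff[idx])
CorrShiftRatio <- rs_lat$SHIFT * delta_ratio[idx]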


5 Setup

library(tidyverse)
library(nlme)
library(car)
library(googledrive)
library(rgdal)
library(pbapply)
library(DHARMa)
library(lme4)
library(statmod)
library(corrplot)
library(glmmTMB)
library(knitr)
library(DT)
library(ade4)
library(FD)
library(plotly) 
library(dbscan)
library(ggpubr)
library(R.utils)

rm(list=ls())
theme_set(theme_bw())

# function to find the mode of each predictor
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Range values between 0 and 1
range01 <- function(x){(x-min(x))/(max(x)-min(x))}

# Create a table with AIC and R2 from a model list and vector with model names
source("scripts/aic_r2_table.R")

# Predict fixed effect
source("scripts/predict_fixed.R")

6 Geo data

if(!file.exists(here::here("data/bioshiftsv3_metadata.csv"))){
  
  # # From Google drive
  # drive_path <- readWindowsShortcut("G:/My Drive.lnk")$pathname
  # bioshifts_path <- readWindowsShortcut(paste0(drive_path,"\\BIOSHIFTS Working Group.lnk"))$pathname
  # shp_path <- here::here(bioshifts_path,("DATA/BIOSHIFTS_v3/CleanDatabasev3/ShapefilesBioShiftsv3"))
  # all_shps <- list.files(shp_path,pattern = "dbf", full.names = TRUE)
  
  # From local folder
  all_shps <- list.files("C:/Users/brunn/ShadowDrive/CreateGeodatabaseBioShifts/Data/ShapefilesBioShiftsv3",
                         pattern = "shp", full.names = TRUE)
  
  rs_meta <- pblapply(all_shps, terra::vect)
  rs_meta <- pblapply(rs_meta, data.frame)
  
  rs_meta <- data.table::rbindlist(rs_meta)
  
  write.csv(rs_meta, here::here("data/bioshiftsv3_metadata.csv"),row.names = FALSE)
  
} else {
  
  rs_meta <- read.csv("data/bioshiftsv3_metadata.csv")
  
}

7 Data preparation

# Read in data
rs <- read.delim("data/Shifts2018_checkedtaxo.txt")
rs <- rs[-which(rs$Type %in% c("BAT","LON")),]
rs$Type[rs$Type == "HOR"] <- "LAT"
length(unique(rs$ID)) # 322
## [1] 322
# add geospatial data
rs_meta$ID <- rs_meta$Name

# check if all IDs in v1 exist in geospatial data
all(rs$ID %in% rs_meta$ID) # No
## [1] FALSE
# what is missing?
unique(rs$ID[which(!rs$ID %in% rs_meta$ID)])
##  [1] "A101_P1" "A120_P1" "A124_P1" "A127_P1" "A140_P1" "A148_P1" "A150_P1"
##  [8] "A175_P1" "A175_P2" "A182_P1" "A189_P1" "A197_P1" "A198_P1" "A200_P1"
## [15] "A210_P1" "A223_P1" "A231_P1" "A3_P1"   "A45_P1"  "A50_P1"  "A73_P1" 
## [22] "A91_P1"  "A96_P1"  "A97_P1"
# these are all problematic studies. remove them all
rs <- rs %>%
  filter(ID %in% rs_meta$ID)

# merge
rs <- merge(rs,rs_meta,by="ID",all.y = TRUE) # all.y to keep only studies with geospatial data

# Get Extent (=Lat/Ele extent)
rs$Extent <- rs$LatExtentk
rs$Extent[which(rs$Type=="ELE")] <- rs$EleExtentm[which(rs$Type=="ELE")]

# Missing extent
tmp <- rs %>%
  filter(Type == "ELE" & is.na(Extent)) 
unique(tmp$ID)  
## character(0)
# log transform Lat/Ele extent
rs$LogExtent <- log(rs$Extent)

#remove bacteria and fungi
table(rs$Kingdom)
## 
## Animalia Bacteria    Fungi  Plantae 
##    15591        1      106    14369
rs <- rs[-which(rs$Kingdom %in% c("Bacteria","Fungi")),]

#remove taxa not identified to the species level (genus- or family-level data) or with no valid names
rs <- rs[-which(is.na(rs$Class)),]
rs <- rs[-which(rs$valid_species_name=="no"),]

# new coding scheme for type and param
rs$Gradient <- rs$Type
rs$Position <- rs$Param

## make new var for unique study id + class 
rs$Source = rs$ID 
rs$ID = paste(rs$Source, rs$Gradient, rs$Class, sep = "_")

## log transform area 
# names(rs)
rs$LogArea = log(rs$ID.area)
hist(rs$LogArea)

## group irregular sampling with multiple sampling  
rs$Sampling = ifelse(rs$Sampling %in% c("IRR","MULTIPLE(continuous)"),"MULT", rs$Sampling)
table(rs$Sampling)
## 
## CONTINUOUS       MULT        TWO 
##       3083       1246      25483
## order factors in a logical way
table(rs$Grain_size)
## 
##      large   moderate      small very_large 
##       6853      10655       9746       2557
rs$Grain <- factor(rs$Grain_size, levels = c("small", "moderate", "large","very_large"))
rs$ContinuousGrain <- 
  as.numeric(
    as.character(
      ifelse(
        rs$Grain == "small", 1, 
        ifelse(rs$Grain == "moderate",
               2, #10,
               ifelse(rs$Grain == "large",
                      3, #100,
                      4))))) #we have a few very large in the full database

hist(rs$ContinuousGrain)

rs$ContinuousGrain2 <- 
  as.numeric(
    as.character(
      ifelse(
        rs$Grain == "small", 1, 
        ifelse(rs$Grain == "moderate",
               10,
               ifelse(rs$Grain == "large",
                      100, 
                      1000))))) #we have a few very large in the full database


hist(rs$ContinuousGrain2)

#harmonize column names
rs$Quality <- ifelse(rs$Uncertainty_Distribution %in% c("RESAMPLING"),"BALANCED",
                     ifelse(rs$Uncertainty_Distribution %in% c("MODEL","MODEL+RESAMPLING(same)","RESAMPLING(same)+MODEL","RESAMPLING+MODEL"),"BALANCED",
                            ifelse(rs$Uncertainty_Distribution %in% c("DETECTABILITY","RESAMPLING(same)+DETECTABILITY"),"BALANCED",
                                   ifelse(rs$Uncertainty_Distribution %in% c("RESAMPLING(same)"),"RESURVEYED",rs$Uncertainty_Distribution))))

table(rs$Quality) #to be thought about how to split the modalities?
## 
##   BALANCED        RAW RESURVEYED 
##      14016       9440       6356
rs$PrAb <- ifelse(rs$Data %in% c("occurence-based", "occurrence-based"),"occurrence","abundance")
table(rs$PrAb) 
## 
##  abundance occurrence 
##       9426      20386
##compute number of species per class
NtaxaClass <- tapply(rs$ID,rs$ID,length)
rs$NtaxaClass <- NtaxaClass[match(rs$ID,names(NtaxaClass))]

NClass <- table(rs$Source,rs$Class)
NClass <- ifelse(NClass > 0,1,0)
NClass <- apply(NClass,1,sum)
rs$NClass <- NClass[match(rs$Source,names(NClass))]

# Number of temporal units
rs$NtempUnits <- rs$Nperiodes
these <- which(rs$NtempUnits > rs$DUR)
rs$NtempUnits[these] <- rs$DUR[these]

# log NtempUnits
rs$LogNtempUnits <- log(rs$NtempUnits)

## Save the "full data"
rs_full <- rs

## Create study level "full data"
SL_variables <- c("ID","Article_ID","Class","Source",
                  "Grain","ContinuousGrain","ContinuousGrain2",
                  "Quality","PrAb","NtaxaClass","NClass",
                  "LogNtempUnits","NtempUnits","Gradient","Sampling",
                  "Extent","LogExtent")

rs_SL_full <- rs_full[,c(SL_variables)]
rs_SL_full <- na.omit(unique(rs_SL_full))

# any study with >1 unique NtempUnits value?
test <- rs_SL_full %>%
  group_by(ID) %>%
  summarise(N=length(unique(NtempUnits)))

any(test$N>1)
## [1] FALSE
# No. Good!

# any study with >1 unique Extent value?
test <- rs_SL_full %>%
  group_by(ID) %>%
  summarise(N=length(unique(Extent)))

any(test$N>1)
## [1] FALSE
# No. Good!
these <- unique(test$ID[which(test$N>1)])

# View(rs_SL_full %>% filter(ID %in% these))
# 
# any(duplicated(rs_SL_full$ID))
# 
# rs_SL_full$ID[which(duplicated(rs_SL_full$ID))]
# 
# rs_SL_full <- rs_SL_full %>%
#   group_by(ID) %>%
#   mutate(study_level_shift = mean(SHIFT),
#          study_level_var = var(SHIFT),
#          study_level_sd = sd(SHIFT),
#          study_level_shift_abs = mean(abs(SHIFT)),
#          study_level_var_abs = var(abs(SHIFT)),
#          study_level_sd_abs = sd(abs(SHIFT)),
#          NtempUnits = mean(NtempUnits),
#          LogNtempUnits = mean(LogNtempUnits),
#          Extent = mean(Extent),
#          LogExtent = mean(LogExtent)) %>%
#   select(-SHIFT) 
# 
# dim(rs_SL_full) # 685    

rs_SL_full_lat <- filter(rs_SL_full, grepl("LAT",ID))
rs_SL_full_ele <- filter(rs_SL_full, grepl("ELE",ID))

8 Data selection for study level

head(rs[rs$Source == "A51_P1",]) # durations vary between 13 and 97 years, so better to put a cap on the differences
##                    ID            Publi_name Article_ID Study_ID Type Param
## 27205 A51_P1_ELE_Aves   Arachnothera_juliae        A51       P1  ELE    LE
## 27206 A51_P1_ELE_Aves   Pycnonotus_goiavier        A51       P1  ELE    TE
## 27207 A51_P1_ELE_Aves Eurylaimus_ochromalus        A51       P1  ELE    TE
## 27208 A51_P1_ELE_Aves   Pycnonotus_goiavier        A51       P1  ELE    LE
## 27209 A51_P1_ELE_Aves Pellorneum_pyrrogenys        A51       P1  ELE    TE
## 27210 A51_P1_ELE_Aves     Geokichla_citrina        A51       P1  ELE    TE
##       Trend Sign.trend Unit.trend Azimuth Sign.azim Unit.azim START  END DUR
## 27205   333       <NA>          m      NA        NA      <NA>  1978 2010  33
## 27206     0       <NA>          m      NA        NA      <NA>  1970 2010  41
## 27207   227       <NA>          m      NA        NA      <NA>  1997 2010  14
## 27208 -1015       <NA>          m      NA        NA      <NA>  1970 2010  41
## 27209   475       <NA>          m      NA        NA      <NA>  1996 2010  15
## 27210   591       <NA>          m      NA        NA      <NA>  1998 2010  13
##       Observation Hemisphere HOR LAT LON ELE BAT LE TE O ECO TAX  N Sampling
## 27205        <NA>          N   0   0   0   1   0  1  1 0   T   A 58      TWO
## 27206        <NA>          N   0   0   0   1   0  1  1 0   T   A 58      TWO
## 27207        <NA>          N   0   0   0   1   0  1  1 0   T   A 58      TWO
## 27208        <NA>          N   0   0   0   1   0  1  1 0   T   A 58      TWO
## 27209        <NA>          N   0   0   0   1   0  1  1 0   T   A 58      TWO
## 27210        <NA>          N   0   0   0   1   0  1  1 0   T   A 58      TWO
##       Nperiodes first_year last_year             Data Uncertainty_Distribution
## 27205         2       1978      2011 occurrence-based                      RAW
## 27206         2       1978      2011 occurrence-based                      RAW
## 27207         2       1978      2011 occurrence-based                      RAW
## 27208         2       1978      2011 occurrence-based                      RAW
## 27209         2       1978      2011 occurrence-based                      RAW
## 27210         2       1978      2011 occurrence-based                      RAW
##       Uncertainty_Parameter Grain_size    ID.area ID.latitude baseline.temp
## 27205                    no      small 0.01847762     6.07545      25.75278
## 27206                    no      small 0.01847762     6.07545      25.75278
## 27207                    no      small 0.01847762     6.07545      25.75278
## 27208                    no      small 0.01847762     6.07545      25.75278
## 27209                    no      small 0.01847762     6.07545      25.75278
## 27210                    no      small 0.01847762     6.07545      25.75278
##       trend.mean trend.sd trend.nb.pixels v.lat.mean  v.lat.sd v.lat.nb.pixels
## 27205 0.01513407       NA               1 0.01568606 0.1160768               1
## 27206 0.01513407       NA               1 0.01568606 0.1160768               1
## 27207 0.01513407       NA               1 0.01568606 0.1160768               1
## 27208 0.01513407       NA               1 0.01568606 0.1160768               1
## 27209 0.01513407       NA               1 0.01568606 0.1160768               1
## 27210 0.01513407       NA               1 0.01568606 0.1160768               1
##       ele.gradient v.ele.mean v.ele.sd     SHIFT   UNIT              old_name
## 27205  0.005318239   2.845691       NA  10.09091 m/year   Arachnothera_juliae
## 27206  0.005318239   2.845691       NA   0.00000 m/year   Pycnonotus_goiavier
## 27207  0.005318239   2.845691       NA  16.21429 m/year Eurylaimus_ochromalus
## 27208  0.005318239   2.845691       NA -24.75610 m/year   Pycnonotus_goiavier
## 27209  0.005318239   2.845691       NA  31.66667 m/year Pellorneum_pyrrogenys
## 27210  0.005318239   2.845691       NA  45.46154 m/year     Geokichla_citrina
##       taxo_level checkedby  Kingdom   Phylum Class         Order        Family
## 27205    Species      Lise Animalia Chordata  Aves Passeriformes Nectariniidae
## 27206    Species      Lise Animalia Chordata  Aves Passeriformes  Pycnonotidae
## 27207    Species      Lise Animalia Chordata  Aves Passeriformes  Eurylaimidae
## 27208    Species      Lise Animalia Chordata  Aves Passeriformes  Pycnonotidae
## 27209    Species      Lise Animalia Chordata  Aves Passeriformes    Timaliidae
## 27210    Species      Lise Animalia Chordata  Aves Passeriformes      Turdidae
##              Genus    Species              New_name      group       Temp
## 27205 Arachnothera     juliae   Arachnothera_juliae vertebrate endotherme
## 27206   Pycnonotus   goiavier   Pycnonotus_goiavier vertebrate endotherme
## 27207   Eurylaimus ochromalus Eurylaimus_ochromalus vertebrate endotherme
## 27208   Pycnonotus   goiavier   Pycnonotus_goiavier vertebrate endotherme
## 27209   Pellorneum pyrrogenys Pellorneum_pyrrogenys vertebrate endotherme
## 27210    Geokichla    citrina     Geokichla_citrina vertebrate endotherme
##       valid_species_name   Name            Article DOI LatCentDeg LonCentDeg
## 27205                yes A51_P1 Harris_al_2012_RBZ n/a   116.5626    6.07545
## 27206                yes A51_P1 Harris_al_2012_RBZ n/a   116.5626    6.07545
## 27207                yes A51_P1 Harris_al_2012_RBZ n/a   116.5626    6.07545
## 27208                yes A51_P1 Harris_al_2012_RBZ n/a   116.5626    6.07545
## 27209                yes A51_P1 Harris_al_2012_RBZ n/a   116.5626    6.07545
## 27210                yes A51_P1 Harris_al_2012_RBZ n/a   116.5626    6.07545
##       Areakm2 LatExtentk EleExtentm Eleminm Elemaxm Elemeanm  Extent LogExtent
## 27205   226.2      17.87    3567.62   427.4 3995.01   472.13 3567.62  8.179654
## 27206   226.2      17.87    3567.62   427.4 3995.01   472.13 3567.62  8.179654
## 27207   226.2      17.87    3567.62   427.4 3995.01   472.13 3567.62  8.179654
## 27208   226.2      17.87    3567.62   427.4 3995.01   472.13 3567.62  8.179654
## 27209   226.2      17.87    3567.62   427.4 3995.01   472.13 3567.62  8.179654
## 27210   226.2      17.87    3567.62   427.4 3995.01   472.13 3567.62  8.179654
##       Gradient Position Source   LogArea Grain ContinuousGrain ContinuousGrain2
## 27205      ELE       LE A51_P1 -3.991195 small               1                1
## 27206      ELE       TE A51_P1 -3.991195 small               1                1
## 27207      ELE       TE A51_P1 -3.991195 small               1                1
## 27208      ELE       LE A51_P1 -3.991195 small               1                1
## 27209      ELE       TE A51_P1 -3.991195 small               1                1
## 27210      ELE       TE A51_P1 -3.991195 small               1                1
##       Quality       PrAb NtaxaClass NClass NtempUnits LogNtempUnits
## 27205     RAW occurrence         88      1          2     0.6931472
## 27206     RAW occurrence         88      1          2     0.6931472
## 27207     RAW occurrence         88      1          2     0.6931472
## 27208     RAW occurrence         88      1          2     0.6931472
## 27209     RAW occurrence         88      1          2     0.6931472
## 27210     RAW occurrence         88      1          2     0.6931472
## Exclude studies with varying number of years
dur <- paste0(rs$ID,"_",rs$DUR)
dur_details <- aggregate(DUR ~ ID,data=rs,range)
#identify studies where the reported durations differ by more than 2 years
VaryingYears <- dur_details$ID[which(dur_details$DUR[,2] - dur_details$DUR[,1] > 2)] #12 studies to remove
rs <- rs[-which(rs$ID %in% VaryingYears),]

# take the min only for studies with less than 2 years of difference
rs = rs %>%
  group_by(ID) %>%
  mutate(DUR = min(DUR), 
         START = min(START),
         END = min(END)) %>%
  ungroup() 


## get rid of study with insanely large area (and weird study overall)
rs <- filter(rs, LogArea < 9)


## split raw data by gradient at the species level
rs_lat <- filter(rs, Gradient == "LAT")
rs_ele <- filter(rs, Gradient == "ELE")


## Create the study-level dataset
## group by study ID and class and calculate a mean range shift value and variance for each class within studies

rs_SL <- rs[,c(SL_variables,"SHIFT")]
rs_SL <- rs_SL %>%
  group_by(ID) %>%
  mutate(study_level_shift = mean(SHIFT),
         study_level_var = var(SHIFT),
         study_level_sd = sd(SHIFT),
         study_level_shift_abs = mean(abs(SHIFT)),
         study_level_var_abs = var(abs(SHIFT)),
         study_level_sd_abs = sd(abs(SHIFT)),
         NtempUnits = mean(NtempUnits)) %>%
  select(-SHIFT)  %>%
  ungroup() %>%
  distinct()

dim(rs_SL) # 572 rows
## [1] 572  23
## filter to study-by-class combinations with at least 3 species
rs_SL <- filter(rs_SL, rs_SL$NtaxaClass >= 3) #LC: here Ntaxa refers to total number of species in a study (not a taxonomic class)
range(rs_SL$NtaxaClass)
## [1]    3 3596
nlevels(factor(rs_SL$ID))
## [1] 374
length(unique(paste(rs_SL$Source, rs_SL$Gradient, rs_SL$Class, sep = "_"))) # should match the number of unique IDs above
## [1] 374
rs_SL <- rs_SL[complete.cases(rs_SL),]

## split data into latitudinal and elevational observations
rs_SL_lat <- filter(rs_SL, rs_SL$Gradient == "LAT")
rs_SL_ele <- filter(rs_SL, rs_SL$Gradient == "ELE")

#a few checks
range(rs_SL_lat$NtaxaClass)
## [1]    3 1199
range(rs_SL_ele$NtaxaClass)
## [1]    3 3596
range(rs_SL_lat$study_level_var)
## [1]   0.02249571 997.55689917
range(rs_SL_ele$study_level_var)
## [1]   0.01085069 568.50323929
table(rs_SL_lat$ContinuousGrain)
## 
##  1  2  3  4 
## 81 44 46 46
table(rs_SL_ele$ContinuousGrain)
## 
##   1   2   3   4 
## 126  29   1   1
table(rs_SL_lat$Quality)
## 
##   BALANCED        RAW RESURVEYED 
##         69        101         47
table(rs_SL_ele$Quality)
## 
##   BALANCED        RAW RESURVEYED 
##         60         41         56
table(rs_SL_lat$Sampling)
## 
## CONTINUOUS       MULT        TWO 
##         43         17        157
table(rs_SL_ele$Sampling)
## 
## CONTINUOUS       MULT        TWO 
##          9          1        147
table(rs_SL_lat$PrAb)
## 
##  abundance occurrence 
##         79        138
table(rs_SL_ele$PrAb)
## 
##  abundance occurrence 
##         34        123
#rs_SL_lat[rs_SL_lat$study_level_var > 1000,]
#var(rs$SHIFT[rs$ID == "A116_P1__Insecta"])

#remove groups with <3 observations TO BE DISCUSSED
#KEEP_lat <- names(which(table(rs_SL_lat$Class) > 3))
#rs_SL_lat <- rs_SL_lat[which(rs_SL_lat$Class %in% KEEP_lat),]

#KEEP_ele <-  names(which(table(rs_SL_ele$Class) > 3))
#rs_SL_ele <- rs_SL_ele[which(rs_SL_ele$Class %in% KEEP_ele),]

write.csv(rs_SL_lat,"outputs/LATData_model.csv",row.names=F)
write.csv(rs_SL_ele,"outputs/ELEData_model.csv",row.names=F)

9 Model fitting

Fit one model per gradient (latitudinal, elevational)

9.1 Latitude

9.1.1 Explore data

rs_SL_lat <- read.csv("outputs/LATData_model.csv")
rs_SL_lat$Class <- factor(rs_SL_lat$Class)

dim(rs_SL_lat)
## [1] 217  23
nlevels(factor(rs_SL_lat$Source))
## [1] 120
nlevels(factor(rs_SL_lat$Class))
## [1] 37
# 120 studies, 37 taxonomic groups 

hist(rs_SL_lat$study_level_shift_abs)

hist(rs_SL_lat$study_level_var)

hist(rs_SL_lat$LogExtent)

hist(rs_SL_lat$ContinuousGrain)

hist(rs_SL_lat$LogNtempUnits, breaks = 50)

ggplot(rs_SL_lat, aes(x=Quality)) + geom_bar()

ggplot(rs_SL_lat, aes(x=PrAb)) + geom_bar()

ggplot(rs_SL_lat, aes(y=Class)) + geom_bar()

ggplot(rs_SL_lat, aes(y=Sampling)) + geom_bar()

corrplot(
  cor(rs_SL_lat[,c("study_level_shift_abs",
                   "study_level_var",
                   "NtempUnits",
                   "LogExtent",
                   "NtaxaClass")]), 
  method = "number",
  type = "lower",
  diag = FALSE)

# we cannot use study_level_var and study_level_shift_abs in the same model
plot(rs_SL_lat$study_level_shift_abs, 1/rs_SL_lat$study_level_var)

plot(log(rs_SL_lat$study_level_shift_abs), log(1/rs_SL_lat$study_level_var)) # do not fit models with study_level_var?

# study_level_var is correlated with Extent
plot(log(1/study_level_var) ~ LogExtent, rs_SL_lat) 

cor.test(rs_SL_lat$LogExtent,1/rs_SL_lat$study_level_var) # do not fit models with study_level_var?
## 
##  Pearson's product-moment correlation
## 
## data:  rs_SL_lat$LogExtent and 1/rs_SL_lat$study_level_var
## t = -2.4162, df = 215, p-value = 0.01652
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.28950808 -0.03005944
## sample estimates:
##        cor 
## -0.1625929
# NtaxaClass is OK
plot(study_level_shift_abs ~ NtaxaClass, rs_SL_lat)

plot(log(study_level_shift_abs) ~ log(NtaxaClass), rs_SL_lat)

plot(log(1/study_level_var) ~ log(NtaxaClass), rs_SL_lat)

# LogNtempUnits is OK
plot(study_level_shift_abs ~ LogNtempUnits, rs_SL_lat)

plot(log(study_level_shift_abs) ~ log(LogNtempUnits), rs_SL_lat)

# we cannot use sampling and LogNtempUnits in the same model
boxplot(rs_SL_lat$LogNtempUnits~rs_SL_lat$Sampling) # do not fit models with sampling

# more correlation tests
boxplot(log(study_level_var) ~ ContinuousGrain, rs_SL_lat)

cor.test(rs_SL_lat$ContinuousGrain,log(rs_SL_lat$study_level_var))
## 
##  Pearson's product-moment correlation
## 
## data:  rs_SL_lat$ContinuousGrain and log(rs_SL_lat$study_level_var)
## t = 1.7343, df = 215, p-value = 0.0843
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.01597405  0.24678402
## sample estimates:
##       cor 
## 0.1174603
plot(log(NtaxaClass) ~ ContinuousGrain, rs_SL_lat)

boxplot(LogExtent ~ ContinuousGrain, rs_SL_lat) 

9.1.2 Compare relationships within species-level and study-level data

9.1.2.1 Absolute values

# N temporal units
p1 <- ggplot(rs_lat, aes(x = NtempUnits, y = abs(SHIFT), size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Species-level data")+
  xlab("Log N temporal units") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_lat, aes(x = NtempUnits, y = study_level_shift_abs, size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Study-level data")+
  xlab("Log N temporal units") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)

# Extent
p1 <- ggplot(rs_lat, aes(x = LogExtent, y = abs(SHIFT), size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Species-level data")+
  xlab("Study area\n(Latitudinal extent km)") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_lat, aes(x = LogExtent, y = study_level_shift_abs, size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Study-level data")+
  xlab("Study area\n(Latitudinal extent km)") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)

# Grain size
p1 <- ggplot(rs_lat, aes(x = ContinuousGrain, y = abs(SHIFT), size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Species-level data") +
  xlab("Grain size categories") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_lat, aes(x = ContinuousGrain, y = study_level_shift_abs, size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Study-level data")+
  xlab("Grain size categories") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)

# Sampling
p1 <- ggplot(rs_lat, aes(x = Sampling, y = abs(SHIFT), size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black",alpha=.2, aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Species-level data")+
  xlab("Sampling") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_lat, aes(x = Sampling, y = study_level_shift_abs, size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black", aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Study-level data")+
  xlab("Sampling") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)

# Quality
p1 <- ggplot(rs_lat, aes(x = Quality, y = abs(SHIFT), size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black",alpha=.2, aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Species-level data")+
  xlab("Quality") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_lat, aes(x = Quality, y = study_level_shift_abs, size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black", aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Study-level data")+
  xlab("Quality") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)

# PrAb
p1 <- ggplot(rs_lat, aes(x = PrAb, y = abs(SHIFT), size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black",alpha=.2, aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Species-level data")+
  xlab("Quality") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_lat, aes(x = PrAb, y = study_level_shift_abs, size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black", aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Study-level data")+
  xlab("Quality") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)

9.1.2.2 Raw values

# N temporal units
p1 <- ggplot(rs_lat, aes(x = NtempUnits, y = SHIFT, size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Species-level data")+
  xlab("Log N temporal units") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_lat, aes(x = NtempUnits, y = study_level_shift, size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Study-level data")+
  xlab("Log N temporal units") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)

# Extent
p1 <- ggplot(rs_lat, aes(x = LogExtent, y = SHIFT, size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Species-level data")+
  xlab("Study area\n(Latitudinal extent km)") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_lat, aes(x = LogExtent, y = study_level_shift, size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Study-level data")+
  xlab("Study area\n(Latitudinal extent km)") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)

# Grain size
p1 <- ggplot(rs_lat, aes(x = ContinuousGrain, y = SHIFT, size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Species-level data")+
  xlab("Grain size categories") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_lat, aes(x = ContinuousGrain, y = study_level_shift, size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Study-level data")+
  xlab("Grain size categories") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)
## Warning: Removed 1 rows containing missing values (`geom_point()`).

# Sampling
p1 <- ggplot(rs_lat, aes(x = Sampling, y = SHIFT, size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black",alpha=.2, aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Species-level data")+
  xlab("Sampling") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_lat, aes(x = Sampling, y = study_level_shift, size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black", aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Study-level data")+
  xlab("Sampling") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)
## Warning: The following aesthetics were dropped during statistical transformation: size
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## Warning: The following aesthetics were dropped during statistical transformation: size
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?

# Quality
p1 <- ggplot(rs_lat, aes(x = Quality, y = SHIFT, size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black",alpha=.2, aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Species-level data")+
  xlab("Quality") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_lat, aes(x = Quality, y = study_level_shift, size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black", aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Study-level data")+
  xlab("Quality") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)
## Warning: The following aesthetics were dropped during statistical transformation: size
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: size
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?

# PrAb
p1 <- ggplot(rs_lat, aes(x = PrAb, y = SHIFT, size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black",alpha=.2, aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Species-level data")+
  xlab("Quality") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_lat, aes(x = PrAb, y = study_level_shift, size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black", aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Study-level data")+
  xlab("Quality") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)
## Warning: The following aesthetics were dropped during statistical transformation: size
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: size
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?

We expect methods to affect shift magnitude and, to a much lesser extent, shift direction, so we decided to model absolute shifts rather than raw shifts.

9.1.3 Fit models

9.1.4 glmmTMB

9.1.4.1 Gamma vs Gaussian model and Random intercept vs random slope

# Gamma intercept
tmb_lat_gamma_int <- glmmTMB(
  study_level_shift_abs ~ 
    LogNtempUnits + LogExtent + ContinuousGrain + Quality + PrAb +
    (1|Class), 
  family = Gamma(link = "log"),
  data = rs_SL_lat)

# Gamma slope
tmb_lat_gamma_slp <- glmmTMB(
  study_level_shift_abs ~ 
    LogNtempUnits + LogExtent + ContinuousGrain + Quality + PrAb +
    (LogExtent|Class), 
  family = Gamma(link = "log"),
  data = rs_SL_lat)

# Gaussian intercept
tmb_lat_gaussian_int <- glmmTMB(
  log(study_level_shift_abs) ~ 
    LogNtempUnits + LogExtent + ContinuousGrain + Quality + PrAb +
    (1|Class), 
  data = rs_SL_lat)

# Gaussian slope
tmb_lat_gaussian_slp <- glmmTMB(
  log(study_level_shift_abs) ~ 
    LogNtempUnits + LogExtent + ContinuousGrain + Quality + PrAb +
    (LogExtent|Class), 
  data = rs_SL_lat)
## Warning in fitTMB(TMBStruc): Model convergence problem; non-positive-definite
## Hessian matrix. See vignette('troubleshooting')
## Warning in fitTMB(TMBStruc): Model convergence problem; singular convergence
## (7). See vignette('troubleshooting')
###########
# Get results
tmb_list <- list(tmb_lat_gamma_int,
                 tmb_lat_gamma_slp,
                 tmb_lat_gaussian_int,
                 tmb_lat_gaussian_slp)

tmb_names <- c("Gamma intercept",
               "Gamma slope",
               "Gaussian intercept",
               "Gaussian slope")

# AIC and R2 table

DT::datatable(aic_r2_table(tmb_list, tmb_names))
## Loading required package: data.table
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:lubridate':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## The following object is masked from 'package:purrr':
## 
##     transpose
## Loading required package: piecewiseSEM
## 
##   This is piecewiseSEM version 2.3.01.
## 
## 
##   Questions or bugs can be addressed to <jlefcheck@umces.edu>.
# Residual distribution
for(i in 1:length(tmb_list)){
  try({
    plot(simulateResiduals(tmb_list[[i]]), main = tmb_names[i])
  })
}

## qu = 0.75, log(sigma) = -2.56479 : outer Newton did not converge fully.
## qu = 0.75, log(sigma) = -2.363777 : outer Newton did not converge fully.

The Gaussian intercept model is the best according to AIC. This model also shows a good residual distribution.

9.1.4.2 Select weight structure

# The selected model
tmb_lat <- tmb_lat_gaussian_int
summary(tmb_lat)
##  Family: gaussian  ( identity )
## Formula:          
## log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +  
##     Quality + PrAb + (1 | Class)
## Data: rs_SL_lat
## 
##      AIC      BIC   logLik deviance df.resid 
##    512.8    543.3   -247.4    494.8      208 
## 
## Random effects:
## 
## Conditional model:
##  Groups   Name        Variance Std.Dev.
##  Class    (Intercept) 0.2866   0.5354  
##  Residual             0.4686   0.6845  
## Number of obs: 217, groups:  Class, 37
## 
## Dispersion estimate for gaussian family (sigma^2): 0.469 
## 
## Conditional model:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -2.44715    0.60584  -4.039 5.36e-05 ***
## LogNtempUnits     -0.19402    0.06184  -3.138  0.00170 ** 
## LogExtent          0.50629    0.08219   6.160 7.26e-10 ***
## ContinuousGrain    0.09772    0.04803   2.035  0.04190 *  
## QualityRAW         0.20472    0.12714   1.610  0.10736    
## QualityRESURVEYED -0.17089    0.14218  -1.202  0.22939    
## PrAboccurrence    -0.40687    0.14112  -2.883  0.00394 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# NtaxaClass model
tmb_lat_class <- try(
  update(tmb_lat, weights = NtaxaClass))

# summary(tmb_lat_class)

# 1/study_level_var model
tmb_lat_var <- try(
  update(tmb_lat, weights = range01(log(1/study_level_var_abs)))) # precision

# NtempUnits model
tmb_lat_temp <- try(
  update(tmb_lat,
         .~ LogExtent + ContinuousGrain + Quality + PrAb + (1 | Class),
         weights = LogNtempUnits))

# NtaxaClass + NtempUnits model
tmb_lat_class_temp <- try(
  update(tmb_lat, 
         .~ LogExtent + ContinuousGrain + Quality + PrAb + (1 | Class),
         weights = log(NtaxaClass) * LogNtempUnits)) # two sources of variability (N species and sample size)


##############################################################

tmb_list <- list(tmb_lat,
                 tmb_lat_var,
                 tmb_lat_class,
                 tmb_lat_temp,
                 tmb_lat_class_temp)

tmb_names <- c("No weights",
               "1/variance",
               "NtaxaClass",
               "NtempUnits",
               "NtaxaClass + NtempUnits")

# AIC and R2 table
DT::datatable(aic_r2_table(tmb_list, tmb_names))
# Dharma residuals
for(i in 1:length(tmb_list)){
  try({
    plot(simulateResiduals(tmb_list[[i]]), main = tmb_names[i])
  })
}
## qu = 0.75, log(sigma) = -2.56479 : outer Newton did not converge fully.
## qu = 0.75, log(sigma) = -2.363777 : outer Newton did not converge fully.
## Warning in getSimulations.glmmTMB(fittedModel, nsim = n, type = "normal", :
## Model was fit with prior weights. These will be ignored in the simulation. See
## ?getSimulations for details

## Warning in newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L = G$L,
## : Fitting terminated with step failure - check results carefully

## Warning in newton(lsp = lsp, X = G$X, y = G$y, Eb = G$Eb, UrS = G$UrS, L = G$L,
## : Model was fit with prior weights. These will be ignored in the simulation.
## See ?getSimulations for details

## Warning in getSimulations.glmmTMB(fittedModel, nsim = n, type = "normal", :
## Model was fit with prior weights. These will be ignored in the simulation. See
## ?getSimulations for details

## Warning in getSimulations.glmmTMB(fittedModel, nsim = n, type = "normal", :
## Model was fit with prior weights. These will be ignored in the simulation. See
## ?getSimulations for details

# qqnorm plot
{
  par(mfrow=c(2,3))
  
  for(i in 1:length(tmb_list)){
    qqnorm(resid(tmb_list[[i]]), main = tmb_names[i])
    qqline(resid(tmb_list[[i]]))
  }
}

# histogram residuals 
{
  par(mfrow=c(2,3))
  
  for(i in 1:length(tmb_list)){
    hist(resid(tmb_list[[i]]), main = tmb_names[i], xlab = "Residual")
  }
}

# residuals vs fitted
{
  par(mfrow=c(2,3))
  
  for(i in 1:length(tmb_list)){
    plot(fitted(tmb_list[[i]]),resid(tmb_list[[i]]), main = tmb_names[i],
         xlab="Fitted values",ylab = "Residuals")
  }
}

The model with the lowest AIC is the “1/variance” model, followed by the model without weights. All models have acceptable residual distributions.

9.1.5 nlme

nlme only fits Gaussian models; therefore, here we only test random intercept vs random slope structures.

9.1.5.1 Random intercept vs random slope

lme_lat_gaussian_int <- try(lme(
  log(study_level_shift_abs) ~ 
    LogNtempUnits + LogExtent + ContinuousGrain + Quality + PrAb,
  random = ~1|Class,
  data = rs_SL_lat), silent = TRUE)

lme_lat_gaussian_slp <- try(lme(
  log(study_level_shift_abs) ~ 
    LogNtempUnits + LogExtent + ContinuousGrain + Quality + PrAb,
  random = ~LogExtent|Class,
  data = rs_SL_lat), silent = TRUE)

###########
# Get results
lme_list <- list(lme_lat_gaussian_int,
                 lme_lat_gaussian_slp)

lme_names <- c("Gaussian intercept",
               "Gaussian slope")

# AIC and R2 table
DT::datatable(aic_r2_table(lme_list, lme_names))
# Residual distribution
for(i in 1:length(lme_list)){
  try({
    plot(simulateResiduals(lme_list[[i]]), main = lme_names[i])
  }, silent = TRUE)
}
## Warning in checkModel(fittedModel): DHARMa: fittedModel not in class of
## supported models. Absolutely no guarantee that this will work!

## Warning in checkModel(fittedModel): DHARMa: fittedModel not in class of
## supported models. Absolutely no guarantee that this will work!

The model with a random slope does not converge.

9.1.5.2 Select weight structure

Here we contrast 3 weighting parameters:
1) 1/study_level_var_abs
2) NtaxaClass
3) NtaxaClass + LogNtempUnits
For each of these parameters, we test multiple variance functions.

# The selected model
lme_lat <- lme_lat_gaussian_int
summary(lme_lat)
## Linear mixed-effects model fit by REML
##   Data: rs_SL_lat 
##        AIC      BIC    logLik
##   533.4977 563.6217 -257.7488
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:   0.5507015 0.6948875
## 
## Fixed effects:  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +      Quality + PrAb 
##                        Value Std.Error  DF   t-value p-value
## (Intercept)       -2.4412740 0.6092196 174 -4.007215  0.0001
## LogNtempUnits     -0.1940158 0.0628066 174 -3.089100  0.0023
## LogExtent          0.5055169 0.0826801 174  6.114133  0.0000
## ContinuousGrain    0.0976624 0.0487850 174  2.001892  0.0468
## QualityRAW         0.2046510 0.1291787 174  1.584248  0.1150
## QualityRESURVEYED -0.1702981 0.1441215 174 -1.181629  0.2390
## PrAboccurrence    -0.4059768 0.1427542 174 -2.843886  0.0050
##  Correlation: 
##                   (Intr) LgNtmU LgExtn CntnsG QltRAW QRESUR
## LogNtempUnits     -0.050                                   
## LogExtent         -0.930 -0.071                            
## ContinuousGrain   -0.287 -0.288  0.129                     
## QualityRAW         0.064 -0.050 -0.216  0.068              
## QualityRESURVEYED -0.155  0.167  0.033  0.050  0.447       
## PrAboccurrence    -0.322  0.290  0.092  0.144  0.101  0.086
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.09908495 -0.60288880 -0.03025387  0.62938224  3.27542928 
## 
## Number of Observations: 217
## Number of Groups: 37
##########################
# 1) 1/study_level_var_abs    
my_variable <- as.formula("~ study_level_var_abs")
weight_options <- list(varIdent(form = my_variable),
                       varFixed(my_variable),
                       varPower(form = my_variable),
                       varExp(form = my_variable))

# Fit
lme_lat_var <- lapply(weight_options, function(x){
  update(lme_lat, weights = x)
})

# histogram residuals 
{
  par(mfrow=c(2,2))
  
  for(i in 1:length(lme_lat_var)){
    try(hist(resid(lme_lat_var[[i]]),main = class(weight_options[[i]])[[1]]))
  }
}

# residuals vs fitted
{
  par(mfrow=c(2,2))
  
  for(i in 1:length(lme_lat_var)){
    plot(fitted(lme_lat_var[[i]]),resid(lme_lat_var[[i]]), 
         main = class(weight_options[[i]])[[1]],
         xlab="Fitted values",ylab = "Residuals")
  }
}

# qqnorm plot
{
  par(mfrow=c(2,2))
  
  for(i in 1:length(lme_lat_var)){
    qqnorm(resid(lme_lat_var[[i]]), 
           main = class(weight_options[[i]])[[1]])
    qqline(resid(lme_lat_var[[i]]))
  }
}

# AIC and R2 table
DT::datatable(aic_r2_table(lme_lat_var, sapply(weight_options, function(x) class(x)[[1]])))
# summary model
lapply(lme_lat_var, function(x) summary(x))
## [[1]]
## Linear mixed-effects model fit by REML
##   Data: rs_SL_lat 
##        AIC      BIC    logLik
##   533.4977 563.6217 -257.7488
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:   0.5507015 0.6948875
## 
## Fixed effects:  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +      Quality + PrAb 
##                        Value Std.Error  DF   t-value p-value
## (Intercept)       -2.4412740 0.6092196 174 -4.007215  0.0001
## LogNtempUnits     -0.1940158 0.0628066 174 -3.089100  0.0023
## LogExtent          0.5055169 0.0826801 174  6.114133  0.0000
## ContinuousGrain    0.0976624 0.0487850 174  2.001892  0.0468
## QualityRAW         0.2046510 0.1291787 174  1.584248  0.1150
## QualityRESURVEYED -0.1702981 0.1441215 174 -1.181629  0.2390
## PrAboccurrence    -0.4059768 0.1427542 174 -2.843886  0.0050
##  Correlation: 
##                   (Intr) LgNtmU LgExtn CntnsG QltRAW QRESUR
## LogNtempUnits     -0.050                                   
## LogExtent         -0.930 -0.071                            
## ContinuousGrain   -0.287 -0.288  0.129                     
## QualityRAW         0.064 -0.050 -0.216  0.068              
## QualityRESURVEYED -0.155  0.167  0.033  0.050  0.447       
## PrAboccurrence    -0.322  0.290  0.092  0.144  0.101  0.086
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.09908495 -0.60288880 -0.03025387  0.62938224  3.27542928 
## 
## Number of Observations: 217
## Number of Groups: 37 
## 
## [[2]]
## Linear mixed-effects model fit by REML
##   Data: rs_SL_lat 
##        AIC      BIC   logLik
##   794.9419 825.0659 -388.471
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:   0.5518732 0.6331793
## 
## Variance function:
##  Structure: fixed weights
##  Formula: ~study_level_var_abs 
## Fixed effects:  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +      Quality + PrAb 
##                        Value Std.Error  DF   t-value p-value
## (Intercept)       -1.9935160 0.5764445 174 -3.458297  0.0007
## LogNtempUnits     -0.1010358 0.0718081 174 -1.407026  0.1612
## LogExtent          0.3361240 0.0919585 174  3.655171  0.0003
## ContinuousGrain   -0.0436097 0.0456051 174 -0.956246  0.3403
## QualityRAW         0.7797228 0.1506752 174  5.174858  0.0000
## QualityRESURVEYED  0.2016302 0.1126776 174  1.789444  0.0753
## PrAboccurrence    -0.6587409 0.1809530 174 -3.640397  0.0004
##  Correlation: 
##                   (Intr) LgNtmU LgExtn CntnsG QltRAW QRESUR
## LogNtempUnits      0.015                                   
## LogExtent         -0.906 -0.193                            
## ContinuousGrain   -0.092 -0.320 -0.018                     
## QualityRAW         0.051 -0.040 -0.165 -0.083              
## QualityRESURVEYED  0.479  0.213 -0.631  0.112  0.231       
## PrAboccurrence    -0.053  0.332 -0.250 -0.051  0.208  0.251
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -5.01331454  0.01464133  0.28698506  0.58823027  3.13452231 
## 
## Number of Observations: 217
## Number of Groups: 37 
## 
## [[3]]
## Linear mixed-effects model fit by REML
##   Data: rs_SL_lat 
##        AIC      BIC    logLik
##   534.1468 567.6179 -257.0734
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:   0.5634731 0.6572764
## 
## Variance function:
##  Structure: Power of variance covariate
##  Formula: ~study_level_var_abs 
##  Parameter estimates:
##      power 
## 0.03520022 
## Fixed effects:  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +      Quality + PrAb 
##                        Value Std.Error  DF   t-value p-value
## (Intercept)       -2.4068074 0.5986633 174 -4.020302  0.0001
## LogNtempUnits     -0.1957462 0.0621496 174 -3.149597  0.0019
## LogExtent          0.4909565 0.0817333 174  6.006814  0.0000
## ContinuousGrain    0.0935788 0.0488264 174  1.916562  0.0569
## QualityRAW         0.2410644 0.1288646 174  1.870680  0.0631
## QualityRESURVEYED -0.1736397 0.1405396 174 -1.235522  0.2183
## PrAboccurrence    -0.4070470 0.1443815 174 -2.819246  0.0054
##  Correlation: 
##                   (Intr) LgNtmU LgExtn CntnsG QltRAW QRESUR
## LogNtempUnits     -0.046                                   
## LogExtent         -0.927 -0.078                            
## ContinuousGrain   -0.284 -0.297  0.126                     
## QualityRAW         0.068 -0.043 -0.218  0.052              
## QualityRESURVEYED -0.126  0.174 -0.003  0.049  0.444       
## PrAboccurrence    -0.307  0.294  0.067  0.132  0.105  0.096
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.30671544 -0.55750402  0.02781574  0.77350311  3.16741416 
## 
## Number of Observations: 217
## Number of Groups: 37 
## 
## [[4]]
## Linear mixed-effects model fit by REML
##   Data: rs_SL_lat 
##        AIC      BIC    logLik
##   523.1192 556.5902 -251.5596
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:   0.5587764 0.6408116
## 
## Variance function:
##  Structure: Exponential of variance covariate
##  Formula: ~study_level_var_abs 
##  Parameter estimates:
##       expon 
## 0.001857796 
## Fixed effects:  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +      Quality + PrAb 
##                        Value Std.Error  DF   t-value p-value
## (Intercept)       -2.3241216 0.5800041 174 -4.007078  0.0001
## LogNtempUnits     -0.1730623 0.0595588 174 -2.905741  0.0041
## LogExtent          0.4622914 0.0789066 174  5.858718  0.0000
## ContinuousGrain    0.1080122 0.0467968 174  2.308112  0.0222
## QualityRAW         0.1790855 0.1233973 174  1.451291  0.1485
## QualityRESURVEYED -0.1551612 0.1359925 174 -1.140955  0.2555
## PrAboccurrence    -0.2885558 0.1404481 174 -2.054536  0.0414
##  Correlation: 
##                   (Intr) LgNtmU LgExtn CntnsG QltRAW QRESUR
## LogNtempUnits     -0.049                                   
## LogExtent         -0.925 -0.082                            
## ContinuousGrain   -0.295 -0.273  0.131                     
## QualityRAW         0.064 -0.053 -0.205  0.045              
## QualityRESURVEYED -0.144  0.174  0.018  0.048  0.446       
## PrAboccurrence    -0.307  0.310  0.061  0.156  0.076  0.097
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.29259075 -0.58919682  0.03299913  0.70020347  3.13939611 
## 
## Number of Observations: 217
## Number of Groups: 37
##########################
# 2) NtaxaClass   
my_variable <- as.formula("~ 1/NtaxaClass")
weight_options <- list(varIdent(form = my_variable),
                       varFixed(my_variable),
                       varPower(form = my_variable),
                       varExp(form = my_variable))

# Fit
lme_lat_class <- lapply(weight_options, function(x){
  try(update(lme_lat, weights = x))
})

# histogram residuals 
{
  par(mfrow=c(2,2))
  
  for(i in 1:length(lme_lat_class)){
    try(hist(resid(lme_lat_class[[i]]),main = class(weight_options[[i]])[[1]]))
  }
}

# residuals vs fitted
{
  par(mfrow=c(2,2))
  
  for(i in 1:length(lme_lat_class)){
    try(plot(fitted(lme_lat_class[[i]]),resid(lme_lat_class[[i]]), 
             main = class(weight_options[[i]])[[1]],
             xlab="Fitted values",ylab = "Residuals"))
  }
}

# qqnorm plot
{
  par(mfrow=c(2,2))
  
  for(i in 1:length(lme_lat_class)){
    try(qqnorm(resid(lme_lat_class[[i]]), 
               main = class(weight_options[[i]])[[1]]))
    try(qqline(resid(lme_lat_class[[i]])))
  }
}

# AIC and R2 table
DT::datatable(aic_r2_table(lme_lat_class, sapply(weight_options, function(x) class(x)[[1]])))
# summary model
lapply(lme_lat_class, function(x) try(summary(x)))
## [[1]]
## Linear mixed-effects model fit by REML
##   Data: rs_SL_lat 
##        AIC      BIC    logLik
##   533.4977 563.6217 -257.7488
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:   0.5507015 0.6948875
## 
## Fixed effects:  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +      Quality + PrAb 
##                        Value Std.Error  DF   t-value p-value
## (Intercept)       -2.4412740 0.6092196 174 -4.007215  0.0001
## LogNtempUnits     -0.1940158 0.0628066 174 -3.089100  0.0023
## LogExtent          0.5055169 0.0826801 174  6.114133  0.0000
## ContinuousGrain    0.0976624 0.0487850 174  2.001892  0.0468
## QualityRAW         0.2046510 0.1291787 174  1.584248  0.1150
## QualityRESURVEYED -0.1702981 0.1441215 174 -1.181629  0.2390
## PrAboccurrence    -0.4059768 0.1427542 174 -2.843886  0.0050
##  Correlation: 
##                   (Intr) LgNtmU LgExtn CntnsG QltRAW QRESUR
## LogNtempUnits     -0.050                                   
## LogExtent         -0.930 -0.071                            
## ContinuousGrain   -0.287 -0.288  0.129                     
## QualityRAW         0.064 -0.050 -0.216  0.068              
## QualityRESURVEYED -0.155  0.167  0.033  0.050  0.447       
## PrAboccurrence    -0.322  0.290  0.092  0.144  0.101  0.086
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.09908495 -0.60288880 -0.03025387  0.62938224  3.27542928 
## 
## Number of Observations: 217
## Number of Groups: 37 
## 
## [[2]]
## Linear mixed-effects model fit by REML
##   Data: rs_SL_lat 
##        AIC      BIC    logLik
##   674.0736 704.1976 -328.0368
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept) Residual
## StdDev:   0.6027634 4.638077
## 
## Variance function:
##  Structure: fixed weights
##  Formula: ~1/NtaxaClass 
## Fixed effects:  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +      Quality + PrAb 
##                        Value Std.Error  DF   t-value p-value
## (Intercept)       -1.4595751 0.5328458 174 -2.739207  0.0068
## LogNtempUnits     -0.2226465 0.0587588 174 -3.789164  0.0002
## LogExtent          0.4201687 0.0768569 174  5.466899  0.0000
## ContinuousGrain   -0.0287806 0.0490695 174 -0.586526  0.5583
## QualityRAW         0.1335398 0.1021747 174  1.306976  0.1929
## QualityRESURVEYED -0.2790245 0.1230153 174 -2.268209  0.0245
## PrAboccurrence    -0.6440590 0.1241472 174 -5.187868  0.0000
##  Correlation: 
##                   (Intr) LgNtmU LgExtn CntnsG QltRAW QRESUR
## LogNtempUnits      0.335                                   
## LogExtent         -0.898 -0.499                            
## ContinuousGrain   -0.305 -0.158  0.098                     
## QualityRAW         0.291  0.347 -0.461  0.004              
## QualityRESURVEYED  0.209  0.355 -0.330 -0.168  0.521       
## PrAboccurrence    -0.270  0.171  0.028  0.134  0.105  0.177
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -4.84298648 -0.39866059  0.01288189  0.46365561  4.30215131 
## 
## Number of Observations: 217
## Number of Groups: 37 
## 
## [[3]]
## Linear mixed-effects model fit by REML
##   Data: rs_SL_lat 
##        AIC      BIC    logLik
##   526.8818 560.3529 -253.4409
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:   0.5615717 0.9474142
## 
## Variance function:
##  Structure: Power of variance covariate
##  Formula: ~1/NtaxaClass 
##  Parameter estimates:
##     power 
## 0.1064028 
## Fixed effects:  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +      Quality + PrAb 
##                        Value Std.Error  DF   t-value p-value
## (Intercept)       -2.2807461 0.5789205 174 -3.939654  0.0001
## LogNtempUnits     -0.1794017 0.0600957 174 -2.985266  0.0032
## LogExtent          0.4854499 0.0790291 174  6.142675  0.0000
## ContinuousGrain    0.0799024 0.0474494 174  1.683951  0.0940
## QualityRAW         0.1731711 0.1214523 174  1.425836  0.1557
## QualityRESURVEYED -0.1631199 0.1357614 174 -1.201519  0.2312
## PrAboccurrence    -0.4319845 0.1358949 174 -3.178813  0.0018
##  Correlation: 
##                   (Intr) LgNtmU LgExtn CntnsG QltRAW QRESUR
## LogNtempUnits      0.019                                   
## LogExtent         -0.924 -0.152                            
## ContinuousGrain   -0.331 -0.254  0.164                     
## QualityRAW         0.111  0.004 -0.263  0.044              
## QualityRESURVEYED -0.082  0.196 -0.051  0.013  0.467       
## PrAboccurrence    -0.317  0.266  0.080  0.159  0.086  0.130
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -2.564678600 -0.661652047 -0.003470179  0.652144636  2.844564321 
## 
## Number of Observations: 217
## Number of Groups: 37 
## 
## [[4]]
## Linear mixed-effects model fit by REML
##   Data: rs_SL_lat 
##        AIC      BIC    logLik
##   520.6408 554.1118 -250.3204
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:    0.575876 0.5475617
## 
## Variance function:
##  Structure: Exponential of variance covariate
##  Formula: ~1/NtaxaClass 
##  Parameter estimates:
##    expon 
## 1.988932 
## Fixed effects:  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +      Quality + PrAb 
##                        Value Std.Error  DF   t-value p-value
## (Intercept)       -2.4040330 0.5691356 174 -4.224007  0.0000
## LogNtempUnits     -0.1523503 0.0578878 174 -2.631821  0.0093
## LogExtent          0.4870975 0.0771452 174  6.314036  0.0000
## ContinuousGrain    0.0933897 0.0457378 174  2.041849  0.0427
## QualityRAW         0.1476989 0.1194466 174  1.236527  0.2179
## QualityRESURVEYED -0.1408404 0.1326195 174 -1.061989  0.2897
## PrAboccurrence    -0.3454114 0.1314168 174 -2.628366  0.0093
##  Correlation: 
##                   (Intr) LgNtmU LgExtn CntnsG QltRAW QRESUR
## LogNtempUnits      0.001                                   
## LogExtent         -0.925 -0.127                            
## ContinuousGrain   -0.346 -0.254  0.187                     
## QualityRAW         0.113 -0.020 -0.260  0.029              
## QualityRESURVEYED -0.095  0.195 -0.037  0.011  0.473       
## PrAboccurrence    -0.309  0.260  0.077  0.161  0.076  0.132
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -2.302899540 -0.639129017  0.003250849  0.648176530  2.759057956 
## 
## Number of Observations: 217
## Number of Groups: 37
##########################
# NtempUnits model
lme_lat_temp <-nlme::lme(
  log(study_level_shift_abs) ~ LogExtent + ContinuousGrain + Quality + PrAb, 
  random = ~ 1 | Class,
  weights = varExp(form = ~1/LogNtempUnits),
  data = rs_SL_lat)

hist(resid(lme_lat_temp))
plot(predict(lme_lat_temp),resid(lme_lat_temp))
summary(lme_lat_temp)
## Linear mixed-effects model fit by REML
##   Data: rs_SL_lat 
##        AIC      BIC    logLik
##   539.1006 569.2673 -260.5503
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:   0.5555131 0.7320209
## 
## Variance function:
##  Structure: Exponential of variance covariate
##  Formula: ~1/LogNtempUnits 
##  Parameter estimates:
##       expon 
## -0.02724066 
## Fixed effects:  log(study_level_shift_abs) ~ LogExtent + ContinuousGrain + Quality +      PrAb 
##                        Value Std.Error  DF   t-value p-value
## (Intercept)       -2.5260382 0.6212676 175 -4.065942  0.0001
## LogExtent          0.4865162 0.0842332 175  5.775823  0.0000
## ContinuousGrain    0.0532925 0.0479117 175  1.112305  0.2675
## QualityRAW         0.1825462 0.1315774 175  1.387367  0.1671
## QualityRESURVEYED -0.0994335 0.1444599 175 -0.688312  0.4922
## PrAboccurrence    -0.2787999 0.1392715 175 -2.001845  0.0468
##  Correlation: 
##                   (Intr) LgExtn CntnsG QltRAW QRESUR
## LogExtent         -0.938                            
## ContinuousGrain   -0.316  0.113                     
## QualityRAW         0.063 -0.223  0.062              
## QualityRESURVEYED -0.148  0.044  0.107  0.464       
## PrAboccurrence    -0.325  0.121  0.245  0.121  0.041
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.49949576 -0.63584774 -0.07258622  0.66662585  3.30643264 
## 
## Number of Observations: 217
## Number of Groups: 37
aic_r2_table(list(lme_lat_temp))
##     Model   AIC R2_mar R2_cond
## 1 Model 1 539.1   0.19    0.48
qqnorm(resid(lme_lat_temp));qqline(resid(lme_lat_temp))
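# Sketch of a helper equivalent to aic_r2_table(), which is defined in the
# repository scripts (assumption: the actual implementation may differ, e.g.
# in rounding). It tabulates AIC together with marginal and conditional R2
# (Nakagawa) obtained via the performance package.
aic_r2_table_sketch <- function(models, model_names = paste("Model", seq_along(models))) {
  do.call(rbind, lapply(seq_along(models), function(i) {
    r2 <- performance::r2_nakagawa(models[[i]])
    data.frame(Model   = model_names[i],
               AIC     = round(AIC(models[[i]]), 2),
               R2_mar  = round(r2$R2_marginal, 2),
               R2_cond = round(r2$R2_conditional, 2))
  }))
}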

##########################
# NtaxaClass + NtempUnits model
lme_lat_class_temp <-nlme::lme(
  log(study_level_shift_abs) ~ LogExtent + ContinuousGrain + Quality + PrAb, 
  random = ~ 1 | Class,
  weights = varComb(varExp(form = ~1/NtaxaClass),
                    varExp(form = ~1/LogNtempUnits)),
  data = rs_SL_lat)

hist(resid(lme_lat_class_temp))

plot(predict(lme_lat_class_temp),resid(lme_lat_class_temp))
summary(lme_lat_class_temp)
## Linear mixed-effects model fit by REML
##   Data: rs_SL_lat 
##        AIC      BIC    logLik
##   523.5557 557.0743 -251.7778
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:   0.5780469 0.5438538
## 
## Combination of variance functions: 
##  Structure: Exponential of variance covariate
##  Formula: ~1/NtaxaClass 
##  Parameter estimates:
##    expon 
## 2.116788 
##  Structure: Exponential of variance covariate
##  Formula: ~1/LogNtempUnits 
##  Parameter estimates:
##      expon 
## 0.00786819 
## Fixed effects:  log(study_level_shift_abs) ~ LogExtent + ContinuousGrain + Quality +      PrAb 
##                        Value Std.Error  DF   t-value p-value
## (Intercept)       -2.4055438 0.5746180 175 -4.186336  0.0000
## LogExtent          0.4611313 0.0772520 175  5.969185  0.0000
## ContinuousGrain    0.0634887 0.0446834 175  1.420854  0.1571
## QualityRAW         0.1396562 0.1206498 175  1.157533  0.2486
## QualityRESURVEYED -0.0720077 0.1314635 175 -0.547739  0.5846
## PrAboccurrence    -0.2560108 0.1282094 175 -1.996817  0.0474
##  Correlation: 
##                   (Intr) LgExtn CntnsG QltRAW QRESUR
## LogExtent         -0.933                            
## ContinuousGrain   -0.359  0.163                     
## QualityRAW         0.115 -0.266  0.023              
## QualityRESURVEYED -0.094 -0.015  0.061  0.487       
## PrAboccurrence    -0.320  0.114  0.244  0.083  0.088
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.3035080 -0.6257098 -0.0166655  0.6831095  2.7082614 
## 
## Number of Observations: 217
## Number of Groups: 37
aic_r2_table(list(lme_lat_class_temp))
##     Model    AIC R2_mar R2_cond
## 1 Model 1 523.56   0.21    0.63
qqnorm(resid(lme_lat_class_temp));qqline(resid(lme_lat_class_temp))

All models have acceptable residual distributions.

The model weighted by “1/variance” has a lower AIC than the model combining NtaxaClass and LogNtempUnits. Among the “1/variance” models, the one using a varExp variance function was the best based on AIC and residual distribution.
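As a quick cross-check (a sketch, assuming the fitted objects from the chunks above are still in the workspace and all converged), the AICs of the varExp candidates from each weighting scheme can be tabulated directly:

sapply(list("1/variance (varExp)" = lme_lat_var[[4]],
            "NtaxaClass (varExp)" = lme_lat_class[[4]],
            "NtempUnits (varExp)" = lme_lat_temp,
            "NtaxaClass + NtempUnits (varExp)" = lme_lat_class_temp),
       AIC)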

9.1.6 Pick the model

# mod_lat <- lme_lat_var[[4]]
mod_lat <- tmb_lat_var

9.1.7 Plot model predictions

Predict shifts for all possible combinations of methodological variables. For this, we create a grid with all possible combinations of the methodological variables. Note that some combinations do not exist in the real database. The grid is useful for illustrative purposes: visualizing predicted shifts along the levels of a given methodological variable.
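The predictions use the fixed effects only (population-level predictions that ignore the Class random intercept). As a minimal sketch, assuming mod_lat is a glmmTMB fit, a helper equivalent to predict_fixed() could look like the following (the actual helper is defined in the repository scripts and may differ, e.g., in how it handles back-transformation):

predict_fixed_sketch <- function(model, newdata) {
  # re.form = NA drops all random effects, so predictions come from the
  # fixed-effect coefficients only
  predict(model, newdata = newdata, re.form = NA)
}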

new_data <- data.frame(
  expand_grid(
    LogNtempUnits = seq(from = min(rs_SL_lat$LogNtempUnits), 
                        to = max(rs_SL_lat$LogNtempUnits),  
                        length.out = 5),
    LogExtent = seq(from = min(rs_SL_lat$LogExtent), 
                    to = max(rs_SL_lat$LogExtent),  
                    length.out = 5),
    ContinuousGrain = unique(rs_SL_lat$ContinuousGrain),
    PrAb = unique(rs_SL_lat$PrAb),
    Quality = unique(rs_SL_lat$Quality),
    Class = NA,
    study_level_var_abs = NA))

pred_lat <- predict_fixed(mod_lat, new_data) # Use fixed effects only

new_data$pred_val <- pred_lat

## get a typical value for each predictor: the median for the continuous
## predictors (snapped to the nearest value in the prediction grid) and the
## mode for the remaining predictors
time_mode <- median(rs_SL_lat$LogNtempUnits)
time_mode <- new_data$LogNtempUnits[which.min(abs(new_data$LogNtempUnits - time_mode))]
mode_grain <- getmode(rs_SL_lat$ContinuousGrain)
mode_prab <- getmode(rs_SL_lat$PrAb)
mode_quality <- getmode(rs_SL_lat$Quality)
area <- median(rs_SL_lat$LogExtent)
area <- new_data$LogExtent[which.min(abs(new_data$LogExtent - area))]
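# getmode() is a custom helper defined in the repository scripts; a minimal
# sketch of an equivalent implementation (assumption) that returns the most
# frequent value of a vector:
getmode_sketch <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}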

############################################
## plot predictions and raw data together:
## AREA
new_data %>%
  filter(PrAb == mode_prab, ## plot predictions for most common levels
         LogNtempUnits == time_mode,
         ContinuousGrain == mode_grain,
         Quality == mode_quality
  ) %>%
  ggplot(aes(x = LogExtent, y = pred_val)) +
  geom_line() +
  geom_point(data = rs_lat, aes(x = LogExtent, y = abs(SHIFT), colour = ID), 
             # I subsetted only the studies used in the model
             size = 0.25, alpha = 0.75) + ## small circles are raw shifts within studies
  geom_point(data = rs_SL_lat, aes(x = LogExtent, y = study_level_shift_abs, fill = ID, size = log(NtaxaClass)),
             shape = 21, colour = "black", alpha = 0.75) + ## large circles are mean study-level shift per class
  theme(legend.position = "none") +
  labs(y = "Mean range shift rate (class within study)")

# Only SL
p1 <- new_data %>%
  filter(PrAb == mode_prab, ## plot predictions for most common levels
         LogNtempUnits == time_mode,
         ContinuousGrain == mode_grain,
         Quality == mode_quality
  ) %>%
  ggplot(aes(x = LogExtent, y = pred_val)) +
  geom_line() +
  geom_point(data = rs_SL_lat, aes(x = LogExtent, y = study_level_shift_abs, fill = ID, size = log(NtaxaClass)),
             shape = 21, colour = "black", alpha = 0.75) + ## large circles are mean study-level shift per class
  theme(legend.position = "none") +
  labs(y = "Mean range shift rate (class within study)")

p1

## Contrast with the expected linear relationship 
p1 + 
  geom_smooth(method = "lm",
              data = rs_SL_lat,
              aes(x = LogExtent, y = study_level_shift_abs),
              color = "darkgray",se = FALSE, linetype = "dashed") 

#############################################
# plot predictions and raw data together:
## N temporal units
new_data %>%
  filter(PrAb == mode_prab, ## plot predictions for most common levels
         LogExtent == area, 
         ContinuousGrain == mode_grain,
         Quality == mode_quality
  ) %>%
  ggplot(aes(x = LogNtempUnits, y = pred_val)) +
  geom_line() +
  geom_point(data = rs_lat, 
             aes(x = LogNtempUnits, y = abs(SHIFT), colour = ID), 
             size = 0.25, alpha = 0.75) + 
  geom_point(data = rs_SL_lat, 
             aes(x = LogNtempUnits, y = study_level_shift_abs, 
                 fill = ID, size = log(NtaxaClass)),
             shape = 21, colour = "black", alpha = 0.75) + #
  theme(legend.position = "none") +
  labs(y = "Mean range shift rate (class within study)")

# Only SL
p1 <- new_data %>%
  filter(PrAb == mode_prab, ## plot predictions for most common levels
         LogExtent == area, 
         ContinuousGrain == mode_grain,
         Quality == mode_quality
  ) %>%
  ggplot(aes(x = LogNtempUnits, y = pred_val)) +
  geom_line() +
  geom_point(data = rs_SL_lat, 
             aes(x = LogNtempUnits, y = study_level_shift_abs, 
                 fill = ID, size = log(NtaxaClass)),
             shape = 21, colour = "black", alpha = 0.75) + 
  theme(legend.position = "none") +
  labs(y = "Mean range shift rate (class within study)")

p1

## Contrast with the expected linear relationship 
p1 + 
  geom_smooth(method = "lm",
              data = rs_SL_lat,
              aes(x = LogNtempUnits, y = study_level_shift_abs),
              color = "darkgray",se = FALSE, linetype = "dashed") 

############################################
## GRAIN 
new_data %>%
  filter(
    LogExtent == area, 
    LogNtempUnits == time_mode,
    PrAb == mode_prab,
    Quality == mode_quality
  ) %>%
  ggplot(aes(x = ContinuousGrain, y = pred_val)) +
  geom_line()+
  theme(legend.position = "none") +
  geom_point(data = rs_lat, 
             aes(x = ContinuousGrain, y = abs(SHIFT), colour = ID), 
             size = 0.25, alpha = 0.75,
             position = position_jitterdodge(jitter.width = 0.03, jitter.height = 0)) +
  geom_point(data = rs_SL_lat, 
             aes(x = ContinuousGrain, y = study_level_shift_abs, 
                 fill = ID, size = log(NtaxaClass)),
             shape = 21, colour = "black", alpha = 0.75,
             position = position_jitterdodge(jitter.width = 0.03, jitter.height = 0)) +
  labs(y = "Mean range shift rate (class within study)")

# Only SL
p1 <- new_data %>%
  filter(
    LogExtent == area, 
    LogNtempUnits == time_mode,
    PrAb == mode_prab,
    Quality == mode_quality
  ) %>%
  ggplot(aes(x = ContinuousGrain, y = pred_val)) +
  geom_line()+
  theme(legend.position = "none") +
  geom_point(data = rs_SL_lat, 
             aes(x = ContinuousGrain, y = study_level_shift_abs, 
                 fill = ID, size = log(NtaxaClass)),
             shape = 21, colour = "black", alpha = 0.75,
             position = position_jitterdodge(jitter.width = 0.03, jitter.height = 0)) +
  labs(y = "Mean range shift rate (class within study)")

p1

## Contrast with the expected linear relationship 
p1 +
  geom_smooth(method = "lm",data = rs_SL_lat,
              aes(x = ContinuousGrain, y = study_level_shift_abs),
              color = "darkgray",se = FALSE, linetype = "dashed") 

############################################
## PRAB 
new_data %>%
  filter(
    LogExtent == area,
    LogNtempUnits == time_mode,
    ContinuousGrain == mode_grain,
    Quality == mode_quality
    
  ) %>%
  ggplot(aes(x = PrAb, y = pred_val)) +
  geom_boxplot() +
  theme(legend.position = "none") +
  geom_point(data = rs_lat, 
             aes(x = PrAb, y = abs(SHIFT), colour = ID), 
             size = 0.25, alpha = 0.75,
             position = position_jitterdodge(jitter.width = 0.03, 
                                             jitter.height = 0)) +
  geom_point(data = rs_SL_lat, 
             aes(x = PrAb, y = study_level_shift_abs, 
                 fill = ID, size = log(NtaxaClass)),
             shape = 21, colour = "black", alpha = 0.75,
             position = position_jitterdodge(jitter.width = 0.03,
                                             jitter.height = 0)) +
  labs(y = "Mean range shift rate (class within study)")

# Only SL
p1 <- new_data %>%
  filter(
    LogExtent == area,
    LogNtempUnits == time_mode,
    ContinuousGrain == mode_grain,
    Quality == mode_quality
  ) %>%
  ggplot(aes(x = PrAb, y = pred_val)) +
  geom_boxplot() +
  theme(legend.position = "none") +
  geom_point(data = rs_SL_lat, 
             aes(x = PrAb, y = study_level_shift_abs, 
                 fill = ID, size = log(NtaxaClass)),
             shape = 21, colour = "black", alpha = 0.75,
             position = position_jitterdodge(jitter.width = 0.03,
                                             jitter.height = 0)) +
  labs(y = "Mean range shift rate (class within study)")

p1

## Contrast with the expected linear relationship 
p1 +
  geom_boxplot(data=rs_SL_lat, 
               aes(x = PrAb, y = study_level_shift_abs), color = "red", alpha = .5)

9.2 Elevation

9.2.1 Explore data

rs_SL_ele <- read.csv("outputs/ELEData_model.csv")
rs_SL_ele$Class <- factor(rs_SL_ele$Class)

dim(rs_SL_ele)
## [1] 157  23
nlevels(factor(rs_SL_ele$Source))
## [1] 92
nlevels(factor(rs_SL_ele$Class))
## [1] 16
# 92 studies (sources), 16 taxonomic groups (classes)

hist(rs_SL_ele$study_level_shift_abs)

hist(rs_SL_ele$study_level_var)

hist(rs_SL_ele$LogExtent)

hist(rs_SL_ele$ContinuousGrain)

hist(rs_SL_ele$LogNtempUnits, breaks = 50)

ggplot(rs_SL_ele, aes(x=Quality)) + geom_bar()

ggplot(rs_SL_ele, aes(x=PrAb)) + geom_bar()

ggplot(rs_SL_ele, aes(y=Class)) + geom_bar()

ggplot(rs_SL_ele, aes(y=Sampling)) + geom_bar()

# we cannot use sampling and LogNtempUnits in the same model
corrplot(
  cor(rs_SL_ele[,c("study_level_shift_abs",
                   "study_level_var",
                   "NtempUnits",
                   "LogExtent",
                   "NtaxaClass")]), 
  method = "number",
  type = "lower",
  diag = FALSE)

plot(rs_SL_ele$study_level_shift_abs, 1/rs_SL_ele$study_level_var)

plot(log(rs_SL_ele$study_level_shift_abs), log(1/rs_SL_ele$study_level_var)) 

# study_level_var is not correlated with Extent
plot(log(1/study_level_var) ~ LogExtent, rs_SL_ele) 

cor.test(rs_SL_ele$LogExtent,1/rs_SL_ele$study_level_var) 
## 
##  Pearson's product-moment correlation
## 
## data:  rs_SL_ele$LogExtent and 1/rs_SL_ele$study_level_var
## t = -0.22586, df = 155, p-value = 0.8216
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1742818  0.1388939
## sample estimates:
##         cor 
## -0.01813887
# NtaxaClass is OK
plot(study_level_shift_abs ~ NtaxaClass, rs_SL_ele)

plot(log(study_level_shift_abs) ~ log(NtaxaClass), rs_SL_ele)

plot(log(1/study_level_var) ~ log(NtaxaClass), rs_SL_ele)

# LogNtempUnits is OK
plot(study_level_shift_abs ~ LogNtempUnits, rs_SL_ele)

plot(log(study_level_shift_abs) ~ log(LogNtempUnits), rs_SL_ele)

# we cannot use sampling and LogNtempUnits in the same model
boxplot(rs_SL_ele$LogNtempUnits~rs_SL_ele$Sampling) # do not fit models with sampling

# more correlation tests
boxplot(log(study_level_var) ~ ContinuousGrain, rs_SL_ele)

cor.test(rs_SL_ele$ContinuousGrain,log(rs_SL_ele$study_level_var))
## 
##  Pearson's product-moment correlation
## 
## data:  rs_SL_ele$ContinuousGrain and log(rs_SL_ele$study_level_var)
## t = 0.77415, df = 155, p-value = 0.44
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.0955050  0.2165941
## sample estimates:
##        cor 
## 0.06206156
boxplot(log(NtaxaClass) ~ ContinuousGrain, rs_SL_ele)

boxplot(LogExtent ~ ContinuousGrain, rs_SL_ele) 

9.2.2 Compare relationships within species-level and study-level data

9.2.2.1 Absolute values

# N temporal units
p1 <- ggplot(rs_ele, aes(x = NtempUnits, y = abs(SHIFT), size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Species-level data")+
  xlab("Log N temporal units") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_ele, aes(x = NtempUnits, y = study_level_shift_abs, size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Study-level data")+
  xlab("Log N temporal units") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)

# Extent
p1 <- ggplot(rs_full %>%
               filter(Gradient == "ELE"), 
             aes(x = LogExtent, y = abs(SHIFT), size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Species-level data")+
  xlab("Study area\n(Elevation extent km)") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_ele, aes(x = LogExtent, y = study_level_shift_abs, size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Study-level data")+
  xlab("Study area\n(Elevation extent km)") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)

# Grain size
p1 <- ggplot(rs_ele, aes(x = ContinuousGrain, y = abs(SHIFT), size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Species-level data")+
  xlab("Grain size categories") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_ele, aes(x = ContinuousGrain, y = study_level_shift_abs, size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Study-level data")+
  xlab("Grain size categories") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)

# Sampling
p1 <- ggplot(rs_ele, aes(x = Sampling, y = abs(SHIFT), size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black",alpha=.2, aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Species-level data")+
  xlab("Sampling") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_ele, aes(x = Sampling, y = study_level_shift_abs, size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black", aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Study-level data")+
  xlab("Sampling") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)

# Quality
p1 <- ggplot(rs_ele, aes(x = Quality, y = abs(SHIFT), size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black",alpha=.2, aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Species-level data")+
  xlab("Quality") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_ele, aes(x = Quality, y = study_level_shift_abs, size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black", aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Study-level data")+
  xlab("Quality") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)

# PrAb
p1 <- ggplot(rs_ele, aes(x = PrAb, y = abs(SHIFT), size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black",alpha=.2, aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Species-level data")+
  xlab("Quality") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_ele, aes(x = PrAb, y = study_level_shift_abs, size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black", aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Study-level data")+
  xlab("Quality") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)

9.2.2.2 Raw values

# N temporal units
p1 <- ggplot(rs_full %>%
               filter(Gradient == "ELE"), 
             aes(x = NtempUnits, y = SHIFT, size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Species-level data")+
  xlab("Log N temporal units") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_ele, aes(x = NtempUnits, y = study_level_shift, size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Study-level data")+
  xlab("Log N temporal units") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)

# Extent
p1 <- ggplot(rs_full %>%
               filter(Gradient == "ELE"), 
             aes(x = LogExtent, y = SHIFT, size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Species-level data")+
  xlab("Study area\n(Latitudinal extent km)") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_ele, aes(x = LogExtent, y = study_level_shift, size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Study-level data")+
  xlab("Study area\n(Latitudinal extent km)") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)

# Grain size
p1 <- ggplot(rs_full %>%
               filter(Gradient == "ELE"), 
             aes(x = ContinuousGrain, y = SHIFT, size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Species-level data")+
  xlab("Grain size categories") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_ele, aes(x = ContinuousGrain, y = study_level_shift, size = NtaxaClass, fill = Class))+
  geom_point(shape = 21, colour = "black") + 
  ggtitle("Study-level data")+
  xlab("Grain size categories") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)

# Sampling
p1 <- ggplot(rs_full %>%
               filter(Gradient == "ELE"), 
             aes(x = Sampling, y = SHIFT, size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black",alpha=.2, aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Species-level data")+
  xlab("Sampling") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_ele, aes(x = Sampling, y = study_level_shift, size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black", aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Study-level data")+
  xlab("Sampling") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)
## Warning: The following aesthetics were dropped during statistical transformation: size
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: size
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?

# Quality
p1 <- ggplot(rs_full %>%
               filter(Gradient == "ELE"), 
             aes(x = Quality, y = SHIFT, size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black",alpha=.2, aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Species-level data")+
  xlab("Quality") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_ele, aes(x = Quality, y = study_level_shift, size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black", aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Study-level data")+
  xlab("Quality") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)
## Warning: The following aesthetics were dropped during statistical transformation: size
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: size
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?

# PrAb
p1 <- ggplot(rs_full %>%
               filter(Gradient == "ELE"), 
             aes(x = PrAb, y = SHIFT, size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black",alpha=.2, aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Species-level data")+
  xlab("Quality") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

p2 <- ggplot(rs_SL_ele, aes(x = PrAb, y = study_level_shift, size = NtaxaClass))+
  geom_jitter(shape = 21, colour = "black", aes(fill = Class)) + 
  geom_boxplot(alpha=.5) + 
  ggtitle("Study-level data")+
  xlab("Quality") + ylab("Study level average shift\n(absolute km/year)")+
  theme_classic() + theme(legend.position = "none")

gridExtra::grid.arrange(p1,p2,nrow=1)
## Warning: The following aesthetics were dropped during statistical transformation: size
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: size
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?

Looking at the data, it makes more sense to use absolute shift values rather than raw values.

9.2.3 Fit models

9.2.4 glmmTMB

9.2.4.1 Gamma vs Gaussian model and Random intercept vs random slope

# Gamma intercept
tmb_ele_gamma_int <- glmmTMB(
  study_level_shift_abs ~ 
    LogNtempUnits + LogExtent + ContinuousGrain + Quality + PrAb +
    (1|Class), 
  family = Gamma(link = "log"),
  data = rs_SL_ele)

# Gamma slope
tmb_ele_gamma_slp <- glmmTMB(
  study_level_shift_abs ~ 
    LogNtempUnits + LogExtent + ContinuousGrain + Quality + PrAb +
    (LogExtent|Class), 
  family = Gamma(link = "log"),
  data = rs_SL_ele)
## Warning in fitTMB(TMBStruc): Model convergence problem; non-positive-definite
## Hessian matrix. See vignette('troubleshooting')
## Warning in fitTMB(TMBStruc): Model convergence problem; singular convergence
## (7). See vignette('troubleshooting')
# Gaussian intercept
tmb_ele_gaussian_int <- glmmTMB(
  log(study_level_shift_abs) ~ 
    LogNtempUnits + LogExtent + ContinuousGrain + Quality + PrAb +
    (1|Class), 
  data = rs_SL_ele)

# Gaussian slope
tmb_ele_gaussian_slp <- glmmTMB(
  log(study_level_shift_abs) ~ 
    LogNtempUnits + LogExtent + ContinuousGrain + Quality + PrAb +
    (LogExtent|Class), 
  data = rs_SL_ele)
## Warning in fitTMB(TMBStruc): Model convergence problem; singular convergence
## (7). See vignette('troubleshooting')
###########
# Get results
tmb_list <- list(tmb_ele_gamma_int,
                 tmb_ele_gamma_slp,
                 tmb_ele_gaussian_int,
                 tmb_ele_gaussian_slp)

tmb_names <- c("Gamma intercept",
               "Gamma slope",
               "Gaussian intercept",
               "Gaussian slope")

# AIC and R2 table

DT::datatable(aic_r2_table(tmb_list, tmb_names))
# Residual distribution
for(i in 1:length(tmb_list)){
  try({
    plot(simulateResiduals(tmb_list[[i]]), main = tmb_names[i])
  })
}

The Gaussian random-intercept model is the best according to AIC. This model also shows a good residual distribution.

9.2.4.2 Select weight structure

# The selected model
tmb_ele <- tmb_ele_gaussian_int
summary(tmb_ele)
##  Family: gaussian  ( identity )
## Formula:          
## log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +  
##     Quality + PrAb + (1 | Class)
## Data: rs_SL_ele
## 
##      AIC      BIC   logLik deviance df.resid 
##    352.5    380.0   -167.2    334.5      148 
## 
## Random effects:
## 
## Conditional model:
##  Groups   Name        Variance Std.Dev.
##  Class    (Intercept) 0.05742  0.2396  
##  Residual             0.46120  0.6791  
## Number of obs: 157, groups:  Class, 16
## 
## Dispersion estimate for gaussian family (sigma^2): 0.461 
## 
## Conditional model:
##                   Estimate Std. Error z value Pr(>|z|)   
## (Intercept)       -0.56387    0.66892  -0.843  0.39925   
## LogNtempUnits      0.28225    0.09980   2.828  0.00468 **
## LogExtent          0.17746    0.08239   2.154  0.03125 * 
## ContinuousGrain   -0.02780    0.12619  -0.220  0.82566   
## QualityRAW         0.16540    0.15320   1.080  0.28031   
## QualityRESURVEYED -0.14673    0.13368  -1.098  0.27236   
## PrAboccurrence    -0.21046    0.13835  -1.521  0.12821   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# NtaxaClass model
tmb_ele_class <- try(
  update(tmb_ele, weights = NtaxaClass))

# summary(tmb_ele_class)

# 1/study_level_var model
tmb_ele_var <- try(
  update(tmb_ele, weights = range01(log(1/study_level_var_abs)))) # precision

# NtempUnits model
tmb_ele_temp <- try(
  update(tmb_ele,
         .~ LogExtent + ContinuousGrain + Quality + PrAb + (1 | Class),
         weights = LogNtempUnits))

# NtaxaClass + NtempUnits model
tmb_ele_class_temp <- try(
  update(tmb_ele, 
         .~ LogExtent + ContinuousGrain + Quality + PrAb + (1 | Class),
         weights = log(NtaxaClass) * LogNtempUnits)) # two sources of variability (N species and sample size)
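# range01() is a custom helper defined in the repository scripts; a minimal
# sketch of an equivalent implementation (assumption) that rescales a numeric
# vector to the [0, 1] interval:
range01_sketch <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}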


##############################################################

tmb_list <- list(tmb_ele,
                 tmb_ele_var,
                 tmb_ele_class,
                 tmb_ele_temp,
                 tmb_ele_class_temp)

tmb_names <- c("No weights",
               "1/variance",
               "NtaxaClass",
               "NtempUnits",
               "NtaxaClass + NtempUnits")

# AIC and R2 table
DT::datatable(aic_r2_table(tmb_list, tmb_names))
## Warning: Can't compute random effect variances. Some variance components equal
##   zero. Your model may suffer from singularity (see `?lme4::isSingular`
##   and `?performance::check_singularity`).
##   Solution: Respecify random structure! You may also decrease the
##   `tolerance` level to enforce the calculation of random effect variances.
## Random effect variances not available. Returned R2 does not account for random effects.
## Warning: Can't compute random effect variances. Some variance components equal
##   zero. Your model may suffer from singularity (see `?lme4::isSingular`
##   and `?performance::check_singularity`).
##   Solution: Respecify random structure! You may also decrease the
##   `tolerance` level to enforce the calculation of random effect variances.
## Random effect variances not available. Returned R2 does not account for random effects.
# Dharma residuals
for(i in 1:length(tmb_list)){
  try({
    plot(simulateResiduals(tmb_list[[i]]), main = tmb_names[i])
  })
}
## Warning in getSimulations.glmmTMB(fittedModel, nsim = n, type = "normal", :
## Model was fit with prior weights. These will be ignored in the simulation. See
## ?getSimulations for details

## Warning in getSimulations.glmmTMB(fittedModel, nsim = n, type = "normal", :
## Model was fit with prior weights. These will be ignored in the simulation. See
## ?getSimulations for details

## Warning in getSimulations.glmmTMB(fittedModel, nsim = n, type = "normal", :
## Model was fit with prior weights. These will be ignored in the simulation. See
## ?getSimulations for details

## Warning in getSimulations.glmmTMB(fittedModel, nsim = n, type = "normal", :
## Model was fit with prior weights. These will be ignored in the simulation. See
## ?getSimulations for details

# qqnorm plot
{
  par(mfrow=c(2,3))
  
  for(i in 1:length(tmb_list)){
    qqnorm(resid(tmb_list[[i]]), main = tmb_names[i])
    qqline(resid(tmb_list[[i]]))
  }
}

# histogram residuals 
{
  par(mfrow=c(2,3))
  
  for(i in 1:length(tmb_list)){
    hist(resid(tmb_list[[i]]), main = tmb_names[i], xlab = "Residual")
  }
}

# residuals vs fitted
{
  par(mfrow=c(2,3))
  
  for(i in 1:length(tmb_list)){
    plot(fitted(tmb_list[[i]]),resid(tmb_list[[i]]), main = tmb_names[i],
         xlab="Fitted values",ylab = "Residuals")
  }
}

The model with the lowest AIC is the one weighted by “1/variance”, followed by “NtempUnits”. However, the residual distribution of the “1/variance” model is problematic. The model weighted by “NtempUnits” has a low AIC and a good residual distribution.

9.2.5 nlme

nlme only fits Gaussian models, so here we only compare a random-intercept with a random-slope structure.

9.2.5.1 Random intercept vs random slope

lme_ele_gaussian_int <- try(lme(
  log(study_level_shift_abs) ~ 
    LogNtempUnits + LogExtent + ContinuousGrain + Quality + PrAb,
  random = ~1|Class,
  data = rs_SL_ele), silent = TRUE)

lme_ele_gaussian_slp <- try(lme(
  log(study_level_shift_abs) ~ 
    LogNtempUnits + LogExtent + ContinuousGrain + Quality + PrAb,
  random = ~LogExtent|Class,
  data = rs_SL_ele), silent = TRUE)

###########
# Get results
lme_list <- list(lme_ele_gaussian_int,
                 lme_ele_gaussian_slp)

lme_names <- c("Gaussian intercept",
               "Gaussian slope")

# AIC and R2 table
DT::datatable(aic_r2_table(lme_list, lme_names))
# Residual distribution
for(i in 1:length(lme_list)){
  try({
    plot(simulateResiduals(lme_list[[i]]), main = lme_names[i])
  }, silent = TRUE)
}
## Warning in checkModel(fittedModel): DHARMa: fittedModel not in class of
## supported models. Absolutely no guarantee that this will work!

## Warning in checkModel(fittedModel): DHARMa: fittedModel not in class of
## supported models. Absolutely no guarantee that this will work!

The model with a random intercept has the lowest AIC.
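As a sanity check (a sketch; both models were fitted by REML with identical fixed effects, so comparing their random-effect structures is valid), the two fits can also be compared directly:

try(anova(lme_ele_gaussian_int, lme_ele_gaussian_slp))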

9.2.5.2 Select weight structure

Here we contrast three weighting variables:
1) 1/study_level_var_abs
2) NtaxaClass
3) NtaxaClass + LogNtempUnits
For each of these, we test multiple variance functions (varIdent, varFixed, varPower, and varExp), summarized below.
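For a variance covariate v (the weighting variable), varFixed, varPower, and varExp model the residual variance as, respectively,

\[ Var(\varepsilon) = \sigma^2 v, \qquad \sigma^2 |v|^{2\delta}, \qquad \sigma^2 e^{2\delta v}, \]

where delta is estimated from the data. varIdent with a continuous covariate and no grouping factor effectively reduces to the unweighted model, which is why its summary matches the base fit.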

# The selected model
lme_ele <- lme_ele_gaussian_int
summary(lme_ele)
## Linear mixed-effects model fit by REML
##   Data: rs_SL_ele 
##        AIC      BIC    logLik
##   370.1906 397.2863 -176.0953
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:   0.2618918 0.6927744
## 
## Fixed effects:  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +      Quality + PrAb 
##                        Value Std.Error  DF    t-value p-value
## (Intercept)       -0.5665101 0.6833711 135 -0.8289933  0.4086
## LogNtempUnits      0.2818703 0.1018657 135  2.7670781  0.0065
## LogExtent          0.1776792 0.0841615 135  2.1111709  0.0366
## ContinuousGrain   -0.0283435 0.1289498 135 -0.2198022  0.8264
## QualityRAW         0.1627400 0.1557477 135  1.0448950  0.2979
## QualityRESURVEYED -0.1484327 0.1361030 135 -1.0905910  0.2774
## PrAboccurrence    -0.2086462 0.1408523 135 -1.4813122  0.1409
##  Correlation: 
##                   (Intr) LgNtmU LgExtn CntnsG QltRAW QRESUR
## LogNtempUnits     -0.102                                   
## LogExtent         -0.923 -0.052                            
## ContinuousGrain   -0.064  0.112 -0.228                     
## QualityRAW        -0.318 -0.136  0.276  0.026              
## QualityRESURVEYED -0.299  0.068  0.158  0.157  0.463       
## PrAboccurrence    -0.187  0.029 -0.005  0.129 -0.111  0.069
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.79473828 -0.67384968 -0.02456627  0.65023332  2.87768510 
## 
## Number of Observations: 157
## Number of Groups: 16
##########################
# 1) 1/study_level_var_abs    
my_variable <- as.formula("~ study_level_var_abs")
weight_options <- list(varIdent(form = my_variable),
                       varFixed(my_variable),
                       varPower(form = my_variable),
                       varExp(form = my_variable))

# Fit
lme_ele_var <- lapply(weight_options, function(x){
  update(lme_ele, weights = x)
})

# histogram residuals 
{
  par(mfrow=c(2,2))
  
  for(i in 1:length(lme_ele_var)){
    try(hist(resid(lme_ele_var[[i]]),main = class(weight_options[[i]])[[1]]))
  }
}

# residuals vs fitted
{
  par(mfrow=c(2,2))
  
  for(i in 1:length(lme_ele_var)){
    plot(fitted(lme_ele_var[[i]]),resid(lme_ele_var[[i]]), 
         main = class(weight_options[[i]])[[1]],
         xlab="Fitted values",ylab = "Residuals")
  }
}

# qqnorm plot
{
  par(mfrow=c(2,2))
  
  for(i in 1:length(lme_ele_var)){
    qqnorm(resid(lme_ele_var[[i]]), 
           main = class(weight_options[[i]])[[1]])
    qqline(resid(lme_ele_var[[i]]))
  }
}

# AIC and R2 table
DT::datatable(aic_r2_table(lme_ele_var, sapply(weight_options, function(x) class(x)[[1]])))
# summary model
lapply(lme_ele_var, function(x) summary(x))
## [[1]]
## Linear mixed-effects model fit by REML
##   Data: rs_SL_ele 
##        AIC      BIC    logLik
##   370.1906 397.2863 -176.0953
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:   0.2618918 0.6927744
## 
## Fixed effects:  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +      Quality + PrAb 
##                        Value Std.Error  DF    t-value p-value
## (Intercept)       -0.5665101 0.6833711 135 -0.8289933  0.4086
## LogNtempUnits      0.2818703 0.1018657 135  2.7670781  0.0065
## LogExtent          0.1776792 0.0841615 135  2.1111709  0.0366
## ContinuousGrain   -0.0283435 0.1289498 135 -0.2198022  0.8264
## QualityRAW         0.1627400 0.1557477 135  1.0448950  0.2979
## QualityRESURVEYED -0.1484327 0.1361030 135 -1.0905910  0.2774
## PrAboccurrence    -0.2086462 0.1408523 135 -1.4813122  0.1409
##  Correlation: 
##                   (Intr) LgNtmU LgExtn CntnsG QltRAW QRESUR
## LogNtempUnits     -0.102                                   
## LogExtent         -0.923 -0.052                            
## ContinuousGrain   -0.064  0.112 -0.228                     
## QualityRAW        -0.318 -0.136  0.276  0.026              
## QualityRESURVEYED -0.299  0.068  0.158  0.157  0.463       
## PrAboccurrence    -0.187  0.029 -0.005  0.129 -0.111  0.069
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.79473828 -0.67384968 -0.02456627  0.65023332  2.87768510 
## 
## Number of Observations: 157
## Number of Groups: 16 
## 
## [[2]]
## Linear mixed-effects model fit by REML
##   Data: rs_SL_ele 
##        AIC      BIC    logLik
##   557.6328 584.7286 -269.8164
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:   0.3560825 0.5516225
## 
## Variance function:
##  Structure: fixed weights
##  Formula: ~study_level_var_abs 
## Fixed effects:  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +      Quality + PrAb 
##                        Value Std.Error  DF   t-value p-value
## (Intercept)       -0.3095504 0.7821234 135 -0.395782  0.6929
## LogNtempUnits      0.5503917 0.2928678 135  1.879318  0.0624
## LogExtent         -0.0578956 0.1034053 135 -0.559890  0.5765
## ContinuousGrain    0.5843595 0.1745176 135  3.348428  0.0011
## QualityRAW         0.4482744 0.1741657 135  2.573838  0.0111
## QualityRESURVEYED -0.1142241 0.1121303 135 -1.018673  0.3102
## PrAboccurrence    -0.5114893 0.1891612 135 -2.703987  0.0077
##  Correlation: 
##                   (Intr) LgNtmU LgExtn CntnsG QltRAW QRESUR
## LogNtempUnits     -0.272                                   
## LogExtent         -0.874 -0.032                            
## ContinuousGrain    0.046  0.066 -0.370                     
## QualityRAW        -0.164  0.035  0.128 -0.081              
## QualityRESURVEYED  0.216  0.033 -0.306  0.051  0.278       
## PrAboccurrence    -0.118  0.053 -0.148  0.227 -0.069  0.046
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -4.7575487  0.1934655  0.4876003  0.8908965  3.8365471 
## 
## Number of Observations: 157
## Number of Groups: 16 
## 
## [[3]]
## Linear mixed-effects model fit by REML
##   Data: rs_SL_ele 
##        AIC      BIC    logLik
##   368.2133 398.3196 -174.1066
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:   0.2951556 0.7710153
## 
## Variance function:
##  Structure: Power of variance covariate
##  Formula: ~study_level_var_abs 
##  Parameter estimates:
##       power 
## -0.07838948 
## Fixed effects:  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +      Quality + PrAb 
##                        Value Std.Error  DF    t-value p-value
## (Intercept)       -0.4034978 0.6724225 135 -0.6000659  0.5495
## LogNtempUnits      0.2602537 0.0889548 135  2.9256861  0.0040
## LogExtent          0.1840592 0.0823818 135  2.2342228  0.0271
## ContinuousGrain   -0.0990278 0.1244354 135 -0.7958176  0.4275
## QualityRAW         0.1295775 0.1509162 135  0.8586055  0.3921
## QualityRESURVEYED -0.0751246 0.1326141 135 -0.5664907  0.5720
## PrAboccurrence    -0.1883007 0.1352009 135 -1.3927477  0.1660
##  Correlation: 
##                   (Intr) LgNtmU LgExtn CntnsG QltRAW QRESUR
## LogNtempUnits     -0.072                                   
## LogExtent         -0.926 -0.066                            
## ContinuousGrain   -0.066  0.121 -0.217                     
## QualityRAW        -0.307 -0.194  0.273  0.014              
## QualityRESURVEYED -0.307  0.062  0.169  0.146  0.454       
## PrAboccurrence    -0.201  0.008  0.016  0.118 -0.095  0.106
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.1918563 -0.8099605 -0.1749449  0.5460862  3.2864242 
## 
## Number of Observations: 157
## Number of Groups: 16 
## 
## [[4]]
## Linear mixed-effects model fit by REML
##   Data: rs_SL_ele 
##        AIC      BIC    logLik
##   362.0615 392.1679 -171.0308
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:     0.20329 0.6329828
## 
## Variance function:
##  Structure: Exponential of variance covariate
##  Formula: ~study_level_var_abs 
##  Parameter estimates:
##       expon 
## 0.003861337 
## Fixed effects:  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +      Quality + PrAb 
##                        Value Std.Error  DF    t-value p-value
## (Intercept)       -0.6385182 0.6464150 135 -0.9877836  0.3250
## LogNtempUnits      0.2722494 0.1066754 135  2.5521296  0.0118
## LogExtent          0.1676456 0.0792938 135  2.1142340  0.0363
## ContinuousGrain    0.0293767 0.1217038 135  0.2413784  0.8096
## QualityRAW         0.1790103 0.1484502 135  1.2058605  0.2300
## QualityRESURVEYED -0.1966395 0.1304901 135 -1.5069302  0.1342
## PrAboccurrence    -0.1786052 0.1355370 135 -1.3177591  0.1898
##  Correlation: 
##                   (Intr) LgNtmU LgExtn CntnsG QltRAW QRESUR
## LogNtempUnits     -0.142                                   
## LogExtent         -0.921 -0.026                            
## ContinuousGrain   -0.071  0.102 -0.226                     
## QualityRAW        -0.336 -0.072  0.284  0.039              
## QualityRESURVEYED -0.304  0.076  0.164  0.161  0.468       
## PrAboccurrence    -0.188  0.050 -0.009  0.134 -0.120  0.041
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.90472019 -0.62135350  0.05226188  0.71693426  2.39625091 
## 
## Number of Observations: 157
## Number of Groups: 16
##########################
# 2) NtaxaClass   
my_variable <- as.formula("~ 1/NtaxaClass")
weight_options <- list(varIdent(form = my_variable),
                       varFixed(my_variable),
                       varPower(form = my_variable),
                       varExp(form = my_variable))

# Fit
lme_ele_class <- lapply(weight_options, function(x){
  try(update(lme_ele, weights = x))
})

# histogram residuals 
{
  par(mfrow=c(2,2))
  
  for(i in 1:length(lme_ele_class)){
    try(hist(resid(lme_ele_class[[i]]),main = class(weight_options[[i]])[[1]]))
  }
}

# residuals vs fitted
{
  par(mfrow=c(2,2))
  
  for(i in 1:length(lme_ele_class)){
    try(plot(fitted(lme_ele_class[[i]]),resid(lme_ele_class[[i]]), 
             main = class(weight_options[[i]])[[1]],
             xlab="Fitted values",ylab = "Residuals"))
  }
}

# qqnorm plot
{
  par(mfrow=c(2,2))
  
  for(i in 1:length(lme_ele_class)){
    try(qqnorm(resid(lme_ele_class[[i]]), 
               main = class(weight_options[[i]])[[1]]))
    try(qqline(resid(lme_ele_class[[i]])))
  }
}

# AIC and R2 table
DT::datatable(aic_r2_table(lme_ele_class, sapply(weight_options, function(x) class(x)[[1]])))
# summary model
lapply(lme_ele_class, function(x) try(summary(x)))
## [[1]]
## Linear mixed-effects model fit by REML
##   Data: rs_SL_ele 
##        AIC      BIC    logLik
##   370.1906 397.2863 -176.0953
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:   0.2618918 0.6927744
## 
## Fixed effects:  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +      Quality + PrAb 
##                        Value Std.Error  DF    t-value p-value
## (Intercept)       -0.5665101 0.6833711 135 -0.8289933  0.4086
## LogNtempUnits      0.2818703 0.1018657 135  2.7670781  0.0065
## LogExtent          0.1776792 0.0841615 135  2.1111709  0.0366
## ContinuousGrain   -0.0283435 0.1289498 135 -0.2198022  0.8264
## QualityRAW         0.1627400 0.1557477 135  1.0448950  0.2979
## QualityRESURVEYED -0.1484327 0.1361030 135 -1.0905910  0.2774
## PrAboccurrence    -0.2086462 0.1408523 135 -1.4813122  0.1409
##  Correlation: 
##                   (Intr) LgNtmU LgExtn CntnsG QltRAW QRESUR
## LogNtempUnits     -0.102                                   
## LogExtent         -0.923 -0.052                            
## ContinuousGrain   -0.064  0.112 -0.228                     
## QualityRAW        -0.318 -0.136  0.276  0.026              
## QualityRESURVEYED -0.299  0.068  0.158  0.157  0.463       
## PrAboccurrence    -0.187  0.029 -0.005  0.129 -0.111  0.069
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.79473828 -0.67384968 -0.02456627  0.65023332  2.87768510 
## 
## Number of Observations: 157
## Number of Groups: 16 
## 
## [[2]]
## Linear mixed-effects model fit by REML
##   Data: rs_SL_ele 
##        AIC      BIC    logLik
##   433.1665 460.2622 -207.5832
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept) Residual
## StdDev:   0.2512939 4.110173
## 
## Variance function:
##  Structure: fixed weights
##  Formula: ~1/NtaxaClass 
## Fixed effects:  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +      Quality + PrAb 
##                        Value Std.Error  DF   t-value p-value
## (Intercept)       -0.2283451 0.7143937 135 -0.319635  0.7497
## LogNtempUnits      0.4538880 0.0621407 135  7.304194  0.0000
## LogExtent          0.1432886 0.0852335 135  1.681131  0.0950
## ContinuousGrain   -0.0996393 0.0877098 135 -1.136011  0.2580
## QualityRAW        -0.0220492 0.1455722 135 -0.151466  0.8798
## QualityRESURVEYED -0.1039861 0.1038899 135 -1.000925  0.3187
## PrAboccurrence    -0.2168552 0.0984883 135 -2.201836  0.0294
##  Correlation: 
##                   (Intr) LgNtmU LgExtn CntnsG QltRAW QRESUR
## LogNtempUnits     -0.077                                   
## LogExtent         -0.956 -0.025                            
## ContinuousGrain   -0.124  0.206 -0.122                     
## QualityRAW        -0.400 -0.471  0.431 -0.012              
## QualityRESURVEYED -0.263 -0.015  0.198  0.207  0.445       
## PrAboccurrence    -0.177  0.069  0.026  0.423 -0.320 -0.183
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -3.9586319 -0.5169851 -0.0447035  0.4280858  2.7716126 
## 
## Number of Observations: 157
## Number of Groups: 16 
## 
## [[3]]
## Linear mixed-effects model fit by REML
##   Data: rs_SL_ele 
##        AIC    BIC    logLik
##   368.7637 398.87 -174.3818
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:   0.2610468 0.9109973
## 
## Variance function:
##  Structure: Power of variance covariate
##  Formula: ~1/NtaxaClass 
##  Parameter estimates:
##      power 
## 0.08970922 
## Fixed effects:  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +      Quality + PrAb 
##                        Value Std.Error  DF   t-value p-value
## (Intercept)       -0.3784942 0.6925878 135 -0.546493  0.5856
## LogNtempUnits      0.3032638 0.0954944 135  3.175724  0.0019
## LogExtent          0.1603629 0.0847878 135  1.891344  0.0607
## ContinuousGrain   -0.0489349 0.1193912 135 -0.409870  0.6826
## QualityRAW         0.1033126 0.1547291 135  0.667700  0.5055
## QualityRESURVEYED -0.1377403 0.1319591 135 -1.043811  0.2984
## PrAboccurrence    -0.2318803 0.1341852 135 -1.728062  0.0863
##  Correlation: 
##                   (Intr) LgNtmU LgExtn CntnsG QltRAW QRESUR
## LogNtempUnits     -0.113                                   
## LogExtent         -0.931 -0.030                            
## ContinuousGrain   -0.057  0.112 -0.220                     
## QualityRAW        -0.320 -0.163  0.286  0.026              
## QualityRESURVEYED -0.302  0.069  0.170  0.165  0.463       
## PrAboccurrence    -0.186  0.047  0.003  0.149 -0.128  0.055
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.88534287 -0.63003323 -0.05091425  0.65444661  2.92470286 
## 
## Number of Observations: 157
## Number of Groups: 16 
## 
## [[4]]
## Linear mixed-effects model fit by REML
##   Data: rs_SL_ele 
##        AIC     BIC    logLik
##   370.8647 400.971 -175.4323
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:    0.267146 0.6468247
## 
## Variance function:
##  Structure: Exponential of variance covariate
##  Formula: ~1/NtaxaClass 
##  Parameter estimates:
##     expon 
## 0.6999818 
## Fixed effects:  log(study_level_shift_abs) ~ LogNtempUnits + LogExtent + ContinuousGrain +      Quality + PrAb 
##                        Value Std.Error  DF    t-value p-value
## (Intercept)       -0.3926583 0.6884726 135 -0.5703324  0.5694
## LogNtempUnits      0.2856459 0.1011927 135  2.8227929  0.0055
## LogExtent          0.1622643 0.0845785 135  1.9185068  0.0572
## ContinuousGrain   -0.0411185 0.1253765 135 -0.3279599  0.7435
## QualityRAW         0.1091734 0.1560608 135  0.6995567  0.4854
## QualityRESURVEYED -0.1512619 0.1341335 135 -1.1276966  0.2614
## PrAboccurrence    -0.2266836 0.1384148 135 -1.6377120  0.1038
##  Correlation: 
##                   (Intr) LgNtmU LgExtn CntnsG QltRAW QRESUR
## LogNtempUnits     -0.119                                   
## LogExtent         -0.926 -0.033                            
## ContinuousGrain   -0.058  0.109 -0.227                     
## QualityRAW        -0.310 -0.136  0.269  0.027              
## QualityRESURVEYED -0.297  0.071  0.157  0.159  0.464       
## PrAboccurrence    -0.183  0.042 -0.005  0.127 -0.114  0.068
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.92188020 -0.64012774 -0.04210522  0.69059415  2.95213169 
## 
## Number of Observations: 157
## Number of Groups: 16
##########################
# NtempUnits model
lme_ele_temp <-nlme::lme(
  log(study_level_shift_abs) ~ LogExtent + ContinuousGrain + Quality + PrAb, 
  random = ~ 1 | Class,
  weights = varExp(form = ~1/LogNtempUnits),
  data = rs_SL_ele)

hist(resid(lme_ele_temp))
plot(predict(lme_ele_temp),resid(lme_ele_temp))
summary(lme_ele_temp)
## Linear mixed-effects model fit by REML
##   Data: rs_SL_ele 
##        AIC      BIC    logLik
##   374.9583 402.1138 -178.4791
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:   0.2673134 0.7387208
## 
## Variance function:
##  Structure: Exponential of variance covariate
##  Formula: ~1/LogNtempUnits 
##  Parameter estimates:
##       expon 
## -0.03110809 
## Fixed effects:  log(study_level_shift_abs) ~ LogExtent + ContinuousGrain + Quality +      PrAb 
##                        Value Std.Error  DF    t-value p-value
## (Intercept)       -0.3629888 0.6937616 136 -0.5232183  0.6017
## LogExtent          0.1879246 0.0857591 136  2.1913075  0.0301
## ContinuousGrain   -0.0653682 0.1307824 136 -0.4998245  0.6180
## QualityRAW         0.2139081 0.1577765 136  1.3557661  0.1774
## QualityRESURVEYED -0.1735447 0.1386184 136 -1.2519603  0.2127
## PrAboccurrence    -0.2215600 0.1437848 136 -1.5409137  0.1257
##  Correlation: 
##                   (Intr) LgExtn CntnsG QltRAW QRESUR
## LogExtent         -0.934                            
## ContinuousGrain   -0.053 -0.225                     
## QualityRAW        -0.338  0.273  0.040              
## QualityRESURVEYED -0.295  0.162  0.151  0.477       
## PrAboccurrence    -0.186 -0.002  0.125 -0.107  0.067
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.75233236 -0.63702510 -0.01948146  0.66653593  2.80292465 
## 
## Number of Observations: 157
## Number of Groups: 16
aic_r2_table(list(lme_ele_temp))
##     Model    AIC R2_mar R2_cond
## 1 Model 1 374.96   0.06    0.17
qqnorm(resid(lme_ele_temp));qqline(resid(lme_ele_temp))

##########################
# NtaxaClass + NtempUnits model
lme_ele_class_temp <-nlme::lme(
  log(study_level_shift_abs) ~ LogExtent + ContinuousGrain + Quality + PrAb, 
  random = ~ 1 | Class,
  weights = varComb(varExp(form = ~1/NtaxaClass),
                    varExp(form = ~1/LogNtempUnits)),
  data = rs_SL_ele)

hist(resid(lme_ele_class_temp))

plot(predict(lme_ele_class_temp),resid(lme_ele_class_temp))
summary(lme_ele_class_temp)
## Linear mixed-effects model fit by REML
##   Data: rs_SL_ele 
##        AIC      BIC    logLik
##   375.7758 405.9486 -177.8879
## 
## Random effects:
##  Formula: ~1 | Class
##         (Intercept)  Residual
## StdDev:   0.2690567 0.7515249
## 
## Combination of variance functions: 
##  Structure: Exponential of variance covariate
##  Formula: ~1/NtaxaClass 
##  Parameter estimates:
##     expon 
## 0.6759289 
##  Structure: Exponential of variance covariate
##  Formula: ~1/LogNtempUnits 
##  Parameter estimates:
##       expon 
## -0.09129755 
## Fixed effects:  log(study_level_shift_abs) ~ LogExtent + ContinuousGrain + Quality +      PrAb 
##                         Value Std.Error  DF    t-value p-value
## (Intercept)       -0.13918651 0.6961218 136 -0.1999456  0.8418
## LogExtent          0.16595178 0.0860363 136  1.9288576  0.0558
## ContinuousGrain   -0.07100645 0.1271129 136 -0.5586096  0.5773
## QualityRAW         0.14789586 0.1584144 136  0.9336013  0.3522
## QualityRESURVEYED -0.17644655 0.1365301 136 -1.2923639  0.1984
## PrAboccurrence    -0.24786636 0.1412801 136 -1.7544327  0.0816
##  Correlation: 
##                   (Intr) LgExtn CntnsG QltRAW QRESUR
## LogExtent         -0.937                            
## ContinuousGrain   -0.045 -0.225                     
## QualityRAW        -0.334  0.270  0.040              
## QualityRESURVEYED -0.292  0.161  0.153  0.479       
## PrAboccurrence    -0.183  0.000  0.119 -0.106  0.063
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.87715955 -0.64359053 -0.04739503  0.67808785  2.89301266 
## 
## Number of Observations: 157
## Number of Groups: 16
aic_r2_table(list(lme_ele_class_temp))
##     Model    AIC R2_mar R2_cond
## 1 Model 1 375.78   0.05    0.16
qqnorm(resid(lme_ele_class_temp));qqline(resid(lme_ele_class_temp))

All models have acceptable residual distributions.

The model weighted by “1/variance” has a lower AIC than the model combining NtaxaClass and LogNtempUnits.
The “1/variance” model with a varExp function was the best based on AIC and residual distribution.
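
As a quick cross-check (a sketch only, assuming the fitted objects above are still in memory), the candidate variance structures can be ranked by AIC in a single table; the earlier “1/variance” models (e.g., lme_ele_var) could be appended in the same way:

cand <- c(lme_ele_class,                  # the four NtaxaClass variance functions
          list(lme_ele_temp,              # varExp on 1/LogNtempUnits
               lme_ele_class_temp))       # varComb combining both covariates
names(cand) <- c(sapply(weight_options, function(x) class(x)[1]),
                 "varExp_NtempUnits", "varComb_NtaxaClass_NtempUnits")
aics <- sapply(cand, function(m) if (inherits(m, "try-error")) NA else AIC(m))
sort(aics)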

9.2.6 Pick the model

# mod_ele <- lme_ele_var[[4]]
mod_ele <- tmb_ele_var

9.2.7 Plot model predictions

Predict shifts for all possible combinations of methodological variables. For this, we create a grid with all possible combinations of methodological variables. Note that some combinations of methodological variables do not exist in the real database. This grid is useful for illustrative purposes: visualizing predicted shifts along the levels of a given methodological variable.

new_data <- data.frame(
  expand_grid(
    LogNtempUnits = seq(from = min(rs_SL_ele$LogNtempUnits), 
                        to = max(rs_SL_ele$LogNtempUnits),  
                        length.out = 5),
    LogExtent = seq(from = min(rs_SL_ele$LogExtent), 
                    to = max(rs_SL_ele$LogExtent),  
                    length.out = 5),
    ContinuousGrain = unique(rs_SL_ele$ContinuousGrain),
    PrAb = unique(rs_SL_ele$PrAb),
    Quality = unique(rs_SL_ele$Quality),
    Class = NA,
    study_level_var_abs = NA))

pred_ele <- predict_fixed(mod_ele, new_data) # Use fixed effects only

new_data$pred_val <- pred_ele

## get mode of each predictor 
time_mode = median(rs_SL_ele$LogNtempUnits)
time_mode = new_data$LogNtempUnits[which.min(abs(new_data$LogNtempUnits-time_mode))]
mode_grain = getmode(rs_SL_ele$ContinuousGrain)
mode_prab = getmode(rs_SL_ele$PrAb)
mode_quality = getmode(rs_SL_ele$Quality)
area = median(rs_SL_ele$LogExtent)
area = new_data$LogExtent[which.min(abs(new_data$LogExtent-area))]


############################################
## plot predictions and raw data together:
## AREA
new_data %>%
  filter(PrAb == mode_prab, ## plot predictions for most common levels
         LogNtempUnits == time_mode,
         ContinuousGrain == mode_grain,
         Quality == mode_quality
  ) %>%
  ggplot(aes(x = LogExtent, y = pred_val)) +
  geom_line() +
  geom_point(data = rs_ele, aes(x = LogExtent, y = abs(SHIFT), colour = ID), 
             #I subseted only the studies used in the model
             size = 0.25, alpha = 0.75) + ## small circles are raw shifts within studies
  geom_point(data = rs_SL_ele, aes(x = LogExtent, y = study_level_shift_abs, fill = ID, size = log(NtaxaClass)),
             shape = 21, colour = "black", alpha = 0.75) + ## large circles are mean study-level shift per class
  theme(legend.position = "none") +
  labs(y = "Mean range shift rate (class within study)")

# Only SL
p1 <- new_data %>%
  filter(PrAb == mode_prab, ## plot predictions for most common levels
         LogNtempUnits == time_mode,
         ContinuousGrain == mode_grain,
         Quality == mode_quality
  ) %>%
  ggplot(aes(x = LogExtent, y = pred_val)) +
  geom_line() +
  geom_point(data = rs_SL_ele, aes(x = LogExtent, y = study_level_shift_abs, fill = ID, size = log(NtaxaClass)),
             shape = 21, colour = "black", alpha = 0.75) + ## large circles are mean study-level shift per class
  theme(legend.position = "none") +
  labs(y = "Mean range shift rate (class within study)")

p1

## Contrast with the expected linear relationship 
p1 + 
  geom_smooth(method = "lm",
              data = rs_SL_ele,
              aes(x = LogExtent, y = study_level_shift_abs),
              color = "darkgray",se = FALSE, linetype = "dashed") 

#############################################
# plot predictions and raw data together:
## N temporal units
new_data %>%
  filter(PrAb == mode_prab, ## plot predictions for most common levels
         LogExtent == area, 
         ContinuousGrain == mode_grain,
         Quality == mode_quality
  ) %>%
  ggplot(aes(x = LogNtempUnits, y = pred_val)) +
  geom_line() +
  geom_point(data = rs_ele, 
             aes(x = LogNtempUnits, y = abs(SHIFT), colour = ID), 
             size = 0.25, alpha = 0.75) + 
  geom_point(data = rs_SL_ele, 
             aes(x = LogNtempUnits, y = study_level_shift_abs, 
                 fill = ID, size = log(NtaxaClass)),
             shape = 21, colour = "black", alpha = 0.75) + #
  theme(legend.position = "none") +
  labs(y = "Mean range shift rate (class within study)")

# Only SL
p1 <- new_data %>%
  filter(PrAb == mode_prab, ## plot predictions for most common levels
         LogExtent == area, 
         ContinuousGrain == mode_grain,
         Quality == mode_quality
  ) %>%
  ggplot(aes(x = LogNtempUnits, y = pred_val)) +
  geom_line() +
  geom_point(data = rs_SL_ele, 
             aes(x = LogNtempUnits, y = study_level_shift_abs, 
                 fill = ID, size = log(NtaxaClass)),
             shape = 21, colour = "black", alpha = 0.75) + 
  theme(legend.position = "none") +
  labs(y = "Mean range shift rate (class within study)")

p1

## Contrast with the expected linear relationship 
p1 + 
  geom_smooth(method = "lm",
              data = rs_SL_ele,
              aes(x = LogNtempUnits, y = study_level_shift_abs),
              color = "darkgray",se = FALSE, linetype = "dashed") 

############################################
## GRAIN 
new_data %>%
  filter(
    LogExtent == area, 
    LogNtempUnits == time_mode,
    PrAb == mode_prab,
    Quality == mode_quality
  ) %>%
  ggplot(aes(x = ContinuousGrain, y = pred_val)) +
  geom_line()+
  theme(legend.position = "none") +
  geom_point(data = rs_ele, 
             aes(x = ContinuousGrain, y = abs(SHIFT), colour = ID), 
             size = 0.25, alpha = 0.75,
             position = position_jitterdodge(jitter.width = 0.03, jitter.height = 0)) +
  geom_point(data = rs_SL_ele, 
             aes(x = ContinuousGrain, y = study_level_shift_abs, 
                 fill = ID, size = log(NtaxaClass)),
             shape = 21, colour = "black", alpha = 0.75,
             position = position_jitterdodge(jitter.width = 0.03, jitter.height = 0)) +
  labs(y = "Mean range shift rate (class within study)")

# Only SL
p1 <- new_data %>%
  filter(
    LogExtent == area, 
    LogNtempUnits == time_mode,
    PrAb == mode_prab,
    Quality == mode_quality
  ) %>%
  ggplot(aes(x = ContinuousGrain, y = pred_val)) +
  geom_line()+
  theme(legend.position = "none") +
  geom_point(data = rs_SL_ele, 
             aes(x = ContinuousGrain, y = study_level_shift_abs, 
                 fill = ID, size = log(NtaxaClass)),
             shape = 21, colour = "black", alpha = 0.75,
             position = position_jitterdodge(jitter.width = 0.03, jitter.height = 0)) +
  labs(y = "Mean range shift rate (class within study)")

p1

## Contrast with the expected linear relationship 
p1 +
  geom_smooth(method = "lm",data = rs_SL_ele,
              aes(x = ContinuousGrain, y = study_level_shift_abs),
              color = "darkgray",se = FALSE, linetype = "dashed") 

############################################
## PRAB 
new_data %>%
  filter(
    LogExtent == area,
    LogNtempUnits == time_mode,
    ContinuousGrain == mode_grain,
    Quality == mode_quality
    
  ) %>%
  ggplot(aes(x = PrAb, y = pred_val)) +
  geom_boxplot() +
  theme(legend.position = "none") +
  geom_point(data = rs_ele, 
             aes(x = PrAb, y = abs(SHIFT), colour = ID), 
             size = 0.25, alpha = 0.75,
             position = position_jitterdodge(jitter.width = 0.03, 
                                             jitter.height = 0)) +
  geom_point(data = rs_SL_ele, 
             aes(x = PrAb, y = study_level_shift_abs, 
                 fill = ID, size = log(NtaxaClass)),
             shape = 21, colour = "black", alpha = 0.75,
             position = position_jitterdodge(jitter.width = 0.03,
                                             jitter.height = 0)) +
  labs(y = "Mean range shift rate (class within study)")

# Only SL
p1 <- new_data %>%
  filter(
    LogExtent == area,
    LogNtempUnits == time_mode,
    ContinuousGrain == mode_grain,
    Quality == mode_quality
  ) %>%
  ggplot(aes(x = PrAb, y = pred_val)) +
  geom_boxplot() +
  theme(legend.position = "none") +
  geom_point(data = rs_SL_ele, 
             aes(x = PrAb, y = study_level_shift_abs, 
                 fill = ID, size = log(NtaxaClass)),
             shape = 21, colour = "black", alpha = 0.75,
             position = position_jitterdodge(jitter.width = 0.03,
                                             jitter.height = 0)) +
  labs(y = "Mean range shift rate (class within study)")

p1

## Contrast with the expected linear relationship 
p1 +
  geom_boxplot(data=rs_SL_ele, 
               aes(x = PrAb, y = study_level_shift_abs), color = "red", alpha = .5)

10 Applying correction to range shifts

The next step is to calculate the predicted study-level shift for each study, together with the prediction under a common set of methodological conditions held constant across all studies.

We want to keep the variation due to class in, so we ignore the random effect and predict the mean study-level range shift per class from the fixed (methodological) effects only; the reference prediction holds all methodological variables at fixed values (see the ideal method conditions below).
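
For reference, predict_fixed() is the helper used throughout this report to obtain such fixed-effects-only (population-level) predictions; its actual definition sits in the hidden code and repository scripts. A minimal sketch of the behaviour it is assumed to have:

predict_fixed_sketch <- function(model, newdata) {
  # Illustrative only: predictions from the fixed effects, ignoring the Class random intercept
  if (inherits(model, "lme")) {
    predict(model, newdata = newdata, level = 0)     # nlme: level = 0 gives population-level predictions
  } else {
    predict(model, newdata = newdata, re.form = NA)  # glmmTMB / lme4: re.form = NA drops random effects
  }
}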

We apply the correction to species-level shifts based on the estimated study-level effect of methods using two approaches: a difference (𝚫Difference) and a ratio (𝚫Ratio). The 𝚫Difference is the difference between the predicted study-level shift and the predicted shift under ideal method conditions, whereas the 𝚫Ratio is the ratio between the two.
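
In symbols, restating the corrections exactly as implemented in the code below (the sign of negative observed shifts is restored after the 𝚫Difference correction):

\[ \Delta Difference = PredictedStudyShift - PredictedIdealShift, \qquad CorrectedShift = sign(ObservedShift) \times \left( |ObservedShift| - \Delta Difference \right) \]

\[ \Delta Ratio = \frac{PredictedStudyShift}{PredictedIdealShift}, \qquad CorrectedShift = ObservedShift \times \Delta Ratio \]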

The ideal method was defined as the 95% quantile of Log N temporal units and of Log extent of the study area (latitudinal extent for latitudinal shifts, elevation extent for elevation shifts), grain = “small”, PrAb = “abundance”, and Quality = “resurveyed”. We assume these represent a set of methodological conditions that would (at least in theory) generate accurate range shift measurements.

var_meth_names <- c("ContinuousGrain","LogNtempUnits","LogExtent","PrAb","Quality")

#############
## 1) Predict to the full dataset study level shift
pred_rs_SL_lat <- predict_fixed(mod_lat, 
                                mutate(rs_SL_full_lat,
                                       study_level_var_abs = NA,
                                       Class = NA)) # using fixed effects only

rs_SL_full_lat$Pred_SL <- pred_rs_SL_lat

pred_rs_SL_ele <- predict_fixed(mod_ele, 
                                mutate(rs_SL_full_ele,
                                       study_level_var_abs = NA,
                                       Class = NA)) # using fixed effects only

rs_SL_full_ele$Pred_SL <- pred_rs_SL_ele

# Any predicted negative shift
# We do not expect models to predict negative shifts as we fit models using a Gamma distribution
any(exp(pred_rs_SL_lat) < 0)
## [1] FALSE
any(exp(pred_rs_SL_ele) < 0)
## [1] FALSE
#############
## 2) filter predictions on the ideal data

## 2.1) Latitude
# Option 1)
# Select the set of condition that (in theory) would provide a good shift estimate
# e.g., large grain, more time periods, abundance data, resurveyed...
# Option 2) 
# Select the most common set of conditions for each single method variable
# Option 3) 
# Select the most common set of conditions in the database == higher occupancy in a multidimensional space built with methodological variables

# Option 1)
ideal_data_lat1 <- data.frame(ContinuousGrain = quantile(rs_SL_lat$ContinuousGrain, 0.05),
                              LogNtempUnits = quantile(rs_SL_lat$LogNtempUnits, 0.95), 
                              LogExtent = quantile(rs_SL_lat$LogExtent, 0.95),
                              PrAb = "abundance",
                              Quality = "RESURVEYED")

ideal_data_lat1
##    ContinuousGrain LogNtempUnits LogExtent      PrAb    Quality
## 5%               1      3.621447  8.203747 abundance RESURVEYED
# Option 2)
ideal_data_lat2 <- rs_SL_lat[,var_meth_names] %>% unique()
ideal_data_lat2 <- apply(ideal_data_lat2, 2, getmode)
ideal_data_lat2 <- data.frame(matrix(ncol = length(ideal_data_lat2),data = ideal_data_lat2))
names(ideal_data_lat2) <- var_meth_names  
ideal_data_lat2[,1:3] <- as.numeric(ideal_data_lat2[,1:3])

ideal_data_lat2
##   ContinuousGrain LogNtempUnits LogExtent       PrAb Quality
## 1               1     0.6931472  6.827608 occurrence     RAW
# Option 3)
# Gower's distance
lat_meth_dis <- rs_SL_lat %>% select(var_meth_names) %>% data.frame %>% gowdis
# PCoA
rs_pcoa <- dudi.pco(lat_meth_dis, scannf = FALSE, nf = 3)
# Density
rs_dens <- pointdensity(rs_pcoa$li, eps = .1, type = "density")
ideal_data_lat3 <- rs_SL_lat[which(rs_dens == max(rs_dens))[1],var_meth_names]
ideal_data_lat3
##     ContinuousGrain LogNtempUnits LogExtent       PrAb  Quality
## 145               1      1.386294  7.070911 occurrence BALANCED
# Check whether the selected method conditions exists in the multidimensional method space
# create a methods table
lat_meth <- data.frame(rs_SL_lat[,var_meth_names])

# add the ideal method conditions
lat_meth <- rbind(data.frame(lat_meth, Reference = "Data"),
                  data.frame(ideal_data_lat1, Reference = "Ideal 1"),
                  data.frame(ideal_data_lat2, Reference = "Ideal 2"),
                  data.frame(ideal_data_lat3, Reference = "Ideal 3"))
ref_data <- 1:length(which(lat_meth$Reference=="Data"))

# Gower's distance
lat_meth_dis <- gowdis(lat_meth[,var_meth_names])
# PCoA
rs_pcoa <- dudi.pco(lat_meth_dis, scannf = FALSE, nf = 3)

plot_ly() %>%
  add_trace(data = rs_pcoa$li[ref_data,], x = ~A1, y = ~A2, z = ~A3,
            type = "scatter3d", mode = "markers",
            marker = list(size = 5, opacity = .5)) %>%
  add_trace(data = rs_pcoa$li[ref_data,], x = ~A1, y = ~A2, z = ~A3,
            type="mesh3d", opacity=.2, alphahull=0) %>%
  add_trace(data = rs_pcoa$li[-ref_data,], x = ~A1, y = ~A2, z = ~A3,
            type = "scatter3d", mode = "markers",
            color = lat_meth$Reference[-ref_data],
            marker = list(size = 20, opacity = .5)) %>%
  layout(title = "Latitude")
## 2.2) Elevation
# Option 1)
# Select the set of condition that (in theory) would provide a good shift estimate
# e.g., large grain, more time periods, abundance data, resurveyed...
# Option 2) 
# Select the most common set of conditions for each single method variable
# Option 3) 
# Select the most common set of conditions in the database == higher occupancy in a multidimensional space built with methodological variables

# Option 1)
ideal_data_ele1 <- data.frame(ContinuousGrain = quantile(rs_SL_ele$ContinuousGrain, 0.05),
                              LogNtempUnits = quantile(rs_SL_ele$LogNtempUnits, 0.95), 
                              LogExtent = quantile(rs_SL_ele$LogExtent, 0.95),
                              PrAb = "abundance",
                              Quality = "RESURVEYED")

ideal_data_ele1
##    ContinuousGrain LogNtempUnits LogExtent      PrAb    Quality
## 5%               1      2.542443  8.554775 abundance RESURVEYED
# Option 2)
ideal_data_ele2 <- rs_SL_ele[,var_meth_names] %>% unique()
ideal_data_ele2 <- apply(ideal_data_ele2, 2, getmode)
ideal_data_ele2 <- data.frame(matrix(ncol = length(ideal_data_ele2),data = ideal_data_ele2))
names(ideal_data_ele2) <- var_meth_names  
ideal_data_ele2[,1:3] <- as.numeric(ideal_data_ele2[,1:3])

ideal_data_ele2
##   ContinuousGrain LogNtempUnits LogExtent       PrAb  Quality
## 1               1     0.6931472  8.452354 occurrence BALANCED
# Option 3)
# Gower's distance
ele_meth_dis <- rs_SL_ele %>% select(var_meth_names) %>% data.frame %>% gowdis
# PCoA
rs_pcoa <- dudi.pco(ele_meth_dis, scannf = FALSE, nf = 3)
# Density
rs_dens <- pointdensity(rs_pcoa$li, eps = .1, type = "density")
ideal_data_ele3 <- rs_SL_ele[which(rs_dens == max(rs_dens))[1],var_meth_names]
ideal_data_ele3
##    ContinuousGrain LogNtempUnits LogExtent       PrAb  Quality
## 58               1      2.484907  7.498726 occurrence BALANCED
# Check whether the selected method conditions exists in the multidimensional method space
# create a methods table
ele_meth <- data.frame(rs_SL_ele[,var_meth_names])

# add the ideal method conditions
ele_meth <- rbind(data.frame(ele_meth, Reference = "Data"),
                  data.frame(ideal_data_ele1, Reference = "Ideal 1"),
                  data.frame(ideal_data_ele2, Reference = "Ideal 2"),
                  data.frame(ideal_data_ele3, Reference = "Ideal 3"))
ref_data <- 1:length(which(ele_meth$Reference=="Data"))

# Gower's distance
ele_meth_dis <- gowdis(ele_meth[,var_meth_names])
# PCoA
rs_pcoa <- dudi.pco(ele_meth_dis, scannf = FALSE, nf = 3)

plot_ly() %>%
  add_trace(data = rs_pcoa$li[ref_data,], x = ~A1, y = ~A2, z = ~A3,
            type = "scatter3d", mode = "markers",
            marker = list(size = 5, opacity = .5)) %>%
  add_trace(data = rs_pcoa$li[ref_data,], x = ~A1, y = ~A2, z = ~A3,
            type="mesh3d", opacity=.2, alphahull=0) %>%
  add_trace(data = rs_pcoa$li[-ref_data,], x = ~A1, y = ~A2, z = ~A3,
            type = "scatter3d", mode = "markers",
            color = ele_meth$Reference[-ref_data],
            marker = list(size = 20, opacity = .5)) %>%
  layout(title = "Elevation")
#############
## 3) Adding the predicted ideal shift 
## Latitude
Pred_ideal1 = predict_fixed(mod_lat,
                            mutate(ideal_data_lat1,
                                   study_level_var_abs = NA,
                                   Class = NA)) # using fixed effects only
Pred_ideal2 = predict_fixed(mod_lat,
                            mutate(ideal_data_lat2,
                                   study_level_var_abs = NA,
                                   Class = NA)) # using fixed effects only
Pred_ideal3 = predict_fixed(mod_lat,
                            mutate(ideal_data_lat3,
                                   study_level_var_abs = NA,
                                   Class = NA)) # using fixed effects only

rs_SL_full_lat  <- mutate(rs_SL_full_lat,
                          Type = "LAT",
                          Pred_ideal1 = Pred_ideal1,
                          Pred_ideal2 = Pred_ideal2,
                          Pred_ideal3 = Pred_ideal3)

## Elevation
Pred_ideal1 = predict_fixed(mod_ele,
                            mutate(ideal_data_ele1,
                                   study_level_var_abs = NA,
                                   Class = NA)) # using fixed effects only
Pred_ideal2 = predict_fixed(mod_ele,
                            mutate(ideal_data_ele2,
                                   study_level_var_abs = NA,
                                   Class = NA)) # using fixed effects only
Pred_ideal3 = predict_fixed(mod_ele,
                            mutate(ideal_data_ele3,
                                   study_level_var_abs = NA,
                                   Class = NA)) # using fixed effects only

rs_SL_full_ele  <- mutate(rs_SL_full_ele,
                          Type = "ELE",
                          Pred_ideal1 = Pred_ideal1,
                          Pred_ideal2 = Pred_ideal2,
                          Pred_ideal3 = Pred_ideal3)

## bind ele and lat corr shift data 
corr_rs <- rbind(rs_SL_full_lat,
                 rs_SL_full_ele) 

# Inverse of log of predictions
corr_rs$Pred_ideal1 <- exp(corr_rs$Pred_ideal1)
corr_rs$Pred_ideal2 <- exp(corr_rs$Pred_ideal2)
corr_rs$Pred_ideal3 <- exp(corr_rs$Pred_ideal3)
corr_rs$Pred_SL <- exp(corr_rs$Pred_SL)

## Add variables
corr_rs$Article_ID <- sapply(corr_rs$ID, function(x) {
  strsplit(x,"_")[[1]][1]
})
corr_rs$Study_ID <- sapply(corr_rs$ID, function(x) {
  strsplit(x,"_")[[1]][2]
})
corr_rs$Class <- sapply(corr_rs$ID, function(x) {
  strsplit(x,"_")[[1]][4]
})

## 4) calculate deltaSHIFT 
## 4.1) difference between predicted study-level shift and predicted reference study-level shift 
{
  par(mfrow = c(1,3))
  corr_rs$SLDiff1 <- corr_rs$Pred_SL - corr_rs$Pred_ideal1
  hist(corr_rs$SLDiff1,main = "",xlab="SLDiff1")
  corr_rs$SLDiff2 <- corr_rs$Pred_SL - corr_rs$Pred_ideal2
  hist(corr_rs$SLDiff2,main = "",xlab="SLDiff2")
  corr_rs$SLDiff3 <- corr_rs$Pred_SL - corr_rs$Pred_ideal3
  hist(corr_rs$SLDiff3,main = "",xlab="SLDiff3")
}

## 4.2) ratio between predicted study-level shift and predicted reference study-level shift 
{
  par(mfrow = c(1,3))
  corr_rs$SLRatio1 <- corr_rs$Pred_SL / corr_rs$Pred_ideal1
  hist(corr_rs$SLRatio1,main = "",xlab="SLRatio1")
  corr_rs$SLRatio2 <- corr_rs$Pred_SL / corr_rs$Pred_ideal2
  hist(corr_rs$SLRatio2,main = "",xlab="SLRatio2")
  corr_rs$SLRatio3 <- corr_rs$Pred_SL / corr_rs$Pred_ideal3
  hist(corr_rs$SLRatio3,main = "",xlab="SLRatio3")
}

## 5) calculate corrected range shift
rs <- merge(rs_full, 
            corr_rs %>%
              select(c("Article_ID","Study_ID","Class","Type","ID",
                       names(corr_rs)[!names(corr_rs) %in% names(rs_full)])), 
            by = c("Article_ID","Study_ID","Class","Type","ID"),
            all.x = TRUE)

rs$CorrShiftDiff1 <- abs(rs$SHIFT) - rs$SLDiff1
rs$CorrShiftDiff2 <- abs(rs$SHIFT) - rs$SLDiff2
rs$CorrShiftDiff3 <- abs(rs$SHIFT) - rs$SLDiff3

rs$CorrShiftRatio1 <- rs$SHIFT * rs$SLRatio1
rs$CorrShiftRatio2 <- rs$SHIFT * rs$SLRatio2
rs$CorrShiftRatio3 <- rs$SHIFT * rs$SLRatio3


## Look at CorrShiftDiff

# Corrected vs abs shift
p1 <- ggplot(rs, aes(x = abs(SHIFT), y = CorrShiftDiff1, color = as.factor(sign(SHIFT))))+
  geom_point(alpha=.3) + theme_classic() + 
  geom_vline(xintercept = 0, color = "gray")+
  geom_hline(yintercept = 0, color = "gray")+
  geom_abline(intercept = 0, slope = 1)

p2 <- ggplot(rs, aes(x = abs(SHIFT), y = CorrShiftDiff2, color = as.factor(sign(SHIFT))))+
  geom_point(alpha=.3) + theme_classic() + 
  geom_vline(xintercept = 0, color = "gray")+
  geom_hline(yintercept = 0, color = "gray")+
  geom_abline(intercept = 0, slope = 1)

p3 <- ggplot(rs, aes(x = abs(SHIFT), y = CorrShiftDiff3, color = as.factor(sign(SHIFT))))+
  geom_point(alpha=.3) + theme_classic() + 
  geom_vline(xintercept = 0, color = "gray")+
  geom_hline(yintercept = 0, color = "gray")+
  geom_abline(intercept = 0, slope = 1)

ggarrange(p1,p2,p3,ncol=3, nrow=1, common.legend = TRUE, legend="bottom")

# change sign for negative shifts
rs$neg_shifts <- ifelse(sign(rs$SHIFT) == -1 & rs$SHIFT != 0, 1, 0)

rs$CorrShiftDiff1[which(rs$neg_shifts == 1)] <- rs$CorrShiftDiff1[which(rs$neg_shifts == 1)] * -1
rs$CorrShiftDiff2[which(rs$neg_shifts == 1)] <- rs$CorrShiftDiff2[which(rs$neg_shifts == 1)] * -1
rs$CorrShiftDiff3[which(rs$neg_shifts == 1)] <- rs$CorrShiftDiff3[which(rs$neg_shifts == 1)] * -1

p1 <- ggplot(rs, aes(x = SHIFT, y = CorrShiftDiff1, color = as.factor(sign(SHIFT))))+
  geom_point(alpha=.3) + theme_classic() + 
  geom_vline(xintercept = 0, color = "gray")+
  geom_hline(yintercept = 0, color = "gray")+
  geom_abline(intercept = 0, slope = 1)

p2 <- ggplot(rs, aes(x = SHIFT, y = CorrShiftDiff2, color = as.factor(sign(SHIFT))))+
  geom_point(alpha=.3) + theme_classic() + 
  geom_vline(xintercept = 0, color = "gray")+
  geom_hline(yintercept = 0, color = "gray")+
  geom_abline(intercept = 0, slope = 1)

p3 <- ggplot(rs, aes(x = SHIFT, y = CorrShiftDiff3, color = as.factor(sign(SHIFT))))+
  geom_point(alpha=.3) + theme_classic() + 
  geom_vline(xintercept = 0, color = "gray")+
  geom_hline(yintercept = 0, color = "gray")+
  geom_abline(intercept = 0, slope = 1)

ggarrange(p1,p2,p3,ncol=3, nrow=1, common.legend = TRUE, legend="bottom")

## Look at CorrShiftRatio

p1 <- ggplot(rs, aes(x = SHIFT, y = CorrShiftRatio1, color = as.factor(sign(SHIFT))))+
  geom_point(alpha=.3) + theme_classic() + 
  geom_vline(xintercept = 0, color = "gray")+
  geom_hline(yintercept = 0, color = "gray")+
  geom_abline(intercept = 0, slope = 1)

p2 <- ggplot(rs, aes(x = SHIFT, y = CorrShiftRatio2, color = as.factor(sign(SHIFT))))+
  geom_point(alpha=.3) + theme_classic() + 
  geom_vline(xintercept = 0, color = "gray")+
  geom_hline(yintercept = 0, color = "gray")+
  geom_abline(intercept = 0, slope = 1)

p3 <- ggplot(rs, aes(x = SHIFT, y = CorrShiftRatio3, color = as.factor(sign(SHIFT))))+
  geom_point(alpha=.3) + theme_classic() + 
  geom_vline(xintercept = 0, color = "gray")+
  geom_hline(yintercept = 0, color = "gray")+
  geom_abline(intercept = 0, slope = 1)

ggarrange(p1,p2,p3,ncol=3, nrow=1, common.legend = TRUE, legend="bottom")

###################
## Save new database
write.csv(rs,
          "outputs/biov1_method_corrected_shifts.csv",
          row.names = FALSE)

## Upload to google drive
# googledrive::drive_upload(media = "outputs/biov1_method_corrected_shifts.csv",
#                           path = paste0("BIOSHIFTS Working Group/DATA/Bioshifts_v1/",
#                                         "biov1_method_corrected_shifts.csv"))

In general, the 𝚫Difference applies a stronger correction to small shift values, whereas the 𝚫Ratio applies a stronger correction to larger values. The magnitude of the correction is much greater under the 𝚫Ratio.
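
A quick way to inspect this pattern (a sketch, assuming the rs object with the corrected columns created above is still in memory) is to summarise the magnitude of the applied correction within quartiles of the observed absolute shift:

rs %>%
  filter(!is.na(CorrShiftDiff1), !is.na(CorrShiftRatio1)) %>%
  mutate(size_quartile = ntile(abs(SHIFT), 4),                        # quartiles of |observed shift|
         applied_diff  = abs(abs(SHIFT) - abs(CorrShiftDiff1)),       # correction applied by the delta difference
         applied_ratio = abs(abs(SHIFT) - abs(CorrShiftRatio1))) %>%  # correction applied by the delta ratio
  group_by(size_quartile) %>%
  summarise(mean_applied_diff  = mean(applied_diff),
            mean_applied_ratio = mean(applied_ratio))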

10.1 Observed vs predicted study-level shift

# Raw vs predicted study-level shift
test <- rs_full %>%
  group_by(ID,Type) %>%
  summarise(SHIFT_SL = mean(abs(SHIFT)))
## `summarise()` has grouped output by 'ID'. You can override using the `.groups`
## argument.
test <- merge(corr_rs[,c("ID","Pred_SL")],
              test[,c("ID","Type","SHIFT_SL")],
              by = "ID",
              all = "TRUE")

# Flag which observations were used to fit the model
test <- test %>%
  mutate(Model = ifelse(ID %in% rs_SL$ID, 1, 0)) %>%
  na.omit

# Plot
ggplot(test,aes(Pred_SL,SHIFT_SL))+
  geom_point(alpha=.5)+
  theme_classic()+
  geom_abline(slope = 1, intercept = 0) +
  labs(x = "Predicted study-level shift", y = "Observed study-level shift")

ggplot(test,aes(Pred_SL,SHIFT_SL))+
  geom_point(alpha=.5)+
  theme_classic()+
  geom_abline(slope = 1, intercept = 0) +
  labs(x = "Predicted study-level shift", y = "Observed study-level shift")+
  facet_wrap(.~Type, scales = "free")

ggplot(test,aes(Pred_SL,SHIFT_SL))+
  geom_point(alpha=.5)+
  theme_classic()+
  geom_abline(slope = 1, intercept = 0) +
  labs(x = "Predicted study-level shift", y = "Observed study-level shift")+
  facet_wrap(Model~Type, scales = "free")

10.2 Shift correction: CorrShiftDifference

# read in corrected shifts
rs <- read.csv("outputs/biov1_method_corrected_shifts.csv")

## Histograms
## Raw vs corrected shift
rs %>%
  gather(key = "shift_type", value = "shift_value", 
         c(SHIFT, CorrShiftDiff1)) %>% 
  ggplot(., aes(x = shift_value, fill = shift_type)) +
  geom_histogram(alpha=.5) +
  theme_bw() +
  facet_wrap(Gradient~Position, scales = "free") +
  scale_fill_discrete(labels = c("Corrected shift (𝚫Difference)", "Observed shift")) +
  labs(x = "Range shift", y = "Frequency", fill = "")

## Distribution of SL difference btw predicted shift and ideal SL shift
ggplot(rs, aes(x = SLDiff1)) +
  geom_histogram() +
  theme_bw() +
  labs(x = "Study-level difference", y = "Frequency")

ggplot(rs, aes(x = SLDiff1)) +
  geom_histogram() +
  theme_bw() +
  facet_grid(Gradient~Position, scales = "free") +
  labs(x = "𝚫Difference", y = "Frequency")

## plot raw vs corrected species-level shift 
ggplot(rs, aes(x = SHIFT, y = CorrShiftDiff1, colour = Source)) +
  geom_point() +
  theme_bw() +
  theme(legend.position = "none") +
  labs(x = "Observed range shift", y = "Corrected shift (𝚫Difference)") +
  geom_abline(intercept = 0, slope = 1)

ggplot(rs, aes(x = SHIFT, y = CorrShiftDiff1, colour = Source)) +
  geom_point() +
  theme_bw() +
  theme(legend.position = "none") +
  labs(x = "Observed range shift", y = "Corrected shift (𝚫Difference)") +
  geom_abline(intercept = 0, slope = 1) +
  facet_grid(Gradient~Position, scales = "free")

## plot raw vs delta correction
ggplot(rs, aes(x = SLDiff1, y = abs(SHIFT), colour = Source)) +
  geom_point() +
  theme_bw() +
  theme(legend.position = "none") +
  labs(y = "Absolute observed shift", x = "𝚫Difference") +
  geom_abline(intercept = 0, slope = 1) +
  facet_grid(Gradient~Position, scales = "free") 

The correction applied tends to be larger for larger shifts. However, note that little correction is applied to some very large shifts at the elevation leading edge (LE). In general, the magnitude of the correction is small (range of the 𝚫Difference: -2.6288904 to 4.3595131).

rs$sign_change <- ifelse(sign(rs$CorrShiftDiff1) != sign(rs$SHIFT) & !rs$SHIFT==0, 1, 0)

rs$sign_change_posneg <- ifelse(sign(rs$CorrShiftDiff1)==-1 & sign(rs$SHIFT)==1 & !rs$SHIFT==0, 1, 0)
rs$sign_change_negpos <- ifelse(sign(rs$CorrShiftDiff1)==1 & sign(rs$SHIFT)==-1 & !rs$SHIFT==0, 1, 0)

## N change in sign for observed shift to predicted shift
rs %>%
  summarise(sign_change = length(which(sign_change==1)),
            total_grain_obs = length(Grain),
            sign_change_percent = (sign_change/total_grain_obs)*100,
            
            sign_change_posneg_percent = (length(which(sign_change_posneg==1))/total_grain_obs)*100,
            sign_change_negpos_percent = (length(which(sign_change_negpos==1))/total_grain_obs)*100,
            
            sign_change_posneg_max_shift = max(SHIFT[which(sign_change_posneg==1)]),
            sign_change_posneg_mean_shift = mean(SHIFT[which(sign_change_posneg==1)]),
            
            sign_change_negpos_min_shift = min(SHIFT[which(sign_change_negpos==1)]),
            sign_change_negpos_mean_shift = mean(SHIFT[which(sign_change_negpos==1)])) %>%
  t %>%
  datatable
rs %>%
  group_by(Param) %>%
  summarise(sign_change = length(which(sign_change==1)),
            total_grain_obs = length(Grain),
            sign_change_percent = (sign_change/total_grain_obs)*100,
            
            sign_change_posneg_percent = (length(which(sign_change_posneg==1))/total_grain_obs)*100,
            sign_change_negpos_percent = (length(which(sign_change_negpos==1))/total_grain_obs)*100) %>%
  t %>% as.data.frame() %>% janitor::row_to_names(1) %>%
  datatable 
rs %>%
  group_by(Type) %>%
  summarise(sign_change = length(which(sign_change==1)),
            total_grain_obs = length(Grain),
            sign_change_percent = (sign_change/total_grain_obs)*100,
            
            sign_change_posneg_percent = (length(which(sign_change_posneg==1))/total_grain_obs)*100,
            sign_change_negpos_percent = (length(which(sign_change_negpos==1))/total_grain_obs)*100) %>%
  t %>% as.data.frame() %>% janitor::row_to_names(1) %>%
  datatable 
# is change in sign related to grain?
rs %>%
  group_by(Grain) %>%
  summarise(sign_change = length(which(sign_change==1)),
            total_grain_obs = length(Grain),
            sign_change_percent = (sign_change/total_grain_obs)*100,
            
            sign_change_posneg_percent = (length(which(sign_change_posneg==1))/total_grain_obs)*100,
            sign_change_negpos_percent = (length(which(sign_change_negpos==1))/total_grain_obs)*100) %>%
  t %>% as.data.frame() %>% janitor::row_to_names(1) %>%
  datatable 

In total, 7% of shifts change sign (an observed positive shift becomes negative, or a negative shift becomes positive, after applying the correction). This does not include shifts that were originally equal to zero. The number of positive shifts becoming negative is similar to the number of negative shifts becoming positive (3.5% and 3.2%, respectively).

#  related to the magnitude of shift?
rs %>%
  filter(SHIFT != 0) %>%
  select(sign_change,CorrShiftDiff1,Gradient,Position,Source) %>%
  na.omit() %>%
  ggplot(., aes(x = as.factor(sign_change), y = CorrShiftDiff1)) +
  geom_jitter(aes(color=Source),alpha=.1,width = .1)+
  theme_bw() +
  theme(legend.position = "none") +
  labs(y = "Corrected shift (𝚫Difference)", x = "Sign change") +
  facet_wrap(Gradient~Position, scales = "free")

Shift sign changes occur mainly for small shifts.

10.3 Shift correction: CorrShiftRatio

# read in corrected shifts
rs <- read.csv("outputs/biov1_method_corrected_shifts.csv")

## Histograms
## Raw vs corrected shift
rs %>%
  gather(key = "shift_type", value = "shift_value", 
         c(SHIFT, CorrShiftRatio1)) %>% 
  ggplot(., aes(x = shift_value, fill = shift_type)) +
  geom_histogram(alpha=.5) +
  theme_bw() +
  facet_wrap(Gradient~Position, scales = "free") +
  scale_fill_discrete(labels = c("Corrected shift (𝚫Ratio)", "Observed shift")) +
  labs(x = "Range shift", y = "Frequency", fill = "")

## Distribution of SL ratio between predicted shift and ideal SL shift
ggplot(rs, aes(x = SLRatio1)) +
  geom_histogram() +
  theme_bw() +
  labs(x = "Study-level ratio", y = "Frequency")

ggplot(rs, aes(x = SLRatio1)) +
  geom_histogram() +
  theme_bw() +
  facet_grid(Gradient~Position, scales = "free") +
  labs(x = "𝚫Ratio", y = "Frequency")

## plot raw vs corrected species-level shift 
ggplot(rs, aes(x = SHIFT, y = CorrShiftRatio1, colour = Source)) +
  geom_point() +
  theme_bw() +
  theme(legend.position = "none") +
  labs(x = "Observed range shift", y = "Corrected shift (𝚫Ratio)") +
  geom_abline(intercept = 0, slope = 1)

ggplot(rs, aes(x = SHIFT, y = CorrShiftRatio1, colour = Source)) +
  geom_point() +
  theme_bw() +
  theme(legend.position = "none") +
  labs(x = "Observed range shift", y = "Corrected shift (𝚫Ratio)") +
  geom_abline(intercept = 0, slope = 1) +
  facet_grid(Gradient~Position, scales = "free")

The correction applied tends to be larger for larger shifts.

10.4 CorrShiftDiff vs CorrShiftRatio

ggplot(rs, aes(x = CorrShiftRatio1, y = CorrShiftDiff1, colour = Source)) +
  geom_point() +
  theme_bw() +
  theme(legend.position = "none") +
  labs(x = "Corrected shift (𝚫Ratio)", y = "Corrected shift (𝚫Difference)") +
  geom_abline(intercept = 0, slope = 1) +
  theme(aspect.ratio=1)
## Warning: Removed 1 rows containing missing values (`geom_point()`).

ggplot(rs, aes(x = CorrShiftRatio1, y = CorrShiftDiff1, colour = Source)) +
  geom_point() +
  theme_bw() +
  theme(legend.position = "none") +
  labs(x = "Corrected shift (𝚫Ratio)", y = "Corrected shift (𝚫Difference)") +
  geom_abline(intercept = 0, slope = 1) +
  facet_grid(Gradient~Position, scales = "free")+
  theme(aspect.ratio=1)
## Warning: Removed 1 rows containing missing values (`geom_point()`).

10.5 Compare relationships between shift and velocity

rs <- rs %>%
  mutate(velocity = ifelse(Type == "LAT", v.lat.mean, v.ele.mean))

rs %>%
  mutate(ObsShift = SHIFT) %>%
  select(velocity,CorrShiftDiff1,CorrShiftRatio1,ObsShift,Gradient) %>%
  gather(Condition, Shift, -c(velocity,Gradient)) %>% 
  mutate(Condition = 
           ifelse(Condition=="CorrShiftRatio1", "𝚫Ratio", 
                  ifelse(Condition=="CorrShiftDiff1","𝚫Difference", "Observed Shift"))) %>%
  ggplot(.,aes(x = velocity, y = Shift))+
  geom_smooth(aes(color = Condition))+
  # geom_point(alpha=.1)+
  labs(x = "Velocity", y = "Range shift")+
  facet_wrap(.~Gradient)
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## Warning: Removed 5 rows containing non-finite values (`stat_smooth()`).

rs %>%
  mutate(ObsShift = SHIFT) %>%
  select(velocity,CorrShiftRatio1,CorrShiftDiff1,ObsShift,Gradient,ECO) %>%
  gather(Condition, Shift, -c(velocity,Gradient,ECO)) %>% 
  mutate(Condition = 
           ifelse(Condition=="CorrShiftRatio1", "𝚫Ratio", 
                  ifelse(Condition=="CorrShiftDiff1","𝚫Difference", "Observed Shift"))) %>%
  ggplot(.,aes(x = velocity, y = Shift))+
  geom_smooth(aes(color = Condition))+
  # geom_point(alpha=.1)+
  labs(x = "Velocity", y = "Range shift")+
  facet_wrap(Gradient~ECO)
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## Warning: Removed 5 rows containing non-finite values (`stat_smooth()`).

rs %>%
  mutate(ObsShift = SHIFT) %>%
  select(velocity,CorrShiftDiff1,CorrShiftRatio1,ObsShift,Gradient,Position,ECO) %>%
  gather(Condition, Shift, -c(velocity,Gradient,Position,ECO)) %>% 
  mutate(Condition = 
           ifelse(Condition=="CorrShiftRatio1", "𝚫Ratio", 
                  ifelse(Condition=="CorrShiftDiff1","𝚫Difference", "Observed Shift"))) %>%
  ggplot(.,aes(x = velocity, y = Shift))+
  geom_smooth(aes(color = Condition))+
  # geom_point(alpha=.1)+
  labs(x = "Velocity", y = "Range shift")+
  facet_grid(Gradient~Position,scales = "free")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## Warning: Removed 5 rows containing non-finite values (`stat_smooth()`).

rs %>%
  mutate(ObsShift = SHIFT) %>%
  select(velocity,CorrShiftDiff1,CorrShiftRatio1,ObsShift,Gradient,Position,ECO) %>%
  gather(Condition, Shift, -c(velocity,Gradient,Position,ECO)) %>% 
  mutate(Condition = 
           ifelse(Condition=="CorrShiftRatio1", "𝚫Ratio", 
                  ifelse(Condition=="CorrShiftDiff1","𝚫Difference", "Observed Shift"))) %>%
  ggplot(.,aes(x = velocity, y = Shift))+
  geom_smooth(aes(color = Condition))+
  # geom_point(alpha=.1)+
  labs(x = "Velocity", y = "Range shift")+
  facet_grid(Gradient*ECO~Position,scales = "free")
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## Warning: Removed 5 rows containing non-finite values (`stat_smooth()`).

any(is.na(rs$CorrShiftDiff1))
## [1] TRUE

10.5.1 CorrShiftDiff across multiple options to select ideal conditions

rs_plot <- rs %>%
  mutate(ObsShift = SHIFT) %>%
  select(velocity,CorrShiftDiff1,CorrShiftDiff2,CorrShiftDiff3,ObsShift,Gradient,Position,ECO) %>%
  gather(Condition, Shift, -c(velocity,Gradient,Position,ECO)) %>% 
  mutate(Condition = 
           ifelse(Condition=="CorrShiftDiff1", "Option1 (Ideal methods)", 
                  ifelse(Condition=="CorrShiftDiff2","Option2 (Common method var)", 
                         ifelse(Condition=="CorrShiftDiff3","Option3 (Common method density)", "Observed Shift")))) 

ggplot(rs_plot,aes(x = velocity, y = Shift, color = Condition, fill = Condition))+
  geom_smooth(alpha = .2) + 
  geom_vline(xintercept = 0, color = "gray")+
  labs(x = "Velocity", y = "Range shift")+
  facet_wrap(.~Gradient) +
  theme_classic() + theme(legend.position="bottom") + guides(color=guide_legend(nrow=2,byrow=TRUE))
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## Warning: Removed 7 rows containing non-finite values (`stat_smooth()`).

ggplot(rs_plot,aes(x = velocity, y = Shift, color = Condition, fill = Condition))+
  geom_smooth(alpha = .2) + 
  geom_vline(xintercept = 0, color = "gray")+
  labs(x = "Velocity", y = "Range shift")+
  facet_wrap(Gradient~ECO) +
  theme_classic() + theme(legend.position="bottom") + guides(color=guide_legend(nrow=2,byrow=TRUE))
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## Warning: Removed 7 rows containing non-finite values (`stat_smooth()`).

ggplot(rs_plot,aes(x = velocity, y = Shift, color = Condition, fill = Condition))+
  geom_smooth(alpha = .2) + 
  geom_vline(xintercept = 0, color = "gray")+
  labs(x = "Velocity", y = "Range shift")+
  facet_grid(Gradient~Position,scales = "free") +
  theme_classic() + theme(legend.position="bottom") + guides(color=guide_legend(nrow=2,byrow=TRUE))
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## Warning: Removed 7 rows containing non-finite values (`stat_smooth()`).

ggplot(rs_plot,aes(x = velocity, y = Shift, color = Condition, fill = Condition))+
  geom_smooth(alpha = .2) + 
  geom_vline(xintercept = 0, color = "gray")+
  labs(x = "Velocity", y = "Range shift")+
  facet_grid(Gradient*ECO~Position,scales = "free") +
  theme_classic() + theme(legend.position="bottom") + guides(color=guide_legend(nrow=2,byrow=TRUE))
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## Warning: Removed 7 rows containing non-finite values (`stat_smooth()`).

10.5.2 CorrShiftRatio across multiple options to select ideal conditions

rs_plot <- rs %>%
  mutate(ObsShift = SHIFT) %>%
  select(velocity,CorrShiftRatio1,CorrShiftRatio2,CorrShiftRatio3,ObsShift,Gradient,Position,ECO) %>%
  gather(Condition, Shift, -c(velocity,Gradient,Position,ECO)) %>% 
  mutate(Condition = 
           ifelse(Condition=="CorrShiftRatio1", "Option1 (Ideal methods)", 
                  ifelse(Condition=="CorrShiftRatio2","Option2 (Common method var)", 
                         ifelse(Condition=="CorrShiftRatio3","Option3 (Common method density)", "Observed Shift")))) 

ggplot(rs_plot,aes(x = velocity, y = Shift, color = Condition, fill = Condition))+
  geom_smooth(alpha = .2) + 
  geom_vline(xintercept = 0, color = "gray")+
  labs(x = "Velocity", y = "Range shift")+
  facet_wrap(.~Gradient) +
  theme_classic() + theme(legend.position="bottom") + guides(color=guide_legend(nrow=2,byrow=TRUE))
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## Warning: Removed 7 rows containing non-finite values (`stat_smooth()`).

ggplot(rs_plot,aes(x = velocity, y = Shift, color = Condition, fill = Condition))+
  geom_smooth(alpha = .2) + 
  geom_vline(xintercept = 0, color = "gray")+
  labs(x = "Velocity", y = "Range shift")+
  facet_wrap(Gradient~ECO) +
  theme_classic() + theme(legend.position="bottom") + guides(color=guide_legend(nrow=2,byrow=TRUE))
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## Warning: Removed 7 rows containing non-finite values (`stat_smooth()`).

ggplot(rs_plot,aes(x = velocity, y = Shift, color = Condition, fill = Condition))+
  geom_smooth(alpha = .2) + 
  geom_vline(xintercept = 0, color = "gray")+
  labs(x = "Velocity", y = "Range shift")+
  facet_grid(Gradient~Position,scales = "free") +
  theme_classic() + theme(legend.position="bottom") + guides(color=guide_legend(nrow=2,byrow=TRUE))
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## Warning: Removed 7 rows containing non-finite values (`stat_smooth()`).

ggplot(rs_plot,aes(x = velocity, y = Shift, color = Condition, fill = Condition))+
  geom_smooth(alpha = .2) + 
  geom_vline(xintercept = 0, color = "gray")+
  labs(x = "Velocity", y = "Range shift")+
  facet_grid(Gradient*ECO~Position,scales = "free") +
  theme_classic() + theme(legend.position="bottom") + guides(color=guide_legend(nrow=2,byrow=TRUE))
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## Warning: Removed 7 rows containing non-finite values (`stat_smooth()`).

11 Conclusions

Main difference between the two methods:
ΔRatio Method: corrected values become more variable for larger observed shifts.

ΔDifference Method: corrected values become more variable for shifts close to zero (i.e., small range shifts).

Which method to select?
The choice of method depends on the specific objective of the correction. We can provide the following guidance:

ΔRatio Method: if the goal is to correct larger absolute shift values (i.e., substantial changes in range shifts across studies with different methodologies), the ΔRatio method should be selected.

ΔDifference Method: if, instead, the aim is to correct smaller absolute shift values (i.e., small range shifts), the ΔDifference method should be selected.

How do we know if our method generated meaningful corrected shift values?
To evaluate whether the corrected shift values are meaningful, we propose a simulation-based validation. We would generate a distribution of known (true) shift values, apply a controlled level of distortion to these values to represent the effect of methodological variables on range shift detection, and then run the shift correction approach on the distorted values. If the corrected shifts closely approximate the known true shifts, the correction can be considered meaningful.
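
A minimal, self-contained sketch of such a simulation is given below. Everything in it (the single methodological covariate, the distortion model, the parameter values) is an assumption made for illustration only, not part of the workflow above; the corrections mirror the 𝚫Difference and 𝚫Ratio formulas used earlier.

set.seed(1)

n_study    <- 50
true_shift <- rgamma(n_study, shape = 2, rate = 2)   # known (true) absolute shifts
log_extent <- runif(n_study, min = 4, max = 9)       # one assumed methodological covariate

# Distort the true shifts: observed values depend on methods (here, study extent) plus noise
obs_shift <- true_shift * exp(0.15 * (log_extent - mean(log_extent)) + rnorm(n_study, sd = 0.1))

# Study-level model of observed shifts as a function of the methodological variable
fit <- lm(log(obs_shift) ~ log_extent)

# Predicted shift given each study's methods, and under an "ideal" method (95% quantile of extent)
pred_study <- exp(predict(fit))
pred_ideal <- exp(predict(fit, newdata = data.frame(log_extent = quantile(log_extent, 0.95))))

# Apply the two corrections used above
corr_diff  <- obs_shift - (pred_study - pred_ideal)  # delta difference correction
corr_ratio <- obs_shift * (pred_study / pred_ideal)  # delta ratio correction

# Compare corrected vs true shifts: lower RMSE indicates a more meaningful correction
rmse <- function(x, y) sqrt(mean((x - y)^2))
c(observed = rmse(obs_shift,  true_shift),
  diff     = rmse(corr_diff,  true_shift),
  ratio    = rmse(corr_ratio, true_shift))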