Identified 14 important covariates (none of which were phosphorus).
Optimised JSDM by scaling covariates.
A model using all data predicted almost perfectly onto itself.
Models trained and tested on different data had mixed results. Randomly selecting half the plots as training data still predicted well onto the remaining plots. However, using two sites to predict onto a third severely decreased predictive power.
The proportion of variance explained by the environment vs biotic interactions seemed sensitive to the data used. A similar pattern happened with the correlation results.
I first created a correlation matrix and picked out the variables with an absolute correlation of 0.7 or greater (|r| ≥ 0.7) with at least one other variable. These were then plotted against each other.
corr_vars <- occ_df %>%
  select(acidity, Mg, Na, K, Clay, Silt, Sand,
         conductivity_ms, ph_kcl, C_perc, corr_dC, C_N_ratio) %>%
  cor()
corrplot::corrplot(corr_vars, type = "lower", method = "number")
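The |r| ≥ 0.7 screening step can also be done programmatically rather than by eye. The sketch below is only an illustration on simulated data: the data frame `toy` and its values are invented stand-ins, not the real `occ_df`, and only the column names are borrowed from the report.

```r
# Simulated stand-in for the soil covariates (NOT the real data).
set.seed(1)
n  <- 150
ph <- rnorm(n)
toy <- data.frame(
  ph_kcl  = ph,
  acidity = -ph + rnorm(n, sd = 0.3),  # built to correlate strongly with pH
  Clay    = rnorm(n),
  Olsen   = rnorm(n)
)

cor_mat <- cor(toy)

# Keep each pair once (lower triangle) and flag those with |r| >= 0.7.
idx <- which(abs(cor_mat) >= 0.7 & lower.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)
print(high_pairs)
```

Each row of `high_pairs` is one pair of covariates from which a single proxy would be kept.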
There were strong correlations between pH, acidity, Mg, K and corrected dC. pH was used as a proxy for this group and renamed ph_et_al. Clay, silt and sand formed another group; clay was used as a proxy and renamed texture. Conductivity and Na were correlated; Na was kept and renamed salt. Finally, C_N_ratio and percent C were correlated; C_N_ratio was kept and renamed carbon. Munsell colour was also excluded.
dat_occ <- cbind(occ_df[, 1:10],
                 occ_df %>% select(lon, lat, percent_over1, percent_over2,
                                   Ca, Na, P, Olsen, Clay, ph_kcl,
                                   N_perc, corr_dN, C_N_ratio,
                                   elevation, slope, aspect, drainage, Q_cover))
colnames(dat_occ)[colnames(dat_occ) == "ph_kcl"] <- "ph_et_al"
colnames(dat_occ)[colnames(dat_occ) == "Clay"] <- "texture"
colnames(dat_occ)[colnames(dat_occ) == "Na"] <- "salt"
colnames(dat_occ)[colnames(dat_occ) == "C_N_ratio"] <- "carbon"
A boosted regression tree (BRT) was then fitted for each species with the above covariates (tree complexity = 2, learning rate = 0.0005). A variable was selected for the JSDM analysis if it had a relative importance of at least 5% in at least one of the BRTs. The final variables are shown below. Interestingly, neither inorganic nor Olsen P was influential.
fin_occ_df <- dat_occ %>%
  select(1:10,
         ph_et_al, salt, carbon, Ca, Q_cover, elevation,
         percent_over1, percent_over2, lat, lon,
         aspect, texture, corr_dN, drainage)
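The selection rule (keep a covariate if it reaches 5% relative importance in at least one species' BRT) can be sketched as follows. The matrix `rel_inf` and its numbers are invented for illustration; in practice the values would come from the fitted BRTs, which are not shown here.

```r
# Hypothetical relative-importance table (percent), one column per species' BRT.
# The numbers are invented for illustration only.
rel_inf <- cbind(
  sp_A = c(ph_et_al = 40, salt = 30, Olsen = 2, P = 3, carbon = 25),
  sp_B = c(ph_et_al = 55, salt = 4,  Olsen = 1, P = 4, carbon = 36)
)

threshold <- 5  # percent relative importance

# Keep a covariate if it reaches the threshold in at least one species' model.
keep     <- apply(rel_inf, 1, max) >= threshold
selected <- names(keep)[keep]
selected
```

In this toy table `salt` is kept because it is important for one species even though it falls below 5% for the other, while `Olsen` and `P` are dropped because they never reach the threshold.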
I first ran a model using all 150 plots and three latent variables. However, the diagnostic plots looked unpromising.
I then retried after scaling the covariates. Centering is done first, by subtracting each column’s mean from that column; scaling is then done by dividing each centered column by its standard deviation. The author of boral did this in one of his examples. The diagnostic plots of this model looked much better, so from now on I’ll be using scaled covariates.
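As a small check of what this scaling does, the sketch below compares the two steps done by hand with base R's `scale()` using its defaults. The matrix `X` holds invented values; only the column names are borrowed from the covariate list.

```r
# Toy covariate matrix (invented values).
X <- cbind(elevation = c(120, 150, 180, 210),
           ph_et_al  = c(5.5, 6.0, 7.5, 8.0))

# Manual version of the two steps: centre, then divide by the column SDs.
X_centred <- sweep(X, 2, colMeans(X), "-")
X_manual  <- sweep(X_centred, 2, apply(X, 2, sd), "/")

# scale() does both steps with its defaults (center = TRUE, scale = TRUE).
X_scaled <- scale(X)

all.equal(X_manual, X_scaled, check.attributes = FALSE)
colMeans(X_scaled)      # approximately 0 for every column
apply(X_scaled, 2, sd)  # 1 for every column
```

After scaling, every covariate is in standard-deviation units, so covariates recorded on very different scales (for example elevation and pH) become comparable.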
The above model was used to predict onto the same data. The AUC values were all close to one, suggesting very good in-sample predictive power.
Species | AUC |
---|---|
R_burtoniae | 1.000 |
R_comptonii | 0.999 |
D_diversifolium | 0.998 |
A_delaetii | 1.000 |
A_fissum | 1.000 |
A_framesii | 0.999 |
C_spissum | 0.999 |
C_staminodiosum | 0.996 |
Dicrocaulon_sp | 1.000 |
Oophytum_sp | 0.983 |
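For reference, AUC is the probability that a randomly chosen presence receives a higher predicted value than a randomly chosen absence. The function below is a minimal base-R sketch of that definition using ranks; `auc_rank` is a name made up here and is not necessarily how the values in these tables were computed.

```r
# AUC via the rank-sum (Mann-Whitney) formulation; ties get average ranks.
auc_rank <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted),
            all(observed %in% c(0, 1)))
  n_pos <- sum(observed == 1)
  n_neg <- sum(observed == 0)
  # AUC is undefined unless both presences and absences occur.
  if (n_pos == 0 || n_neg == 0) return(NA_real_)
  r <- rank(predicted)
  (sum(r[observed == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example with invented observations and predicted probabilities.
obs  <- c(0, 0, 1, 1, 0, 1)
pred <- c(0.1, 0.4, 0.35, 0.8, 0.2, 0.9)
auc_rank(obs, pred)
```

In the toy example, 8 of the 9 presence-absence pairs are ordered correctly, giving an AUC of 8/9. A value of 0.5 corresponds to random ranking, and values below 0.5 mean presences tend to be ranked below absences.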
Let’s use sites 2 and 3 to predict site 1. Some species are predicted really well (AUC above 0.9), others worse than chance (AUC below 0.5).
Species | AUC |
---|---|
R_burtoniae | 0.908 |
R_comptonii | 0.626 |
D_diversifolium | 0.329 |
A_delaetii | 0.908 |
A_fissum | 0.415 |
A_framesii | 0.409 |
C_spissum | 0.730 |
C_staminodiosum | 0.729 |
Dicrocaulon_sp | 0.359 |
Oophytum_sp | 0.917 |
Now sites 1 and 2 on 3. Predictions are much worse than when the model predicted onto its own data. A_framesii is not included as it didn’t occur in site 3.
Species | AUC |
---|---|
R_burtoniae | 0.851 |
R_comptonii | 0.516 |
D_diversifolium | 0.417 |
A_delaetii | 0.575 |
A_fissum | 0.419 |
C_spissum | 0.490 |
C_staminodiosum | 0.731 |
Dicrocaulon_sp | 0.478 |
Oophytum_sp | 0.660 |
Now sites 1 and 3 on 2. Predictions are again much worse than when the model predicted onto its own data. R_comptonii and C_staminodiosum are not included as they didn’t occur in site 2.
Species | AUC |
---|---|
R_burtoniae | 0.943 |
D_diversifolium | 0.321 |
A_delaetii | 0.681 |
A_fissum | 0.078 |
A_framesii | 0.531 |
C_spissum | 0.400 |
Dicrocaulon_sp | 0.637 |
Oophytum_sp | 0.650 |
Let’s try randomly selecting 75 plots and predicting onto the remaining 75. This predicts a lot better than the site-by-site models.
Species | AUC |
---|---|
R_burtoniae | 0.918 |
R_comptonii | 0.790 |
D_diversifolium | 0.832 |
A_delaetii | 0.794 |
A_fissum | 0.601 |
A_framesii | 0.623 |
C_spissum | 0.739 |
C_staminodiosum | 0.882 |
Dicrocaulon_sp | 0.832 |
Oophytum_sp | 0.916 |
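The two validation designs used above (holding out a whole site, and a random 75/75 split) differ only in how the plots are divided. A minimal sketch of both splits is given below; the `plots` table, its `site` column and the 50-plots-per-site layout are assumptions made for illustration and may not match the real data.

```r
set.seed(42)  # arbitrary seed so the random split is reproducible

# Hypothetical plot table: 150 plots, 50 per site (the real layout may differ).
plots <- data.frame(plot_id = 1:150, site = rep(1:3, each = 50))

# Random split: 75 plots for training, the remaining 75 for testing.
train_idx    <- sample(nrow(plots), 75)
train_random <- plots[train_idx, ]
test_random  <- plots[-train_idx, ]

# Leave-one-site-out split: train on two sites, test on the third.
held_out   <- 1
train_site <- plots[plots$site != held_out, ]
test_site  <- plots[plots$site == held_out, ]

table(train_random$site)  # a random split draws training plots from every site
table(train_site$site)    # the site split never sees the held-out site
```

With a random split the training data usually contain plots from every site, whereas the site-based split has to extrapolate to a site the model has never seen, which may be part of why it performs worse.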
Overall then, there is some indication that distribution is deterministic.
boral can partition variance into that explained by the environment and that explained by the latent variables. However, this partition seems quite sensitive to the data used. The full model is shown on the left, and the model trained on the randomly chosen plots on the right.
Using the full model. Correlation due to environment on the left, correlation due to latent variables on the right.
Using the random model.
Full model