This is a non-restricted/unsupervised model which only uses a response matrix. This procedure convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. This is a method of dimensionality reduction, so it will find the components which can explain the biggest amount of variance in the data.
The assumptions to be considered here are:
mshapiro.test(t(env))
##
## Shapiro-Wilk normality test
##
## data: Z
## W = 0.55746, p-value = 3.276e-08
The multivariant normality assumption can not be accepted.
panel.hist <- function(x, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(usr[1:2], 0, 1.5) )
h <- hist(x, plot = FALSE)
breaks <- h$breaks; nB <- length(breaks)
y <- h$counts; y <- y/max(y)
rect(breaks[-nB], 0, breaks[-1], y, col = "cyan", ...)
}
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r <- abs(cor(x, y))
txt <- format(c(r, 0.123456789), digits = digits)[1]
txt <- paste(prefix, txt, sep = "")
if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
text(0.5, 0.5, txt, cex = cex.cor * r)
}
pairs(env, diag.panel = panel.hist, lower.panel = panel.smooth, upper.panel = panel.cor)
Maybe this is not a test easy to check. We should try with another one:
PerformanceAnalytics::chart.Correlation(doubs$env, histogram = TRUE, pch = 19)
Not all the relationships between variables are linear. Despite both of the assumptions are violated and we should be careful with this dataset, we will continue our analysis as an example.
The PCA model uses the distance matrix of the dataset. So we have to scale it before creating the model.
env.pca <- rda(env, scale = TRUE)
summary(env.pca) # By default scaling 2
##
## Call:
## rda(X = env, scale = TRUE)
##
## Partitioning of correlations:
## Inertia Proportion
## Total 11 1
## Unconstrained 11 1
##
## Eigenvalues, and their contribution to the correlations
##
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Eigenvalue 6.446 2.2252 0.99825 0.39831 0.36224 0.25361 0.16106
## Proportion Explained 0.586 0.2023 0.09075 0.03621 0.03293 0.02306 0.01464
## Cumulative Proportion 0.586 0.7883 0.87903 0.91524 0.94817 0.97123 0.98587
## PC8 PC9 PC10 PC11
## Eigenvalue 0.11062 0.022949 0.017476 0.004378
## Proportion Explained 0.01006 0.002086 0.001589 0.000398
## Cumulative Proportion 0.99593 0.998013 0.999602 1.000000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores: 4.189264
##
##
## Species scores
##
## PC1 PC2 PC3 PC4 PC5 PC6
## dfs 1.10786 0.4901 -0.20230 0.045871 -0.187500 -0.20053
## alt -1.06604 -0.5607 0.15135 0.193929 0.096582 -0.05541
## slo -0.95746 -0.5295 -0.25933 -0.352365 0.066943 -0.39476
## flo 0.98732 0.6262 -0.23731 0.009137 -0.105599 -0.31209
## pH -0.02679 0.4991 1.14319 -0.014864 -0.005398 -0.17594
## har 0.91493 0.5514 -0.09538 -0.128923 0.639664 0.07145
## pho 1.02238 -0.6481 0.20016 -0.178923 0.040772 -0.04615
## nit 1.14057 -0.1628 0.05081 -0.283125 -0.272940 0.20434
## amm 0.96776 -0.7382 0.18740 -0.198493 -0.021140 0.04733
## oxy -0.99282 0.4993 0.04744 -0.543295 -0.012373 0.06953
## bdo 0.96051 -0.7274 0.09630 0.089149 0.178489 -0.14521
##
##
## Site scores (weighted sums of species scores)
##
## PC1 PC2 PC3 PC4 PC5 PC6
## 1 -1.35309 -1.0614 -0.626184 -1.14841 -1.0494 -1.821e+00
## 2 -1.05499 -0.7841 0.195705 0.90757 -1.7294 2.655e-01
## 3 -0.97457 -0.4890 1.340010 0.61152 -0.8592 -7.288e-01
## 4 -0.90281 -0.3118 0.000372 0.17712 0.2052 5.288e-01
## 5 -0.45666 -0.6973 0.550276 1.16964 1.2500 1.273e-01
## 6 -0.81070 -0.7590 -0.318284 0.77249 -0.2537 1.263e-01
## 7 -0.85277 -0.1858 0.231151 -0.37159 1.3564 -3.616e-01
## 9 -0.27926 -0.4610 0.061754 1.60909 1.2127 8.870e-01
## 10 -0.59145 -0.5554 -1.595293 -0.35281 0.6397 -2.559e-01
## 11 -0.34078 0.3167 0.005014 -1.23777 0.7883 -2.345e-01
## 12 -0.44165 0.3209 -0.697214 -0.46903 0.3928 5.963e-01
## 13 -0.39855 0.6314 0.003511 -0.90415 1.0778 5.002e-05
## 14 -0.22649 0.7350 0.946755 -0.87823 0.8730 3.814e-02
## 15 -0.21927 1.0432 2.269502 -0.19576 -0.1024 -2.360e-01
## 16 -0.16778 0.2507 -0.340084 -0.54136 -0.1054 6.195e-01
## 17 0.14914 0.3628 -0.171537 -0.14337 -0.1372 1.435e+00
## 18 0.08633 0.3674 -0.238531 -0.44014 -0.2901 1.015e+00
## 19 0.10967 0.4821 0.232435 -0.28363 -0.7316 9.414e-01
## 20 0.18575 0.3732 -0.268777 -0.69333 -0.9729 1.131e+00
## 21 0.16766 0.3112 -0.834583 0.21603 -0.7147 5.100e-01
## 22 0.13041 0.4842 -0.108135 0.18812 -0.2895 -5.716e-01
## 23 1.28519 -1.3164 0.715652 -0.57204 0.6499 -1.021e+00
## 24 1.01542 -0.4735 0.019519 1.43289 0.5981 1.440e-01
## 25 2.10059 -2.1406 0.361157 -1.21146 -0.1786 4.424e-01
## 26 0.89379 -0.1213 -0.671652 0.86581 -0.3046 -8.871e-02
## 27 0.61092 0.3178 -0.139402 0.31511 -0.8178 -9.684e-01
## 28 0.82353 0.8569 0.802337 -0.04239 -0.8747 5.103e-02
## 29 0.67793 1.0652 -1.729365 0.28387 0.2944 -1.028e+00
## 30 0.83450 1.4380 0.003890 0.93621 0.0730 -1.543e+00
summary(env.pca, scaling = 1)
##
## Call:
## rda(X = env, scale = TRUE)
##
## Partitioning of correlations:
## Inertia Proportion
## Total 11 1
## Unconstrained 11 1
##
## Eigenvalues, and their contribution to the correlations
##
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Eigenvalue 6.446 2.2252 0.99825 0.39831 0.36224 0.25361 0.16106
## Proportion Explained 0.586 0.2023 0.09075 0.03621 0.03293 0.02306 0.01464
## Cumulative Proportion 0.586 0.7883 0.87903 0.91524 0.94817 0.97123 0.98587
## PC8 PC9 PC10 PC11
## Eigenvalue 0.11062 0.022949 0.017476 0.004378
## Proportion Explained 0.01006 0.002086 0.001589 0.000398
## Cumulative Proportion 0.99593 0.998013 0.999602 1.000000
##
## Scaling 1 for species and site scores
## * Sites are scaled proportional to eigenvalues
## * Species are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores: 4.189264
##
##
## Species scores
##
## PC1 PC2 PC3 PC4 PC5 PC6
## dfs 1.447 1.0896 -0.6715 0.24106 -1.03324 -1.3207
## alt -1.393 -1.2467 0.5024 1.01912 0.53222 -0.3649
## slo -1.251 -1.1772 -0.8608 -1.85173 0.36890 -2.5998
## flo 1.290 1.3924 -0.7878 0.04801 -0.58192 -2.0554
## pH -0.035 1.1098 3.7948 -0.07811 -0.02975 -1.1587
## har 1.195 1.2260 -0.3166 -0.67751 3.52493 0.4706
## pho 1.336 -1.4410 0.6644 -0.94026 0.22468 -0.3040
## nit 1.490 -0.3619 0.1687 -1.48786 -1.50406 1.3457
## amm 1.264 -1.6413 0.6221 -1.04311 -0.11649 0.3117
## oxy -1.297 1.1100 0.1575 -2.85509 -0.06818 0.4579
## bdo 1.255 -1.6174 0.3197 0.46849 0.98358 -0.9563
##
##
## Site scores (weighted sums of species scores)
##
## PC1 PC2 PC3 PC4 PC5 PC6
## 1 -1.03580 -0.47737 -0.1886364 -0.218531 -0.19043 -2.765e-01
## 2 -0.80759 -0.35267 0.0589557 0.172702 -0.31383 4.031e-02
## 3 -0.74604 -0.21992 0.4036746 0.116366 -0.15591 -1.107e-01
## 4 -0.69110 -0.14022 0.0001121 0.033704 0.03725 8.029e-02
## 5 -0.34957 -0.31363 0.1657693 0.222572 0.22683 1.933e-02
## 6 -0.62059 -0.34137 -0.0958822 0.146997 -0.04605 1.918e-02
## 7 -0.65279 -0.08358 0.0696338 -0.070711 0.24615 -5.490e-02
## 9 -0.21377 -0.20732 0.0186032 0.306194 0.22006 1.347e-01
## 10 -0.45276 -0.24979 -0.4805779 -0.067136 0.11609 -3.886e-02
## 11 -0.26087 0.14244 0.0015104 -0.235536 0.14305 -3.560e-02
## 12 -0.33808 0.14434 -0.2100341 -0.089252 0.07127 9.055e-02
## 13 -0.30509 0.28397 0.0010576 -0.172050 0.19558 7.596e-06
## 14 -0.17338 0.33057 0.2852075 -0.167119 0.15842 5.792e-03
## 15 -0.16785 0.46919 0.6836817 -0.037251 -0.01858 -3.583e-02
## 16 -0.12843 0.11274 -0.1024494 -0.103016 -0.01913 9.406e-02
## 17 0.11417 0.16318 -0.0516750 -0.027282 -0.02491 2.178e-01
## 18 0.06609 0.16523 -0.0718567 -0.083754 -0.05265 1.542e-01
## 19 0.08395 0.21681 0.0700204 -0.053972 -0.13277 1.429e-01
## 20 0.14219 0.16784 -0.0809685 -0.131933 -0.17656 1.717e-01
## 21 0.12834 0.13995 -0.2514160 0.041108 -0.12969 7.744e-02
## 22 0.09983 0.21776 -0.0325754 0.035797 -0.05254 -8.679e-02
## 23 0.98382 -0.59208 0.2155883 -0.108853 0.11794 -1.550e-01
## 24 0.77731 -0.21296 0.0058801 0.272664 0.10854 2.186e-02
## 25 1.60801 -0.96276 0.1087977 -0.230529 -0.03241 6.717e-02
## 26 0.68419 -0.05457 -0.2023335 0.164755 -0.05528 -1.347e-02
## 27 0.46766 0.14295 -0.0419945 0.059962 -0.14840 -1.470e-01
## 28 0.63041 0.38542 0.2417020 -0.008066 -0.15873 7.749e-03
## 29 0.51895 0.47910 -0.5209667 0.054018 0.05342 -1.561e-01
## 30 0.63881 0.64676 0.0011718 0.178152 0.01325 -2.343e-01
The total amount of variance that we can explain with two components is 78.83%, so this is a good model. We also can see the coordenates along each PC for each sample.
At this step we need to select the PC that best fits to the data. To do this, we will see the next criteria:
(ev <- env.pca$CA$eig)
## PC1 PC2 PC3 PC4 PC5 PC6
## 6.445919056 2.225185922 0.998250385 0.398313507 0.362238435 0.253611800
## PC7 PC8 PC9 PC10 PC11
## 0.161055460 0.110621889 0.022948981 0.017476247 0.004378319
ev[ev > mean(ev)]
## PC1 PC2
## 6.445919 2.225186
Any value >1 is supposed to be good for our model. In this case, we would choose both PC1 and PC2.
For this purpose we will use the evplot() function created by François Gillet (2007)
n <- length(ev)
bsm <- data.frame(j = seq(1: n), p = 0)
bsm$p[1] <- 1/n
for (i in 2:n){bsm$p[i] = bsm$p[i - 1] + (1/(n + 1 - i))}
bsm$p <- 100 * bsm$p/ n
evplot <- function(ev) {
n = length(ev)
bsm = data.frame(j=seq(1:n), p=0)
bsm$p[1] = 1/n
for (i in 2:n) {
bsm$p[i] = bsm$p[i-1] + (1/(n + 1 - i))
}
bsm$p = 100*bsm$p/n
# Plot eigenvalues and % of variation for each axis
op <- par(mfrow = c(2,1))
barplot(ev, main = "Eigenvalues", col = "bisque", las = 2)
abline(h = mean(ev), col = "red")
legend("topright", "Average eigenvalue", lwd = 1, col = 2, bty = "n")
barplot(t(cbind(100 * ev/sum(ev), bsm$p[n: 1])), beside = TRUE,
main = "% variation", col = c("bisque", 2), las = 2)
legend("topright", c("% eigenvalue", "Broken stick model"),
pch = 15, col = c("bisque", 2), bty = "n")
par(op)
}
evplot(ev)
The eigenvalue is higher in the PC1 and PC2 cases, so that is the number of components we need.
We are going to create a plot where we can see how much variance of the axis is explained by each factor.
"cleanplot.pca" <- function(res.pca, ax1 = 1, ax2 = 2, point = FALSE,
ahead = 0.07, cex = 0.7)
{
require("vegan")
par(mfrow = c(1, 2))
p <- length(res.pca$CA$eig)
# Scaling 1: "species" scores scaled to relative eigenvalues
sit.sc1 <- scores(res.pca, display = "wa", scaling = 1, choices = c(1: p))
spe.sc1 <- scores(res.pca, display = "sp", scaling = 1, choices = c(1: p))
plot(res.pca, choices = c(ax1, ax2), display = c("wa", "sp"), type = "n",
main = "PCA - scaling 1", scaling = 1)
if(point)
{
points(sit.sc1[,ax1], sit.sc1[,ax2], pch = 20)
text(res.pca, display = "wa", choices = c(ax1, ax2), cex = cex, pos = 3, scaling = 1)
}
else
{
text(res.pca, display = "wa", choices = c(ax1, ax2), cex = cex, scaling = 1)
}
text(res.pca, display = "sp", choices = c(ax1, ax2), cex = cex, pos = 4,
col = "red", scaling = 1)
arrows(0, 0, spe.sc1[,ax1], spe.sc1[,ax2], length = ahead, angle = 20, col = "red")
pcacircle(res.pca)
# Scaling 2: site scores scaled to relative eigenvalues
sit.sc2 <- scores(res.pca, display = "wa", choices = c(1: p))
spe.sc2 <- scores(res.pca, display = "sp", choices = c(1: p))
plot(res.pca, choices = c(ax1, ax2), display = c("wa", "sp"), type = "n",
main = "PCA - scaling 2")
if (point) {
points(sit.sc2[,ax1], sit.sc2[,ax2], pch = 20)
text(res.pca, display = "wa", choices = c(ax1, ax2), cex = cex, pos = 3)
}
else
{
text(res.pca, display = "wa", choices = c(ax1, ax2), cex = cex)
}
text(res.pca, display = "sp", choices = c(ax1, ax2), cex = cex, pos = 4, col = "red")
arrows(0, 0, spe.sc2[,ax1], spe.sc2[,ax2], length=ahead, angle=20, col="red")
}
"pcacircle" <- function (pca)
{
eigenv <- pca$CA$eig
p <- length(eigenv)
n <- nrow(pca$CA$u)
tot <- sum(eigenv)
const <- ((n - 1) * tot) ^ 0.25
radius <- (2/ p) ^ 0.5
radius <- radius * const
symbols(0, 0, circles = radius, inches = FALSE, add = TRUE, fg = 2)
}
cleanplot.pca(env.pca)
In this kind of plot we can see the values which are similar depending on the distance between them. The scaling 1 plot also represents the contribution of every variable to the axis according to its size. The scaling 2 plot is a representation of the correlation between values where the correlation is the cosine of the angle (90º means no correlation at all).
This is the supervised version of the PCA, that is to say, an analysis where we need an explanatory matrix and a response matrix.
spe.hel <- decostand(spe, "hellinger")
(spe.rda <- rda(spe.hel ~ ., env[, -1])) # Scaling = 2 (by default)
## Call: rda(formula = spe.hel ~ alt + slo + flo + pH + har + pho +
## nit + amm + oxy + bdo, data = env[, -1])
##
## Inertia Proportion Rank
## Total 0.5025 1.0000
## Constrained 0.3590 0.7144 10
## Unconstrained 0.1435 0.2856 18
## Inertia is variance
##
## Eigenvalues for constrained axes:
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6 RDA7 RDA8 RDA9
## 0.22606 0.05338 0.03152 0.02750 0.00680 0.00570 0.00304 0.00279 0.00134
## RDA10
## 0.00087
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## 0.04579 0.02424 0.01828 0.01541 0.01005 0.00852 0.00522 0.00480
## (Showed only 8 of all 18 unconstrained eigenvalues)
summary(spe.rda)
##
## Call:
## rda(formula = spe.hel ~ alt + slo + flo + pH + har + pho + nit + amm + oxy + bdo, data = env[, -1])
##
## Partitioning of variance:
## Inertia Proportion
## Total 0.5025 1.0000
## Constrained 0.3590 0.7144
## Unconstrained 0.1435 0.2856
##
## Eigenvalues, and their contribution to the variance
##
## Importance of components:
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6
## Eigenvalue 0.2261 0.05338 0.03152 0.02750 0.006801 0.005696
## Proportion Explained 0.4499 0.10622 0.06273 0.05472 0.013534 0.011336
## Cumulative Proportion 0.4499 0.55609 0.61882 0.67354 0.687075 0.698411
## RDA7 RDA8 RDA9 RDA10 PC1 PC2
## Eigenvalue 0.003036 0.002794 0.001341 0.0008689 0.04579 0.02424
## Proportion Explained 0.006042 0.005561 0.002669 0.0017290 0.09112 0.04824
## Cumulative Proportion 0.704453 0.710014 0.712682 0.7144114 0.80553 0.85376
## PC3 PC4 PC5 PC6 PC7 PC8
## Eigenvalue 0.01828 0.01541 0.01005 0.008518 0.005224 0.004799
## Proportion Explained 0.03638 0.03066 0.02000 0.016950 0.010395 0.009549
## Cumulative Proportion 0.89015 0.92081 0.94080 0.957750 0.968146 0.977695
## PC9 PC10 PC11 PC12 PC13
## Eigenvalue 0.003111 0.002779 0.001824 0.001309 0.0007633
## Proportion Explained 0.006191 0.005530 0.003630 0.002604 0.0015189
## Cumulative Proportion 0.983886 0.989415 0.993045 0.995650 0.9971685
## PC14 PC15 PC16 PC17 PC18
## Eigenvalue 0.0007211 0.0003466 0.0001628 0.0001183 7.405e-05
## Proportion Explained 0.0014350 0.0006897 0.0003240 0.0002354 1.474e-04
## Cumulative Proportion 0.9986035 0.9992932 0.9996172 0.9998526 1.000e+00
##
## Accumulated constrained eigenvalues
## Importance of components:
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6
## Eigenvalue 0.2261 0.05338 0.03152 0.0275 0.006801 0.005696
## Proportion Explained 0.6297 0.14868 0.08781 0.0766 0.018944 0.015867
## Cumulative Proportion 0.6297 0.77838 0.86619 0.9428 0.961736 0.977603
## RDA7 RDA8 RDA9 RDA10
## Eigenvalue 0.003036 0.002794 0.001341 0.0008689
## Proportion Explained 0.008457 0.007784 0.003736 0.0024202
## Cumulative Proportion 0.986060 0.993844 0.997580 1.0000000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores: 1.93676
##
##
## Species scores
##
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6
## Cogo 0.12855 0.110633 -0.209194 -0.1071449 -0.028611 0.0141994
## Satr 0.63509 0.033253 0.208510 -0.1439723 0.019792 -0.0111685
## Phph 0.47357 0.105922 -0.090979 0.1251551 0.034336 0.0009744
## Neba 0.36572 0.112390 -0.056054 0.2106755 0.032913 -0.0126325
## Thth 0.12897 0.105128 -0.199485 -0.1260043 -0.044784 0.0073734
## Teso 0.06532 0.122998 -0.181470 -0.0790183 -0.007990 0.0165030
## Chna -0.17173 0.077858 -0.016572 0.0102750 0.027716 -0.0635971
## Chto -0.12774 0.164448 -0.035461 -0.0055416 0.088885 -0.0253512
## Lele -0.08582 0.037082 -0.018133 0.0519389 0.005327 0.0774115
## Lece -0.09794 -0.146753 -0.146155 0.1089256 -0.079503 0.0057278
## Baba -0.18219 0.213000 -0.044083 -0.0486677 0.016435 -0.0116779
## Spbi -0.15864 0.162595 -0.009591 -0.0036451 0.058718 -0.0070201
## Gogo -0.20564 0.032595 -0.031655 0.0255721 0.040118 0.0853105
## Eslu -0.11131 0.030323 0.059115 0.0556236 0.027627 0.0717972
## Pefl -0.09514 0.114076 0.026404 0.0972731 0.012597 -0.0081279
## Rham -0.20707 0.161102 0.039337 -0.0181777 0.010822 -0.0007049
## Legi -0.22981 0.110587 0.014642 0.0068236 -0.009266 -0.0471023
## Scer -0.16703 -0.001545 0.043678 0.0005884 0.007734 0.0904951
## Cyca -0.17709 0.142182 0.030518 -0.0146280 -0.006343 -0.0014130
## Titi -0.13869 0.119661 0.032858 0.1454023 -0.031858 -0.0078230
## Abbr -0.18814 0.111557 0.070736 -0.0277619 -0.054583 -0.0043464
## Icme -0.15010 0.074868 0.077897 -0.0270093 -0.088491 0.0069117
## Acce -0.30997 0.011383 0.037899 -0.0131401 -0.003935 0.0447406
## Ruru -0.31367 -0.154037 -0.058600 0.1366742 0.005703 -0.0531753
## Blbj -0.24063 0.085917 0.054743 -0.0089691 -0.056208 -0.0477626
## Alal -0.42634 -0.225180 -0.081617 -0.1348421 0.092524 -0.0256802
## Anan -0.19386 0.139705 0.046797 -0.0159675 -0.007872 0.0013463
##
##
## Site scores (weighted sums of species scores)
##
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6
## 1 0.40041 -0.295511 1.14270 -1.175086 -0.11686 -0.56914
## 2 0.53738 -0.030567 0.48623 -0.061268 0.45958 -0.61134
## 3 0.49599 -0.006984 0.46543 0.116524 0.58777 -0.18445
## 4 0.33605 0.039678 0.31614 0.545907 0.24808 0.33426
## 5 0.02646 -0.180823 0.20169 0.764075 -0.03656 1.27753
## 6 0.24631 -0.081858 0.13671 0.800683 -0.03204 0.33302
## 7 0.46870 -0.110516 0.25090 0.208714 0.04653 -0.11912
## 9 0.04338 -0.523846 -0.42403 1.026668 -1.06941 -1.07501
## 10 0.31677 -0.145957 -0.14045 0.542830 -0.03552 0.75199
## 11 0.48369 -0.037798 -0.29659 -0.345340 -0.60788 -0.32583
## 12 0.49406 0.017009 -0.29630 -0.325663 -0.52782 -0.37330
## 13 0.49904 0.186836 -0.44001 -0.725909 -0.33387 -0.22269
## 14 0.38323 0.228190 -0.61700 -0.523171 -0.36312 0.45362
## 15 0.28869 0.229638 -0.67296 -0.216753 -0.60059 0.75672
## 16 0.09032 0.415053 -0.38401 0.111877 0.50058 0.41060
## 17 -0.05323 0.450469 -0.41933 -0.033371 0.99567 -0.38515
## 18 -0.14222 0.415224 -0.40307 0.009911 0.82358 -0.15473
## 19 -0.28308 0.309911 -0.14823 0.320089 0.90414 -0.16896
## 20 -0.39851 0.223514 0.04728 0.198958 0.55326 -0.11193
## 21 -0.43175 0.272783 0.20026 0.087309 0.16304 0.02239
## 22 -0.46911 0.233362 0.24707 0.016741 -0.25624 -0.02843
## 23 -0.27656 -1.156080 -0.42377 -0.339974 0.05514 -1.29147
## 24 -0.40768 -0.771586 -0.17939 -0.397065 -0.00161 -1.12261
## 25 -0.35385 -0.808410 -0.08725 -0.263340 0.56199 1.91413
## 26 -0.47197 0.077281 0.25065 0.002092 -0.32899 0.03663
## 27 -0.47349 0.193232 0.28063 -0.004540 -0.37185 0.03277
## 28 -0.47634 0.209299 0.31020 -0.024574 -0.52938 0.01265
## 29 -0.37836 0.369162 0.18643 -0.171011 -0.27590 0.09357
## 30 -0.49433 0.279294 0.41006 -0.145314 -0.41174 0.31428
##
##
## Site constraints (linear combinations of constraining variables)
##
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6
## 1 0.56869 -0.158590 0.980138 -0.65131 0.051320 -0.06640
## 2 0.26581 0.036765 0.674201 0.26846 0.718990 -0.14076
## 3 0.37338 -0.139943 0.519657 -0.02578 0.066108 -0.25253
## 4 0.46595 0.062636 0.053487 0.42704 0.114337 0.39565
## 5 0.21562 -0.451325 0.011212 0.65932 -0.335005 -0.11493
## 6 0.37544 -0.148582 0.204815 0.24235 0.129476 -0.31015
## 7 0.55448 -0.146226 -0.035860 0.20126 -0.430822 0.36722
## 9 0.01718 -0.251178 -0.115538 1.06866 -0.296933 0.05943
## 10 0.17546 -0.138170 0.173185 0.06342 -0.165935 -0.17391
## 11 0.28137 0.165418 -0.130714 -0.23404 -0.397820 0.39167
## 12 0.32329 0.167258 -0.401398 -0.18786 0.264101 -0.28265
## 13 0.38209 0.107356 -0.507743 -0.32445 -0.147620 -0.06059
## 14 0.38151 0.152013 -0.579728 -0.35651 -0.281294 -0.03069
## 15 0.24564 0.271175 -0.412127 -0.33449 -0.279939 0.22797
## 16 -0.03340 0.250314 -0.184407 0.08800 0.073618 0.07840
## 17 -0.10817 0.258271 -0.343190 0.14689 0.513962 -0.40907
## 18 -0.06189 0.311727 -0.257640 0.07952 0.356584 0.22352
## 19 -0.03711 0.379177 -0.294081 -0.03274 0.349407 0.18128
## 20 -0.22431 0.375617 -0.070890 -0.03861 0.737369 -0.12087
## 21 -0.34883 0.258940 0.015988 0.10805 0.246878 -0.32125
## 22 -0.27772 0.068382 -0.008991 -0.20070 -0.189680 -0.46539
## 23 -0.18786 -1.048280 -0.339613 -0.71285 -0.008217 -0.42604
## 24 -0.51613 -0.555550 -0.319464 0.28272 -0.143026 -0.64407
## 25 -0.40978 -0.924082 -0.070743 -0.23578 0.678956 1.18999
## 26 -0.49964 0.002689 0.100691 0.26791 -0.374118 0.31294
## 27 -0.60047 0.063920 0.468172 -0.14854 -0.226959 -0.26419
## 28 -0.51391 0.403655 0.176190 0.07713 -0.038538 0.31623
## 29 -0.36818 0.324611 0.361512 -0.25986 -0.323494 0.09805
## 30 -0.43852 0.302002 0.332876 -0.23720 -0.661703 0.24111
##
##
## Biplot scores for constraining variables
##
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6
## alt 0.8281 -0.18518 0.37869 0.3532 0.04195 -0.05880
## slo 0.7399 -0.16882 0.44280 -0.0841 -0.03149 -0.10932
## flo -0.7820 0.22102 -0.05683 -0.3560 -0.22000 0.18778
## pH 0.1023 0.17603 -0.24148 -0.1881 -0.27655 0.07525
## har -0.5743 0.05063 -0.57561 -0.1145 -0.39249 0.22844
## pho -0.4936 -0.67392 -0.16113 -0.2175 0.17603 0.42118
## nit -0.7786 -0.22415 -0.19299 -0.2071 0.28305 0.33923
## amm -0.4747 -0.70878 -0.13815 -0.1882 0.32390 0.32113
## oxy 0.7648 0.58218 -0.06834 -0.2290 0.04329 -0.11011
## bdo -0.5169 -0.81046 -0.14577 -0.1320 0.05312 0.05688
Here we can see the values of the constrained axes. Another important value is the inertia, which is the explained variance of the model. Our explanatory matrix can explain a 71% of the total variance, which means that we have a good model.
sp <- spe.rda$CA$eig
sp[sp > mean(sp)] # Kaiser-Guttman test
## PC1 PC2 PC3 PC4 PC5 PC6
## 0.045786390 0.024239717 0.018282574 0.015406393 0.010047723 0.008517531
evplot(sp)
The criteria among the tests are non consistent. We see we have up to 4 significant axis according to the broken-stick test.
anova(spe.rda, step = 1000)
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = spe.hel ~ alt + slo + flo + pH + har + pho + nit + amm + oxy + bdo, data = env[, -1])
## Df Variance F Pr(>F)
## Model 10 0.35900 4.5028 0.001 ***
## Residual 18 0.14351
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The model is statiscally significant (p < 0.001)
anova(spe.rda, by = "axis", step = 1000)
## Permutation test for rda under reduced model
## Forward tests for axes
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = spe.hel ~ alt + slo + flo + pH + har + pho + nit + amm + oxy + bdo, data = env[, -1])
## Df Variance F Pr(>F)
## RDA1 1 0.226063 28.3542 0.001 ***
## RDA2 1 0.053376 6.6947 0.001 ***
## RDA3 1 0.031523 3.9538 0.111
## RDA4 1 0.027499 3.4491 0.231
## RDA5 1 0.006801 0.8530 1.000
## RDA6 1 0.005696 0.7145 1.000
## RDA7 1 0.003036 0.3808 1.000
## RDA8 1 0.002794 0.3505 1.000
## RDA9 1 0.001341 0.1682 1.000
## RDA10 1 0.000869 0.1090 1.000
## Residual 18 0.143511
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Just the two first axis are significants.
anova(spe.rda, by = "margin", step = 1000)
## Permutation test for rda under reduced model
## Marginal effects of terms
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = spe.hel ~ alt + slo + flo + pH + har + pho + nit + amm + oxy + bdo, data = env[, -1])
## Df Variance F Pr(>F)
## alt 1 0.008165 1.0240 0.382
## slo 1 0.011706 1.4682 0.202
## flo 1 0.015431 1.9354 0.101
## pH 1 0.004976 0.6241 0.644
## har 1 0.015158 1.9012 0.095 .
## pho 1 0.005999 0.7525 0.522
## nit 1 0.003812 0.4781 0.787
## amm 1 0.003209 0.4025 0.856
## oxy 1 0.025887 3.2469 0.018 *
## bdo 1 0.004806 0.6028 0.688
## Residual 18 0.143511
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RsquareAdj(spe.rda); vif.cca(spe.rda)
## $r.squared
## [1] 0.7144114
##
## $adj.r.squared
## [1] 0.555751
## alt slo flo pH har pho nit
## 17.475490 4.221651 6.021996 1.471728 2.994588 25.379501 15.630086
## amm oxy bdo
## 30.353271 7.623932 18.098637
Because we have an explanatory matrix, the matrix we use in PCA analysis becomes triplot.
par(mfrow = c(1, 2))
# Scaling = 1
plot(spe.rda, scaling = 1,
main = "scaling 1")
spe.sc <- scores(spe.rda, choices = 1:2, scaling = 1, display = "sp")
arrows(0, 0, spe.sc[, 1], spe.sc[, 2], length = 0, lty = 1, col ="red")
# Scaling = 2
plot(spe.rda, main = "scaling 2")
spe2.sc <- scores(spe.rda, choices = 1:2, display = "sp")
arrows(0, 0, spe2.sc[, 1], spe2.sc[, 2], length = 0, lty = 1, col = "red")
We can see painted in blue the arrows of the explanatory matrix. The longer they are the more variance they explain.
We will use different criteria to choose this:
First up, we are going to create a model with every explanatory variable and see how it behaves. Then we will use the step function to see how many variables helps us to create the fittest model.
vif.cca(spe.rda)
## alt slo flo pH har pho nit
## 17.475490 4.221651 6.021996 1.471728 2.994588 25.379501 15.630086
## amm oxy bdo
## 30.353271 7.623932 18.098637
spe.rda.all <- rda(spe.hel ~ ., data=env)
(R2a.all <- RsquareAdj(spe.rda.all)$adj.r.squared)
## [1] 0.5577387
(step.forward <- ordistep(rda(spe.hel ~ 1, data = env), trace = 0,
scope = formula(spe.rda.all), direction = "forward", pstep = 1000))
## Call: rda(formula = spe.hel ~ dfs + oxy + bdo + alt, data = env)
##
## Inertia Proportion Rank
## Total 0.5025 1.0000
## Constrained 0.3199 0.6365 4
## Unconstrained 0.1827 0.3635 24
## Inertia is variance
##
## Eigenvalues for constrained axes:
## RDA1 RDA2 RDA3 RDA4
## 0.22016 0.05089 0.02880 0.02002
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## 0.05290 0.03130 0.02457 0.01695 0.01375 0.01193 0.00736 0.00575
## (Showed only 8 of all 24 unconstrained eigenvalues)
The suggested model will be: \(dfs + oxy + bdo + alt + har\)
(step.forward <- ordiR2step(rda(spe.hel ~ 1, data = env), trace = 0,
scope = formula(spe.rda.all), direction = "forward", pstep = 1000))
## Call: rda(formula = spe.hel ~ dfs + oxy + bdo, data = env)
##
## Inertia Proportion Rank
## Total 0.5025 1.0000
## Constrained 0.2947 0.5865 3
## Unconstrained 0.2078 0.4135 25
## Inertia is variance
##
## Eigenvalues for constrained axes:
## RDA1 RDA2 RDA3
## 0.21874 0.05015 0.02581
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## 0.05422 0.04398 0.02639 0.01803 0.01486 0.01298 0.00971 0.00633
## (Showed only 8 of all 25 unconstrained eigenvalues)
This time the suggested model is \(dfs + oxy + bdo\)
RsquareAdj(rda(spe.hel ~ dfs, data = env))$adj.r.squared
## [1] 0.3676474
RsquareAdj(rda(spe.hel ~ dfs + oxy, data = env))$adj.r.squared
## [1] 0.4706162
RsquareAdj(rda(spe.hel ~ dfs + oxy + bdo, data = env))$adj.r.squared
## [1] 0.5368359
The result is self-explanatory. The best model is \(dfs + oxy + bdo\).
(spe.rda.pars <- rda(spe.hel ~ dfs + oxy + bdo, data = env))
## Call: rda(formula = spe.hel ~ dfs + oxy + bdo, data = env)
##
## Inertia Proportion Rank
## Total 0.5025 1.0000
## Constrained 0.2947 0.5865 3
## Unconstrained 0.2078 0.4135 25
## Inertia is variance
##
## Eigenvalues for constrained axes:
## RDA1 RDA2 RDA3
## 0.21874 0.05015 0.02581
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
## 0.05422 0.04398 0.02639 0.01803 0.01486 0.01298 0.00971 0.00633
## (Showed only 8 of all 25 unconstrained eigenvalues)
anova(spe.rda.pars, step = 1000)
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = spe.hel ~ dfs + oxy + bdo, data = env)
## Df Variance F Pr(>F)
## Model 3 0.29470 11.818 0.001 ***
## Residual 25 0.20781
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(spe.rda.pars, step = 1000, by = "axis")
## Permutation test for rda under reduced model
## Forward tests for axes
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = spe.hel ~ dfs + oxy + bdo, data = env)
## Df Variance F Pr(>F)
## RDA1 1 0.218736 26.3147 0.001 ***
## RDA2 1 0.050153 6.0336 0.001 ***
## RDA3 1 0.025813 3.1054 0.004 **
## Residual 25 0.207808
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(spe.rda.pars, step = 1000, by = "margin")
## Permutation test for rda under reduced model
## Marginal effects of terms
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = spe.hel ~ dfs + oxy + bdo, data = env)
## Df Variance F Pr(>F)
## dfs 1 0.097402 11.7178 0.001 ***
## oxy 1 0.045476 5.4710 0.004 **
## bdo 1 0.039212 4.7173 0.006 **
## Residual 25 0.207808
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(R2a.pars <- RsquareAdj(spe.rda.pars)$adj.r.squared)
## [1] 0.5368359
All the variables are statiscally significant (p < 0.005).
vif.cca(spe.rda.all)
## dfs alt slo flo pH har
## 135.817134 51.714483 5.324914 45.740857 1.807935 4.400607
## pho nit amm oxy bdo
## 27.453045 16.614554 30.685220 17.493662 18.210558
vif.cca(spe.rda.pars)
## dfs oxy bdo
## 1.496925 4.116333 3.426102
The most parsimonious model is a model with just three explanatory variables: \(dfs + oxy + bdo\).
And now let’s finish this long analysis with a triplot with the new formula.
par(mfrow = c(1, 2))
# Scaling 1
plot(spe.rda.pars, scaling = 1, display = c("sp", "lc", "cn"),
main = "Scaling 1")
spe4.sc <- scores(spe.rda.pars, choices = 1:2, scaling = 1, display = "sp")
arrows(0, 0, spe4.sc[, 1], spe4.sc[, 2], length = 0, lty = 1, col = "red")
# Scaling 2
plot(spe.rda.pars, display = c("sp", "lc", "cn"),
main = "Scaling 2")
spe5.sc <- scores(spe.rda.pars, choices = 1:2, display = "sp")
arrows(0, 0, spe5.sc[,1], spe5.sc[,2], length = 0, lty = 1, col = "red")
Compared to the previous triplot, this time we only have three blue arrows (namely, explanatory variables), whith a low intercorrelation between them.