library(vegan)
library(vcfR)
library(SoDA)
library(adespatial)
library(ggplot2)
library(kableExtra)
library(reshape2)
Transform the vcf file to plink format (see the lecture 6) using your terminal.
Calculate allele frequencies from the .tped and .tfam file using your terminal. Yet, you have to change the first column of the .tfam and add the sampling site information (the groups from which you want the program to calculate the allele frequencies).
plink --tped filtered_2719neutral_snps.recode.tped --tfam filtered_2719neutral_snps.recode.tfam --freq --within cluster-plink-sea-cumcumber.txt --out filtered_2719neutral_snps.recode
Download allele frequencies obtained with PLINK.
allele_frequencies <- read.table("00-Data/2719neutral_snps_noAK.frq.strat", header=T, sep="\t", dec=".")
Select only the second (the locus) and the third (the sampling site) columns.
allele_frequencies2 <- allele_frequencies[,2:3]
allele_frequencies3 <- cbind(allele_frequencies2, allele_frequencies$MAF)
colnames(allele_frequencies3)=c("SNP","POP","MAF")
| CHR | SNP | CLST | A1 | A2 | MAF | MAC | NCHROBS |
|---|---|---|---|---|---|---|---|
| 1 | 54300_20 | 1 | C | A | 0.05357 | 3 | 56 |
| 1 | 54300_20 | 2 | C | A | 0.02000 | 1 | 50 |
| 1 | 54300_20 | 3 | C | A | 0.04348 | 2 | 46 |
| 1 | 54300_20 | 4 | C | A | 0.00000 | 0 | 58 |
| 1 | 54300_20 | 5 | C | A | 0.00000 | 0 | 34 |
Convert from long to wide.
allele_frequencies4 <- dcast(allele_frequencies3, POP~SNP)
write.table(allele_frequencies4, "2719neutralsnps-freq-sites.txt", quote=F, sep="\t")
Import allele frequencies obtained from PLINK.
geno <- read.table("2719neutralsnps-freq-sites.txt", header=TRUE)
| POP | X10_7 | X10013_19 | X10018_31 | X10020_41 |
|---|---|---|---|---|
| 1 | 0.01724 | 0.00000 | 0.00000 | 0.01613 |
| 2 | 0.01667 | 0.01613 | 0.00000 | 0.00000 |
| 3 | 0.00000 | 0.00000 | 0.00000 | 0.01786 |
| 4 | 0.03333 | 0.00000 | 0.03448 | 0.01667 |
| 5 | 0.00000 | 0.00000 | 0.00000 | 0.00000 |
| 6 | 0.02083 | 0.00000 | 0.01852 | 0.00000 |
Remove sampling site information.
geno <- geno[,-1]
Remember RDA is a two-steps process : 1) ordination and then 2) regression.
Here we perform the first step, the ordination of allele frequencies by sampling site¨. However, PCA requires that the variables be linearly related to each other and at approximately the same scale, which is not necessarily the case for our allele frequencies, which may subsequently lead to poor results. We must therefore apply a Hellinger transformation before the analyses. Indeed, Hellinger distance has good statistical properties, as indicated by the R2 and monotonicity criteria used by Legendre and Gallagher (2001) in their comparison of transformation methods. The Hellinger transformation is available in the decostand() function of VEGAN (method = “hellinger”). Here, we perform a Hellinger transformation on the dataset.
geno.h <- decostand(geno, "hellinger")
Next, we can perform a Principal Component Analysis (PCA) of these transformed allele frequencies. We will use the function prcomp to do the PCA. It is highly recommended to set the argument scale=TRUE. This standardize the input data so that it has zero mean and variance one before doing PCA.
geno.pca <- prcomp(geno.h, scale = T)
summary(geno.pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 18.5874 13.80686 13.37751 12.00739 11.89525 11.76301
## Proportion of Variance 0.1271 0.07011 0.06582 0.05303 0.05204 0.05089
## Cumulative Proportion 0.1271 0.19718 0.26299 0.31602 0.36806 0.41895
## PC7 PC8 PC9 PC10 PC11 PC12
## Standard deviation 11.6947 11.55599 11.4839 11.30375 11.21593 11.18517
## Proportion of Variance 0.0503 0.04911 0.0485 0.04699 0.04627 0.04601
## Cumulative Proportion 0.4693 0.51836 0.5669 0.61386 0.66013 0.70614
## PC13 PC14 PC15 PC16 PC17 PC18
## Standard deviation 10.9381 10.89881 10.78954 10.77625 10.55668 10.49443
## Proportion of Variance 0.0440 0.04369 0.04282 0.04271 0.04099 0.04051
## Cumulative Proportion 0.7501 0.79383 0.83664 0.87935 0.92034 0.96084
## PC19 PC20
## Standard deviation 10.31829 1.641e-14
## Proportion of Variance 0.03916 0.000e+00
## Cumulative Proportion 1.00000 1.000e+00
Q1. How many PC components do you obtain? Q2. What percentage of variation does the PC5 component account for?
Visualize the genomic variation explained by each axis according to the broken stick model. The broken-stick model is one of the many techniques that have been proposed for choosing the number of principal components to retain during a principal component analysis.
The PCA seeks for a restricted dimension to maximise the inertia of the cloud when it is projected into the q-dimensional subspace generated by these factors. Then the inertia represents the sum of the variances of the variables under consideration.
The broken-stick model will then retain PC components that explain more variance than would be expected by randomly dividing the variance into p parts (line red over the black line). The observed proportions that are higher than the expected proportions indicate which principal components to keep.
screeplot(geno.pca, npcs = 20, type = "lines", bstick = T)
You could also choose to select PC components based on the Kaiser - Guttman rule. In a standard PCA, the $sdev gives the fraction of information (sdev=“standard deviation”) contained in each of the PC components. In this rule, we consider an axis to be interesting if its eigenvalue is greater the mean eigenvalue.
ev <- geno.pca$sdev^2
ev
## [1] 3.454916e+02 1.906293e+02 1.789577e+02 1.441774e+02 1.414971e+02
## [6] 1.383685e+02 1.367665e+02 1.335409e+02 1.318802e+02 1.277748e+02
## [11] 1.257970e+02 1.251081e+02 1.196416e+02 1.187840e+02 1.164142e+02
## [16] 1.161275e+02 1.114436e+02 1.101331e+02 1.064671e+02 2.693243e-28
ev[ev > mean(ev)]
## [1] 345.4916 190.6293 178.9577 144.1774 141.4971 138.3685 136.7665
Q3. How many PC components do you keep using the broken stick model and the Kaiser-Guttman? What percentage of the genomic variation is explained by these PC components cumulatively?
Select the adequate PC components.
geno.pca.axes <- geno.pca$x[,1:7] # select 7 PC axes
head(geno.pca.axes)
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## 1 -21.348747 5.770015 -3.539039 2.840527 -3.342012 4.602176 -2.1080942
## 2 -17.897240 -4.432905 7.130377 -5.159484 -3.041278 1.886322 -4.7494497
## 3 -23.854185 5.242946 2.724128 -4.202315 -15.571076 11.106313 24.4757440
## 4 -17.666731 8.564480 3.586660 -4.598861 5.001930 20.858805 -0.9250808
## 5 -9.930354 -54.783111 -14.774534 -1.066448 1.895834 -1.729385 0.6257380
## 6 -16.994368 3.991230 6.578290 3.168137 -24.916286 -22.502131 3.8859886
Download population map information.
pop <- read.table("00-Data/population_map_sea_cucumber.txt", header=TRUE)
Import sampling site information.
sites <- read.table("00-Data/24_sites.txt", header=TRUE)
sites <- sites[-c(21:24), ]
geo <- sites[, c(5:6)]
geo
## Lat Long
## 1 48.40822 -123.3875
## 2 48.75678 -123.3799
## 3 49.47478 -124.1826
## 4 49.75303 -124.0011
## 5 49.24820 -125.9384
## 6 50.52120 -126.5646
## 7 50.33027 -125.4666
## 8 50.65893 -126.2377
## 9 50.62737 -127.1282
## 10 50.49977 -127.8717
## 11 50.89923 -127.8512
## 12 51.27213 -127.8038
## 13 51.69000 -128.1425
## 14 52.71327 -128.5782
## 15 54.19700 -130.3660
## 16 54.68200 -130.4640
## 17 52.63200 -131.3960
## 18 52.93882 -131.9049
## 19 53.39900 -132.6600
## 20 54.10000 -132.5531
Transform spatial coordinates into cartesian coordinates (order is lat,long).
geo$CAR <- geoXY(geo$Lat, geo$Long)
geo
## Lat Long CAR.X CAR.Y
## 1 48.40822 -123.3875 686068.821 0.000
## 2 48.75678 -123.3799 686629.218 38761.185
## 3 49.47478 -124.1826 627300.301 118611.385
## 4 49.75303 -124.0011 640717.799 149558.799
## 5 49.24820 -125.9384 497472.140 93411.527
## 6 50.52120 -126.5646 451150.728 235003.286
## 7 50.33027 -125.4666 532366.901 213764.386
## 8 50.65893 -126.2377 475333.949 250324.799
## 9 50.62737 -127.1282 409455.077 246813.279
## 10 50.49977 -127.8717 354440.312 232619.066
## 11 50.89923 -127.8512 355963.444 277056.743
## 12 51.27213 -127.8038 359464.780 318541.836
## 13 51.69000 -128.1425 334403.422 365032.626
## 14 52.71327 -128.5782 302159.207 478892.709
## 15 54.19700 -130.3660 169829.324 644024.517
## 16 54.68200 -130.4640 162574.699 698011.404
## 17 52.63200 -131.3960 93578.837 469849.346
## 18 52.93882 -131.9049 55900.933 503992.588
## 19 53.39900 -132.6600 0.000 555206.136
## 20 54.10000 -132.5531 7918.014 633227.663
Create euclidian distance matrix.
euclidian_distances <- dist(geo$CAR, method="euclidean")
Perform dbMEM on euclidian distances.
dbMEM <- dbmem(euclidian_distances, MEM.autocor = "non-null", store.listw = TRUE)
names(dbMEM)
## [1] "MEM1" "MEM2" "MEM3" "MEM4" "MEM5" "MEM6" "MEM7" "MEM8" "MEM9"
## [10] "MEM10" "MEM11" "MEM12" "MEM13" "MEM14" "MEM15" "MEM16" "MEM17" "MEM18"
## [19] "MEM19"
Transform into a dataframe.
dbMEM.vectors <- as.data.frame(dbMEM)
| MEM1 | MEM2 | MEM3 |
|---|---|---|
| -0.2809907 | -1.5475630 | 1.3851299 |
| -0.1807063 | -1.7834844 | 0.9363306 |
| 0.1403731 | -1.8547212 | 0.1973885 |
| 0.3023279 | -1.7696849 | -0.0642742 |
| 0.6675319 | -1.1053684 | -1.1333367 |
| 1.0894456 | 0.3807982 | -0.6478060 |
| 0.9834678 | -0.2996364 | -1.1767929 |
| 1.1070556 | -0.0047903 | -0.5705352 |
| 1.0467669 | 0.7545681 | -0.5966470 |
| 1.0381771 | 0.7593626 | -0.5714938 |
| 0.9405340 | 0.9906100 | -0.1075976 |
| 0.8929247 | 1.1317322 | 0.9930829 |
| 0.7318539 | 1.1869507 | 1.5022544 |
| -0.2855643 | 0.6809658 | 2.6978790 |
| -1.5231833 | 0.4506898 | -0.8071367 |
| -0.8861806 | 0.1595813 | -0.5291624 |
| -1.4390876 | 0.5613184 | 0.4886097 |
| -1.4065878 | 0.4270411 | -0.5866334 |
| -1.4052371 | 0.4259335 | -0.5949010 |
| -1.5329207 | 0.4556967 | -0.8143582 |
Create a model with intercept only and a full model.
dbmem.mod0 <- rda(geno.pca.axes ~1, dbMEM.vectors)
dbmem.mod1 <- rda(geno.pca.axes ~., dbMEM.vectors)
Now, we need to select the best model and retain only significant explanatory and we do not want to do this manually. Therefore, we use the function ordiR2step that builds model so that it maximizes adjusted R2 at every step and stopping when the adjusted R2 starts to decrease. The function ordistep can do forward, backward and stepwise model selection using permutation tests. Perform ordistep in both directions.
dbmem.all.ord <- ordistep(dbmem.mod0, scope = formula(dbmem.mod1), direction = "both", permutations = 10000)
##
## Start: geno.pca.axes ~ 1
##
## Df AIC F Pr(>F)
## + MEM1 1 142.83 3.0982 0.0029 **
## + MEM2 1 143.48 2.4172 0.0119 *
## + MEM4 1 144.25 1.6457 0.1592
## + MEM5 1 144.53 1.3756 0.2087
## + MEM17 1 144.67 1.2385 0.2683
## + MEM14 1 144.79 1.1243 0.3532
## + MEM19 1 144.87 1.0461 0.4142
## + MEM6 1 144.88 1.0426 0.4243
## + MEM15 1 144.88 1.0362 0.4251
## + MEM3 1 144.97 0.9553 0.4824
## + MEM8 1 145.09 0.8422 0.5813
## + MEM18 1 145.27 0.6731 0.7364
## + MEM12 1 145.34 0.6052 0.7892
## + MEM16 1 145.31 0.6334 0.8019
## + MEM9 1 145.54 0.4201 0.8431
## + MEM10 1 145.45 0.5008 0.8565
## + MEM7 1 145.61 0.3539 0.9053
## + MEM13 1 145.64 0.3270 0.9350
## + MEM11 1 145.84 0.1463 0.9812
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: geno.pca.axes ~ MEM1
##
## Df AIC F Pr(>F)
## - MEM1 1 144 3.0982 0.002 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Df AIC F Pr(>F)
## + MEM2 1 141.84 2.7392 0.0005 ***
## + MEM4 1 142.76 1.8509 0.1019
## + MEM5 1 143.09 1.5431 0.1276
## + MEM17 1 143.26 1.3875 0.1823
## + MEM14 1 143.40 1.2581 0.2628
## + MEM19 1 143.50 1.1697 0.3345
## + MEM15 1 143.51 1.1586 0.3371
## + MEM6 1 143.50 1.1658 0.3470
## + MEM3 1 143.61 1.0673 0.4039
## + MEM8 1 143.75 0.9398 0.5086
## + MEM18 1 143.96 0.7500 0.6899
## + MEM12 1 144.05 0.6739 0.7446
## + MEM16 1 144.01 0.7054 0.7477
## + MEM9 1 144.28 0.4669 0.8218
## + MEM10 1 144.18 0.5570 0.8261
## + MEM7 1 144.37 0.3931 0.8954
## + MEM13 1 144.40 0.3631 0.9222
## + MEM11 1 144.64 0.1622 0.9796
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: geno.pca.axes ~ MEM1 + MEM2
##
## Df AIC F Pr(>F)
## - MEM2 1 142.83 2.7392 0.005399 **
## - MEM1 1 143.48 3.3976 0.001500 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Df AIC F Pr(>F)
## + MEM5 1 141.81 1.7114 0.09279 .
## + MEM4 1 141.42 2.0588 0.09419 .
## + MEM17 1 142.00 1.5365 0.13259
## + MEM14 1 142.17 1.3915 0.20718
## + MEM19 1 142.28 1.2926 0.25617
## + MEM15 1 142.30 1.2802 0.26577
## + MEM6 1 142.29 1.2883 0.26847
## + MEM3 1 142.42 1.1782 0.32277
## + MEM8 1 142.58 1.0363 0.43396
## + MEM18 1 142.83 0.8255 0.59864
## + MEM12 1 142.93 0.7412 0.66673
## + MEM16 1 142.89 0.7761 0.67283
## + MEM9 1 143.21 0.5125 0.75222
## + MEM10 1 143.09 0.6120 0.76732
## + MEM7 1 143.31 0.4312 0.84682
## + MEM13 1 143.35 0.3982 0.88721
## + MEM11 1 143.62 0.1775 0.96610
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Check the explanatory variables to keep.
dbmem.all.ord$anova
## Df AIC F Pr(>F)
## + MEM1 1 142.83 3.0982 0.0029 **
## + MEM2 1 141.84 2.7392 0.0005 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
*** Q4. Which dbMEM variable is selected by the ordistep function?***
Select the db-MEM vectors kept by ordistep function.
dbmem.sub <- dbMEM.vectors[,c(1,2)]
Perform a RDA using the db-MEM vectors kept by ordistep as explanatory variables.
dbmem.mod.sel <- rda(geno.pca.axes ~., as.data.frame(dbmem.sub))
summary(dbmem.mod.sel)
##
## Call:
## rda(formula = geno.pca.axes ~ MEM1 + MEM2, data = as.data.frame(dbmem.sub))
##
## Partitioning of variance:
## Inertia Proportion
## Total 1275.9 1.0000
## Constrained 338.4 0.2652
## Unconstrained 937.5 0.7348
##
## Eigenvalues, and their contribution to the variance
##
## Importance of components:
## RDA1 RDA2 PC1 PC2 PC3 PC4
## Eigenvalue 299.876 38.5373 198.8345 177.4528 144.4179 142.3079
## Proportion Explained 0.235 0.0302 0.1558 0.1391 0.1132 0.1115
## Cumulative Proportion 0.235 0.2652 0.4211 0.5602 0.6733 0.7849
## PC5 PC6 PC7
## Eigenvalue 138.1969 109.27259 26.99205
## Proportion Explained 0.1083 0.08564 0.02116
## Cumulative Proportion 0.8932 0.97884 1.00000
##
## Accumulated constrained eigenvalues
## Importance of components:
## RDA1 RDA2
## Eigenvalue 299.8761 38.5373
## Proportion Explained 0.8861 0.1139
## Cumulative Proportion 0.8861 1.0000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores: 12.4779
##
##
## Species scores
##
## RDA1 RDA2 PC1 PC2 PC3 PC4
## PC1 5.91141 0.2134 -1.710377 0.3903 0.4521 -0.5569
## PC2 0.98140 -0.4210 4.446254 1.3393 0.2797 -0.3764
## PC3 -0.37326 -0.7341 -1.206774 4.3380 -0.2653 0.2207
## PC4 0.20974 -0.4854 0.060843 -0.2893 -3.6757 -1.8019
## PC5 -0.05799 -0.9383 -0.227545 -0.3676 1.8278 -2.8822
## PC6 -0.38780 1.4332 -0.001505 0.6584 -0.1505 -1.4029
## PC7 -0.59093 0.8782 -0.240104 0.4885 0.6293 -1.8285
##
##
## Site scores (weighted sums of species scores)
##
## RDA1 RDA2 PC1 PC2 PC3 PC4
## 1 -3.2510 0.44974 2.53806 -0.1640 -1.3445 0.35869
## 2 -3.0508 -0.70096 -0.22020 1.4324 0.1075 1.79903
## 3 -4.2529 9.51891 1.82669 2.1966 -0.2604 -0.22963
## 4 -2.9012 3.53269 1.94690 2.0383 1.1994 -2.00501
## 5 -2.9235 6.08130 -9.29237 -6.2663 -0.1209 0.73792
## 6 -2.4720 -3.64297 1.31018 0.9139 -3.1256 5.51733
## 7 -2.7229 -0.05948 0.08308 1.1413 2.4859 -0.51767
## 8 -1.3623 -12.82619 1.97878 -1.7813 2.0174 -0.11193
## 9 -1.8435 -0.78193 1.71143 -0.1226 -0.5093 -2.32717
## 10 -0.3534 0.33515 -0.69617 -0.6250 1.7149 -1.26172
## 11 -1.9443 -6.61197 2.73450 -1.8567 -1.0809 -0.35680
## 12 -0.6098 -7.02383 1.04132 -0.3932 -3.2591 -0.92244
## 13 2.4432 -1.41247 -3.01571 3.9750 5.4585 -0.64320
## 14 3.6858 4.65300 -2.32083 2.9619 -4.9740 4.56193
## 15 3.5835 -0.07346 -0.97233 0.6190 -0.1357 0.05609
## 16 3.6809 3.51254 -0.65901 -1.1174 -4.8540 -8.53102
## 17 3.5819 0.55056 -1.20794 2.3392 2.8339 -1.84439
## 18 3.5993 -2.00564 -1.08424 2.5118 2.9710 1.45268
## 19 3.9532 11.08076 4.42785 -7.6181 3.7540 2.57232
## 20 3.1597 -4.57576 -0.12997 -0.1846 -2.8780 1.69499
##
##
## Site constraints (linear combinations of constraining variables)
##
## RDA1 RDA2 PC1 PC2 PC3 PC4
## 1 -2.2416 3.773 2.53806 -0.1640 -1.3445 0.35869
## 2 -2.8846 4.086 -0.22020 1.4324 0.1075 1.79903
## 3 -3.6911 3.648 1.82669 2.1966 -0.2604 -0.22963
## 4 -3.8764 3.173 1.94690 2.0383 1.1994 -2.00501
## 5 -3.4292 1.105 -9.29237 -6.2663 -0.1209 0.73792
## 6 -1.5967 -2.796 1.31018 0.9139 -3.1256 5.51733
## 7 -2.6193 -1.170 0.08308 1.1413 2.4859 -0.51767
## 8 -2.3397 -2.017 1.97878 -1.7813 2.0174 -0.11193
## 9 -0.8226 -3.505 1.71143 -0.1226 -0.5093 -2.32717
## 10 -0.7957 -3.500 -0.69617 -0.6250 1.7149 -1.26172
## 11 -0.1668 -3.808 2.73450 -1.8567 -1.0809 -0.35680
## 12 0.1918 -4.018 1.04132 -0.3932 -3.2591 -0.92244
## 13 0.6321 -3.839 -3.01571 3.9750 5.4585 -0.64320
## 14 1.8479 -0.911 -2.32083 2.9619 -4.9740 4.56193
## 15 4.0322 1.840 -0.97233 0.6190 -0.1357 0.05609
## 16 2.1580 1.286 -0.65901 -1.1174 -4.8540 -8.53102
## 17 4.0577 1.453 -1.20794 2.3392 2.8339 -1.84439
## 18 3.7434 1.676 -1.08424 2.5118 2.9710 1.45268
## 19 3.7385 1.676 4.42785 -7.6181 3.7540 2.57232
## 20 4.0619 1.847 -0.12997 -0.1846 -2.8780 1.69499
##
##
## Biplot scores for constraining variables
##
## RDA1 RDA2 PC1 PC2 PC3 PC4
## MEM1 -0.7546 -0.6561 0 0 0 0
## MEM2 0.6561 -0.7546 0 0 0 0
Check the adjusted R2 value.
RsquareAdj(dbmem.mod.sel)
Q5. How much percent of the genomic variation is explained by this model?
Check the significance for the global model.
anova(dbmem.mod.sel, permutations = 999)
Check the significance for each variable.
anova(dbmem.mod.sel, by = "margin", permutations = 999)
Check the significance for each axis.
anova(dbmem.mod.sel, by = "axis", permutations = 999)
Import results from AEM vectors.
AEM.vectors.wt <- read.table("00-Data/AEM_weight_all_vectors.txt", header=TRUE)
Create a model with intercept only and a full model.
aem.mod0 <- rda(geno.pca.axes ~1, AEM.vectors.wt)
aem.mod1 <- rda(geno.pca.axes ~., AEM.vectors.wt)
Perform ordistep in both directions.
aem.sel.ord <- ordistep(aem.mod0, scope = formula(aem.mod1), direction = "forward", permutations = 999)
##
## Start: geno.pca.axes ~ 1
##
## Df AIC F Pr(>F)
## + V1 1 140.60 5.5867 0.001 ***
## + V2 1 143.92 1.9706 0.042 *
## + V19 1 143.81 2.0886 0.073 .
## + V6 1 144.52 1.3847 0.207
## + V10 1 144.71 1.1972 0.316
## + V4 1 144.77 1.1422 0.383
## + V3 1 145.00 0.9205 0.537
## + V14 1 145.02 0.9021 0.555
## + V17 1 145.05 0.8735 0.558
## + V12 1 145.23 0.7116 0.641
## + V8 1 145.33 0.6192 0.705
## + V5 1 145.29 0.6554 0.708
## + V16 1 145.53 0.4263 0.811
## + V18 1 145.75 0.2268 0.904
## + V9 1 145.64 0.3269 0.908
## + V15 1 145.68 0.2936 0.910
## + V11 1 145.61 0.3589 0.915
## + V7 1 145.62 0.3460 0.942
## + V13 1 145.80 0.1833 0.970
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: geno.pca.axes ~ V1
##
## Df AIC F Pr(>F)
## + V2 1 139.83 2.5246 0.002 **
## + V19 1 139.67 2.6813 0.049 *
## + V6 1 140.63 1.7556 0.126
## + V10 1 140.89 1.5128 0.196
## + V4 1 140.97 1.4420 0.242
## + V3 1 141.28 1.1576 0.358
## + V17 1 141.34 1.0976 0.366
## + V14 1 141.30 1.1340 0.379
## + V12 1 141.57 0.8915 0.542
## + V8 1 141.71 0.7746 0.556
## + V5 1 141.65 0.8204 0.572
## + V16 1 141.98 0.5315 0.684
## + V9 1 142.12 0.4069 0.759
## + V18 1 142.27 0.2818 0.853
## + V11 1 142.08 0.4469 0.859
## + V15 1 142.17 0.3652 0.859
## + V7 1 142.09 0.4308 0.903
## + V13 1 142.33 0.2276 0.944
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: geno.pca.axes ~ V1 + V2
##
## Df AIC F Pr(>F)
## + V19 1 138.42 2.9679 0.040 *
## + V6 1 139.55 1.9272 0.094 .
## + V10 1 139.86 1.6572 0.163
## + V4 1 139.94 1.5786 0.191
## + V3 1 140.31 1.2641 0.293
## + V14 1 140.34 1.2381 0.305
## + V17 1 140.38 1.1979 0.359
## + V12 1 140.65 0.9713 0.462
## + V5 1 140.74 0.8932 0.504
## + V8 1 140.80 0.8430 0.566
## + V16 1 141.12 0.5772 0.687
## + V9 1 141.28 0.4414 0.760
## + V11 1 141.23 0.4849 0.826
## + V18 1 141.45 0.3054 0.838
## + V15 1 141.34 0.3960 0.842
## + V7 1 141.25 0.4675 0.867
## + V13 1 141.52 0.2465 0.946
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: geno.pca.axes ~ V1 + V2 + V19
##
## Df AIC F Pr(>F)
## + V6 1 137.70 2.1909 0.047 *
## + V4 1 138.17 1.7872 0.122
## + V10 1 138.06 1.8779 0.125
## + V3 1 138.61 1.4258 0.233
## + V14 1 138.64 1.3960 0.250
## + V17 1 138.70 1.3501 0.271
## + V12 1 139.02 1.0918 0.386
## + V5 1 139.13 1.0030 0.426
## + V8 1 139.20 0.9462 0.510
## + V16 1 139.58 0.6458 0.729
## + V9 1 139.78 0.4931 0.752
## + V11 1 139.71 0.5420 0.777
## + V7 1 139.74 0.5224 0.827
## + V15 1 139.84 0.4421 0.856
## + V18 1 139.97 0.3406 0.874
## + V13 1 140.06 0.2748 0.938
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: geno.pca.axes ~ V1 + V2 + V19 + V6
##
## Df AIC F Pr(>F)
## + V10 1 136.97 2.0461 0.064 .
## + V4 1 137.09 1.9455 0.078 .
## + V3 1 137.60 1.5466 0.168
## + V17 1 137.71 1.4634 0.189
## + V14 1 137.64 1.5138 0.190
## + V12 1 138.08 1.1804 0.371
## + V5 1 138.21 1.0835 0.386
## + V8 1 138.29 1.0215 0.443
## + V16 1 138.73 0.6952 0.695
## + V11 1 138.88 0.5828 0.730
## + V9 1 138.95 0.5300 0.774
## + V7 1 138.91 0.5616 0.793
## + V15 1 139.03 0.4750 0.823
## + V18 1 139.18 0.3655 0.863
## + V13 1 139.28 0.2947 0.936
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Check the explanatory variables to keep.
aem.sel.ord$anova
## Df AIC F Pr(>F)
## + V1 1 140.60 5.5867 0.001 ***
## + V2 1 139.83 2.5246 0.002 **
## + V19 1 138.42 2.9679 0.040 *
## + V6 1 137.70 2.1909 0.047 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Select the AEM vectors to keep for the analysis.
aem.sub <- AEM.vectors.wt[,c(1,2,19,6)]
Build the RDA model with these AEM vectors.
aem.mod.sel <- rda(geno.pca.axes ~., as.data.frame(aem.sub))
summary(aem.mod.sel)
##
## Call:
## rda(formula = geno.pca.axes ~ V1 + V2 + V19 + V6, data = as.data.frame(aem.sub))
##
## Partitioning of variance:
## Inertia Proportion
## Total 1275.9 1.0000
## Constrained 651.9 0.5109
## Unconstrained 624.0 0.4891
##
## Eigenvalues, and their contribution to the variance
##
## Importance of components:
## RDA1 RDA2 RDA3 RDA4 PC1 PC2
## Eigenvalue 330.6827 134.6448 112.76356 73.80353 156.5617 143.8702
## Proportion Explained 0.2592 0.1055 0.08838 0.05784 0.1227 0.1128
## Cumulative Proportion 0.2592 0.3647 0.45309 0.51093 0.6336 0.7464
## PC3 PC4 PC5 PC6 PC7
## Eigenvalue 140.5300 98.14303 53.63766 22.73396 8.516945
## Proportion Explained 0.1101 0.07692 0.04204 0.01782 0.006675
## Cumulative Proportion 0.8565 0.93347 0.97551 0.99332 1.000000
##
## Accumulated constrained eigenvalues
## Importance of components:
## RDA1 RDA2 RDA3 RDA4
## Eigenvalue 330.6827 134.6448 112.7636 73.8035
## Proportion Explained 0.5073 0.2065 0.1730 0.1132
## Cumulative Proportion 0.5073 0.7138 0.8868 1.0000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores: 12.4779
##
##
## Species scores
##
## RDA1 RDA2 RDA3 RDA4 PC1 PC2
## PC1 6.32898 0.1425 -0.030990 -0.09037 0.1530 -0.4266
## PC2 0.15342 -3.0980 -1.246795 0.81830 2.8228 0.2601
## PC3 -0.14429 -0.3026 -2.151961 -2.40223 -0.6291 0.6538
## PC4 0.09312 -0.7104 1.831114 -0.76155 0.5009 3.1608
## PC5 -0.32698 0.4636 1.269875 -1.14264 2.5553 -1.9847
## PC6 -0.31356 -0.6815 -0.004976 0.13618 -0.9245 -1.2425
## PC7 0.19816 -2.3531 1.615142 -0.80852 -1.7558 -1.1857
##
##
## Site scores (weighted sums of species scores)
##
## RDA1 RDA2 RDA3 RDA4 PC1 PC2
## 1 -3.3262 -1.3140 -0.10079 2.12509 0.51723 0.950824
## 2 -2.8746 1.2885 -2.19850 -0.84803 -1.13510 0.006533
## 3 -3.5807 -5.4691 0.02532 0.29066 -5.78797 -1.305333
## 4 -2.9688 -2.2268 -1.56360 0.15149 0.83680 -2.315591
## 5 -1.7142 10.5971 7.40366 -1.16979 -5.90652 -1.267982
## 6 -2.2706 -1.4842 -2.76562 0.98271 -2.17651 5.239568
## 7 -2.4927 -0.6255 -0.88120 -1.73271 -0.37229 -2.432215
## 8 -1.8363 0.7381 0.68003 0.22571 5.38816 0.370451
## 9 -2.0061 -1.6495 0.86505 -0.35231 1.84240 -0.178256
## 10 -0.5018 1.8074 -0.26903 0.05857 1.30936 -2.378312
## 11 -2.3229 -0.6520 1.16687 1.38827 3.50236 1.854494
## 12 -1.1672 1.3899 -0.27418 -0.75298 3.68195 3.279699
## 13 2.9221 -1.1673 -2.39882 -6.82601 -1.26966 -4.205178
## 14 3.6124 1.6019 -5.34250 0.46396 -3.60823 4.961012
## 15 3.5645 0.1294 0.02375 -1.22273 -0.95382 0.556597
## 16 3.8477 -4.9607 9.34090 -3.81536 -0.08934 2.482316
## 17 3.4722 0.4085 -1.67918 -3.27246 1.30948 -3.556072
## 18 3.4045 1.6121 -4.14164 -1.89888 0.71077 -2.078501
## 19 3.3407 -1.0447 2.30568 16.31944 1.20754 -3.666111
## 20 2.8979 1.0210 -0.19620 -0.11464 0.99339 3.682056
##
##
## Site constraints (linear combinations of constraining variables)
##
## RDA1 RDA2 RDA3 RDA4 PC1 PC2
## 1 -2.526 0.08145 0.45257 -0.03053 0.51723 0.950824
## 2 -2.521 -0.01740 0.34092 0.02806 -1.13510 0.006533
## 3 -3.427 -7.84713 -3.26690 0.69910 -5.78797 -1.305333
## 4 -2.447 0.61331 0.62749 -0.02328 0.83680 -2.315591
## 5 -1.590 8.02153 4.04186 -0.65863 -5.90652 -1.267982
## 6 -2.200 0.24532 0.03935 0.10196 -2.17651 5.239568
## 7 -2.544 -0.22557 0.24098 0.04859 -0.37229 -2.432215
## 8 -2.112 0.32538 -0.13620 0.21093 5.38816 0.370451
## 9 -2.381 0.23427 0.12984 0.21377 1.84240 -0.178256
## 10 -2.366 0.24449 0.10611 0.22618 1.30936 -2.378312
## 11 -1.529 0.36867 -0.44505 -0.15677 3.50236 1.854494
## 12 -1.347 0.36821 -0.51578 -0.29599 3.68195 3.279699
## 13 3.029 -0.45627 -0.74937 -5.26784 -1.26966 -4.205178
## 14 3.265 1.64412 -4.37158 0.10237 -3.60823 4.961012
## 15 3.562 -0.95186 2.35749 6.38661 -0.95382 0.556597
## 16 3.713 -4.32752 8.02468 -2.35583 -0.08934 2.482316
## 17 3.032 -0.45535 -0.75145 -5.26385 1.30948 -3.556072
## 18 3.372 2.39790 -5.63743 2.24308 0.71077 -2.078501
## 19 3.419 -0.48724 1.38623 6.39482 1.20754 -3.666111
## 20 3.598 0.22369 -1.87377 -2.60275 0.99339 3.682056
##
##
## Biplot scores for constraining variables
##
## RDA1 RDA2 RDA3 RDA4 PC1 PC2
## V1 0.93553 -0.1814 0.2009 0.22693 0 0
## V2 0.33723 0.2372 -0.7067 -0.57494 0 0
## V19 -0.10429 -0.9006 -0.4148 0.07706 0 0
## V6 -0.01327 0.3157 -0.5368 0.78231 0 0
Check the adjusted R2 value.
RsquareAdj(aem.mod.sel)
## $r.squared
## [1] 0.510934
##
## $adj.r.squared
## [1] 0.3805164
Check the significance for the global model.
anova(aem.mod.sel, permutations = 999)
Check the significance for each variable.
anova(aem.mod.sel, by = "margin", permutations = 999)
Q6. What is the most significant AEM vector?
Check the significance for each axis.
anova(aem.mod.sel, by = "axis", permutations = 999)
Q7. Calculate the sum of the R2 of each significant AEM variable and indicate to what value this sum is equal. Explain why.
Combine the explanatory variables kept for the model.
full.pred <- cbind(dbmem.sub, aem.sub)
full.pred
## MEM1 MEM2 V1 V2 V19 V6
## 1 -0.2809907 -1.547562957 -0.1838173 -0.090937135 -2.430096e-06 -0.01663753
## 2 -0.1807063 -1.783484359 -0.1827644 -0.089067486 1.116618e-02 -0.01066707
## 3 0.1403731 -1.854721170 -0.1827337 -0.089013440 7.079500e-01 -0.01052445
## 4 0.3023279 -1.769684874 -0.1826879 -0.088933035 -4.482090e-02 -0.01033489
## 5 0.6675319 -1.105368444 -0.1827351 -0.089015892 -7.041188e-01 -0.01053047
## 6 1.0894456 0.380798249 -0.1660053 -0.061713643 -3.670631e-11 0.01324554
## 7 0.9834678 -0.299636397 -0.1826850 -0.088928030 2.983202e-02 -0.01032342
## 8 1.1070556 -0.004790281 -0.1614657 -0.052909412 -1.803018e-12 0.02956177
## 9 1.0467669 0.754568101 -0.1759343 -0.077095445 -6.057488e-06 0.01627533
## 10 1.0381771 0.759362636 -0.1751474 -0.075735742 1.227973e-08 0.01831731
## 11 0.9405340 0.990609984 -0.1300466 -0.001896145 -3.378655e-11 0.02027138
## 12 0.8929247 1.131732159 -0.1200337 0.013446339 3.378730e-11 0.01437954
## 13 0.7318539 1.186950691 0.1258374 0.358345776 9.926865e-16 -0.31279531
## 14 -0.2855643 0.680965800 0.1523656 0.362374835 1.262551e-15 0.23260959
## 15 -1.5231833 0.450689819 0.4350216 -0.349621391 1.141927e-15 0.27112138
## 16 -0.8861806 0.159581280 0.4276912 -0.327859169 1.176471e-15 -0.60635927
## 17 -1.4390876 0.561318394 0.1261073 0.358385179 1.196360e-15 -0.31243583
## 18 -1.4065878 0.427041147 0.1679829 0.352654917 1.256370e-15 0.44023649
## 19 -1.4052371 0.425933527 0.4020427 -0.290026633 1.286117e-15 0.32532696
## 20 -1.5329207 0.455696695 0.1890075 0.327545550 1.196044e-15 -0.08073707
Rename the columns.
colnames(full.pred) <- c("MEM1","MEM2","AEM1","AEM2","AEM19","AEM6")
Run the model using all the selected variables.
full.rda <- rda(geno.pca.axes ~., full.pred)
Check the summary.
summary(full.rda)
##
## Call:
## rda(formula = geno.pca.axes ~ MEM1 + MEM2 + AEM1 + AEM2 + AEM19 + AEM6, data = full.pred)
##
## Partitioning of variance:
## Inertia Proportion
## Total 1275.9 1.0000
## Constrained 737.3 0.5778
## Unconstrained 538.6 0.4222
##
## Eigenvalues, and their contribution to the variance
##
## Importance of components:
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6
## Eigenvalue 333.9872 136.0048 113.11421 80.93868 60.93670 12.278688
## Proportion Explained 0.2618 0.1066 0.08866 0.06344 0.04776 0.009624
## Cumulative Proportion 0.2618 0.3684 0.45702 0.52046 0.56822 0.577841
## PC1 PC2 PC3 PC4 PC5 PC6
## Eigenvalue 144.8671 130.2664 94.5467 93.73786 50.19313 20.30740
## Proportion Explained 0.1135 0.1021 0.0741 0.07347 0.03934 0.01592
## Cumulative Proportion 0.6914 0.7935 0.8676 0.94105 0.98039 0.99631
## PC7
## Eigenvalue 4.709185
## Proportion Explained 0.003691
## Cumulative Proportion 1.000000
##
## Accumulated constrained eigenvalues
## Importance of components:
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6
## Eigenvalue 333.987 136.0048 113.1142 80.9387 60.93670 12.27869
## Proportion Explained 0.453 0.1845 0.1534 0.1098 0.08265 0.01665
## Cumulative Proportion 0.453 0.6375 0.7909 0.9007 0.98335 1.00000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores: 12.4779
##
##
## Species scores
##
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6
## PC1 6.35170 0.1828 -0.02470 -0.11053 0.03627 -0.092580
## PC2 0.31955 -3.1721 -1.12455 1.45962 -0.74313 -0.001303
## PC3 -0.14815 -0.4328 -2.22159 -2.25860 -0.83003 -0.102542
## PC4 0.18281 -0.7028 1.84524 -0.22602 -1.18617 -0.126501
## PC5 -0.21745 0.4434 1.25304 -0.32942 -1.74743 -0.269721
## PC6 -0.44560 -0.5331 0.07833 -0.05214 0.84590 -1.148942
## PC7 0.09705 -2.3107 1.61916 -1.57185 1.00876 0.265471
##
##
## Site scores (weighted sums of species scores)
##
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6
## 1 -3.2937 -1.3095 -0.007111 2.2585 0.2840 -1.98437
## 2 -2.8887 1.2020 -2.266633 -1.1200 0.6678 -0.68314
## 3 -3.6852 -5.3383 0.135907 -2.9195 7.9687 0.26079
## 4 -2.9767 -2.1379 -1.394231 0.6187 0.4631 -15.83165
## 5 -1.9179 10.8106 6.951861 -4.7355 6.6882 2.85798
## 6 -2.2387 -1.7396 -2.863669 -0.4654 2.1011 22.75689
## 7 -2.5027 -0.6801 -0.915273 -1.9919 0.4729 -0.06564
## 8 -1.6480 0.4304 0.593421 2.9909 -7.9141 16.99369
## 9 -1.9575 -1.6232 0.965904 0.9599 -2.5042 -7.47468
## 10 -0.5031 1.8647 -0.241145 0.9771 -1.3624 -11.26996
## 11 -2.1929 -0.7801 1.203466 3.3338 -4.8817 6.02666
## 12 -1.0482 1.2956 -0.235065 2.5896 -7.4008 -8.28567
## 13 2.8617 -1.3246 -2.588534 -7.7935 0.6896 10.02199
## 14 3.5137 1.6544 -5.316732 -0.6019 3.6380 -9.50595
## 15 3.5447 0.1364 -0.013208 -1.3391 -0.1405 2.50193
## 16 3.8466 -4.6737 9.445549 -3.0830 -2.3622 -4.96216
## 17 3.4402 0.4453 -1.674470 -2.4019 -1.6233 -10.67473
## 18 3.3902 1.5184 -4.194277 -1.3305 -1.2434 -2.28222
## 19 3.3095 -0.6720 2.668613 13.3049 9.2031 3.67606
## 20 2.9467 0.9211 -0.254371 0.7489 -2.7439 7.92418
##
##
## Site constraints (linear combinations of constraining variables)
##
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6
## 1 -2.8780 0.56562 0.69511 0.2307 1.177256 -4.263285
## 2 -2.9663 0.43001 0.51857 -0.6246 2.486141 -3.402870
## 3 -3.7463 -7.71624 -3.00736 -0.7926 3.737836 -1.273776
## 4 -2.9195 0.85779 0.60478 -2.0588 3.846948 -0.736803
## 5 -1.9828 8.22881 3.75340 -2.3458 2.225450 0.382140
## 6 -1.9132 -0.04767 -0.06995 0.7122 -1.865286 2.059400
## 7 -2.4703 -0.39712 0.14662 -0.2702 -0.053427 1.613470
## 8 -1.9827 0.03452 -0.30937 -0.3453 -0.030168 2.751398
## 9 -1.9334 -0.04550 0.10124 2.0841 -3.735667 1.206748
## 10 -1.9183 -0.03389 0.07918 2.1090 -3.744880 1.184971
## 11 -1.0754 0.03290 -0.51401 1.5713 -3.719491 1.577922
## 12 -0.8556 0.02805 -0.57255 1.7196 -4.196113 1.394292
## 13 3.1385 -1.07136 -1.23300 -6.8746 -0.610255 4.920849
## 14 3.1636 1.31760 -4.63832 -1.9597 2.591093 3.149536
## 15 3.5948 -0.73446 2.55099 5.6645 2.805597 0.787996
## 16 3.6590 -4.11180 8.06129 -2.8755 -0.004225 -0.001447
## 17 2.9912 -0.06051 -0.47652 -2.1697 -4.446880 -6.563901
## 18 3.1980 2.51184 -5.54930 1.9307 1.971425 -1.472669
## 19 3.4280 -0.32551 1.53572 5.3765 3.196954 1.208547
## 20 3.4686 0.53694 -1.67652 -1.0821 -1.632307 -4.522520
##
##
## Biplot scores for constraining variables
##
## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6
## MEM1 -0.71947 0.06795 -0.01448 -0.19319 -0.33311 0.57381
## MEM2 0.59660 0.03483 -0.13735 0.17019 -0.64475 0.42347
## AEM1 0.93069 -0.15860 0.21423 0.15547 0.19598 -0.01386
## AEM2 0.33494 0.21350 -0.72306 -0.48298 -0.25463 -0.14597
## AEM19 -0.09874 -0.90578 -0.38379 0.09359 0.07477 -0.09037
## AEM6 -0.01367 0.31151 -0.52951 0.68333 0.36441 0.15053
Obtain the adjusted R2.
RsquareAdj(full.rda)
## $r.squared
## [1] 0.5778409
##
## $adj.r.squared
## [1] 0.3829982
Check the significance for the global model.
anova(full.rda, permutations = 999)
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = geno.pca.axes ~ MEM1 + MEM2 + AEM1 + AEM2 + AEM19 + AEM6, data = full.pred)
## Df Variance F Pr(>F)
## Model 6 737.26 2.9657 0.001 ***
## Residual 13 538.63
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Check the significance for each variable.
anova(full.rda, by = "margin", permutations = 999)
## Permutation test for rda under reduced model
## Marginal effects of terms
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = geno.pca.axes ~ MEM1 + MEM2 + AEM1 + AEM2 + AEM19 + AEM6, data = full.pred)
## Df Variance F Pr(>F)
## MEM1 1 33.51 0.8088 0.621
## MEM2 1 65.80 1.5881 0.104
## AEM1 1 72.84 1.7580 0.078 .
## AEM2 1 102.61 2.4765 0.009 **
## AEM19 1 133.26 3.2163 0.001 ***
## AEM6 1 78.28 1.8892 0.066 .
## Residual 13 538.63
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Check the significance for each axis.
anova(full.rda, by = "axis", permutations = 999)
## Permutation test for rda under reduced model
## Forward tests for axes
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = geno.pca.axes ~ MEM1 + MEM2 + AEM1 + AEM2 + AEM19 + AEM6, data = full.pred)
## Df Variance F Pr(>F)
## RDA1 1 333.99 8.0609 0.001 ***
## RDA2 1 136.00 3.2825 0.062 .
## RDA3 1 113.11 2.7301 0.167
## RDA4 1 80.94 1.9535 0.428
## RDA5 1 60.94 1.4707 0.473
## RDA6 1 12.28 0.2964 0.948
## Residual 13 538.63
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Perform a partial RDA for ocean current corrected by dbMEM.
prda1 <- rda(geno.pca.axes ~ AEM1 + AEM2 + AEM19 + AEM6 + Condition(MEM1+MEM2), data = full.pred)
Test the significance of the model.
anova(prda1, permutations = 999)
Check the rda outputs.
summary(prda1)
Varpart partitions the variation in genomic data with respect to two, three or four explanatory variables using the adjusted R2. Here we have two explanatory variables - AEM and db-MEM.
vp1 <- varpart(geno.pca.axes, aem.sub, dbmem.sub)
vp1
##
## Partition of variance in RDA
##
## Call: varpart(Y = geno.pca.axes, X = aem.sub, dbmem.sub)
##
## Explanatory tables:
## X1: aem.sub
## X2: dbmem.sub
##
## No. of explanatory tables: 2
## Total variation (SS): 24242
## Variance: 1275.9
## No. of observations: 20
##
## Partition table:
## Df R.squared Adj.R.squared Testable
## [a+b] = X1 4 0.51093 0.38052 TRUE
## [b+c] = X2 2 0.26524 0.17879 TRUE
## [a+b+c] = X1+X2 6 0.57784 0.38300 TRUE
## Individual fractions
## [a] = X1|X2 4 0.20420 TRUE
## [b] 0 0.17631 FALSE
## [c] = X2|X1 2 0.00248 TRUE
## [d] = Residuals 0.61700 FALSE
## ---
## Use function 'rda' to test significance of fractions of interest
Represent the partitioning using a Venn diagram. Cercles X1 and X2 illustrate the same process: matrices X1 and X2 (both explanatory) each explain a portion of the variability in matrix Y (a response matrix). Each explanatory matrix uniquely explains a portion of the variation in Y (X1 explains partition “a” and X2 explains partition “c”). Additionally, both matrices are able to explain another portion of the variation in Y (partition “b”). The residual variation in matrix Y (partition “d”) is not explained by X1 or X2.
showvarparts(2, bg=2:4)
plot(vp1, bg = 2:4)
Q8. What percentage of variation is explained by db-MEM and AEM. What percentage is only explained by AEM vectors? What kind of conclusion could we drawn from this observation.
RDA ordinations may be presented as a biplot. The interpretation of this plots depends on what scaling has been chosen. In general, consider scaling = 2 as we are more interested in representing the response variables (sampling sites) than the explanatory variables (db-MEM and AEMs).
Download libraries.
library(vegan)
library(cowplot)
Getting the scores for plotting.
scrs <- scores(full.rda, display = c("species", "sites", "bp"), choices = 1:4, scaling = 2)
scrs
## $species
## RDA1 RDA2 RDA3 RDA4
## PC1 6.35169814 0.1828037 -0.02469740 -0.11052585
## PC2 0.31955251 -3.1720788 -1.12454588 1.45962298
## PC3 -0.14814782 -0.4328444 -2.22158565 -2.25860078
## PC4 0.18281483 -0.7027507 1.84523577 -0.22602062
## PC5 -0.21745248 0.4434235 1.25303557 -0.32941578
## PC6 -0.44560399 -0.5331235 0.07833302 -0.05214105
## PC7 0.09705276 -2.3106842 1.61915583 -1.57185389
##
## $sites
## RDA1 RDA2 RDA3 RDA4
## 1 -3.2937309 -1.3095410 -0.007111301 2.2585174
## 2 -2.8886984 1.2020129 -2.266632797 -1.1199571
## 3 -3.6852388 -5.3383085 0.135906635 -2.9194700
## 4 -2.9767109 -2.1378877 -1.394231308 0.6187001
## 5 -1.9179058 10.8106438 6.951860581 -4.7354518
## 6 -2.2386627 -1.7396261 -2.863668712 -0.4654120
## 7 -2.5027181 -0.6800963 -0.915273442 -1.9918757
## 8 -1.6479666 0.4304491 0.593420562 2.9908597
## 9 -1.9575256 -1.6232019 0.965903743 0.9599266
## 10 -0.5030823 1.8646960 -0.241144846 0.9770617
## 11 -2.1928917 -0.7800897 1.203465928 3.3337588
## 12 -1.0482215 1.2956428 -0.235065170 2.5895538
## 13 2.8617235 -1.3245688 -2.588533849 -7.7935447
## 14 3.5137456 1.6543783 -5.316732421 -0.6019437
## 15 3.5447463 0.1364092 -0.013208212 -1.3391314
## 16 3.8466374 -4.6736600 9.445549393 -3.0829697
## 17 3.4401557 0.4452516 -1.674469630 -2.4019175
## 18 3.3902074 1.5183614 -4.194276668 -1.3304764
## 19 3.3094612 -0.6719693 2.668612513 13.3048605
## 20 2.9466763 0.9211042 -0.254370999 0.7489114
##
## $biplot
## RDA1 RDA2 RDA3 RDA4
## MEM1 -0.71947066 0.06795135 -0.01447611 -0.19319025
## MEM2 0.59659679 0.03482658 -0.13735298 0.17019052
## AEM1 0.93068886 -0.15859812 0.21423280 0.15546584
## AEM2 0.33494487 0.21349601 -0.72306069 -0.48297886
## AEM19 -0.09873530 -0.90578251 -0.38378649 0.09359047
## AEM6 -0.01366803 0.31151497 -0.52950980 0.68332735
##
## attr(,"const")
## [1] 12.4779
Transform the information obtained from the model to a data frame.
species_centroids <- data.frame(scrs$species)
species_centroids
## RDA1 RDA2 RDA3 RDA4
## PC1 6.35169814 0.1828037 -0.02469740 -0.11052585
## PC2 0.31955251 -3.1720788 -1.12454588 1.45962298
## PC3 -0.14814782 -0.4328444 -2.22158565 -2.25860078
## PC4 0.18281483 -0.7027507 1.84523577 -0.22602062
## PC5 -0.21745248 0.4434235 1.25303557 -0.32941578
## PC6 -0.44560399 -0.5331235 0.07833302 -0.05214105
## PC7 0.09705276 -2.3106842 1.61915583 -1.57185389
Add the PC components names.
species_centroids$PC_names <- rownames(species_centroids)
Add explanatory variables coordinates to create arrows on the biplot.
aem_continuous_arrows <- data.frame(scrs$biplot)
aem_continuous_arrows
## RDA1 RDA2 RDA3 RDA4
## MEM1 -0.71947066 0.06795135 -0.01447611 -0.19319025
## MEM2 0.59659679 0.03482658 -0.13735298 0.17019052
## AEM1 0.93068886 -0.15859812 0.21423280 0.15546584
## AEM2 0.33494487 0.21349601 -0.72306069 -0.48297886
## AEM19 -0.09873530 -0.90578251 -0.38378649 0.09359047
## AEM6 -0.01366803 0.31151497 -0.52950980 0.68332735
Add rownames.
rownames(aem_continuous_arrows) <- c("MEM1","MEM2","AEM1", "AEM2", "AEM19", "AEM6")
Add AEM names to the arrows.
aem_continuous_arrows$aem_number <- rownames(aem_continuous_arrows)
aem_continuous_arrows
## RDA1 RDA2 RDA3 RDA4 aem_number
## MEM1 -0.71947066 0.06795135 -0.01447611 -0.19319025 MEM1
## MEM2 0.59659679 0.03482658 -0.13735298 0.17019052 MEM2
## AEM1 0.93068886 -0.15859812 0.21423280 0.15546584 AEM1
## AEM2 0.33494487 0.21349601 -0.72306069 -0.48297886 AEM2
## AEM19 -0.09873530 -0.90578251 -0.38378649 0.09359047 AEM19
## AEM6 -0.01366803 0.31151497 -0.52950980 0.68332735 AEM6
Create the biplot.
baseplot<-plot(full.rda, scaling = 2)
Use the arrow.mul argument to make the scale of the arrows comparable to one another.
mult <- attributes(baseplot$biplot)$arrow.mul
Add the sampling site names.
sitenames <- sites$Abb
Remove the samples from Alaska.
sitenames <- sitenames[-c(21:24)]
sitenames
## [1] OGD SGI LAS JER TOF CRA RBY SHE MAL QUA HOP TBL CAL TOL PRI LEG JUA SEL REN
## [20] MAZ
## 24 Levels: AK1 AK2 AK3 AK4 CAL CRA HOP JER JUA LAS LEG MAL MAZ OGD PRI ... TOL
Add colour to the sampling sites regarding their locations (North or South).
sitecols <- c(rep("black", 12), rep("lightgrey", 8))
Add the region information.
sitereg <- c(rep("S", 12), rep("N", 8))
Keep the response variables coordinates for the RDA model.
sites_centroids <- data.frame(scrs$sites)
sites_centroids$SITE <- sitenames
sites_centroids$REGION <- as.factor(sitereg)
head(sites_centroids)
## RDA1 RDA2 RDA3 RDA4 SITE REGION
## 1 -3.293731 -1.309541 -0.007111301 2.2585174 OGD S
## 2 -2.888698 1.202013 -2.266632797 -1.1199571 SGI S
## 3 -3.685239 -5.338309 0.135906635 -2.9194700 LAS S
## 4 -2.976711 -2.137888 -1.394231308 0.6187001 JER S
## 5 -1.917906 10.810644 6.951860581 -4.7354518 TOF S
## 6 -2.238663 -1.739626 -2.863668712 -0.4654120 CRA S
Make a ggplot.
RDA_plot <- ggplot(data = sites_centroids, aes(x = RDA1, y = RDA2))+
geom_point(data = sites_centroids, pch = 21, size = 2, aes(fill = REGION))+
scale_fill_manual(values = c(S = "black", N = "lightgrey"), name = "Region", levels(factor(sites_centroids$REGION)), labels = c("North", "South"))+
geom_text(data = aem_continuous_arrows,
aes(x= (mult + mult/10) * RDA1, y = (mult + mult/10) * RDA2,
label = aem_number), size = 4, hjust = 0.5)+
geom_text(data = sites_centroids, aes(x = RDA1, y = RDA2, label = SITE),
size = 2.5, col = "black", hjust = 1.2)+
geom_segment(data = aem_continuous_arrows,
aes(x = 0, xend = mult * RDA1, y = 0, yend = mult * RDA2),
arrow = arrow(length = unit(0.25, "cm")), colour = "grey") +
theme_bw() +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
geom_hline(yintercept = 0, lty = "dotted") + geom_vline(xintercept = 0, lty = "dotted") +
labs(x = paste("RDA1 (", round(summary(full.rda)$cont$importance[2,1]*100, 2), "%)", sep = ""), y = paste("RDA2 (", round(summary(full.rda)$cont$importance[2,2]*100, 2), "%)", sep = ""))+
theme(axis.text = element_text(colour = "black", size = 12)) +
theme(axis.title = element_text(size = 18, colour = "black", family = "Helvetica", face = "bold"))
RDA_plot
Save the RDA.
ggsave("RDA_AEM_dbMEM.pdf", width = 15, height = 15, dpi = 600, units = "cm", useDingbats = F)