RDA

0. Download libraries

library(vegan)
library(vcfR)
library(SoDA)
library(adespatial)
library(ggplot2)
library(kableExtra)
library(reshape2)

1. Calculate allele frequencies per sampling site in PLINK

Transform the vcf file to plink format (see the lecture 6) using your terminal.

Calculate allele frequencies from the .tped and .tfam file using your terminal. Yet, you have to change the first column of the .tfam and add the sampling site information (the groups from which you want the program to calculate the allele frequencies).

plink --tped filtered_2719neutral_snps.recode.tped --tfam filtered_2719neutral_snps.recode.tfam --freq --within cluster-plink-sea-cumcumber.txt --out filtered_2719neutral_snps.recode

Download allele frequencies obtained with PLINK.

allele_frequencies <- read.table("00-Data/2719neutral_snps_noAK.frq.strat", header=T, sep="\t", dec=".")

Select only the second (the locus) and the third (the sampling site) columns.

allele_frequencies2 <- allele_frequencies[,2:3]
allele_frequencies3 <- cbind(allele_frequencies2, allele_frequencies$MAF)
colnames(allele_frequencies3)=c("SNP","POP","MAF")

Allele frequencies per sampling site from PLINK
CHR	SNP	CLST	A1	A2	MAF	MAC	NCHROBS
1	54300_20	1	C	A	0.05357	3	56
1	54300_20	2	C	A	0.02000	1	50
1	54300_20	3	C	A	0.04348	2	46
1	54300_20	4	C	A	0.00000	0	58
1	54300_20	5	C	A	0.00000	0	34

Convert from long to wide.

allele_frequencies4 <- dcast(allele_frequencies3, POP~SNP)
write.table(allele_frequencies4, "2719neutralsnps-freq-sites.txt", quote=F, sep="\t")

2. Playing with the genomic dataset

Import allele frequencies obtained from PLINK.

geno <- read.table("2719neutralsnps-freq-sites.txt", header=TRUE)

Allele frequencies format to use in RDA
POP	X10_7	X10013_19	X10018_31	X10020_41
1	0.01724	0.00000	0.00000	0.01613
2	0.01667	0.01613	0.00000	0.00000
3	0.00000	0.00000	0.00000	0.01786
4	0.03333	0.00000	0.03448	0.01667
5	0.00000	0.00000	0.00000	0.00000
6	0.02083	0.00000	0.01852	0.00000

Remove sampling site information.

geno <- geno[,-1]

Remember RDA is a two-steps process : 1) ordination and then 2) regression.

Here we perform the first step, the ordination of allele frequencies by sampling site¨. However, PCA requires that the variables be linearly related to each other and at approximately the same scale, which is not necessarily the case for our allele frequencies, which may subsequently lead to poor results. We must therefore apply a Hellinger transformation before the analyses. Indeed, Hellinger distance has good statistical properties, as indicated by the R2 and monotonicity criteria used by Legendre and Gallagher (2001) in their comparison of transformation methods. The Hellinger transformation is available in the decostand() function of VEGAN (method = “hellinger”). Here, we perform a Hellinger transformation on the dataset.

geno.h <- decostand(geno, "hellinger")

Next, we can perform a Principal Component Analysis (PCA) of these transformed allele frequencies. We will use the function prcomp to do the PCA. It is highly recommended to set the argument scale=TRUE. This standardize the input data so that it has zero mean and variance one before doing PCA.

geno.pca <- prcomp(geno.h, scale = T)
summary(geno.pca)

## Importance of components:
##                            PC1      PC2      PC3      PC4      PC5      PC6
## Standard deviation     18.5874 13.80686 13.37751 12.00739 11.89525 11.76301
## Proportion of Variance  0.1271  0.07011  0.06582  0.05303  0.05204  0.05089
## Cumulative Proportion   0.1271  0.19718  0.26299  0.31602  0.36806  0.41895
##                            PC7      PC8     PC9     PC10     PC11     PC12
## Standard deviation     11.6947 11.55599 11.4839 11.30375 11.21593 11.18517
## Proportion of Variance  0.0503  0.04911  0.0485  0.04699  0.04627  0.04601
## Cumulative Proportion   0.4693  0.51836  0.5669  0.61386  0.66013  0.70614
##                           PC13     PC14     PC15     PC16     PC17     PC18
## Standard deviation     10.9381 10.89881 10.78954 10.77625 10.55668 10.49443
## Proportion of Variance  0.0440  0.04369  0.04282  0.04271  0.04099  0.04051
## Cumulative Proportion   0.7501  0.79383  0.83664  0.87935  0.92034  0.96084
##                            PC19      PC20
## Standard deviation     10.31829 1.641e-14
## Proportion of Variance  0.03916 0.000e+00
## Cumulative Proportion   1.00000 1.000e+00

Q1. How many PC components do you obtain? Q2. What percentage of variation does the PC5 component account for?

Visualize the genomic variation explained by each axis according to the broken stick model. The broken-stick model is one of the many techniques that have been proposed for choosing the number of principal components to retain during a principal component analysis.

The PCA seeks for a restricted dimension to maximise the inertia of the cloud when it is projected into the q-dimensional subspace generated by these factors. Then the inertia represents the sum of the variances of the variables under consideration.

The broken-stick model will then retain PC components that explain more variance than would be expected by randomly dividing the variance into p parts (line red over the black line). The observed proportions that are higher than the expected proportions indicate which principal components to keep.

screeplot(geno.pca, npcs = 20, type = "lines", bstick = T)

You could also choose to select PC components based on the Kaiser - Guttman rule. In a standard PCA, the $sdev gives the fraction of information (sdev=“standard deviation”) contained in each of the PC components. In this rule, we consider an axis to be interesting if its eigenvalue is greater the mean eigenvalue.

ev <- geno.pca$sdev^2
ev

##  [1] 3.454916e+02 1.906293e+02 1.789577e+02 1.441774e+02 1.414971e+02
##  [6] 1.383685e+02 1.367665e+02 1.335409e+02 1.318802e+02 1.277748e+02
## [11] 1.257970e+02 1.251081e+02 1.196416e+02 1.187840e+02 1.164142e+02
## [16] 1.161275e+02 1.114436e+02 1.101331e+02 1.064671e+02 2.693243e-28

ev[ev > mean(ev)]

## [1] 345.4916 190.6293 178.9577 144.1774 141.4971 138.3685 136.7665

Q3. How many PC components do you keep using the broken stick model and the Kaiser-Guttman? What percentage of the genomic variation is explained by these PC components cumulatively?

Select the adequate PC components.

geno.pca.axes <- geno.pca$x[,1:7] # select 7 PC axes
head(geno.pca.axes)

##          PC1        PC2        PC3       PC4        PC5        PC6        PC7
## 1 -21.348747   5.770015  -3.539039  2.840527  -3.342012   4.602176 -2.1080942
## 2 -17.897240  -4.432905   7.130377 -5.159484  -3.041278   1.886322 -4.7494497
## 3 -23.854185   5.242946   2.724128 -4.202315 -15.571076  11.106313 24.4757440
## 4 -17.666731   8.564480   3.586660 -4.598861   5.001930  20.858805 -0.9250808
## 5  -9.930354 -54.783111 -14.774534 -1.066448   1.895834  -1.729385  0.6257380
## 6 -16.994368   3.991230   6.578290  3.168137 -24.916286 -22.502131  3.8859886

3. RDA on db-MEM vectors

Download population map information.

pop <- read.table("00-Data/population_map_sea_cucumber.txt", header=TRUE)

Import sampling site information.

sites <- read.table("00-Data/24_sites.txt", header=TRUE)
sites <- sites[-c(21:24), ]
geo <- sites[, c(5:6)]
geo

##         Lat      Long
## 1  48.40822 -123.3875
## 2  48.75678 -123.3799
## 3  49.47478 -124.1826
## 4  49.75303 -124.0011
## 5  49.24820 -125.9384
## 6  50.52120 -126.5646
## 7  50.33027 -125.4666
## 8  50.65893 -126.2377
## 9  50.62737 -127.1282
## 10 50.49977 -127.8717
## 11 50.89923 -127.8512
## 12 51.27213 -127.8038
## 13 51.69000 -128.1425
## 14 52.71327 -128.5782
## 15 54.19700 -130.3660
## 16 54.68200 -130.4640
## 17 52.63200 -131.3960
## 18 52.93882 -131.9049
## 19 53.39900 -132.6600
## 20 54.10000 -132.5531

Transform spatial coordinates into cartesian coordinates (order is lat,long).

geo$CAR <- geoXY(geo$Lat, geo$Long)
geo

##         Lat      Long      CAR.X      CAR.Y
## 1  48.40822 -123.3875 686068.821      0.000
## 2  48.75678 -123.3799 686629.218  38761.185
## 3  49.47478 -124.1826 627300.301 118611.385
## 4  49.75303 -124.0011 640717.799 149558.799
## 5  49.24820 -125.9384 497472.140  93411.527
## 6  50.52120 -126.5646 451150.728 235003.286
## 7  50.33027 -125.4666 532366.901 213764.386
## 8  50.65893 -126.2377 475333.949 250324.799
## 9  50.62737 -127.1282 409455.077 246813.279
## 10 50.49977 -127.8717 354440.312 232619.066
## 11 50.89923 -127.8512 355963.444 277056.743
## 12 51.27213 -127.8038 359464.780 318541.836
## 13 51.69000 -128.1425 334403.422 365032.626
## 14 52.71327 -128.5782 302159.207 478892.709
## 15 54.19700 -130.3660 169829.324 644024.517
## 16 54.68200 -130.4640 162574.699 698011.404
## 17 52.63200 -131.3960  93578.837 469849.346
## 18 52.93882 -131.9049  55900.933 503992.588
## 19 53.39900 -132.6600      0.000 555206.136
## 20 54.10000 -132.5531   7918.014 633227.663

Create euclidian distance matrix.

euclidian_distances <- dist(geo$CAR, method="euclidean")

Perform dbMEM on euclidian distances.

dbMEM <- dbmem(euclidian_distances, MEM.autocor = "non-null", store.listw = TRUE)
names(dbMEM)

##  [1] "MEM1"  "MEM2"  "MEM3"  "MEM4"  "MEM5"  "MEM6"  "MEM7"  "MEM8"  "MEM9" 
## [10] "MEM10" "MEM11" "MEM12" "MEM13" "MEM14" "MEM15" "MEM16" "MEM17" "MEM18"
## [19] "MEM19"

Transform into a dataframe.

dbMEM.vectors <- as.data.frame(dbMEM)

db-MEM vectors
MEM1	MEM2	MEM3
-0.2809907	-1.5475630	1.3851299
-0.1807063	-1.7834844	0.9363306
0.1403731	-1.8547212	0.1973885
0.3023279	-1.7696849	-0.0642742
0.6675319	-1.1053684	-1.1333367
1.0894456	0.3807982	-0.6478060
0.9834678	-0.2996364	-1.1767929
1.1070556	-0.0047903	-0.5705352
1.0467669	0.7545681	-0.5966470
1.0381771	0.7593626	-0.5714938
0.9405340	0.9906100	-0.1075976
0.8929247	1.1317322	0.9930829
0.7318539	1.1869507	1.5022544
-0.2855643	0.6809658	2.6978790
-1.5231833	0.4506898	-0.8071367
-0.8861806	0.1595813	-0.5291624
-1.4390876	0.5613184	0.4886097
-1.4065878	0.4270411	-0.5866334
-1.4052371	0.4259335	-0.5949010
-1.5329207	0.4556967	-0.8143582

Create a model with intercept only and a full model.

dbmem.mod0 <- rda(geno.pca.axes ~1, dbMEM.vectors)
dbmem.mod1 <- rda(geno.pca.axes ~., dbMEM.vectors)

Now, we need to select the best model and retain only significant explanatory and we do not want to do this manually. Therefore, we use the function ordiR2step that builds model so that it maximizes adjusted R2 at every step and stopping when the adjusted R2 starts to decrease. The function ordistep can do forward, backward and stepwise model selection using permutation tests. Perform ordistep in both directions.

dbmem.all.ord <- ordistep(dbmem.mod0, scope = formula(dbmem.mod1), direction = "both", permutations = 10000)

## 
## Start: geno.pca.axes ~ 1 
## 
##         Df    AIC      F Pr(>F)   
## + MEM1   1 142.83 3.0982 0.0029 **
## + MEM2   1 143.48 2.4172 0.0119 * 
## + MEM4   1 144.25 1.6457 0.1592   
## + MEM5   1 144.53 1.3756 0.2087   
## + MEM17  1 144.67 1.2385 0.2683   
## + MEM14  1 144.79 1.1243 0.3532   
## + MEM19  1 144.87 1.0461 0.4142   
## + MEM6   1 144.88 1.0426 0.4243   
## + MEM15  1 144.88 1.0362 0.4251   
## + MEM3   1 144.97 0.9553 0.4824   
## + MEM8   1 145.09 0.8422 0.5813   
## + MEM18  1 145.27 0.6731 0.7364   
## + MEM12  1 145.34 0.6052 0.7892   
## + MEM16  1 145.31 0.6334 0.8019   
## + MEM9   1 145.54 0.4201 0.8431   
## + MEM10  1 145.45 0.5008 0.8565   
## + MEM7   1 145.61 0.3539 0.9053   
## + MEM13  1 145.64 0.3270 0.9350   
## + MEM11  1 145.84 0.1463 0.9812   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: geno.pca.axes ~ MEM1 
## 
##        Df AIC      F Pr(>F)   
## - MEM1  1 144 3.0982  0.002 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##         Df    AIC      F Pr(>F)    
## + MEM2   1 141.84 2.7392 0.0005 ***
## + MEM4   1 142.76 1.8509 0.1019    
## + MEM5   1 143.09 1.5431 0.1276    
## + MEM17  1 143.26 1.3875 0.1823    
## + MEM14  1 143.40 1.2581 0.2628    
## + MEM19  1 143.50 1.1697 0.3345    
## + MEM15  1 143.51 1.1586 0.3371    
## + MEM6   1 143.50 1.1658 0.3470    
## + MEM3   1 143.61 1.0673 0.4039    
## + MEM8   1 143.75 0.9398 0.5086    
## + MEM18  1 143.96 0.7500 0.6899    
## + MEM12  1 144.05 0.6739 0.7446    
## + MEM16  1 144.01 0.7054 0.7477    
## + MEM9   1 144.28 0.4669 0.8218    
## + MEM10  1 144.18 0.5570 0.8261    
## + MEM7   1 144.37 0.3931 0.8954    
## + MEM13  1 144.40 0.3631 0.9222    
## + MEM11  1 144.64 0.1622 0.9796    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: geno.pca.axes ~ MEM1 + MEM2 
## 
##        Df    AIC      F   Pr(>F)   
## - MEM2  1 142.83 2.7392 0.005399 **
## - MEM1  1 143.48 3.3976 0.001500 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##         Df    AIC      F  Pr(>F)  
## + MEM5   1 141.81 1.7114 0.09279 .
## + MEM4   1 141.42 2.0588 0.09419 .
## + MEM17  1 142.00 1.5365 0.13259  
## + MEM14  1 142.17 1.3915 0.20718  
## + MEM19  1 142.28 1.2926 0.25617  
## + MEM15  1 142.30 1.2802 0.26577  
## + MEM6   1 142.29 1.2883 0.26847  
## + MEM3   1 142.42 1.1782 0.32277  
## + MEM8   1 142.58 1.0363 0.43396  
## + MEM18  1 142.83 0.8255 0.59864  
## + MEM12  1 142.93 0.7412 0.66673  
## + MEM16  1 142.89 0.7761 0.67283  
## + MEM9   1 143.21 0.5125 0.75222  
## + MEM10  1 143.09 0.6120 0.76732  
## + MEM7   1 143.31 0.4312 0.84682  
## + MEM13  1 143.35 0.3982 0.88721  
## + MEM11  1 143.62 0.1775 0.96610  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Check the explanatory variables to keep.

dbmem.all.ord$anova

##        Df    AIC      F Pr(>F)    
## + MEM1  1 142.83 3.0982 0.0029 ** 
## + MEM2  1 141.84 2.7392 0.0005 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

*** Q4. Which dbMEM variable is selected by the ordistep function?***

Select the db-MEM vectors kept by ordistep function.

dbmem.sub <- dbMEM.vectors[,c(1,2)]

Perform a RDA using the db-MEM vectors kept by ordistep as explanatory variables.

dbmem.mod.sel <- rda(geno.pca.axes ~., as.data.frame(dbmem.sub))
summary(dbmem.mod.sel)

## 
## Call:
## rda(formula = geno.pca.axes ~ MEM1 + MEM2, data = as.data.frame(dbmem.sub)) 
## 
## Partitioning of variance:
##               Inertia Proportion
## Total          1275.9     1.0000
## Constrained     338.4     0.2652
## Unconstrained   937.5     0.7348
## 
## Eigenvalues, and their contribution to the variance 
## 
## Importance of components:
##                          RDA1    RDA2      PC1      PC2      PC3      PC4
## Eigenvalue            299.876 38.5373 198.8345 177.4528 144.4179 142.3079
## Proportion Explained    0.235  0.0302   0.1558   0.1391   0.1132   0.1115
## Cumulative Proportion   0.235  0.2652   0.4211   0.5602   0.6733   0.7849
##                            PC5       PC6      PC7
## Eigenvalue            138.1969 109.27259 26.99205
## Proportion Explained    0.1083   0.08564  0.02116
## Cumulative Proportion   0.8932   0.97884  1.00000
## 
## Accumulated constrained eigenvalues
## Importance of components:
##                           RDA1    RDA2
## Eigenvalue            299.8761 38.5373
## Proportion Explained    0.8861  0.1139
## Cumulative Proportion   0.8861  1.0000
## 
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores:  12.4779 
## 
## 
## Species scores
## 
##         RDA1    RDA2       PC1     PC2     PC3     PC4
## PC1  5.91141  0.2134 -1.710377  0.3903  0.4521 -0.5569
## PC2  0.98140 -0.4210  4.446254  1.3393  0.2797 -0.3764
## PC3 -0.37326 -0.7341 -1.206774  4.3380 -0.2653  0.2207
## PC4  0.20974 -0.4854  0.060843 -0.2893 -3.6757 -1.8019
## PC5 -0.05799 -0.9383 -0.227545 -0.3676  1.8278 -2.8822
## PC6 -0.38780  1.4332 -0.001505  0.6584 -0.1505 -1.4029
## PC7 -0.59093  0.8782 -0.240104  0.4885  0.6293 -1.8285
## 
## 
## Site scores (weighted sums of species scores)
## 
##       RDA1      RDA2      PC1     PC2     PC3      PC4
## 1  -3.2510   0.44974  2.53806 -0.1640 -1.3445  0.35869
## 2  -3.0508  -0.70096 -0.22020  1.4324  0.1075  1.79903
## 3  -4.2529   9.51891  1.82669  2.1966 -0.2604 -0.22963
## 4  -2.9012   3.53269  1.94690  2.0383  1.1994 -2.00501
## 5  -2.9235   6.08130 -9.29237 -6.2663 -0.1209  0.73792
## 6  -2.4720  -3.64297  1.31018  0.9139 -3.1256  5.51733
## 7  -2.7229  -0.05948  0.08308  1.1413  2.4859 -0.51767
## 8  -1.3623 -12.82619  1.97878 -1.7813  2.0174 -0.11193
## 9  -1.8435  -0.78193  1.71143 -0.1226 -0.5093 -2.32717
## 10 -0.3534   0.33515 -0.69617 -0.6250  1.7149 -1.26172
## 11 -1.9443  -6.61197  2.73450 -1.8567 -1.0809 -0.35680
## 12 -0.6098  -7.02383  1.04132 -0.3932 -3.2591 -0.92244
## 13  2.4432  -1.41247 -3.01571  3.9750  5.4585 -0.64320
## 14  3.6858   4.65300 -2.32083  2.9619 -4.9740  4.56193
## 15  3.5835  -0.07346 -0.97233  0.6190 -0.1357  0.05609
## 16  3.6809   3.51254 -0.65901 -1.1174 -4.8540 -8.53102
## 17  3.5819   0.55056 -1.20794  2.3392  2.8339 -1.84439
## 18  3.5993  -2.00564 -1.08424  2.5118  2.9710  1.45268
## 19  3.9532  11.08076  4.42785 -7.6181  3.7540  2.57232
## 20  3.1597  -4.57576 -0.12997 -0.1846 -2.8780  1.69499
## 
## 
## Site constraints (linear combinations of constraining variables)
## 
##       RDA1   RDA2      PC1     PC2     PC3      PC4
## 1  -2.2416  3.773  2.53806 -0.1640 -1.3445  0.35869
## 2  -2.8846  4.086 -0.22020  1.4324  0.1075  1.79903
## 3  -3.6911  3.648  1.82669  2.1966 -0.2604 -0.22963
## 4  -3.8764  3.173  1.94690  2.0383  1.1994 -2.00501
## 5  -3.4292  1.105 -9.29237 -6.2663 -0.1209  0.73792
## 6  -1.5967 -2.796  1.31018  0.9139 -3.1256  5.51733
## 7  -2.6193 -1.170  0.08308  1.1413  2.4859 -0.51767
## 8  -2.3397 -2.017  1.97878 -1.7813  2.0174 -0.11193
## 9  -0.8226 -3.505  1.71143 -0.1226 -0.5093 -2.32717
## 10 -0.7957 -3.500 -0.69617 -0.6250  1.7149 -1.26172
## 11 -0.1668 -3.808  2.73450 -1.8567 -1.0809 -0.35680
## 12  0.1918 -4.018  1.04132 -0.3932 -3.2591 -0.92244
## 13  0.6321 -3.839 -3.01571  3.9750  5.4585 -0.64320
## 14  1.8479 -0.911 -2.32083  2.9619 -4.9740  4.56193
## 15  4.0322  1.840 -0.97233  0.6190 -0.1357  0.05609
## 16  2.1580  1.286 -0.65901 -1.1174 -4.8540 -8.53102
## 17  4.0577  1.453 -1.20794  2.3392  2.8339 -1.84439
## 18  3.7434  1.676 -1.08424  2.5118  2.9710  1.45268
## 19  3.7385  1.676  4.42785 -7.6181  3.7540  2.57232
## 20  4.0619  1.847 -0.12997 -0.1846 -2.8780  1.69499
## 
## 
## Biplot scores for constraining variables
## 
##         RDA1    RDA2 PC1 PC2 PC3 PC4
## MEM1 -0.7546 -0.6561   0   0   0   0
## MEM2  0.6561 -0.7546   0   0   0   0

Check the adjusted R2 value.

RsquareAdj(dbmem.mod.sel)

Q5. How much percent of the genomic variation is explained by this model?

Check the significance for the global model.

anova(dbmem.mod.sel, permutations = 999)

Check the significance for each variable.

anova(dbmem.mod.sel, by = "margin", permutations = 999)

Check the significance for each axis.

anova(dbmem.mod.sel, by = "axis", permutations = 999)

4. RDA on AEM vectors

Import results from AEM vectors.

AEM.vectors.wt <- read.table("00-Data/AEM_weight_all_vectors.txt", header=TRUE)

Create a model with intercept only and a full model.

aem.mod0 <- rda(geno.pca.axes ~1, AEM.vectors.wt)
aem.mod1 <- rda(geno.pca.axes ~., AEM.vectors.wt)

Perform ordistep in both directions.

aem.sel.ord <- ordistep(aem.mod0, scope = formula(aem.mod1), direction = "forward", permutations = 999)

## 
## Start: geno.pca.axes ~ 1 
## 
##       Df    AIC      F Pr(>F)    
## + V1   1 140.60 5.5867  0.001 ***
## + V2   1 143.92 1.9706  0.042 *  
## + V19  1 143.81 2.0886  0.073 .  
## + V6   1 144.52 1.3847  0.207    
## + V10  1 144.71 1.1972  0.316    
## + V4   1 144.77 1.1422  0.383    
## + V3   1 145.00 0.9205  0.537    
## + V14  1 145.02 0.9021  0.555    
## + V17  1 145.05 0.8735  0.558    
## + V12  1 145.23 0.7116  0.641    
## + V8   1 145.33 0.6192  0.705    
## + V5   1 145.29 0.6554  0.708    
## + V16  1 145.53 0.4263  0.811    
## + V18  1 145.75 0.2268  0.904    
## + V9   1 145.64 0.3269  0.908    
## + V15  1 145.68 0.2936  0.910    
## + V11  1 145.61 0.3589  0.915    
## + V7   1 145.62 0.3460  0.942    
## + V13  1 145.80 0.1833  0.970    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: geno.pca.axes ~ V1 
## 
##       Df    AIC      F Pr(>F)   
## + V2   1 139.83 2.5246  0.002 **
## + V19  1 139.67 2.6813  0.049 * 
## + V6   1 140.63 1.7556  0.126   
## + V10  1 140.89 1.5128  0.196   
## + V4   1 140.97 1.4420  0.242   
## + V3   1 141.28 1.1576  0.358   
## + V17  1 141.34 1.0976  0.366   
## + V14  1 141.30 1.1340  0.379   
## + V12  1 141.57 0.8915  0.542   
## + V8   1 141.71 0.7746  0.556   
## + V5   1 141.65 0.8204  0.572   
## + V16  1 141.98 0.5315  0.684   
## + V9   1 142.12 0.4069  0.759   
## + V18  1 142.27 0.2818  0.853   
## + V11  1 142.08 0.4469  0.859   
## + V15  1 142.17 0.3652  0.859   
## + V7   1 142.09 0.4308  0.903   
## + V13  1 142.33 0.2276  0.944   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: geno.pca.axes ~ V1 + V2 
## 
##       Df    AIC      F Pr(>F)  
## + V19  1 138.42 2.9679  0.040 *
## + V6   1 139.55 1.9272  0.094 .
## + V10  1 139.86 1.6572  0.163  
## + V4   1 139.94 1.5786  0.191  
## + V3   1 140.31 1.2641  0.293  
## + V14  1 140.34 1.2381  0.305  
## + V17  1 140.38 1.1979  0.359  
## + V12  1 140.65 0.9713  0.462  
## + V5   1 140.74 0.8932  0.504  
## + V8   1 140.80 0.8430  0.566  
## + V16  1 141.12 0.5772  0.687  
## + V9   1 141.28 0.4414  0.760  
## + V11  1 141.23 0.4849  0.826  
## + V18  1 141.45 0.3054  0.838  
## + V15  1 141.34 0.3960  0.842  
## + V7   1 141.25 0.4675  0.867  
## + V13  1 141.52 0.2465  0.946  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: geno.pca.axes ~ V1 + V2 + V19 
## 
##       Df    AIC      F Pr(>F)  
## + V6   1 137.70 2.1909  0.047 *
## + V4   1 138.17 1.7872  0.122  
## + V10  1 138.06 1.8779  0.125  
## + V3   1 138.61 1.4258  0.233  
## + V14  1 138.64 1.3960  0.250  
## + V17  1 138.70 1.3501  0.271  
## + V12  1 139.02 1.0918  0.386  
## + V5   1 139.13 1.0030  0.426  
## + V8   1 139.20 0.9462  0.510  
## + V16  1 139.58 0.6458  0.729  
## + V9   1 139.78 0.4931  0.752  
## + V11  1 139.71 0.5420  0.777  
## + V7   1 139.74 0.5224  0.827  
## + V15  1 139.84 0.4421  0.856  
## + V18  1 139.97 0.3406  0.874  
## + V13  1 140.06 0.2748  0.938  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step: geno.pca.axes ~ V1 + V2 + V19 + V6 
## 
##       Df    AIC      F Pr(>F)  
## + V10  1 136.97 2.0461  0.064 .
## + V4   1 137.09 1.9455  0.078 .
## + V3   1 137.60 1.5466  0.168  
## + V17  1 137.71 1.4634  0.189  
## + V14  1 137.64 1.5138  0.190  
## + V12  1 138.08 1.1804  0.371  
## + V5   1 138.21 1.0835  0.386  
## + V8   1 138.29 1.0215  0.443  
## + V16  1 138.73 0.6952  0.695  
## + V11  1 138.88 0.5828  0.730  
## + V9   1 138.95 0.5300  0.774  
## + V7   1 138.91 0.5616  0.793  
## + V15  1 139.03 0.4750  0.823  
## + V18  1 139.18 0.3655  0.863  
## + V13  1 139.28 0.2947  0.936  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Check the explanatory variables to keep.

aem.sel.ord$anova

##       Df    AIC      F Pr(>F)    
## + V1   1 140.60 5.5867  0.001 ***
## + V2   1 139.83 2.5246  0.002 ** 
## + V19  1 138.42 2.9679  0.040 *  
## + V6   1 137.70 2.1909  0.047 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Select the AEM vectors to keep for the analysis.

aem.sub <- AEM.vectors.wt[,c(1,2,19,6)]

Build the RDA model with these AEM vectors.

aem.mod.sel <- rda(geno.pca.axes ~., as.data.frame(aem.sub))
summary(aem.mod.sel)

## 
## Call:
## rda(formula = geno.pca.axes ~ V1 + V2 + V19 + V6, data = as.data.frame(aem.sub)) 
## 
## Partitioning of variance:
##               Inertia Proportion
## Total          1275.9     1.0000
## Constrained     651.9     0.5109
## Unconstrained   624.0     0.4891
## 
## Eigenvalues, and their contribution to the variance 
## 
## Importance of components:
##                           RDA1     RDA2      RDA3     RDA4      PC1      PC2
## Eigenvalue            330.6827 134.6448 112.76356 73.80353 156.5617 143.8702
## Proportion Explained    0.2592   0.1055   0.08838  0.05784   0.1227   0.1128
## Cumulative Proportion   0.2592   0.3647   0.45309  0.51093   0.6336   0.7464
##                            PC3      PC4      PC5      PC6      PC7
## Eigenvalue            140.5300 98.14303 53.63766 22.73396 8.516945
## Proportion Explained    0.1101  0.07692  0.04204  0.01782 0.006675
## Cumulative Proportion   0.8565  0.93347  0.97551  0.99332 1.000000
## 
## Accumulated constrained eigenvalues
## Importance of components:
##                           RDA1     RDA2     RDA3    RDA4
## Eigenvalue            330.6827 134.6448 112.7636 73.8035
## Proportion Explained    0.5073   0.2065   0.1730  0.1132
## Cumulative Proportion   0.5073   0.7138   0.8868  1.0000
## 
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores:  12.4779 
## 
## 
## Species scores
## 
##         RDA1    RDA2      RDA3     RDA4     PC1     PC2
## PC1  6.32898  0.1425 -0.030990 -0.09037  0.1530 -0.4266
## PC2  0.15342 -3.0980 -1.246795  0.81830  2.8228  0.2601
## PC3 -0.14429 -0.3026 -2.151961 -2.40223 -0.6291  0.6538
## PC4  0.09312 -0.7104  1.831114 -0.76155  0.5009  3.1608
## PC5 -0.32698  0.4636  1.269875 -1.14264  2.5553 -1.9847
## PC6 -0.31356 -0.6815 -0.004976  0.13618 -0.9245 -1.2425
## PC7  0.19816 -2.3531  1.615142 -0.80852 -1.7558 -1.1857
## 
## 
## Site scores (weighted sums of species scores)
## 
##       RDA1    RDA2     RDA3     RDA4      PC1       PC2
## 1  -3.3262 -1.3140 -0.10079  2.12509  0.51723  0.950824
## 2  -2.8746  1.2885 -2.19850 -0.84803 -1.13510  0.006533
## 3  -3.5807 -5.4691  0.02532  0.29066 -5.78797 -1.305333
## 4  -2.9688 -2.2268 -1.56360  0.15149  0.83680 -2.315591
## 5  -1.7142 10.5971  7.40366 -1.16979 -5.90652 -1.267982
## 6  -2.2706 -1.4842 -2.76562  0.98271 -2.17651  5.239568
## 7  -2.4927 -0.6255 -0.88120 -1.73271 -0.37229 -2.432215
## 8  -1.8363  0.7381  0.68003  0.22571  5.38816  0.370451
## 9  -2.0061 -1.6495  0.86505 -0.35231  1.84240 -0.178256
## 10 -0.5018  1.8074 -0.26903  0.05857  1.30936 -2.378312
## 11 -2.3229 -0.6520  1.16687  1.38827  3.50236  1.854494
## 12 -1.1672  1.3899 -0.27418 -0.75298  3.68195  3.279699
## 13  2.9221 -1.1673 -2.39882 -6.82601 -1.26966 -4.205178
## 14  3.6124  1.6019 -5.34250  0.46396 -3.60823  4.961012
## 15  3.5645  0.1294  0.02375 -1.22273 -0.95382  0.556597
## 16  3.8477 -4.9607  9.34090 -3.81536 -0.08934  2.482316
## 17  3.4722  0.4085 -1.67918 -3.27246  1.30948 -3.556072
## 18  3.4045  1.6121 -4.14164 -1.89888  0.71077 -2.078501
## 19  3.3407 -1.0447  2.30568 16.31944  1.20754 -3.666111
## 20  2.8979  1.0210 -0.19620 -0.11464  0.99339  3.682056
## 
## 
## Site constraints (linear combinations of constraining variables)
## 
##      RDA1     RDA2     RDA3     RDA4      PC1       PC2
## 1  -2.526  0.08145  0.45257 -0.03053  0.51723  0.950824
## 2  -2.521 -0.01740  0.34092  0.02806 -1.13510  0.006533
## 3  -3.427 -7.84713 -3.26690  0.69910 -5.78797 -1.305333
## 4  -2.447  0.61331  0.62749 -0.02328  0.83680 -2.315591
## 5  -1.590  8.02153  4.04186 -0.65863 -5.90652 -1.267982
## 6  -2.200  0.24532  0.03935  0.10196 -2.17651  5.239568
## 7  -2.544 -0.22557  0.24098  0.04859 -0.37229 -2.432215
## 8  -2.112  0.32538 -0.13620  0.21093  5.38816  0.370451
## 9  -2.381  0.23427  0.12984  0.21377  1.84240 -0.178256
## 10 -2.366  0.24449  0.10611  0.22618  1.30936 -2.378312
## 11 -1.529  0.36867 -0.44505 -0.15677  3.50236  1.854494
## 12 -1.347  0.36821 -0.51578 -0.29599  3.68195  3.279699
## 13  3.029 -0.45627 -0.74937 -5.26784 -1.26966 -4.205178
## 14  3.265  1.64412 -4.37158  0.10237 -3.60823  4.961012
## 15  3.562 -0.95186  2.35749  6.38661 -0.95382  0.556597
## 16  3.713 -4.32752  8.02468 -2.35583 -0.08934  2.482316
## 17  3.032 -0.45535 -0.75145 -5.26385  1.30948 -3.556072
## 18  3.372  2.39790 -5.63743  2.24308  0.71077 -2.078501
## 19  3.419 -0.48724  1.38623  6.39482  1.20754 -3.666111
## 20  3.598  0.22369 -1.87377 -2.60275  0.99339  3.682056
## 
## 
## Biplot scores for constraining variables
## 
##         RDA1    RDA2    RDA3     RDA4 PC1 PC2
## V1   0.93553 -0.1814  0.2009  0.22693   0   0
## V2   0.33723  0.2372 -0.7067 -0.57494   0   0
## V19 -0.10429 -0.9006 -0.4148  0.07706   0   0
## V6  -0.01327  0.3157 -0.5368  0.78231   0   0

Check the adjusted R2 value.

RsquareAdj(aem.mod.sel)

## $r.squared
## [1] 0.510934
## 
## $adj.r.squared
## [1] 0.3805164

Check the significance for the global model.

anova(aem.mod.sel, permutations = 999)

Check the significance for each variable.

anova(aem.mod.sel, by = "margin", permutations = 999)

Q6. What is the most significant AEM vector?

Check the significance for each axis.

anova(aem.mod.sel, by = "axis", permutations = 999)

Q7. Calculate the sum of the R2 of each significant AEM variable and indicate to what value this sum is equal. Explain why.

5. RDA with db-MEM and AEM

Combine the explanatory variables kept for the model.

full.pred <- cbind(dbmem.sub, aem.sub)
full.pred

##          MEM1         MEM2         V1           V2           V19          V6
## 1  -0.2809907 -1.547562957 -0.1838173 -0.090937135 -2.430096e-06 -0.01663753
## 2  -0.1807063 -1.783484359 -0.1827644 -0.089067486  1.116618e-02 -0.01066707
## 3   0.1403731 -1.854721170 -0.1827337 -0.089013440  7.079500e-01 -0.01052445
## 4   0.3023279 -1.769684874 -0.1826879 -0.088933035 -4.482090e-02 -0.01033489
## 5   0.6675319 -1.105368444 -0.1827351 -0.089015892 -7.041188e-01 -0.01053047
## 6   1.0894456  0.380798249 -0.1660053 -0.061713643 -3.670631e-11  0.01324554
## 7   0.9834678 -0.299636397 -0.1826850 -0.088928030  2.983202e-02 -0.01032342
## 8   1.1070556 -0.004790281 -0.1614657 -0.052909412 -1.803018e-12  0.02956177
## 9   1.0467669  0.754568101 -0.1759343 -0.077095445 -6.057488e-06  0.01627533
## 10  1.0381771  0.759362636 -0.1751474 -0.075735742  1.227973e-08  0.01831731
## 11  0.9405340  0.990609984 -0.1300466 -0.001896145 -3.378655e-11  0.02027138
## 12  0.8929247  1.131732159 -0.1200337  0.013446339  3.378730e-11  0.01437954
## 13  0.7318539  1.186950691  0.1258374  0.358345776  9.926865e-16 -0.31279531
## 14 -0.2855643  0.680965800  0.1523656  0.362374835  1.262551e-15  0.23260959
## 15 -1.5231833  0.450689819  0.4350216 -0.349621391  1.141927e-15  0.27112138
## 16 -0.8861806  0.159581280  0.4276912 -0.327859169  1.176471e-15 -0.60635927
## 17 -1.4390876  0.561318394  0.1261073  0.358385179  1.196360e-15 -0.31243583
## 18 -1.4065878  0.427041147  0.1679829  0.352654917  1.256370e-15  0.44023649
## 19 -1.4052371  0.425933527  0.4020427 -0.290026633  1.286117e-15  0.32532696
## 20 -1.5329207  0.455696695  0.1890075  0.327545550  1.196044e-15 -0.08073707

Rename the columns.

colnames(full.pred) <- c("MEM1","MEM2","AEM1","AEM2","AEM19","AEM6")

Run the model using all the selected variables.

full.rda <- rda(geno.pca.axes ~., full.pred)

Check the summary.

summary(full.rda)

## 
## Call:
## rda(formula = geno.pca.axes ~ MEM1 + MEM2 + AEM1 + AEM2 + AEM19 +      AEM6, data = full.pred) 
## 
## Partitioning of variance:
##               Inertia Proportion
## Total          1275.9     1.0000
## Constrained     737.3     0.5778
## Unconstrained   538.6     0.4222
## 
## Eigenvalues, and their contribution to the variance 
## 
## Importance of components:
##                           RDA1     RDA2      RDA3     RDA4     RDA5      RDA6
## Eigenvalue            333.9872 136.0048 113.11421 80.93868 60.93670 12.278688
## Proportion Explained    0.2618   0.1066   0.08866  0.06344  0.04776  0.009624
## Cumulative Proportion   0.2618   0.3684   0.45702  0.52046  0.56822  0.577841
##                            PC1      PC2     PC3      PC4      PC5      PC6
## Eigenvalue            144.8671 130.2664 94.5467 93.73786 50.19313 20.30740
## Proportion Explained    0.1135   0.1021  0.0741  0.07347  0.03934  0.01592
## Cumulative Proportion   0.6914   0.7935  0.8676  0.94105  0.98039  0.99631
##                            PC7
## Eigenvalue            4.709185
## Proportion Explained  0.003691
## Cumulative Proportion 1.000000
## 
## Accumulated constrained eigenvalues
## Importance of components:
##                          RDA1     RDA2     RDA3    RDA4     RDA5     RDA6
## Eigenvalue            333.987 136.0048 113.1142 80.9387 60.93670 12.27869
## Proportion Explained    0.453   0.1845   0.1534  0.1098  0.08265  0.01665
## Cumulative Proportion   0.453   0.6375   0.7909  0.9007  0.98335  1.00000
## 
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores:  12.4779 
## 
## 
## Species scores
## 
##         RDA1    RDA2     RDA3     RDA4     RDA5      RDA6
## PC1  6.35170  0.1828 -0.02470 -0.11053  0.03627 -0.092580
## PC2  0.31955 -3.1721 -1.12455  1.45962 -0.74313 -0.001303
## PC3 -0.14815 -0.4328 -2.22159 -2.25860 -0.83003 -0.102542
## PC4  0.18281 -0.7028  1.84524 -0.22602 -1.18617 -0.126501
## PC5 -0.21745  0.4434  1.25304 -0.32942 -1.74743 -0.269721
## PC6 -0.44560 -0.5331  0.07833 -0.05214  0.84590 -1.148942
## PC7  0.09705 -2.3107  1.61916 -1.57185  1.00876  0.265471
## 
## 
## Site scores (weighted sums of species scores)
## 
##       RDA1    RDA2      RDA3    RDA4    RDA5      RDA6
## 1  -3.2937 -1.3095 -0.007111  2.2585  0.2840  -1.98437
## 2  -2.8887  1.2020 -2.266633 -1.1200  0.6678  -0.68314
## 3  -3.6852 -5.3383  0.135907 -2.9195  7.9687   0.26079
## 4  -2.9767 -2.1379 -1.394231  0.6187  0.4631 -15.83165
## 5  -1.9179 10.8106  6.951861 -4.7355  6.6882   2.85798
## 6  -2.2387 -1.7396 -2.863669 -0.4654  2.1011  22.75689
## 7  -2.5027 -0.6801 -0.915273 -1.9919  0.4729  -0.06564
## 8  -1.6480  0.4304  0.593421  2.9909 -7.9141  16.99369
## 9  -1.9575 -1.6232  0.965904  0.9599 -2.5042  -7.47468
## 10 -0.5031  1.8647 -0.241145  0.9771 -1.3624 -11.26996
## 11 -2.1929 -0.7801  1.203466  3.3338 -4.8817   6.02666
## 12 -1.0482  1.2956 -0.235065  2.5896 -7.4008  -8.28567
## 13  2.8617 -1.3246 -2.588534 -7.7935  0.6896  10.02199
## 14  3.5137  1.6544 -5.316732 -0.6019  3.6380  -9.50595
## 15  3.5447  0.1364 -0.013208 -1.3391 -0.1405   2.50193
## 16  3.8466 -4.6737  9.445549 -3.0830 -2.3622  -4.96216
## 17  3.4402  0.4453 -1.674470 -2.4019 -1.6233 -10.67473
## 18  3.3902  1.5184 -4.194277 -1.3305 -1.2434  -2.28222
## 19  3.3095 -0.6720  2.668613 13.3049  9.2031   3.67606
## 20  2.9467  0.9211 -0.254371  0.7489 -2.7439   7.92418
## 
## 
## Site constraints (linear combinations of constraining variables)
## 
##       RDA1     RDA2     RDA3    RDA4      RDA5      RDA6
## 1  -2.8780  0.56562  0.69511  0.2307  1.177256 -4.263285
## 2  -2.9663  0.43001  0.51857 -0.6246  2.486141 -3.402870
## 3  -3.7463 -7.71624 -3.00736 -0.7926  3.737836 -1.273776
## 4  -2.9195  0.85779  0.60478 -2.0588  3.846948 -0.736803
## 5  -1.9828  8.22881  3.75340 -2.3458  2.225450  0.382140
## 6  -1.9132 -0.04767 -0.06995  0.7122 -1.865286  2.059400
## 7  -2.4703 -0.39712  0.14662 -0.2702 -0.053427  1.613470
## 8  -1.9827  0.03452 -0.30937 -0.3453 -0.030168  2.751398
## 9  -1.9334 -0.04550  0.10124  2.0841 -3.735667  1.206748
## 10 -1.9183 -0.03389  0.07918  2.1090 -3.744880  1.184971
## 11 -1.0754  0.03290 -0.51401  1.5713 -3.719491  1.577922
## 12 -0.8556  0.02805 -0.57255  1.7196 -4.196113  1.394292
## 13  3.1385 -1.07136 -1.23300 -6.8746 -0.610255  4.920849
## 14  3.1636  1.31760 -4.63832 -1.9597  2.591093  3.149536
## 15  3.5948 -0.73446  2.55099  5.6645  2.805597  0.787996
## 16  3.6590 -4.11180  8.06129 -2.8755 -0.004225 -0.001447
## 17  2.9912 -0.06051 -0.47652 -2.1697 -4.446880 -6.563901
## 18  3.1980  2.51184 -5.54930  1.9307  1.971425 -1.472669
## 19  3.4280 -0.32551  1.53572  5.3765  3.196954  1.208547
## 20  3.4686  0.53694 -1.67652 -1.0821 -1.632307 -4.522520
## 
## 
## Biplot scores for constraining variables
## 
##           RDA1     RDA2     RDA3     RDA4     RDA5     RDA6
## MEM1  -0.71947  0.06795 -0.01448 -0.19319 -0.33311  0.57381
## MEM2   0.59660  0.03483 -0.13735  0.17019 -0.64475  0.42347
## AEM1   0.93069 -0.15860  0.21423  0.15547  0.19598 -0.01386
## AEM2   0.33494  0.21350 -0.72306 -0.48298 -0.25463 -0.14597
## AEM19 -0.09874 -0.90578 -0.38379  0.09359  0.07477 -0.09037
## AEM6  -0.01367  0.31151 -0.52951  0.68333  0.36441  0.15053

Obtain the adjusted R2.

RsquareAdj(full.rda)

## $r.squared
## [1] 0.5778409
## 
## $adj.r.squared
## [1] 0.3829982

Check the significance for the global model.

anova(full.rda, permutations = 999)

## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
## 
## Model: rda(formula = geno.pca.axes ~ MEM1 + MEM2 + AEM1 + AEM2 + AEM19 + AEM6, data = full.pred)
##          Df Variance      F Pr(>F)    
## Model     6   737.26 2.9657  0.001 ***
## Residual 13   538.63                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Check the significance for each variable.

anova(full.rda, by = "margin", permutations = 999)

## Permutation test for rda under reduced model
## Marginal effects of terms
## Permutation: free
## Number of permutations: 999
## 
## Model: rda(formula = geno.pca.axes ~ MEM1 + MEM2 + AEM1 + AEM2 + AEM19 + AEM6, data = full.pred)
##          Df Variance      F Pr(>F)    
## MEM1      1    33.51 0.8088  0.621    
## MEM2      1    65.80 1.5881  0.104    
## AEM1      1    72.84 1.7580  0.078 .  
## AEM2      1   102.61 2.4765  0.009 ** 
## AEM19     1   133.26 3.2163  0.001 ***
## AEM6      1    78.28 1.8892  0.066 .  
## Residual 13   538.63                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Check the significance for each axis.

anova(full.rda, by = "axis", permutations = 999)

## Permutation test for rda under reduced model
## Forward tests for axes
## Permutation: free
## Number of permutations: 999
## 
## Model: rda(formula = geno.pca.axes ~ MEM1 + MEM2 + AEM1 + AEM2 + AEM19 + AEM6, data = full.pred)
##          Df Variance      F Pr(>F)    
## RDA1      1   333.99 8.0609  0.001 ***
## RDA2      1   136.00 3.2825  0.062 .  
## RDA3      1   113.11 2.7301  0.167    
## RDA4      1    80.94 1.9535  0.428    
## RDA5      1    60.94 1.4707  0.473    
## RDA6      1    12.28 0.2964  0.948    
## Residual 13   538.63                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

6. Partial RDA on AEM and db-MEM vectors

Perform a partial RDA for ocean current corrected by dbMEM.

prda1 <- rda(geno.pca.axes ~ AEM1 + AEM2 + AEM19 + AEM6 + Condition(MEM1+MEM2), data = full.pred)

Test the significance of the model.

anova(prda1, permutations = 999)

Check the rda outputs.

summary(prda1)

7. Variance Partitioning

Varpart partitions the variation in genomic data with respect to two, three or four explanatory variables using the adjusted R2. Here we have two explanatory variables - AEM and db-MEM.

vp1 <- varpart(geno.pca.axes, aem.sub, dbmem.sub)
vp1

## 
## Partition of variance in RDA 
## 
## Call: varpart(Y = geno.pca.axes, X = aem.sub, dbmem.sub)
## 
## Explanatory tables:
## X1:  aem.sub
## X2:  dbmem.sub 
## 
## No. of explanatory tables: 2 
## Total variation (SS): 24242 
##             Variance: 1275.9 
## No. of observations: 20 
## 
## Partition table:
##                      Df R.squared Adj.R.squared Testable
## [a+b] = X1            4   0.51093       0.38052     TRUE
## [b+c] = X2            2   0.26524       0.17879     TRUE
## [a+b+c] = X1+X2       6   0.57784       0.38300     TRUE
## Individual fractions                                    
## [a] = X1|X2           4                 0.20420     TRUE
## [b]                   0                 0.17631    FALSE
## [c] = X2|X1           2                 0.00248     TRUE
## [d] = Residuals                         0.61700    FALSE
## ---
## Use function 'rda' to test significance of fractions of interest

Represent the partitioning using a Venn diagram. Cercles X1 and X2 illustrate the same process: matrices X1 and X2 (both explanatory) each explain a portion of the variability in matrix Y (a response matrix). Each explanatory matrix uniquely explains a portion of the variation in Y (X1 explains partition “a” and X2 explains partition “c”). Additionally, both matrices are able to explain another portion of the variation in Y (partition “b”). The residual variation in matrix Y (partition “d”) is not explained by X1 or X2.

showvarparts(2, bg=2:4)

plot(vp1, bg = 2:4)

Q8. What percentage of variation is explained by db-MEM and AEM. What percentage is only explained by AEM vectors? What kind of conclusion could we drawn from this observation.

8. Plots Biplot

RDA ordinations may be presented as a biplot. The interpretation of this plots depends on what scaling has been chosen. In general, consider scaling = 2 as we are more interested in representing the response variables (sampling sites) than the explanatory variables (db-MEM and AEMs).

Download libraries.

library(vegan)
library(cowplot)

Getting the scores for plotting.

scrs <- scores(full.rda, display = c("species", "sites", "bp"), choices = 1:4, scaling = 2)
scrs

## $species
##            RDA1       RDA2        RDA3        RDA4
## PC1  6.35169814  0.1828037 -0.02469740 -0.11052585
## PC2  0.31955251 -3.1720788 -1.12454588  1.45962298
## PC3 -0.14814782 -0.4328444 -2.22158565 -2.25860078
## PC4  0.18281483 -0.7027507  1.84523577 -0.22602062
## PC5 -0.21745248  0.4434235  1.25303557 -0.32941578
## PC6 -0.44560399 -0.5331235  0.07833302 -0.05214105
## PC7  0.09705276 -2.3106842  1.61915583 -1.57185389
## 
## $sites
##          RDA1       RDA2         RDA3       RDA4
## 1  -3.2937309 -1.3095410 -0.007111301  2.2585174
## 2  -2.8886984  1.2020129 -2.266632797 -1.1199571
## 3  -3.6852388 -5.3383085  0.135906635 -2.9194700
## 4  -2.9767109 -2.1378877 -1.394231308  0.6187001
## 5  -1.9179058 10.8106438  6.951860581 -4.7354518
## 6  -2.2386627 -1.7396261 -2.863668712 -0.4654120
## 7  -2.5027181 -0.6800963 -0.915273442 -1.9918757
## 8  -1.6479666  0.4304491  0.593420562  2.9908597
## 9  -1.9575256 -1.6232019  0.965903743  0.9599266
## 10 -0.5030823  1.8646960 -0.241144846  0.9770617
## 11 -2.1928917 -0.7800897  1.203465928  3.3337588
## 12 -1.0482215  1.2956428 -0.235065170  2.5895538
## 13  2.8617235 -1.3245688 -2.588533849 -7.7935447
## 14  3.5137456  1.6543783 -5.316732421 -0.6019437
## 15  3.5447463  0.1364092 -0.013208212 -1.3391314
## 16  3.8466374 -4.6736600  9.445549393 -3.0829697
## 17  3.4401557  0.4452516 -1.674469630 -2.4019175
## 18  3.3902074  1.5183614 -4.194276668 -1.3304764
## 19  3.3094612 -0.6719693  2.668612513 13.3048605
## 20  2.9466763  0.9211042 -0.254370999  0.7489114
## 
## $biplot
##              RDA1        RDA2        RDA3        RDA4
## MEM1  -0.71947066  0.06795135 -0.01447611 -0.19319025
## MEM2   0.59659679  0.03482658 -0.13735298  0.17019052
## AEM1   0.93068886 -0.15859812  0.21423280  0.15546584
## AEM2   0.33494487  0.21349601 -0.72306069 -0.48297886
## AEM19 -0.09873530 -0.90578251 -0.38378649  0.09359047
## AEM6  -0.01366803  0.31151497 -0.52950980  0.68332735
## 
## attr(,"const")
## [1] 12.4779

Transform the information obtained from the model to a data frame.

species_centroids <- data.frame(scrs$species)
species_centroids

##            RDA1       RDA2        RDA3        RDA4
## PC1  6.35169814  0.1828037 -0.02469740 -0.11052585
## PC2  0.31955251 -3.1720788 -1.12454588  1.45962298
## PC3 -0.14814782 -0.4328444 -2.22158565 -2.25860078
## PC4  0.18281483 -0.7027507  1.84523577 -0.22602062
## PC5 -0.21745248  0.4434235  1.25303557 -0.32941578
## PC6 -0.44560399 -0.5331235  0.07833302 -0.05214105
## PC7  0.09705276 -2.3106842  1.61915583 -1.57185389

Add the PC components names.

species_centroids$PC_names <- rownames(species_centroids)

Add explanatory variables coordinates to create arrows on the biplot.

aem_continuous_arrows <- data.frame(scrs$biplot)
aem_continuous_arrows

##              RDA1        RDA2        RDA3        RDA4
## MEM1  -0.71947066  0.06795135 -0.01447611 -0.19319025
## MEM2   0.59659679  0.03482658 -0.13735298  0.17019052
## AEM1   0.93068886 -0.15859812  0.21423280  0.15546584
## AEM2   0.33494487  0.21349601 -0.72306069 -0.48297886
## AEM19 -0.09873530 -0.90578251 -0.38378649  0.09359047
## AEM6  -0.01366803  0.31151497 -0.52950980  0.68332735

Add rownames.

rownames(aem_continuous_arrows) <- c("MEM1","MEM2","AEM1", "AEM2", "AEM19", "AEM6")

Add AEM names to the arrows.

aem_continuous_arrows$aem_number <- rownames(aem_continuous_arrows)
aem_continuous_arrows

##              RDA1        RDA2        RDA3        RDA4 aem_number
## MEM1  -0.71947066  0.06795135 -0.01447611 -0.19319025       MEM1
## MEM2   0.59659679  0.03482658 -0.13735298  0.17019052       MEM2
## AEM1   0.93068886 -0.15859812  0.21423280  0.15546584       AEM1
## AEM2   0.33494487  0.21349601 -0.72306069 -0.48297886       AEM2
## AEM19 -0.09873530 -0.90578251 -0.38378649  0.09359047      AEM19
## AEM6  -0.01366803  0.31151497 -0.52950980  0.68332735       AEM6

Create the biplot.

baseplot<-plot(full.rda, scaling = 2)

Use the arrow.mul argument to make the scale of the arrows comparable to one another.

mult <- attributes(baseplot$biplot)$arrow.mul

Add the sampling site names.

sitenames <- sites$Abb

Remove the samples from Alaska.

sitenames <- sitenames[-c(21:24)]
sitenames

##  [1] OGD SGI LAS JER TOF CRA RBY SHE MAL QUA HOP TBL CAL TOL PRI LEG JUA SEL REN
## [20] MAZ
## 24 Levels: AK1 AK2 AK3 AK4 CAL CRA HOP JER JUA LAS LEG MAL MAZ OGD PRI ... TOL

Add colour to the sampling sites regarding their locations (North or South).

sitecols <- c(rep("black", 12), rep("lightgrey", 8))

Add the region information.

sitereg <- c(rep("S", 12), rep("N", 8))

Keep the response variables coordinates for the RDA model.

sites_centroids <- data.frame(scrs$sites)
sites_centroids$SITE <- sitenames
sites_centroids$REGION <- as.factor(sitereg)
head(sites_centroids)

##        RDA1      RDA2         RDA3       RDA4 SITE REGION
## 1 -3.293731 -1.309541 -0.007111301  2.2585174  OGD      S
## 2 -2.888698  1.202013 -2.266632797 -1.1199571  SGI      S
## 3 -3.685239 -5.338309  0.135906635 -2.9194700  LAS      S
## 4 -2.976711 -2.137888 -1.394231308  0.6187001  JER      S
## 5 -1.917906 10.810644  6.951860581 -4.7354518  TOF      S
## 6 -2.238663 -1.739626 -2.863668712 -0.4654120  CRA      S

Make a ggplot.

RDA_plot <- ggplot(data = sites_centroids, aes(x = RDA1, y = RDA2))+
  geom_point(data = sites_centroids, pch = 21, size = 2, aes(fill = REGION))+
  scale_fill_manual(values = c(S = "black", N = "lightgrey"), name = "Region", levels(factor(sites_centroids$REGION)), labels = c("North", "South"))+
  geom_text(data = aem_continuous_arrows,
            aes(x= (mult + mult/10) * RDA1, y = (mult + mult/10) * RDA2,
                label = aem_number), size = 4, hjust = 0.5)+
  geom_text(data = sites_centroids, aes(x = RDA1, y = RDA2, label = SITE), 
            size = 2.5, col = "black", hjust = 1.2)+
  geom_segment(data = aem_continuous_arrows,
               aes(x = 0, xend = mult * RDA1, y = 0, yend = mult * RDA2),
               arrow = arrow(length = unit(0.25, "cm")), colour = "grey") +
  theme_bw() + 
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  geom_hline(yintercept = 0, lty = "dotted") + geom_vline(xintercept = 0, lty = "dotted") +
  labs(x = paste("RDA1 (", round(summary(full.rda)$cont$importance[2,1]*100, 2), "%)", sep = ""), y = paste("RDA2 (", round(summary(full.rda)$cont$importance[2,2]*100, 2), "%)", sep = ""))+
  theme(axis.text = element_text(colour = "black", size = 12)) +
  theme(axis.title = element_text(size = 18, colour = "black", family = "Helvetica", face = "bold")) 

RDA_plot

Save the RDA.

ggsave("RDA_AEM_dbMEM.pdf", width = 15, height = 15, dpi = 600, units = "cm", useDingbats = F)

RDA

Laura Benestan

9/28/2020