Load libraries and data

library("here")

zp <- read.csv(here("Week_14", "Data", "mangrove_fishpond_zooplankton_edit.csv"))

PCA

Start by characterizing how the environmental variables (columns 37:49) are correlated with each other across the ten sites, using a principal components analysis. Provide an appropriate plot and summary of the PCA results. What are the major patterns of variation across these sites?

# we can ask “what is the dominant axis of variation among these two variables?”. PCA answers this question by finding the the line through the data that explains the most variation in the data.

enviro.small = subset(zp, select = c(37:49)) #env variables 
enviro.small = na.omit(enviro.small)

pca = princomp(enviro.small, cor = TRUE) #each variable has the same total variance
summary(pca)

## Importance of components:
##                           Comp.1    Comp.2    Comp.3     Comp.4     Comp.5
## Standard deviation     2.4591664 1.5418092 1.3316315 1.09389399 0.90423622
## Proportion of Variance 0.4651923 0.1828597 0.1364033 0.09204647 0.06289563
## Cumulative Proportion  0.4651923 0.6480519 0.7844552 0.87650164 0.93939726
##                            Comp.6     Comp.7      Comp.8      Comp.9
## Standard deviation     0.69506314 0.49771057 0.198369951 0.132877250
## Proportion of Variance 0.03716252 0.01905506 0.003026972 0.001358182
## Cumulative Proportion  0.97655978 0.99561485 0.998641818 1.000000000
##                             Comp.10      Comp.11 Comp.12 Comp.13
## Standard deviation     2.499503e-08 1.077036e-08       0       0
## Proportion of Variance 4.805782e-17 8.923119e-18       0       0
## Cumulative Proportion  1.000000e+00 1.000000e+00       1       1

Info on the axes

# What are these axes? We can see with loadings():
loadings(pca)

## 
## Loadings:
##                            Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
## mangrove.shoreline          0.329         0.352  0.270         0.122  0.205
## non.mangrove.shoreline      0.197 -0.517  0.163  0.192 -0.155  0.150       
## percent.mangrove.shoreline  0.238  0.420  0.266         0.316  0.226       
## pond.area                   0.351         0.235  0.219         0.289  0.376
## d13C                       -0.112 -0.465        -0.193  0.498  0.509 -0.352
## C                          -0.213  0.315  0.236  0.478         0.133 -0.629
## d15N                        0.141  0.300  0.281 -0.581 -0.344  0.181 -0.173
## N                                         0.574 -0.142  0.471 -0.578       
## relief                     -0.354  0.121  0.151 -0.228         0.381  0.290
## percent.trees              -0.340  0.231 -0.179         0.269  0.194  0.382
## precip                     -0.357                0.399                0.143
## percent.impervious          0.330  0.179 -0.314         0.312              
## percent.developed           0.334  0.167 -0.304         0.313              
##                            Comp.8 Comp.9 Comp.10 Comp.11 Comp.12 Comp.13
## mangrove.shoreline                 0.406  0.658   0.176                 
## non.mangrove.shoreline     -0.415        -0.237          -0.379   0.456 
## percent.mangrove.shoreline  0.362 -0.176 -0.179  -0.348  -0.416   0.238 
## pond.area                         -0.351 -0.368   0.138   0.369  -0.343 
## d13C                        0.206  0.103                  0.179         
## C                          -0.224 -0.195          0.213          -0.120 
## d15N                               0.309 -0.221   0.202   0.243   0.196 
## N                          -0.213        -0.119   0.108                 
## relief                     -0.552 -0.135  0.217  -0.373          -0.200 
## percent.trees                            -0.111   0.650  -0.111   0.304 
## precip                             0.536 -0.335  -0.367   0.336   0.150 
## percent.impervious         -0.338  0.429 -0.294          -0.244  -0.458 
## percent.developed          -0.347 -0.197  0.138  -0.161   0.509   0.438 
## 
##                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
## SS loadings     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
## Proportion Var  0.077  0.077  0.077  0.077  0.077  0.077  0.077  0.077  0.077
## Cumulative Var  0.077  0.154  0.231  0.308  0.385  0.462  0.538  0.615  0.692
##                Comp.10 Comp.11 Comp.12 Comp.13
## SS loadings      1.000   1.000   1.000   1.000
## Proportion Var   0.077   0.077   0.077   0.077
## Cumulative Var   0.769   0.846   0.923   1.000

Biplots

The idea of the biplot is to summarize PCA results by plotting two things at once, using two PC axes: 1) the scores of the observations, 2) the relationship between the original variables and the PC axes.

par(mfrow = c(1,2)) 
# distance biplot, scaled so distance bt the observation scores approximates the euclidean distance bt the observations in raw data
biplot(pca, pch = 21, col = c('grey', 'blue'), scale = 1) 

# correlation biplot, scald so angle bt the vectors approximates the correlation bt those vectors, and angle bt vector and axis approximates the correlation bt the vector and PC component
biplot(pca, pch = 21, col = c('grey', 'blue'), scale = 0)

Next, characterize the major axes of variation in zooplankton community composition (i.e., an unconstrained ordination). Consider how to transform and standardize the community data before the analysis. Provide an appropriate plot and interpret what it means. Finally, use envfit() to assess which environmental variables are correlated with the major axes of community composition.

community <- zp[, 8:35]
head(community)

##   polychaete barnacle copepod monstrillidae para.cope amphipod cumacean Lucifer
## 1         76        0   63808            58         1       59        1      67
## 2          8        0    5664            64         0      608       16      56
## 3          0        0    3752             9         0        0        4       0
## 4          0        0  221184             1         0       42        6      12
## 5          0        0    9264            17         0       85        5       4
## 6          2        1      92            38         0       23      110       0
##   shrimp megalopa1 megalopa2 megalopa.oth zoea postlarval.crab lobster isopod
## 1     62       127         9            0 1616               0       1      0
## 2    288       512        32            0 8608               1       0     12
## 3   1272         1         1            0  936               0       2     11
## 4   1088        30         5            0 1547               0       8     24
## 5     37        31        15            3   20               0       0     34
## 6     59         6         4            4  355               1       1      9
##   mysid mantis.shrimp ostracod cephalopod fish hydropoids jellyfish bivalve
## 1    52             0        0          1    4          0         0       0
## 2     0             0        0          0    8          0         0       0
## 3     0             0        8          0    2          0         0       0
## 4     0             2        0          2    2          0         0       0
## 5     0            11        0          0    3          0         0       0
## 6    14             1        0          0    2          0         0       0
##   gastropod nematode leech nauplius
## 1         0        1     0        1
## 2         0        0     0        0
## 3         0        0     7        0
## 4         2        0     0        0
## 5         0        2     0        0
## 6         3        0     9        0

#ord = metaMDS(community, distance = "bray", k = 2, trymax = 100)
#ord2 = metaMDS(community, distance = "bray", k = 3, trymax = 100)
#ord3 = metaMDS(community, distance = "bray", k = 4, trymax = 100)

Although envfit() is a useful tool, it does not directly ask how well community composition can be explained by a set of environmental variables. Use a constrained ordination to ask this question. How much of variation in community composition can be explained by the set of environmental variables? How much can be explained by the first two axes? What do those axes represent (visualizing with a triplot will help)?
The authors were interested in whether zooplankton composition is affected by the presence of mangroves, and whether composition is affected by the presence of a fishpond. Use a PERMANOVA to test these questions. Considering all the results together, what have you learned about this system?

Assignment 14 PCA

Load libraries and data

PCA

Info on the axes

Biplots