library("here")
zp <- read.csv(here("Week_14", "Data", "mangrove_fishpond_zooplankton_edit.csv"))
# we can ask “what is the dominant axis of variation among these two variables?”. PCA answers this question by finding the the line through the data that explains the most variation in the data.
enviro.small = subset(zp, select = c(37:49)) #env variables
enviro.small = na.omit(enviro.small)
pca = princomp(enviro.small, cor = TRUE) #each variable has the same total variance
summary(pca)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 2.4591664 1.5418092 1.3316315 1.09389399 0.90423622
## Proportion of Variance 0.4651923 0.1828597 0.1364033 0.09204647 0.06289563
## Cumulative Proportion 0.4651923 0.6480519 0.7844552 0.87650164 0.93939726
## Comp.6 Comp.7 Comp.8 Comp.9
## Standard deviation 0.69506314 0.49771057 0.198369951 0.132877250
## Proportion of Variance 0.03716252 0.01905506 0.003026972 0.001358182
## Cumulative Proportion 0.97655978 0.99561485 0.998641818 1.000000000
## Comp.10 Comp.11 Comp.12 Comp.13
## Standard deviation 2.499503e-08 1.077036e-08 0 0
## Proportion of Variance 4.805782e-17 8.923119e-18 0 0
## Cumulative Proportion 1.000000e+00 1.000000e+00 1 1
# What are these axes? We can see with loadings():
loadings(pca)
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7
## mangrove.shoreline 0.329 0.352 0.270 0.122 0.205
## non.mangrove.shoreline 0.197 -0.517 0.163 0.192 -0.155 0.150
## percent.mangrove.shoreline 0.238 0.420 0.266 0.316 0.226
## pond.area 0.351 0.235 0.219 0.289 0.376
## d13C -0.112 -0.465 -0.193 0.498 0.509 -0.352
## C -0.213 0.315 0.236 0.478 0.133 -0.629
## d15N 0.141 0.300 0.281 -0.581 -0.344 0.181 -0.173
## N 0.574 -0.142 0.471 -0.578
## relief -0.354 0.121 0.151 -0.228 0.381 0.290
## percent.trees -0.340 0.231 -0.179 0.269 0.194 0.382
## precip -0.357 0.399 0.143
## percent.impervious 0.330 0.179 -0.314 0.312
## percent.developed 0.334 0.167 -0.304 0.313
## Comp.8 Comp.9 Comp.10 Comp.11 Comp.12 Comp.13
## mangrove.shoreline 0.406 0.658 0.176
## non.mangrove.shoreline -0.415 -0.237 -0.379 0.456
## percent.mangrove.shoreline 0.362 -0.176 -0.179 -0.348 -0.416 0.238
## pond.area -0.351 -0.368 0.138 0.369 -0.343
## d13C 0.206 0.103 0.179
## C -0.224 -0.195 0.213 -0.120
## d15N 0.309 -0.221 0.202 0.243 0.196
## N -0.213 -0.119 0.108
## relief -0.552 -0.135 0.217 -0.373 -0.200
## percent.trees -0.111 0.650 -0.111 0.304
## precip 0.536 -0.335 -0.367 0.336 0.150
## percent.impervious -0.338 0.429 -0.294 -0.244 -0.458
## percent.developed -0.347 -0.197 0.138 -0.161 0.509 0.438
##
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
## SS loadings 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## Proportion Var 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077
## Cumulative Var 0.077 0.154 0.231 0.308 0.385 0.462 0.538 0.615 0.692
## Comp.10 Comp.11 Comp.12 Comp.13
## SS loadings 1.000 1.000 1.000 1.000
## Proportion Var 0.077 0.077 0.077 0.077
## Cumulative Var 0.769 0.846 0.923 1.000
The idea of the biplot is to summarize PCA results by plotting two things at once, using two PC axes: 1) the scores of the observations, 2) the relationship between the original variables and the PC axes.
par(mfrow = c(1,2))
# distance biplot, scaled so distance bt the observation scores approximates the euclidean distance bt the observations in raw data
biplot(pca, pch = 21, col = c('grey', 'blue'), scale = 1)
# correlation biplot, scald so angle bt the vectors approximates the correlation bt those vectors, and angle bt vector and axis approximates the correlation bt the vector and PC component
biplot(pca, pch = 21, col = c('grey', 'blue'), scale = 0)
community <- zp[, 8:35]
head(community)
## polychaete barnacle copepod monstrillidae para.cope amphipod cumacean Lucifer
## 1 76 0 63808 58 1 59 1 67
## 2 8 0 5664 64 0 608 16 56
## 3 0 0 3752 9 0 0 4 0
## 4 0 0 221184 1 0 42 6 12
## 5 0 0 9264 17 0 85 5 4
## 6 2 1 92 38 0 23 110 0
## shrimp megalopa1 megalopa2 megalopa.oth zoea postlarval.crab lobster isopod
## 1 62 127 9 0 1616 0 1 0
## 2 288 512 32 0 8608 1 0 12
## 3 1272 1 1 0 936 0 2 11
## 4 1088 30 5 0 1547 0 8 24
## 5 37 31 15 3 20 0 0 34
## 6 59 6 4 4 355 1 1 9
## mysid mantis.shrimp ostracod cephalopod fish hydropoids jellyfish bivalve
## 1 52 0 0 1 4 0 0 0
## 2 0 0 0 0 8 0 0 0
## 3 0 0 8 0 2 0 0 0
## 4 0 2 0 2 2 0 0 0
## 5 0 11 0 0 3 0 0 0
## 6 14 1 0 0 2 0 0 0
## gastropod nematode leech nauplius
## 1 0 1 0 1
## 2 0 0 0 0
## 3 0 0 7 0
## 4 2 0 0 0
## 5 0 2 0 0
## 6 3 0 9 0
#ord = metaMDS(community, distance = "bray", k = 2, trymax = 100)
#ord2 = metaMDS(community, distance = "bray", k = 3, trymax = 100)
#ord3 = metaMDS(community, distance = "bray", k = 4, trymax = 100)
Although envfit() is a useful tool, it does not directly ask how well community composition can be explained by a set of environmental variables. Use a constrained ordination to ask this question. How much of variation in community composition can be explained by the set of environmental variables? How much can be explained by the first two axes? What do those axes represent (visualizing with a triplot will help)?
The authors were interested in whether zooplankton composition is affected by the presence of mangroves, and whether composition is affected by the presence of a fishpond. Use a PERMANOVA to test these questions. Considering all the results together, what have you learned about this system?