We’ll read in the data and take a look at how the study is organized.
library(scatterplot3d)
library(vegan)
library(dplyr)
library(ggplot2)
library(reshape2)
options(stringsAsFactors = FALSE)
setwd('S:/OSMP/SCIENCE_OFFICER/Projects/Misc_analyses/jarret/')
fn <- 'Gerorgia Pass Species Comp Data Sheet_DATA CLEAN UP INPROGRESS 7.1 HAR.csv'
fn2 <- 'Gerorgia Pass Species Comp Data Sheet_DATA CLEAN UP INPROGRESS 7.1 HAR_ Level 1 Hits Only.csv'
d <- read.csv(fn)
names(d) <- tolower(names(d))
d2 <- subset(d, site != '')
d2$site2 <- gsub('[A-Z]| ', '', d2$site)
d2$site3 <- gsub('[0-9]$', '0', d2$site2)
d3 <- d2[order(d2$site3),]
table(d3$treatment, d3$treatmentcombine)
##
## Control Matting Rock Cover
## Control East 12 0 0
## Control West 18 0 0
## Matting 0 18 0
## Rock Cover 0 0 12
table(d3$date)
##
## 1/21/16 7/21/16 7/22/16 7/23/16 7/25/16 Unknown
## 1 7 17 22 9 4
table(d3$site3)
##
## 000 100 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 2100 2200
## 2 2 2 2 3 3 2 2 2 2 2 2 2 2 2
## 2300 2400 250 2500 2600 2700 2800 300 400 500 600 700 800 900
## 2 2 2 2 2 2 2 2 2 3 1 2 2 2
sort(d3$site)
## [1] "002 CW" "002 RR" "1002 CW" "1002 RM" "102 CW" "102 RM"
## [7] "1100 CE" "1102 RM" "1200 CW" "1200 RM" "1202 CE" "1300 CE"
## [13] "1300 RC" "1300 RM" "1400 RM" "1402 CW" "1500 CW" "1500 RR"
## [19] "1602 RC" "1602 CW" "1702 CE" "1702 RM" "1802 CW" "1802 RC"
## [25] "1902 CW" "1902 RC" "2000 CE" "2002 RM" "2100 CW" "2102 RM"
## [31] "2202 CE" "2202 RC" "2300 CW" "2300 RC" "2400 CW" "2402 RM"
## [37] "250 CE" "250 RR" "2500 CE" "2502 RC" "2600 CE" "2602 RM"
## [43] "2700 CW" "2702 RC" "2800 CW" "2802 RC" "302 CW" "302 RM"
## [49] "402 CW" "402 RM" "502 CE" "502 RM" "504 RM" "602 CE"
## [55] "700 CE" "700 RM" "800 RM" "802 CW" "900 RM" "902 CW"
You said “I am interested in doing a NMDS for composition to compare composition between the two treatments excluding bare ground, liter, moss, and lichen. I believe I can do this including the controls or excluding them? It would also be interesting to compare the results of the NMDS using the 1 and 3 level data sets.”
You can do it however you want, it’s just a matter of what hypothesis you want to test. To test whether the communities in the treatments are different than in the controls, you’d want to ordinate the controls and treatments together.
We’ll take a peak at the ordination with the 3 level data first.
spp_mat <- dplyr:::select(d3, acmi:zrgele)
n <- metaMDS(spp_mat, k = 3)
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1052727
## Run 1 stress 0.1032255
## ... New best solution
## ... Procrustes: rmse 0.06216054 max resid 0.2494565
## Run 2 stress 0.1032236
## ... New best solution
## ... Procrustes: rmse 0.001092126 max resid 0.004593825
## ... Similar to previous best
## Run 3 stress 0.1088181
## Run 4 stress 0.1062429
## Run 5 stress 0.105387
## Run 6 stress 0.1050632
## Run 7 stress 0.1043834
## Run 8 stress 0.1062822
## Run 9 stress 0.1043978
## Run 10 stress 0.1052351
## Run 11 stress 0.108677
## Run 12 stress 0.1032242
## ... Procrustes: rmse 0.0009164418 max resid 0.004091523
## ... Similar to previous best
## Run 13 stress 0.1071435
## Run 14 stress 0.106208
## Run 15 stress 0.1031422
## ... New best solution
## ... Procrustes: rmse 0.007989671 max resid 0.0480741
## Run 16 stress 0.1032352
## ... Procrustes: rmse 0.009825673 max resid 0.05053581
## Run 17 stress 0.1032101
## ... Procrustes: rmse 0.007208341 max resid 0.04715309
## Run 18 stress 0.110659
## Run 19 stress 0.103129
## ... New best solution
## ... Procrustes: rmse 0.003553303 max resid 0.01441219
## Run 20 stress 0.110367
## *** No convergence -- monoMDS stopping criteria:
## 1: no. of iterations >= maxit
## 19: stress ratio > sratmax
cols <- as.numeric(as.factor(d3$treatmentcombine))
plot(n$points, col = cols, pch = 20)
scatterplot3d(n$points, color = cols, pch = 20)
Wasn’t getting convergence with k = 2, which implies increasing k until you converge. Convergence was reached at k = 3. Looking at the plot, it looks like all the separation with regards to the treatment and control is on axis 1.
d4 <- dplyr:::select(d3, site3, treatmentcombine, total.cover, species.count, x2bg, x2l, x2r)
d5 <- melt(data = d4, id.vars = c('site3', 'treatmentcombine'))
d5$trt <- factor(d5$treatmentcombine, levels = c('Matting', 'Control', 'Rock Cover'))
ggplot(data = d5, aes(x = trt, y = value)) +
geom_boxplot() +
facet_wrap(~variable, scales = 'free')
ggplot(data = d5, aes(x = trt, y = value, group = site3)) +
geom_point() +
geom_line()+
facet_wrap(~variable, scales = 'free')
Differences everywhere you look. Some interesting variance among the paired sites (lines connect the pairs).
d6 <- split(d5, d5$variable)
lapply(d6, function(x) summary(aov(value ~ trt, data = x)))
## $total.cover
## Df Sum Sq Mean Sq F value Pr(>F)
## trt 2 100017 50008 47.4 7.53e-13 ***
## Residuals 57 60140 1055
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## $species.count
## Df Sum Sq Mean Sq F value Pr(>F)
## trt 2 2077.8 1038.9 89.52 <2e-16 ***
## Residuals 57 661.5 11.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## $x2bg
## Df Sum Sq Mean Sq F value Pr(>F)
## trt 2 8513 4257 27.98 3.41e-09 ***
## Residuals 57 8670 152
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## $x2l
## Df Sum Sq Mean Sq F value Pr(>F)
## trt 2 6541 3271 65.25 1.83e-15 ***
## Residuals 57 2857 50
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## $x2r
## Df Sum Sq Mean Sq F value Pr(>F)
## trt 2 3074 1537 21.66 1.01e-07 ***
## Residuals 57 4045 71
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Massive differences everywhere you look. When can also look at r2 values.
lapply(d6, function(x) summary(lm(value ~ trt, data = x)))
## $total.cover
##
## Call:
## lm(formula = value ~ trt, data = x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -98.833 -14.333 0.833 15.167 119.167
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.167 7.656 4.854 9.77e-06 ***
## trtControl 77.667 9.684 8.020 6.37e-11 ***
## trtRock Cover -9.333 12.105 -0.771 0.444
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 32.48 on 57 degrees of freedom
## Multiple R-squared: 0.6245, Adjusted R-squared: 0.6113
## F-statistic: 47.4 on 2 and 57 DF, p-value: 7.528e-13
##
##
## $species.count
##
## Call:
## lm(formula = value ~ trt, data = x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.1333 -1.2222 -0.1333 2.8667 7.7778
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.2222 0.8030 5.258 2.27e-06 ***
## trtControl 11.9111 1.0157 11.727 < 2e-16 ***
## trtRock Cover 0.3611 1.2696 0.284 0.777
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.407 on 57 degrees of freedom
## Multiple R-squared: 0.7585, Adjusted R-squared: 0.75
## F-statistic: 89.52 on 2 and 57 DF, p-value: < 2.2e-16
##
##
## $x2bg
##
## Call:
## lm(formula = value ~ trt, data = x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -21.167 -8.819 -2.322 10.308 30.133
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.778 2.907 7.148 1.81e-09 ***
## trtControl -5.911 3.677 -1.608 0.113
## trtRock Cover 25.389 4.596 5.524 8.54e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.33 on 57 degrees of freedom
## Multiple R-squared: 0.4954, Adjusted R-squared: 0.4777
## F-statistic: 27.98 on 2 and 57 DF, p-value: 3.414e-09
##
##
## $x2l
##
## Call:
## lm(formula = value ~ trt, data = x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.222 -3.458 -1.694 2.792 20.833
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.222 1.669 16.913 < 2e-16 ***
## trtControl -23.056 2.111 -10.923 1.34e-15 ***
## trtRock Cover -22.056 2.638 -8.359 1.74e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.08 on 57 degrees of freedom
## Multiple R-squared: 0.696, Adjusted R-squared: 0.6853
## F-statistic: 65.25 on 2 and 57 DF, p-value: 1.826e-15
##
##
## $x2r
##
## Call:
## lm(formula = value ~ trt, data = x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.750 -3.800 -2.206 2.535 28.389
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.611 1.986 7.862 1.17e-10 ***
## trtControl -11.811 2.512 -4.702 1.67e-05 ***
## trtRock Cover 5.139 3.140 1.637 0.107
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.424 on 57 degrees of freedom
## Multiple R-squared: 0.4318, Adjusted R-squared: 0.4119
## F-statistic: 21.66 on 2 and 57 DF, p-value: 1.007e-07
The treatment explains 75% of the variance in species richness, whoa!
I’ll just take a quick glance at the ordination.
x <- read.csv(fn2)
names(x) <- tolower(names(x))
x2 <- subset(x, site != '')
spp_mat <- dplyr:::select(x2, acmi:zrgele)
spp_mat[is.na(spp_mat)] <- 0
n <- metaMDS(spp_mat, k = 3)
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1081346
## Run 1 stress 0.1154191
## Run 2 stress 0.1107454
## Run 3 stress 0.1075524
## ... New best solution
## ... Procrustes: rmse 0.01708509 max resid 0.1072118
## Run 4 stress 0.1163185
## Run 5 stress 0.1078292
## ... Procrustes: rmse 0.01438192 max resid 0.08057942
## Run 6 stress 0.1073857
## ... New best solution
## ... Procrustes: rmse 0.007480586 max resid 0.0507909
## Run 7 stress 0.1145224
## Run 8 stress 0.1107707
## Run 9 stress 0.1073872
## ... Procrustes: rmse 0.0004236427 max resid 0.001465027
## ... Similar to previous best
## Run 10 stress 0.1073905
## ... Procrustes: rmse 0.00108646 max resid 0.003657302
## ... Similar to previous best
## Run 11 stress 0.1073893
## ... Procrustes: rmse 0.0008260192 max resid 0.002548624
## ... Similar to previous best
## Run 12 stress 0.1107426
## Run 13 stress 0.1073867
## ... Procrustes: rmse 0.0004495397 max resid 0.001393039
## ... Similar to previous best
## Run 14 stress 0.1073859
## ... Procrustes: rmse 0.0003679856 max resid 0.001321349
## ... Similar to previous best
## Run 15 stress 0.1073889
## ... Procrustes: rmse 0.001048054 max resid 0.003379078
## ... Similar to previous best
## Run 16 stress 0.1073904
## ... Procrustes: rmse 0.001018037 max resid 0.003123514
## ... Similar to previous best
## Run 17 stress 0.1073854
## ... New best solution
## ... Procrustes: rmse 0.0005809261 max resid 0.002736778
## ... Similar to previous best
## Run 18 stress 0.108129
## Run 19 stress 0.1107494
## Run 20 stress 0.1073883
## ... Procrustes: rmse 0.000700848 max resid 0.003611871
## ... Similar to previous best
## *** Solution reached
cols <- as.numeric(as.factor(x2$treatmentcombine))
plot(n$points, col = cols, pch = 20)
scatterplot3d(n$points, color = cols, pch = 20)
Way different, still.
Well, up to you! I think you wanted to run a pca for some reason? I don’t see any reason for that but easy enough to do. We could drill down into the species specific-patterns? Let me know :)