1. Import and study the sample design

We’ll read in the data and take a look at how the study is organized.

library(scatterplot3d)
library(vegan)
library(dplyr)
library(ggplot2)
library(reshape2)


options(stringsAsFactors = FALSE)


setwd('S:/OSMP/SCIENCE_OFFICER/Projects/Misc_analyses/jarret/')
fn <- 'Gerorgia Pass Species Comp Data Sheet_DATA CLEAN UP INPROGRESS 7.1 HAR.csv'
fn2 <- 'Gerorgia Pass Species Comp Data Sheet_DATA CLEAN UP INPROGRESS 7.1 HAR_ Level 1 Hits Only.csv'

d <- read.csv(fn)
names(d) <- tolower(names(d))
d2 <- subset(d, site != '')

d2$site2 <- gsub('[A-Z]| ', '', d2$site)
d2$site3 <- gsub('[0-9]$', '0', d2$site2)
d3 <- d2[order(d2$site3),]

table(d3$treatment, d3$treatmentcombine)
##               
##                Control Matting Rock Cover
##   Control East      12       0          0
##   Control West      18       0          0
##   Matting            0      18          0
##   Rock Cover         0       0         12
table(d3$date)
## 
## 1/21/16 7/21/16 7/22/16 7/23/16 7/25/16 Unknown 
##       1       7      17      22       9       4
table(d3$site3)
## 
##  000  100 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 2100 2200 
##    2    2    2    2    3    3    2    2    2    2    2    2    2    2    2 
## 2300 2400  250 2500 2600 2700 2800  300  400  500  600  700  800  900 
##    2    2    2    2    2    2    2    2    2    3    1    2    2    2
sort(d3$site)
##  [1] "002 CW"   "002 RR"   "1002 CW"  "1002 RM"  "102 CW"   "102 RM"  
##  [7] "1100 CE"  "1102 RM"  "1200 CW"  "1200 RM"  "1202 CE"  "1300 CE" 
## [13] "1300 RC"  "1300 RM"  "1400 RM"  "1402 CW"  "1500 CW"  "1500 RR" 
## [19] "1602  RC" "1602 CW"  "1702 CE"  "1702 RM"  "1802 CW"  "1802 RC" 
## [25] "1902 CW"  "1902 RC"  "2000 CE"  "2002 RM"  "2100 CW"  "2102 RM" 
## [31] "2202 CE"  "2202 RC"  "2300 CW"  "2300 RC"  "2400 CW"  "2402 RM" 
## [37] "250 CE"   "250 RR"   "2500 CE"  "2502 RC"  "2600 CE"  "2602 RM" 
## [43] "2700 CW"  "2702 RC"  "2800 CW"  "2802 RC"  "302 CW"   "302 RM"  
## [49] "402 CW"   "402 RM"   "502 CE"   "502 RM"   "504 RM"   "602 CE"  
## [55] "700 CE"   "700 RM"   "800 RM"   "802 CW"   "900 RM"   "902 CW"
  • 30 sites, each with one control, for 60 rows of data.
  • control east (ce) and control west (cw) get combined as control.
  • matting (rm) and rock cover (rr or rc) are the treatment.
  • one weird January date and 4 unknown dates
  • I reworked the site IDs to see if we had a fully paired design, and it looks close to being paired, but there’s one too many site 500s and one too few site 600s. Specifically, there is 502 CE, 502 RM, 504 RM, and 602 CE. One of those 500 is probably a 600…typo? It doesn’t really matter unless we want to do a paired analysis, which you would want to do when treatment effects are too subtle to be detected when ignoring the paired design (that is, pairing sites gives you max sensitivity for detecting your treatment effect…because everything other than your treatment can be assumed to be the same). Anywho, we can plot out the data by site pair later, but we’ll ignore it for now.

2. NMDS

You said “I am interested in doing a NMDS for composition to compare composition between the two treatments excluding bare ground, liter, moss, and lichen. I believe I can do this including the controls or excluding them? It would also be interesting to compare the results of the NMDS using the 1 and 3 level data sets.”

You can do it however you want, it’s just a matter of what hypothesis you want to test. To test whether the communities in the treatments are different than in the controls, you’d want to ordinate the controls and treatments together.

We’ll take a peak at the ordination with the 3 level data first.

spp_mat <- dplyr:::select(d3, acmi:zrgele)
n <- metaMDS(spp_mat, k = 3)
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1052727 
## Run 1 stress 0.1032255 
## ... New best solution
## ... Procrustes: rmse 0.06216054  max resid 0.2494565 
## Run 2 stress 0.1032236 
## ... New best solution
## ... Procrustes: rmse 0.001092126  max resid 0.004593825 
## ... Similar to previous best
## Run 3 stress 0.1088181 
## Run 4 stress 0.1062429 
## Run 5 stress 0.105387 
## Run 6 stress 0.1050632 
## Run 7 stress 0.1043834 
## Run 8 stress 0.1062822 
## Run 9 stress 0.1043978 
## Run 10 stress 0.1052351 
## Run 11 stress 0.108677 
## Run 12 stress 0.1032242 
## ... Procrustes: rmse 0.0009164418  max resid 0.004091523 
## ... Similar to previous best
## Run 13 stress 0.1071435 
## Run 14 stress 0.106208 
## Run 15 stress 0.1031422 
## ... New best solution
## ... Procrustes: rmse 0.007989671  max resid 0.0480741 
## Run 16 stress 0.1032352 
## ... Procrustes: rmse 0.009825673  max resid 0.05053581 
## Run 17 stress 0.1032101 
## ... Procrustes: rmse 0.007208341  max resid 0.04715309 
## Run 18 stress 0.110659 
## Run 19 stress 0.103129 
## ... New best solution
## ... Procrustes: rmse 0.003553303  max resid 0.01441219 
## Run 20 stress 0.110367 
## *** No convergence -- monoMDS stopping criteria:
##      1: no. of iterations >= maxit
##     19: stress ratio > sratmax
cols <- as.numeric(as.factor(d3$treatmentcombine))
plot(n$points, col = cols, pch = 20)

scatterplot3d(n$points, color = cols, pch = 20)

Wasn’t getting convergence with k = 2, which implies increasing k until you converge. Convergence was reached at k = 3. Looking at the plot, it looks like all the separation with regards to the treatment and control is on axis 1.

3. Profile some predictors

d4 <- dplyr:::select(d3, site3, treatmentcombine, total.cover, species.count, x2bg, x2l, x2r)
d5 <- melt(data = d4, id.vars = c('site3', 'treatmentcombine'))
d5$trt <- factor(d5$treatmentcombine, levels = c('Matting', 'Control', 'Rock Cover'))

ggplot(data = d5, aes(x = trt, y = value)) +
  geom_boxplot() +
  facet_wrap(~variable, scales = 'free')

ggplot(data = d5, aes(x = trt, y = value, group = site3)) +
  geom_point() +
  geom_line()+
  facet_wrap(~variable, scales = 'free')

Differences everywhere you look. Some interesting variance among the paired sites (lines connect the pairs).

4. Stats

d6 <- split(d5, d5$variable)
lapply(d6, function(x) summary(aov(value ~ trt, data = x)))
## $total.cover
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## trt          2 100017   50008    47.4 7.53e-13 ***
## Residuals   57  60140    1055                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## $species.count
##             Df Sum Sq Mean Sq F value Pr(>F)    
## trt          2 2077.8  1038.9   89.52 <2e-16 ***
## Residuals   57  661.5    11.6                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## $x2bg
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## trt          2   8513    4257   27.98 3.41e-09 ***
## Residuals   57   8670     152                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## $x2l
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## trt          2   6541    3271   65.25 1.83e-15 ***
## Residuals   57   2857      50                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## $x2r
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## trt          2   3074    1537   21.66 1.01e-07 ***
## Residuals   57   4045      71                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Massive differences everywhere you look. When can also look at r2 values.

lapply(d6, function(x) summary(lm(value ~ trt, data = x)))
## $total.cover
## 
## Call:
## lm(formula = value ~ trt, data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -98.833 -14.333   0.833  15.167 119.167 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     37.167      7.656   4.854 9.77e-06 ***
## trtControl      77.667      9.684   8.020 6.37e-11 ***
## trtRock Cover   -9.333     12.105  -0.771    0.444    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 32.48 on 57 degrees of freedom
## Multiple R-squared:  0.6245, Adjusted R-squared:  0.6113 
## F-statistic:  47.4 on 2 and 57 DF,  p-value: 7.528e-13
## 
## 
## $species.count
## 
## Call:
## lm(formula = value ~ trt, data = x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.1333  -1.2222  -0.1333   2.8667   7.7778 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     4.2222     0.8030   5.258 2.27e-06 ***
## trtControl     11.9111     1.0157  11.727  < 2e-16 ***
## trtRock Cover   0.3611     1.2696   0.284    0.777    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.407 on 57 degrees of freedom
## Multiple R-squared:  0.7585, Adjusted R-squared:   0.75 
## F-statistic: 89.52 on 2 and 57 DF,  p-value: < 2.2e-16
## 
## 
## $x2bg
## 
## Call:
## lm(formula = value ~ trt, data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.167  -8.819  -2.322  10.308  30.133 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     20.778      2.907   7.148 1.81e-09 ***
## trtControl      -5.911      3.677  -1.608    0.113    
## trtRock Cover   25.389      4.596   5.524 8.54e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.33 on 57 degrees of freedom
## Multiple R-squared:  0.4954, Adjusted R-squared:  0.4777 
## F-statistic: 27.98 on 2 and 57 DF,  p-value: 3.414e-09
## 
## 
## $x2l
## 
## Call:
## lm(formula = value ~ trt, data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.222  -3.458  -1.694   2.792  20.833 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     28.222      1.669  16.913  < 2e-16 ***
## trtControl     -23.056      2.111 -10.923 1.34e-15 ***
## trtRock Cover  -22.056      2.638  -8.359 1.74e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.08 on 57 degrees of freedom
## Multiple R-squared:  0.696,  Adjusted R-squared:  0.6853 
## F-statistic: 65.25 on 2 and 57 DF,  p-value: 1.826e-15
## 
## 
## $x2r
## 
## Call:
## lm(formula = value ~ trt, data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -16.750  -3.800  -2.206   2.535  28.389 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     15.611      1.986   7.862 1.17e-10 ***
## trtControl     -11.811      2.512  -4.702 1.67e-05 ***
## trtRock Cover    5.139      3.140   1.637    0.107    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.424 on 57 degrees of freedom
## Multiple R-squared:  0.4318, Adjusted R-squared:  0.4119 
## F-statistic: 21.66 on 2 and 57 DF,  p-value: 1.007e-07

The treatment explains 75% of the variance in species richness, whoa!

5. What about the top hits only?

I’ll just take a quick glance at the ordination.

x <- read.csv(fn2)
names(x) <- tolower(names(x))
x2 <- subset(x, site != '')
spp_mat <- dplyr:::select(x2, acmi:zrgele)
spp_mat[is.na(spp_mat)] <- 0
n <- metaMDS(spp_mat, k = 3)
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1081346 
## Run 1 stress 0.1154191 
## Run 2 stress 0.1107454 
## Run 3 stress 0.1075524 
## ... New best solution
## ... Procrustes: rmse 0.01708509  max resid 0.1072118 
## Run 4 stress 0.1163185 
## Run 5 stress 0.1078292 
## ... Procrustes: rmse 0.01438192  max resid 0.08057942 
## Run 6 stress 0.1073857 
## ... New best solution
## ... Procrustes: rmse 0.007480586  max resid 0.0507909 
## Run 7 stress 0.1145224 
## Run 8 stress 0.1107707 
## Run 9 stress 0.1073872 
## ... Procrustes: rmse 0.0004236427  max resid 0.001465027 
## ... Similar to previous best
## Run 10 stress 0.1073905 
## ... Procrustes: rmse 0.00108646  max resid 0.003657302 
## ... Similar to previous best
## Run 11 stress 0.1073893 
## ... Procrustes: rmse 0.0008260192  max resid 0.002548624 
## ... Similar to previous best
## Run 12 stress 0.1107426 
## Run 13 stress 0.1073867 
## ... Procrustes: rmse 0.0004495397  max resid 0.001393039 
## ... Similar to previous best
## Run 14 stress 0.1073859 
## ... Procrustes: rmse 0.0003679856  max resid 0.001321349 
## ... Similar to previous best
## Run 15 stress 0.1073889 
## ... Procrustes: rmse 0.001048054  max resid 0.003379078 
## ... Similar to previous best
## Run 16 stress 0.1073904 
## ... Procrustes: rmse 0.001018037  max resid 0.003123514 
## ... Similar to previous best
## Run 17 stress 0.1073854 
## ... New best solution
## ... Procrustes: rmse 0.0005809261  max resid 0.002736778 
## ... Similar to previous best
## Run 18 stress 0.108129 
## Run 19 stress 0.1107494 
## Run 20 stress 0.1073883 
## ... Procrustes: rmse 0.000700848  max resid 0.003611871 
## ... Similar to previous best
## *** Solution reached
cols <- as.numeric(as.factor(x2$treatmentcombine))
plot(n$points, col = cols, pch = 20)

scatterplot3d(n$points, color = cols, pch = 20)

Way different, still.

6. Next steps

Well, up to you! I think you wanted to run a pca for some reason? I don’t see any reason for that but easy enough to do. We could drill down into the species specific-patterns? Let me know :)