Introduction

This is the code I used to run simple linear regressions between the water chemistry data and the NMDS scores from the "NMDS all sites count part B" analysis.

Import NMDS score data

Data can be downloaded here.

setwd("F:/GitHub Projects/thesis_codes/") #set working directory
library(readxl)
nmds_scores <- read_excel("Part b (09-19)/Water chemistry/Regression - WC and NMDS scores/NMDS all sites count part b scores.xlsx")
ibi_nmds_scores <- read_excel("Part b (09-19)/Water chemistry/Regression - WC and NMDS scores/NMDS all sites IBI part b scores.xlsx")

Import water chemistry data

Data can be downloaded here.

setwd("F:/GitHub Projects/thesis_codes/") #set working directory
wc <- read_excel("Part b (09-19)/Water chemistry/Regression - WC and NMDS scores/Ambient WC data.xlsx", 
    sheet = "Combined")
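Instead of calling `setwd()` before each import, every input path can be built once from a single project root with base R's `file.path()`. This is a minimal sketch that assumes the same folder layout as the paths above. The names `project_root`, `data_dir`, `input_files`, and `input_paths` are hypothetical.

```r
# Hypothetical helper objects: build every input path from one project root.
# Folder and file names are copied from the read_excel() calls above.
project_root <- "F:/GitHub Projects/thesis_codes"
data_dir <- file.path(project_root, "Part b (09-19)", "Water chemistry",
                      "Regression - WC and NMDS scores")

input_files <- c(
  nmds_scores     = "NMDS all sites count part b scores.xlsx",
  ibi_nmds_scores = "NMDS all sites IBI part b scores.xlsx",
  wc              = "Ambient WC data.xlsx"
)
input_paths <- file.path(data_dir, input_files)
names(input_paths) <- names(input_files)  # file.path() drops the names

# Report any input file that cannot be found before trying to read it
missing_files <- input_paths[!file.exists(input_paths)]
if (length(missing_files) > 0) {
  message("Missing input files:\n", paste(missing_files, collapse = "\n"))
}
```

The files could then be read with, for example, `read_excel(input_paths[["wc"]], sheet = "Combined")`.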

The next step is to clean up the data and filter out records that are not needed for the analyses.

Clean up data

library(dplyr)

new_nmds_scores <- semi_join(nmds_scores, wc) # keep NMDS score rows that have a match in the WC dataset

wc_nmds_combine <- left_join(new_nmds_scores, wc) # join the WC columns onto the matched NMDS score rows


# ---- Join with IBI NMDS scores ----
new_ibi_nmds_scores <- semi_join(ibi_nmds_scores, wc) # keep IBI NMDS score rows that have a match in the WC dataset

wc_ibi_combine <- left_join(new_ibi_nmds_scores, wc) # join the WC columns onto the matched IBI NMDS score rows
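The two-step `semi_join()` then `left_join()` pattern keeps the score rows that have a match in the WC data and then attaches the matching WC columns. The toy sketch below uses made-up rows (not thesis data) and hypothetical key columns `Site` and `Date` to check that a single `inner_join()` gives the same result. It also passes the keys explicitly through `by`; without it, dplyr joins on every column name the two tables share and prints a message naming them.

```r
library(dplyr)

# Toy tables standing in for the NMDS scores and the water chemistry data.
# Values and the key columns "Site" and "Date" are hypothetical.
scores <- tibble(
  Site  = c("A", "B", "C", "D"),
  Date  = c("2019-06-01", "2019-06-01", "2019-06-02", "2019-06-03"),
  NMDS1 = c(0.12, -0.30, 0.45, 0.08)
)
chem <- tibble(
  Site = c("A", "B", "C"),
  Date = c("2019-06-01", "2019-06-01", "2019-06-02"),
  SC   = c(410, 980, 655)
)

# Two-step version: keep matching score rows, then attach the chemistry columns
two_step <- scores %>%
  semi_join(chem, by = c("Site", "Date")) %>%
  left_join(chem, by = c("Site", "Date"))

# One-step version: inner_join() keeps only rows present in both tables
one_step <- inner_join(scores, chem, by = c("Site", "Date"))
```

Site "D" has no chemistry record, so it is dropped by both versions.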

Data filtering and cleaning are done. Now comes the analysis.

Simple Linear Regression

Water temperature can vary during sampling with the weather, time of day, amount of sunshine, etc., and these conditions can also affect DO and pH. Thus, I chose TSS, SC, and salinity as candidate variables instead. However, these three variables are significantly correlated with one another (p < 0.001); check my correlation diagram of part 2 ambient water chemistry here. Because they carry largely overlapping information, I used SC alone for this analysis.
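The collinearity check behind choosing SC can be sketched as one `cor.test()` per pair of candidate variables. This is a minimal sketch: the helper `pairwise_cor()` is hypothetical, and the column names `TSS` and `Salinity` in the usage line are assumptions about how the `wc` sheet is labelled (only `SC` appears in the code in this section).

```r
# Hypothetical helper: Pearson r, p-value, and sample size for every pair of
# columns in `vars`. cor.test() drops rows where either value in a pair is NA.
pairwise_cor <- function(data, vars) {
  pairs <- combn(vars, 2, simplify = FALSE)
  rows <- lapply(pairs, function(p) {
    ct <- cor.test(data[[p[1]]], data[[p[2]]], method = "pearson")
    data.frame(
      var1 = p[1],
      var2 = p[2],
      r    = unname(ct$estimate),
      p    = ct$p.value,
      n    = unname(ct$parameter) + 2  # Pearson test df = n - 2
    )
  })
  do.call(rbind, rows)
}
```

A call such as `pairwise_cor(wc, c("TSS", "SC", "Salinity"))` would return one row per pair.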

library(dplyr)
data_wo_sc <- wc_nmds_combine %>% filter(!is.na(SC)) # keep only rows where SC was measured

cor.test(data_wo_sc$NMDS1, data_wo_sc$SC, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  data_wo_sc$NMDS1 and data_wo_sc$SC
## t = -0.99168, df = 30, p-value = 0.3293
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.4960367  0.1818320
## sample estimates:
##        cor 
## -0.1781578
cor.test(data_wo_sc$NMDS2, data_wo_sc$SC, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  data_wo_sc$NMDS2 and data_wo_sc$SC
## t = -4.4298, df = 30, p-value = 0.0001159
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8017368 -0.3588308
## sample estimates:
##        cor 
## -0.6288426
cor.test(data_wo_sc$NMDS3, data_wo_sc$SC, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  data_wo_sc$NMDS3 and data_wo_sc$SC
## t = 0.45437, df = 30, p-value = 0.6528
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2739179  0.4192797
## sample estimates:
##        cor 
## 0.08267245
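Although this section is titled simple linear regression, the tests above are Pearson correlations. With a single predictor the two are equivalent: the t statistic and p-value for the slope in `lm(NMDS2 ~ SC)` equal those from `cor.test()`, and the regression R-squared equals r squared. The hypothetical helper below runs both for every axis in one call, replacing the three copy-pasted `cor.test()` lines. It assumes the axis columns are named `NMDS1` to `NMDS3`, as in the code above.

```r
# Hypothetical helper: for each NMDS axis, run a Pearson correlation test and
# fit the matching simple linear regression (axis ~ variable).
axis_vs_variable <- function(data, variable = "SC",
                             axes = c("NMDS1", "NMDS2", "NMDS3")) {
  rows <- lapply(axes, function(axis) {
    ct    <- cor.test(data[[axis]], data[[variable]], method = "pearson")
    fit   <- lm(reformulate(variable, response = axis), data = data)
    coefs <- summary(fit)$coefficients
    data.frame(
      axis      = axis,
      r         = unname(ct$estimate),
      t         = unname(ct$statistic),
      p         = ct$p.value,
      slope     = coefs[variable, "Estimate"],
      slope_t   = coefs[variable, "t value"],  # equals t above
      r_squared = summary(fit)$r.squared       # equals r^2
    )
  })
  do.call(rbind, rows)
}
```

`axis_vs_variable(data_wo_sc)` would return one row per axis. Assuming the IBI score file uses the same NMDS1 to NMDS3 column names, `axis_vs_variable(wc_ibi_combine)` would apply the same tests to the IBI scores, which are joined above but not analysed in this section.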

Conclusion: Only the axis 2 scores were significantly correlated with SC (r = -0.63, p < 0.001). This further supports the finding from the NMDS of count data from all sites, where axis 2 separated REF-o17 from the other sites in Red Run.