These are codes to analyze and explore patterns in part 2 ambient water chemistry data.
library(readxl)
partb_wc <- read_excel("F:/One Drive - Towson U/Thesis stuff/Data/Water chemistry data/Part b WC data/Ambient WC data.xlsx",
sheet = "Combined")
First, let’s summarize the data.
library(dplyr)
partb_wc %>% summary()
## Year Site Temp (C) SC (mS/cm)
## Min. :2010 Length:32 Min. : 5.620 Min. :256.0
## 1st Qu.:2013 Class :character 1st Qu.: 8.523 1st Qu.:354.2
## Median :2015 Mode :character Median :11.510 Median :456.5
## Mean :2015 Mean :11.616 Mean :466.3
## 3rd Qu.:2018 3rd Qu.:14.137 3rd Qu.:549.5
## Max. :2019 Max. :17.210 Max. :823.0
## NA's :2
## TSS (ppm) DO (ppm) DO (%) pH
## Min. :142.0 Min. : 7.33 Min. : 75.50 Min. :6.680
## 1st Qu.:225.0 1st Qu.:10.56 1st Qu.: 99.05 1st Qu.:7.652
## Median :321.0 Median :11.36 Median :104.65 Median :7.955
## Mean :312.2 Mean :11.30 Mean :104.08 Mean :8.123
## 3rd Qu.:382.0 3rd Qu.:12.22 3rd Qu.:111.42 3rd Qu.:8.443
## Max. :535.0 Max. :13.21 Max. :120.20 Max. :9.800
## NA's :7 NA's :4 NA's :4
## Salinity (ppt)
## Min. :0.0100
## 1st Qu.:0.1550
## Median :0.2200
## Mean :0.2206
## 3rd Qu.:0.2500
## Max. :0.4100
## NA's :1
The relationship between %Do and pH is expected to be positive. As more photosynthesis happens in stream, more CO2 is pumped out, which then leads to higher %Do and higher pH. Let’s investigate to see if the relationship holds.
library(dplyr)
ph_do_clean <- as.data.frame(partb_wc) %>%
rename(c(Temp=`Temp (C)`, SC = `SC (mS/cm)`, TSS = `TSS (ppm)`,
DO_ppm = `DO (ppm)`, DO_p = `DO (%)`, Sal = `Salinity (ppt)`)) %>%
filter(!is.na(DO_p)) # (1) rename columns; (2) filter NA data from DO column;
cor.test(ph_do_clean$pH,ph_do_clean$DO_p, method = "pearson") #correlational test between pH and DO using Pearson cor
##
## Pearson's product-moment correlation
##
## data: ph_do_clean$pH and ph_do_clean$DO_p
## t = 0.057477, df = 26, p-value = 0.9546
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3633333 0.3827389
## sample estimates:
## cor
## 0.01127148
cor.test(ph_do_clean$pH,ph_do_clean$DO_p, method = "spearman") #correlational test between pH and DO using Spearman cor
##
## Spearman's rank correlation rho
##
## data: ph_do_clean$pH and ph_do_clean$DO_p
## S = 3435.9, p-value = 0.7629
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.05967698
library(ggplot2)
ph_do_clean %>% ggplot(aes(pH, DO_p)) + geom_point()
In conclusion, pH and %DO have no correlation with each other in this data set, since there seems to be some extreme values.
These codes were used to create a scatterplot matrix (Pearson’s correlation) using the package GGally (Schloerke et al. 2021).
library(dplyr)
library(GGally)
lowerFn <- function(data, mapping, method = "lm", ...) {
p <- ggplot(data = data, mapping = mapping) +
geom_point(colour = "blue") +
geom_smooth(method = method, color = "red", ...)
p
} #code adopted from WCC - an user on Stack Overflow. Check out his profile here: https://stackoverflow.com/users/1102552/wcc.
wc_clean <- partb_wc %>% rename(c(Temp=`Temp (C)`, SC = `SC (mS/cm)`, TSS = `TSS (ppm)`,
DO_ppm = `DO (ppm)`, DO_p = `DO (%)`, Sal = `Salinity (ppt)`)) %>%
filter(!is.na(SC) & !is.na(TSS) & !is.na(DO_p))
wc_clean %>%
ggpairs(columns = c("Temp","SC","TSS","DO_ppm","DO_p","pH","Sal"),
lower = list(continuous = wrap(lowerFn, method = "lm")),
diag = list(continuous = wrap("barDiag", colour = "blue")),
upper = list(continuous = wrap("cor", size = 10))
) + theme(axis.text.y = element_text(colour = "black", size = 12, face = "bold"),
axis.text.x = element_text(colour = "black", face = "bold", size = 12),
legend.text = element_text(size = 12, face ="bold", colour ="black"),
axis.title.x = element_text(face = "bold", size = 14, colour = "black"),
legend.title = element_text(size = 14, colour = "black", face = "bold"),
panel.background = element_blank(), panel.border = element_rect(colour = "black", fill = NA, size = 1.2),
legend.key=element_blank(),
plot.title = element_text(color = "black", size = 30, face = "bold", hjust = 0.5)) +
labs(
title = "Correlation matrix of ambient WC (Pearson)")
Previous codes will be modified to display Spearman correlation. For now, traditional method will be applied. Package corrr (Kuhn et al. 2020) will be used to test for significant correlations among variables.
library(corrr)
wc_cor <- wc_clean[,3:ncol(wc_clean)] %>% correlate(use = "pairwise.complete.obs", method = "spearman") %>%
select(Temp, SC, TSS, DO_ppm, DO_p, pH, Sal) %>% mutate(Variable = c(1,2,3,4,5,6,7), .before = Temp)
wc_cor$Variable <- recode(wc_cor$Variable,
"1" = "Temp",
"2" = "SC",
"3" = "TSS",
"4" = "DO_ppm",
"5" = "DO_p",
"6" = "pH",
"7" = "Sal")
wc_cor
## # A tibble: 7 x 8
## Variable Temp SC TSS DO_ppm DO_p pH Sal
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Temp NA -0.229 -0.251 -0.396 0.338 0.623 -0.185
## 2 SC -0.229 NA 0.996 0.236 0.187 -0.290 0.988
## 3 TSS -0.251 0.996 NA 0.217 0.155 -0.289 0.984
## 4 DO_ppm -0.396 0.236 0.217 NA 0.652 -0.389 0.239
## 5 DO_p 0.338 0.187 0.155 0.652 NA 0.0325 0.200
## 6 pH 0.623 -0.290 -0.289 -0.389 0.0325 NA -0.282
## 7 Sal -0.185 0.988 0.984 0.239 0.200 -0.282 NA
Another way of doing this, but will also give you a correlation and p-value matrix. Package rstatix will be used (Kassambara 2021.)
library(rstatix)
options(digits = 3) # report 3 significant digits
wc_clean[,3:ncol(wc_clean)] %>% cor_mat(Temp, SC, TSS, DO_ppm, DO_p, pH, Sal, method = "spearman") %>% cor_get_pval()
## # A tibble: 7 x 8
## rowname Temp SC TSS DO_ppm DO_p pH Sal
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Temp 0.00000395 3.17e- 1 2.72e- 1 0.0754 0.135 0.00256 4.21e- 1
## 2 SC 0.317 3.95e- 6 4.11e- 6 0.302 0.415 0.203 9.68e-17
## 3 TSS 0.272 4.11e- 6 3.95e- 6 0.345 0.502 0.204 1.3 e-15
## 4 DO_ppm 0.0754 3.02e- 1 3.45e- 1 0 0.00136 0.0817 2.97e- 1
## 5 DO_p 0.135 4.15e- 1 5.02e- 1 0.00136 0.00000395 0.889 3.84e- 1
## 6 pH 0.00256 2.03e- 1 2.04e- 1 0.0817 0.889 0 2.15e- 1
## 7 Sal 0.421 9.68e-17 1.3 e-15 0.297 0.384 0.215 0
It seems like the relationship between water temperature and pH is a little odd. In the previous code chunk, NA data were removed, thus, led to some temperature and pH data being removed as well. Now I am testing correlation between temp vs pH alone.
library(dplyr)
temp_ph <- as.data.frame(partb_wc) %>%
rename(c(Temp=`Temp (C)`, SC = `SC (mS/cm)`, TSS = `TSS (ppm)`,
DO_ppm = `DO (ppm)`, DO_p = `DO (%)`, Sal = `Salinity (ppt)`)) %>%
filter(!is.na(Temp) & !is.na(pH)) # (1) rename columns; (2) filter NA data from temp and pH column;
cor.test(temp_ph$pH,temp_ph$Temp, method = "pearson") #correlational test between pH and DO using Pearson cor
##
## Pearson's product-moment correlation
##
## data: temp_ph$pH and temp_ph$Temp
## t = 0.3, df = 30, p-value = 0.8
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.306 0.390
## sample estimates:
## cor
## 0.0483
cor.test(temp_ph$pH,temp_ph$Temp, method = "spearman") #correlational test between pH and DO using Spearman cor
##
## Spearman's rank correlation rho
##
## data: temp_ph$pH and temp_ph$Temp
## S = 4656, p-value = 0.4
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.147
Results showed that the correlation between temperature and pH is not statisitcally significant with a wide range of 95% CI.