Introduction

These are codes to analyze and explore patterns in part 2 ambient water chemistry data.

Import data

library(readxl)
partb_wc <- read_excel("F:/One Drive - Towson U/Thesis stuff/Data/Water chemistry data/Part b WC data/Ambient WC data.xlsx", 
    sheet = "Combined")

Range of data

First, let’s summarize the data.

library(dplyr)
partb_wc %>% summary()
##       Year          Site              Temp (C)        SC (mS/cm)   
##  Min.   :2010   Length:32          Min.   : 5.620   Min.   :256.0  
##  1st Qu.:2013   Class :character   1st Qu.: 8.523   1st Qu.:354.2  
##  Median :2015   Mode  :character   Median :11.510   Median :456.5  
##  Mean   :2015                      Mean   :11.616   Mean   :466.3  
##  3rd Qu.:2018                      3rd Qu.:14.137   3rd Qu.:549.5  
##  Max.   :2019                      Max.   :17.210   Max.   :823.0  
##                                                     NA's   :2      
##    TSS (ppm)        DO (ppm)         DO (%)             pH       
##  Min.   :142.0   Min.   : 7.33   Min.   : 75.50   Min.   :6.680  
##  1st Qu.:225.0   1st Qu.:10.56   1st Qu.: 99.05   1st Qu.:7.652  
##  Median :321.0   Median :11.36   Median :104.65   Median :7.955  
##  Mean   :312.2   Mean   :11.30   Mean   :104.08   Mean   :8.123  
##  3rd Qu.:382.0   3rd Qu.:12.22   3rd Qu.:111.42   3rd Qu.:8.443  
##  Max.   :535.0   Max.   :13.21   Max.   :120.20   Max.   :9.800  
##  NA's   :7       NA's   :4       NA's   :4                       
##  Salinity (ppt)  
##  Min.   :0.0100  
##  1st Qu.:0.1550  
##  Median :0.2200  
##  Mean   :0.2206  
##  3rd Qu.:0.2500  
##  Max.   :0.4100  
##  NA's   :1

Investigate the relationship between pH and %Do

The relationship between %Do and pH is expected to be positive. As more photosynthesis happens in stream, more CO2 is pumped out, which then leads to higher %Do and higher pH. Let’s investigate to see if the relationship holds.

library(dplyr)
ph_do_clean <- as.data.frame(partb_wc) %>% 
  rename(c(Temp=`Temp (C)`, SC = `SC (mS/cm)`, TSS = `TSS (ppm)`, 
          DO_ppm = `DO (ppm)`, DO_p =  `DO (%)`, Sal =  `Salinity (ppt)`)) %>% 
  filter(!is.na(DO_p)) # (1) rename columns; (2) filter NA data from DO column; 

cor.test(ph_do_clean$pH,ph_do_clean$DO_p, method = "pearson") #correlational test between pH and DO using Pearson cor
## 
##  Pearson's product-moment correlation
## 
## data:  ph_do_clean$pH and ph_do_clean$DO_p
## t = 0.057477, df = 26, p-value = 0.9546
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3633333  0.3827389
## sample estimates:
##        cor 
## 0.01127148
cor.test(ph_do_clean$pH,ph_do_clean$DO_p, method = "spearman") #correlational test between pH and DO using Spearman cor
## 
##  Spearman's rank correlation rho
## 
## data:  ph_do_clean$pH and ph_do_clean$DO_p
## S = 3435.9, p-value = 0.7629
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## 0.05967698
library(ggplot2)
ph_do_clean %>% ggplot(aes(pH, DO_p)) + geom_point()

In conclusion, pH and %DO have no correlation with each other in this data set, since there seems to be some extreme values.

Create a matrix of scatterplots

Pearson

These codes were used to create a scatterplot matrix (Pearson’s correlation) using the package GGally (Schloerke et al. 2021).

library(dplyr)
library(GGally)

lowerFn <- function(data, mapping, method = "lm", ...) {
  p <- ggplot(data = data, mapping = mapping) +
    geom_point(colour = "blue") +
    geom_smooth(method = method, color = "red", ...)
  p
} #code adopted from WCC - an user on Stack Overflow. Check out his profile here: https://stackoverflow.com/users/1102552/wcc.

wc_clean <- partb_wc %>% rename(c(Temp=`Temp (C)`, SC = `SC (mS/cm)`, TSS = `TSS (ppm)`, 
          DO_ppm = `DO (ppm)`, DO_p =  `DO (%)`, Sal =  `Salinity (ppt)`)) %>% 
  filter(!is.na(SC) & !is.na(TSS) & !is.na(DO_p)) 

wc_clean %>% 
  ggpairs(columns = c("Temp","SC","TSS","DO_ppm","DO_p","pH","Sal"), 
          lower = list(continuous = wrap(lowerFn, method = "lm")),
  diag = list(continuous = wrap("barDiag", colour = "blue")),
  upper = list(continuous = wrap("cor", size = 10))
) +  theme(axis.text.y = element_text(colour = "black", size = 12, face = "bold"), 
    axis.text.x = element_text(colour = "black", face = "bold", size = 12), 
    legend.text = element_text(size = 12, face ="bold", colour ="black"), 
    axis.title.x = element_text(face = "bold", size = 14, colour = "black"), 
    legend.title = element_text(size = 14, colour = "black", face = "bold"), 
    panel.background = element_blank(), panel.border = element_rect(colour = "black", fill = NA, size = 1.2),
    legend.key=element_blank(),
    plot.title = element_text(color = "black", size = 30, face = "bold", hjust = 0.5)) + 
   labs(
    title = "Correlation matrix of ambient WC (Pearson)")

Spearman

Previous codes will be modified to display Spearman correlation. For now, traditional method will be applied. Package corrr (Kuhn et al. 2020) will be used to test for significant correlations among variables.

library(corrr)

wc_cor <- wc_clean[,3:ncol(wc_clean)] %>% correlate(use = "pairwise.complete.obs", method = "spearman") %>% 
  select(Temp, SC, TSS, DO_ppm, DO_p, pH, Sal) %>% mutate(Variable = c(1,2,3,4,5,6,7), .before = Temp)
wc_cor$Variable <- recode(wc_cor$Variable, 
                          "1" = "Temp", 
                          "2" = "SC",
                          "3" = "TSS",
                          "4" = "DO_ppm",
                          "5" = "DO_p",
                          "6" = "pH",
                          "7" = "Sal")
wc_cor
## # A tibble: 7 x 8
##   Variable   Temp     SC    TSS DO_ppm    DO_p      pH    Sal
##   <chr>     <dbl>  <dbl>  <dbl>  <dbl>   <dbl>   <dbl>  <dbl>
## 1 Temp     NA     -0.229 -0.251 -0.396  0.338   0.623  -0.185
## 2 SC       -0.229 NA      0.996  0.236  0.187  -0.290   0.988
## 3 TSS      -0.251  0.996 NA      0.217  0.155  -0.289   0.984
## 4 DO_ppm   -0.396  0.236  0.217 NA      0.652  -0.389   0.239
## 5 DO_p      0.338  0.187  0.155  0.652 NA       0.0325  0.200
## 6 pH        0.623 -0.290 -0.289 -0.389  0.0325 NA      -0.282
## 7 Sal      -0.185  0.988  0.984  0.239  0.200  -0.282  NA

Another way of doing this, but will also give you a correlation and p-value matrix. Package rstatix will be used (Kassambara 2021.)

library(rstatix)
options(digits = 3)   # report 3 significant digits
wc_clean[,3:ncol(wc_clean)] %>% cor_mat(Temp, SC, TSS, DO_ppm, DO_p, pH, Sal, method = "spearman") %>% cor_get_pval()
## # A tibble: 7 x 8
##   rowname       Temp       SC      TSS  DO_ppm       DO_p      pH      Sal
##   <chr>        <dbl>    <dbl>    <dbl>   <dbl>      <dbl>   <dbl>    <dbl>
## 1 Temp    0.00000395 3.17e- 1 2.72e- 1 0.0754  0.135      0.00256 4.21e- 1
## 2 SC      0.317      3.95e- 6 4.11e- 6 0.302   0.415      0.203   9.68e-17
## 3 TSS     0.272      4.11e- 6 3.95e- 6 0.345   0.502      0.204   1.3 e-15
## 4 DO_ppm  0.0754     3.02e- 1 3.45e- 1 0       0.00136    0.0817  2.97e- 1
## 5 DO_p    0.135      4.15e- 1 5.02e- 1 0.00136 0.00000395 0.889   3.84e- 1
## 6 pH      0.00256    2.03e- 1 2.04e- 1 0.0817  0.889      0       2.15e- 1
## 7 Sal     0.421      9.68e-17 1.3 e-15 0.297   0.384      0.215   0

Temp vs pH

It seems like the relationship between water temperature and pH is a little odd. In the previous code chunk, NA data were removed, thus, led to some temperature and pH data being removed as well. Now I am testing correlation between temp vs pH alone.

library(dplyr)
temp_ph <- as.data.frame(partb_wc) %>% 
  rename(c(Temp=`Temp (C)`, SC = `SC (mS/cm)`, TSS = `TSS (ppm)`, 
          DO_ppm = `DO (ppm)`, DO_p =  `DO (%)`, Sal =  `Salinity (ppt)`)) %>% 
  filter(!is.na(Temp) & !is.na(pH)) # (1) rename columns; (2) filter NA data from temp and pH column;

cor.test(temp_ph$pH,temp_ph$Temp, method = "pearson") #correlational test between pH and DO using Pearson cor
## 
##  Pearson's product-moment correlation
## 
## data:  temp_ph$pH and temp_ph$Temp
## t = 0.3, df = 30, p-value = 0.8
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.306  0.390
## sample estimates:
##    cor 
## 0.0483
cor.test(temp_ph$pH,temp_ph$Temp, method = "spearman") #correlational test between pH and DO using Spearman cor
## 
##  Spearman's rank correlation rho
## 
## data:  temp_ph$pH and temp_ph$Temp
## S = 4656, p-value = 0.4
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##   rho 
## 0.147

Results showed that the correlation between temperature and pH is not statisitcally significant with a wide range of 95% CI.

References

  1. Barret Schloerke, Di Cook, Joseph Larmarange, Francois Briatte, Moritz Marbach, Edwin Thoen, Amos Elberg and Jason Crowley (2021). GGally: Extension to ‘ggplot2’. R package version 2.1.2. https://CRAN.R-project.org/package=GGally
  2. Max Kuhn, Simon Jackson and Jorge Cimentada (2020). corrr: Correlations in R. R package version 0.4.3. https://CRAN.R-project.org/package=corrr
  3. Alboukadel Kassambara (2021). rstatix: Pipe-Friendly Framework for Basic Statistical Tests. R package version 0.7.0. https://CRAN.R-project.org/package=rstatix