GEOG 576: Water Quality Analysis

In this Exercise you will

Load water quality data
Plot Q-C relationships
Use power functions to quantify the Q-C relationships
Plot discharge data on the plot to determine hydrological conditions at time of sampling

0. Water quality data

The water quality data have been downloaded and formatted for you.
Water quality data can be downloaded from:

** You don’t have to download data from here for this exercise. It has been downloaded for you and posted in the url below. These websites are given for your information and to cite the data source **

The CEDEN website (California): http://ceden.waterboards.ca.gov/AdvancedQueryTool
Watershed urban management plans (San Diego County only): http://www.projectcleanwater.org/index.php Go to “Stormwater copermiettees > Work Products” or to: “Watersheds > Your Watershed > Data, Plans and Projects”

I. Load water quality data for San Diego River at Mast Blvd

A. Read in csv:

Here is the URL for a link to the data, which is in a .csv file:

https://docs.google.com/spreadsheets/d/e/2PACX-1vSOq8oKMIndzYZS_7RshftKrjQJYKN-iZQT6PVKY4abLgXbDDHwCJt8UVgcU5-GnkALiQI_y1MFOues/pub?output=csv

Read this csv and put it in a data.frame in R. I called the data.frame “x”.

Hint: Look back to previous exercises to find the commands to use to read a csv file using a URL.

B. Get Date vector

Format the “Date” column as a date and give the formatted date field the name “dates”. I’m not showing the code…see previous exercises.

C. Find constituent rating curves

Plot C vs Q for Electrical conductivity (EC), Total suspended solids (TSS), turbidity, and Total.coliforms I’m not showing the code to make this plot…

D. Now plot them as log-log:

I’m not showing the code to make this plot… Hint: use the plot option log=“xy” as described under Key R concepts for Exercise 7.

E. Determine if there is a statistically signifiant relationship, and the exponent on the Q-C rating curve

x$EC.log = log(x$EC.umhos.cm)
x$Q.log = log(x$Q.ft3.s)
lm.EC = lm(EC.log ~ Q.log,x) #  Type ?lm to see format of input
summary(lm.EC)

## 
## Call:
## lm(formula = EC.log ~ Q.log, data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3417 -0.1645 -0.1325  0.2430  0.4397 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.42908    0.30398  27.729 3.09e-09 ***
## Q.log       -0.20220    0.07043  -2.871   0.0208 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2906 on 8 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.5075, Adjusted R-squared:  0.4459 
## F-statistic: 8.243 on 1 and 8 DF,  p-value: 0.0208

# Generate an array of Q with all values between 5 and 400 in order to plot the lm
Qnew = seq(5,400,by=10)
Qnew.log = data.frame(Q.log=log(Qnew))  #  Create a new dataframe that has a variable "Q.log" in it

EC.pred.log = predict.lm(lm.EC,Qnew.log)
# Use exp to turn the log-EC values into non-log values:
EC.pred = exp(EC.pred.log)

Do the same thing for Turbidity, TSS and Total Coliforms. I’m not showing the code.

x$turb.log = log(x$Turbidity.NTU)
lm.turb = lm(turb.log ~ Q.log,x)
summary(lm.turb)

## 
## Call:
## lm(formula = turb.log ~ Q.log, data = x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.83459 -0.31994 -0.00517  0.42493  0.65739 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.5015     0.5445   0.921 0.381064    
## Q.log         0.5996     0.1227   4.887 0.000863 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.532 on 9 degrees of freedom
## Multiple R-squared:  0.7263, Adjusted R-squared:  0.6959 
## F-statistic: 23.88 on 1 and 9 DF,  p-value: 0.000863

Turbidity.pred = exp(predict.lm(lm.turb,Qnew.log))

TSS

## 
## Call:
## lm(formula = TSS.log ~ Q.log, data = x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.01415 -0.34337 -0.02042  0.32490  0.91866 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.5106     0.5902   0.865 0.409366    
## Q.log         0.6667     0.1330   5.014 0.000725 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5765 on 9 degrees of freedom
## Multiple R-squared:  0.7363, Adjusted R-squared:  0.7071 
## F-statistic: 25.14 on 1 and 9 DF,  p-value: 0.0007254

Total Coliforms

## 
## Call:
## lm(formula = TC.log ~ Q.log, data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0443 -0.9515 -0.4254  0.9244  3.9137 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   6.5980     2.3146   2.851   0.0191 *
## Q.log         0.8407     0.5216   1.612   0.1414  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.261 on 9 degrees of freedom
## Multiple R-squared:  0.224,  Adjusted R-squared:  0.1378 
## F-statistic: 2.598 on 1 and 9 DF,  p-value: 0.1414

E. Redo the log-log plots and add the regression line.

I’m not showing the code for this… Hint: to add the regression line, use “lines”, with Qnew as the x-variable and the .pred series as the y-variables.

Here’s a reminder about how to add lines to a plot:

x.example = c(1,2,3,4,5)
y.example = c(2,3,1,6,7)
y2.example = c(4,1,2,8,9)

plot(x.example,y.example)
lines(x.example,y2.example)

II. Load and plot Q data, and sampling points on top.

A. Load Q data for Fashion Valley

Site code for San Diego River at Fashion Valley is 11023000. I’m not showing the code for this…look back at previous exercises. Put the discharge data into a data.frame called “xQ”. Rename the discharge column “Q.ft.s”.

##   agency_cd  site_no       Date X_00060_00003 X_00060_00003_cd
## 1      USGS 11023000 1982-01-18           7.6                A
## 2      USGS 11023000 1982-01-19           8.8                A
## 3      USGS 11023000 1982-01-20         151.0                A
## 4      USGS 11023000 1982-01-21         574.0                A
## 5      USGS 11023000 1982-01-22         215.0                A
## 6      USGS 11023000 1982-01-23          70.0                A

##   agency_cd  site_no       Date Q.ft.s QA
## 1      USGS 11023000 1982-01-18    7.6  A
## 2      USGS 11023000 1982-01-19    8.8  A
## 3      USGS 11023000 1982-01-20  151.0  A
## 4      USGS 11023000 1982-01-21  574.0  A
## 5      USGS 11023000 1982-01-22  215.0  A
## 6      USGS 11023000 1982-01-23   70.0  A

B. Choose a larger set of events and plot the hydrograph, and the water quality sampling dates on top.

par(mfrow=c(1,1))
plot(as.Date(xQ$Date),xQ$Q.ft.s,type="l",xlim=as.Date(c("2006-10-01","2007-04-01")),ylim=c(0,500))
points(dates,x$Q.ft3.s)