GEOG 576: Water Quality Analysis

I. Load water quality data for San Diego River at Mast Blvd

A. Read in csv:

Here is the URL for a link to the data, which is in a .csv file:

https://docs.google.com/spreadsheets/d/e/2PACX-1vSOq8oKMIndzYZS_7RshftKrjQJYKN-iZQT6PVKY4abLgXbDDHwCJt8UVgcU5-GnkALiQI_y1MFOues/pub?output=csv

Look back to previous exercises to find the commands to use to read a csv file using a URL.

B. Get Date vector

Format the “Date” column as a date and give the formatted date field the name “dates”. I’m not showing the code…see previous exercises.

C. Find constituent rating curves

Plot C vs Q for Electrical conductivity (EC), Total suspended solids (TSS), turbidity, and Total.coliforms

par(mfrow=c(2,2),mar=c(1,4,0,4),oma=c(5,1,2,0))
plot(x$Q.ft3.s,x$EC.umhos.cm,xlab="",ylab="EC umhos/cm",xaxt="n")  # xaxt = "n" suppresses the labeling of the x-axis.
axis(side=1,labels=FALSE)   #  This adds the x-axis (side 1 is bottom axis, side 2 is left axis)
plot(x$Q.ft3.s,x$TSS.mgL,ylab="TSS mg/L",xaxt="n")
axis(side=1,labels=FALSE)
plot(x$Q.ft3.s,x$Turbidity.NTU,xlab="",ylab="Turbidity, NTU")
mtext(side=1,"Q ft3/s", line=3)
plot(x$Q.ft3.s,x$Total.Colif.MPN.100mL,ylab="Total coliforms, MPU/100mL")
mtext(side=1,"Q ft3/s", line=3)

D. Now plot them as log-log:

par(mfrow=c(2,2),mar=c(1,4,0,4),oma=c(5,1,2,0))
plot(x$Q.ft3.s,x$EC.umhos.cm,xlab="",ylab="EC umhos/cm",xaxt="n",log="xy")
axis(side=1,labels=FALSE)
plot(x$Q.ft3.s,x$TSS.mgL,ylab="TSS mg/L",xaxt="n",log="xy")
axis(side=1,labels=FALSE)
plot(x$Q.ft3.s,x$Turbidity.NTU,xlab="",ylab="Turbidity, NTU",log="xy")
mtext(side=1,"Q ft3/s", line=3)
plot(x$Q.ft3.s,x$Total.Colif.MPN.100mL,ylab="Total coliforms, MPU/100mL",log="xy")
mtext(side=1,"Q ft3/s", line=3)

E. Determine if there is a statistically signifiant relationship, and the exponent on the Q-C rating curve

x$EC.log = log(x$EC.umhos.cm)
x$Q.log = log(x$Q.ft3.s)
lm.EC = lm(EC.log ~ Q.log,x) #  do ?lm to see format of input
summary(lm.EC)

## 
## Call:
## lm(formula = EC.log ~ Q.log, data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.3417 -0.1645 -0.1325  0.2430  0.4397 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.42908    0.30398  27.729 3.09e-09 ***
## Q.log       -0.20220    0.07043  -2.871   0.0208 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2906 on 8 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.5075, Adjusted R-squared:  0.4459 
## F-statistic: 8.243 on 1 and 8 DF,  p-value: 0.0208

# Generate an array of Q with all values between 5 and 400 in order to plot the lm
Qnew = seq(5,400,by=10)
Qnew.log = data.frame(Q.log=log(Qnew))  #  Create a new dataframe that has a variable "Q.log" in it

EC.pred.log = predict.lm(lm.EC,Qnew.log)
# Use exp to turn the log-EC values into non-log values:
EC.pred = exp(EC.pred.log)

Do the same thing for Turbidity, TSS and Total Coliforms. I’m not showing the code.

x$turb.log = log(x$Turbidity.NTU)
lm.turb = lm(turb.log ~ Q.log,x)
summary(lm.turb)

## 
## Call:
## lm(formula = turb.log ~ Q.log, data = x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.83459 -0.31994 -0.00517  0.42493  0.65739 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.5015     0.5445   0.921 0.381064    
## Q.log         0.5996     0.1227   4.887 0.000863 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.532 on 9 degrees of freedom
## Multiple R-squared:  0.7263, Adjusted R-squared:  0.6959 
## F-statistic: 23.88 on 1 and 9 DF,  p-value: 0.000863

Turbidity.pred = exp(predict.lm(lm.turb,Qnew.log))

TSS

## 
## Call:
## lm(formula = TSS.log ~ Q.log, data = x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.01415 -0.34337 -0.02042  0.32490  0.91866 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.5106     0.5902   0.865 0.409366    
## Q.log         0.6667     0.1330   5.014 0.000725 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5765 on 9 degrees of freedom
## Multiple R-squared:  0.7363, Adjusted R-squared:  0.7071 
## F-statistic: 25.14 on 1 and 9 DF,  p-value: 0.0007254

Total Coliforms

## 
## Call:
## lm(formula = TC.log ~ Q.log, data = x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0443 -0.9515 -0.4254  0.9244  3.9137 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   6.5980     2.3146   2.851   0.0191 *
## Q.log         0.8407     0.5216   1.612   0.1414  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.261 on 9 degrees of freedom
## Multiple R-squared:  0.224,  Adjusted R-squared:  0.1378 
## F-statistic: 2.598 on 1 and 9 DF,  p-value: 0.1414

E. Redo the log-log plots and add the regression line.

I’m not showing the code for this… Hint: to add the regression line, use “lines”, with Qnew as the x-variable and the .pred series as the y-variables.

Here’s a reminder about how to add lines to a plot:

x.example = c(1,2,3,4,5)
y.example = c(2,3,1,6,7)
y2.example = c(4,1,2,8,9)

plot(x.example,y.example)
lines(x.example,y2.example)

II. Load and plot Q data, and sampling points on top.

A. Load Q data for Fashion Valley

Site code for San Diego River at Fashion Valley is 11023000. I’m not showing the code for this…look back at previous exercises. Put the discharge data into a data.frame called “xQ”. Rename the discharge column “Q.ft.s”.

##   agency_cd  site_no       Date X_00060_00003 X_00060_00003_cd
## 1      USGS 11023000 1982-01-18           7.6                A
## 2      USGS 11023000 1982-01-19           8.8                A
## 3      USGS 11023000 1982-01-20         151.0                A
## 4      USGS 11023000 1982-01-21         574.0                A
## 5      USGS 11023000 1982-01-22         215.0                A
## 6      USGS 11023000 1982-01-23          70.0                A

##   agency_cd  site_no       Date Q.ft.s QA
## 1      USGS 11023000 1982-01-18    7.6  A
## 2      USGS 11023000 1982-01-19    8.8  A
## 3      USGS 11023000 1982-01-20  151.0  A
## 4      USGS 11023000 1982-01-21  574.0  A
## 5      USGS 11023000 1982-01-22  215.0  A
## 6      USGS 11023000 1982-01-23   70.0  A

B. Choose a larger set of events and plot the hydrograph, and the water quality sampling dates on top.

par(mfrow=c(1,1))
plot(as.Date(xQ$Date),xQ$Q.ft.s,type="l",xlim=as.Date(c("2006-10-01","2007-04-01")),ylim=c(0,500))
points(dates,x$Q.ft3.s)