In this lab, will will learn how to download, wrangle, and plot water quality data and create a R markdown report as a html that can be downloaded and viewed in a browser. You must type your answers directly into this R markdown file in the space in between R chunks where it says “ANSWER =”

QUESTION 1

How many different (HINT: unique()) sites in Pennsylvania have data on the water quality and discharge variables of interest?

ANSWER = 716 sites

### USGS parameter codes for select water quality parameters
# discharge (ft3/s) = 00060
# nitrate (mg/L) = 99133
# chl-a (ug/L) =  32316
# temperature (C) = 00010
# specific conductivity (uS/cm) =  00095

## make vector of parameters of interest 
params <- c("00060", "00010", "32316", "00300")

## Find what USGS sites have mean daily values of the given parameters in Pennsylvania. 
## Look up ?whatNWISdata and fill in missing arguments. This is going to take a while to ## run so be patient and try not to run it more than once (i.e. save this data or R session so you do not have to run it again when if working on the lab in multiple sessins)
sites <- whatNWISdata(stateCd = "PA" , service= "dv" , statCd="00003", parameterCd = params)

uniquePA <- unique(sites$site_no)

QUESTION 2:

How many sites had daily data from all 3 water quality variables and discharge? And where is this site, what river? (Look up the site_no online).

ANSWER = 1 site, Delaware River at Callicoon NY

## HINT: You could Use the group_by function to summarise data by site_no. Count the length() of unique() parm_cd and filter() to sites with 4. If you have another method, that's fine too. 

site_sum <- sites %>%
  group_by(site_no) %>%
  summarise(x = length(unique(parm_cd))) %>%
  filter(x > 3)

Load and reshape the data

# read in daily values from the site(s) above which had all 4 parameters
data <- readNWISdv(siteNumbers = "01427510" , parameterCd = params) 

# clean up data by changing the column names to something more readable and then filtering to only rows (i.e. days) that have data for all 4 parameters using filter() OR drop_na() function
data_clean <- data %>%
  rename(temp = X_00010_00003, q = X_00060_00003, do = X_00300_00003, chla= X_32316_00003) %>%
  drop_na("temp", "q", "do", "chla")
  #filter or drop_na

# convert the data to long format for easy plotting of all 4 variables. Look up the pivot_longer function and fill in missing arguments
data_long <- data_clean %>%
  pivot_longer(cols = c(temp, q, do, chla) , names_to= "parameter" , values_to= "count" )

FIGURE 1:

Make figure that includes a time series for all 4 parameters using facet_wrap (or something similar) and the long data frame from above. Remember to set scales=“free” in facet_wrap

Question 3:

Look at the 4 time series plots. Describe the nature of the annual trend and its variability in chl-a and dissolved oxygen, and what you think might be some important controls over chl-a and DO in this river. Make this an interactive plot if you need to to inspect the higher frequency variability (use ggplotly())

ANSWER = The annual trend in chl-a seems highly variable or flashy. Chl-a tends to start pretty low at about 1 or 2ug/L in January and increases to about 10ug/L in about April. After this initial peak, values quickly return to 1 or 2ug/L, which is then followed by 2 or 3 similar spikes in the data. In about June, values are the lowest in the year overall, with one spike in the data being seen during the summer duration. In fall, a peak can be seen recording the highest measurement over the entire year before returning to the 1 or 2ug/L value to start the year again. For the dissolved oxygen trend, measurements seem to be the highest in January, decrease (in a serious of peaks and dips…not linear) until July or August, and increase back up to the starting values of about 15 mg/L (again nonlinear) until January, where the trend starts again. Since both of these parameters tend to have a seasonal pattern, I would hypothesize that temperature and sunlight exposure would be controls over the concentrations. In fact, the dissolved oxygen and temperature graphs seem to be inverses. The chl-a and discharge graphs seem to be similar as well.

ggplot(data_long) +
  geom_line(aes(x = Date , y = count)) +
  facet_wrap(~parameter, scales = "free")

## FIGURE 2. Make a time series of daily nitrate load and daily discharge for the Potomac River in Maryland. This could be 2 separate figures or just one figure

QUESTION 4:

Looking at this figure, what is the relative role of discharge and concentration in controlling nitrate load?

ANSWER: The 2 graphs are almost identical. When daily discharge increases, daily nitrate load increases. When daily discharge decreases, daily nitrate loade decreases. This translates to peaks and dips lining up together. The only major difference is that the scales are different. The daily discharge is measured at a much higher concentration, but this is most likely due to the conversions that were performed to get nitrate load into kg/day. Therefore, I would say that there is a direct correlation between these two variables and discharge controls daily nitrate load.

## If you want to look up What sites in Maryland have daily nitrate data
#site_2 <- whatNWISdata(stateCd = "MD", service="dv", statCd="00003", parameterCd =c("99133"))

## load in daily nitrate and discharge data for the Potomac river in Washington DC. Fill in missing parameterCd
params2 <- c("00060", "99133")

data_2 <- readNWISdv(siteNumbers = "01646500", parameterCd = params2)

## clean up the columns, drop rows with missing discharge or nitrate data, and add a column that  calculates daily nitrate load in Kg/ day from the Q and concentration data. Fill in missing code.
## Note, discharge is mean daily Q in ft3/s, so we can just convert this to m3/s and assume its the mean represents the day, so we can assume mean m3/s = m3/day
## Note nitrate is in mg/L

data_2_clean <- data_2 %>%
  rename( q = X_00060_00003, nitrate= X_SUNA_99133_00003) %>%
  drop_na("q", "nitrate")%>%
  mutate(CubicMetersPerDay = (q)/35.3147 )%>%
  mutate(NitrateKgPerCubicMeter = (nitrate)* .001)%>%
  mutate(NitrateLoad = CubicMetersPerDay * NitrateKgPerCubicMeter)
  

## make a time series of daily nitrate load and daily discharge

ggplot(data_2_clean) +
  geom_line(aes(x = Date , y = NitrateLoad))+
  ylab("Nitrate Load (kg/day)")

ggplot(data_2_clean) +
  geom_line(aes(x = Date , y = CubicMetersPerDay))+
  ylab("Discharge (cubic meters/day)")