Document last updated 2021-03-15 19:26:35 by Benjamin Meyer ()

www.kenaiwatershed.org


Introduction

This draft document contains preliminary assessment of data from the Kenai River Baseline Water Quality Monitoring Program (https://kenaiwatershed.org/science-in-action/research-information/water-quality/).


Notes on data sourcing

Data is sourced from the following queries at https://waterqualitydata.us on Feb 24, 2021:

CSV download for sample data: https://www.waterqualitydata.us/portal/#bBox=-151.322501%2C60.274310%2C-149.216144%2C60.738915&mimeType=csv&dataProfile=narrowResult

CSV download for site data: https://www.waterqualitydata.us/portal/#countrycode=US&statecode=US%3A02&countycode=US%3A02%3A122&bBox=-151.322501%2C60.274310%2C-149.216144%2C60.738915&mimeType=csv

Note: these CSV files are excluded from the GitHub repository because they are too large to sync. To reproduce the analysis, download and save these files locally instead. (See the ReadMe file at data/WQX_downloads in the repository).

Using these same queries in the future will download the most current csv files.




Import data

Import data from local csv files

# import data from EPA repo

# designate download date
download_date <- "20210224"

# import sample data
dat <- read_csv(paste0("data/WQX_downloads/downloads/download_",download_date,"/narrowresult.csv"))

# import site data
site_dat <- read_csv(paste0("data/WQX_downloads/downloads/download_",download_date,"/station.csv"))

# join site and sample data
dat <- left_join(dat,site_dat) %>%
  # remove empty columns
  select_if(~!(all(is.na(.))))

# in the future, use EGRET package (http://usgs-r.github.io/EGRET/articles/EGRET.htmlmaps) for automated database query.

What organizations in the database contain the word “Kenai”?

# filter if OrganizationFormalName has the term "Kenai"
t <- dat %>%
  select(OrganizationIdentifier,OrganizationFormalName) %>%
  filter(grepl("Kenai",OrganizationFormalName)) %>%
  distinct()

datatable(t)


How many years of Kenai Watershed Forum data do we have in the waterqualitydata.us database?

t <- dat %>%
  filter(OrganizationFormalName == "Kenai Watershed Forum(Volunteer)*") %>%
  filter(!is.na(ActivityStartDate)) %>%
  summarise(min_date = min(date(ActivityStartDate)),
            max_date = max(date(ActivityStartDate)))

datatable(t)


It appears that data after spring 2013 is not in the EPA database. (Confirmed w/ ADEC in spring 2020).

Temporary solution - use data compiled from PDFs on the KWF local server.

Import data from local KWF server:

# import compiled data 2014 - 2020 form local kwf server
# note: we will not need this step once everything's uploaded to WQX
kwf_dat <- read_excel("data/Compiled_KRBWQM_data_2014_2020.xlsx", sheet = "Master") %>%
    select(-Year,-Season,-ChannelType,-TestType,-Code,-Duplicate,-Lab) %>%
  
  # create and rename columns to match EPA database format
  mutate("OrganizationFormalName" = "Kenai Watershed Forum(Volunteer)*",
         "MonitoringLocationTypeName" = "River/Stream") %>%
  rename("ActivityStartDate"= "Date",
         "MonitoringLocationName" = "Site",
         "CharacteristicName" = "Parameter",
         "ResultMeasureValue" = "Result",
         "ResultMeasure/MeasureUnitCode" = "Units") %>%
  # match column formats to EPA database
  transform(ResultMeasureValue = as.double(ResultMeasureValue))

# join site coordinates to 2014-2020 data
sites <- read.csv("data/sampling_sites/site_names_matching.csv") %>%
  rename("MonitoringLocationName" = "name_2014_2020")
kwf_dat <- left_join(kwf_dat,sites)


# join kwf and epa data
dat <- bind_rows(dat,kwf_dat)


What kind of sites are present in our data set?

t <- data.frame(unique(dat$MonitoringLocationTypeName))
datatable(t)


Retain surface water sites only. Exclude well sampling sites.

# create and apply filter
surface <- c("River/Stream","Lake","River/Stream Perennial","BEACH Program Site-Ocean","BEACH Program Site-River/Stream","Lake, Reservoir, Impoundment","Stream","Spring")

dat <- dat %>%
  filter(MonitoringLocationTypeName %in% surface)


Retain a subset of useful columns

 # retain select subset of potentially useful columns
dat <- dat %>%
  select("OrganizationFormalName",
"ActivityStartDate" ,
"ActivityStartTime/Time",
"ActivityStartTime/TimeZoneCode" ,               
"MonitoringLocationIdentifier"       ,           
"CharacteristicName"                  ,          
"ResultMeasureValue"       ,                     
"ResultMeasure/MeasureUnitCode"      ,           
"ResultStatusIdentifier"         ,               
"ResultValueTypeName"         ,                  
"ResultAnalyticalMethod/MethodIdentifier"       ,
"ResultAnalyticalMethod/MethodIdentifierContext",
"ResultAnalyticalMethod/MethodName"    ,         
"MonitoringLocationName"          ,              
 "MonitoringLocationTypeName"      ,              
 "HUCEightDigitCode"         ,                    
 "DrainageAreaMeasure/MeasureValue"        ,      
 "DrainageAreaMeasure/MeasureUnitCode"      ,     
 "LatitudeMeasure"          ,                     
 "LongitudeMeasure"       ,                       
 "SourceMapScaleNumeric"    ,                     
 "HorizontalAccuracyMeasure/MeasureValue"        ,
 "HorizontalAccuracyMeasure/MeasureUnitCode"     ,
 "HorizontalCollectionMethodName"          ,      
 "HorizontalCoordinateReferenceSystemDatumName"  ,
 "VerticalMeasure/MeasureValue"       ,           
 "VerticalMeasure/MeasureUnitCode"     ,          
 "VerticalAccuracyMeasure/MeasureValue"       ,   
 "VerticalAccuracyMeasure/MeasureUnitCode"      , 
 "VerticalCollectionMethodName"         ,         
 "VerticalCoordinateReferenceSystemDatumName")   


# remove extraneous text from "Kenai Watershed Forum(Volunteer)*"
dat <- dat %>%
  mutate(OrganizationFormalName = gsub("\\s*\\([^\\)]+\\)","",as.character(dat$OrganizationFormalName))) %>%
  mutate(OrganizationFormalName = str_remove(OrganizationFormalName,"\\*"))


# to do: adapt vignette for multiple sites:
# https://github.com/USGS-R/EGRET

# potentially also useful: https://waterdata.usgs.gov/nwis/inventory?search_criteria=lat_long_bounding_box&submitted_form=introduction

# additional dataframe prep
dat <- dat %>%
  transform(ActivityStartDate = date(ActivityStartDate)) %>%
  
  # remove missing observations
  filter(!is.na(ResultMeasureValue))



Exploratory Data Analysis

Now, how many years of data do we have, including from the waterqualitydata.us database AND the KWF local server?

dat %>%
  filter(OrganizationFormalName == "Kenai Watershed Forum") %>%
  summarise(min_date = min(ActivityStartDate),
            max_date = max(ActivityStartDate))
##     min_date   max_date
## 1 2000-07-18 2020-07-21


How many data points total from water quality grab samples?

exclude_parameters <- c("Temperature, air", "Temperature, water")

(z <- dat %>%
  filter(OrganizationFormalName == "Kenai Watershed Forum",
         CharacteristicName %ni% exclude_parameters) %>%
  distinct() %>%
  count())
##       n
## 1 10596


How many different kinds of substances have we measured ?

dat %>%
  filter(OrganizationFormalName == "Kenai Watershed Forum") %>%
  distinct(CharacteristicName) %>%
  count()
##    n
## 1 39
# note: some substances are labeled "(surr)" for surrogate. Need to research what this means; subtract form total count for now


What are the names of all of substances have we measured ?

param <- dat %>%
  filter(OrganizationFormalName == "Kenai Watershed Forum") %>%
  distinct(CharacteristicName) 

datatable(param)


Create and export table of site names

# create summary table
tbl <- dat %>%
  filter(OrganizationFormalName == "Kenai Watershed Forum",
         CharacteristicName %ni% exclude_parameters) %>%
  group_by(MonitoringLocationName,LatitudeMeasure,LongitudeMeasure) %>%
  summarise(min_date = min(ActivityStartDate),
            max_date = max(ActivityStartDate)) %>%
  rename("latitude" = "LatitudeMeasure",
         "longitude" = "LongitudeMeasure")

# export csv
write.csv(tbl,"data/sampling_sites/2000_2014_sitenames.csv", row.names = F)



Map of Sample Sites


Access ArcGIS Online map at https://arcg.is/0LXGSf