Welcome!
Welcome to the Ellerbe Creek Cleanup Tutorial! To download this file and follow along, head over to our GitHub page.
SECTION 01: WORKFLOW
Set up your environment
Some basics: To run a block of code, click the green arrow in the top right of the gray box.
Before we start, you need to set up your “environment.” This involves
installing and loading any libraries that you need to run the code for
this module. Libraries basically store a bunch of really useful
functions that other people have developed to make your life easier. You
can both download and load packages with the p_load
function!
# Load libraries (which have all the important functions).
# The next line only needs to be run once to install the package that
# has the p_load function! Run it by removing the "#" before clicking the play button.
#install.packages("pacman")
pacman::p_load(tidyverse, lubridate, dataRetrieval)
# Ensure the computer knows where to look for your files
setwd(dirname(rstudioapi::documentPath()))
Load your data
Now that you’ve loaded all the libraries you need, the next step is to load the data for this project.
Notice that we are defining various variables. For example, a <- 3 tells R that we want the variable a to equal the number 3 (we call this “assigning a variable”). You can do this with numbers, text, datasets, you name it!
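Here is a minimal sketch of a few assignments (the names below are made up purely for illustration):
# Assign a number, a piece of text, and a tiny dataset to variables
a <- 3                         # a number
creek_name <- "Ellerbe Creek"  # text (a character string)
tiny_table <- data.frame(site = c("EL1.9EC", "EL5.0EC"), value = c(94, 88))
a          # typing a variable's name prints its value
creek_name
tiny_table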
# This tells R to look for the folders and gives them names
data.dir <- "../01_data" # This folder has all the data
meta.dir <- "../00_meta" # This folder has all the meta data
file.name <- file.path(data.dir, "durham_data_tutorial_version.csv") # Builds the full path to the file
# Read dataset
durham_data <- read.csv(file.name)
If you are new to R, maybe you have never seen the following before: %>%. This is called piping and is a powerful way to simplify your code and make it easier to read. Here’s a wonderful tutorial.
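As a quick illustration (not part of the cleanup itself), the pipe takes whatever is on its left and hands it to the function on its right, so these two lines do the same thing:
nrow(durham_data)       # "classic" way: wrap the data in the function
durham_data %>% nrow()  # "piped" way: send the data into the function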
SECTION 02: Why R?
Reproducibility
By working in R, we have a record of every change or calculation we did on a dataset. This is very important so your work can be repeated by others. Also so you can repeat it if you find you accidentally made a mistake!
Open Source = Accessible
R is open source. That means it is free and likely always will be! Anyone can use your code and run it without paying for software. This makes your work accessible to many.
SECTION 03: Cleaning Up
Open durham_data and take a look
Now, back to the dataset at hand. Did you know that you can view data
in R just like you do in Excel? Using the View() command,
you can view the dataset like a spreadsheet. The head()
function will show you just the top of the table (so it isn’t too
overwhelming). Try it out:
head(durham_data)
| Disclaimer..Although.Stormwater.Services.makes.every.effort.to.ensure.data.quality.in.the.monitoring.program | errors.do.arise.and.may.not.be.identified.at.the.time.the.data.was.uploaded.to.this.database…The.City.of.Durham.does.not.assume.any.liability.associated.with.the.use.of.these.data.and.is.not.obligated.to.notify.parties.of.modifications.or.changes.to.the.database. | X | X.1 | X.2 | X.3 | X.4 | X.5 | X.6 | X.7 | X.8 | X.9 | X.10 | X.11 | X.12 | X.13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Query Results: | |||||||||||||||
| Name:EL10.7EC-EL1.9EC-EL5.0EC-EL5.5GC-EL5.6EC-EL7.1EC-EL7.1SEC-EL7.6SECT-EL7.9EC-EL8.1GC-EL8.2EC-EL8.5SEC-EL8.6SECUT-EL9.9EC-EL-Englewood-EL-Knox | |||||||||||||||
| Medium:Water | |||||||||||||||
| Parameters: all | |||||||||||||||
| Project: Ambient |
Wait a minute, something is not right. What’s going on? Welcome to the fun of coding!
Sometimes, metadata is stored inside a .csv file where it doesn’t belong. A CSV, or comma-separated values file, should contain only data. If you open this spreadsheet in Excel, you will see a block of text sitting above the original data that gives valuable metadata about the collection. We want to preserve this information, since it is important, but in a more appropriate way. An easy way to do this is to read the file in two passes: first, grab the metadata; then, skip past it and read the column headers and the actual data:
# load the metadata (extra data at the top of the file) separately using "readLines"
meta <- readLines( file.name, n=10) # the metadata stops on row 10
meta <- paste( gsub(',', '', meta), collapse="\n") #remove errant commas
print(meta)
## [1] "Disclaimer: Although Stormwater Services makes every effort to ensure data quality in the monitoring program errors do arise and may not be identified at the time the data was uploaded to this database. The City of Durham does not assume any liability associated with the use of these data and is not obligated to notify parties of modifications or changes to the database.\n\nQuery Results: \nName:EL10.7EC-EL1.9EC-EL5.0EC-EL5.5GC-EL5.6EC-EL7.1EC-EL7.1SEC-EL7.6SECT-EL7.9EC-EL8.1GC-EL8.2EC-EL8.5SEC-EL8.6SECUT-EL9.9EC-EL-Englewood-EL-Knox \nMedium:Water\nParameters: all \nProject: Ambient\nTime Period:2016-05-02 ~ 2022-05-02\nData was downloaded on May 23 2022 11:41 am\nThe data contains the QA code. "
# now load the durham_data
durham_data <- read.csv(file.name, skip=12, row.names=NULL) # skip the metadata rows; the data table starts right after them
head(durham_data)
| Station.Name | Filtered | Parameter | Date.Time | Value | QA.Code | Unit | QAQC.Sample | Comments | Rain.in.Last.24.Hours | Sky.Condition | Flow.Severity | Comp..Code | Thalweg | Lab | Project |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EL8.5SEC | Alkalinity | 5/11/16 8:40 | 94 | mgCaCO3/L | Staff Gage: 0.68 | No | Partly Cloudy | 2 | G | Yes | SDWRF | Ambient | |||
| EL8.5SEC | Alkalinity | 6/15/16 8:35 | 88 | mgCaCO3/L | Staff Gage: 0.64; Barely discernable discharge | No | Partly Cloudy | 1 | G | Yes | SDWRF | Ambient | |||
| EL8.5SEC | Alkalinity | 7/13/16 8:35 | 60 | mgCaCO3/L | Staff Gage: 0.73 | Yes | Overcast | 2 | G | Yes | SDWRF | Ambient | |||
| EL8.5SEC | Alkalinity | 8/10/16 8:30 | 90 | mgCaCO3/L | Staff Gage: 0.56 | Yes | Partly Cloudy | 3 | G | Yes | SDWRF | Ambient | |||
| EL8.5SEC | Alkalinity | 9/14/16 8:40 | 110 | mgCaCO3/L | Duplicate Site; Staff Gage: 0.66 Duplicate | No | Partly Cloudy | 2 | G | Yes | SDWRF | Ambient | |||
| EL8.5SEC | Alkalinity | 9/14/16 8:40 | 110 | mgCaCO3/L | Duplicate Site; Staff Gage: 0.66 Duplicate | No | Partly Cloudy | 2 | G | Yes | SDWRF | Ambient |
Now we have the data we need saved in ‘durham_data’, and the metadata stored in ‘meta’. Let’s save the metadata into a new file (NOT overwriting the raw data).
meta.file <- file.path(meta.dir, 'metadata.txt')
write.table(meta, file=meta.file, row.names=FALSE, col.names=FALSE)
For data cleaning, we’re going to cover three main issues: (1) Inconsistent Data Entry, (2) Duplicate Rows, and (3) Missing Data.
INCONSISTENT DATA ENTRY
Now we are ready to look at the real data! It can be helpful at first to look at the “unique” data values for each column, to get a sense of what you’re working with. Let’s start with “Sky.Condition”:
sky_condition<-durham_data %>%
# group the data so all like things in "Sky.Condition" are put together
group_by(Sky.Condition) %>%
  # count the number of occurrences for a given type of "Sky Condition"
count() %>%
# arrange the rows by highest to lowest value of "n"
arrange(desc(n))
print(sky_condition) # shows the result!
## # A tibble: 9 × 2
## # Groups: Sky.Condition [9]
## Sky.Condition n
## <chr> <int>
## 1 "Sunny" 3699
## 2 "Overcast" 2916
## 3 "Partly Cloudy" 2065
## 4 "" 83
## 5 "overcast" 2
## 6 "outcast" 1
## 7 "party cloudy" 1
## 8 "sunny" 1
## 9 "suny" 1
Ah, it looks like somebody had some inconsistencies with spellings. Let’s deal with some of those by making everything lowercase, then substituting some spelling fixes:
durham_data <- durham_data %>%
# First, make everything in the column lowercase with 'tolower()'
mutate(Sky.Condition = tolower( Sky.Condition )) %>%
# Next, fix the spelling of "sunny". We use 'gsub(pattern, replacement, searchlist)'
# to replace words or phrases. Modify this code to fix the other misspelling!
mutate(Sky.Condition = gsub("suny", "sunny", Sky.Condition)) %>%
mutate(Sky.Condition = gsub("pattern", "replacement", Sky.Condition))
sort( unique( durham_data$Sky.Condition ) )
## [1] ""              "outcast"       "overcast"      "partly cloudy"
## [5] "party cloudy" "sunny"
DUPLICATE DATA
As you may have noticed from your initial exploration of the data,
some rows are duplicated. Dealing with these will be a little tricky, so
let’s break it into steps. First, we pull out all the duplicates and
save them in a variable called dupes.
# find rows with "duplicate" in the comments, then create a new data frame just with those.
inx.dup <- which( grepl( 'duplicate', tolower(durham_data$Comments) ) )
dupes <- durham_data[inx.dup,]
head( dupes )
| | Station.Name | Filtered | Parameter | Date.Time | Value | QA.Code | Unit | QAQC.Sample | Comments | Rain.in.Last.24.Hours | Sky.Condition | Flow.Severity | Comp..Code | Thalweg | Lab | Project |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | EL8.5SEC | Alkalinity | 9/14/16 8:40 | 110.0 | mgCaCO3/L | Duplicate Site; Staff Gage: 0.66 Duplicate | No | partly cloudy | 2 | G | Yes | SDWRF | Ambient | |||
| 6 | EL8.5SEC | Alkalinity | 9/14/16 8:40 | 110.0 | mgCaCO3/L | Duplicate Site; Staff Gage: 0.66 Duplicate | No | partly cloudy | 2 | G | Yes | SDWRF | Ambient | |||
| 14 | EL8.5SEC | Field | Aluminum | 9/14/16 8:40 | 6.0 | J | ug/L | Duplicate Site; Staff Gage: 0.66 Duplicate | No | partly cloudy | 2 | G | Yes | SDWRF | Ambient | |
| 15 | EL8.5SEC | Field | Aluminum | 9/14/16 8:40 | 5.6 | J | ug/L | Duplicate Site; Staff Gage: 0.66 Duplicate | No | partly cloudy | 2 | G | Yes | SDWRF | Ambient | |
| 23 | EL8.5SEC | Aluminum | 9/14/16 8:40 | 288.0 | ug/L | Duplicate Site; Staff Gage: 0.66 Duplicate | No | partly cloudy | 2 | G | Yes | SDWRF | Ambient | |||
| 24 | EL8.5SEC | Aluminum | 9/14/16 8:40 | 198.0 | ug/L | Duplicate Site; Staff Gage: 0.66 Duplicate | No | partly cloudy | 2 | G | Yes | SDWRF | Ambient |
Normally, you could just get rid of duplicates using the function distinct(). However, distinct() simply keeps the first of the duplicated rows and drops the rest. Take a look at some of these values in dupes – they’re not the same! When looking at your own data, you need to make decisions on how to deal with duplicate data. For now, we’ve decided to take the average of the two values. If you want to use just the first value, distinct() is fine; otherwise, here’s how to replace duplicate values with the averages:
# Create an ID row to sort on station name, filtered, parameter, and date
dupes <- dupes %>%
mutate(ID = paste0(Station.Name, Filtered, Parameter, Date.Time)) %>%
arrange(ID)
# grab the mean values for each combination
means <- dupes %>%
group_by(ID) %>% #group dataframe by ID
summarize(MeanValue = mean(Value, na.rm=TRUE)) #take the mean and ignore NAs
# collapse the duplicate data frame based on ID and sort it the same way
dupes <- dupes %>%
distinct(Station.Name, Filtered, Parameter, Date.Time, .keep_all=TRUE)
# make sure both datasets are the same length and in the same order
nrow(dupes) == nrow(means) && all(dupes$ID == means$ID)
## [1] TRUE
# Now that we're sure, replace Value column of distinct dupes with mean values
# and remove the ID column
dupes$Value <- means$MeanValue
dupes <- dupes %>% dplyr::select(-ID)
# Put it all together! First, remove duplicated rows from the main data frame
durham_data <- durham_data[-inx.dup,]
# Add the averaged, previously-duplicated rows to the end of the data frame
durham_data <- bind_rows(durham_data, dupes) %>%
arrange(Station.Name, Filtered, Parameter, Date.Time) #sort data again
head(durham_data)
| Station.Name | Filtered | Parameter | Date.Time | Value | QA.Code | Unit | QAQC.Sample | Comments | Rain.in.Last.24.Hours | Sky.Condition | Flow.Severity | Comp..Code | Thalweg | Lab | Project |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EL1.9EC | Ammonia Nitrogen | 1/19/17 10:30 | 0.08 | mg/L | Yes | sunny | 2 | G | Yes | SDWRF | Ambient | ||||
| EL1.9EC | Ammonia Nitrogen | 1/2/18 9:45 | 0.04 | U | mg/L | No | sunny | 2 | G | Yes | SDWRF | Ambient | |||
| EL1.9EC | Ammonia Nitrogen | 1/4/22 9:00 | 0.23 | mg/L | Winter storm over the weekend | No | sunny | 4 | G | Yes | SDWRF | Ambient | |||
| EL1.9EC | Ammonia Nitrogen | 1/5/21 9:20 | 1.50 | J7 | mg/L | No | overcast | 3 | G | No | SDWRF | Ambient | |||
| EL1.9EC | Ammonia Nitrogen | 1/7/20 9:15 | 0.05 | mg/L | No | overcast | 3 | G | Yes | SDWRF | Ambient | ||||
| EL1.9EC | Ammonia Nitrogen | 1/8/19 9:50 | 0.11 | mg/L | No | partly cloudy | 3 | G | Yes | SDWRF | Ambient |
MISSING DATA
Sometimes, data is just plain missing. Missing values aren’t always an issue, but it’s good to know where they are. You can check using summary() or is.na(); be mindful of values stored as NA (missing) if you’re counting rows or doing other calculations that require removing them. The function which() can tell you which rows have missing data, and removing them is a cinch. Just be sure to save the result as something other than your original data frame, since once you remove rows you’ll have to run your code again to get them back!
summary( durham_data$Value )
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 2.5 7.6 1587.1 78.0 76800.0 3
any( is.na(durham_data$Value) )
## [1] TRUE
which( is.na(durham_data$Value) )
## [1] 2743 4009 6771
inx.rm <- which( is.na(durham_data$Value) )
durham_data_no_na <- durham_data[-inx.rm,]
DATA FORMAT
Now it’s time to get the data ready for visualizations.
# Let's select just a few columns of data using the "select" function.
D1 <- durham_data %>%
dplyr::select(Station.Name, Date.Time, Parameter, Value)
# Now let's filter for distinct rows. We can do this first by grouping by Name and Date.
# Put simply, R looks at rows that share the same Station.Name and Date.Time, then keeps only the first row for each Parameter (removing the duplicates).
D2<-D1 %>%
group_by(Station.Name, Date.Time) %>%
distinct(Station.Name, Date.Time, Parameter, .keep_all=TRUE)
# Next we are going to make the dataset "wider", giving each Parameter its own column.
D3<-D2%>%
pivot_wider(names_from=Parameter, values_from=Value)
# And we will finish by cleaning it up
durham_data_wide<-D3 %>%
mutate(Date = as.Date(Date.Time, format="%m/%d/%y"),
Year = year(Date),
Date = as.POSIXct.Date(Date)) %>%
relocate(Date, Year, .after=Date.Time) %>%
ungroup()
# Remove D1, D2, and D3 since durham_data_wide is all we really want
rm(D1, D2, D3)
SECTION 04: Visualization and Analysis
Now let’s visualize some of this data! Through visualizations, we can take complex or large datasets and simplify them to look for trends. That in turn can help guide our analysis or direct us towards interesting and new questions.
Time Series Plots and Analysis
Starting with the basics, we can think about this dataset as a time series and plot a given analyte against time.
What analytes exist? Well, they are the names of the columns. Let’s look at them:
colnames(durham_data_wide)
## [1] "Station.Name" "Date.Time"
## [3] "Date" "Year"
## [5] "Ammonia Nitrogen" "Biochemical Oxygen Demand"
## [7] "Calcium" "Conductivity"
## [9] "Copper" "Dissolved Oxygen"
## [11] "Dissolved Oxygen Saturation" "Fecal Coliform"
## [13] "Hardness" "Magnesium"
## [15] "Nitrate + Nitrite as N" "Organic Carbon"
## [17] "pH" "Temperature"
## [19] "Total Kjeldahl Nitrogen" "Total Phosphorus"
## [21] "Total Suspended Solids" "Turbidity"
## [23] "Zinc" "Gage height"
## [25] "Alkalinity" "Aluminum"
## [27] "Cadmium" "Chloride"
## [29] "Chromium" "Fluoride"
## [31] "Iron" "Lead"
## [33] "Manganese" "Nickel"
## [35] "Potassium" "Sodium"
## [37] "Sulfate"
Let’s see how the concentration of calcium changes over time. We can color the points based on the site where the sample was taken.
ggplot() +
geom_point(data=durham_data_wide, mapping=aes(x=Date, y=Calcium, color=Station.Name)) +
  theme_classic()
## Warning: Removed 23 rows containing missing values (geom_point).
Using similar code to above, choose two different variables and see how they compare to one another!
Hint: You can simply change the x-axis from “Date” to your variable of choosing.
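If you get stuck, here is one possible sketch to adapt; it compares Calcium against Conductivity, both columns from the list above:
ggplot() +
  geom_point(data=durham_data_wide, mapping=aes(x=Conductivity, y=Calcium, color=Station.Name)) +
  theme_classic()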
# INSERT YOUR CODE HERE
Cool, we can see how concentration varies over time, but it’s hard to tell how variation throughout the year differs between sites. So let’s look at a boxplot of this data!
ggplot(
data=durham_data_wide, aes(x=Station.Name, y=Calcium)) +
  geom_boxplot()
## Warning: Removed 23 rows containing non-finite values (stat_boxplot).
Let’s get fancy and separate this out by year. Maybe there’s a difference between years?
ggplot(
data=durham_data_wide, aes(x=Station.Name, y=Calcium)) +
geom_boxplot() +
  facet_wrap(~Year)
## Warning: Removed 23 rows containing non-finite values (stat_boxplot).
Ah shoot, the graph is a little messy. Let’s clean it up:
ggplot(
data=durham_data_wide, aes(x=Station.Name, y=Calcium)) +
geom_boxplot() +
facet_wrap(~Year) +
theme_classic()+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
## Warning: Removed 23 rows containing non-finite values (stat_boxplot).
Try doing the same visualizations now, but with a different variable. Make sure you use the exact name of the variable as it is found in the dataset!
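As one possible sketch, here is the same plot using Turbidity, another column from the list above (a name with spaces, like Dissolved Oxygen, would need backticks inside aes()):
ggplot(
  data=durham_data_wide, aes(x=Station.Name, y=Turbidity)) +
  geom_boxplot() +
  facet_wrap(~Year) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))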
# INSERT YOUR CODE HERE!
SECTION 05: Exploring a Federal Dataset
It just so happens that the City of Durham isn’t the only one sampling Ellerbe Creek! We also have the US Geological Survey (USGS), which has a bunch of sensors deployed in streams and rivers nationally, and two in Ellerbe Creek.
Let’s explore what they have, using USGS’ R package (dataRetrieval) to download and view the data.
First, let’s introduce you to the wonderful world of CRAN. Most packages come with helpful documentation: a list of every function in the package, plus a help page for each (usually with examples). The dataRetrieval documentation can be found here.
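You can also pull up documentation without leaving R; for example:
?readNWISdv                       # open the help page for a single function
help(package = "dataRetrieval")   # list everything the package exports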
Site Search in NC
Let’s start by looking for all sites in NC that have measures for mean daily Discharge, cubic feet per second (code = 00060).
siteListDischarge <- whatNWISsites(stateCd="NC",parameterCd="00060") %>%
filter(str_detect(station_nm, "ELLERBE")) # Filter for sites that are for Ellerbe Creek
head(siteListDischarge)
| agency_cd | site_no | station_nm | site_tp_cd | dec_lat_va | dec_long_va | colocated | queryTime |
|---|---|---|---|---|---|---|---|
| USGS | 02128576 | LITTLE MOUNTAIN CREEK NEAR ELLERBE, NC | ST | 35.08904 | -79.83311 | FALSE | 2022-10-27 09:47:32 |
| USGS | 02128500 | BIG MOUNTAIN CREEK NEAR ELLERBE, NC | ST | 35.12015 | -79.81616 | FALSE | 2022-10-27 09:47:32 |
| USGS | 0208675010 | ELLERBE CREEK AT CLUB BOULEVARD AT DURHAM, NC | ST | 36.01939 | -78.89478 | FALSE | 2022-10-27 09:47:32 |
| USGS | 02086849 | ELLERBE CREEK NEAR GORMAN, NC | ST | 36.05931 | -78.83251 | FALSE | 2022-10-27 09:47:32 |
Download and view data from Ellerbe Creek, in Durham, NC
Sweet! Okay, now let’s look at two sites on Ellerbe Creek. These are sites with stream gauges, and thus lots of data: one near Club Boulevard (Durham, NC) and one near Gorman, NC.
Let’s start by exploring what data is available.
Club_Code<-"0208675010"
Gorman_Code<-"02086849"
siteNos<-c(Club_Code, Gorman_Code)
availableData <- whatNWISdata(siteNumber = siteNos)
View other parameters
Turns out that it isn’t just flow data that is available through dataRetrieval. Let’s see the full list of possible parameters (this will take a minute to load).
# This will take a minute to load
pcode <- readNWISpCode("all")
head(pcode)
| parameter_cd | parameter_group_nm | parameter_nm | casrn | srsname | parameter_units |
|---|---|---|---|---|---|
| 00001 | Information | Location in cross section, distance from right bank looking upstream, feet | ft | ||
| 00002 | Information | Location in cross section, distance from right bank looking upstream, percent | % | ||
| 00003 | Information | Sampling depth, feet | ft | ||
| 00004 | Physical | Stream width, feet | Instream features, est. stream width | ft | |
| 00005 | Information | Location in cross section, fraction of total depth, percent | % | ||
| 00008 | Information | Sample accounting number | nu |
# See how many parameters there are
pcode %>% count()
| n |
|---|
| 24738 |
Let’s figure out what each of those parameters are for our sites!
parameters_at_sites<-availableData %>%
select(site_no, station_nm, parm_cd, begin_date, end_date, count_nu) %>%
na.omit() %>%
rename(parameter_cd = parm_cd) %>%
  left_join(pcode)
## Joining, by = "parameter_cd"
parameters_at_sites %>%
arrange(desc(count_nu)) %>%
mutate(count_rank=row_number()) %>%
ggplot(aes(x= count_rank, y=count_nu, shape=station_nm, color=station_nm)) +
geom_point()+
facet_wrap(~station_nm) +
scale_y_log10() +
theme_classic() +
  theme(legend.position = "bottom")
Turns out the vast majority of the parameters were only measured on a few sampling days. We want to look at temporal trends, so let’s see what is regularly monitored here.
parameters_at_sites %>%
filter(count_nu >100) %>%
group_by(station_nm, parameter_cd) %>%
  count()
| station_nm | parameter_cd | n |
|---|---|---|
| ELLERBE CREEK AT CLUB BOULEVARD AT DURHAM, NC | 00060 | 4 |
| ELLERBE CREEK AT CLUB BOULEVARD AT DURHAM, NC | 00065 | 4 |
| ELLERBE CREEK NEAR GORMAN, NC | 00010 | 1 |
| ELLERBE CREEK NEAR GORMAN, NC | 00060 | 2 |
| ELLERBE CREEK NEAR GORMAN, NC | 00061 | 1 |
| ELLERBE CREEK NEAR GORMAN, NC | 00065 | 4 |
| ELLERBE CREEK NEAR GORMAN, NC | 00095 | 2 |
| ELLERBE CREEK NEAR GORMAN, NC | 00191 | 1 |
| ELLERBE CREEK NEAR GORMAN, NC | 00400 | 1 |
| ELLERBE CREEK NEAR GORMAN, NC | 30209 | 1 |
| ELLERBE CREEK NEAR GORMAN, NC | 63160 | 1 |
Looks like it is parameter code 00060. What is that? Ah ha! It’s discharge. Makes sense, since it is a gage :D
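If you want to check a code like this yourself, one way (a small sketch) is to look it up in the parameter table we downloaded earlier, or to ask NWIS for just that code:
pcode %>% filter(parameter_cd == "00060")  # look it up in the table from readNWISpCode("all")
readNWISpCode("00060")                     # or query just this one code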
Explore Discharge data
The first site (Club) is upstream of the second. Based on this, which would you expect to have a higher flow? Let’s verify by loading in and viewing some data.
pCode <- "00060"
Discharge_Data <- readNWISdv(siteNos, pCode, "2005-01-01","2021-12-31")
View data
ggplot() +
geom_point(data=Discharge_Data, mapping=aes(x=Date, y=X_00060_00003, color=site_no)) +
theme_classic() +
labs(y="Average Daily Discharge (ft^3/sec)") +
facet_wrap(~site_no)+
  theme(legend.position = "bottom")
Yikes, that is messy. Maybe we should zoom in on one year, replace the points with lines, and overlay the two sites.
# Filter data
Discharge_Data_2020<-Discharge_Data %>%
filter(Date > "2020-01-01" & Date < "2020-12-31")
# Plot data
ggplot() +
geom_line(data=Discharge_Data_2020, mapping=aes(x=Date, y=X_00060_00003, color=site_no)) +
theme_classic() +
  labs(y="Average Daily Discharge (ft^3/sec)")
You try now!
Now you give it a try. Maybe select a different site, or a different measure.
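For instance, one possible variation (just a sketch) pulls daily discharge for another site that turned up in our earlier search; whether daily values exist for the whole window isn’t guaranteed, so adjust the dates if the result comes back empty:
# Big Mountain Creek near Ellerbe, NC, from the siteListDischarge search above
Other_Site <- "02128500"
Other_Discharge <- readNWISdv(Other_Site, "00060", "2005-01-01", "2021-12-31")
ggplot() +
  geom_line(data=Other_Discharge, mapping=aes(x=Date, y=X_00060_00003)) +
  theme_classic() +
  labs(y="Average Daily Discharge (ft^3/sec)")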
If you want to get even deeper, check out this tutorial from USGS that outlines even more things that you can do with this package.
SECTION 06: Conclusion
This is only the start of all the wonderful things you can do in R! Please reach out to Jonny if you’d like more tutorials to dive deeper into the R world of fun.