# set up session -----------------------------------------------------------
# clear all objects from Global Environment
rm(list = ls())
# install packages
library(readxl) # install package for importing .xlsx files
library(data.table)
library(lubridate) # change date data formatting
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:data.table':
##
## hour, isoweek, mday, minute, month, quarter, second, wday, week,
## yday, year
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(dplyr) # sort data
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
##
## between, first, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(stringr) # flexible operations via commands in quotes, called strings
# set reference file repository
setwd("/Users/wware/Library/CloudStorage/OneDrive-ucsc.edu/Records/Jobs/CalTrout/Fish Info/Coastal/Central Coast Watersheds/Data")
– Data Preparation –
Import the Excel spreadsheet titled “Caltrout queries 050823.xlsx” then format it for further manipulation by reviewing and creating variables. Review the “Time” variable that contains the date and time down to seconds when fish were detected. Separating the date from the time streamlines data sorting.
# format data -------------------------------------------------------------
# import Excel file
dat<-read_excel("Caltrout queries 050823.xlsx")
# view format of "Time" variabe with date and time data
str(dat$Time) # POSIXct format is time variable set to local time
## POSIXct[1:7276], format: "2021-12-17 09:56:32" "2021-12-25 04:17:14" "2021-12-25 04:17:18" ...
# create a new "Date" variable from the "Time" variable
dat$Date <- format(dat$Time, "%m/%d/%Y")
# check whether reformatting worked
str(dat$Date) # character variable
## chr [1:7276] "12/17/2021" "12/25/2021" "12/25/2021" "12/25/2021" ...
# change "Date" to date variable
dat$Date <- as.character.Date(dat$Date)
# check again whether reformatting worked
str(dat$Date) # seemingly unchanged
## chr [1:7276] "12/17/2021" "12/25/2021" "12/25/2021" "12/25/2021" ...
# view format of "Antenna" variable
str(dat$Antenna) # character
## num [1:7276] 1 2 2 2 1 1 1 1 1 2 ...
# change "Antenna" variable from character to factorial variable
dat$Antenna <- as.factor(dat$Antenna)
# check whether reformatting worked
str(dat$Antenna) # it did
## Factor w/ 2 levels "1","2": 1 2 2 2 1 1 1 1 1 2 ...
– Data Filtering –
Create a new data frame, or data set, for only PIT numbers that were detected more than once. This will leave the parent data frame unaltered so that we can further investigate and manipulate the subset of PIT numbers with multiple detections. Our first manipulation is removing physical test tags. These are PIT numbers for tags that were not placed in fish and are used to check antenna detection efficiency and antenna ordering, such as #982000365411489 and #982000365411491.
# review detections generally -------------------------------------------------
# sort duplicate rows
dat2 <- dat[duplicated(dat$Fish.PITNum), ]
# 7241 PIT #'s with > 1 detection from 7276 total detections in "dat" dataframe
# remove physical test tags
dat3 <- dat2 %>%
filter(!Fish.PITNum %in% c(982000365411489, 982000365411491))
# remove PIT #'s 982000365411489, 982000365411491,
– “Ghost” Tags –
Other PIT tag numbers to remove are those corresponding with “ghost” tags or those that continue being detected despite no longer being associated with fish. It can be tricky to determine ghost tags in data. Indicators are when PIT tags end in “00000” because specialized PIT tags have a zinc coating that dissolves in the stomach acid of predators, prompting the end of tag codes to change to zeroes. - Step 1: select rows with PIT #’s ending in “00000”
dat3 %>% filter(str_detect(Fish.PITNum, "00000$"))
# none were found in the resulting tibble that shows 0 rows selected
# from 14 columns in dat3
dat3 %>% filter(Fish.PITNum == "900226001052951") # flagged by SW Fisheries Science Center
# not in this data set based on code response
# response 1: same number of observations between "dat2" & "dat3"
# response 2: no selection when viewing result of above code
dat4 <- dat3 %>% count(Fish.PITNum) %>%
# count detections per PIT #
filter(n >= 3)
# for PIT #'s with more than 2 detections
There are 31 PIT numbers with more than two detections and PIT #982126057446372 has more than 5960 detections. It is unlikely that a single fish was detected this many times, even if it lingered around antennas for an extended period of time. Therefore, PIT #982126057446372 can be considered a ghost tag.
dat4 %>% filter(!Fish.PITNum %in% c(982126057446372))
# remove PIT # 982126057446372 since it has over 5960 detections
dat5 <- dat3 %>% filter(Fish.PITNum %in%
# select PIT #'s with over 10 detections
c(982126057448749, # 217 detections
982126057446366, # 413 detections
982126057446381, # 73 detections
982126057448705, # 71 detections
982126057448585, # 60 detections
982126057446441, # 56 detections
982126057448453, # 55 detections
982000365411333, # 54 detections
982126057448680, # 48 detections
982126057448545, # 25 detections
900226000899861, # 23 detections
982126057446348, # 21 detections
900226000295353, # 15 detections
900226000899787, # 12 detections
982000365411671, # 11 detections
982000365411511, # 10 detections
982000365411332, # 10 detections
982126057448604)) # 10 detections
Use detection records from the above thirty PIT tag numbers to infer the difference between ghost tags, fish that were detected multiple times over a short duration, and the direction that fish swam along the creek. - Step 1: Order rows by PIT number to facilitate investigation.
dat5 <- dat5 %>% arrange(Fish.PITNum)
# numerically order detection records across antennas & other variables by PIT tag number
unique(dat5$Species)
## [1] "onmy"
# all fish with over two detections are O. mykiss
These are likely steelhead since they were tagged & measured in Pescadero Marsh, based on row values for the “FL(mm)”, “SampleMethod”, and “Reach” columns of the “dat5” dataframe.
# View(dat5)
Inference I: live fish
#900226000295353: may have swam upstream on 01/01/2022 & downstream on 04/17/2022 after original capture on 09/11/2019 #900226000899787: may have swam upstream on 03/04/2022 then downstream on 03/28/2022 after original capture on 07/16/2019 #900226000899861: may have swam upstream on 03/15/2022 after capture on 07-16-2019 & 08/13/2019 #982000365411332: may have swam upstream on 03/05/2022 after capture on 09/10/2019 #982000365411333: may have swam past the same antenna on 03/24/2022 & 12/27/2022 after capture on 10/08/2019 #982000365411511: may have swam upstream on 01/11/2022 after capture on 10/08/2019 #982000365411671: may have swam upstream on 03/20/2022 after capture on 10/08/2019 #982126057446348: may have swam upstream on 06/21/2022 after capture on 04/18/2022 & 06/08/2022
- Inference II: ghost tags
#982126057446366:
may be a ghost tag due detections across multiple days that are spaced out and do not correspond with expected steelhead migration times in mid-late December and early March.
ghost <- dat5 %>% filter(Fish.PITNum == "982126057446366")
# select the potential ghost tag from "dat5"
unique(ghost$Date)
## [1] "06/18/2022" "06/20/2022" "06/23/2022" "06/25/2022" "06/26/2022"
## [6] "07/03/2022" "07/04/2022" "07/05/2022" "07/06/2022" "07/14/2022"
# review detection dates for potential ghost tag
PIT #982126057446366 was detected 413 times across midsummer days. The tag could correspond with a fish that lingered around antennas on different days.
– Detection Efficiency –
Estimate detection efficiency, or how many fish are detected by one antenna and not both. This will be done following protocol from California Sea Grant’s Russian River Coho Monitoring Program, shown on slide 17 of the linked Power Point presentation. slides - https://caseagrant.ucsd.edu/sites/default/files/2019SRF_PITFieldTour_Presentation_1.pdf
dat6 <- dat5 %>% count(., Fish.PITNum, Antenna)
# count the number of detections per PIT number for each antenna
d.eff <- dat6 %>% group_by(Antenna) %>% summarize(Proportion = n)
## `summarise()` has grouped output by 'Antenna'. You can override using the
## `.groups` argument.
# calculate the proportion of detections at antenna 1 relative to antenna 2
– Compiling Datasets –
dat6 <- setDT(dat5)[, .(FirstDetection = min(Time),
# only select first detections
LastDetection = max(Time)),
# and last detections
by = .(Antenna, Fish.PITNum, Date)]
# per antenna, PIT #, & date
Step 2: Combine PIT tag numbers with more than two detections with those detected once and twice.
Step 3: Combine detections from “Caltrout queries 050823.xlsx” Excel workbook with previously compiled data that was reviewed in a combination of R Studio, Excel, and Google Sheets.
– Information on Markdown Notebooks –
Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd+Option+I.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Cmd+Shift+K to preview the HTML file).
The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.