Pescadero PIT Detection Data Review

# set up session -----------------------------------------------------------
  # clear all objects from Global Environment
    rm(list = ls())
  # install packages
    library(readxl) # install package for importing .xlsx files
    library(data.table)
    library(lubridate) # change date data formatting

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:data.table':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

    library(dplyr) # sort data

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:data.table':
## 
##     between, first, last

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

    library(stringr) # flexible operations via commands in quotes, called strings
  # set reference file repository
    setwd("/Users/wware/Library/CloudStorage/OneDrive-ucsc.edu/Records/Jobs/CalTrout/Fish Info/Coastal/Central Coast Watersheds/Data")

– Data Preparation –

Import the Excel spreadsheet titled “Caltrout queries 050823.xlsx” then format it for further manipulation by reviewing and creating variables. Review the “Time” variable that contains the date and time down to seconds when fish were detected. Separating the date from the time streamlines data sorting.

# format data -------------------------------------------------------------
  # import Excel file
    dat<-read_excel("Caltrout queries 050823.xlsx")
  
  # view format of "Time" variabe with date and time data
    str(dat$Time) # POSIXct format is time variable set to local time

##  POSIXct[1:7276], format: "2021-12-17 09:56:32" "2021-12-25 04:17:14" "2021-12-25 04:17:18" ...

  # create a new "Date" variable from the "Time" variable
    dat$Date <- format(dat$Time, "%m/%d/%Y")
  # check whether reformatting worked
    str(dat$Date) # character variable

##  chr [1:7276] "12/17/2021" "12/25/2021" "12/25/2021" "12/25/2021" ...

  # change "Date" to date variable
    dat$Date <- as.character.Date(dat$Date)
  # check again whether reformatting worked
    str(dat$Date) # seemingly unchanged

##  chr [1:7276] "12/17/2021" "12/25/2021" "12/25/2021" "12/25/2021" ...

  # view format of "Antenna" variable
    str(dat$Antenna) # character

##  num [1:7276] 1 2 2 2 1 1 1 1 1 2 ...

  # change "Antenna" variable from character to factorial variable
    dat$Antenna <- as.factor(dat$Antenna)
  # check whether reformatting worked
    str(dat$Antenna) # it did

##  Factor w/ 2 levels "1","2": 1 2 2 2 1 1 1 1 1 2 ...

– Data Filtering –

Create a new data frame, or data set, for only PIT numbers that were detected more than once. This will leave the parent data frame unaltered so that we can further investigate and manipulate the subset of PIT numbers with multiple detections. Our first manipulation is removing physical test tags. These are PIT numbers for tags that were not placed in fish and are used to check antenna detection efficiency and antenna ordering, such as #982000365411489 and #982000365411491.

# review detections generally -------------------------------------------------
  # sort duplicate rows
    dat2 <- dat[duplicated(dat$Fish.PITNum), ] 
      # 7241 PIT #'s with > 1 detection from 7276 total detections in "dat" dataframe
  
  # remove physical test tags
    dat3 <- dat2 %>%
      filter(!Fish.PITNum %in% c(982000365411489, 982000365411491)) 
      # remove PIT #'s 982000365411489, 982000365411491,

– “Ghost” Tags –

Other PIT tag numbers to remove are those corresponding with “ghost” tags or those that continue being detected despite no longer being associated with fish. It can be tricky to determine ghost tags in data. Indicators are when PIT tags end in “00000” because specialized PIT tags have a zinc coating that dissolves in the stomach acid of predators, prompting the end of tag codes to change to zeroes. - Step 1: select rows with PIT #’s ending in “00000”

    dat3 %>% filter(str_detect(Fish.PITNum, "00000$"))

      # none were found in the resulting tibble that shows 0 rows selected 
        # from 14 columns in dat3

Step 2: Remove other select rows with PIT #’s that project partners are suspicious of

    dat3 %>% filter(Fish.PITNum == "900226001052951") # flagged by SW Fisheries Science Center

        # not in this data set based on code response
          # response 1: same number of observations between "dat2" & "dat3"
          # response 2: no selection when viewing result of above code

Step 3: Select rows with PIT numbers that were detected more than twice because ghost tags often have multiple consecutive detections as they are deposited near antennas before washing downstream.

   dat4 <- dat3 %>% count(Fish.PITNum) %>% 
        # count detections per PIT #
      filter(n >= 3)
        # for PIT #'s with more than 2 detections

There are 31 PIT numbers with more than two detections and PIT #982126057446372 has more than 5960 detections. It is unlikely that a single fish was detected this many times, even if it lingered around antennas for an extended period of time. Therefore, PIT #982126057446372 can be considered a ghost tag.

  dat4 %>% filter(!Fish.PITNum %in% c(982126057446372))

        # remove PIT # 982126057446372 since it has over 5960 detections

Step 4: Investigate the remaining 30 selected PIT #’s with over two detections across antennas.

    dat5 <- dat3 %>% filter(Fish.PITNum %in%
      # select PIT #'s with over 10 detections
          c(982126057448749, # 217 detections
            982126057446366, # 413 detections
            982126057446381, # 73 detections
            982126057448705, # 71 detections
            982126057448585, # 60 detections
            982126057446441, # 56 detections
            982126057448453, # 55 detections
            982000365411333, # 54 detections
            982126057448680, # 48 detections
            982126057448545, # 25 detections
            900226000899861, # 23 detections
            982126057446348, # 21 detections
            900226000295353, # 15 detections
            900226000899787, # 12 detections
            982000365411671, # 11 detections
            982000365411511, # 10 detections
            982000365411332, # 10 detections
            982126057448604)) # 10 detections

Use detection records from the above thirty PIT tag numbers to infer the difference between ghost tags, fish that were detected multiple times over a short duration, and the direction that fish swam along the creek. - Step 1: Order rows by PIT number to facilitate investigation.

dat5 <- dat5 %>% arrange(Fish.PITNum)
  # numerically order detection records across antennas & other variables by PIT tag number

Step 2: Review recorded species of detections to support subsequent inferences.

  unique(dat5$Species)

## [1] "onmy"

    # all fish with over two detections are O. mykiss

These are likely steelhead since they were tagged & measured in Pescadero Marsh, based on row values for the “FL(mm)”, “SampleMethod”, and “Reach” columns of the “dat5” dataframe.

Step 3: Make inferences on whether PIT tag numbers are for ghost tags or live fish.

  # View(dat5)

Inference I: live fish

#900226000295353: may have swam upstream on 01/01/2022 & downstream on 04/17/2022 after original capture on 09/11/2019 #900226000899787: may have swam upstream on 03/04/2022 then downstream on 03/28/2022 after original capture on 07/16/2019 #900226000899861: may have swam upstream on 03/15/2022 after capture on 07-16-2019 & 08/13/2019 #982000365411332: may have swam upstream on 03/05/2022 after capture on 09/10/2019 #982000365411333: may have swam past the same antenna on 03/24/2022 & 12/27/2022 after capture on 10/08/2019 #982000365411511: may have swam upstream on 01/11/2022 after capture on 10/08/2019 #982000365411671: may have swam upstream on 03/20/2022 after capture on 10/08/2019 #982126057446348: may have swam upstream on 06/21/2022 after capture on 04/18/2022 & 06/08/2022

- Inference II: ghost tags

  #982126057446366:
    may be a ghost tag due detections across multiple days that are spaced out and do not          correspond with expected steelhead migration times in mid-late December and early              March.

  ghost <- dat5 %>% filter(Fish.PITNum == "982126057446366")
    # select the potential ghost tag from "dat5"
  unique(ghost$Date)

##  [1] "06/18/2022" "06/20/2022" "06/23/2022" "06/25/2022" "06/26/2022"
##  [6] "07/03/2022" "07/04/2022" "07/05/2022" "07/06/2022" "07/14/2022"

    # review detection dates for potential ghost tag

PIT #982126057446366 was detected 413 times across midsummer days. The tag could correspond with a fish that lingered around antennas on different days.

– Detection Efficiency –

Estimate detection efficiency, or how many fish are detected by one antenna and not both. This will be done following protocol from California Sea Grant’s Russian River Coho Monitoring Program, shown on slide 17 of the linked Power Point presentation. slides - https://caseagrant.ucsd.edu/sites/default/files/2019SRF_PITFieldTour_Presentation_1.pdf

  dat6 <- dat5 %>% count(., Fish.PITNum, Antenna)
    # count the number of detections per PIT number for each antenna
  d.eff <- dat6 %>% group_by(Antenna) %>% summarize(Proportion = n)

## `summarise()` has grouped output by 'Antenna'. You can override using the
## `.groups` argument.

    # calculate the proportion of detections at antenna 1 relative to antenna 2

– Compiling Datasets –

Step 1: Select only the first and last detection time per PIT tag number, antenna, and date.

  dat6 <- setDT(dat5)[, .(FirstDetection = min(Time),
                          # only select first detections
                         LastDetection = max(Time)), 
                          # and last detections
                         by = .(Antenna, Fish.PITNum, Date)]
                          # per antenna, PIT #, & date

Step 2: Combine PIT tag numbers with more than two detections with those detected once and twice.
Step 3: Combine detections from “Caltrout queries 050823.xlsx” Excel workbook with previously compiled data that was reviewed in a combination of R Studio, Excel, and Google Sheets.

– Information on Markdown Notebooks –

Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd+Option+I.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Cmd+Shift+K to preview the HTML file).

The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.

Pescadero PIT Detection Data Review - CalTrout