Estimating harbor seal abundance from Virginia camera trap surveys, version 2

Author

Len Thomas, Michelle Guins and Deanna Rees

Published

March 30, 2025

Acknowledgements

Thank-you to Kristen Ampela for providing the seal tag data, and to Monica DeAngelis for helping us understand it. Thank-you also to Joel Bell for his support during this project. This work was funded by the US Navy Navy Fleet Forces Command and managed by Naval Facilities Engineering Systems Command (NAVFAC) Atlantic via a contract with HDR Inc.

Note

This document is for internal use only. The analyses documented here will form the basis for a report to be submitted, via HDR, to the US Navy NAVFAC Atlantic. Currently details of analyses are included that will likely not appear (or perhaps be Appendices) in the final report. Accompanying text will be somewhat different, and figures and tables may be altered/omitted.

1 Introduction

The goal of this work is to estimate population size of harbor seals in Virginia waters using remotely operated time-lapse cameras placed at haulout locations in southeastern Virginia. Details of the camera locations and methods used to count seals from the photographs are given in Guins et al. (2025) and citations therein. Seals in general spend some time ``hauled out’’ (resting) on land; they spend the remainder of their time at sea. Here we use the camera-based counts of seals that are hauled out, and we undertake a separate analysis to determine what proportion of seals are hauled out (and hence what correction factor is needed to estimate total population size). This separate analysis comes from seals in Virginia fitted with satellite telemetry tags that record temporal information about whether the tag is wet or dry. Details of the tagging, tags and data collection are given by Ampela et al. (2023). We note that our overall approach is similar to that of Sharples et al. (2009), although they used vessel-based counts rather than camera data, and the exact analysis approach were somewhat different. (More details of the differences could be given in the HDR report.)

This document is in four parts. First we provide an analysis of the camera data. We then provide an analysis of the tag data. Third we combine the two to produce estimates of population size. We close with a discussion of the results and future potential work.

This document is written in Quarto (a multi-purpose tool for research documentation and distribution) - it is a “live” document containing both text and analysis code written in R. The HDR report will be written as a standard Word document.

2 Haulout counts from camera trap data

2.1 Reading in and aggregating data

We read in data from a series of Excel spreadsheets, extracted from the Seal Camera Trap Analysis Microsoft Access database. We use data from both the the Eastern Shore (also known as ``Little Inlet’’) and Chesapeake Bay Bridge-Tunnel (CBBT) survey sites. Cameras are in place from Oct/Nov through Apr/May each year. Data from 2019 - 2024 were analyzed, i.e., 5 seasons.

Note

Check with Michelle that I’m selecting the correct columns as the layout of each Excel file is different! In particular, I should check if I should include HO_UnID or not? - I do not at present in the cases where I manually calculate SealHO (as opposed to using TotalHO).

Code
#Base year for Julian dates
origin <- as.Date("2020-01-01")

dat_folder <- file.path("./MichelleGuinsDataAndAnalyses2")

#Read in Little Inlet data
file_names <- 
  c("Little Inlet Season 1 Timelapse Camera Data FINAL wtih environmental.xlsx",
    "Little Inlet Season 2 Timelapse Camera Data FINAL wtih environmental.xlsx",
    "Little Inlet Season 3 Timelapse Camera Data FINAL wtih environmental.xlsx",
    "Little Inlet Season 4 Timelapse Camera Data FINAL wtih environmental.xlsx",
    "TimelapseData - Little Inlet S5 FINAL with Env Data.csv.xlsx")

for(i in 1:length(file_names)){
  tmp <- read_excel(path = paste0(dat_folder, "/", file_names[i]),
                    sheet = 1, guess_max = 100000)
  if(i == 1) {
    select_cols <- c(4, 6, 7, 8, 21, 24, 26, 28, 32)
    tmp2 <- tmp[, select_cols]
    colnames(tmp2) <- c("CameraID", "DateTime", "Vessel", "SealsIW", "GreySeal", 
                     "Tide_ft", "Wind_kn", "AirTemp_f", "DeleteFlag")
    tmp2$SealsHO <- rowSums(tmp[, 10:19])
    tmp2$ID <- 1:nrow(tmp2)
  }
  if(i == 2) {
    select_cols <- c(2, 6, 8, 9, 10, 24, 27, 30, 31, 35)
    tmp2 <- tmp[, select_cols]
    colnames(tmp2) <- c("ID", "CameraID", "DateTime", "Vessel", "SealsIW", "GreySeal", 
                     "Tide_ft", "Wind_kn", "AirTemp_f", "DeleteFlag")
    tmp2$SealsHO <- rowSums(tmp[, 12:23])
  }
  if(i == 3){
    select_cols <- c(4, 6, 7, 8, 21, 24, 26, 28, 30, 34)
    tmp2 <- tmp[, select_cols]
    colnames(tmp2) <- c("CameraID", "DateTime", "Vessel", "SealsIW", "GreySeal", 
                     "SealsHO", "Tide_ft", "Wind_kn", "AirTemp_f", "DeleteFlag")
    #This code checks that the SealsHO in this sheet doesn't include
    # GreySeal, TaggedSeal, UnObject, etc
    # It seems to check out, so I commented out the check
    #tmp2$SealsHO2 <- rowSums(tmp[, 10:19])
    #sum(tmp2$SealsHO != tmp2$SealsHO2)
    tmp2$ID <- 1:nrow(tmp2)
  }
  if(i == 4){
    select_cols <- c(1, 5, 7, 8, 9, 21, 33, 23, 25, 27, 31)
    tmp2 <- tmp[, select_cols]
    colnames(tmp2) <- c("ID", "CameraID", "DateTime", "Vessel", "SealsIW", "GreySeal", 
                     "SealsHO", "Tide_ft", "Wind_kn", "AirTemp_f", "DeleteFlag")
  }
  if(i == 5){
    select_cols <- c(4, 6, 7, 8, 22, 18, 19, 20)
    tmp2 <- tmp[, select_cols]
    colnames(tmp2) <- c("CameraID", "DateTime", "Vessel", "SealsIW", "GreySeal", 
                     "Tide_ft", "Wind_kn", "AirTemp_f")
    #No delete flag provided, so assume all are okay
    tmp2$DeleteFlag = FALSE
    tmp2$SealsHO <- rowSums(tmp[, 9:17])
    tmp2$ID <- 1:nrow(tmp2)
  }
  #Filter out values with DeleteFlag true
  tmp2 <- tmp2[!tmp2$DeleteFlag, ]
  #Note I assume the GreySeal column is GreySealsHO and that there is no count of
  # grey seals in the water (there is for CBBT)
  #Arrange the columns
  tmp2 <- tmp2 %>% select(ID, CameraID, DateTime, SealsHO, SealsIW,
                          GreySealsHO = GreySeal, Vessel, 
                          Tide_ft, Wind_kn, AirTemp_f)
  #Filter out values with no total (I think it just occurs for i==1)
  tmp2 <- tmp2[!is.na(tmp2[, "SealsHO"]), ]
  #Filter out values with no CameraID (not sure why this happens but it did in the)
  # previous iteration
  tmp2 <- tmp2[!is.na(tmp2[, "CameraID"]), ]

  if(i ==1) {
    dat <- tmp2
  } else {
    dat <- rbind(dat, tmp2)
  }
}
dat$Site <- "LI"

#Read in CBBT ddata
file_names <- 
  c("CBBT Season 1 Timelapse Camera Data FINAL wtih environmental.xlsx",
    "CBBT Season 2 Timelapse Camera Data FINAL wtih environmental.xlsx",
    "CBBT Season 3 Timelapse Camera Data FINAL wtih environmental.xlsx",
    "CBBT Season 4 Timelapse Camera Data FINAL wtih environmental.xlsx",
    "TimelapseData CBBTS5 Final w Env Data.xlsx")
for(i in 1:length(file_names)){
  tmp <- read_excel(path = paste0(dat_folder, "/", file_names[i]),
                    sheet = 1, guess_max = 100000)
  if(i == 1) {
    select_cols <- c(1, 5, 8, 11, 9, 10, 17, 20, 23, 24, 15)
    tmp2 <- tmp[, select_cols]
    colnames(tmp2) <- c("ID", "CameraID", "DateTime", "Vessel", "SealsHO", 
                        "SealsIW", "GreySeal", 
                        "Tide_ft", "Wind_kn", "AirTemp_f", "DeleteFlag")
  }
  if(i == 2) {
    select_cols <- c(5, 4, 8, 6, 7, 20, 9, 12, 13, 19)
    tmp2 <- tmp[, select_cols]
    colnames(tmp2) <- c("CameraID", "DateTime", "Vessel",  "SealsHO", 
                        "SealsIW", "GreySeal", 
                        "Tide_ft", "Wind_kn", "AirTemp_f", "DeleteFlag")
    tmp2$ID <- 1:nrow(tmp2)
  }
  if(i == 3){
    select_cols <- c(4, 6, 7, 23, 8, 21, 26, 28, 30, 34)
    tmp2 <- tmp[, select_cols]
    colnames(tmp2) <- c("CameraID", "DateTime", "Vessel",  "SealsHO", 
                        "SealsIW", "GreySeal", 
                        "Tide_ft", "Wind_kn", "AirTemp_f", "DeleteFlag")
    tmp2$ID <- 1:nrow(tmp2)
  }
  if(i == 4){
    select_cols <- c(4, 5, 6, 7, 8, 9, 12, 14, 16, 20)
    tmp2 <- tmp[, select_cols]
    colnames(tmp2) <- c("CameraID", "DateTime", "Vessel",  "SealsHO", 
                        "SealsIW", "GreySeal", 
                        "Tide_ft", "Wind_kn", "AirTemp_f", "DeleteFlag")
    tmp2$ID <- 1:nrow(tmp2)
  }
  if(i == 5){
    select_cols <- c(4, 5, 10, 6, 7, 8, 11, 12, 13)
    tmp2 <- tmp[, select_cols]
    colnames(tmp2) <- c("CameraID", "DateTime", "Vessel",  "SealsHO", 
                        "SealsIW", "GreySeal", 
                        "Tide_ft", "Wind_kn", "AirTemp_f", "DeleteFlag")
    #No delete flag provided, so assume all are okay
    tmp2$DeleteFlag = FALSE
    tmp2$ID <- 1:nrow(tmp2)
  }
  #Filter out values with DeleteFlag true
  tmp2 <- tmp2[!tmp2$DeleteFlag, ]
  #Note I assume the GreySeal column is GreySealsHO (in some cases it's not clear)
  #Arrange the columns
  tmp2 <- tmp2 %>% select(ID, CameraID, DateTime, SealsHO, SealsIW,
                          GreySealsHO = GreySeal, Vessel, 
                          Tide_ft, Wind_kn, AirTemp_f)
  #Filter out values with no total (I think it just occurs for i==1)
  tmp2 <- tmp2[!is.na(tmp2[, "SealsHO"]), ]
  #Filter out values with no CameraID (not sure why this happens but it did in the)
  # previous iteration
  tmp2 <- tmp2[!is.na(tmp2[, "CameraID"]), ]

  if(i ==1) {
    dat2 <- tmp2
  } else {
    dat2 <- rbind(dat2, tmp2)
  }
}
dat2$Site <- "CBBT"

#Combine LI and CBBT
dat3 <- rbind(dat, dat2)

dat3$Julian <- round(as.numeric(julian(dat3$DateTime, 
                                      origin = origin)))
#Create columns for Julian day of year, and also for hour, year and dateHour
dat3$Hour <- hour(dat3$DateTime)
dat3$Year <- year(dat3$DateTime)
#Make a unique date-hour variable
dat3$DateHour <- paste(dat3$Year, month(dat3$DateTime), day(dat3$DateTime),
                      dat3$Hour, sep = "-")

Data are available every 15 minutes in most cases, but here we group by taking the mean per hour of number of seals, tide, wind and air temp.

Code
#Take the mean over each hour for each camera
dat4 <- dat3 %>% 
  group_by(DateHour, CameraID) %>%
  summarize(.groups = "keep", SealsHO = mean(SealsHO), n = n())
Code
#Worth flagging cases with n>4 - means more than 4 counts per hour - maybe some
# issue here - sems to happen in about 5% of cases
ind <- dat4$n > 4

Taking the mean for each hour, there are 109347 records (i.e., camera-hours with counts).

Note

Note that some have more than 4 records, which shouldn’t happen if there are counts every 15 minutes. There are 18213 such records and this needs investigating. I currently take the mean per hour per camera and then sum over cameras. So, as long as each image has been processed and counted, this should be fine. But it would be good to understand why there are so many cases where there are more than 4 records per hour – an indeed in some cases many more than 4. As an example, here are the top 10 cases with the highest number of repeat counts per hour - the DateHour column is year, month, day, hour on 24h clock

Code
dat4[order(dat4$n, decreasing = TRUE)[1:10], ]
# A tibble: 10 × 4
# Groups:   DateHour, CameraID [10]
   DateHour     CameraID SealsHO     n
   <chr>        <chr>      <dbl> <int>
 1 2022-4-20-17 CL4            0   106
 2 2022-4-20-17 CL8            0   106
 3 2022-4-20-17 CL2            0   104
 4 2022-4-20-17 CL3            0   104
 5 2022-4-20-17 CL6            0   104
 6 2022-4-20-17 CL7            0   102
 7 2022-4-20-18 CL2            0    88
 8 2022-4-20-18 CL6            0    88
 9 2022-4-20-19 CL2            0    88
10 2022-4-20-19 CL4            0    88

Here we sum over cameras, on the assumption that cameras do not catch the same seals – there was an active effort to avoid double-counting during the data processing. We then take only hours where 8 or more cameras were operating.

Note

I used 8 as an arbitrary number at present. We need to discuss a protocol for determining what number of operating cameras we need (or which cameras) per year to be considered a complete count.

Code
#Sum over cameras
dat5 <- dat4 %>%
  group_by(DateHour) %>%
  summarize(SealsHO = sum(SealsHO), n = n())

#Take only hours where all cameras were operating
#This assumes the maximum number of cameras operating is the expected number --
# i.e., that cameras weren't added in different years (could group by year to check)
max_n_cameras <- 8 #max(dat5$n)
dat6 <- dat5[dat5$n >= max_n_cameras, ]
dat6$DateTime <- parse_date_time(dat6$DateHour, order = "ymdH")
dat6$Hour <- hour(dat6$DateTime)
dat6$Month <- month(dat6$DateTime)
dat6$Year <- year(dat6$DateTime)
dat6$Season <- dat6$Year - 2019 + ifelse(dat6$Month > 7, 1, 0)

This results in 9704 hours of data in total.

2.2 Temporal patterns in haulout counts

As initial exploration, we fit a GAM with hour of day as a smooth (assuming for now that the mean haulout count is normally distributed).

Code
mod <- bam(SealsHO ~ s(Hour), data = dat6)
plot(mod, shift = mean(dat6$SealsHO), ylab = "Total hauled-out seals")

This suggests taking hours 0800-1200 as those have the maximum count. It could be that visibility is an issue at the beginning and end of the day. Also, note that there are fewer records (i.e., hours where there are counts from all cameras) from the beginning and end of the day. This is expected as cameras only operated during daylight hours. Here is a tabulation of the number of counts per hour:

Code
knitr::kable(table(dat6$Hour), col.names = c("Hour of day", "n counts"))
Hour of day n counts
5 1
6 267
7 780
8 821
9 826
10 833
11 834
12 833
13 832
14 832
15 830
16 832
17 662
18 312
19 165
20 44

One question was on whether the diurnal pattern changes between study seasons. Below we try some models where the smooth varies by season:

Code
mod_additive_season <- bam(SealsHO ~ 
                             s(Hour) + as.factor(Season), data = dat6)
mod_interaction_season <- bam(SealsHO ~ 
                            s(Hour, by = as.factor(Season)), data = dat6)
AIC(mod, mod_additive_season, mod_interaction_season)
                              df      AIC
mod                     6.729236 76502.84
mod_additive_season    11.110839 76329.54
mod_interaction_season 16.242570 76491.80

It looks from this that a model with an additive effect of season is the best (lowest AIC) of those tried. However, as noted by Guins et al. (2025), it will probably be more insightful to derive a variable related to sunrise/sunset and use this instead of hour in these analyses.

Note

Obtain and try a diurnally-related variable. However, some discussion is required of what metric to use since we want to account for minutes relative to sunrise and sunset. Examples could be a sun angle variable, or a more linear one that is 0 at sunrise and sunset, 1 at the half-way point between sunrise and sunset and -1 at the midpoint between sunset and sunrise, with linear functions in between. It does not matter much what is chosen if smooths are used to link the covariate to the response.

We also fit a GAM to day of the survey season, where 0 is Julian day 200 (again, assuming a normal distribution on count, so predictions can go negative).

(Note that the plot here is the estimated smooth coefficient, not a prediction, so doesn’t take account of any intercept fit in the model – if we use this in the report then we would be better to show predictions and constrain the model so it cannot produce negative predictions.)

Code
#Get day of year 
dat6$DayOfYear <- yday(dat6$DateTime)
#Transform so 1st day of surveying is day 0 - 
# may need to come back and check how it deals with leap years
offset <- min(dat6$DayOfYear[dat6$DayOfYear > 200])
dat6$DayOfSeason <- ifelse(dat6$DayOfYear > 200, dat6$DayOfYear - offset, 
                          dat6$DayOfYear + 366 - offset)

mod <- bam(SealsHO ~ s(DayOfSeason), data = dat6)
plot(mod, shift =  mean(dat6$SealsHO))

2.3 Initial estimates of abundance

To obtain initial estimates of abundance (e.g., for comparison later), we used data from between 0800 and 1200 inclusive (i.e., 5 hours), and haulout probablities for 0800, 1000 and 1200 in February and March 2022 for 5 seals read from Figure 12 of Ampella et al. (2023). Combing the camera-derived counts from the previous section with these haulout probabilities gave an abundance estimate per hour; we then took the mean per month.

A plot per month per season is given below, with vertical lines showing log-normal confidence intervals.

Code
#Take just between 8 and 12 inclusive (5 hours)
#Take mean per month
dat7 <- dat6[(dat6$Hour >= 8) & (dat6$Hour <= 12),]
dat7 <- dat7 %>% 
  group_by (Year, Month, DayOfYear) %>%
  summarize(.groups = "keep", SealsHO = mean(SealsHO))
dat7 <- dat7 %>% 
  group_by (Year, Month) %>%
  summarize(.groups = "keep", sd = sd(SealsHO), SealsHO = mean(SealsHO), n = n())
dat7$se <- dat7$sd / sqrt(dat7$n)
dat7$cv <- dat7$se / dat7$SealsHO
dat7$cv[is.na(dat7$cv)] <- 0


#Analysis of tag data
#haulout probabilities for 8, 10, 12am in Feb and March 2022 for 5 seals,
# taken from Fig 12 fo Ampella et al. 2023
#Their data are more precise in time than ours, which is really just to the
# nearest hour (although we could potentially do every 15 minutes)
haul_out_p <- mean(c(0.67, 0.50, 0.30, 0.50, 0.32, 0.35))
#Ampella reported seals most likely to haul out between 04:00 and 12:00

dat8 <- data.frame(Year = dat7$Year, 
                   Month = dat7$Month,
                   Nhat = dat7$SealsHO / haul_out_p,
                   CV = dat7$cv)
#Lognormal confidence intervals, from Buckland et al. 2021
c.val <- exp(qnorm(0.975) * log(1 + dat8$CV^2)) 
dat8$LCL <- dat8$Nhat / c.val
dat8$UCL <- dat8$Nhat * c.val
dat8$Year_ind <- dat8$Year - 2019 + ifelse(dat8$Month > 7, 1, 0)

#Create information to make a plot
month_labels <- c("Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May")
month_ind <- c(4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3)
year_color <- c("blue", "black", "red", "green", "brown")
x_pos <- month_ind[dat8$Month] + (dat8$Year_ind - 2) / 10

plot(x_pos, dat8$Nhat, xaxt = "n" ,
     col = year_color[dat8$Year_ind], 
     ylim = range(c(dat8$UCL, dat8$LCL)),
     ylab = "Estimated abundance", xlab = "Month",
     pch = 1) #20)
axis(1, at = 1:8, labels = month_labels)
legend("topleft", legend = c("2019-20", "2020-21", "2021-22", "2022-23", "2023-24"),
       lty = 1, pch = 1, col = year_color)
for (i in 1:nrow(dat8)){
  lines(c(x_pos[i], x_pos[i]), c(dat8$LCL[i], dat8$UCL[i]), 
        col = year_color[dat8$Year_ind[i]], lwd = 2)
  if(i > 1) {
    if(dat8$Year_ind[i] == dat8$Year_ind[i - 1]){
      lines(c(x_pos[i - 1], x_pos[i]), 
            c(dat8$Nhat[i - 1], dat8$Nhat[i]), 
            lty = 2,
            col = year_color[dat8$Year_ind[i]])
    }
  }
}

Note that this analysis treats each hour within each day as an independent replicate; an alternative would be to take daily means and then take the mean of these over each month. We also assume the haulout probability only depends on time of day, and ignore any effect on the variance estimates of covariance induced by using the same haulout estimates in all days; we also ignore uncertainty in the haulout estimates. However, the above is a useful initial estimate, with the caveat that the uncertainty estimates should be treated with caution.

If this preliminary analysis was included in our HDR report it would need a more full discussion of the uncertainty components that are not included.

3 Haulout probability from tag data

3.1 Reading in the tag data

Here we read in the tag data. Animals were all tagged in the Little Inlet area. This is the summary file, with one line for each tag:

Code
tag_dir <- "./Seal Tag Data 2018-2022"

tag_summary <- read_excel(path = paste(tag_dir, "TagDataSummary.xlsx", sep = "/"))
tag_summary$DeploymentStartDateTime <- 
  parse_date_time(tag_summary$DeploymentStartDateTime, orders = "YmdHMS")
tag_summary$DeploymentStopDateTime <- 
  parse_date_time(tag_summary$DeploymentStopDateTime, orders = "YmdHMS")

n_files <- nrow(tag_summary)
#Deployment lat long looked to have issues, as well as deployment location,so agreed not to show
knitr::kable(tag_summary[, 1:4])
TagID TagModel DeploymentStartDateTime DeploymentStopDateTime
166449 SPLASH10- 309A 2018-02-04 2018-06-29
166450 SPOT-311A 2018-02-04 2018-05-23
166451 SPOT-311A 2018-02-04 2018-05-06
166452 SPOT-311A 2018-02-04 2018-05-26
166453 SPOT-311A 2018-02-06 2018-04-09
173502 SPOT-311A 2018-02-06 2018-06-22
173503 SPOT-311A 2018-02-08 2018-04-26
177411 SPLASH10-F 297A 2020-02-26 2020-07-12
177410 SPLASH10-F 297A 2020-03-02 2020-06-10
178255 SPLASH10-F 297A 2022-02-07 2022-06-09
178256 SPLASH10-F 297A 2022-02-08 2022-06-17
178257 SPLASH10-F 297A 2022-02-09 2022-07-18
177412 SPLASH10-F 297A 2022-02-15 2022-05-25
178258 SPLASH10-F 297A 2022-02-15 2022-06-04

Below we read in data for each tag in turn. We read in the tag location file and, for each tag plot the lat long locations, joining them with black dots. Some locations have low accuracy, so we use a simple filter which is just to use locations with quality codes 3, 2 and 1 (i.e., discarding 0, A, B and Z) – lines joining these are in red. (We also discard the first day of data as a way to partially avoid “tag-on” issues.) We show the Northern and Southern border of Virigina as black horizontal dashed lines. None of the seals spend any significant time south of Virginia, but almost all migrate North at some point in the season. Since we are interested in haulout probability just while in Virginia waters, we truncate the data at the point the seals move outside Virginia – this is shown with a large red circle on the plots (may not be shown on some plots as we truncate the plot at 38.5 degrees north, so if they moved beyond that for their first record North of Virginia waters, it will not be shown).

Code
states <- map_data("state")
#North and South latitude boundaries of Virginia
virginia_lat <- c(38.027038704500775, 36.550603468154065)
#Read in the histo files
colnames <- c("DeployID", "Date", paste0("Bin", 1:24))
for(i in 1:n_files){
  histo_filename <- paste(tag_dir, tag_summary$TagID[i], 
                      paste(tag_summary$TagID[i], "Histos.csv", sep = "-"),
                      sep = "/")
  histo_file <- read.csv(histo_filename)
  ind <- histo_file$HistType %in% c("1Percent", "Percent")
  #Save just percent rows, and just the deployID, date and hour columns
  histo_file <- histo_file[ind, colnames]
  #Make sure they are all numeric
  for(j in 1:24){
    histo_file[, colnames[j + 2]] <- as.numeric(histo_file[, colnames[j + 2]])
  }
  histo_file$Date <- parse_date_time(histo_file$Date, order = "HMS dbY")
  #Remove records from before and including the day of deployment and
  # after and including the day the tag stopped
  ind <- (histo_file$Date >= tag_summary$DeploymentStartDateTime[i] + days(1)) & 
    (histo_file$Date <= tag_summary$DeploymentStopDateTime[i] - days(1))
  histo_file <- histo_file[ind, ]
  
  #Remove records from the tag after leaving Virginia waters
  #Read in the locations
  location_filename <- paste(tag_dir, tag_summary$TagID[i], 
                             paste(tag_summary$TagID[i], "Locations.csv", sep = "-"),
                             sep = "/")
  location_file <- read.csv(location_filename)
  #Keep only dates after deployment and before the tag stopped
  location_file$Date <- parse_date_time(location_file$Date, order = "HMS dbY")
  ind <- (location_file$Date >= tag_summary$DeploymentStartDateTime[i] + days(1)) & 
    (location_file$Date <= tag_summary$DeploymentStopDateTime[i] - days(1))
  location_file <- location_file[ind, ]
  
#  #Plot these data
#  plot(location_file$Longitude, location_file$Latitude, 
#       main = paste("Tag", tag_summary$TagID[i]),
#       ylim = range(c(virginia_lat, location_file$Latitude)), ylab = "Latitude",
#       xlab = "Longitude", type = "l")
#  #Add lines showing Virginia N and S
#  abline(h = virginia_lat[1], lty = 2, col = "blue")
#  abline(h = virginia_lat[2], lty = 2, col = "blue")

  #Keep only good quality locations - those with an error radius of < 10km
  ind <- location_file$Quality %in% c("3", "2", "1") 
  location_file2 <- location_file[ind, ] %>%
    arrange(Date)
#  #Add to plot
#  lines(location_file2$Longitude, location_file2$Latitude, col = "red", lty = 2)
  
  #Find the first that is north of virginia_latitude (if any)
  ind <- location_file2$Latitude > virginia_lat[1]
  if(any(ind)){
    first <- which.max(ind)
 #   #Add a point record to show where this is
#    points(location_file2$Longitude[first], location_file2$Latitude[first])
    #remove histo records from this day or later
    ind <- histo_file$Date < location_file2$Date[first]
    histo_file <- histo_file[ind, ]
  }

  #Plot the data
  p <- ggplot(data = states) +
    geom_polygon(mapping = aes(x = long, y = lat, group = group), fill = NA, color = "grey") +
    geom_hline(yintercept = virginia_lat[1], linetype = 2) +
    geom_hline(yintercept = virginia_lat[2], linetype = 2) +
    geom_point(data = location_file, mapping = aes(x = Longitude, y = Latitude)) + 
    coord_map("albers", lat0 = 45.5, lat1 = 29.5, 
              xlim = c(-77, -74), #c(range(location_file$Longitude), 
              ylim = c(36.5, 38.5)) + #range(c(location_file$Latitude, virginia_lat))) +
    geom_line(data = location_file2, mapping = aes(x = Longitude, y = Latitude, colour = "red"), show.legend = FALSE) +
    geom_point(data = location_file2[first, ], mapping = aes(x = Longitude, y = Latitude, colour = "red", size = 2), show.legend = FALSE) +
    ggtitle(paste("Tag", tag_summary$TagID[i]))
  print(p)

  #Pivot into tag, date, hour format
  tmp <- histo_file %>% 
    pivot_longer(cols = starts_with("Bin"), names_to = "hour", values_to = "perc")
  
  #Add to overall frame
  if(i == 1){
    histo <- tmp
  } else {
    histo <- rbind(histo, tmp)
  }
}

Code
#Do some post-processing
#There is one NA in perc so remove that record
histo <- histo[!is.na(histo$perc), ]
#Add columns that will be useful
histo$hour <- as.numeric(substr(histo$hour, 4, 6)) - 1
histo$month <- month(histo$Date)
histo$DeployID <- as.factor(histo$DeployID)
histo$yday <- yday(histo$Date)

We also read in the histogram files for each seal, which summarizes the proportion of each hour that the tag was dry, using the wet-dry sensor. The table below has one row for each tag. It shows the mean percentage dry, the start day of season of the record, the end day (while still in Virginia). Also shown is the number of days between start and end day, and the number of days that have records.

Code
s_seal <- histo %>%
  group_by(DeployID) %>%
  summarise(perc = mean(perc), start_day = min(yday), end_day = max(yday),
            n_days = max(yday) - min(yday) + 1,
            n_days_with_records = length(unique(yday)))
knitr::kable(s_seal)
DeployID perc start_day end_day n_days n_days_with_records
17411 20.01852 58 89 32 18
166449 21.53571 38 92 55 35
166450 27.02534 37 104 68 28
166451 28.42157 39 91 53 34
166452 18.33929 45 94 50 21
166453 30.72756 39 87 49 26
173502 15.01641 39 106 68 33
173503 20.31373 40 115 76 17
177410 24.65972 63 81 19 12
177412 29.78968 47 81 35 21
178255 23.28686 40 84 45 26
178256 31.74038 40 60 21 13
178257 25.34722 41 65 25 15
178258 20.28526 47 100 54 39

As can be seen, there are many missing days of data. This would be problematic if the satellite tags are less likely to transmit on days when the seal spends more time in the water – this will cause a strong overestimate in the haulout estimate. Correspondence with Moncia DeAngelis indicated that there was an issue with tag design/programming that caused the gaps, but that this seems unrelated to haulout probability.

With this caveat, we proceed with the analysis. Below is a histogram of the haulout percentages across all seals. Some heaping is observed at units of 10, which indicates that some tags were programmed to record only low-resolution data (which rounds to the nearest 10 plus 3 and 98). This perhaps could be investigated some more. Overall, hours are mostly fully dry or below about 25 percent dry.

Code
hist(histo$perc, breaks = seq(-0.5, 100.5, by = 1), main = "Percentage haulout per hour",
     xlab = "Percentage")

Ampela et al. (2023) assumed a haulout event occured in an hour only if the tag was 100 percent dry for that hour, but here we take 50 percent as a cutoff. As can be seen from the above histogram, results will not be very sensitive to this and indeed any number between around 30 and 100 could be used. We chose 50 percent as this presumably balances the risk of under- and over-estimation of a haulout event during an hour (recalling that the count data are taken every 15 minutes).

Below we fit a binary GAM, modelling whether a seal is hauled out or not as a function of hour of the day (as a cyclic smooth), day of the year and also a random effect for animal. Note that hour of day here is in UTC, so we need to subtract 5 to arrive at local time (depending on local daylight saving this may be 4 hours less at some times of year).

Note also that this analysis could be extended to allow autocorrelation in haulout probability, by examining correlation in residuals and, if suitable, using the order 1 correlation as the rho parameter in the bam call (and also setting an AR.start for each tag record, accounting potentially for missing days also).

Code
#Count any hour with >50% dry as a haulout
histo$haul_out <- histo$perc > 50
#Specify that 0 and 24 are the same for hour by giving end knots (otherwise it joins 0 and 23)
mod <- bam(haul_out ~ s(hour, bs = "cc") + s(yday) + s(DeployID, bs = "re"), 
           knots = list(hour = c(0, 24)), 
           data = histo, family = binomial)
summary(mod)

Family: binomial 
Link function: logit 

Formula:
haul_out ~ s(hour, bs = "cc") + s(yday) + s(DeployID, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.58995    0.06086  -26.13   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
              edf Ref.df Chi.sq  p-value    
s(hour)     4.660  8.000 229.65  < 2e-16 ***
s(yday)     7.591  8.315 126.63  < 2e-16 ***
s(DeployID) 9.402 13.000  36.27 7.09e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.0595   Deviance explained = 6.11%
fREML =  11526  Scale est. = 1         n = 8111
Code
plot(mod, pages = 0, scale = 0)

For hour, there is a peak at 1300 UTC, which coincides with that 0800 local time peak noted by Ampela et al. (2023). We found the low point to be 0300 UTC, i.e., 2200 local time while Ampela et al. (2023) for 2022 data found it to be 1600 local time. It would be of interest to investigate potential annual differences, although there is strong confounding between seal ID and year (i.e., we don’t have seals tagged across multiple seasons), so it likely is not possible to include both because one cannot distinguish a season effect from a between-seal difference.

Note that there are few records from later in the year (given the truncation that only records from while the seals were in Virginia were used):

Code
hist(histo$yday, main = "Number of hours of wet-dry data per day", 
     xlab = "Day of year", breaks = seq(min(histo$yday) - 0.5, max(histo$yday) + 0.5, by = 1)) 

The pattern in the above GAM with year is hard to interpret, and it would be of interest to include variables such as moon phase, for example. In the absence of this we constrain the GAM to have fewer degrees of freedom for that smooth, to make a more interpretable pattern:

Code
mod2 <- bam(haul_out ~ s(hour, bs = "cc") + s(yday, k = 3) + s(DeployID, bs = "re"), 
           knots = list(hour = c(0, 24)), 
           data = histo, family = binomial)
summary(mod2)

Family: binomial 
Link function: logit 

Formula:
haul_out ~ s(hour, bs = "cc") + s(yday, k = 3) + s(DeployID, 
    bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -1.568      0.061   -25.7   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
              edf Ref.df Chi.sq  p-value    
s(hour)     4.652  8.000 227.84  < 2e-16 ***
s(yday)     1.844  1.968  60.36  < 2e-16 ***
s(DeployID) 9.552 13.000  38.61 5.92e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.0493   Deviance explained = 5.07%
fREML =  11541  Scale est. = 1         n = 8111
Code
plot(mod2, pages = 0, scale = 0)

We now see a general decline in haulout probability over the year.

Here we make predictions of haulout probability by month and time of day. We only have tagging data for February, March and April so we only make predictions over those months. For April (starting on day-of-year 91) we have rather few data points, so we do not attempt that prediction. (In future versions of this analysis we could perhaps predict by day rather than aggregating across months.)

Code
newdata <- data.frame(hour = 0, yday = c(1:28 + 31, 1:31 + (31 + 28)),
                      month = c(rep(2, 28), rep(3, 31)))
pred_haul_out <- matrix(0, 24, 2)
for(i in 1:24){
  newdata$hour <- i - 1
  #Set random effect to 0 in prediction
  tmp <- predict(mod2, newdata = newdata, type = "response", 
                 exclude = "s(DeployID)", newdata.guaranteed = TRUE)
  tmp2 <- aggregate(tmp, by = list(newdata$month), mean)
  pred_haul_out[i, ] <- tmp2$x
}
plot(0:23, pred_haul_out[, 1], ylim = range(pred_haul_out), xlab = "hour (UTC)",
     ylab = "p(haulout)", type = "l")
lines(0:23, pred_haul_out[, 2], col = "red")
legend("topleft", legend = c("Feb", "March"), lty = c(1, 1), col = c("black", "red"))

Plotting the same in local time:

Code
lhour_cols <- c(6:24, 1:5)
plot(0:23, pred_haul_out[lhour_cols, 1], ylim = range(pred_haul_out), xlab = "hour (local)",
     ylab = "p(haulout)", type = "l")
lines(0:23, pred_haul_out[lhour_cols, 2], col = "red")
legend("topright", legend = c("Feb", "March"), lty = c(1, 1), col = c("black", "red"))

Note that there is an alternative source of information about haulout events available for the SPLASH-10 tags deployed since 2020 – these have a haulout file that records the start and end time of haulout events, and also gives an indication of any missing haul out events (haulout ID cycles from 0 to 15 so missing numbers mean missing haulout events). Results from this could potentially be compared with the above, at least for the 7 SPASH-10 tags used.

4 Combined analysis to estimate abundance

Here we use count and tagging data from 0800 to 1600 local time (before and after that the amount of available data drops off, due to change in sunrise-sunset times over the season). We also only use data for February and March as that is all we predict haulout probability for (in a future analysis we should consider predicting by day rather than month, and could therefore use more of the available data).

We make no attempt at variance estimation here - we will want to build in uncertainty from the haulout analysis, and also determine what an appropriate sampling unit is for the counts.

Code
#Take just between 8 and 16 inclusive (9 hours)
#Take mean per month
cdat <- dat6[(dat6$Hour >= 8) & (dat6$Hour <= 16),]
cdat <- cdat[(cdat$Month %in% c(2, 3)), ]
cdat <- cdat %>% 
  group_by (Year, Month, Hour) %>%
  summarize(.groups = "keep", SealsHO = mean(SealsHO))

#Make predicted haul out data frame for joining
cpred_haul_out <- data.frame(Hour = c(0:23, 0:23), Month = c(rep(2, 24), rep(3, 24)),
                             haul_out_p = c(pred_haul_out[lhour_cols, 1],
                                            pred_haul_out[lhour_cols, 2]))
cdat2 <- left_join(cdat, cpred_haul_out, by = c("Month", "Hour"))
cdat2$Nhat <- cdat2$SealsHO / cdat2$haul_out_p

#Combine across hours to produce similar plot to previous one
cdat3 <- cdat2 %>%
  group_by(Year, Month) %>%
  summarize(.groups = "keep", Nhat = mean(Nhat))
cdat3$Year_ind <- cdat3$Year - 2019 + ifelse(cdat3$Month > 7, 1, 0)

#Create information to make a plot
month_labels <- c("Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May")
month_ind <- c(4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3)
year_color <- c("blue", "black", "red", "green", "brown")
x_pos <- month_ind[cdat3$Month] #+ (cdat3$Year_ind - 2) / 10


plot(x_pos, cdat3$Nhat, xaxt = "n" ,
     col = year_color[cdat3$Year_ind],
     ylab = "Estimated abundance", xlab = "Month",
     pch = 1) #20)
axis(1, at = 1:8, labels = month_labels)
legend("topright", legend = c("2019-20", "2020-21", "2021-22", "2022-23", "2023-24"),
       lty = 1, pch = 1, col = year_color)
for (i in 1:nrow(cdat3)){
  if(i > 1) {
    if(cdat3$Year_ind[i] == cdat3$Year_ind[i - 1]){
      lines(c(x_pos[i - 1], x_pos[i]), 
            c(cdat3$Nhat[i - 1], cdat3$Nhat[i]), 
            lty = 2,
            col = year_color[cdat3$Year_ind[i]])
    }
  }
}

Code
#Take mean across years and months to see if there's an hourly pattern
#Needs more thinking before implemented however
#cdat4 <- cdat2 %>%
#  group_by(Hour) %>%
#  summarize(.groups = "keep", Nhat = mean(Nhat))
#plot(cdat4$Hour, cdat4$Nhat, type = "l")

Estimates are generally a little higher than the exploratory analysis shown in section Section 2.3. Correcting for decreased haulout probability in March compared with February has increased the latter numbers somewhat.

Note

One check that needs to be made is to see whether estimated population abundance changes over hour of day – this would imply that the hourly haulout correction has not been successful.

5 Discussion

There are a number of oustanding items to address with this analysis.

  • For the haulout counts
    • Models could include additional covariates such as tide - although this is not needed here and is the subject of Michelle Guin’s thesis
    • Variance estimation needs to be addressed. Options include a non-parametric bootstrap, or the delta method for combining camera and tagging data outputs.
      • One issue if we went to a finer time scale (next point) would be serial correlation in estimates over time. In general, autocorrelation is an issue that needs to be considered.
    • The temporal unit could be day rather than month
  • For the tag analysis
    • We are really just interested in haulouts that occur where the animals can be seen from the cameras, so we need to check where haulouts are taking place by cross-referending the animal locations with the wet-dry tag results, if possible
    • Other covariates could be incorporated into the analysis – these should be covariates that we can use to predict haul-out probability at the camera sites, so perhaps could be tide at those sites, time relative to sunrise, etc - i.e., the same covariates we use for the camera data
    • The haulout file could be mined for the SPLASH-10 tags, and results compared with haulout probabilities from the hourly data.
  • For the combined analysis
    • If the hauled-out probability by hour is estimated well then there should be no hourly pattern in the corrected count data; this can be checked.

Assumptions will need to be discussed:

  • Hourly tag data gaps are missing at random with respect to haulout time
  • Cameras cover all haulout locations at both the Eastern Shore and CBBT survey sites
  • Animals do not haul out elsewhere while in Virginia
    • This assumption could partly be examined by combining the wet-dry data and location data from the tags – although these are at different tempral resolutions
  • Counts from cameras have no false positives or negative

Literature cited

Ampela, K., J. Bort, R. DiGiovanni, Jr., A. Deperte, D. Jones, and D. Rees. 2023. Seal Tagging and Tracking in Virginia: 2018-2022. Prepared for U.S. Fleet Forces Command. Submitted to Naval Facilities Engineering Systems Command Atlantic, Norfolk, Virginia, under Contract No. N62470-15-8006, Task Order 19F4147, issued to HDR, Inc., Virginia Beach, Virginia. March 2023.

Guins, M.S., and D.R. Rees. 2025. Pinniped Time-lapse Camera Surveys in Chesapeake Bay and Eastern Shore, Virginia: 2022-2024. Final Report. Prepared for U.S. Fleet Forces Command, Norfolk, Virginia. January 2025.

Sharples, R.J., Mackenzie, M.L. and Hammond, P.S. 2009. Estimating seasonal abundance of a central place forager using counts and telemetry data. Marine Ecology Progress Series 378: 289–298.