Estimating harbor seal abundance from Virginia camera trap surveys

Author

Len Thomas

Published

September 29, 2024

Acknowledgements

Thank-you to Deanna Rees and Michelle Guins for discussions and suggestions; thank-you to Kristen Ampela for providing the seal tag data. This work was funded by the US Navy NAVFAC Atlantic via a contract with HDR Inc.

1 Introduction

The goal of this work is to estimate population size of harbor seals in Virginia waters using remotely operated time-lapse cameras placed at haulout locations in southeastern Virginia. Details of the camera locations and methods used to count seals from the photographs are detailed in Guins et al. (2023) and citations therein. Only hauled out seals are counted, and so an important component of estimating total population size is to estimate the proportion of seals hauled out. Here, this is derived from an analysis of seals in Virginia fitted with transponder tags that record temporal information about whether the tag is wet or dry. Details of the tagging, tags and data collection are given by Ampela et al. (2023). We note that this overall approach is similar in spirit to that of Sharples et al. (2009), although they used ship-based counts rather than camera data, and the exact analysis approach were somewhat different.

This document is in four parts. First we provide an analysis of the camera trap data. We then provide an analysis of the tag data. Third we combine the two to produce estimates of population size. We close with a discsusion of the results and future potential work.

This document is written in Quarto - it is a “live” document containing both text and analysis code written in R. It is designed to form the basis for a final report on this work that may be coauthored by others including those currently listed in the Acknowledgements. Currently details of analyses are included that will likely not appear (or perhaps be Appendices) in the final report.

2 Haul-out counts from camera trap data

2.1 Reading in and aggregating data

We read in data from the Seal Camera Trap Analysis Microsoft Access database. We use data from both the Little Inlet and Chesapeake Bay Bridge-Tunnel (CBBT) sites.

Code
#Base year for Julian dates
origin <- as.Date("2020-01-01")

db <- file.path("./Seal Camera Trap Analysis (2).accdb")
con <- odbcConnectAccess2007(db)
#This lists what tables are there:
#sqlTables(con)$TABLE_NAME


#Read in Little Inlet data
table_names <- c("TimelapseData - Little Inlet 2019/2020 (S1)",
                 "TimelapseData - Little Inlet 2020/2021 (S2)",
                 "TimelapseData - Little Inlet 2021/2022 (S3)",
                 "TimelapseData - Little Inlet 2022/2023 (S4)")
for(i in 1:length(table_names)){
  tmp <- sqlFetch(con, table_names[i])
  #Just checking TotalHO is indeed the sum of the sites
  #Note that there are different column names in different years, and that in
  # year 4 it include UnID - so presumably an unidentified site?
  #TotalHO2 <- tmp$HO_A + tmp$HO_B + tmp$HO_C1 + tmp$HO_C2 + tmp$HO_C3 + 
  #  tmp$HO_E1 + tmp$HO_E2 + tmp$HO_E3 + tmp$HO_F + tmp$HO_UnID
  #Note - I am using Total columns -- need to check that corresponds with animals
  # in wter and not those on land
  #Select just the columns I'll need, and rename
  select_cols <- c(1, 5, 7, 35, 9, 8, 22, 26, 28, 30)
  if(i == 2) select_cols <- c(1, 6, 8, 37, 10, 9, 24, 28, 29, 32)
  if(i == 3) select_cols <- c(1, 5, 7, 25, 9, 8, 22, 28, 30, 32)
  if(i == 4) select_cols <- c(1, 5, 7, 8, 11, 9, 22)
  tmp <- tmp[, select_cols]
  #No tide, wind or AirTemp for i == 4, so add filler columns
  if (i == 4) {tmp$Tide = NA; tmp$Wind <- NA; tmp$AirTemp <- NA}
  #Note I assume the GreySeal column is GreySealsHO and that there is no count of
  # grey seals in the water (there is for CBBT)
  colnames(tmp) <- c("ID", "CameraID", "DateTime", "SealsHO", "SealsIW", 
                           "Vessel", "GreySealsHO", "Tide", "Wind",
                           "AirTemp")
  #Filter out values with no total (I think it just occurs for i==1)
  tmp <- tmp[!is.na(tmp[, "SealsHO"]), ]
  #Filter out values with no CameraID (not sure why this happens but it does)
  tmp <- tmp[!is.na(tmp[, "CameraID"]), ]

  if(i ==1) {
    dat <- tmp
  } else {
    dat <- rbind(dat, tmp)
  }
}
dat$Site <- "LI"

#Read in CBBT ddata
table_names <- c("TimelapseData - CBBT 2019/2020 (S1)", 
                 "TimelapseData - CBBT 2020/2021 (S2)",
                 "TimelapseData - CBBT 2021/2022 (S3)",
                 "TimelapseData - CBBT 2022/2023 (S4)")
for(i in 1:length(table_names)){
  tmp <- sqlFetch(con, table_names[i])
  if(i == 1) 
    #Yr 1 - filter out values with no Record IDs
    tmp <- tmp[!is.na(tmp[, "Record ID"]), ]
  #Remove those that are due to be deleted
  tmp <- tmp[tmp$DeleteFlag == 0, ]
  #Select just the columns I'll need, and rename
  select_cols <- c(1, 6, 9, 10, 11, 12, 18, 22, 23, 26)
  if(i == 2) select_cols <- c(1, 6, 5, 7, 8, 9, 21, 11, 12, 15)
  if(i == 3) select_cols <- c(1, 6, 5, 7, 8, 11, 10, 14, 15, 18)
  if(i == 4) select_cols <- c(1, 5, 6, 7, 8, 12, 10)
  tmp <- tmp[, select_cols]
  #No tide, wind or AirTemp for i == 4, so add filler columns
  if (i == 4) {tmp$Tide = NA; tmp$Wind <- NA; tmp$AirTemp <- NA}
  colnames(tmp) <- c("ID", "CameraID", "DateTime", "SealsHO", "SealsIW", 
                     "Vessel", "GreySealsHO", "Tide", "Wind",
                     "AirTemp")
  if(i ==1) {
    dat2 <- tmp
  } else {
    dat2 <- rbind(dat2, tmp)
  }
}
dat2$Site <- "CBBT"

#Combine LI and CBBT
dat3 <- rbind(dat, dat2)

dat3$Julian <- round(as.numeric(julian(dat3$DateTime, 
                                      origin = origin)))
#Create columns for Julian day of year, and also for hour, year and dateHour
dat3$Hour <- hour(dat3$DateTime)
dat3$Year <- year(dat3$DateTime)
#Make a unique date-hour variable
dat3$DateHour <- paste(dat3$Year, month(dat3$DateTime), day(dat3$DateTime),
                      dat3$Hour, sep = "-")

Data are available every 15 minutes in most cases, but here we group by taking the mean each hour.

Code
#Take the mean over each hour for each camera
dat4 <- dat3 %>% 
  group_by(DateHour, CameraID) %>%
  summarize(.groups = "keep", SealsHO = mean(SealsHO), n = n())
Code
#Worth flagging cases with n>4 - means more than 4 counts per hour - maybe some
# issue here - sems to happen in about 5% of cases
ind <- dat4$n > 4
#There are some where it's between 1 and 8 -- I guess this is where 
# there may have been 5 15-minute periods in an hour
# hist(dat4$n[dat4$n > 0 & dat4$n < 12])
#Perhaps some of the rest are duplicates -- needs checking at some point

There are 90247 records (i.e., camera-hours with counts). Note that some have more than 4 records, which shouldn’t happen if there are counts every 15 minutes. There are 2483 such records and this needs investigating at some point.

Here we sum over cameras, on the assumption that cameras do not catch the same seals. We then take only hours where all cameras were operating.

Code
#Sum over cameras
dat5 <- dat4 %>%
  group_by(DateHour) %>%
  summarize(SealsHO = sum(SealsHO), n = n())

#Take only hours where all cameras were operating
#This assumes the maximum number of cameras operating is the expected number --
# i.e., that cameras weren't added in different years (could group by year to check)
max_n_cameras <- max(dat5$n)
dat6 <- dat5[dat5$n == max_n_cameras, ]
dat6$DateTime <- parse_date_time(dat6$DateHour, order = "ymdH")
dat6$Hour <- hour(dat6$DateTime)
dat6$Month <- month(dat6$DateTime)
dat6$Year <- year(dat6$DateTime)

This results in 4929 hours of data in total.

2.2 Temporal patterns in haul-out counts

As initial exploration, we fit a GAM with hour of day as a smooth (assuming for now that the mean haul-out count is normally distributed).

Code
mod <- bam(SealsHO ~ s(Hour), data = dat6)
plot(mod, shift = mean(dat6$SealsHO), ylab = "Total hauled-out seals")

This suggests taking hours 8-12 as those have the maximum count. It could be that visibility is an issue at the beginning and end of the day. Also, note that there are fewer records (i.e., hours where there are counts from all cameras) from the beginning and end of the day - here is a tabulation of the number of counts per hour:

Code
knitr::kable(table(dat6$Hour), col.names = c("Hour of day", "n counts"))
Hour of day n counts
6 80
7 349
8 446
9 450
10 456
11 457
12 457
13 458
14 459
15 461
16 459
17 352
18 45

We also fit a GAM to day of the survey season, where 0 is Julian day 200 (again, assuming a normal distribution on count, so predictions can go negative):

Code
#Get day of year 
dat6$DayOfYear <- yday(dat6$DateTime)
#Transform so 1st day of surveying is day 0 - 
# may need to come back and check how it deals with leap years
offset <- min(dat6$DayOfYear[dat6$DayOfYear > 200])
dat6$DayOfSeason <- ifelse(dat6$DayOfYear > 200, dat6$DayOfYear - offset, 
                          dat6$DayOfYear + 366 - offset)

mod <- bam(SealsHO ~ s(DayOfSeason), data = dat6)
plot(mod, shift =  mean(dat6$SealsHO))

2.3 Initial estimates of abundance

To obtain initial estimates of abundance for the NAVFAC meeting in April 2024, we used data from between 8am and 12pm inclusive (i.e., 5 hours), and haul-out probablities for 8, 10 and 12 in February and March 2022 for 5 seals read from Figure 12 of Ampella et al. (2023). Combing these gave an abundance estimate per hour; we then took the mean per month.

A plot per month per year is given below, with vertical lines showing log-normal confidence intervals.

Code
#Take just between 8 and 12 inclusive (5 hours)
#Take mean per month
dat7 <- dat6[(dat6$Hour >= 8) & (dat6$Hour <= 12),]
dat7 <- dat7 %>% 
  group_by (Year, Month, DayOfYear) %>%
  summarize(.groups = "keep", SealsHO = mean(SealsHO))
dat7 <- dat7 %>% 
  group_by (Year, Month) %>%
  summarize(.groups = "keep", sd = sd(SealsHO), SealsHO = mean(SealsHO), n = n())
dat7$se <- dat7$sd / sqrt(dat7$n)
dat7$cv <- dat7$se / dat7$SealsHO
dat7$cv[is.na(dat7$cv)] <- 0


#Analysis of tag data
#Haul-out probabilities for 8, 10, 12am in Feb and March 2022 for 5 seals,
# taken from Fig 12 fo Ampella et al. 2023
#Their data are more precise in time than ours, which is really just to the
# nearest hour (although we could potentially do every 15 minutes)
haul_out_p <- mean(c(0.67, 0.50, 0.30, 0.50, 0.32, 0.35))
#Ampella reported seals most likely to haul out between 04:00 and 12:00

dat8 <- data.frame(Year = dat7$Year, 
                   Month = dat7$Month,
                   Nhat = dat7$SealsHO / haul_out_p,
                   CV = dat7$cv)
#Lognormal confidence intervals, from Buckland et al. 2021
c.val <- exp(qnorm(0.975) * log(1 + dat8$CV^2)) 
dat8$LCL <- dat8$Nhat / c.val
dat8$UCL <- dat8$Nhat * c.val
dat8$Year_ind <- dat8$Year - 2019 + ifelse(dat8$Month > 7, 1, 0)

#Create information to make a plot
month_labels <- c("Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May")
month_ind <- c(4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3)
year_color <- c("blue", "black", "red", "green")
x_pos <- month_ind[dat8$Month] + (dat8$Year_ind - 2) / 10

plot(x_pos, dat8$Nhat, xaxt = "n" ,
     col = year_color[dat8$Year_ind], 
     ylim = range(c(dat8$UCL, dat8$LCL)),
     ylab = "Estimated abundance", xlab = "Month",
     pch = 1) #20)
axis(1, at = 1:8, labels = month_labels)
legend("topleft", legend = c("2019-20", "2020-21", "2021-22", "2022-23"),
       lty = 1, pch = 1, col = year_color)
for (i in 1:nrow(dat8)){
  lines(c(x_pos[i], x_pos[i]), c(dat8$LCL[i], dat8$UCL[i]), 
        col = year_color[dat8$Year_ind[i]], lwd = 2)
  if(i > 1) {
    if(dat8$Year_ind[i] == dat8$Year_ind[i - 1]){
      lines(c(x_pos[i - 1], x_pos[i]), 
            c(dat8$Nhat[i - 1], dat8$Nhat[i]), 
            lty = 2,
            col = year_color[dat8$Year_ind[i]])
    }
  }
}

Note that this analysis treats each hour within each day as an independent replicate; an alternative would be to take daily means and then take the mean of these over each month. We also assume the haul-out probability only depends on time of day, and ignore covariance induced by using the same haul-out estimates in all days; we also ignore uncertainty in the haul-out estimates. However, the above is a useful initial estimate.

3 Haul-out probability from tag data

3.1 Reading in the tag data

Here we read in the tag data. This is the summary file, with one line for each tag:

Code
tag_dir <- "./Seal Tag Data 2018-2022"

tag_summary <- read_excel(path = paste(tag_dir, "TagDataSummary.xlsx", sep = "/"))
tag_summary$DeploymentStartDateTime <- 
  parse_date_time(tag_summary$DeploymentStartDateTime, orders = "YmdHMS")
tag_summary$DeploymentStopDateTime <- 
  parse_date_time(tag_summary$DeploymentStopDateTime, orders = "YmdHMS")

n_files <- nrow(tag_summary)
knitr::kable(tag_summary)
TagID TagModel DeploymentStartDateTime DeploymentStopDateTime DeploymentLocation DeploymentLatitude DeploymentLongitude
166449 SPLASH10- 309A 2018-02-04 2018-06-29 Little Inlet, VA 37.18174 -75.84099
166450 SPOT-311A 2018-02-04 2018-05-23 Little Inlet, VA 37.18174 -75.84099
166451 SPOT-311A 2018-02-04 2018-05-06 Little Inlet, VA 37.18178 -75.84099
166452 SPOT-311A 2018-02-04 2018-05-26 Magothy Bay, VA 37.17702 -75.94940
166453 SPOT-311A 2018-02-06 2018-04-09 Magothy Bay, VA 37.17702 -75.94940
173502 SPOT-311A 2018-02-06 2018-06-22 Magothy Bay, VA 37.17702 -75.94940
173503 SPOT-311A 2018-02-08 2018-04-26 Little Inlet, VA 37.18198 -75.84065
177411 SPLASH10-F 297A 2020-02-26 2020-07-12 Smith Island, VA 37.11751 -75.84932
177410 SPLASH10-F 297A 2020-03-02 2020-06-10 Mink Island Bay, VA 37.17762 -75.84961
178255 SPLASH10-F 297A 2022-02-07 2022-06-09 Little Inlet, VA 37.18174 -75.84099
178256 SPLASH10-F 297A 2022-02-08 2022-06-17 Little Inlet, VA 37.18174 -72.84099
178257 SPLASH10-F 297A 2022-02-09 2022-07-18 Little Inlet, VA 37.18174 -75.84099
177412 SPLASH10-F 297A 2022-02-15 2022-05-25 Little Inlet, VA 37.18174 -75.84099
178258 SPLASH10-F 297A 2022-02-15 2022-06-04 Little Inlet, VA 37.18174 -75.84099

Below we read in data for each tag in turn. We read in the tag location file and, for each tag plot the lat long locations, joining them with a black line. Some locations have low accuracy, so we use a simple filter which is just to use locations with quality codes 3, 2, 1 and A (i.e., discarding B and Z) – lines joining these are in dashed red. (We also discard the first day of data as a way to partially avoid “tag-on” issues.) We also show the Northern and Southern border of Virigina as blue horizontal dashed lines. None of the seals spend any significant time south of Virginia, but almost all migrate North at some point in the season. Since we are interested in haul-out probability just while in Virginia waters, we truncate the data at the point the seals move outside Virginia – this is shown with a cricle on the plots.

Code
#North and South latitude boundaries of Virginia
virginia_lat <- c(38.027038704500775, 36.550603468154065)
#Read in the histo files
colnames <- c("DeployID", "Date", paste0("Bin", 1:24))
for(i in 1:n_files){
  histo_filename <- paste(tag_dir, tag_summary$TagID[i], 
                      paste(tag_summary$TagID[i], "Histos.csv", sep = "-"),
                      sep = "/")
  histo_file <- read.csv(histo_filename)
  ind <- histo_file$HistType %in% c("1Percent", "Percent")
  #Save just percent rows, and just the deployID, date and hour columns
  histo_file <- histo_file[ind, colnames]
  #Make sure they are all numeric
  for(j in 1:24){
    histo_file[, colnames[j + 2]] <- as.numeric(histo_file[, colnames[j + 2]])
  }
  histo_file$Date <- parse_date_time(histo_file$Date, order = "HMS dbY")
  #Remove records from before and including the day of deployment and
  # after and including the day the tag stopped
  ind <- (histo_file$Date >= tag_summary$DeploymentStartDateTime[i] + days(1)) & 
    (histo_file$Date <= tag_summary$DeploymentStopDateTime[i] - days(1))
  histo_file <- histo_file[ind, ]
  
  #Remove records from the tag after leaving Virginia waters
  #Read in the locations
  location_filename <- paste(tag_dir, tag_summary$TagID[i], 
                             paste(tag_summary$TagID[i], "Locations.csv", sep = "-"),
                             sep = "/")
  location_file <- read.csv(location_filename)
  #Keep only dates after deployment and before the tag stopped
  location_file$Date <- parse_date_time(location_file$Date, order = "HMS dbY")
  ind <- (location_file$Date >= tag_summary$DeploymentStartDateTime[i] + days(1)) & 
    (location_file$Date <= tag_summary$DeploymentStopDateTime[i] - days(1))
  location_file <- location_file[ind, ]
  
  #Plot these data
  plot(location_file$Longitude, location_file$Latitude, 
       main = paste("Tag", tag_summary$TagID[i]),
       ylim = range(c(virginia_lat, location_file$Latitude)), ylab = "Latitude",
       xlab = "Longitude", type = "l")
  #Add lines showing Virginia N and S
  abline(h = virginia_lat[1], lty = 2, col = "blue")
  abline(h = virginia_lat[2], lty = 2, col = "blue")

  #Keep only good quality locations - those with an error radius of < 10km
  ind <- location_file$Quality %in% c("3", "2", "1", "A") #location_file$Error.radius <= 10000
  location_file <- location_file[ind, ]
  #Add to plot
  lines(location_file$Longitude, location_file$Latitude, col = "red", lty = 2)
  
  #Find the first that is north of virginia_latitude (if any)
  ind <- location_file$Latitude > virginia_lat[1]
  if(any(ind)){
    first <- which.max(ind)
    #Add a point record to show where this is
    points(location_file$Longitude[first], location_file$Latitude[first])
    #remove histo records from this day or later
    ind <- histo_file$Date < location_file$Date[first]
    histo_file <- histo_file[ind, ]
  }

  #Pivot into tag, date, hour format
  tmp <- histo_file %>% 
    pivot_longer(cols = starts_with("Bin"), names_to = "hour", values_to = "perc")
  
  #Add to overall frame
  if(i == 1){
    histo <- tmp
  } else {
    histo <- rbind(histo, tmp)
  }
}

Code
#Do some post-processing
#There is one NA in perc so remove that record
histo <- histo[!is.na(histo$perc), ]
#Add columns that will be useful
histo$hour <- as.numeric(substr(histo$hour, 4, 6)) - 1
histo$month <- month(histo$Date)
histo$DeployID <- as.factor(histo$DeployID)
histo$yday <- yday(histo$Date)

We also read in the histogram files for each seal, which summarizes the proportion of each hour that the tag was dry, using the wet-dry sensor. The table below has one row for each tag. It shows the mean percentage dry, the start day of year of the record, the end day (while still in Virginia). Also shown is the number of days between start and end day, and the number of days that have records.

Code
s_seal <- histo %>%
  group_by(DeployID) %>%
  summarise(perc = mean(perc), start_day = min(yday), end_day = max(yday),
            n_days = max(yday) - min(yday) + 1,
            n_days_with_records = length(unique(yday)))
knitr::kable(s_seal)
DeployID perc start_day end_day n_days n_days_with_records
17411 20.01852 58 89 32 18
166449 21.53571 38 92 55 35
166450 27.02534 37 104 68 28
166451 28.42157 39 91 53 34
166452 18.33929 45 94 50 21
166453 30.72756 39 87 49 26
173502 15.01641 39 106 68 33
173503 20.31373 40 115 76 17
177410 24.65972 63 81 19 12
177412 29.78968 47 81 35 21
178255 23.28686 40 84 45 26
178256 31.74038 40 60 21 13
178257 25.34722 41 65 25 15
178258 20.28526 47 100 54 39

As can be seen, there are many missing days of data. This is potentially problematic if the satellite tags are less likely to transmit on days when the seal spends more time in the water – this will cause a strong overestimate in the haul-out estiamte.

With this caveat, we proceed with the analysis. Below is a histogram of the haul-out percentages across all seals. Some heaping is observed at units of 10, which indicates that some tags were programmed to record only low-resolution data (which rounds to the nearest 10 plus 3 and 98). This could be investigated some more. Overall, hours are mostly fully dry or below about 25 percent dry.

Code
hist(histo$perc, breaks = seq(-0.5, 100.5, by = 1), main = "Percentage haulout per hour",
     xlab = "Percentage")

Ampela et al. (2023) assumed a haul-out occured in an hour only if the tag was 100 percent dry for that hour, but here we take 50 percent as a cutoff. As can be seen from the above, results will not be very sensitive to this and indeed any number between around 30 and 100 could be used. We chose 50 percent as this presumably balances the risk of under- and over-estimation of haul-out during the hour (recalling that the count data is taken every 15 minutes).

Below we fit a binary GAM, modelling whether hauled out or not as a function of hour of the day (as a cyclic smooth), day of the year and also a random effect for animal. Note that hour of day here is in UTC, so we need to subtract 5 to arrive at local time (depending on local daylight saving this may be 4 at some times of year).

Note also that this analysis could be extended to allow autocorrelation in haulout probability, by examining correlation in residuals and, if suitable, using the order 1 correlation as the rho parameter in the bam call (and also setting an AR.start for each tag record, accounting potentially for missing days also).

Code
#Count any hour with >50% dry as a haul-out
histo$haul_out <- histo$perc > 50
#Specify that 0 and 24 are the same for hour by giving end knots (otherwise it joins 0 and 23)
mod <- bam(haul_out ~ s(hour, bs = "cc") + s(yday) + s(DeployID, bs = "re"), 
           knots = list(hour = c(0, 24)), 
           data = histo, family = binomial)
summary(mod)

Family: binomial 
Link function: logit 

Formula:
haul_out ~ s(hour, bs = "cc") + s(yday) + s(DeployID, bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.58995    0.06086  -26.13   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
              edf Ref.df Chi.sq  p-value    
s(hour)     4.660  8.000 229.65  < 2e-16 ***
s(yday)     7.591  8.315 126.63  < 2e-16 ***
s(DeployID) 9.402 13.000  36.27 7.09e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.0595   Deviance explained = 6.11%
fREML =  11526  Scale est. = 1         n = 8111
Code
plot(mod, pages = 0, scale = 0)

For hour, there is a peak at 13:00 UTC, which coincides with that 8am local time peak noted by Ampela et al. (2023). We found the low point to be 03:00 UTC, i.e., 10pm local time while Ampela et al. for 2022 data found it to be 4pm local time. It would be of interest to investiate potential annual differences, although there is strong confounding between seal ID and year, so it likely is not possible to include both.

Note that there are few records from later in the year (given the truncation that only records from while the seals were in Virginia were used):

Code
hist(histo$yday, main = "Number of hours of wet-dry data per day", 
     xlab = "Day of year", breaks = seq(min(histo$yday) - 0.5, max(histo$yday) + 0.5, by = 1)) 

The pattern in the above GAM with year is hard to interpret, and it would be of interest to include variables such as moon phase, for example. In the absence of this we constrain the GAM to have fewer degrees of freedom for that smooth, to make a more interpretable pattern:

Code
mod2 <- bam(haul_out ~ s(hour, bs = "cc") + s(yday, k = 3) + s(DeployID, bs = "re"), 
           knots = list(hour = c(0, 24)), 
           data = histo, family = binomial)
summary(mod2)

Family: binomial 
Link function: logit 

Formula:
haul_out ~ s(hour, bs = "cc") + s(yday, k = 3) + s(DeployID, 
    bs = "re")

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -1.568      0.061   -25.7   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
              edf Ref.df Chi.sq  p-value    
s(hour)     4.652  8.000 227.84  < 2e-16 ***
s(yday)     1.844  1.968  60.36  < 2e-16 ***
s(DeployID) 9.552 13.000  38.61 5.92e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.0493   Deviance explained = 5.07%
fREML =  11541  Scale est. = 1         n = 8111
Code
plot(mod2, pages = 0, scale = 0)

We now see a general decline in haul-out probablity over the year.

Here we make predictions of haul-out probablity by month and time of day. We only have data for February, March and April so we only make predictions over those months. For April (starting on day-of-year 91) we have rather few data points, so we do not attempt that prediction. (In future versions of this analysis we could perhaps predict by day rather than aggregating across months.)

Code
newdata <- data.frame(hour = 0, yday = c(1:28 + 31, 1:31 + (31 + 28)),
                      month = c(rep(2, 28), rep(3, 31)))
pred_haul_out <- matrix(0, 24, 2)
for(i in 1:24){
  newdata$hour <- i - 1
  #Set random effect to 0 in prediction
  tmp <- predict(mod2, newdata = newdata, type = "response", 
                 exclude = "s(DeployID)", newdata.guaranteed = TRUE)
  tmp2 <- aggregate(tmp, by = list(newdata$month), mean)
  pred_haul_out[i, ] <- tmp2$x
}
plot(0:23, pred_haul_out[, 1], ylim = range(pred_haul_out), xlab = "hour (UTC)",
     ylab = "p(haulout)", type = "l")
lines(0:23, pred_haul_out[, 2], col = "red")
legend("topleft", legend = c("Feb", "March"), lty = c(1, 1), col = c("black", "red"))

Plotting the same in local time:

Code
lhour_cols <- c(6:24, 1:5)
plot(0:23, pred_haul_out[lhour_cols, 1], ylim = range(pred_haul_out), xlab = "hour (local)",
     ylab = "p(haulout)", type = "l")
lines(0:23, pred_haul_out[lhour_cols, 2], col = "red")
legend("topright", legend = c("Feb", "March"), lty = c(1, 1), col = c("black", "red"))

Note that there is an alternative source of information about haul-outs available for the SPLASH-10 tags deployed since 2020 – these have a haul-out file that records the start and end time of haulouts, and also gives an indication of any missing haulouts (haulout ID cycles from 0 to 15 so missing numbers mean missing haulouts). Results from this could potentially be compared with the above, at least for the 7 SPASH-10 tags used.

4 Combined analysis

Here we use data from 8am to 4pm local time (before and after that the amount of available data drops off, possibly because of visibility issues). We also only use data for February and March as that is all we predict haul-out probability for (in a future analysis we should consider predicting by day rather than month, and could therefore use more of the available data).

We make no attempt at variance estimation here - we will want to build in uncertainty from the haul-out analysis, and also determine what an appropriate sampling unit is for the counts.

Code
#Take just between 8 and 16 inclusive (9 hours)
#Take mean per month
cdat <- dat6[(dat6$Hour >= 8) & (dat6$Hour <= 16),]
cdat <- cdat[(cdat$Month %in% c(2, 3)), ]
cdat <- cdat %>% 
  group_by (Year, Month, Hour) %>%
  summarize(.groups = "keep", SealsHO = mean(SealsHO))

#Make predicted haul out data frame for joining
cpred_haul_out <- data.frame(Hour = c(0:23, 0:23), Month = c(rep(2, 24), rep(3, 24)),
                             haul_out_p = c(pred_haul_out[lhour_cols, 1],
                                            pred_haul_out[lhour_cols, 2]))
cdat2 <- left_join(cdat, cpred_haul_out, by = c("Month", "Hour"))
cdat2$Nhat <- cdat2$SealsHO / cdat2$haul_out_p

#Combine across hours to produce similar plot to previous one
cdat3 <- cdat2 %>%
  group_by(Year, Month) %>%
  summarize(.groups = "keep", Nhat = mean(Nhat))
cdat3$Year_ind <- cdat3$Year - 2019 + ifelse(cdat3$Month > 7, 1, 0)

#Create information to make a plot
month_labels <- c("Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May")
month_ind <- c(4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3)
year_color <- c("blue", "black", "red", "green")
x_pos <- month_ind[cdat3$Month] #+ (cdat3$Year_ind - 2) / 10


plot(x_pos, cdat3$Nhat, xaxt = "n" ,
     col = year_color[cdat3$Year_ind],
     ylab = "Estimated abundance", xlab = "Month",
     pch = 1) #20)
axis(1, at = 1:8, labels = month_labels)
legend("topright", legend = c("2019-20", "2020-21", "2021-22", "2022-23"),
       lty = 1, pch = 1, col = year_color)
for (i in 1:nrow(cdat3)){
  if(i > 1) {
    if(cdat3$Year_ind[i] == cdat3$Year_ind[i - 1]){
      lines(c(x_pos[i - 1], x_pos[i]), 
            c(cdat3$Nhat[i - 1], cdat3$Nhat[i]), 
            lty = 2,
            col = year_color[cdat3$Year_ind[i]])
    }
  }
}

Code
#Take mean across years and months to see if there's an hourly pattern
#Needs more thinking before implemented however
#cdat4 <- cdat2 %>%
#  group_by(Hour) %>%
#  summarize(.groups = "keep", Nhat = mean(Nhat))
#plot(cdat4$Hour, cdat4$Nhat, type = "l")

Estimates are generally a little higher than the previous exploratory analysis. Correcting for decreased haul-out probability in March compared with February has brought the latter numbers up a somewhat.

One check that needs to be made is to see whether estimated populaton abundance changes over hour of day – this would imply that the hourly haul-out correction has not been successful.

5 Discussion

There are a number of oustanding items to address with this analysis.

  • For the haul-out counts
    • Models could include additional covariates such as tide - although this is not needed here and is the subject of Michelle’s thesis
    • Variance estimation needs addressed
    • The temporal unit could be day rather than month
  • For the tag analysis
    • Other covariates could be incorporated into the analysis
    • Autocorrelation could be accounted for
    • Variance estimation needs addressed
    • The haulout file could be mined for the SPLASH-10 tags, and results compared with haul-out probabilities from the hourly data.
  • For the combined analysis
    • If the hauled-out probability by hour is estimated well then there should be no hourly pattern in the corrected count data; this can be checked.

Assumptions will need to be discussed:

  • Hourly tag data gaps are missing at random with respect to haulout time
  • Cameras cover all haulouts
  • Counts from cameras have no false positives or negative

6 Literature cited

Ampela, K., J. Bort, R. DiGiovanni, Jr., A. Deperte, D. Jones, and D. Rees. 2023. Seal Tagging and Tracking in Virginia: 2018-2022. Prepared for U.S. Fleet Forces Command. Submitted to Naval Facilities Engineering Systems Command Atlantic, Norfolk, Virginia, under Contract No. N62470-15-8006, Task Order 19F4147, issued to HDR, Inc., Virginia Beach, Virginia. March 2023.

Guins, M., Rees, D., and A. Lay. 2023. Pinniped Time-lapse Camera Surveys in Southern Chesapeake Bay and Eastern Shore, Virginia: 2019-2023. Final Report. Prepared for U.S. Fleet Forces Command, Norfolk, Virginia. July 2023.

Sharples, R.J., Mackenzie, M.L. and Hammond, P.S. 2009. Estimating seasonal abundance of a central place forager using counts and telemetry data. Marine Ecology Progress Series 378: 289–298.