Estimating harbor seal abundance from Virginia camera trap surveys, version 2
Author
Len Thomas, Michelle Guins and Deanna Rees
Published
March 30, 2025
Acknowledgements
Thank-you to Kristen Ampela for providing the seal tag data, and to Monica DeAngelis for helping us understand it. Thank-you also to Joel Bell for his support during this project. This work was funded by the US Navy Navy Fleet Forces Command and managed by Naval Facilities Engineering Systems Command (NAVFAC) Atlantic via a contract with HDR Inc.
Note
This document is for internal use only. The analyses documented here will form the basis for a report to be submitted, via HDR, to the US Navy NAVFAC Atlantic. Currently details of analyses are included that will likely not appear (or perhaps be Appendices) in the final report. Accompanying text will be somewhat different, and figures and tables may be altered/omitted.
1 Introduction
The goal of this work is to estimate population size of harbor seals in Virginia waters using remotely operated time-lapse cameras placed at haulout locations in southeastern Virginia. Details of the camera locations and methods used to count seals from the photographs are given in Guins et al. (2025) and citations therein. Seals in general spend some time ``hauled out’’ (resting) on land; they spend the remainder of their time at sea. Here we use the camera-based counts of seals that are hauled out, and we undertake a separate analysis to determine what proportion of seals are hauled out (and hence what correction factor is needed to estimate total population size). This separate analysis comes from seals in Virginia fitted with satellite telemetry tags that record temporal information about whether the tag is wet or dry. Details of the tagging, tags and data collection are given by Ampela et al. (2023). We note that our overall approach is similar to that of Sharples et al. (2009), although they used vessel-based counts rather than camera data, and the exact analysis approach were somewhat different. (More details of the differences could be given in the HDR report.)
This document is in four parts. First we provide an analysis of the camera data. We then provide an analysis of the tag data. Third we combine the two to produce estimates of population size. We close with a discussion of the results and future potential work.
This document is written in Quarto (a multi-purpose tool for research documentation and distribution) - it is a “live” document containing both text and analysis code written in R. The HDR report will be written as a standard Word document.
2 Haulout counts from camera trap data
2.1 Reading in and aggregating data
We read in data from a series of Excel spreadsheets, extracted from the Seal Camera Trap Analysis Microsoft Access database. We use data from both the the Eastern Shore (also known as ``Little Inlet’’) and Chesapeake Bay Bridge-Tunnel (CBBT) survey sites. Cameras are in place from Oct/Nov through Apr/May each year. Data from 2019 - 2024 were analyzed, i.e., 5 seasons.
Note
Check with Michelle that I’m selecting the correct columns as the layout of each Excel file is different! In particular, I should check if I should include HO_UnID or not? - I do not at present in the cases where I manually calculate SealHO (as opposed to using TotalHO).
Code
#Base year for Julian datesorigin <-as.Date("2020-01-01")dat_folder <-file.path("./MichelleGuinsDataAndAnalyses2")#Read in Little Inlet datafile_names <-c("Little Inlet Season 1 Timelapse Camera Data FINAL wtih environmental.xlsx","Little Inlet Season 2 Timelapse Camera Data FINAL wtih environmental.xlsx","Little Inlet Season 3 Timelapse Camera Data FINAL wtih environmental.xlsx","Little Inlet Season 4 Timelapse Camera Data FINAL wtih environmental.xlsx","TimelapseData - Little Inlet S5 FINAL with Env Data.csv.xlsx")for(i in1:length(file_names)){ tmp <-read_excel(path =paste0(dat_folder, "/", file_names[i]),sheet =1, guess_max =100000)if(i ==1) { select_cols <-c(4, 6, 7, 8, 21, 24, 26, 28, 32) tmp2 <- tmp[, select_cols]colnames(tmp2) <-c("CameraID", "DateTime", "Vessel", "SealsIW", "GreySeal", "Tide_ft", "Wind_kn", "AirTemp_f", "DeleteFlag") tmp2$SealsHO <-rowSums(tmp[, 10:19]) tmp2$ID <-1:nrow(tmp2) }if(i ==2) { select_cols <-c(2, 6, 8, 9, 10, 24, 27, 30, 31, 35) tmp2 <- tmp[, select_cols]colnames(tmp2) <-c("ID", "CameraID", "DateTime", "Vessel", "SealsIW", "GreySeal", "Tide_ft", "Wind_kn", "AirTemp_f", "DeleteFlag") tmp2$SealsHO <-rowSums(tmp[, 12:23]) }if(i ==3){ select_cols <-c(4, 6, 7, 8, 21, 24, 26, 28, 30, 34) tmp2 <- tmp[, select_cols]colnames(tmp2) <-c("CameraID", "DateTime", "Vessel", "SealsIW", "GreySeal", "SealsHO", "Tide_ft", "Wind_kn", "AirTemp_f", "DeleteFlag")#This code checks that the SealsHO in this sheet doesn't include# GreySeal, TaggedSeal, UnObject, etc# It seems to check out, so I commented out the check#tmp2$SealsHO2 <- rowSums(tmp[, 10:19])#sum(tmp2$SealsHO != tmp2$SealsHO2) tmp2$ID <-1:nrow(tmp2) }if(i ==4){ select_cols <-c(1, 5, 7, 8, 9, 21, 33, 23, 25, 27, 31) tmp2 <- tmp[, select_cols]colnames(tmp2) <-c("ID", "CameraID", "DateTime", "Vessel", "SealsIW", "GreySeal", "SealsHO", "Tide_ft", "Wind_kn", "AirTemp_f", "DeleteFlag") }if(i ==5){ select_cols <-c(4, 6, 7, 8, 22, 18, 19, 20) tmp2 <- tmp[, select_cols]colnames(tmp2) <-c("CameraID", "DateTime", "Vessel", "SealsIW", "GreySeal", "Tide_ft", "Wind_kn", "AirTemp_f")#No delete flag provided, so assume all are okay tmp2$DeleteFlag =FALSE tmp2$SealsHO <-rowSums(tmp[, 9:17]) tmp2$ID <-1:nrow(tmp2) }#Filter out values with DeleteFlag true tmp2 <- tmp2[!tmp2$DeleteFlag, ]#Note I assume the GreySeal column is GreySealsHO and that there is no count of# grey seals in the water (there is for CBBT)#Arrange the columns tmp2 <- tmp2 %>%select(ID, CameraID, DateTime, SealsHO, SealsIW,GreySealsHO = GreySeal, Vessel, Tide_ft, Wind_kn, AirTemp_f)#Filter out values with no total (I think it just occurs for i==1) tmp2 <- tmp2[!is.na(tmp2[, "SealsHO"]), ]#Filter out values with no CameraID (not sure why this happens but it did in the)# previous iteration tmp2 <- tmp2[!is.na(tmp2[, "CameraID"]), ]if(i ==1) { dat <- tmp2 } else { dat <-rbind(dat, tmp2) }}dat$Site <-"LI"#Read in CBBT ddatafile_names <-c("CBBT Season 1 Timelapse Camera Data FINAL wtih environmental.xlsx","CBBT Season 2 Timelapse Camera Data FINAL wtih environmental.xlsx","CBBT Season 3 Timelapse Camera Data FINAL wtih environmental.xlsx","CBBT Season 4 Timelapse Camera Data FINAL wtih environmental.xlsx","TimelapseData CBBTS5 Final w Env Data.xlsx")for(i in1:length(file_names)){ tmp <-read_excel(path =paste0(dat_folder, "/", file_names[i]),sheet =1, guess_max =100000)if(i ==1) { select_cols <-c(1, 5, 8, 11, 9, 10, 17, 20, 23, 24, 15) tmp2 <- tmp[, select_cols]colnames(tmp2) <-c("ID", "CameraID", "DateTime", "Vessel", "SealsHO", "SealsIW", "GreySeal", "Tide_ft", "Wind_kn", "AirTemp_f", "DeleteFlag") }if(i ==2) { select_cols <-c(5, 4, 8, 6, 7, 20, 9, 12, 13, 19) tmp2 <- tmp[, select_cols]colnames(tmp2) <-c("CameraID", "DateTime", "Vessel", "SealsHO", "SealsIW", "GreySeal", "Tide_ft", "Wind_kn", "AirTemp_f", "DeleteFlag") tmp2$ID <-1:nrow(tmp2) }if(i ==3){ select_cols <-c(4, 6, 7, 23, 8, 21, 26, 28, 30, 34) tmp2 <- tmp[, select_cols]colnames(tmp2) <-c("CameraID", "DateTime", "Vessel", "SealsHO", "SealsIW", "GreySeal", "Tide_ft", "Wind_kn", "AirTemp_f", "DeleteFlag") tmp2$ID <-1:nrow(tmp2) }if(i ==4){ select_cols <-c(4, 5, 6, 7, 8, 9, 12, 14, 16, 20) tmp2 <- tmp[, select_cols]colnames(tmp2) <-c("CameraID", "DateTime", "Vessel", "SealsHO", "SealsIW", "GreySeal", "Tide_ft", "Wind_kn", "AirTemp_f", "DeleteFlag") tmp2$ID <-1:nrow(tmp2) }if(i ==5){ select_cols <-c(4, 5, 10, 6, 7, 8, 11, 12, 13) tmp2 <- tmp[, select_cols]colnames(tmp2) <-c("CameraID", "DateTime", "Vessel", "SealsHO", "SealsIW", "GreySeal", "Tide_ft", "Wind_kn", "AirTemp_f", "DeleteFlag")#No delete flag provided, so assume all are okay tmp2$DeleteFlag =FALSE tmp2$ID <-1:nrow(tmp2) }#Filter out values with DeleteFlag true tmp2 <- tmp2[!tmp2$DeleteFlag, ]#Note I assume the GreySeal column is GreySealsHO (in some cases it's not clear)#Arrange the columns tmp2 <- tmp2 %>%select(ID, CameraID, DateTime, SealsHO, SealsIW,GreySealsHO = GreySeal, Vessel, Tide_ft, Wind_kn, AirTemp_f)#Filter out values with no total (I think it just occurs for i==1) tmp2 <- tmp2[!is.na(tmp2[, "SealsHO"]), ]#Filter out values with no CameraID (not sure why this happens but it did in the)# previous iteration tmp2 <- tmp2[!is.na(tmp2[, "CameraID"]), ]if(i ==1) { dat2 <- tmp2 } else { dat2 <-rbind(dat2, tmp2) }}dat2$Site <-"CBBT"#Combine LI and CBBTdat3 <-rbind(dat, dat2)dat3$Julian <-round(as.numeric(julian(dat3$DateTime, origin = origin)))#Create columns for Julian day of year, and also for hour, year and dateHourdat3$Hour <-hour(dat3$DateTime)dat3$Year <-year(dat3$DateTime)#Make a unique date-hour variabledat3$DateHour <-paste(dat3$Year, month(dat3$DateTime), day(dat3$DateTime), dat3$Hour, sep ="-")
Data are available every 15 minutes in most cases, but here we group by taking the mean per hour of number of seals, tide, wind and air temp.
Code
#Take the mean over each hour for each cameradat4 <- dat3 %>%group_by(DateHour, CameraID) %>%summarize(.groups ="keep", SealsHO =mean(SealsHO), n =n())
Code
#Worth flagging cases with n>4 - means more than 4 counts per hour - maybe some# issue here - sems to happen in about 5% of casesind <- dat4$n >4
Taking the mean for each hour, there are 109347 records (i.e., camera-hours with counts).
Note
Note that some have more than 4 records, which shouldn’t happen if there are counts every 15 minutes. There are 18213 such records and this needs investigating. I currently take the mean per hour per camera and then sum over cameras. So, as long as each image has been processed and counted, this should be fine. But it would be good to understand why there are so many cases where there are more than 4 records per hour – an indeed in some cases many more than 4. As an example, here are the top 10 cases with the highest number of repeat counts per hour - the DateHour column is year, month, day, hour on 24h clock
Here we sum over cameras, on the assumption that cameras do not catch the same seals – there was an active effort to avoid double-counting during the data processing. We then take only hours where 8 or more cameras were operating.
Note
I used 8 as an arbitrary number at present. We need to discuss a protocol for determining what number of operating cameras we need (or which cameras) per year to be considered a complete count.
Code
#Sum over camerasdat5 <- dat4 %>%group_by(DateHour) %>%summarize(SealsHO =sum(SealsHO), n =n())#Take only hours where all cameras were operating#This assumes the maximum number of cameras operating is the expected number --# i.e., that cameras weren't added in different years (could group by year to check)max_n_cameras <-8#max(dat5$n)dat6 <- dat5[dat5$n >= max_n_cameras, ]dat6$DateTime <-parse_date_time(dat6$DateHour, order ="ymdH")dat6$Hour <-hour(dat6$DateTime)dat6$Month <-month(dat6$DateTime)dat6$Year <-year(dat6$DateTime)dat6$Season <- dat6$Year -2019+ifelse(dat6$Month >7, 1, 0)
This results in 9704 hours of data in total.
2.2 Temporal patterns in haulout counts
As initial exploration, we fit a GAM with hour of day as a smooth (assuming for now that the mean haulout count is normally distributed).
Code
mod <-bam(SealsHO ~s(Hour), data = dat6)plot(mod, shift =mean(dat6$SealsHO), ylab ="Total hauled-out seals")
This suggests taking hours 0800-1200 as those have the maximum count. It could be that visibility is an issue at the beginning and end of the day. Also, note that there are fewer records (i.e., hours where there are counts from all cameras) from the beginning and end of the day. This is expected as cameras only operated during daylight hours. Here is a tabulation of the number of counts per hour:
Code
knitr::kable(table(dat6$Hour), col.names =c("Hour of day", "n counts"))
Hour of day
n counts
5
1
6
267
7
780
8
821
9
826
10
833
11
834
12
833
13
832
14
832
15
830
16
832
17
662
18
312
19
165
20
44
One question was on whether the diurnal pattern changes between study seasons. Below we try some models where the smooth varies by season:
Code
mod_additive_season <-bam(SealsHO ~s(Hour) +as.factor(Season), data = dat6)mod_interaction_season <-bam(SealsHO ~s(Hour, by =as.factor(Season)), data = dat6)AIC(mod, mod_additive_season, mod_interaction_season)
It looks from this that a model with an additive effect of season is the best (lowest AIC) of those tried. However, as noted by Guins et al. (2025), it will probably be more insightful to derive a variable related to sunrise/sunset and use this instead of hour in these analyses.
Note
Obtain and try a diurnally-related variable. However, some discussion is required of what metric to use since we want to account for minutes relative to sunrise and sunset. Examples could be a sun angle variable, or a more linear one that is 0 at sunrise and sunset, 1 at the half-way point between sunrise and sunset and -1 at the midpoint between sunset and sunrise, with linear functions in between. It does not matter much what is chosen if smooths are used to link the covariate to the response.
We also fit a GAM to day of the survey season, where 0 is Julian day 200 (again, assuming a normal distribution on count, so predictions can go negative).
(Note that the plot here is the estimated smooth coefficient, not a prediction, so doesn’t take account of any intercept fit in the model – if we use this in the report then we would be better to show predictions and constrain the model so it cannot produce negative predictions.)
Code
#Get day of year dat6$DayOfYear <-yday(dat6$DateTime)#Transform so 1st day of surveying is day 0 - # may need to come back and check how it deals with leap yearsoffset <-min(dat6$DayOfYear[dat6$DayOfYear >200])dat6$DayOfSeason <-ifelse(dat6$DayOfYear >200, dat6$DayOfYear - offset, dat6$DayOfYear +366- offset)mod <-bam(SealsHO ~s(DayOfSeason), data = dat6)plot(mod, shift =mean(dat6$SealsHO))
2.3 Initial estimates of abundance
To obtain initial estimates of abundance (e.g., for comparison later), we used data from between 0800 and 1200 inclusive (i.e., 5 hours), and haulout probablities for 0800, 1000 and 1200 in February and March 2022 for 5 seals read from Figure 12 of Ampella et al. (2023). Combing the camera-derived counts from the previous section with these haulout probabilities gave an abundance estimate per hour; we then took the mean per month.
A plot per month per season is given below, with vertical lines showing log-normal confidence intervals.
Code
#Take just between 8 and 12 inclusive (5 hours)#Take mean per monthdat7 <- dat6[(dat6$Hour >=8) & (dat6$Hour <=12),]dat7 <- dat7 %>%group_by (Year, Month, DayOfYear) %>%summarize(.groups ="keep", SealsHO =mean(SealsHO))dat7 <- dat7 %>%group_by (Year, Month) %>%summarize(.groups ="keep", sd =sd(SealsHO), SealsHO =mean(SealsHO), n =n())dat7$se <- dat7$sd /sqrt(dat7$n)dat7$cv <- dat7$se / dat7$SealsHOdat7$cv[is.na(dat7$cv)] <-0#Analysis of tag data#haulout probabilities for 8, 10, 12am in Feb and March 2022 for 5 seals,# taken from Fig 12 fo Ampella et al. 2023#Their data are more precise in time than ours, which is really just to the# nearest hour (although we could potentially do every 15 minutes)haul_out_p <-mean(c(0.67, 0.50, 0.30, 0.50, 0.32, 0.35))#Ampella reported seals most likely to haul out between 04:00 and 12:00dat8 <-data.frame(Year = dat7$Year, Month = dat7$Month,Nhat = dat7$SealsHO / haul_out_p,CV = dat7$cv)#Lognormal confidence intervals, from Buckland et al. 2021c.val <-exp(qnorm(0.975) *log(1+ dat8$CV^2)) dat8$LCL <- dat8$Nhat / c.valdat8$UCL <- dat8$Nhat * c.valdat8$Year_ind <- dat8$Year -2019+ifelse(dat8$Month >7, 1, 0)#Create information to make a plotmonth_labels <-c("Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May")month_ind <-c(4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3)year_color <-c("blue", "black", "red", "green", "brown")x_pos <- month_ind[dat8$Month] + (dat8$Year_ind -2) /10plot(x_pos, dat8$Nhat, xaxt ="n" ,col = year_color[dat8$Year_ind], ylim =range(c(dat8$UCL, dat8$LCL)),ylab ="Estimated abundance", xlab ="Month",pch =1) #20)axis(1, at =1:8, labels = month_labels)legend("topleft", legend =c("2019-20", "2020-21", "2021-22", "2022-23", "2023-24"),lty =1, pch =1, col = year_color)for (i in1:nrow(dat8)){lines(c(x_pos[i], x_pos[i]), c(dat8$LCL[i], dat8$UCL[i]), col = year_color[dat8$Year_ind[i]], lwd =2)if(i >1) {if(dat8$Year_ind[i] == dat8$Year_ind[i -1]){lines(c(x_pos[i -1], x_pos[i]), c(dat8$Nhat[i -1], dat8$Nhat[i]), lty =2,col = year_color[dat8$Year_ind[i]]) } }}
Note that this analysis treats each hour within each day as an independent replicate; an alternative would be to take daily means and then take the mean of these over each month. We also assume the haulout probability only depends on time of day, and ignore any effect on the variance estimates of covariance induced by using the same haulout estimates in all days; we also ignore uncertainty in the haulout estimates. However, the above is a useful initial estimate, with the caveat that the uncertainty estimates should be treated with caution.
If this preliminary analysis was included in our HDR report it would need a more full discussion of the uncertainty components that are not included.
3 Haulout probability from tag data
3.1 Reading in the tag data
Here we read in the tag data. Animals were all tagged in the Little Inlet area. This is the summary file, with one line for each tag:
Code
tag_dir <-"./Seal Tag Data 2018-2022"tag_summary <-read_excel(path =paste(tag_dir, "TagDataSummary.xlsx", sep ="/"))tag_summary$DeploymentStartDateTime <-parse_date_time(tag_summary$DeploymentStartDateTime, orders ="YmdHMS")tag_summary$DeploymentStopDateTime <-parse_date_time(tag_summary$DeploymentStopDateTime, orders ="YmdHMS")n_files <-nrow(tag_summary)#Deployment lat long looked to have issues, as well as deployment location,so agreed not to showknitr::kable(tag_summary[, 1:4])
TagID
TagModel
DeploymentStartDateTime
DeploymentStopDateTime
166449
SPLASH10- 309A
2018-02-04
2018-06-29
166450
SPOT-311A
2018-02-04
2018-05-23
166451
SPOT-311A
2018-02-04
2018-05-06
166452
SPOT-311A
2018-02-04
2018-05-26
166453
SPOT-311A
2018-02-06
2018-04-09
173502
SPOT-311A
2018-02-06
2018-06-22
173503
SPOT-311A
2018-02-08
2018-04-26
177411
SPLASH10-F 297A
2020-02-26
2020-07-12
177410
SPLASH10-F 297A
2020-03-02
2020-06-10
178255
SPLASH10-F 297A
2022-02-07
2022-06-09
178256
SPLASH10-F 297A
2022-02-08
2022-06-17
178257
SPLASH10-F 297A
2022-02-09
2022-07-18
177412
SPLASH10-F 297A
2022-02-15
2022-05-25
178258
SPLASH10-F 297A
2022-02-15
2022-06-04
Below we read in data for each tag in turn. We read in the tag location file and, for each tag plot the lat long locations, joining them with black dots. Some locations have low accuracy, so we use a simple filter which is just to use locations with quality codes 3, 2 and 1 (i.e., discarding 0, A, B and Z) – lines joining these are in red. (We also discard the first day of data as a way to partially avoid “tag-on” issues.) We show the Northern and Southern border of Virigina as black horizontal dashed lines. None of the seals spend any significant time south of Virginia, but almost all migrate North at some point in the season. Since we are interested in haulout probability just while in Virginia waters, we truncate the data at the point the seals move outside Virginia – this is shown with a large red circle on the plots (may not be shown on some plots as we truncate the plot at 38.5 degrees north, so if they moved beyond that for their first record North of Virginia waters, it will not be shown).
Code
states <-map_data("state")#North and South latitude boundaries of Virginiavirginia_lat <-c(38.027038704500775, 36.550603468154065)#Read in the histo filescolnames <-c("DeployID", "Date", paste0("Bin", 1:24))for(i in1:n_files){ histo_filename <-paste(tag_dir, tag_summary$TagID[i], paste(tag_summary$TagID[i], "Histos.csv", sep ="-"),sep ="/") histo_file <-read.csv(histo_filename) ind <- histo_file$HistType %in%c("1Percent", "Percent")#Save just percent rows, and just the deployID, date and hour columns histo_file <- histo_file[ind, colnames]#Make sure they are all numericfor(j in1:24){ histo_file[, colnames[j +2]] <-as.numeric(histo_file[, colnames[j +2]]) } histo_file$Date <-parse_date_time(histo_file$Date, order ="HMS dbY")#Remove records from before and including the day of deployment and# after and including the day the tag stopped ind <- (histo_file$Date >= tag_summary$DeploymentStartDateTime[i] +days(1)) & (histo_file$Date <= tag_summary$DeploymentStopDateTime[i] -days(1)) histo_file <- histo_file[ind, ]#Remove records from the tag after leaving Virginia waters#Read in the locations location_filename <-paste(tag_dir, tag_summary$TagID[i], paste(tag_summary$TagID[i], "Locations.csv", sep ="-"),sep ="/") location_file <-read.csv(location_filename)#Keep only dates after deployment and before the tag stopped location_file$Date <-parse_date_time(location_file$Date, order ="HMS dbY") ind <- (location_file$Date >= tag_summary$DeploymentStartDateTime[i] +days(1)) & (location_file$Date <= tag_summary$DeploymentStopDateTime[i] -days(1)) location_file <- location_file[ind, ]# #Plot these data# plot(location_file$Longitude, location_file$Latitude, # main = paste("Tag", tag_summary$TagID[i]),# ylim = range(c(virginia_lat, location_file$Latitude)), ylab = "Latitude",# xlab = "Longitude", type = "l")# #Add lines showing Virginia N and S# abline(h = virginia_lat[1], lty = 2, col = "blue")# abline(h = virginia_lat[2], lty = 2, col = "blue")#Keep only good quality locations - those with an error radius of < 10km ind <- location_file$Quality %in%c("3", "2", "1") location_file2 <- location_file[ind, ] %>%arrange(Date)# #Add to plot# lines(location_file2$Longitude, location_file2$Latitude, col = "red", lty = 2)#Find the first that is north of virginia_latitude (if any) ind <- location_file2$Latitude > virginia_lat[1]if(any(ind)){ first <-which.max(ind)# #Add a point record to show where this is# points(location_file2$Longitude[first], location_file2$Latitude[first])#remove histo records from this day or later ind <- histo_file$Date < location_file2$Date[first] histo_file <- histo_file[ind, ] }#Plot the data p <-ggplot(data = states) +geom_polygon(mapping =aes(x = long, y = lat, group = group), fill =NA, color ="grey") +geom_hline(yintercept = virginia_lat[1], linetype =2) +geom_hline(yintercept = virginia_lat[2], linetype =2) +geom_point(data = location_file, mapping =aes(x = Longitude, y = Latitude)) +coord_map("albers", lat0 =45.5, lat1 =29.5, xlim =c(-77, -74), #c(range(location_file$Longitude), ylim =c(36.5, 38.5)) +#range(c(location_file$Latitude, virginia_lat))) +geom_line(data = location_file2, mapping =aes(x = Longitude, y = Latitude, colour ="red"), show.legend =FALSE) +geom_point(data = location_file2[first, ], mapping =aes(x = Longitude, y = Latitude, colour ="red", size =2), show.legend =FALSE) +ggtitle(paste("Tag", tag_summary$TagID[i]))print(p)#Pivot into tag, date, hour format tmp <- histo_file %>%pivot_longer(cols =starts_with("Bin"), names_to ="hour", values_to ="perc")#Add to overall frameif(i ==1){ histo <- tmp } else { histo <-rbind(histo, tmp) }}
Code
#Do some post-processing#There is one NA in perc so remove that recordhisto <- histo[!is.na(histo$perc), ]#Add columns that will be usefulhisto$hour <-as.numeric(substr(histo$hour, 4, 6)) -1histo$month <-month(histo$Date)histo$DeployID <-as.factor(histo$DeployID)histo$yday <-yday(histo$Date)
We also read in the histogram files for each seal, which summarizes the proportion of each hour that the tag was dry, using the wet-dry sensor. The table below has one row for each tag. It shows the mean percentage dry, the start day of season of the record, the end day (while still in Virginia). Also shown is the number of days between start and end day, and the number of days that have records.
As can be seen, there are many missing days of data. This would be problematic if the satellite tags are less likely to transmit on days when the seal spends more time in the water – this will cause a strong overestimate in the haulout estimate. Correspondence with Moncia DeAngelis indicated that there was an issue with tag design/programming that caused the gaps, but that this seems unrelated to haulout probability.
With this caveat, we proceed with the analysis. Below is a histogram of the haulout percentages across all seals. Some heaping is observed at units of 10, which indicates that some tags were programmed to record only low-resolution data (which rounds to the nearest 10 plus 3 and 98). This perhaps could be investigated some more. Overall, hours are mostly fully dry or below about 25 percent dry.
Code
hist(histo$perc, breaks =seq(-0.5, 100.5, by =1), main ="Percentage haulout per hour",xlab ="Percentage")
Ampela et al. (2023) assumed a haulout event occured in an hour only if the tag was 100 percent dry for that hour, but here we take 50 percent as a cutoff. As can be seen from the above histogram, results will not be very sensitive to this and indeed any number between around 30 and 100 could be used. We chose 50 percent as this presumably balances the risk of under- and over-estimation of a haulout event during an hour (recalling that the count data are taken every 15 minutes).
Below we fit a binary GAM, modelling whether a seal is hauled out or not as a function of hour of the day (as a cyclic smooth), day of the year and also a random effect for animal. Note that hour of day here is in UTC, so we need to subtract 5 to arrive at local time (depending on local daylight saving this may be 4 hours less at some times of year).
Note also that this analysis could be extended to allow autocorrelation in haulout probability, by examining correlation in residuals and, if suitable, using the order 1 correlation as the rho parameter in the bam call (and also setting an AR.start for each tag record, accounting potentially for missing days also).
Code
#Count any hour with >50% dry as a haulouthisto$haul_out <- histo$perc >50#Specify that 0 and 24 are the same for hour by giving end knots (otherwise it joins 0 and 23)mod <-bam(haul_out ~s(hour, bs ="cc") +s(yday) +s(DeployID, bs ="re"), knots =list(hour =c(0, 24)), data = histo, family = binomial)summary(mod)
For hour, there is a peak at 1300 UTC, which coincides with that 0800 local time peak noted by Ampela et al. (2023). We found the low point to be 0300 UTC, i.e., 2200 local time while Ampela et al. (2023) for 2022 data found it to be 1600 local time. It would be of interest to investigate potential annual differences, although there is strong confounding between seal ID and year (i.e., we don’t have seals tagged across multiple seasons), so it likely is not possible to include both because one cannot distinguish a season effect from a between-seal difference.
Note that there are few records from later in the year (given the truncation that only records from while the seals were in Virginia were used):
Code
hist(histo$yday, main ="Number of hours of wet-dry data per day", xlab ="Day of year", breaks =seq(min(histo$yday) -0.5, max(histo$yday) +0.5, by =1))
The pattern in the above GAM with year is hard to interpret, and it would be of interest to include variables such as moon phase, for example. In the absence of this we constrain the GAM to have fewer degrees of freedom for that smooth, to make a more interpretable pattern:
Code
mod2 <-bam(haul_out ~s(hour, bs ="cc") +s(yday, k =3) +s(DeployID, bs ="re"), knots =list(hour =c(0, 24)), data = histo, family = binomial)summary(mod2)
We now see a general decline in haulout probability over the year.
Here we make predictions of haulout probability by month and time of day. We only have tagging data for February, March and April so we only make predictions over those months. For April (starting on day-of-year 91) we have rather few data points, so we do not attempt that prediction. (In future versions of this analysis we could perhaps predict by day rather than aggregating across months.)
Code
newdata <-data.frame(hour =0, yday =c(1:28+31, 1:31+ (31+28)),month =c(rep(2, 28), rep(3, 31)))pred_haul_out <-matrix(0, 24, 2)for(i in1:24){ newdata$hour <- i -1#Set random effect to 0 in prediction tmp <-predict(mod2, newdata = newdata, type ="response", exclude ="s(DeployID)", newdata.guaranteed =TRUE) tmp2 <-aggregate(tmp, by =list(newdata$month), mean) pred_haul_out[i, ] <- tmp2$x}plot(0:23, pred_haul_out[, 1], ylim =range(pred_haul_out), xlab ="hour (UTC)",ylab ="p(haulout)", type ="l")lines(0:23, pred_haul_out[, 2], col ="red")legend("topleft", legend =c("Feb", "March"), lty =c(1, 1), col =c("black", "red"))
Plotting the same in local time:
Code
lhour_cols <-c(6:24, 1:5)plot(0:23, pred_haul_out[lhour_cols, 1], ylim =range(pred_haul_out), xlab ="hour (local)",ylab ="p(haulout)", type ="l")lines(0:23, pred_haul_out[lhour_cols, 2], col ="red")legend("topright", legend =c("Feb", "March"), lty =c(1, 1), col =c("black", "red"))
Note that there is an alternative source of information about haulout events available for the SPLASH-10 tags deployed since 2020 – these have a haulout file that records the start and end time of haulout events, and also gives an indication of any missing haul out events (haulout ID cycles from 0 to 15 so missing numbers mean missing haulout events). Results from this could potentially be compared with the above, at least for the 7 SPASH-10 tags used.
4 Combined analysis to estimate abundance
Here we use count and tagging data from 0800 to 1600 local time (before and after that the amount of available data drops off, due to change in sunrise-sunset times over the season). We also only use data for February and March as that is all we predict haulout probability for (in a future analysis we should consider predicting by day rather than month, and could therefore use more of the available data).
We make no attempt at variance estimation here - we will want to build in uncertainty from the haulout analysis, and also determine what an appropriate sampling unit is for the counts.
Code
#Take just between 8 and 16 inclusive (9 hours)#Take mean per monthcdat <- dat6[(dat6$Hour >=8) & (dat6$Hour <=16),]cdat <- cdat[(cdat$Month %in%c(2, 3)), ]cdat <- cdat %>%group_by (Year, Month, Hour) %>%summarize(.groups ="keep", SealsHO =mean(SealsHO))#Make predicted haul out data frame for joiningcpred_haul_out <-data.frame(Hour =c(0:23, 0:23), Month =c(rep(2, 24), rep(3, 24)),haul_out_p =c(pred_haul_out[lhour_cols, 1], pred_haul_out[lhour_cols, 2]))cdat2 <-left_join(cdat, cpred_haul_out, by =c("Month", "Hour"))cdat2$Nhat <- cdat2$SealsHO / cdat2$haul_out_p#Combine across hours to produce similar plot to previous onecdat3 <- cdat2 %>%group_by(Year, Month) %>%summarize(.groups ="keep", Nhat =mean(Nhat))cdat3$Year_ind <- cdat3$Year -2019+ifelse(cdat3$Month >7, 1, 0)#Create information to make a plotmonth_labels <-c("Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May")month_ind <-c(4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3)year_color <-c("blue", "black", "red", "green", "brown")x_pos <- month_ind[cdat3$Month] #+ (cdat3$Year_ind - 2) / 10plot(x_pos, cdat3$Nhat, xaxt ="n" ,col = year_color[cdat3$Year_ind],ylab ="Estimated abundance", xlab ="Month",pch =1) #20)axis(1, at =1:8, labels = month_labels)legend("topright", legend =c("2019-20", "2020-21", "2021-22", "2022-23", "2023-24"),lty =1, pch =1, col = year_color)for (i in1:nrow(cdat3)){if(i >1) {if(cdat3$Year_ind[i] == cdat3$Year_ind[i -1]){lines(c(x_pos[i -1], x_pos[i]), c(cdat3$Nhat[i -1], cdat3$Nhat[i]), lty =2,col = year_color[cdat3$Year_ind[i]]) } }}
Code
#Take mean across years and months to see if there's an hourly pattern#Needs more thinking before implemented however#cdat4 <- cdat2 %>%# group_by(Hour) %>%# summarize(.groups = "keep", Nhat = mean(Nhat))#plot(cdat4$Hour, cdat4$Nhat, type = "l")
Estimates are generally a little higher than the exploratory analysis shown in section Section 2.3. Correcting for decreased haulout probability in March compared with February has increased the latter numbers somewhat.
Note
One check that needs to be made is to see whether estimated population abundance changes over hour of day – this would imply that the hourly haulout correction has not been successful.
5 Discussion
There are a number of oustanding items to address with this analysis.
For the haulout counts
Models could include additional covariates such as tide - although this is not needed here and is the subject of Michelle Guin’s thesis
Variance estimation needs to be addressed. Options include a non-parametric bootstrap, or the delta method for combining camera and tagging data outputs.
One issue if we went to a finer time scale (next point) would be serial correlation in estimates over time. In general, autocorrelation is an issue that needs to be considered.
The temporal unit could be day rather than month
For the tag analysis
We are really just interested in haulouts that occur where the animals can be seen from the cameras, so we need to check where haulouts are taking place by cross-referending the animal locations with the wet-dry tag results, if possible
Other covariates could be incorporated into the analysis – these should be covariates that we can use to predict haul-out probability at the camera sites, so perhaps could be tide at those sites, time relative to sunrise, etc - i.e., the same covariates we use for the camera data
The haulout file could be mined for the SPLASH-10 tags, and results compared with haulout probabilities from the hourly data.
For the combined analysis
If the hauled-out probability by hour is estimated well then there should be no hourly pattern in the corrected count data; this can be checked.
Assumptions will need to be discussed:
Hourly tag data gaps are missing at random with respect to haulout time
Cameras cover all haulout locations at both the Eastern Shore and CBBT survey sites
Animals do not haul out elsewhere while in Virginia
This assumption could partly be examined by combining the wet-dry data and location data from the tags – although these are at different tempral resolutions
Counts from cameras have no false positives or negative
Literature cited
Ampela, K., J. Bort, R. DiGiovanni, Jr., A. Deperte, D. Jones, and D. Rees. 2023. Seal Tagging and Tracking in Virginia: 2018-2022. Prepared for U.S. Fleet Forces Command. Submitted to Naval Facilities Engineering Systems Command Atlantic, Norfolk, Virginia, under Contract No. N62470-15-8006, Task Order 19F4147, issued to HDR, Inc., Virginia Beach, Virginia. March 2023.
Guins, M.S., and D.R. Rees. 2025. Pinniped Time-lapse Camera Surveys in Chesapeake Bay and Eastern Shore, Virginia: 2022-2024. Final Report. Prepared for U.S. Fleet Forces Command, Norfolk, Virginia. January 2025.
Sharples, R.J., Mackenzie, M.L. and Hammond, P.S. 2009. Estimating seasonal abundance of a central place forager using counts and telemetry data. Marine Ecology Progress Series 378: 289–298.