Estimating harbor seal abundance from Virginia camera trap surveys
Author
Len Thomas
Published
September 29, 2024
Acknowledgements
Thank-you to Deanna Rees and Michelle Guins for discussions and suggestions; thank-you to Kristen Ampela for providing the seal tag data. This work was funded by the US Navy NAVFAC Atlantic via a contract with HDR Inc.
1 Introduction
The goal of this work is to estimate population size of harbor seals in Virginia waters using remotely operated time-lapse cameras placed at haulout locations in southeastern Virginia. Details of the camera locations and methods used to count seals from the photographs are detailed in Guins et al. (2023) and citations therein. Only hauled out seals are counted, and so an important component of estimating total population size is to estimate the proportion of seals hauled out. Here, this is derived from an analysis of seals in Virginia fitted with transponder tags that record temporal information about whether the tag is wet or dry. Details of the tagging, tags and data collection are given by Ampela et al. (2023). We note that this overall approach is similar in spirit to that of Sharples et al. (2009), although they used ship-based counts rather than camera data, and the exact analysis approach were somewhat different.
This document is in four parts. First we provide an analysis of the camera trap data. We then provide an analysis of the tag data. Third we combine the two to produce estimates of population size. We close with a discsusion of the results and future potential work.
This document is written in Quarto - it is a “live” document containing both text and analysis code written in R. It is designed to form the basis for a final report on this work that may be coauthored by others including those currently listed in the Acknowledgements. Currently details of analyses are included that will likely not appear (or perhaps be Appendices) in the final report.
2 Haul-out counts from camera trap data
2.1 Reading in and aggregating data
We read in data from the Seal Camera Trap Analysis Microsoft Access database. We use data from both the Little Inlet and Chesapeake Bay Bridge-Tunnel (CBBT) sites.
Code
#Base year for Julian datesorigin <-as.Date("2020-01-01")db <-file.path("./Seal Camera Trap Analysis (2).accdb")con <-odbcConnectAccess2007(db)#This lists what tables are there:#sqlTables(con)$TABLE_NAME#Read in Little Inlet datatable_names <-c("TimelapseData - Little Inlet 2019/2020 (S1)","TimelapseData - Little Inlet 2020/2021 (S2)","TimelapseData - Little Inlet 2021/2022 (S3)","TimelapseData - Little Inlet 2022/2023 (S4)")for(i in1:length(table_names)){ tmp <-sqlFetch(con, table_names[i])#Just checking TotalHO is indeed the sum of the sites#Note that there are different column names in different years, and that in# year 4 it include UnID - so presumably an unidentified site?#TotalHO2 <- tmp$HO_A + tmp$HO_B + tmp$HO_C1 + tmp$HO_C2 + tmp$HO_C3 + # tmp$HO_E1 + tmp$HO_E2 + tmp$HO_E3 + tmp$HO_F + tmp$HO_UnID#Note - I am using Total columns -- need to check that corresponds with animals# in wter and not those on land#Select just the columns I'll need, and rename select_cols <-c(1, 5, 7, 35, 9, 8, 22, 26, 28, 30)if(i ==2) select_cols <-c(1, 6, 8, 37, 10, 9, 24, 28, 29, 32)if(i ==3) select_cols <-c(1, 5, 7, 25, 9, 8, 22, 28, 30, 32)if(i ==4) select_cols <-c(1, 5, 7, 8, 11, 9, 22) tmp <- tmp[, select_cols]#No tide, wind or AirTemp for i == 4, so add filler columnsif (i ==4) {tmp$Tide =NA; tmp$Wind <-NA; tmp$AirTemp <-NA}#Note I assume the GreySeal column is GreySealsHO and that there is no count of# grey seals in the water (there is for CBBT)colnames(tmp) <-c("ID", "CameraID", "DateTime", "SealsHO", "SealsIW", "Vessel", "GreySealsHO", "Tide", "Wind","AirTemp")#Filter out values with no total (I think it just occurs for i==1) tmp <- tmp[!is.na(tmp[, "SealsHO"]), ]#Filter out values with no CameraID (not sure why this happens but it does) tmp <- tmp[!is.na(tmp[, "CameraID"]), ]if(i ==1) { dat <- tmp } else { dat <-rbind(dat, tmp) }}dat$Site <-"LI"#Read in CBBT ddatatable_names <-c("TimelapseData - CBBT 2019/2020 (S1)", "TimelapseData - CBBT 2020/2021 (S2)","TimelapseData - CBBT 2021/2022 (S3)","TimelapseData - CBBT 2022/2023 (S4)")for(i in1:length(table_names)){ tmp <-sqlFetch(con, table_names[i])if(i ==1) #Yr 1 - filter out values with no Record IDs tmp <- tmp[!is.na(tmp[, "Record ID"]), ]#Remove those that are due to be deleted tmp <- tmp[tmp$DeleteFlag ==0, ]#Select just the columns I'll need, and rename select_cols <-c(1, 6, 9, 10, 11, 12, 18, 22, 23, 26)if(i ==2) select_cols <-c(1, 6, 5, 7, 8, 9, 21, 11, 12, 15)if(i ==3) select_cols <-c(1, 6, 5, 7, 8, 11, 10, 14, 15, 18)if(i ==4) select_cols <-c(1, 5, 6, 7, 8, 12, 10) tmp <- tmp[, select_cols]#No tide, wind or AirTemp for i == 4, so add filler columnsif (i ==4) {tmp$Tide =NA; tmp$Wind <-NA; tmp$AirTemp <-NA}colnames(tmp) <-c("ID", "CameraID", "DateTime", "SealsHO", "SealsIW", "Vessel", "GreySealsHO", "Tide", "Wind","AirTemp")if(i ==1) { dat2 <- tmp } else { dat2 <-rbind(dat2, tmp) }}dat2$Site <-"CBBT"#Combine LI and CBBTdat3 <-rbind(dat, dat2)dat3$Julian <-round(as.numeric(julian(dat3$DateTime, origin = origin)))#Create columns for Julian day of year, and also for hour, year and dateHourdat3$Hour <-hour(dat3$DateTime)dat3$Year <-year(dat3$DateTime)#Make a unique date-hour variabledat3$DateHour <-paste(dat3$Year, month(dat3$DateTime), day(dat3$DateTime), dat3$Hour, sep ="-")
Data are available every 15 minutes in most cases, but here we group by taking the mean each hour.
Code
#Take the mean over each hour for each cameradat4 <- dat3 %>%group_by(DateHour, CameraID) %>%summarize(.groups ="keep", SealsHO =mean(SealsHO), n =n())
Code
#Worth flagging cases with n>4 - means more than 4 counts per hour - maybe some# issue here - sems to happen in about 5% of casesind <- dat4$n >4#There are some where it's between 1 and 8 -- I guess this is where # there may have been 5 15-minute periods in an hour# hist(dat4$n[dat4$n > 0 & dat4$n < 12])#Perhaps some of the rest are duplicates -- needs checking at some point
There are 90247 records (i.e., camera-hours with counts). Note that some have more than 4 records, which shouldn’t happen if there are counts every 15 minutes. There are 2483 such records and this needs investigating at some point.
Here we sum over cameras, on the assumption that cameras do not catch the same seals. We then take only hours where all cameras were operating.
Code
#Sum over camerasdat5 <- dat4 %>%group_by(DateHour) %>%summarize(SealsHO =sum(SealsHO), n =n())#Take only hours where all cameras were operating#This assumes the maximum number of cameras operating is the expected number --# i.e., that cameras weren't added in different years (could group by year to check)max_n_cameras <-max(dat5$n)dat6 <- dat5[dat5$n == max_n_cameras, ]dat6$DateTime <-parse_date_time(dat6$DateHour, order ="ymdH")dat6$Hour <-hour(dat6$DateTime)dat6$Month <-month(dat6$DateTime)dat6$Year <-year(dat6$DateTime)
This results in 4929 hours of data in total.
2.2 Temporal patterns in haul-out counts
As initial exploration, we fit a GAM with hour of day as a smooth (assuming for now that the mean haul-out count is normally distributed).
Code
mod <-bam(SealsHO ~s(Hour), data = dat6)plot(mod, shift =mean(dat6$SealsHO), ylab ="Total hauled-out seals")
This suggests taking hours 8-12 as those have the maximum count. It could be that visibility is an issue at the beginning and end of the day. Also, note that there are fewer records (i.e., hours where there are counts from all cameras) from the beginning and end of the day - here is a tabulation of the number of counts per hour:
Code
knitr::kable(table(dat6$Hour), col.names =c("Hour of day", "n counts"))
Hour of day
n counts
6
80
7
349
8
446
9
450
10
456
11
457
12
457
13
458
14
459
15
461
16
459
17
352
18
45
We also fit a GAM to day of the survey season, where 0 is Julian day 200 (again, assuming a normal distribution on count, so predictions can go negative):
Code
#Get day of year dat6$DayOfYear <-yday(dat6$DateTime)#Transform so 1st day of surveying is day 0 - # may need to come back and check how it deals with leap yearsoffset <-min(dat6$DayOfYear[dat6$DayOfYear >200])dat6$DayOfSeason <-ifelse(dat6$DayOfYear >200, dat6$DayOfYear - offset, dat6$DayOfYear +366- offset)mod <-bam(SealsHO ~s(DayOfSeason), data = dat6)plot(mod, shift =mean(dat6$SealsHO))
2.3 Initial estimates of abundance
To obtain initial estimates of abundance for the NAVFAC meeting in April 2024, we used data from between 8am and 12pm inclusive (i.e., 5 hours), and haul-out probablities for 8, 10 and 12 in February and March 2022 for 5 seals read from Figure 12 of Ampella et al. (2023). Combing these gave an abundance estimate per hour; we then took the mean per month.
A plot per month per year is given below, with vertical lines showing log-normal confidence intervals.
Code
#Take just between 8 and 12 inclusive (5 hours)#Take mean per monthdat7 <- dat6[(dat6$Hour >=8) & (dat6$Hour <=12),]dat7 <- dat7 %>%group_by (Year, Month, DayOfYear) %>%summarize(.groups ="keep", SealsHO =mean(SealsHO))dat7 <- dat7 %>%group_by (Year, Month) %>%summarize(.groups ="keep", sd =sd(SealsHO), SealsHO =mean(SealsHO), n =n())dat7$se <- dat7$sd /sqrt(dat7$n)dat7$cv <- dat7$se / dat7$SealsHOdat7$cv[is.na(dat7$cv)] <-0#Analysis of tag data#Haul-out probabilities for 8, 10, 12am in Feb and March 2022 for 5 seals,# taken from Fig 12 fo Ampella et al. 2023#Their data are more precise in time than ours, which is really just to the# nearest hour (although we could potentially do every 15 minutes)haul_out_p <-mean(c(0.67, 0.50, 0.30, 0.50, 0.32, 0.35))#Ampella reported seals most likely to haul out between 04:00 and 12:00dat8 <-data.frame(Year = dat7$Year, Month = dat7$Month,Nhat = dat7$SealsHO / haul_out_p,CV = dat7$cv)#Lognormal confidence intervals, from Buckland et al. 2021c.val <-exp(qnorm(0.975) *log(1+ dat8$CV^2)) dat8$LCL <- dat8$Nhat / c.valdat8$UCL <- dat8$Nhat * c.valdat8$Year_ind <- dat8$Year -2019+ifelse(dat8$Month >7, 1, 0)#Create information to make a plotmonth_labels <-c("Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May")month_ind <-c(4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3)year_color <-c("blue", "black", "red", "green")x_pos <- month_ind[dat8$Month] + (dat8$Year_ind -2) /10plot(x_pos, dat8$Nhat, xaxt ="n" ,col = year_color[dat8$Year_ind], ylim =range(c(dat8$UCL, dat8$LCL)),ylab ="Estimated abundance", xlab ="Month",pch =1) #20)axis(1, at =1:8, labels = month_labels)legend("topleft", legend =c("2019-20", "2020-21", "2021-22", "2022-23"),lty =1, pch =1, col = year_color)for (i in1:nrow(dat8)){lines(c(x_pos[i], x_pos[i]), c(dat8$LCL[i], dat8$UCL[i]), col = year_color[dat8$Year_ind[i]], lwd =2)if(i >1) {if(dat8$Year_ind[i] == dat8$Year_ind[i -1]){lines(c(x_pos[i -1], x_pos[i]), c(dat8$Nhat[i -1], dat8$Nhat[i]), lty =2,col = year_color[dat8$Year_ind[i]]) } }}
Note that this analysis treats each hour within each day as an independent replicate; an alternative would be to take daily means and then take the mean of these over each month. We also assume the haul-out probability only depends on time of day, and ignore covariance induced by using the same haul-out estimates in all days; we also ignore uncertainty in the haul-out estimates. However, the above is a useful initial estimate.
3 Haul-out probability from tag data
3.1 Reading in the tag data
Here we read in the tag data. This is the summary file, with one line for each tag:
Code
tag_dir <-"./Seal Tag Data 2018-2022"tag_summary <-read_excel(path =paste(tag_dir, "TagDataSummary.xlsx", sep ="/"))tag_summary$DeploymentStartDateTime <-parse_date_time(tag_summary$DeploymentStartDateTime, orders ="YmdHMS")tag_summary$DeploymentStopDateTime <-parse_date_time(tag_summary$DeploymentStopDateTime, orders ="YmdHMS")n_files <-nrow(tag_summary)knitr::kable(tag_summary)
TagID
TagModel
DeploymentStartDateTime
DeploymentStopDateTime
DeploymentLocation
DeploymentLatitude
DeploymentLongitude
166449
SPLASH10- 309A
2018-02-04
2018-06-29
Little Inlet, VA
37.18174
-75.84099
166450
SPOT-311A
2018-02-04
2018-05-23
Little Inlet, VA
37.18174
-75.84099
166451
SPOT-311A
2018-02-04
2018-05-06
Little Inlet, VA
37.18178
-75.84099
166452
SPOT-311A
2018-02-04
2018-05-26
Magothy Bay, VA
37.17702
-75.94940
166453
SPOT-311A
2018-02-06
2018-04-09
Magothy Bay, VA
37.17702
-75.94940
173502
SPOT-311A
2018-02-06
2018-06-22
Magothy Bay, VA
37.17702
-75.94940
173503
SPOT-311A
2018-02-08
2018-04-26
Little Inlet, VA
37.18198
-75.84065
177411
SPLASH10-F 297A
2020-02-26
2020-07-12
Smith Island, VA
37.11751
-75.84932
177410
SPLASH10-F 297A
2020-03-02
2020-06-10
Mink Island Bay, VA
37.17762
-75.84961
178255
SPLASH10-F 297A
2022-02-07
2022-06-09
Little Inlet, VA
37.18174
-75.84099
178256
SPLASH10-F 297A
2022-02-08
2022-06-17
Little Inlet, VA
37.18174
-72.84099
178257
SPLASH10-F 297A
2022-02-09
2022-07-18
Little Inlet, VA
37.18174
-75.84099
177412
SPLASH10-F 297A
2022-02-15
2022-05-25
Little Inlet, VA
37.18174
-75.84099
178258
SPLASH10-F 297A
2022-02-15
2022-06-04
Little Inlet, VA
37.18174
-75.84099
Below we read in data for each tag in turn. We read in the tag location file and, for each tag plot the lat long locations, joining them with a black line. Some locations have low accuracy, so we use a simple filter which is just to use locations with quality codes 3, 2, 1 and A (i.e., discarding B and Z) – lines joining these are in dashed red. (We also discard the first day of data as a way to partially avoid “tag-on” issues.) We also show the Northern and Southern border of Virigina as blue horizontal dashed lines. None of the seals spend any significant time south of Virginia, but almost all migrate North at some point in the season. Since we are interested in haul-out probability just while in Virginia waters, we truncate the data at the point the seals move outside Virginia – this is shown with a cricle on the plots.
Code
#North and South latitude boundaries of Virginiavirginia_lat <-c(38.027038704500775, 36.550603468154065)#Read in the histo filescolnames <-c("DeployID", "Date", paste0("Bin", 1:24))for(i in1:n_files){ histo_filename <-paste(tag_dir, tag_summary$TagID[i], paste(tag_summary$TagID[i], "Histos.csv", sep ="-"),sep ="/") histo_file <-read.csv(histo_filename) ind <- histo_file$HistType %in%c("1Percent", "Percent")#Save just percent rows, and just the deployID, date and hour columns histo_file <- histo_file[ind, colnames]#Make sure they are all numericfor(j in1:24){ histo_file[, colnames[j +2]] <-as.numeric(histo_file[, colnames[j +2]]) } histo_file$Date <-parse_date_time(histo_file$Date, order ="HMS dbY")#Remove records from before and including the day of deployment and# after and including the day the tag stopped ind <- (histo_file$Date >= tag_summary$DeploymentStartDateTime[i] +days(1)) & (histo_file$Date <= tag_summary$DeploymentStopDateTime[i] -days(1)) histo_file <- histo_file[ind, ]#Remove records from the tag after leaving Virginia waters#Read in the locations location_filename <-paste(tag_dir, tag_summary$TagID[i], paste(tag_summary$TagID[i], "Locations.csv", sep ="-"),sep ="/") location_file <-read.csv(location_filename)#Keep only dates after deployment and before the tag stopped location_file$Date <-parse_date_time(location_file$Date, order ="HMS dbY") ind <- (location_file$Date >= tag_summary$DeploymentStartDateTime[i] +days(1)) & (location_file$Date <= tag_summary$DeploymentStopDateTime[i] -days(1)) location_file <- location_file[ind, ]#Plot these dataplot(location_file$Longitude, location_file$Latitude, main =paste("Tag", tag_summary$TagID[i]),ylim =range(c(virginia_lat, location_file$Latitude)), ylab ="Latitude",xlab ="Longitude", type ="l")#Add lines showing Virginia N and Sabline(h = virginia_lat[1], lty =2, col ="blue")abline(h = virginia_lat[2], lty =2, col ="blue")#Keep only good quality locations - those with an error radius of < 10km ind <- location_file$Quality %in%c("3", "2", "1", "A") #location_file$Error.radius <= 10000 location_file <- location_file[ind, ]#Add to plotlines(location_file$Longitude, location_file$Latitude, col ="red", lty =2)#Find the first that is north of virginia_latitude (if any) ind <- location_file$Latitude > virginia_lat[1]if(any(ind)){ first <-which.max(ind)#Add a point record to show where this ispoints(location_file$Longitude[first], location_file$Latitude[first])#remove histo records from this day or later ind <- histo_file$Date < location_file$Date[first] histo_file <- histo_file[ind, ] }#Pivot into tag, date, hour format tmp <- histo_file %>%pivot_longer(cols =starts_with("Bin"), names_to ="hour", values_to ="perc")#Add to overall frameif(i ==1){ histo <- tmp } else { histo <-rbind(histo, tmp) }}
Code
#Do some post-processing#There is one NA in perc so remove that recordhisto <- histo[!is.na(histo$perc), ]#Add columns that will be usefulhisto$hour <-as.numeric(substr(histo$hour, 4, 6)) -1histo$month <-month(histo$Date)histo$DeployID <-as.factor(histo$DeployID)histo$yday <-yday(histo$Date)
We also read in the histogram files for each seal, which summarizes the proportion of each hour that the tag was dry, using the wet-dry sensor. The table below has one row for each tag. It shows the mean percentage dry, the start day of year of the record, the end day (while still in Virginia). Also shown is the number of days between start and end day, and the number of days that have records.
As can be seen, there are many missing days of data. This is potentially problematic if the satellite tags are less likely to transmit on days when the seal spends more time in the water – this will cause a strong overestimate in the haul-out estiamte.
With this caveat, we proceed with the analysis. Below is a histogram of the haul-out percentages across all seals. Some heaping is observed at units of 10, which indicates that some tags were programmed to record only low-resolution data (which rounds to the nearest 10 plus 3 and 98). This could be investigated some more. Overall, hours are mostly fully dry or below about 25 percent dry.
Code
hist(histo$perc, breaks =seq(-0.5, 100.5, by =1), main ="Percentage haulout per hour",xlab ="Percentage")
Ampela et al. (2023) assumed a haul-out occured in an hour only if the tag was 100 percent dry for that hour, but here we take 50 percent as a cutoff. As can be seen from the above, results will not be very sensitive to this and indeed any number between around 30 and 100 could be used. We chose 50 percent as this presumably balances the risk of under- and over-estimation of haul-out during the hour (recalling that the count data is taken every 15 minutes).
Below we fit a binary GAM, modelling whether hauled out or not as a function of hour of the day (as a cyclic smooth), day of the year and also a random effect for animal. Note that hour of day here is in UTC, so we need to subtract 5 to arrive at local time (depending on local daylight saving this may be 4 at some times of year).
Note also that this analysis could be extended to allow autocorrelation in haulout probability, by examining correlation in residuals and, if suitable, using the order 1 correlation as the rho parameter in the bam call (and also setting an AR.start for each tag record, accounting potentially for missing days also).
Code
#Count any hour with >50% dry as a haul-outhisto$haul_out <- histo$perc >50#Specify that 0 and 24 are the same for hour by giving end knots (otherwise it joins 0 and 23)mod <-bam(haul_out ~s(hour, bs ="cc") +s(yday) +s(DeployID, bs ="re"), knots =list(hour =c(0, 24)), data = histo, family = binomial)summary(mod)
For hour, there is a peak at 13:00 UTC, which coincides with that 8am local time peak noted by Ampela et al. (2023). We found the low point to be 03:00 UTC, i.e., 10pm local time while Ampela et al. for 2022 data found it to be 4pm local time. It would be of interest to investiate potential annual differences, although there is strong confounding between seal ID and year, so it likely is not possible to include both.
Note that there are few records from later in the year (given the truncation that only records from while the seals were in Virginia were used):
Code
hist(histo$yday, main ="Number of hours of wet-dry data per day", xlab ="Day of year", breaks =seq(min(histo$yday) -0.5, max(histo$yday) +0.5, by =1))
The pattern in the above GAM with year is hard to interpret, and it would be of interest to include variables such as moon phase, for example. In the absence of this we constrain the GAM to have fewer degrees of freedom for that smooth, to make a more interpretable pattern:
Code
mod2 <-bam(haul_out ~s(hour, bs ="cc") +s(yday, k =3) +s(DeployID, bs ="re"), knots =list(hour =c(0, 24)), data = histo, family = binomial)summary(mod2)
We now see a general decline in haul-out probablity over the year.
Here we make predictions of haul-out probablity by month and time of day. We only have data for February, March and April so we only make predictions over those months. For April (starting on day-of-year 91) we have rather few data points, so we do not attempt that prediction. (In future versions of this analysis we could perhaps predict by day rather than aggregating across months.)
Code
newdata <-data.frame(hour =0, yday =c(1:28+31, 1:31+ (31+28)),month =c(rep(2, 28), rep(3, 31)))pred_haul_out <-matrix(0, 24, 2)for(i in1:24){ newdata$hour <- i -1#Set random effect to 0 in prediction tmp <-predict(mod2, newdata = newdata, type ="response", exclude ="s(DeployID)", newdata.guaranteed =TRUE) tmp2 <-aggregate(tmp, by =list(newdata$month), mean) pred_haul_out[i, ] <- tmp2$x}plot(0:23, pred_haul_out[, 1], ylim =range(pred_haul_out), xlab ="hour (UTC)",ylab ="p(haulout)", type ="l")lines(0:23, pred_haul_out[, 2], col ="red")legend("topleft", legend =c("Feb", "March"), lty =c(1, 1), col =c("black", "red"))
Plotting the same in local time:
Code
lhour_cols <-c(6:24, 1:5)plot(0:23, pred_haul_out[lhour_cols, 1], ylim =range(pred_haul_out), xlab ="hour (local)",ylab ="p(haulout)", type ="l")lines(0:23, pred_haul_out[lhour_cols, 2], col ="red")legend("topright", legend =c("Feb", "March"), lty =c(1, 1), col =c("black", "red"))
Note that there is an alternative source of information about haul-outs available for the SPLASH-10 tags deployed since 2020 – these have a haul-out file that records the start and end time of haulouts, and also gives an indication of any missing haulouts (haulout ID cycles from 0 to 15 so missing numbers mean missing haulouts). Results from this could potentially be compared with the above, at least for the 7 SPASH-10 tags used.
4 Combined analysis
Here we use data from 8am to 4pm local time (before and after that the amount of available data drops off, possibly because of visibility issues). We also only use data for February and March as that is all we predict haul-out probability for (in a future analysis we should consider predicting by day rather than month, and could therefore use more of the available data).
We make no attempt at variance estimation here - we will want to build in uncertainty from the haul-out analysis, and also determine what an appropriate sampling unit is for the counts.
Code
#Take just between 8 and 16 inclusive (9 hours)#Take mean per monthcdat <- dat6[(dat6$Hour >=8) & (dat6$Hour <=16),]cdat <- cdat[(cdat$Month %in%c(2, 3)), ]cdat <- cdat %>%group_by (Year, Month, Hour) %>%summarize(.groups ="keep", SealsHO =mean(SealsHO))#Make predicted haul out data frame for joiningcpred_haul_out <-data.frame(Hour =c(0:23, 0:23), Month =c(rep(2, 24), rep(3, 24)),haul_out_p =c(pred_haul_out[lhour_cols, 1], pred_haul_out[lhour_cols, 2]))cdat2 <-left_join(cdat, cpred_haul_out, by =c("Month", "Hour"))cdat2$Nhat <- cdat2$SealsHO / cdat2$haul_out_p#Combine across hours to produce similar plot to previous onecdat3 <- cdat2 %>%group_by(Year, Month) %>%summarize(.groups ="keep", Nhat =mean(Nhat))cdat3$Year_ind <- cdat3$Year -2019+ifelse(cdat3$Month >7, 1, 0)#Create information to make a plotmonth_labels <-c("Oct", "Nov", "Dec", "Jan", "Feb", "Mar", "Apr", "May")month_ind <-c(4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3)year_color <-c("blue", "black", "red", "green")x_pos <- month_ind[cdat3$Month] #+ (cdat3$Year_ind - 2) / 10plot(x_pos, cdat3$Nhat, xaxt ="n" ,col = year_color[cdat3$Year_ind],ylab ="Estimated abundance", xlab ="Month",pch =1) #20)axis(1, at =1:8, labels = month_labels)legend("topright", legend =c("2019-20", "2020-21", "2021-22", "2022-23"),lty =1, pch =1, col = year_color)for (i in1:nrow(cdat3)){if(i >1) {if(cdat3$Year_ind[i] == cdat3$Year_ind[i -1]){lines(c(x_pos[i -1], x_pos[i]), c(cdat3$Nhat[i -1], cdat3$Nhat[i]), lty =2,col = year_color[cdat3$Year_ind[i]]) } }}
Code
#Take mean across years and months to see if there's an hourly pattern#Needs more thinking before implemented however#cdat4 <- cdat2 %>%# group_by(Hour) %>%# summarize(.groups = "keep", Nhat = mean(Nhat))#plot(cdat4$Hour, cdat4$Nhat, type = "l")
Estimates are generally a little higher than the previous exploratory analysis. Correcting for decreased haul-out probability in March compared with February has brought the latter numbers up a somewhat.
One check that needs to be made is to see whether estimated populaton abundance changes over hour of day – this would imply that the hourly haul-out correction has not been successful.
5 Discussion
There are a number of oustanding items to address with this analysis.
For the haul-out counts
Models could include additional covariates such as tide - although this is not needed here and is the subject of Michelle’s thesis
Variance estimation needs addressed
The temporal unit could be day rather than month
For the tag analysis
Other covariates could be incorporated into the analysis
Autocorrelation could be accounted for
Variance estimation needs addressed
The haulout file could be mined for the SPLASH-10 tags, and results compared with haul-out probabilities from the hourly data.
For the combined analysis
If the hauled-out probability by hour is estimated well then there should be no hourly pattern in the corrected count data; this can be checked.
Assumptions will need to be discussed:
Hourly tag data gaps are missing at random with respect to haulout time
Cameras cover all haulouts
Counts from cameras have no false positives or negative
6 Literature cited
Ampela, K., J. Bort, R. DiGiovanni, Jr., A. Deperte, D. Jones, and D. Rees. 2023. Seal Tagging and Tracking in Virginia: 2018-2022. Prepared for U.S. Fleet Forces Command. Submitted to Naval Facilities Engineering Systems Command Atlantic, Norfolk, Virginia, under Contract No. N62470-15-8006, Task Order 19F4147, issued to HDR, Inc., Virginia Beach, Virginia. March 2023.
Guins, M., Rees, D., and A. Lay. 2023. Pinniped Time-lapse Camera Surveys in Southern Chesapeake Bay and Eastern Shore, Virginia: 2019-2023. Final Report. Prepared for U.S. Fleet Forces Command, Norfolk, Virginia. July 2023.
Sharples, R.J., Mackenzie, M.L. and Hammond, P.S. 2009. Estimating seasonal abundance of a central place forager using counts and telemetry data. Marine Ecology Progress Series 378: 289–298.