CALCOFI line transect analysis

Author

Len Thomas

Published

December 11, 2024

1 Introduction

This is a working document for the analysis of marine mammal visual line transect data from the CALCOFI surveys 2004-2023. Michaela Alksne provided the data and is helping troubleshoot. The finalized results are due to be used in two papers:

Temporal trends, led by Simone Bauman-Pickering. This is in some ways an update of Campbell et al. (2015), which looked at 2004-2013.
Relationship between survey-level density estimates and microbial + small plankton eDNA, led by Trevor Ruiz.

Note

Notes like are used throughout the text where there are questions for Michaela and Simone.

2 Effort data

The following code reads in the effort data file and creates two tables: - sample_table, containing one line per transect per cruise - region_table, containing one line per cruise (with Area = 1 otherwise later code complains)

Note, we are currently treating the transect as the sample unit, but we may change to treat a day of effort as the sample unit as was done by Campbell et al. (2015).

#Read in the effort data
effort <- read_excel("./fromMichaela/CalCOFI_2021_2023_Effort_OnTransectOnEffortONLY.xlsx", 
                     guess_max = 7000, na = c("", "NA"))

#Prepare tables for Distance analysis
sample_table <- effort %>% 
  rename(year = Year, season = Season, Region.Label = Cruise, transect = Transect, 
         segment_length = SegmentLength_km_) %>%
  group_by(Sample.Label = paste(Region.Label, transect, sep = "-")) %>%
  summarise(Effort = sum(segment_length), year = first(year), season = first(season),
            Region.Label = first(Region.Label), transect = first(transect))
region_table <- sample_table %>%
  group_by(Region.Label) %>%
  summarise(Region.Label = first(Region.Label), year = first(year), season = first(season), k = n(), L = sum(Effort)) %>%
  mutate(Area = 1)

Here is a copy of the region_table, i.e., with one line per cruise. \(k\) is the number of samples (transects) and \(L\) is the total line length.

region_table %>% kbl() %>% kable_classic(full_width = FALSE)

Region.Label	year	season	k	L	Area
CC2004-01	2004	winter	1	0.0000	1
CC2004-04	2004	spring	1	0.0000	1
CC2004-07	2004	summer	33	1446.8834	1
CC2004-11	2004	fall	23	931.6503	1
CC2005-01	2005	winter	26	903.8109	1
CC2005-04	2005	spring	34	1278.9707	1
CC2005-07	2005	summer	33	1304.0305	1
CC2005-11	2005	fall	27	927.4696	1
CC2006-02	2006	winter	25	1044.4506	1
CC2006-04	2006	spring	31	1383.5765	1
CC2006-07	2006	summer	33	1009.4270	1
CC2006-10	2006	fall	25	1028.8196	1
CC2007-01	2007	winter	25	949.0365	1
CC2007-04	2007	spring	22	768.5812	1
CC2007-07	2007	summer	34	1005.2511	1
CC2007-11	2007	fall	23	640.6153	1
CC2008-01	2008	winter	23	790.1258	1
CC2008-03	2008	spring	21	499.1738	1
CC2008-08	2008	summer	33	1061.9797	1
CC2008-10	2008	fall	26	907.0628	1
CC2009-01	2009	winter	32	980.7786	1
CC2009-03	2009	spring	26	785.0440	1
CC2009-07	2009	summer	37	1287.6075	1
CC2009-11	2009	fall	29	1037.7810	1
CC2010-01	2010	winter	29	875.1260	1
CC2010-04	2010	spring	1	0.0000	1
CC2010-08	2010	summer	28	1086.1751	1
CC2010-11	2010	fall	23	774.4909	1
CC2011-01	2011	winter	31	1052.1534	1
CC2011-04	2004	spring	30	1238.2236	1
CC2011-08	2011	summer	30	1341.8873	1
CC2011-10	2011	fall	24	1044.0281	1
CC2012-02	2012	winter	15	453.5515	1
CC2012-03	2012	spring	17	782.0651	1
CC2012-07	2012	summer	31	1480.9625	1
CC2012-10	2012	fall	24	950.4777	1
CC2013-01	2013	winter	23	848.6145	1
CC2013-04	2013	spring	18	590.5558	1
CC2013-07	2013	summer	30	1326.3615	1
CC2013-11	2013	fall	27	926.1534	1
CC2014-02	2014	winter	13	457.5277	1
CC2014-04	2014	spring	22	827.0975	1
CC2014-07	2014	summer	36	1256.0396	1
CC2014-11	2014	fall	30	1156.5048	1
CC2015-01	2015	winter	30	1121.4828	1
CC2015-04	2015	spring	25	1133.8400	1
CC2015-07	2015	summer	29	1297.2557	1
CC2015-11	2015	fall	27	1067.4050	1
CC2016-01	2016	winter	26	875.7938	1
CC2016-04	2016	spring	26	1026.1933	1
CC2016-07	2016	summer	31	1154.7612	1
CC2016-11	2016	fall	22	849.3831	1
CC2017-01	2017	winter	28	1027.7116	1
CC2017-04	2017	spring	32	1265.2516	1
CC2017-08	2017	summer	29	1284.8930	1
CC2017-11	2017	fall	22	946.4949	1
CC2018-02	2018	winter	18	631.6046	1
CC2018-04	2018	spring	25	1109.4123	1
CC2018-06	2018	summer	32	1198.7060	1
CC2018-10	2018	fall	24	1082.9915	1
CC2019-02	2019	winter	11	370.8165	1
CC2019-04	2019	spring	20	920.8476	1
CC2019-07	2019	summer	27	1251.9319	1
CC2019-11	2019	fall	24	1072.9128	1
CC2020-01	2020	winter	25	960.1047	1
CC2020-04	2020	spring	1	0.0000	1
CC2020-08	2020	summer	1	0.0000	1
CC2020-11	2020	fall	1	0.0000	1
CC2021-01	2021	winter	1	0.0000	1
CC2021-04	2021	spring	1	0.0000	1
CC2021-07	2021	summer	26	1258.6472	1
CC2021-11	2021	fall	19	826.4556	1
CC2204SH	2022	spring	16	576.5156	1
CC2208	2022	summer	47	790.1335	1
CC2211SR	2022	fall	19	705.2526	1
CC2301RL	2023	winter	13	594.6193	1
CC2303SH	2023	spring	20	801.7941	1
CC2311SR	2023	fall	21	929.0986	1

Two things are apparent:

some cruises have 0 survey effort and only 1 transect.
the protocol for naming cruises changed in spring 2022. Before that they had a 4 digit year, then a dash then the month the survey started; after that they have a 2-digit year and then 2 letters afterwards except for summer. No big deal I think, just thought it’s worth noting.

I will delete the cruises with 0 survey effort.

no_effort <- region_table[region_table$L == 0, ]$Region.Label
region_table <- region_table %>% filter(!(Region.Label %in% no_effort))
sample_table <- sample_table %>% filter(!(Region.Label %in% no_effort))

Note

Is it expected that these cruises have 0 survey effort? Is it okay to delete them?

3 Sightings data

In this section we read in the sightings data; we create and check the columns we need for distance sampling detection function analysis.

#Read in sightings data
#Increase guess_max to more than number of rows so it gets the column types right
sightings <- read_excel("./fromMichaela/CalCOFI_2004_2023_CombinedSightings_add_weather.xlsx", 
                        guess_max = 7000, na = c("", "NA"))

The following variables will be needed for the distance sampling analysis:

perpendicular distance – see Section 3.1
whether on effort and on transect - see Section 3.2
transect (i.e., the transect the sighting was made on) - see Section 3.3
species - will use the Species1 column - see Section 3.4
school size - will use the best column - see Section 3.5
other potential covariates for the detection function - see Section 3.6

3.1 Perpendicular distances

Perpendicular distances are required for all observations. These were not supplied for 2022 and 2023 cruises, so will be calculated below. Before this, we check for the records before 2022 that there are perpendular distances for each record:

#Create an indicator for whether the record is 2022 or later
sightings$Year <- year(sightings$Date_Time_Local_)
ind <- sightings$Year >= 2022

#Count the number of records from before 2022 that have missing perpendicular distances
ind2 <- !ind & is.na(sightings$perp_dist_m)
sum(ind2)

[1] 0

There are 0 records with missing perpendicular distances prior to 2022. So no problem there.

Now to obtaining perpendicular distances for 2022 and 2023 cruises. We use the binocular reticle readings (field BinoReticle), observer height and earth radius to calculate radial along-water distance (using formulae in Lerczak and Hobbs 1997). In some cases, the field distance_m was filled, which we take to mean that a radial distance was estimated directly (by eye) so we use this in place of any reticle measurement (in the cases we checked the reticle field was 0).

We then use ships bearing (ShipHeadingTru) and observaton bearing (SightingBearingTru) to get the observation angle and hence perpendicular distance.

#Warnings suppressed - they arise in cases where there is no reticle observation or 
# sighting bearing (which can be the case when a direct radial distance is recorded)

#Get BinoReticle to radians conversion
sightings$BinoReticle <- as.numeric(sightings$BinoReticle)
sightings$ret_radians <- as.numeric(sightings$ret_radians)
ret_to_rad <- sightings$ret_radians[1] / sightings$BinoReticle[1]
sightings$ret_radians2 <- sightings$BinoReticle * ret_to_rad

radius_earth <- sightings$radius_earth[1]

#The following formulae are from Lerczak and Hobbs (1997)
#Get angle from horizontal to horizon
sightings$alpha <- atan(sqrt(2 * sightings$ObsHeight_m_ * radius_earth + sightings$ObsHeight_m_ ^ 2) / radius_earth)
#Get angle from marine mammal to platform
sightings$beta <- pi / 2 - sightings$alpha - sightings$ret_radians2
#Get distance along earth's surface to horizon
sightings$H <- sightings$alpha * radius_earth
#Get line-of-sight distance to marine mammal
sightings$D0 <- (radius_earth + sightings$ObsHeight_m_) * cos(sightings$beta) - 
  sqrt((radius_earth + sightings$ObsHeight_m_) ^ 2 * (cos(sightings$beta)) ^ 2 - 
       (2 * sightings$ObsHeight_m_ * radius_earth + sightings$ObsHeight_m_ ^ 2))
#Get central arc from marine mammal to platform
sightings$delta <- asin(sin(sightings$beta) * sightings$D0 / radius_earth)
#Get distance to marine mammal along the surface of the ocean
sightings$D <- sightings$delta * radius_earth

#If Distance_m_ column is filled in, use that instead
sightings$D2 <- sightings$D
ind2 <- sightings$Distance_m_ > 0
ind2[is.na(ind2)] <- FALSE
sightings$D2[ind2] <- sightings$Distance_m_[ind2]

#Get horizontal angle (in degrees) from trackline
tmp <- sightings$ShipHeadingTru - as.numeric(sightings$SightingBearingTru)
#Trim clearly wrong values
tmp[abs(tmp) > 360] <- NA
#Transform into the 0-90 degree quadrat (will be useful later for looking at forward distances)
sightings$angle2 <- asin(abs(sin(tmp/360 * (2 * pi)))) * 360 / (2 * pi)

#Convert to perpendicular distance
sightings$perp_dist_m2 <- sightings$D2 * abs(sin(sightings$angle2 / 360 * 2 * pi))

One check that can be performed on the above is to check the calculated radial distances using Lerczak and Hobbs against those given in the sightings spreadsheet for sightings earlier than 2022. Here is a histogram of the distance differences:

ind3 <- sightings$dist_calc_m_ > 0
hist(sightings$D[ind3] - sightings$dist_calc_m_[ind3], xlab = "Difference in calculated radial distances (m)", main = "Difference between spreadsheet and Lerczak & Hobbs", breaks = 100)

As can be seen, in the great majority of cases the difference is very small - but there are some larger differences.

Note

Worth investigating further?

One other thing to note is that there are 443 records that don’t have a valid number in either the ShipHeadingTru or the SightingBearingTru column of the sightings spreadsheet.

Note

Worth investigating further?

We now check whether any 2022/2023 data still have missing distances:

#Calculate how many missing distances we have from 2022 onwards
ind3 <- ind & is.na(sightings$perp_dist_m2)
tmp <- sightings[ind3, ]

There are 19 of these.

Note

I checked into the causes and emailed Michaela about them on 29th Nov.

For now I will drop these records …

sightings <- sightings[!ind3, ]

… and copy over the newly calculated perp distances for 2022 onwards into the perp_dist_m field used by all the other records.

#Copy the perp_dist_m2 for these records to perp_dist_m
ind <- sightings$Year >= 2022
sightings$perp_dist_m[ind] <- sightings$perp_dist_m2[ind]

3.2 Getting on-effort on-transect data

We wish to select only sightings data from when observers were on effort and the vessel was on transect. There are various potentially relevant fields here, so we should check which to use.

Note

Check with Michaela and Simone which effort fields to use.

Three relevant-looking fiels are AdjustedBothOnEffortAndOnTransect, OriginalEffortFlag_ON_OFF_ and OriginalTransect_ON_OFF_. However, the former has some NA records, so it seems prudent to use a combination of the latter two.

#Check for NAs
sum(is.na(sightings$AdjustedBothOnEffortAndOnTransect))

[1] 58

sum(is.na(sightings$OriginalEffortFlag_ON_OFF_))

[1] 0

sum(is.na(sightings$OriginalTransect_ON_OFF_))

[1] 0

#Turn to upper case as currently it's a mixture of both
sightings$AdjustedBothOnEffortAndOnTransect <- toupper(sightings$AdjustedBothOnEffortAndOnTransect)
sightings$OriginalEffortFlag_ON_OFF_ <- toupper(sightings$OriginalEffortFlag_ON_OFF_)
sightings$OriginalTransect_ON_OFF_ <- toupper(sightings$OriginalTransect_ON_OFF_)

#Create logical variables
sightings$AdjustedOnEffortAndOnTransect <- sightings$AdjustedBothOnEffortAndOnTransect == "ON"
sightings$OriginalOnEffort <- sightings$OriginalEffortFlag_ON_OFF_ == "ON"
sightings$OriginalOnTransect <- sightings$OriginalTransect_ON_OFF_ == "ON"

#Select data where OnEffort and OnTransect are ON
ind <- sightings$OriginalOnEffort & sightings$OriginalOnTransect

Of the 6287 sighting records, the number made when both on effort and on transect according to the above criteria are 3388. We select just these data for futher processing.

sightings <- sightings[ind, ]

Note

Check with Michaela and Simone whether this level of data “loss” is okay. We could potentially use off transect observations for fitting the detection function if the crew were on effort – but this risks compromizing pooling robustness. Need to have a conversation about this.

3.3 Getting transect numbers for each sighting

Transect numbers were not provided with the sightings, but we can obtain them if we compare the time of the sighting with the start and end times of the effort segments. The sightings spreadsheet has a column Date_Time_Local_ and the effort spreadsheet has columns StartDateTimeLocal and EndDateTimeLocal.

The following code does this, and also collates any sightings that fall outside the effort segment times. (It also creates two new fields Region.Label and Sample.Label that will be used to join the effort data to these data in the Distance analysis later. Region.Label is simply the Cruise column from the effort spreadsheet – we have to copy it over because although Cruise is recorded in the sightings spreadsheet it is formatted differently. Sample.Label is the cruise code, a dash and then the transect number.)

n <- nrow(sightings)
first_reject <- TRUE; rejects <- NULL
sightings <- sightings %>% mutate(Region.Label = "", Sample.Label = "")
for (i in 1:n){
  #Find records in effort file where start time is less than obs time and 
  # end time is more than obs time
  ind <- (effort$StartDateTimeLocal <= sightings$Date_Time_Local_[i]) & 
    (effort$EndDateTimeLocal >= sightings$Date_Time_Local_[i])
  which_ind <- which(ind)
  if(length(which_ind)==1){
    #Copy cruise from effort table - it's formatted differently in the sightings
    sightings$Region.Label[i] <- effort$Cruise[which_ind]
    #Create Sample.Label
    sightings$Sample.Label[i] <- paste(effort$Cruise[which_ind], effort$Transect[which_ind], sep = "-")
  } else{
    if(first_reject) {
      rejects <- sightings[i, ]
      first_reject <- FALSE
    } else {
      rejects <- rbind(rejects, sightings[i, ])
    }
  }
}

There are 537 sightings that fall outside of effort segment times. This is much more than I would expect, so I suspect that the on-effort-on-transect designation of the previous section has not worked as intended. On investigation, I also found a few bugs in the sightings Date_Time_Local_ field.

Note

I emailed Michaela on 30th Nov for suggestions. Will need to clear up before proceeding with a final analysis.

For now I will remove all of the records that fall outside of effort segment times.

ind <- (sightings$Sample.Label == "")
sightings <- sightings[!ind, ]

3.4 Species

Here is a printout of the sample size (of groups) in the sightings data, by species. The table doesn’t look fancy as the standard table formatting (using kbl) didn’t work with some of the characters in the cetacean codes CSV file (there are some odd line feed characters). Note also some codes are missing - but the general idea is there.

#Get species names
sp_names <- read.csv("./fromMichaela/CalCOFI_cetacean_codes.csv")

#Tabulate sightings and add in species names
sightings$Species1 <- toupper(sightings$Species1)
tmp <- tibble(data.frame(sort(table(sightings$Species1), decreasing = TRUE))) %>%
  rename(Code = Var1) %>% 
  left_join(sp_names, by = join_by(Code == CalCOFI.NOAA.Species.Abbreviation))
print(tmp, n = Inf)

# A tibble: 31 × 3
   Code    Freq Common.Name                 
   <chr>  <int> <chr>                       
 1 DD       544 Short-beaked common dolphin 
 2 ULW      536 Unidentified Large Whale    
 3 DSP      310 Unidentified common dolphin 
 4 BP       248 Fin Whale                   
 5 MN       211 Humpback Whale              
 6 UD       167 Unidentified Dolphin        
 7 BM       159 Blue Whale                  
 8 DC        90 Long-beaked common dolphin  
 9 PD        86 Dall's porpoise             
10 LO        77 Pacific white-sided dolphin 
11 ER        71 Gray Whale                  
12 GG        66 Risso's dolphin             
13 TT        63 Bottlenose dolphin          
14 UC        56 Unidentified cetacean       
15 PM        47 Sperm Whale                 
16 BA        24 Minke Whale                 
17 LB        22 Northern right-whale dolphin
18 OTH       19 <NA>                        
19 OO        12 Killer Whale                
20 USC       11 Unidentified�small cetacean 
21 SC         9 Striped dolphin             
22 ZICA       9 Cuvier's beaked whale       
23 BBO        2 <NA>                        
24 GM         2 Short-finned pilot whale    
25 UBW        2 <NA>                        
26 USW        2 Unidentified�small whale    
27 BB         1 Sei Whale                   
28 BEBA       1 �Baird's beaked whale�      
29 PC         1 False Killer whale          
30 UNZIPH     1 Unidentified ziphid         
31 UO         1 Unidentified odontocete

For the purposes of the analyses undertaken here, we may be interested only in blue, fin and humpback whale; for the trends paper we will likely be interested in more species – but will want some sample size cutoff. Note also we will need to determine how to deal with the unidentified whale sightings (there are many, because the survey was undertaken in passing mode, with relatively inexperienced observers).

3.5 School size

In this section, we check a school size (column Best) is present for all sightings.

ind <- (is.na(sightings$Best))

There are 1 records with missing school size. These are removed in the code below.

Note

I checked with Michaela and she could not locate the school size so OK to drop.

sightings <- sightings[!ind, ]

Note that there are some very large school sizes, and we will need to return to examine these before a sensible analysis can be done on the species involved.

summary(sightings$Best)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    1.00    4.00   39.23   22.00 4600.00

Here is a summary of the number of records of species with school sizes more than 100:

which_ind <- which(sightings$Best > 100)
sort(table(sightings[which_ind, ]$Species1), decreasing = TRUE)


 DD DSP  DC  UD  LB  GG  TT  LO OTH  SC 
103  50  37  18   3   2   2   1   1   1

and more than 1000:

which_ind <- which(sightings$Best > 1000)
sort(table(sightings[which_ind, ]$Species1), decreasing = TRUE)


 DD  DC  UD DSP 
  5   4   2   1

These are all one of the two species of common dolphin, or UID dolphin.

3.6 Other covariates for the detection function

In Campbell et al. (2015), they used: “Beaufort sea state (numerical: 0–5), ship (categorical), swell (numerical: wave height in m), school size (numerical), and, for short-beaked common dolphins, school size class (categorical).” We have already looked at school size, but we consider the others (and some alternatives) below.

In addition, it was suggested that observer would be a potentially relevant covariate, but there are too many for this to be possible; this is also true for cruise. Another possible covariate not considered by Campbell et al. is season. It could be argued that some of the sightability variation here would be captured by sea state and swell – but there are potentially animal behavioural (and other) factors that could be included with season so it seems worth adding.

Cue is also recorded as a variable, and could be useful - but there are many missing values so we do not pursue it here.

sum(is.na(sightings$Cue))

[1] 596

There are also a suite of observational variables such as Quality, Visiblity, Precipitation, Could, Glare - but these all have many missing values.

3.6.1 Sea state and swell

For sea state and swell, in an initial round of exploratory analysis we found many missing records. On investigation, Michaela found there were not contemporaneous records for these, and filled in the values with the closest (in time) available record.

Note

Note to self - check if these are both Beaufort and swell or just Beaufort that were filled in by Michaela.

Check there are no missing records:

#Look at beaufort and swell
sum(is.na(sightings$Bft))

[1] 0

sum(is.na(sightings$Swell_ft_))

[1] 0

We were advised to consider truncating records too far away in time. Here is a histogram of the time delays (note that only records that were filled in are shown here, as the others have a Time_Diff_minutes of NA).

hist(sightings$Time_Diff_minutes/60, main = "Time delay between sighting and sea conditions", xlab = "Time delay (hours)", breaks = 100)

There are a small number that are very large and will clearly need truncated – here is a histogram with these removed.

ind <- abs(sightings$Time_Diff_minutes) <= 6000
hist(sightings$Time_Diff_minutes[ind]/60, main = "Time delay between sightings and sea conditions (truncated)", xlab = "Time delay (hours)", breaks = 100)

Following Michela’s suggestion, we will truncate any records with a time delay of more than 8 hours.

ind <- abs(sightings$Time_Diff_minutes) > 8 * 60
ind[is.na(ind)] <- FALSE

This will result in truncating 66 out of 2850 remaining records. Note that some of these are sightings with small perpendicular distances, so it is a shame to truncate them as they will affect the analysis. However, it seems the best choice in the circumstances.

sightings <- sightings[!ind, ]

There are the frequency of the different Beaufort values – note this is in the sightings database, so it does not reflect the proportion of time surveying in the different conditions if there are fewer sightings in higher Beauforts (which may reflect that there was little surveying done in such Beauforts, and/or that animals are hard to sight in those conditions):

table(sightings$Bft)


  0   1   2   3   4   5   6 
 13 220 698 824 544 480   5

and here is a histogram of swell height

hist(sightings$Swell_ft_, xlab = "Swell height (ft)", main = "Swell height in sightings records")

One thing to check about Beaufort and swell is whether they are strongly correlated – if so it may not be wise to try them both as covariates in the detection function modelling. However (and perhaps surprisingly), the correlation is not very strong:

cor(sightings$Bft, sightings$Swell_ft_)

[1] 0.5623662

3.6.2 Ship / observer height

Regarding the possible ship covariate, we do not have this in the dataset – but we do have observer height, which is strongly related and potentially relevant.

Note

Check if we don’t have ship?

First to check that observer height is not missing for any records:

sum(is.na(sightings$ObsHeight_m_))

[1] 0

Looking at the distribution of observer heights, there are only a few values and so it does not seem suitable for a continuous numerical covariate:

hist(sightings$ObsHeight_m_, breaks = 100, xlab = "Observer height (m)", main = "Observer height in sightings records")

It seems reasonable to group them into three classes: 9m or less, 9-12m and more than 12m (coded as A, B and C).

Note

Check with Simone and Michaela if this is reasonable, as they likely know the sighting conditions associated with the different ships at these heights.

sightings$fObsHeight <- as.factor(ifelse(sightings$ObsHeight_m_ <= 9, "A",
                               ifelse(sightings$ObsHeight_m_ <= 12, "B", "C")))

Note that Distance does not allow ordinal classes, so we won’t be able to constrain estimated sightability so that the detecability of C is more than B and this is more than A. We will have to see how things come out in the estimates.

4 Exploratory analysis of distances

After the various data selection noted above, there are a total of 2784 records remaining in the sightings data. Here we explore some aspects of the distance data, to try to identify any issues that might affect the detection function modelling.

First, we plot a histogram of the distances (pooled across species)

hist(sightings$perp_dist_m, breaks = 200, xlab = "Perpendicular distance (m)", 
     main = "Perpendicular distances")

We see a large peak at the lowest distance bin, and a very long right tail, out to 12km. It is typical to truncate the data (in Campbell et al. this was at 2400m for large cetaceans and 500 or 600m for small ones); to focus the plots we truncate off the largest 10% of distances.

trunc_90 <- ceiling(quantile(sightings$perp_dist_m, 0.9))
hist(sightings$perp_dist_m, breaks = 200, xlab = "Perpendicular distance (m)", 
     main = "Perpendicular distances (truncated)",
     xlim = c(0, trunc_90))

The large peak at lowest distances could be due to rounding of perpendicular distances to zero. Because these data are from radial distance and angle, this could come from rounding angles to zero. Of the 2784 sightings, 36 have exactly 0 distance, so it seems unlikely there is a strong problem with rounding of angles to zero. Nevertheless, here is a histogram of the angles we have (recall from earlier that some of the angles are missing), with the quadrants folded so that all angles are between 0 and 90 degrees:

hist(sightings$angle2, breaks = seq(-0.5, 90.5, by = 1), xlab = "Sighting angle (degrees)", main = "Sighting angles")

There is clear evidence of rounding to the nearest 5 degrees, and this could account for some of the spike at zero - but likely not much of it as there are only 36 zero angles.

As an aside, it seems strange to me that we could have rounding of angles to the nearest 5 degrees if the bearing to the animal and ship’s bearing were measured separately – presumably they were not measured separately (or perhaps they were both rounded to the nearest 5 degrees).

Note

Discuss how bearings were obtained, and whether an angle board or similar was used.

More insight could be gained by viewing a 2D map of the sightings (projected into the upper right quadrant). For this we need to know forward as well as perpendicular distances, so we can only do it for the observations where we have an observation angle (note that this is not all the animals). Here is a 2D plot for all of these observations

#Look at forward distances
sightings$forward_distance <- sightings$perp_dist_m / tan(sightings$angle2 / 360 * (2 * pi))
plot(sightings$perp_dist_m, sightings$forward_distance, xlab = "Perpendicular distance (m)", 
     ylab = "Forward distance (m)")

We can clearly see bands that correspond with rounding of radial distances (perhaps acutal distances, or reticle marks). However, our primary interest is in the detections close to zero perpendicular distance. Here we plot all sightings within 100m perpendicular distance:

plot(sightings$perp_dist_m, sightings$forward_distance, xlab = "Perpendicular distance (m)", 
     ylab = "Forward distance (m)", xlim = c(0, 100), 
     ylim = c(0, max(sightings$forward_distance[sightings$perp_dist_m < 100], na.rm = TRUE)))
for(i in 1:5){
  abline(a = 0, b = tan((90 - i) / 360 * 2 * pi), lty = 2)
}

We can see diagonal lines of observations running out at 1 degree, 2 degrees, etc (first 5 degrees out highlighted with dashed lines). This is to be expected and not a problem.

Here is the same plot truncated at 50m perpendicular distance.

plot(sightings$perp_dist_m, sightings$forward_distance, xlab = "Perpendicular distance (m)", 
     ylab = "Forward distance (m)", xlim = c(0, 50), 
     ylim = c(0, max(sightings$forward_distance[sightings$perp_dist_m < 50], na.rm = TRUE)))
for(i in 1:5){
  abline(a = 0, b = tan((90 - i) / 360 * 2 * pi), lty = 2)
}

It looks like many of the small perpendicular distance observations were made at angles close to 90 degrees from the transect line – animals that were almost abeam of the ship.

This can be seen even more clearly when truncating at 20m perpendicular distance:

plot(sightings$perp_dist_m, sightings$forward_distance, xlab = "Perpendicular distance (m)", 
     ylab = "Forward distance (m)", xlim = c(0, 20), 
     ylim = c(0, max(sightings$forward_distance[sightings$perp_dist_m < 20], na.rm = TRUE)))
for(i in 1:5){
  abline(a = 0, b = tan((90 - i) / 360 * 2 * pi), lty = 2)
}

We also see that all of the sightings at 2m or less perpendicular distance were made at small forward distances, except one sighting at 200m forward distance. These detections may have been bow-riding animals.

We make one final 2D plot looking only at animals detected within 20m perpendicular and forward distance:

plot(sightings$perp_dist_m, sightings$forward_distance, xlab = "Perpendicular distance (m)", 
     ylab = "Forward distance (m)", xlim = c(0, 20), 
     ylim = c(0, 20))

We now look into which species are involved. Here we look at the species detected within 20m perpendicular distance:

table(sightings$Species1[sightings$perp_dist_m < 20])


 BA  BM  BP  DC  DD DSP  ER  GG  LB  LO  MN  PD  TT  UD ULW 
  4   3  12   6  72  20   2   2   5  12   7  12   3   4   7

and within 20m perpendicular distance and 20m forward distance (note this includes only those detections with an angle and therefore a forward distance)

table(sightings$Species1[sightings$perp_dist_m < 20 & sightings$forward_distance < 20])


 BA  BP  DC  DD DSP  GG  LB  LO  MN  PD  UD ULW 
  1   2   3  41   7   1   2   5   1   4   3   2

Short-beaked common dolphin (DD) dominates with UID common dolphin (DSP) a close second.

However, if we look at perpendicular distances out to (say) 200m by 10m for DD & UID …

ind <- sightings$Species1 %in% c("DD", "DSP")
ind2 <- sightings$perp_dist_m <= 200
hist(sightings$perp_dist_m[ind & ind2], breaks = seq(0, 200, by = 10), 
     xlab = "Perpendicular distance (m)", 
     main = "Pdist (truncated) DD and DSP")

and for the other species in the dataset…

hist(sightings$perp_dist_m[!ind & ind2], breaks = seq(0, 200, by = 10), 
     xlab = "Perpendicular distance (m)", 
     main = "Pdist (truncated) other species")

we see that both have the spike at very small perpendicular distances (0-10m in these plots).

We can try separating all dolphin (and porpoise) species from the others:

ind <- sightings$Species1 %in% c("DD", "DSP", "UD", "DC", "PD", "LO", "GG", "TT", "LB", "USC", "SC")
ind2 <- sightings$perp_dist_m <= 200
hist(sightings$perp_dist_m[ind & ind2], breaks = seq(0, 200, by = 10), 
     xlab = "Perpendicular distance (m)", 
     main = "Pdist (truncated) dolphins")

hist(sightings$perp_dist_m[!ind & ind2], breaks = seq(0, 200, by = 10), 
     xlab = "Perpendicular distance (m)", 
     main = "Pdist (truncated) non-dolphins")

So, although bow-riding may play a part, it seems to be something that affects non-dolphin species to some extent also.

Here we break down by species, for species with more than 60 detections. The first plot is out to 2400m in increments of 100m, while the second is out to 500m in increments of 20m.

tmp <- table(sightings$Species1)
sp <- names(tmp[tmp>=60])

p <- sightings %>% filter(Species1 %in% sp) %>%
  ggplot (aes(perp_dist_m)) + 
  geom_histogram(breaks = seq(0, 2400, by = 100)) + 
  facet_wrap(~Species1, scales = "free")
p

p <- sightings %>% filter(Species1 %in% sp) %>%
  ggplot (aes(perp_dist_m)) + 
  geom_histogram(breaks = seq(0, 500, by = 20)) + 
  facet_wrap(~Species1, scales = "free")
p

In these plots, we see a spike at small distances in DD, DSP, LO, PD and TT – all dolphin (and related small) species.

We could envisage further exploratory analyses, but here we move on to detection function modelling.

5 Detection function modelling

5.1 Create observation table

In the code below we pull out from the sightings table just the variables that will be used in the distance analysis. As a check we print a summary of the resulting dataset.

obs_table <- sightings %>%
  rename(sighting_no = SightingNo, date_time = Date_Time_Local_, 
         season = Season, species = Species1, distance = perp_dist_m, ss = Bft, 
         swell = Swell_ft_, size = Best, obs_height = ObsHeight_m_, f_obs_height = fObsHeight, 
         season = Season) %>%
  mutate(date = date(date_time), year = year(date), object = 1:n()) %>%
  select(object, Region.Label, Sample.Label, sighting_no, date, date_time, year, season, species, distance, ss, 
         swell, size, obs_height, f_obs_height)
summary(obs_table)

     object       Region.Label       Sample.Label       sighting_no       
 Min.   :   1.0   Length:2784        Length:2784        Length:2784       
 1st Qu.: 696.8   Class :character   Class :character   Class :character  
 Median :1392.5   Mode  :character   Mode  :character   Mode  :character  
 Mean   :1392.5                                                           
 3rd Qu.:2088.2                                                           
 Max.   :2784.0                                                           
      date              date_time                           year     
 Min.   :2004-07-13   Min.   :2004-07-13 05:34:00.00   Min.   :2004  
 1st Qu.:2009-03-13   1st Qu.:2009-03-14 05:07:00.00   1st Qu.:2009  
 Median :2014-04-11   Median :2014-04-11 12:29:30.00   Median :2014  
 Mean   :2014-01-12   Mean   :2014-01-12 20:26:40.66   Mean   :2014  
 3rd Qu.:2018-04-16   3rd Qu.:2018-04-16 18:35:56.00   3rd Qu.:2018  
 Max.   :2023-11-18   Max.   :2023-11-18 06:41:00.00   Max.   :2023  
    season            species             distance             ss       
 Length:2784        Length:2784        Min.   :    0.0   Min.   :0.000  
 Class :character   Class :character   1st Qu.:  191.3   1st Qu.:2.000  
 Mode  :character   Mode  :character   Median :  552.8   Median :3.000  
                                       Mean   :  946.4   Mean   :3.123  
                                       3rd Qu.: 1245.3   3rd Qu.:4.000  
                                       Max.   :11827.0   Max.   :6.000  
     swell             size          obs_height    f_obs_height
 Min.   : 0.000   Min.   :   0.0   Min.   : 7.10   A:1175      
 1st Qu.: 3.000   1st Qu.:   1.0   1st Qu.: 8.10   B: 844      
 Median : 4.000   Median :   4.0   Median :10.36   C: 765      
 Mean   : 3.944   Mean   :  39.4   Mean   :10.38               
 3rd Qu.: 5.000   3rd Qu.:  22.0   3rd Qu.:13.30               
 Max.   :12.000   Max.   :4600.0   Max.   :17.00

We proceed now with detection function modelling by species.

Text goes here - by species.

6 Density estimation

7 References

Campbell et al. (2015) Inter-annual and seasonal trends in cetacean distribution, density and abundance off southern California. doi:10.1016/j.dsr2.2014.10.008
Lerzack and Hobbs (1997) Calculating sighting distances from angular readings during shipboard, aerial, and shore-based marine mamaml surveys. doi:10.1111/j.1748-7692.1998.tb00745.x