CALCOFI line transect analysis

Author

Len Thomas

Published

December 11, 2024

1 Introduction

This is a working document for the analysis of marine mammal visual line transect data from the CALCOFI surveys 2004-2023. Michaela Alksne provided the data and is helping troubleshoot. The finalized results are due to be used in two papers:

  • Temporal trends, led by Simone Bauman-Pickering. This is in some ways an update of Campbell et al. (2015), which looked at 2004-2013.
  • Relationship between survey-level density estimates and microbial + small plankton eDNA, led by Trevor Ruiz.
Note

Notes like are used throughout the text where there are questions for Michaela and Simone.

2 Effort data

The following code reads in the effort data file and creates two tables: - sample_table, containing one line per transect per cruise - region_table, containing one line per cruise (with Area = 1 otherwise later code complains)

Note, we are currently treating the transect as the sample unit, but we may change to treat a day of effort as the sample unit as was done by Campbell et al. (2015).

#Read in the effort data
effort <- read_excel("./fromMichaela/CalCOFI_2021_2023_Effort_OnTransectOnEffortONLY.xlsx", 
                     guess_max = 7000, na = c("", "NA"))

#Prepare tables for Distance analysis
sample_table <- effort %>% 
  rename(year = Year, season = Season, Region.Label = Cruise, transect = Transect, 
         segment_length = SegmentLength_km_) %>%
  group_by(Sample.Label = paste(Region.Label, transect, sep = "-")) %>%
  summarise(Effort = sum(segment_length), year = first(year), season = first(season),
            Region.Label = first(Region.Label), transect = first(transect))
region_table <- sample_table %>%
  group_by(Region.Label) %>%
  summarise(Region.Label = first(Region.Label), year = first(year), season = first(season), k = n(), L = sum(Effort)) %>%
  mutate(Area = 1)

Here is a copy of the region_table, i.e., with one line per cruise. \(k\) is the number of samples (transects) and \(L\) is the total line length.

region_table %>% kbl() %>% kable_classic(full_width = FALSE)
Region.Label year season k L Area
CC2004-01 2004 winter 1 0.0000 1
CC2004-04 2004 spring 1 0.0000 1
CC2004-07 2004 summer 33 1446.8834 1
CC2004-11 2004 fall 23 931.6503 1
CC2005-01 2005 winter 26 903.8109 1
CC2005-04 2005 spring 34 1278.9707 1
CC2005-07 2005 summer 33 1304.0305 1
CC2005-11 2005 fall 27 927.4696 1
CC2006-02 2006 winter 25 1044.4506 1
CC2006-04 2006 spring 31 1383.5765 1
CC2006-07 2006 summer 33 1009.4270 1
CC2006-10 2006 fall 25 1028.8196 1
CC2007-01 2007 winter 25 949.0365 1
CC2007-04 2007 spring 22 768.5812 1
CC2007-07 2007 summer 34 1005.2511 1
CC2007-11 2007 fall 23 640.6153 1
CC2008-01 2008 winter 23 790.1258 1
CC2008-03 2008 spring 21 499.1738 1
CC2008-08 2008 summer 33 1061.9797 1
CC2008-10 2008 fall 26 907.0628 1
CC2009-01 2009 winter 32 980.7786 1
CC2009-03 2009 spring 26 785.0440 1
CC2009-07 2009 summer 37 1287.6075 1
CC2009-11 2009 fall 29 1037.7810 1
CC2010-01 2010 winter 29 875.1260 1
CC2010-04 2010 spring 1 0.0000 1
CC2010-08 2010 summer 28 1086.1751 1
CC2010-11 2010 fall 23 774.4909 1
CC2011-01 2011 winter 31 1052.1534 1
CC2011-04 2004 spring 30 1238.2236 1
CC2011-08 2011 summer 30 1341.8873 1
CC2011-10 2011 fall 24 1044.0281 1
CC2012-02 2012 winter 15 453.5515 1
CC2012-03 2012 spring 17 782.0651 1
CC2012-07 2012 summer 31 1480.9625 1
CC2012-10 2012 fall 24 950.4777 1
CC2013-01 2013 winter 23 848.6145 1
CC2013-04 2013 spring 18 590.5558 1
CC2013-07 2013 summer 30 1326.3615 1
CC2013-11 2013 fall 27 926.1534 1
CC2014-02 2014 winter 13 457.5277 1
CC2014-04 2014 spring 22 827.0975 1
CC2014-07 2014 summer 36 1256.0396 1
CC2014-11 2014 fall 30 1156.5048 1
CC2015-01 2015 winter 30 1121.4828 1
CC2015-04 2015 spring 25 1133.8400 1
CC2015-07 2015 summer 29 1297.2557 1
CC2015-11 2015 fall 27 1067.4050 1
CC2016-01 2016 winter 26 875.7938 1
CC2016-04 2016 spring 26 1026.1933 1
CC2016-07 2016 summer 31 1154.7612 1
CC2016-11 2016 fall 22 849.3831 1
CC2017-01 2017 winter 28 1027.7116 1
CC2017-04 2017 spring 32 1265.2516 1
CC2017-08 2017 summer 29 1284.8930 1
CC2017-11 2017 fall 22 946.4949 1
CC2018-02 2018 winter 18 631.6046 1
CC2018-04 2018 spring 25 1109.4123 1
CC2018-06 2018 summer 32 1198.7060 1
CC2018-10 2018 fall 24 1082.9915 1
CC2019-02 2019 winter 11 370.8165 1
CC2019-04 2019 spring 20 920.8476 1
CC2019-07 2019 summer 27 1251.9319 1
CC2019-11 2019 fall 24 1072.9128 1
CC2020-01 2020 winter 25 960.1047 1
CC2020-04 2020 spring 1 0.0000 1
CC2020-08 2020 summer 1 0.0000 1
CC2020-11 2020 fall 1 0.0000 1
CC2021-01 2021 winter 1 0.0000 1
CC2021-04 2021 spring 1 0.0000 1
CC2021-07 2021 summer 26 1258.6472 1
CC2021-11 2021 fall 19 826.4556 1
CC2204SH 2022 spring 16 576.5156 1
CC2208 2022 summer 47 790.1335 1
CC2211SR 2022 fall 19 705.2526 1
CC2301RL 2023 winter 13 594.6193 1
CC2303SH 2023 spring 20 801.7941 1
CC2311SR 2023 fall 21 929.0986 1

Two things are apparent:

  • some cruises have 0 survey effort and only 1 transect.
  • the protocol for naming cruises changed in spring 2022. Before that they had a 4 digit year, then a dash then the month the survey started; after that they have a 2-digit year and then 2 letters afterwards except for summer. No big deal I think, just thought it’s worth noting.

I will delete the cruises with 0 survey effort.

no_effort <- region_table[region_table$L == 0, ]$Region.Label
region_table <- region_table %>% filter(!(Region.Label %in% no_effort))
sample_table <- sample_table %>% filter(!(Region.Label %in% no_effort))
Note

Is it expected that these cruises have 0 survey effort? Is it okay to delete them?

3 Sightings data

In this section we read in the sightings data; we create and check the columns we need for distance sampling detection function analysis.

#Read in sightings data
#Increase guess_max to more than number of rows so it gets the column types right
sightings <- read_excel("./fromMichaela/CalCOFI_2004_2023_CombinedSightings_add_weather.xlsx", 
                        guess_max = 7000, na = c("", "NA"))

The following variables will be needed for the distance sampling analysis:

  • perpendicular distance – see Section 3.1
  • whether on effort and on transect - see Section 3.2
  • transect (i.e., the transect the sighting was made on) - see Section 3.3
  • species - will use the Species1 column - see Section 3.4
  • school size - will use the best column - see Section 3.5
  • other potential covariates for the detection function - see Section 3.6

3.1 Perpendicular distances

Perpendicular distances are required for all observations. These were not supplied for 2022 and 2023 cruises, so will be calculated below. Before this, we check for the records before 2022 that there are perpendular distances for each record:

#Create an indicator for whether the record is 2022 or later
sightings$Year <- year(sightings$Date_Time_Local_)
ind <- sightings$Year >= 2022

#Count the number of records from before 2022 that have missing perpendicular distances
ind2 <- !ind & is.na(sightings$perp_dist_m)
sum(ind2)
[1] 0

There are 0 records with missing perpendicular distances prior to 2022. So no problem there.

Now to obtaining perpendicular distances for 2022 and 2023 cruises. We use the binocular reticle readings (field BinoReticle), observer height and earth radius to calculate radial along-water distance (using formulae in Lerczak and Hobbs 1997). In some cases, the field distance_m was filled, which we take to mean that a radial distance was estimated directly (by eye) so we use this in place of any reticle measurement (in the cases we checked the reticle field was 0).

We then use ships bearing (ShipHeadingTru) and observaton bearing (SightingBearingTru) to get the observation angle and hence perpendicular distance.

#Warnings suppressed - they arise in cases where there is no reticle observation or 
# sighting bearing (which can be the case when a direct radial distance is recorded)

#Get BinoReticle to radians conversion
sightings$BinoReticle <- as.numeric(sightings$BinoReticle)
sightings$ret_radians <- as.numeric(sightings$ret_radians)
ret_to_rad <- sightings$ret_radians[1] / sightings$BinoReticle[1]
sightings$ret_radians2 <- sightings$BinoReticle * ret_to_rad

radius_earth <- sightings$radius_earth[1]

#The following formulae are from Lerczak and Hobbs (1997)
#Get angle from horizontal to horizon
sightings$alpha <- atan(sqrt(2 * sightings$ObsHeight_m_ * radius_earth + sightings$ObsHeight_m_ ^ 2) / radius_earth)
#Get angle from marine mammal to platform
sightings$beta <- pi / 2 - sightings$alpha - sightings$ret_radians2
#Get distance along earth's surface to horizon
sightings$H <- sightings$alpha * radius_earth
#Get line-of-sight distance to marine mammal
sightings$D0 <- (radius_earth + sightings$ObsHeight_m_) * cos(sightings$beta) - 
  sqrt((radius_earth + sightings$ObsHeight_m_) ^ 2 * (cos(sightings$beta)) ^ 2 - 
       (2 * sightings$ObsHeight_m_ * radius_earth + sightings$ObsHeight_m_ ^ 2))
#Get central arc from marine mammal to platform
sightings$delta <- asin(sin(sightings$beta) * sightings$D0 / radius_earth)
#Get distance to marine mammal along the surface of the ocean
sightings$D <- sightings$delta * radius_earth

#If Distance_m_ column is filled in, use that instead
sightings$D2 <- sightings$D
ind2 <- sightings$Distance_m_ > 0
ind2[is.na(ind2)] <- FALSE
sightings$D2[ind2] <- sightings$Distance_m_[ind2]

#Get horizontal angle (in degrees) from trackline
tmp <- sightings$ShipHeadingTru - as.numeric(sightings$SightingBearingTru)
#Trim clearly wrong values
tmp[abs(tmp) > 360] <- NA
#Transform into the 0-90 degree quadrat (will be useful later for looking at forward distances)
sightings$angle2 <- asin(abs(sin(tmp/360 * (2 * pi)))) * 360 / (2 * pi)

#Convert to perpendicular distance
sightings$perp_dist_m2 <- sightings$D2 * abs(sin(sightings$angle2 / 360 * 2 * pi))

One check that can be performed on the above is to check the calculated radial distances using Lerczak and Hobbs against those given in the sightings spreadsheet for sightings earlier than 2022. Here is a histogram of the distance differences:

ind3 <- sightings$dist_calc_m_ > 0
hist(sightings$D[ind3] - sightings$dist_calc_m_[ind3], xlab = "Difference in calculated radial distances (m)", main = "Difference between spreadsheet and Lerczak & Hobbs", breaks = 100)

As can be seen, in the great majority of cases the difference is very small - but there are some larger differences.

Note

Worth investigating further?

One other thing to note is that there are 443 records that don’t have a valid number in either the ShipHeadingTru or the SightingBearingTru column of the sightings spreadsheet.

Note

Worth investigating further?

We now check whether any 2022/2023 data still have missing distances:

#Calculate how many missing distances we have from 2022 onwards
ind3 <- ind & is.na(sightings$perp_dist_m2)
tmp <- sightings[ind3, ]       

There are 19 of these.

Note

I checked into the causes and emailed Michaela about them on 29th Nov.

For now I will drop these records …

sightings <- sightings[!ind3, ]

… and copy over the newly calculated perp distances for 2022 onwards into the perp_dist_m field used by all the other records.

#Copy the perp_dist_m2 for these records to perp_dist_m
ind <- sightings$Year >= 2022
sightings$perp_dist_m[ind] <- sightings$perp_dist_m2[ind]

3.2 Getting on-effort on-transect data

We wish to select only sightings data from when observers were on effort and the vessel was on transect. There are various potentially relevant fields here, so we should check which to use.

Note

Check with Michaela and Simone which effort fields to use.

Three relevant-looking fiels are AdjustedBothOnEffortAndOnTransect, OriginalEffortFlag_ON_OFF_ and OriginalTransect_ON_OFF_. However, the former has some NA records, so it seems prudent to use a combination of the latter two.

#Check for NAs
sum(is.na(sightings$AdjustedBothOnEffortAndOnTransect))
[1] 58
sum(is.na(sightings$OriginalEffortFlag_ON_OFF_))
[1] 0
sum(is.na(sightings$OriginalTransect_ON_OFF_))
[1] 0
#Turn to upper case as currently it's a mixture of both
sightings$AdjustedBothOnEffortAndOnTransect <- toupper(sightings$AdjustedBothOnEffortAndOnTransect)
sightings$OriginalEffortFlag_ON_OFF_ <- toupper(sightings$OriginalEffortFlag_ON_OFF_)
sightings$OriginalTransect_ON_OFF_ <- toupper(sightings$OriginalTransect_ON_OFF_)

#Create logical variables
sightings$AdjustedOnEffortAndOnTransect <- sightings$AdjustedBothOnEffortAndOnTransect == "ON"
sightings$OriginalOnEffort <- sightings$OriginalEffortFlag_ON_OFF_ == "ON"
sightings$OriginalOnTransect <- sightings$OriginalTransect_ON_OFF_ == "ON"

#Select data where OnEffort and OnTransect are ON
ind <- sightings$OriginalOnEffort & sightings$OriginalOnTransect

Of the 6287 sighting records, the number made when both on effort and on transect according to the above criteria are 3388. We select just these data for futher processing.

sightings <- sightings[ind, ]
Note

Check with Michaela and Simone whether this level of data “loss” is okay. We could potentially use off transect observations for fitting the detection function if the crew were on effort – but this risks compromizing pooling robustness. Need to have a conversation about this.

3.3 Getting transect numbers for each sighting

Transect numbers were not provided with the sightings, but we can obtain them if we compare the time of the sighting with the start and end times of the effort segments. The sightings spreadsheet has a column Date_Time_Local_ and the effort spreadsheet has columns StartDateTimeLocal and EndDateTimeLocal.

The following code does this, and also collates any sightings that fall outside the effort segment times. (It also creates two new fields Region.Label and Sample.Label that will be used to join the effort data to these data in the Distance analysis later. Region.Label is simply the Cruise column from the effort spreadsheet – we have to copy it over because although Cruise is recorded in the sightings spreadsheet it is formatted differently. Sample.Label is the cruise code, a dash and then the transect number.)

n <- nrow(sightings)
first_reject <- TRUE; rejects <- NULL
sightings <- sightings %>% mutate(Region.Label = "", Sample.Label = "")
for (i in 1:n){
  #Find records in effort file where start time is less than obs time and 
  # end time is more than obs time
  ind <- (effort$StartDateTimeLocal <= sightings$Date_Time_Local_[i]) & 
    (effort$EndDateTimeLocal >= sightings$Date_Time_Local_[i])
  which_ind <- which(ind)
  if(length(which_ind)==1){
    #Copy cruise from effort table - it's formatted differently in the sightings
    sightings$Region.Label[i] <- effort$Cruise[which_ind]
    #Create Sample.Label
    sightings$Sample.Label[i] <- paste(effort$Cruise[which_ind], effort$Transect[which_ind], sep = "-")
  } else{
    if(first_reject) {
      rejects <- sightings[i, ]
      first_reject <- FALSE
    } else {
      rejects <- rbind(rejects, sightings[i, ])
    }
  }
}

There are 537 sightings that fall outside of effort segment times. This is much more than I would expect, so I suspect that the on-effort-on-transect designation of the previous section has not worked as intended. On investigation, I also found a few bugs in the sightings Date_Time_Local_ field.

Note

I emailed Michaela on 30th Nov for suggestions. Will need to clear up before proceeding with a final analysis.

For now I will remove all of the records that fall outside of effort segment times.

ind <- (sightings$Sample.Label == "")
sightings <- sightings[!ind, ]

3.4 Species

Here is a printout of the sample size (of groups) in the sightings data, by species. The table doesn’t look fancy as the standard table formatting (using kbl) didn’t work with some of the characters in the cetacean codes CSV file (there are some odd line feed characters). Note also some codes are missing - but the general idea is there.

#Get species names
sp_names <- read.csv("./fromMichaela/CalCOFI_cetacean_codes.csv")

#Tabulate sightings and add in species names
sightings$Species1 <- toupper(sightings$Species1)
tmp <- tibble(data.frame(sort(table(sightings$Species1), decreasing = TRUE))) %>%
  rename(Code = Var1) %>% 
  left_join(sp_names, by = join_by(Code == CalCOFI.NOAA.Species.Abbreviation))
print(tmp, n = Inf)
# A tibble: 31 × 3
   Code    Freq Common.Name                 
   <chr>  <int> <chr>                       
 1 DD       544 Short-beaked common dolphin 
 2 ULW      536 Unidentified Large Whale    
 3 DSP      310 Unidentified common dolphin 
 4 BP       248 Fin Whale                   
 5 MN       211 Humpback Whale              
 6 UD       167 Unidentified Dolphin        
 7 BM       159 Blue Whale                  
 8 DC        90 Long-beaked common dolphin  
 9 PD        86 Dall's porpoise             
10 LO        77 Pacific white-sided dolphin 
11 ER        71 Gray Whale                  
12 GG        66 Risso's dolphin             
13 TT        63 Bottlenose dolphin          
14 UC        56 Unidentified cetacean       
15 PM        47 Sperm Whale                 
16 BA        24 Minke Whale                 
17 LB        22 Northern right-whale dolphin
18 OTH       19 <NA>                        
19 OO        12 Killer Whale                
20 USC       11 Unidentified�small cetacean 
21 SC         9 Striped dolphin             
22 ZICA       9 Cuvier's beaked whale       
23 BBO        2 <NA>                        
24 GM         2 Short-finned pilot whale    
25 UBW        2 <NA>                        
26 USW        2 Unidentified�small whale    
27 BB         1 Sei Whale                   
28 BEBA       1 �Baird's beaked whale�      
29 PC         1 False Killer whale          
30 UNZIPH     1 Unidentified ziphid         
31 UO         1 Unidentified odontocete     

For the purposes of the analyses undertaken here, we may be interested only in blue, fin and humpback whale; for the trends paper we will likely be interested in more species – but will want some sample size cutoff. Note also we will need to determine how to deal with the unidentified whale sightings (there are many, because the survey was undertaken in passing mode, with relatively inexperienced observers).

3.5 School size

In this section, we check a school size (column Best) is present for all sightings.

ind <- (is.na(sightings$Best))

There are 1 records with missing school size. These are removed in the code below.

Note

I checked with Michaela and she could not locate the school size so OK to drop.

sightings <- sightings[!ind, ]

Note that there are some very large school sizes, and we will need to return to examine these before a sensible analysis can be done on the species involved.

summary(sightings$Best)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    1.00    4.00   39.23   22.00 4600.00 

Here is a summary of the number of records of species with school sizes more than 100:

which_ind <- which(sightings$Best > 100)
sort(table(sightings[which_ind, ]$Species1), decreasing = TRUE)

 DD DSP  DC  UD  LB  GG  TT  LO OTH  SC 
103  50  37  18   3   2   2   1   1   1 

and more than 1000:

which_ind <- which(sightings$Best > 1000)
sort(table(sightings[which_ind, ]$Species1), decreasing = TRUE)

 DD  DC  UD DSP 
  5   4   2   1 

These are all one of the two species of common dolphin, or UID dolphin.

3.6 Other covariates for the detection function

In Campbell et al. (2015), they used: “Beaufort sea state (numerical: 0–5), ship (categorical), swell (numerical: wave height in m), school size (numerical), and, for short-beaked common dolphins, school size class (categorical).” We have already looked at school size, but we consider the others (and some alternatives) below.

In addition, it was suggested that observer would be a potentially relevant covariate, but there are too many for this to be possible; this is also true for cruise. Another possible covariate not considered by Campbell et al. is season. It could be argued that some of the sightability variation here would be captured by sea state and swell – but there are potentially animal behavioural (and other) factors that could be included with season so it seems worth adding.

Cue is also recorded as a variable, and could be useful - but there are many missing values so we do not pursue it here.

sum(is.na(sightings$Cue))
[1] 596

There are also a suite of observational variables such as Quality, Visiblity, Precipitation, Could, Glare - but these all have many missing values.

3.6.1 Sea state and swell

For sea state and swell, in an initial round of exploratory analysis we found many missing records. On investigation, Michaela found there were not contemporaneous records for these, and filled in the values with the closest (in time) available record.

Note

Note to self - check if these are both Beaufort and swell or just Beaufort that were filled in by Michaela.

Check there are no missing records:

#Look at beaufort and swell
sum(is.na(sightings$Bft))
[1] 0
sum(is.na(sightings$Swell_ft_))
[1] 0

We were advised to consider truncating records too far away in time. Here is a histogram of the time delays (note that only records that were filled in are shown here, as the others have a Time_Diff_minutes of NA).

hist(sightings$Time_Diff_minutes/60, main = "Time delay between sighting and sea conditions", xlab = "Time delay (hours)", breaks = 100)

There are a small number that are very large and will clearly need truncated – here is a histogram with these removed.

ind <- abs(sightings$Time_Diff_minutes) <= 6000
hist(sightings$Time_Diff_minutes[ind]/60, main = "Time delay between sightings and sea conditions (truncated)", xlab = "Time delay (hours)", breaks = 100)

Following Michela’s suggestion, we will truncate any records with a time delay of more than 8 hours.

ind <- abs(sightings$Time_Diff_minutes) > 8 * 60
ind[is.na(ind)] <- FALSE

This will result in truncating 66 out of 2850 remaining records. Note that some of these are sightings with small perpendicular distances, so it is a shame to truncate them as they will affect the analysis. However, it seems the best choice in the circumstances.

sightings <- sightings[!ind, ]

There are the frequency of the different Beaufort values – note this is in the sightings database, so it does not reflect the proportion of time surveying in the different conditions if there are fewer sightings in higher Beauforts (which may reflect that there was little surveying done in such Beauforts, and/or that animals are hard to sight in those conditions):

table(sightings$Bft)

  0   1   2   3   4   5   6 
 13 220 698 824 544 480   5 

and here is a histogram of swell height

hist(sightings$Swell_ft_, xlab = "Swell height (ft)", main = "Swell height in sightings records")

One thing to check about Beaufort and swell is whether they are strongly correlated – if so it may not be wise to try them both as covariates in the detection function modelling. However (and perhaps surprisingly), the correlation is not very strong:

cor(sightings$Bft, sightings$Swell_ft_)
[1] 0.5623662

3.6.2 Ship / observer height

Regarding the possible ship covariate, we do not have this in the dataset – but we do have observer height, which is strongly related and potentially relevant.

Note

Check if we don’t have ship?

First to check that observer height is not missing for any records:

sum(is.na(sightings$ObsHeight_m_))
[1] 0

Looking at the distribution of observer heights, there are only a few values and so it does not seem suitable for a continuous numerical covariate:

hist(sightings$ObsHeight_m_, breaks = 100, xlab = "Observer height (m)", main = "Observer height in sightings records")

It seems reasonable to group them into three classes: 9m or less, 9-12m and more than 12m (coded as A, B and C).

Note

Check with Simone and Michaela if this is reasonable, as they likely know the sighting conditions associated with the different ships at these heights.

sightings$fObsHeight <- as.factor(ifelse(sightings$ObsHeight_m_ <= 9, "A",
                               ifelse(sightings$ObsHeight_m_ <= 12, "B", "C")))

Note that Distance does not allow ordinal classes, so we won’t be able to constrain estimated sightability so that the detecability of C is more than B and this is more than A. We will have to see how things come out in the estimates.

4 Exploratory analysis of distances

After the various data selection noted above, there are a total of 2784 records remaining in the sightings data. Here we explore some aspects of the distance data, to try to identify any issues that might affect the detection function modelling.

First, we plot a histogram of the distances (pooled across species)

hist(sightings$perp_dist_m, breaks = 200, xlab = "Perpendicular distance (m)", 
     main = "Perpendicular distances")

We see a large peak at the lowest distance bin, and a very long right tail, out to 12km. It is typical to truncate the data (in Campbell et al. this was at 2400m for large cetaceans and 500 or 600m for small ones); to focus the plots we truncate off the largest 10% of distances.

trunc_90 <- ceiling(quantile(sightings$perp_dist_m, 0.9))
hist(sightings$perp_dist_m, breaks = 200, xlab = "Perpendicular distance (m)", 
     main = "Perpendicular distances (truncated)",
     xlim = c(0, trunc_90))

The large peak at lowest distances could be due to rounding of perpendicular distances to zero. Because these data are from radial distance and angle, this could come from rounding angles to zero. Of the 2784 sightings, 36 have exactly 0 distance, so it seems unlikely there is a strong problem with rounding of angles to zero. Nevertheless, here is a histogram of the angles we have (recall from earlier that some of the angles are missing), with the quadrants folded so that all angles are between 0 and 90 degrees:

hist(sightings$angle2, breaks = seq(-0.5, 90.5, by = 1), xlab = "Sighting angle (degrees)", main = "Sighting angles")

There is clear evidence of rounding to the nearest 5 degrees, and this could account for some of the spike at zero - but likely not much of it as there are only 36 zero angles.

As an aside, it seems strange to me that we could have rounding of angles to the nearest 5 degrees if the bearing to the animal and ship’s bearing were measured separately – presumably they were not measured separately (or perhaps they were both rounded to the nearest 5 degrees).

Note

Discuss how bearings were obtained, and whether an angle board or similar was used.

More insight could be gained by viewing a 2D map of the sightings (projected into the upper right quadrant). For this we need to know forward as well as perpendicular distances, so we can only do it for the observations where we have an observation angle (note that this is not all the animals). Here is a 2D plot for all of these observations

#Look at forward distances
sightings$forward_distance <- sightings$perp_dist_m / tan(sightings$angle2 / 360 * (2 * pi))
plot(sightings$perp_dist_m, sightings$forward_distance, xlab = "Perpendicular distance (m)", 
     ylab = "Forward distance (m)")

We can clearly see bands that correspond with rounding of radial distances (perhaps acutal distances, or reticle marks). However, our primary interest is in the detections close to zero perpendicular distance. Here we plot all sightings within 100m perpendicular distance:

plot(sightings$perp_dist_m, sightings$forward_distance, xlab = "Perpendicular distance (m)", 
     ylab = "Forward distance (m)", xlim = c(0, 100), 
     ylim = c(0, max(sightings$forward_distance[sightings$perp_dist_m < 100], na.rm = TRUE)))
for(i in 1:5){
  abline(a = 0, b = tan((90 - i) / 360 * 2 * pi), lty = 2)
}

We can see diagonal lines of observations running out at 1 degree, 2 degrees, etc (first 5 degrees out highlighted with dashed lines). This is to be expected and not a problem.

Here is the same plot truncated at 50m perpendicular distance.

plot(sightings$perp_dist_m, sightings$forward_distance, xlab = "Perpendicular distance (m)", 
     ylab = "Forward distance (m)", xlim = c(0, 50), 
     ylim = c(0, max(sightings$forward_distance[sightings$perp_dist_m < 50], na.rm = TRUE)))
for(i in 1:5){
  abline(a = 0, b = tan((90 - i) / 360 * 2 * pi), lty = 2)
}

It looks like many of the small perpendicular distance observations were made at angles close to 90 degrees from the transect line – animals that were almost abeam of the ship.

This can be seen even more clearly when truncating at 20m perpendicular distance:

plot(sightings$perp_dist_m, sightings$forward_distance, xlab = "Perpendicular distance (m)", 
     ylab = "Forward distance (m)", xlim = c(0, 20), 
     ylim = c(0, max(sightings$forward_distance[sightings$perp_dist_m < 20], na.rm = TRUE)))
for(i in 1:5){
  abline(a = 0, b = tan((90 - i) / 360 * 2 * pi), lty = 2)
}

We also see that all of the sightings at 2m or less perpendicular distance were made at small forward distances, except one sighting at 200m forward distance. These detections may have been bow-riding animals.

We make one final 2D plot looking only at animals detected within 20m perpendicular and forward distance:

plot(sightings$perp_dist_m, sightings$forward_distance, xlab = "Perpendicular distance (m)", 
     ylab = "Forward distance (m)", xlim = c(0, 20), 
     ylim = c(0, 20))

We now look into which species are involved. Here we look at the species detected within 20m perpendicular distance:

table(sightings$Species1[sightings$perp_dist_m < 20])

 BA  BM  BP  DC  DD DSP  ER  GG  LB  LO  MN  PD  TT  UD ULW 
  4   3  12   6  72  20   2   2   5  12   7  12   3   4   7 

and within 20m perpendicular distance and 20m forward distance (note this includes only those detections with an angle and therefore a forward distance)

table(sightings$Species1[sightings$perp_dist_m < 20 & sightings$forward_distance < 20])

 BA  BP  DC  DD DSP  GG  LB  LO  MN  PD  UD ULW 
  1   2   3  41   7   1   2   5   1   4   3   2 

Short-beaked common dolphin (DD) dominates with UID common dolphin (DSP) a close second.

However, if we look at perpendicular distances out to (say) 200m by 10m for DD & UID …

ind <- sightings$Species1 %in% c("DD", "DSP")
ind2 <- sightings$perp_dist_m <= 200
hist(sightings$perp_dist_m[ind & ind2], breaks = seq(0, 200, by = 10), 
     xlab = "Perpendicular distance (m)", 
     main = "Pdist (truncated) DD and DSP")

and for the other species in the dataset…

hist(sightings$perp_dist_m[!ind & ind2], breaks = seq(0, 200, by = 10), 
     xlab = "Perpendicular distance (m)", 
     main = "Pdist (truncated) other species")

we see that both have the spike at very small perpendicular distances (0-10m in these plots).

We can try separating all dolphin (and porpoise) species from the others:

ind <- sightings$Species1 %in% c("DD", "DSP", "UD", "DC", "PD", "LO", "GG", "TT", "LB", "USC", "SC")
ind2 <- sightings$perp_dist_m <= 200
hist(sightings$perp_dist_m[ind & ind2], breaks = seq(0, 200, by = 10), 
     xlab = "Perpendicular distance (m)", 
     main = "Pdist (truncated) dolphins")

hist(sightings$perp_dist_m[!ind & ind2], breaks = seq(0, 200, by = 10), 
     xlab = "Perpendicular distance (m)", 
     main = "Pdist (truncated) non-dolphins")

So, although bow-riding may play a part, it seems to be something that affects non-dolphin species to some extent also.

Here we break down by species, for species with more than 60 detections. The first plot is out to 2400m in increments of 100m, while the second is out to 500m in increments of 20m.

tmp <- table(sightings$Species1)
sp <- names(tmp[tmp>=60])

p <- sightings %>% filter(Species1 %in% sp) %>%
  ggplot (aes(perp_dist_m)) + 
  geom_histogram(breaks = seq(0, 2400, by = 100)) + 
  facet_wrap(~Species1, scales = "free")
p

p <- sightings %>% filter(Species1 %in% sp) %>%
  ggplot (aes(perp_dist_m)) + 
  geom_histogram(breaks = seq(0, 500, by = 20)) + 
  facet_wrap(~Species1, scales = "free")
p

In these plots, we see a spike at small distances in DD, DSP, LO, PD and TT – all dolphin (and related small) species.

We could envisage further exploratory analyses, but here we move on to detection function modelling.

5 Detection function modelling

5.1 Create observation table

In the code below we pull out from the sightings table just the variables that will be used in the distance analysis. As a check we print a summary of the resulting dataset.

obs_table <- sightings %>%
  rename(sighting_no = SightingNo, date_time = Date_Time_Local_, 
         season = Season, species = Species1, distance = perp_dist_m, ss = Bft, 
         swell = Swell_ft_, size = Best, obs_height = ObsHeight_m_, f_obs_height = fObsHeight, 
         season = Season) %>%
  mutate(date = date(date_time), year = year(date), object = 1:n()) %>%
  select(object, Region.Label, Sample.Label, sighting_no, date, date_time, year, season, species, distance, ss, 
         swell, size, obs_height, f_obs_height)
summary(obs_table)
     object       Region.Label       Sample.Label       sighting_no       
 Min.   :   1.0   Length:2784        Length:2784        Length:2784       
 1st Qu.: 696.8   Class :character   Class :character   Class :character  
 Median :1392.5   Mode  :character   Mode  :character   Mode  :character  
 Mean   :1392.5                                                           
 3rd Qu.:2088.2                                                           
 Max.   :2784.0                                                           
      date              date_time                           year     
 Min.   :2004-07-13   Min.   :2004-07-13 05:34:00.00   Min.   :2004  
 1st Qu.:2009-03-13   1st Qu.:2009-03-14 05:07:00.00   1st Qu.:2009  
 Median :2014-04-11   Median :2014-04-11 12:29:30.00   Median :2014  
 Mean   :2014-01-12   Mean   :2014-01-12 20:26:40.66   Mean   :2014  
 3rd Qu.:2018-04-16   3rd Qu.:2018-04-16 18:35:56.00   3rd Qu.:2018  
 Max.   :2023-11-18   Max.   :2023-11-18 06:41:00.00   Max.   :2023  
    season            species             distance             ss       
 Length:2784        Length:2784        Min.   :    0.0   Min.   :0.000  
 Class :character   Class :character   1st Qu.:  191.3   1st Qu.:2.000  
 Mode  :character   Mode  :character   Median :  552.8   Median :3.000  
                                       Mean   :  946.4   Mean   :3.123  
                                       3rd Qu.: 1245.3   3rd Qu.:4.000  
                                       Max.   :11827.0   Max.   :6.000  
     swell             size          obs_height    f_obs_height
 Min.   : 0.000   Min.   :   0.0   Min.   : 7.10   A:1175      
 1st Qu.: 3.000   1st Qu.:   1.0   1st Qu.: 8.10   B: 844      
 Median : 4.000   Median :   4.0   Median :10.36   C: 765      
 Mean   : 3.944   Mean   :  39.4   Mean   :10.38               
 3rd Qu.: 5.000   3rd Qu.:  22.0   3rd Qu.:13.30               
 Max.   :12.000   Max.   :4600.0   Max.   :17.00               

We proceed now with detection function modelling by species.

Text goes here - by species.

6 Density estimation

7 References

  • Campbell et al. (2015) Inter-annual and seasonal trends in cetacean distribution, density and abundance off southern California. doi:10.1016/j.dsr2.2014.10.008
  • Lerzack and Hobbs (1997) Calculating sighting distances from angular readings during shipboard, aerial, and shore-based marine mamaml surveys. doi:10.1111/j.1748-7692.1998.tb00745.x