#Read in the effort data
effort <- read_excel("./fromMichaela/CalCOFI_2021_2023_Effort_OnTransectOnEffortONLY.xlsx",
guess_max = 7000, na = c("", "NA"))
#Prepare tables for Distance analysis
sample_table <- effort %>%
rename(year = Year, season = Season, Region.Label = Cruise, transect = Transect,
segment_length = SegmentLength_km_) %>%
group_by(Sample.Label = paste(Region.Label, transect, sep = "-")) %>%
summarise(Effort = sum(segment_length), year = first(year), season = first(season),
Region.Label = first(Region.Label), transect = first(transect))
region_table <- sample_table %>%
group_by(Region.Label) %>%
summarise(Region.Label = first(Region.Label), year = first(year), season = first(season), k = n(), L = sum(Effort)) %>%
mutate(Area = 1)CALCOFI line transect analysis
1 Introduction
This is a working document for the analysis of marine mammal visual line transect data from the CALCOFI surveys 2004-2023. Michaela Alksne provided the data and is helping troubleshoot. The finalized results are due to be used in two papers:
- Temporal trends, led by Simone Bauman-Pickering. This is in some ways an update of Campbell et al. (2015), which looked at 2004-2013.
- Relationship between survey-level density estimates and microbial + small plankton eDNA, led by Trevor Ruiz.
Notes like are used throughout the text where there are questions for Michaela and Simone.
2 Effort data
The following code reads in the effort data file and creates two tables: - sample_table, containing one line per transect per cruise - region_table, containing one line per cruise (with Area = 1 otherwise later code complains)
Note, we are currently treating the transect as the sample unit, but we may change to treat a day of effort as the sample unit as was done by Campbell et al. (2015).
Here is a copy of the region_table, i.e., with one line per cruise. \(k\) is the number of samples (transects) and \(L\) is the total line length.
region_table %>% kbl() %>% kable_classic(full_width = FALSE)| Region.Label | year | season | k | L | Area |
|---|---|---|---|---|---|
| CC2004-01 | 2004 | winter | 1 | 0.0000 | 1 |
| CC2004-04 | 2004 | spring | 1 | 0.0000 | 1 |
| CC2004-07 | 2004 | summer | 33 | 1446.8834 | 1 |
| CC2004-11 | 2004 | fall | 23 | 931.6503 | 1 |
| CC2005-01 | 2005 | winter | 26 | 903.8109 | 1 |
| CC2005-04 | 2005 | spring | 34 | 1278.9707 | 1 |
| CC2005-07 | 2005 | summer | 33 | 1304.0305 | 1 |
| CC2005-11 | 2005 | fall | 27 | 927.4696 | 1 |
| CC2006-02 | 2006 | winter | 25 | 1044.4506 | 1 |
| CC2006-04 | 2006 | spring | 31 | 1383.5765 | 1 |
| CC2006-07 | 2006 | summer | 33 | 1009.4270 | 1 |
| CC2006-10 | 2006 | fall | 25 | 1028.8196 | 1 |
| CC2007-01 | 2007 | winter | 25 | 949.0365 | 1 |
| CC2007-04 | 2007 | spring | 22 | 768.5812 | 1 |
| CC2007-07 | 2007 | summer | 34 | 1005.2511 | 1 |
| CC2007-11 | 2007 | fall | 23 | 640.6153 | 1 |
| CC2008-01 | 2008 | winter | 23 | 790.1258 | 1 |
| CC2008-03 | 2008 | spring | 21 | 499.1738 | 1 |
| CC2008-08 | 2008 | summer | 33 | 1061.9797 | 1 |
| CC2008-10 | 2008 | fall | 26 | 907.0628 | 1 |
| CC2009-01 | 2009 | winter | 32 | 980.7786 | 1 |
| CC2009-03 | 2009 | spring | 26 | 785.0440 | 1 |
| CC2009-07 | 2009 | summer | 37 | 1287.6075 | 1 |
| CC2009-11 | 2009 | fall | 29 | 1037.7810 | 1 |
| CC2010-01 | 2010 | winter | 29 | 875.1260 | 1 |
| CC2010-04 | 2010 | spring | 1 | 0.0000 | 1 |
| CC2010-08 | 2010 | summer | 28 | 1086.1751 | 1 |
| CC2010-11 | 2010 | fall | 23 | 774.4909 | 1 |
| CC2011-01 | 2011 | winter | 31 | 1052.1534 | 1 |
| CC2011-04 | 2004 | spring | 30 | 1238.2236 | 1 |
| CC2011-08 | 2011 | summer | 30 | 1341.8873 | 1 |
| CC2011-10 | 2011 | fall | 24 | 1044.0281 | 1 |
| CC2012-02 | 2012 | winter | 15 | 453.5515 | 1 |
| CC2012-03 | 2012 | spring | 17 | 782.0651 | 1 |
| CC2012-07 | 2012 | summer | 31 | 1480.9625 | 1 |
| CC2012-10 | 2012 | fall | 24 | 950.4777 | 1 |
| CC2013-01 | 2013 | winter | 23 | 848.6145 | 1 |
| CC2013-04 | 2013 | spring | 18 | 590.5558 | 1 |
| CC2013-07 | 2013 | summer | 30 | 1326.3615 | 1 |
| CC2013-11 | 2013 | fall | 27 | 926.1534 | 1 |
| CC2014-02 | 2014 | winter | 13 | 457.5277 | 1 |
| CC2014-04 | 2014 | spring | 22 | 827.0975 | 1 |
| CC2014-07 | 2014 | summer | 36 | 1256.0396 | 1 |
| CC2014-11 | 2014 | fall | 30 | 1156.5048 | 1 |
| CC2015-01 | 2015 | winter | 30 | 1121.4828 | 1 |
| CC2015-04 | 2015 | spring | 25 | 1133.8400 | 1 |
| CC2015-07 | 2015 | summer | 29 | 1297.2557 | 1 |
| CC2015-11 | 2015 | fall | 27 | 1067.4050 | 1 |
| CC2016-01 | 2016 | winter | 26 | 875.7938 | 1 |
| CC2016-04 | 2016 | spring | 26 | 1026.1933 | 1 |
| CC2016-07 | 2016 | summer | 31 | 1154.7612 | 1 |
| CC2016-11 | 2016 | fall | 22 | 849.3831 | 1 |
| CC2017-01 | 2017 | winter | 28 | 1027.7116 | 1 |
| CC2017-04 | 2017 | spring | 32 | 1265.2516 | 1 |
| CC2017-08 | 2017 | summer | 29 | 1284.8930 | 1 |
| CC2017-11 | 2017 | fall | 22 | 946.4949 | 1 |
| CC2018-02 | 2018 | winter | 18 | 631.6046 | 1 |
| CC2018-04 | 2018 | spring | 25 | 1109.4123 | 1 |
| CC2018-06 | 2018 | summer | 32 | 1198.7060 | 1 |
| CC2018-10 | 2018 | fall | 24 | 1082.9915 | 1 |
| CC2019-02 | 2019 | winter | 11 | 370.8165 | 1 |
| CC2019-04 | 2019 | spring | 20 | 920.8476 | 1 |
| CC2019-07 | 2019 | summer | 27 | 1251.9319 | 1 |
| CC2019-11 | 2019 | fall | 24 | 1072.9128 | 1 |
| CC2020-01 | 2020 | winter | 25 | 960.1047 | 1 |
| CC2020-04 | 2020 | spring | 1 | 0.0000 | 1 |
| CC2020-08 | 2020 | summer | 1 | 0.0000 | 1 |
| CC2020-11 | 2020 | fall | 1 | 0.0000 | 1 |
| CC2021-01 | 2021 | winter | 1 | 0.0000 | 1 |
| CC2021-04 | 2021 | spring | 1 | 0.0000 | 1 |
| CC2021-07 | 2021 | summer | 26 | 1258.6472 | 1 |
| CC2021-11 | 2021 | fall | 19 | 826.4556 | 1 |
| CC2204SH | 2022 | spring | 16 | 576.5156 | 1 |
| CC2208 | 2022 | summer | 47 | 790.1335 | 1 |
| CC2211SR | 2022 | fall | 19 | 705.2526 | 1 |
| CC2301RL | 2023 | winter | 13 | 594.6193 | 1 |
| CC2303SH | 2023 | spring | 20 | 801.7941 | 1 |
| CC2311SR | 2023 | fall | 21 | 929.0986 | 1 |
Two things are apparent:
- some cruises have 0 survey effort and only 1 transect.
- the protocol for naming cruises changed in spring 2022. Before that they had a 4 digit year, then a dash then the month the survey started; after that they have a 2-digit year and then 2 letters afterwards except for summer. No big deal I think, just thought it’s worth noting.
I will delete the cruises with 0 survey effort.
no_effort <- region_table[region_table$L == 0, ]$Region.Label
region_table <- region_table %>% filter(!(Region.Label %in% no_effort))
sample_table <- sample_table %>% filter(!(Region.Label %in% no_effort))Is it expected that these cruises have 0 survey effort? Is it okay to delete them?
3 Sightings data
In this section we read in the sightings data; we create and check the columns we need for distance sampling detection function analysis.
#Read in sightings data
#Increase guess_max to more than number of rows so it gets the column types right
sightings <- read_excel("./fromMichaela/CalCOFI_2004_2023_CombinedSightings_add_weather.xlsx",
guess_max = 7000, na = c("", "NA"))The following variables will be needed for the distance sampling analysis:
- perpendicular distance – see Section 3.1
- whether on effort and on transect - see Section 3.2
- transect (i.e., the transect the sighting was made on) - see Section 3.3
- species - will use the
Species1column - see Section 3.4 - school size - will use the
bestcolumn - see Section 3.5 - other potential covariates for the detection function - see Section 3.6
3.1 Perpendicular distances
Perpendicular distances are required for all observations. These were not supplied for 2022 and 2023 cruises, so will be calculated below. Before this, we check for the records before 2022 that there are perpendular distances for each record:
#Create an indicator for whether the record is 2022 or later
sightings$Year <- year(sightings$Date_Time_Local_)
ind <- sightings$Year >= 2022
#Count the number of records from before 2022 that have missing perpendicular distances
ind2 <- !ind & is.na(sightings$perp_dist_m)
sum(ind2)[1] 0
There are 0 records with missing perpendicular distances prior to 2022. So no problem there.
Now to obtaining perpendicular distances for 2022 and 2023 cruises. We use the binocular reticle readings (field BinoReticle), observer height and earth radius to calculate radial along-water distance (using formulae in Lerczak and Hobbs 1997). In some cases, the field distance_m was filled, which we take to mean that a radial distance was estimated directly (by eye) so we use this in place of any reticle measurement (in the cases we checked the reticle field was 0).
We then use ships bearing (ShipHeadingTru) and observaton bearing (SightingBearingTru) to get the observation angle and hence perpendicular distance.
#Warnings suppressed - they arise in cases where there is no reticle observation or
# sighting bearing (which can be the case when a direct radial distance is recorded)
#Get BinoReticle to radians conversion
sightings$BinoReticle <- as.numeric(sightings$BinoReticle)
sightings$ret_radians <- as.numeric(sightings$ret_radians)
ret_to_rad <- sightings$ret_radians[1] / sightings$BinoReticle[1]
sightings$ret_radians2 <- sightings$BinoReticle * ret_to_rad
radius_earth <- sightings$radius_earth[1]
#The following formulae are from Lerczak and Hobbs (1997)
#Get angle from horizontal to horizon
sightings$alpha <- atan(sqrt(2 * sightings$ObsHeight_m_ * radius_earth + sightings$ObsHeight_m_ ^ 2) / radius_earth)
#Get angle from marine mammal to platform
sightings$beta <- pi / 2 - sightings$alpha - sightings$ret_radians2
#Get distance along earth's surface to horizon
sightings$H <- sightings$alpha * radius_earth
#Get line-of-sight distance to marine mammal
sightings$D0 <- (radius_earth + sightings$ObsHeight_m_) * cos(sightings$beta) -
sqrt((radius_earth + sightings$ObsHeight_m_) ^ 2 * (cos(sightings$beta)) ^ 2 -
(2 * sightings$ObsHeight_m_ * radius_earth + sightings$ObsHeight_m_ ^ 2))
#Get central arc from marine mammal to platform
sightings$delta <- asin(sin(sightings$beta) * sightings$D0 / radius_earth)
#Get distance to marine mammal along the surface of the ocean
sightings$D <- sightings$delta * radius_earth
#If Distance_m_ column is filled in, use that instead
sightings$D2 <- sightings$D
ind2 <- sightings$Distance_m_ > 0
ind2[is.na(ind2)] <- FALSE
sightings$D2[ind2] <- sightings$Distance_m_[ind2]
#Get horizontal angle (in degrees) from trackline
tmp <- sightings$ShipHeadingTru - as.numeric(sightings$SightingBearingTru)
#Trim clearly wrong values
tmp[abs(tmp) > 360] <- NA
#Transform into the 0-90 degree quadrat (will be useful later for looking at forward distances)
sightings$angle2 <- asin(abs(sin(tmp/360 * (2 * pi)))) * 360 / (2 * pi)
#Convert to perpendicular distance
sightings$perp_dist_m2 <- sightings$D2 * abs(sin(sightings$angle2 / 360 * 2 * pi))One check that can be performed on the above is to check the calculated radial distances using Lerczak and Hobbs against those given in the sightings spreadsheet for sightings earlier than 2022. Here is a histogram of the distance differences:
ind3 <- sightings$dist_calc_m_ > 0
hist(sightings$D[ind3] - sightings$dist_calc_m_[ind3], xlab = "Difference in calculated radial distances (m)", main = "Difference between spreadsheet and Lerczak & Hobbs", breaks = 100)As can be seen, in the great majority of cases the difference is very small - but there are some larger differences.
Worth investigating further?
One other thing to note is that there are 443 records that don’t have a valid number in either the ShipHeadingTru or the SightingBearingTru column of the sightings spreadsheet.
Worth investigating further?
We now check whether any 2022/2023 data still have missing distances:
#Calculate how many missing distances we have from 2022 onwards
ind3 <- ind & is.na(sightings$perp_dist_m2)
tmp <- sightings[ind3, ] There are 19 of these.
I checked into the causes and emailed Michaela about them on 29th Nov.
For now I will drop these records …
sightings <- sightings[!ind3, ]… and copy over the newly calculated perp distances for 2022 onwards into the perp_dist_m field used by all the other records.
#Copy the perp_dist_m2 for these records to perp_dist_m
ind <- sightings$Year >= 2022
sightings$perp_dist_m[ind] <- sightings$perp_dist_m2[ind]3.2 Getting on-effort on-transect data
We wish to select only sightings data from when observers were on effort and the vessel was on transect. There are various potentially relevant fields here, so we should check which to use.
Check with Michaela and Simone which effort fields to use.
Three relevant-looking fiels are AdjustedBothOnEffortAndOnTransect, OriginalEffortFlag_ON_OFF_ and OriginalTransect_ON_OFF_. However, the former has some NA records, so it seems prudent to use a combination of the latter two.
#Check for NAs
sum(is.na(sightings$AdjustedBothOnEffortAndOnTransect))[1] 58
sum(is.na(sightings$OriginalEffortFlag_ON_OFF_))[1] 0
sum(is.na(sightings$OriginalTransect_ON_OFF_))[1] 0
#Turn to upper case as currently it's a mixture of both
sightings$AdjustedBothOnEffortAndOnTransect <- toupper(sightings$AdjustedBothOnEffortAndOnTransect)
sightings$OriginalEffortFlag_ON_OFF_ <- toupper(sightings$OriginalEffortFlag_ON_OFF_)
sightings$OriginalTransect_ON_OFF_ <- toupper(sightings$OriginalTransect_ON_OFF_)
#Create logical variables
sightings$AdjustedOnEffortAndOnTransect <- sightings$AdjustedBothOnEffortAndOnTransect == "ON"
sightings$OriginalOnEffort <- sightings$OriginalEffortFlag_ON_OFF_ == "ON"
sightings$OriginalOnTransect <- sightings$OriginalTransect_ON_OFF_ == "ON"
#Select data where OnEffort and OnTransect are ON
ind <- sightings$OriginalOnEffort & sightings$OriginalOnTransectOf the 6287 sighting records, the number made when both on effort and on transect according to the above criteria are 3388. We select just these data for futher processing.
sightings <- sightings[ind, ]Check with Michaela and Simone whether this level of data “loss” is okay. We could potentially use off transect observations for fitting the detection function if the crew were on effort – but this risks compromizing pooling robustness. Need to have a conversation about this.
3.3 Getting transect numbers for each sighting
Transect numbers were not provided with the sightings, but we can obtain them if we compare the time of the sighting with the start and end times of the effort segments. The sightings spreadsheet has a column Date_Time_Local_ and the effort spreadsheet has columns StartDateTimeLocal and EndDateTimeLocal.
The following code does this, and also collates any sightings that fall outside the effort segment times. (It also creates two new fields Region.Label and Sample.Label that will be used to join the effort data to these data in the Distance analysis later. Region.Label is simply the Cruise column from the effort spreadsheet – we have to copy it over because although Cruise is recorded in the sightings spreadsheet it is formatted differently. Sample.Label is the cruise code, a dash and then the transect number.)
n <- nrow(sightings)
first_reject <- TRUE; rejects <- NULL
sightings <- sightings %>% mutate(Region.Label = "", Sample.Label = "")
for (i in 1:n){
#Find records in effort file where start time is less than obs time and
# end time is more than obs time
ind <- (effort$StartDateTimeLocal <= sightings$Date_Time_Local_[i]) &
(effort$EndDateTimeLocal >= sightings$Date_Time_Local_[i])
which_ind <- which(ind)
if(length(which_ind)==1){
#Copy cruise from effort table - it's formatted differently in the sightings
sightings$Region.Label[i] <- effort$Cruise[which_ind]
#Create Sample.Label
sightings$Sample.Label[i] <- paste(effort$Cruise[which_ind], effort$Transect[which_ind], sep = "-")
} else{
if(first_reject) {
rejects <- sightings[i, ]
first_reject <- FALSE
} else {
rejects <- rbind(rejects, sightings[i, ])
}
}
}There are 537 sightings that fall outside of effort segment times. This is much more than I would expect, so I suspect that the on-effort-on-transect designation of the previous section has not worked as intended. On investigation, I also found a few bugs in the sightings Date_Time_Local_ field.
I emailed Michaela on 30th Nov for suggestions. Will need to clear up before proceeding with a final analysis.
For now I will remove all of the records that fall outside of effort segment times.
ind <- (sightings$Sample.Label == "")
sightings <- sightings[!ind, ]3.4 Species
Here is a printout of the sample size (of groups) in the sightings data, by species. The table doesn’t look fancy as the standard table formatting (using kbl) didn’t work with some of the characters in the cetacean codes CSV file (there are some odd line feed characters). Note also some codes are missing - but the general idea is there.
#Get species names
sp_names <- read.csv("./fromMichaela/CalCOFI_cetacean_codes.csv")
#Tabulate sightings and add in species names
sightings$Species1 <- toupper(sightings$Species1)
tmp <- tibble(data.frame(sort(table(sightings$Species1), decreasing = TRUE))) %>%
rename(Code = Var1) %>%
left_join(sp_names, by = join_by(Code == CalCOFI.NOAA.Species.Abbreviation))
print(tmp, n = Inf)# A tibble: 31 × 3
Code Freq Common.Name
<chr> <int> <chr>
1 DD 544 Short-beaked common dolphin
2 ULW 536 Unidentified Large Whale
3 DSP 310 Unidentified common dolphin
4 BP 248 Fin Whale
5 MN 211 Humpback Whale
6 UD 167 Unidentified Dolphin
7 BM 159 Blue Whale
8 DC 90 Long-beaked common dolphin
9 PD 86 Dall's porpoise
10 LO 77 Pacific white-sided dolphin
11 ER 71 Gray Whale
12 GG 66 Risso's dolphin
13 TT 63 Bottlenose dolphin
14 UC 56 Unidentified cetacean
15 PM 47 Sperm Whale
16 BA 24 Minke Whale
17 LB 22 Northern right-whale dolphin
18 OTH 19 <NA>
19 OO 12 Killer Whale
20 USC 11 Unidentified�small cetacean
21 SC 9 Striped dolphin
22 ZICA 9 Cuvier's beaked whale
23 BBO 2 <NA>
24 GM 2 Short-finned pilot whale
25 UBW 2 <NA>
26 USW 2 Unidentified�small whale
27 BB 1 Sei Whale
28 BEBA 1 �Baird's beaked whale�
29 PC 1 False Killer whale
30 UNZIPH 1 Unidentified ziphid
31 UO 1 Unidentified odontocete
For the purposes of the analyses undertaken here, we may be interested only in blue, fin and humpback whale; for the trends paper we will likely be interested in more species – but will want some sample size cutoff. Note also we will need to determine how to deal with the unidentified whale sightings (there are many, because the survey was undertaken in passing mode, with relatively inexperienced observers).
3.5 School size
In this section, we check a school size (column Best) is present for all sightings.
ind <- (is.na(sightings$Best))There are 1 records with missing school size. These are removed in the code below.
I checked with Michaela and she could not locate the school size so OK to drop.
sightings <- sightings[!ind, ]Note that there are some very large school sizes, and we will need to return to examine these before a sensible analysis can be done on the species involved.
summary(sightings$Best) Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 1.00 4.00 39.23 22.00 4600.00
Here is a summary of the number of records of species with school sizes more than 100:
which_ind <- which(sightings$Best > 100)
sort(table(sightings[which_ind, ]$Species1), decreasing = TRUE)
DD DSP DC UD LB GG TT LO OTH SC
103 50 37 18 3 2 2 1 1 1
and more than 1000:
which_ind <- which(sightings$Best > 1000)
sort(table(sightings[which_ind, ]$Species1), decreasing = TRUE)
DD DC UD DSP
5 4 2 1
These are all one of the two species of common dolphin, or UID dolphin.
3.6 Other covariates for the detection function
In Campbell et al. (2015), they used: “Beaufort sea state (numerical: 0–5), ship (categorical), swell (numerical: wave height in m), school size (numerical), and, for short-beaked common dolphins, school size class (categorical).” We have already looked at school size, but we consider the others (and some alternatives) below.
In addition, it was suggested that observer would be a potentially relevant covariate, but there are too many for this to be possible; this is also true for cruise. Another possible covariate not considered by Campbell et al. is season. It could be argued that some of the sightability variation here would be captured by sea state and swell – but there are potentially animal behavioural (and other) factors that could be included with season so it seems worth adding.
Cue is also recorded as a variable, and could be useful - but there are many missing values so we do not pursue it here.
sum(is.na(sightings$Cue))[1] 596
There are also a suite of observational variables such as Quality, Visiblity, Precipitation, Could, Glare - but these all have many missing values.
3.6.1 Sea state and swell
For sea state and swell, in an initial round of exploratory analysis we found many missing records. On investigation, Michaela found there were not contemporaneous records for these, and filled in the values with the closest (in time) available record.
Note to self - check if these are both Beaufort and swell or just Beaufort that were filled in by Michaela.
Check there are no missing records:
#Look at beaufort and swell
sum(is.na(sightings$Bft))[1] 0
sum(is.na(sightings$Swell_ft_))[1] 0
We were advised to consider truncating records too far away in time. Here is a histogram of the time delays (note that only records that were filled in are shown here, as the others have a Time_Diff_minutes of NA).
hist(sightings$Time_Diff_minutes/60, main = "Time delay between sighting and sea conditions", xlab = "Time delay (hours)", breaks = 100)There are a small number that are very large and will clearly need truncated – here is a histogram with these removed.
ind <- abs(sightings$Time_Diff_minutes) <= 6000
hist(sightings$Time_Diff_minutes[ind]/60, main = "Time delay between sightings and sea conditions (truncated)", xlab = "Time delay (hours)", breaks = 100)Following Michela’s suggestion, we will truncate any records with a time delay of more than 8 hours.
ind <- abs(sightings$Time_Diff_minutes) > 8 * 60
ind[is.na(ind)] <- FALSEThis will result in truncating 66 out of 2850 remaining records. Note that some of these are sightings with small perpendicular distances, so it is a shame to truncate them as they will affect the analysis. However, it seems the best choice in the circumstances.
sightings <- sightings[!ind, ]There are the frequency of the different Beaufort values – note this is in the sightings database, so it does not reflect the proportion of time surveying in the different conditions if there are fewer sightings in higher Beauforts (which may reflect that there was little surveying done in such Beauforts, and/or that animals are hard to sight in those conditions):
table(sightings$Bft)
0 1 2 3 4 5 6
13 220 698 824 544 480 5
and here is a histogram of swell height
hist(sightings$Swell_ft_, xlab = "Swell height (ft)", main = "Swell height in sightings records")One thing to check about Beaufort and swell is whether they are strongly correlated – if so it may not be wise to try them both as covariates in the detection function modelling. However (and perhaps surprisingly), the correlation is not very strong:
cor(sightings$Bft, sightings$Swell_ft_)[1] 0.5623662
3.6.2 Ship / observer height
Regarding the possible ship covariate, we do not have this in the dataset – but we do have observer height, which is strongly related and potentially relevant.
Check if we don’t have ship?
First to check that observer height is not missing for any records:
sum(is.na(sightings$ObsHeight_m_))[1] 0
Looking at the distribution of observer heights, there are only a few values and so it does not seem suitable for a continuous numerical covariate:
hist(sightings$ObsHeight_m_, breaks = 100, xlab = "Observer height (m)", main = "Observer height in sightings records")It seems reasonable to group them into three classes: 9m or less, 9-12m and more than 12m (coded as A, B and C).
Check with Simone and Michaela if this is reasonable, as they likely know the sighting conditions associated with the different ships at these heights.
sightings$fObsHeight <- as.factor(ifelse(sightings$ObsHeight_m_ <= 9, "A",
ifelse(sightings$ObsHeight_m_ <= 12, "B", "C")))Note that Distance does not allow ordinal classes, so we won’t be able to constrain estimated sightability so that the detecability of C is more than B and this is more than A. We will have to see how things come out in the estimates.
4 Exploratory analysis of distances
After the various data selection noted above, there are a total of 2784 records remaining in the sightings data. Here we explore some aspects of the distance data, to try to identify any issues that might affect the detection function modelling.
First, we plot a histogram of the distances (pooled across species)
hist(sightings$perp_dist_m, breaks = 200, xlab = "Perpendicular distance (m)",
main = "Perpendicular distances")We see a large peak at the lowest distance bin, and a very long right tail, out to 12km. It is typical to truncate the data (in Campbell et al. this was at 2400m for large cetaceans and 500 or 600m for small ones); to focus the plots we truncate off the largest 10% of distances.
trunc_90 <- ceiling(quantile(sightings$perp_dist_m, 0.9))
hist(sightings$perp_dist_m, breaks = 200, xlab = "Perpendicular distance (m)",
main = "Perpendicular distances (truncated)",
xlim = c(0, trunc_90))The large peak at lowest distances could be due to rounding of perpendicular distances to zero. Because these data are from radial distance and angle, this could come from rounding angles to zero. Of the 2784 sightings, 36 have exactly 0 distance, so it seems unlikely there is a strong problem with rounding of angles to zero. Nevertheless, here is a histogram of the angles we have (recall from earlier that some of the angles are missing), with the quadrants folded so that all angles are between 0 and 90 degrees:
hist(sightings$angle2, breaks = seq(-0.5, 90.5, by = 1), xlab = "Sighting angle (degrees)", main = "Sighting angles")There is clear evidence of rounding to the nearest 5 degrees, and this could account for some of the spike at zero - but likely not much of it as there are only 36 zero angles.
As an aside, it seems strange to me that we could have rounding of angles to the nearest 5 degrees if the bearing to the animal and ship’s bearing were measured separately – presumably they were not measured separately (or perhaps they were both rounded to the nearest 5 degrees).
Discuss how bearings were obtained, and whether an angle board or similar was used.
More insight could be gained by viewing a 2D map of the sightings (projected into the upper right quadrant). For this we need to know forward as well as perpendicular distances, so we can only do it for the observations where we have an observation angle (note that this is not all the animals). Here is a 2D plot for all of these observations
#Look at forward distances
sightings$forward_distance <- sightings$perp_dist_m / tan(sightings$angle2 / 360 * (2 * pi))
plot(sightings$perp_dist_m, sightings$forward_distance, xlab = "Perpendicular distance (m)",
ylab = "Forward distance (m)")We can clearly see bands that correspond with rounding of radial distances (perhaps acutal distances, or reticle marks). However, our primary interest is in the detections close to zero perpendicular distance. Here we plot all sightings within 100m perpendicular distance:
plot(sightings$perp_dist_m, sightings$forward_distance, xlab = "Perpendicular distance (m)",
ylab = "Forward distance (m)", xlim = c(0, 100),
ylim = c(0, max(sightings$forward_distance[sightings$perp_dist_m < 100], na.rm = TRUE)))
for(i in 1:5){
abline(a = 0, b = tan((90 - i) / 360 * 2 * pi), lty = 2)
}We can see diagonal lines of observations running out at 1 degree, 2 degrees, etc (first 5 degrees out highlighted with dashed lines). This is to be expected and not a problem.
Here is the same plot truncated at 50m perpendicular distance.
plot(sightings$perp_dist_m, sightings$forward_distance, xlab = "Perpendicular distance (m)",
ylab = "Forward distance (m)", xlim = c(0, 50),
ylim = c(0, max(sightings$forward_distance[sightings$perp_dist_m < 50], na.rm = TRUE)))
for(i in 1:5){
abline(a = 0, b = tan((90 - i) / 360 * 2 * pi), lty = 2)
}It looks like many of the small perpendicular distance observations were made at angles close to 90 degrees from the transect line – animals that were almost abeam of the ship.
This can be seen even more clearly when truncating at 20m perpendicular distance:
plot(sightings$perp_dist_m, sightings$forward_distance, xlab = "Perpendicular distance (m)",
ylab = "Forward distance (m)", xlim = c(0, 20),
ylim = c(0, max(sightings$forward_distance[sightings$perp_dist_m < 20], na.rm = TRUE)))
for(i in 1:5){
abline(a = 0, b = tan((90 - i) / 360 * 2 * pi), lty = 2)
}We also see that all of the sightings at 2m or less perpendicular distance were made at small forward distances, except one sighting at 200m forward distance. These detections may have been bow-riding animals.
We make one final 2D plot looking only at animals detected within 20m perpendicular and forward distance:
plot(sightings$perp_dist_m, sightings$forward_distance, xlab = "Perpendicular distance (m)",
ylab = "Forward distance (m)", xlim = c(0, 20),
ylim = c(0, 20))We now look into which species are involved. Here we look at the species detected within 20m perpendicular distance:
table(sightings$Species1[sightings$perp_dist_m < 20])
BA BM BP DC DD DSP ER GG LB LO MN PD TT UD ULW
4 3 12 6 72 20 2 2 5 12 7 12 3 4 7
and within 20m perpendicular distance and 20m forward distance (note this includes only those detections with an angle and therefore a forward distance)
table(sightings$Species1[sightings$perp_dist_m < 20 & sightings$forward_distance < 20])
BA BP DC DD DSP GG LB LO MN PD UD ULW
1 2 3 41 7 1 2 5 1 4 3 2
Short-beaked common dolphin (DD) dominates with UID common dolphin (DSP) a close second.
However, if we look at perpendicular distances out to (say) 200m by 10m for DD & UID …
ind <- sightings$Species1 %in% c("DD", "DSP")
ind2 <- sightings$perp_dist_m <= 200
hist(sightings$perp_dist_m[ind & ind2], breaks = seq(0, 200, by = 10),
xlab = "Perpendicular distance (m)",
main = "Pdist (truncated) DD and DSP")and for the other species in the dataset…
hist(sightings$perp_dist_m[!ind & ind2], breaks = seq(0, 200, by = 10),
xlab = "Perpendicular distance (m)",
main = "Pdist (truncated) other species")we see that both have the spike at very small perpendicular distances (0-10m in these plots).
We can try separating all dolphin (and porpoise) species from the others:
ind <- sightings$Species1 %in% c("DD", "DSP", "UD", "DC", "PD", "LO", "GG", "TT", "LB", "USC", "SC")
ind2 <- sightings$perp_dist_m <= 200
hist(sightings$perp_dist_m[ind & ind2], breaks = seq(0, 200, by = 10),
xlab = "Perpendicular distance (m)",
main = "Pdist (truncated) dolphins")hist(sightings$perp_dist_m[!ind & ind2], breaks = seq(0, 200, by = 10),
xlab = "Perpendicular distance (m)",
main = "Pdist (truncated) non-dolphins")So, although bow-riding may play a part, it seems to be something that affects non-dolphin species to some extent also.
Here we break down by species, for species with more than 60 detections. The first plot is out to 2400m in increments of 100m, while the second is out to 500m in increments of 20m.
tmp <- table(sightings$Species1)
sp <- names(tmp[tmp>=60])
p <- sightings %>% filter(Species1 %in% sp) %>%
ggplot (aes(perp_dist_m)) +
geom_histogram(breaks = seq(0, 2400, by = 100)) +
facet_wrap(~Species1, scales = "free")
pp <- sightings %>% filter(Species1 %in% sp) %>%
ggplot (aes(perp_dist_m)) +
geom_histogram(breaks = seq(0, 500, by = 20)) +
facet_wrap(~Species1, scales = "free")
pIn these plots, we see a spike at small distances in DD, DSP, LO, PD and TT – all dolphin (and related small) species.
We could envisage further exploratory analyses, but here we move on to detection function modelling.
5 Detection function modelling
5.1 Create observation table
In the code below we pull out from the sightings table just the variables that will be used in the distance analysis. As a check we print a summary of the resulting dataset.
obs_table <- sightings %>%
rename(sighting_no = SightingNo, date_time = Date_Time_Local_,
season = Season, species = Species1, distance = perp_dist_m, ss = Bft,
swell = Swell_ft_, size = Best, obs_height = ObsHeight_m_, f_obs_height = fObsHeight,
season = Season) %>%
mutate(date = date(date_time), year = year(date), object = 1:n()) %>%
select(object, Region.Label, Sample.Label, sighting_no, date, date_time, year, season, species, distance, ss,
swell, size, obs_height, f_obs_height)
summary(obs_table) object Region.Label Sample.Label sighting_no
Min. : 1.0 Length:2784 Length:2784 Length:2784
1st Qu.: 696.8 Class :character Class :character Class :character
Median :1392.5 Mode :character Mode :character Mode :character
Mean :1392.5
3rd Qu.:2088.2
Max. :2784.0
date date_time year
Min. :2004-07-13 Min. :2004-07-13 05:34:00.00 Min. :2004
1st Qu.:2009-03-13 1st Qu.:2009-03-14 05:07:00.00 1st Qu.:2009
Median :2014-04-11 Median :2014-04-11 12:29:30.00 Median :2014
Mean :2014-01-12 Mean :2014-01-12 20:26:40.66 Mean :2014
3rd Qu.:2018-04-16 3rd Qu.:2018-04-16 18:35:56.00 3rd Qu.:2018
Max. :2023-11-18 Max. :2023-11-18 06:41:00.00 Max. :2023
season species distance ss
Length:2784 Length:2784 Min. : 0.0 Min. :0.000
Class :character Class :character 1st Qu.: 191.3 1st Qu.:2.000
Mode :character Mode :character Median : 552.8 Median :3.000
Mean : 946.4 Mean :3.123
3rd Qu.: 1245.3 3rd Qu.:4.000
Max. :11827.0 Max. :6.000
swell size obs_height f_obs_height
Min. : 0.000 Min. : 0.0 Min. : 7.10 A:1175
1st Qu.: 3.000 1st Qu.: 1.0 1st Qu.: 8.10 B: 844
Median : 4.000 Median : 4.0 Median :10.36 C: 765
Mean : 3.944 Mean : 39.4 Mean :10.38
3rd Qu.: 5.000 3rd Qu.: 22.0 3rd Qu.:13.30
Max. :12.000 Max. :4600.0 Max. :17.00
We proceed now with detection function modelling by species.
Text goes here - by species.
6 Density estimation
7 References
- Campbell et al. (2015) Inter-annual and seasonal trends in cetacean distribution, density and abundance off southern California. doi:10.1016/j.dsr2.2014.10.008
- Lerzack and Hobbs (1997) Calculating sighting distances from angular readings during shipboard, aerial, and shore-based marine mamaml surveys. doi:10.1111/j.1748-7692.1998.tb00745.x