Introduction

This document outlines the analysis of marine mammal (Sea Otter - SEOT, Harbor Seal - HASE) behavior in response to human activities within the study harbor. Data were collected using Instantaneous Scan Sampling (ISS) to capture broad presence and co-occurrence patterns, and Instantaneous Focal Follows (IFF) to detail behavioral states, durations, and specific interactions. The goal is to understand how human presence, activity type, and proximity influence marine mammal behavior, providing insights relevant to local wildlife management and conservation.

Methods

(write up methods)

Data Loading and Preparation

First, we load necessary libraries and the datasets.

# Load core libraries for data manipulation and visualization
library(dplyr)
library(tidyr)
library(ggplot2)
library(knitr) # For nice tables
# Load datasets (ensure CSV files are in the project directory or provide full path)
ISS <- read.csv("ISS.csv", stringsAsFactors = TRUE, na.strings=c("NA", "")) # Treat blank cells as NA
IFF <- read.csv("IFF.csv", stringsAsFactors = TRUE, na.strings=c("NA", ""))

# Initial inspection
# print("ISS Head:")
# kable(head(ISS))
# print("IFF Head:")
# kable(head(IFF))

# print("ISS Structure:")
# str(ISS)
# print("IFF Structure:")
# str(IFF)

Data Cleaning and Preparation

We will prepare the datasets by creating necessary helper columns and handling missing values where appropriate for specific analyses.

# ISS Data Preparation
# Convert SEOT/HASE counts to presence/absence (0 or 1)
ISS <- ISS %>%
  mutate(
    SEOT_present = ifelse(SEOT > 0, 1, 0),
    HASE_present = ifelse(HASE > 0, 1, 0),
    MM_present = ifelse(SEOT_present > 0 | HASE_present > 0, 1, 0),
    Humans_present = ifelse(Humans > 0, 1, 0)
  )

# Define human activity columns for easier reference
human_activity_cols_iss <- c("ST", "WA", "WADO", "No.dogs", "BO", "ONBO", "TEBO", "PH", "OT")

# Check Db column for potential issues (e.g., if needed for Q13)
# summary(ISS$Db)
# sum(is.na(ISS$Db)) # Count NAs in Db - Note: Many NAs observed in sample data. Will limit noise analysis.
# IFF Data Preparation
# Check data types and potential issues
# summary(IFF)
# str(IFF)

# Convert relevant columns to factors if not already
IFF$Species <- factor(IFF$Species)
IFF$Behavior <- factor(IFF$Behavior)
IFF$Activity <- factor(IFF$Activity)
IFF$HWI <- factor(IFF$HWI)
IFF$Dist.bin <- factor(IFF$Dist.bin, levels = c("1", "2", "3", "4"), ordered = TRUE) # Ensure ordinal factor

# Filter out rows with NA state duration as they are not useful for duration analysis
IFF <- IFF %>% filter(!is.na(State.Duration.sec))

# Create separate datasets for SEOT and HASE focal follows for species-specific analysis
IFF_SEOT <- IFF %>% filter(Species == "SEOT")
IFF_HASE <- IFF %>% filter(Species == "HASE")

# Define key behavioral states for analysis (adjust as needed)
key_behaviors <- c("resting", "grooming", "swimming", "foraging", "alert", "periscoping", "dive", "locomoting", "mating", "socializing", "hauled out") # Based on IFF unique values
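
The presence/absence columns above inherit any missing counts. The following is a minimal sketch, using hypothetical counts rather than the ISS data, of how ifelse() propagates NA and why a proportion computed with na.rm = TRUE may need a matching denominator. The vector counts and the variable names are assumptions made for illustration.

```r
# Hypothetical scan counts, including one missing value
counts <- c(0, 3, NA, 1)

# ifelse() propagates NA, so a scan with an unrecorded count stays NA
present <- ifelse(counts > 0, 1, 0)
print(present)

# sum(..., na.rm = TRUE) counts only scans with a known positive count
n_present <- sum(present, na.rm = TRUE)

# For a proportion, the denominator can exclude the unknown scans as well
n_known <- sum(!is.na(present))
prop_present <- n_present / n_known
print(round(prop_present, 3))
```

If the ISS count columns contain no missing values, nrow(ISS) and the count of non-missing scans are identical and the distinction does not matter.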

Analytical Approach

We will use a combination of descriptive statistics, contingency table analysis (Chi-squared tests), non-parametric tests for comparing distributions (Kruskal-Wallis, Wilcoxon rank-sum), and potentially generalized linear models (GLMs) where appropriate for count or duration data. Analyses will be performed separately for SEOT and HASE where relevant to investigate species-specific responses. Significance level is set at α = 0.05.

Results

Question 1: Baseline Patterns and Co-occurrence

Q1a/Q6: How often do humans and Marine Mammals (MMs) co-occur in the harbor?

Hypothesis:

  • H0: The presence of marine mammals (SEOT or HASE) is independent of the presence of humans in the harbor scan sites.
  • H1: The presence of marine mammals is associated (positively or negatively) with the presence of humans.

Analysis: We use the ISS dataset to calculate co-occurrence frequencies and perform Chi-squared tests for independence.

# Overall Human and MM Presence (from ISS)
total_scans <- nrow(ISS)
scans_with_humans <- sum(ISS$Humans_present, na.rm = TRUE)
scans_with_seot <- sum(ISS$SEOT_present, na.rm = TRUE)
scans_with_hase <- sum(ISS$HASE_present, na.rm = TRUE)
scans_with_mm <- sum(ISS$MM_present, na.rm = TRUE)

cat("Total Scans:", total_scans, "\n")
## Total Scans: 1457
cat("Scans with Humans:", scans_with_humans, "(", round(100*scans_with_humans/total_scans, 1), "%)\n")
## Scans with Humans: 769 ( 52.8 %)
cat("Scans with SEOT:", scans_with_seot, "(", round(100*scans_with_seot/total_scans, 1), "%)\n")
## Scans with SEOT: 103 ( 7.1 %)
cat("Scans with HASE:", scans_with_hase, "(", round(100*scans_with_hase/total_scans, 1), "%)\n")
## Scans with HASE: 56 ( 3.8 %)
cat("Scans with Any MM:", scans_with_mm, "(", round(100*scans_with_mm/total_scans, 1), "%)\n")
## Scans with Any MM: 156 ( 10.7 %)
# Contingency Table: SEOT vs Human Presence
seot_human_table <- table(SEOT_Present = ISS$SEOT_present, Humans_Present = ISS$Humans_present)
cat("\nContingency Table: SEOT vs Humans\n")
## 
## Contingency Table: SEOT vs Humans
kable(seot_human_table)
| SEOT_Present | Humans_Present = 0 | Humans_Present = 1 |
|:--|--:|--:|
| 0 | 645 | 709 |
| 1 | 43 | 60 |
# Proportion Table: SEOT given Human Presence/Absence
cat("\nProportion of Scans with SEOT (Given Human Presence/Absence)\n")
## 
## Proportion of Scans with SEOT (Given Human Presence/Absence)
kable(prop.table(seot_human_table, margin = 2), digits = 3) # margin=2 for column proportions
| SEOT_Present | Humans_Present = 0 | Humans_Present = 1 |
|:--|--:|--:|
| 0 | 0.938 | 0.922 |
| 1 | 0.062 | 0.078 |
# Chi-squared Test: SEOT vs Humans
cat("\nChi-squared Test: SEOT vs Humans\n")
## 
## Chi-squared Test: SEOT vs Humans
seot_human_chisq <- chisq.test(seot_human_table)
print(seot_human_chisq)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  seot_human_table
## X-squared = 1.1062, df = 1, p-value = 0.2929
# Contingency Table: HASE vs Human Presence
hase_human_table <- table(HASE_Present = ISS$HASE_present, Humans_Present = ISS$Humans_present)
cat("\nContingency Table: HASE vs Humans\n")
## 
## Contingency Table: HASE vs Humans
kable(hase_human_table)
| HASE_Present | Humans_Present = 0 | Humans_Present = 1 |
|:--|--:|--:|
| 0 | 646 | 755 |
| 1 | 42 | 14 |
# Proportion Table: HASE given Human Presence/Absence
cat("\nProportion of Scans with HASE (Given Human Presence/Absence)\n")
## 
## Proportion of Scans with HASE (Given Human Presence/Absence)
kable(prop.table(hase_human_table, margin = 2), digits = 3)
| HASE_Present | Humans_Present = 0 | Humans_Present = 1 |
|:--|--:|--:|
| 0 | 0.939 | 0.982 |
| 1 | 0.061 | 0.018 |
# Chi-squared Test: HASE vs Humans
cat("\nChi-squared Test: HASE vs Humans\n")
## 
## Chi-squared Test: HASE vs Humans
hase_human_chisq <- chisq.test(hase_human_table)
print(hase_human_chisq)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  hase_human_table
## X-squared = 16.892, df = 1, p-value = 3.956e-05
# Contingency Table: Any MM vs Human Presence
mm_human_table <- table(MM_Present = ISS$MM_present, Humans_Present = ISS$Humans_present)
cat("\nContingency Table: Any MM vs Humans\n")
## 
## Contingency Table: Any MM vs Humans
kable(mm_human_table)
| MM_Present | Humans_Present = 0 | Humans_Present = 1 |
|:--|--:|--:|
| 0 | 604 | 697 |
| 1 | 84 | 72 |
# Proportion Table: Any MM given Human Presence/Absence
cat("\nProportion of Scans with Any MM (Given Human Presence/Absence)\n")
## 
## Proportion of Scans with Any MM (Given Human Presence/Absence)
kable(prop.table(mm_human_table, margin = 2), digits = 3)
| MM_Present | Humans_Present = 0 | Humans_Present = 1 |
|:--|--:|--:|
| 0 | 0.878 | 0.906 |
| 1 | 0.122 | 0.094 |
# Chi-squared Test: Any MM vs Humans
cat("\nChi-squared Test: Any MM vs Humans\n")
## 
## Chi-squared Test: Any MM vs Humans
mm_human_chisq <- chisq.test(mm_human_table)
print(mm_human_chisq)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  mm_human_table
## X-squared = 2.7869, df = 1, p-value = 0.09504

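Interpretation: Sea otter presence was not significantly associated with human presence (χ² = 1.11, df = 1, p = 0.29); otters were recorded in 7.8% of scans with humans present and 6.2% of scans without. Harbor seal presence was significantly associated with human presence (χ² = 16.89, df = 1, p < 0.001), so we reject the null hypothesis of independence for HASE: seals were recorded in 1.8% of scans with humans present versus 6.1% of scans without. For any marine mammal combined, the association was not significant at α = 0.05 (χ² = 2.79, df = 1, p = 0.095), although the proportions (9.4% with humans versus 12.2% without) point in the same direction as the HASE result. These tests describe association only and treat scans as independent observations.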
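
A Chi-squared p-value indicates whether an association exists but not how strong it is. As a rough sketch of one way to express effect size, the odds ratio for HASE presence can be computed directly from the counts in the HASE vs Humans contingency table. The object names hase_tab and odds_ratio are introduced here for illustration only.

```r
# Counts copied from the HASE vs Humans contingency table
# (matrix() fills column-wise: Humans = 0 first, then Humans = 1)
hase_tab <- matrix(c(646, 42, 755, 14), nrow = 2,
                   dimnames = list(HASE_Present = c("0", "1"),
                                   Humans_Present = c("0", "1")))

# Odds of recording a seal in scans with and without humans
odds_with_humans <- hase_tab["1", "1"] / hase_tab["0", "1"]
odds_without_humans <- hase_tab["1", "0"] / hase_tab["0", "0"]

# Odds ratio below 1 means seals were recorded less often when humans were present
odds_ratio <- odds_with_humans / odds_without_humans
print(round(odds_ratio, 3))

# Re-running the test on the raw counts reproduces the reported statistic
hase_test <- chisq.test(hase_tab)
print(hase_test$statistic)
```

The odds ratio comes out at roughly 0.29, meaning the odds of recording a seal were about 70% lower in scans with humans present.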

Q1/Q1b: What are the frequencies of different human activities?

Analysis: Simple summary counts from the ISS dataset.

activity_counts <- ISS %>%
  summarise(across(all_of(human_activity_cols_iss), ~ sum(.x, na.rm = TRUE))) %>% # lambda form; passing na.rm through across() is deprecated in dplyr >= 1.1.0
  pivot_longer(everything(), names_to = "Activity", values_to = "TotalOccurrences")

# Counts of scans where each activity was present (>=1 person doing it)
activity_presence_counts <- ISS %>%
  summarise(across(all_of(human_activity_cols_iss), ~ sum(. > 0, na.rm = TRUE))) %>%
  pivot_longer(everything(), names_to = "Activity", values_to = "ScansPresent")

activity_summary <- full_join(activity_counts, activity_presence_counts, by = "Activity")

cat("Summary of Human Activity Occurrences (ISS Data):\n")
## Summary of Human Activity Occurrences (ISS Data):
kable(arrange(activity_summary, desc(ScansPresent)))
| Activity | TotalOccurrences | ScansPresent |
|:--|--:|--:|
| TEBO | 1829 | 500 |
| WA | 1220 | 339 |
| ONBO | 450 | 113 |
| ST | 444 | 85 |
| No.dogs | 77 | 61 |
| WADO | 74 | 50 |
| OT | 25 | 11 |
| PH | 10 | 5 |
| BO | 18 | 4 |
# Basic bar chart visualization
ggplot(activity_summary, aes(x = reorder(Activity, -ScansPresent), y = ScansPresent)) +
  geom_col() +
  labs(title = "Number of Scans Where Each Human Activity Was Present",
       x = "Human Activity Code",
       y = "Number of Scans") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Interpretation: This provides a baseline understanding of which human activities are most common in the harbor during the study period based on scan samples.
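
The two columns of the summary table answer different questions: TotalOccurrences sums the recorded counts, while ScansPresent counts scans with at least one record. Their ratio gives a rough mean count per scan in which the activity occurred. The sketch below uses four rows copied from the table; the data frame name and the MeanPerScan column are hypothetical additions.

```r
# Four rows copied from the activity summary table
activity_summary_example <- data.frame(
  Activity = c("TEBO", "WA", "ONBO", "ST"),
  TotalOccurrences = c(1829, 1220, 450, 444),
  ScansPresent = c(500, 339, 113, 85)
)

# Mean recorded count per scan in which the activity was present
activity_summary_example$MeanPerScan <-
  activity_summary_example$TotalOccurrences / activity_summary_example$ScansPresent

print(activity_summary_example, digits = 3)
```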

Question 2: Behavioral Responses to Humans (IFF Data)

This section focuses on the detailed focal follow data (IFF) to understand how MM behavior changes in response to human presence, activity type, and proximity.

Q2a/Q5: Does MM behavior duration change based on human activity or distance?

Hypotheses:

  • H0 (Activity): The duration of a specific marine mammal behavior (e.g., Alert, Foraging, Resting) does not differ significantly across different types of human activity present.
  • H1 (Activity): The duration of a specific marine mammal behavior differs significantly across different types of human activity present.
  • H0 (Distance): The duration of a specific marine mammal behavior does not differ significantly across different distance bins to humans.
  • H1 (Distance): The duration of a specific marine mammal behavior differs significantly across different distance bins to humans.

Analysis: Due to non-normality of duration data (as explored previously), we use the Kruskal-Wallis test. We analyze SEOT and HASE separately. We focus on key behaviors where changes might be expected.
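
To illustrate the reasoning behind this choice, the sketch below simulates right-skewed durations (hypothetical values, not the IFF data), checks normality with a Shapiro-Wilk test, and then applies a Kruskal-Wallis test across three groups. The column names duration_sec and dist_bin are assumptions made for this example.

```r
set.seed(1)

# Simulated, right-skewed durations for three hypothetical distance bins
sim <- data.frame(
  duration_sec = c(rexp(20, rate = 1 / 60),
                   rexp(20, rate = 1 / 60),
                   rexp(20, rate = 1 / 120)),
  dist_bin = factor(rep(c("1", "2", "3"), each = 20))
)

# A small Shapiro-Wilk p-value indicates departure from normality
shapiro_p <- shapiro.test(sim$duration_sec)$p.value
print(shapiro_p)

# Kruskal-Wallis compares the groups using ranks, so it does not assume normality
kw <- kruskal.test(duration_sec ~ dist_bin, data = sim)
print(kw)
```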

# Filter IFF data for key behaviors and valid activity/distance data
IFF_analysis <- IFF %>%
  filter(Behavior %in% key_behaviors,
         !is.na(State.Duration.sec),
         !is.na(Activity) | !is.na(Dist.bin)) # Need at least one predictor

# Filter out activities with very few observations during focals if needed
activity_counts_iff <- IFF_analysis %>% count(Activity) %>% filter(!is.na(Activity))
common_activities <- activity_counts_iff %>% filter(n > 5) %>% pull(Activity) # Example threshold: >5 occurrences
IFF_analysis <- IFF_analysis %>% filter(Activity %in% common_activities | is.na(Activity))

# Separate by species
IFF_SEOT_analysis <- IFF_analysis %>% filter(Species == "SEOT")
IFF_HASE_analysis <- IFF_analysis %>% filter(Species == "HASE")

# Further filter for distance analysis (need non-NA Dist.bin)
IFF_SEOT_dist <- IFF_SEOT_analysis %>% filter(!is.na(Dist.bin))
IFF_HASE_dist <- IFF_HASE_analysis %>% filter(!is.na(Dist.bin))

# Further filter for activity analysis (need non-NA Activity)
IFF_SEOT_act <- IFF_SEOT_analysis %>% filter(!is.na(Activity) & Activity != "")
IFF_HASE_act <- IFF_HASE_analysis %>% filter(!is.na(Activity) & Activity != "")

Analysis: Duration vs. Distance Bin

# --- SEOT ---
cat("--- SEOT: Duration vs Distance Bin ---\n")
## --- SEOT: Duration vs Distance Bin ---
seot_kw_dist_results <- list()
# Loop through SEOT behaviors present in the distance-filtered data
for (beh in intersect(key_behaviors, unique(IFF_SEOT_dist$Behavior))) {
  data_subset <- IFF_SEOT_dist %>% filter(Behavior == beh)
  # Check if there are multiple distance bins represented and enough data points
  if (length(unique(data_subset$Dist.bin)) > 1 && nrow(data_subset) > 5) {
    test_result <- tryCatch(kruskal.test(State.Duration.sec ~ Dist.bin, data = data_subset), error = function(e) NULL)
    # If test runs without error, store and print
    if (!is.null(test_result)) {
      cat("\nBehavior:", as.character(beh), "\n")
      print(test_result)
      seot_kw_dist_results[[as.character(beh)]] <- test_result
    }
  }
}
## 
## Behavior: resting 
## 
##  Kruskal-Wallis rank sum test
## 
## data:  State.Duration.sec by Dist.bin
## Kruskal-Wallis chi-squared = 1.325, df = 2, p-value = 0.5156
## 
## 
## Behavior: swimming 
## 
##  Kruskal-Wallis rank sum test
## 
## data:  State.Duration.sec by Dist.bin
## Kruskal-Wallis chi-squared = 0.85714, df = 2, p-value = 0.6514
## 
## 
## Behavior: foraging 
## 
##  Kruskal-Wallis rank sum test
## 
## data:  State.Duration.sec by Dist.bin
## Kruskal-Wallis chi-squared = 0.95455, df = 3, p-value = 0.8122
## 
## 
## Behavior: alert 
## 
##  Kruskal-Wallis rank sum test
## 
## data:  State.Duration.sec by Dist.bin
## Kruskal-Wallis chi-squared = 2.1055, df = 3, p-value = 0.5508
# Visualization: SEOT Duration vs Distance Bin (ONLY if results exist)
if (length(seot_kw_dist_results) > 0) {
  print( # Explicitly print ggplot object from within if statement
    ggplot(IFF_SEOT_dist %>% filter(Behavior %in% names(seot_kw_dist_results)),
           aes(x = Dist.bin, y = State.Duration.sec)) +
      geom_boxplot() +
      facet_wrap(~ Behavior, scales = "free_y") +
      labs(title = "SEOT Behavior Duration by Binned Distance to Humans",
           subtitle = "Only behaviors with successful Kruskal-Wallis test shown",
           x = "Distance Bin (1=Closest, 4=Farthest)", y = "State Duration (sec)") +
      theme_bw()
  )
} else {
  cat("\nNo Kruskal-Wallis tests could be run for SEOT duration vs distance; skipping plot.\n")
}

# --- HASE ---
cat("\n--- HASE: Duration vs Distance Bin ---\n")
## 
## --- HASE: Duration vs Distance Bin ---
hase_kw_dist_results <- list()
# Loop through HASE behaviors present in the distance-filtered data
for (beh in intersect(key_behaviors, unique(IFF_HASE_dist$Behavior))) {
  data_subset <- IFF_HASE_dist %>% filter(Behavior == beh)
  # Check if there are multiple distance bins represented and enough data points
   if (length(unique(data_subset$Dist.bin)) > 1 && nrow(data_subset) > 5) {
    test_result <- tryCatch(kruskal.test(State.Duration.sec ~ Dist.bin, data = data_subset), error = function(e) NULL)
    # If test runs without error, store and print
    if (!is.null(test_result)) {
      cat("\nBehavior:", as.character(beh), "\n")
      print(test_result)
      hase_kw_dist_results[[as.character(beh)]] <- test_result
    }
  }
}

# Visualization: HASE Duration vs Distance Bin (ONLY if results exist)
if (length(hase_kw_dist_results) > 0) {
  print( # Explicitly print ggplot object
    ggplot(IFF_HASE_dist %>% filter(Behavior %in% names(hase_kw_dist_results)),
           aes(x = Dist.bin, y = State.Duration.sec)) +
      geom_boxplot() +
      facet_wrap(~ Behavior, scales = "free_y") +
      labs(title = "HASE Behavior Duration by Binned Distance to Humans",
           subtitle = "Only behaviors with successful Kruskal-Wallis test shown",
           x = "Distance Bin (1=Closest, 4=Farthest)", y = "State Duration (sec)") +
      theme_bw()
   )
} else {
  cat("\nNo Kruskal-Wallis tests could be run for HASE duration vs distance (too few observations or distance bins); skipping plot.\n")
}
## 
## No Kruskal-Wallis tests could be run for HASE duration vs distance (too few observations or distance bins); skipping plot.

Analysis: Duration vs. Human Activity

# --- SEOT ---
cat("\n--- SEOT: Duration vs Human Activity ---\n")
## 
## --- SEOT: Duration vs Human Activity ---
seot_kw_act_results <- list()
# Loop through SEOT behaviors present in the activity-filtered data
for (beh in intersect(key_behaviors, unique(IFF_SEOT_act$Behavior))) {
  data_subset <- IFF_SEOT_act %>% filter(Behavior == beh)
  # Check if there are multiple activity types represented and enough data points
  if (length(unique(data_subset$Activity)) > 1 && nrow(data_subset) > 5) {
     test_result <- tryCatch(kruskal.test(State.Duration.sec ~ Activity, data = data_subset), error = function(e) NULL)
    # If test runs without error, store and print
    if (!is.null(test_result)) {
      cat("\nBehavior:", as.character(beh), "\n")
      print(test_result)
      seot_kw_act_results[[as.character(beh)]] <- test_result
    }
  }
}
## 
## Behavior: resting 
## 
##  Kruskal-Wallis rank sum test
## 
## data:  State.Duration.sec by Activity
## Kruskal-Wallis chi-squared = 7.2533, df = 3, p-value = 0.06425
## 
## 
## Behavior: grooming 
## 
##  Kruskal-Wallis rank sum test
## 
## data:  State.Duration.sec by Activity
## Kruskal-Wallis chi-squared = 2.4643, df = 3, p-value = 0.4818
## 
## 
## Behavior: alert 
## 
##  Kruskal-Wallis rank sum test
## 
## data:  State.Duration.sec by Activity
## Kruskal-Wallis chi-squared = 6.859, df = 3, p-value = 0.07653
# Visualization: SEOT Duration vs Activity (ONLY if results exist)
if(length(seot_kw_act_results) > 0) {
 print( # Explicitly print ggplot object
   ggplot(IFF_SEOT_act %>% filter(Behavior %in% names(seot_kw_act_results)),
          aes(x = Activity, y = State.Duration.sec)) +
     geom_boxplot() +
     facet_wrap(~ Behavior, scales = "free_y") +
     labs(title = "SEOT Behavior Duration by Human Activity",
          subtitle = "Only behaviors with successful Kruskal-Wallis test shown",
          x = "Human Activity Code", y = "State Duration (sec)") +
     theme_bw() +
     theme(axis.text.x = element_text(angle = 45, hjust = 1))
 )
} else {
 cat("\nNo Kruskal-Wallis tests could be run for SEOT duration vs activity; skipping plot.\n")
}

# --- HASE ---
cat("\n--- HASE: Duration vs Human Activity ---\n")
## 
## --- HASE: Duration vs Human Activity ---
hase_kw_act_results <- list()
# Loop through HASE behaviors present in the activity-filtered data
for (beh in intersect(key_behaviors, unique(IFF_HASE_act$Behavior))) {
  data_subset <- IFF_HASE_act %>% filter(Behavior == beh)
  # Check if there are multiple activity types represented and enough data points
  if (length(unique(data_subset$Activity)) > 1 && nrow(data_subset) > 5) {
    test_result <- tryCatch(kruskal.test(State.Duration.sec ~ Activity, data = data_subset), error = function(e) NULL)
    # If test runs without error, store and print
     if (!is.null(test_result)) {
      cat("\nBehavior:", as.character(beh), "\n")
      print(test_result)
      hase_kw_act_results[[as.character(beh)]] <- test_result
     }
  }
}

# Visualization: HASE Duration vs Activity (ONLY if results exist)
if(length(hase_kw_act_results) > 0) {
  print( # Explicitly print ggplot object
    ggplot(IFF_HASE_act %>% filter(Behavior %in% names(hase_kw_act_results)),
           aes(x = Activity, y = State.Duration.sec)) +
      geom_boxplot() +
      facet_wrap(~ Behavior, scales = "free_y") +
      labs(title = "HASE Behavior Duration by Human Activity",
           subtitle = "Only behaviors with successful Kruskal-Wallis test shown",
           x = "Human Activity Code", y = "State Duration (sec)") +
      theme_bw() +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))
  )
} else {
  cat("\nNo Kruskal-Wallis tests could be run for HASE duration vs activity (too few observations or activity types); skipping plot.\n")
}
## 
## No Kruskal-Wallis tests could be run for HASE duration vs activity (too few observations or activity types); skipping plot.

Post-Hoc Testing (If Kruskal-Wallis is Significant)

If the Kruskal-Wallis tests above show significant differences (p < 0.05) for a given behavior and factor (distance or activity), pairwise comparisons are needed to determine which specific groups differ.

# Example for SEOT Alert duration vs Distance Bin (only run if KW was significant)
# Note: this uses pairwise Wilcoxon rank-sum tests with Holm adjustment, not Dunn's test
beh_to_test <- "alert"
if (exists("seot_kw_dist_results") && !is.null(seot_kw_dist_results[[beh_to_test]]) && seot_kw_dist_results[[beh_to_test]]$p.value < 0.05) {
  cat("\n--- Post-hoc Test: SEOT Alert Duration vs Distance Bin ---\n")
  pairwise_result <- pairwise.wilcox.test(IFF_SEOT_dist$State.Duration.sec[IFF_SEOT_dist$Behavior == beh_to_test],
                                          IFF_SEOT_dist$Dist.bin[IFF_SEOT_dist$Behavior == beh_to_test],
                                          p.adjust.method = "holm")
  print(pairwise_result)
}

# Example for SEOT Alert duration vs Activity (only run if KW was significant)
beh_to_test <- "alert"
activity_data_subset <- IFF_SEOT_act %>% filter(Behavior == beh_to_test)
if (exists("seot_kw_act_results") && !is.null(seot_kw_act_results[[beh_to_test]]) && seot_kw_act_results[[beh_to_test]]$p.value < 0.05) {
 cat("\n--- Post-hoc Test: SEOT Alert Duration vs Activity ---\n")
 # Note: pairwise.wilcox.test might give warnings if group sizes are small. Consider alternatives if needed.
 pairwise_result_act <- pairwise.wilcox.test(activity_data_subset$State.Duration.sec,
                                             activity_data_subset$Activity,
                                             p.adjust.method = "holm")
 print(pairwise_result_act)
}

# Repeat similar post-hoc tests for other significant KW results (for SEOT and HASE)

Interpretation: The Kruskal-Wallis tests assess whether behavior duration differs overall across distance bins or human activity types. None of the SEOT tests reached significance at α = 0.05, so we fail to reject the null hypotheses for SEOT. The closest were resting (p = 0.064) and alert (p = 0.077) duration across human activity types; all distance-bin tests had p > 0.5. Because no test was significant, the post-hoc comparisons above did not run. No HASE test could be run because no HASE behavior met the data requirements (more than one group and more than five observations), so the HASE hypotheses remain untested.
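
Because several Kruskal-Wallis tests are run per species, it may be worth adjusting the p-values across the whole family of tests, not only within post-hoc comparisons. The sketch below applies a Holm adjustment to the seven SEOT p-values reported above; treating these seven tests as one family is our assumption, and the vector names are illustrative.

```r
# SEOT Kruskal-Wallis p-values copied from the output above
seot_p <- c(dist_resting  = 0.5156,
            dist_swimming = 0.6514,
            dist_foraging = 0.8122,
            dist_alert    = 0.5508,
            act_resting   = 0.06425,
            act_grooming  = 0.4818,
            act_alert     = 0.07653)

# Holm adjustment controls the family-wise error rate across all seven tests
seot_p_holm <- p.adjust(seot_p, method = "holm")
print(round(seot_p_holm, 3))
```

After adjustment the two smallest p-values move well above 0.05, which reinforces that the near-significant resting and alert results should be read cautiously.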

Q1c: Which human activities are disproportionately involved in HWIs?

Hypothesis:

  • H0: The proportion of interactions classified as HWI is independent of the type of human activity present.
  • H1: The proportion of interactions classified as HWI differs significantly depending on the type of human activity present.

Analysis: We calculate the proportion of ‘yes’ HWIs for each activity observed during focal follows and use a Chi-squared test.

# Use IFF data where Activity and HWI are recorded
IFF_hwi_act <- IFF %>%
  filter(!is.na(Activity), Activity != "", HWI %in% c("yes", "no"))

# Calculate counts and proportions
hwi_activity_summary <- IFF_hwi_act %>%
  count(Activity, HWI) %>%
  pivot_wider(names_from = HWI, values_from = n, values_fill = 0) %>%
  mutate(Total = yes + no,
         Prop_HWI = ifelse(Total > 0, yes / Total, 0)) %>%
  arrange(desc(Prop_HWI))

cat("Summary of HWI Proportions by Human Activity (IFF Data):\n")
## Summary of HWI Proportions by Human Activity (IFF Data):
kable(hwi_activity_summary, digits = 3)
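| Activity | no | yes | Total | Prop_HWI |
|:--|--:|--:|--:|--:|
| ST | 0 | 2 | 2 | 1.000 |
| WADO | 0 | 3 | 3 | 1.000 |
| PH | 6 | 9 | 15 | 0.600 |
| ONBO | 1 | 1 | 2 | 0.500 |
| WA | 4 | 3 | 7 | 0.429 |
| BO | 21 | 15 | 36 | 0.417 |
| TEBO | 5 | 2 | 7 | 0.286 |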
# Chi-squared test (requires expected counts >= 5, might need Fisher's test if not)
hwi_activity_table <- IFF_hwi_act %>%
  count(Activity, HWI) %>%
  pivot_wider(names_from = HWI, values_from = n, values_fill = 0) %>%
  select(Activity, yes, no) %>%
  tibble::column_to_rownames("Activity")

cat("\nChi-squared Test: HWI vs Activity\n")
## 
## Chi-squared Test: HWI vs Activity
# Check assumptions for Chi-squared (may need fisher.test if expected counts are low)
hwi_act_chisq <- chisq.test(hwi_activity_table)
print(hwi_act_chisq)
## 
##  Pearson's Chi-squared test
## 
## data:  hwi_activity_table
## X-squared = 7.9792, df = 6, p-value = 0.2396
# print(hwi_act_chisq$expected) # Check expected counts

# Plot proportions
ggplot(hwi_activity_summary %>% filter(Total >= 5), aes(x = reorder(Activity, -Prop_HWI), y = Prop_HWI)) +
  geom_col() +
  geom_text(aes(label = paste0("n=", Total)), vjust = -0.5, size = 3) + # Add total counts
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(title = "Proportion of Interactions Classified as HWI by Human Activity",
       subtitle = "Based on IFF data, activities with >= 5 observations shown",
       x = "Human Activity Code",
       y = "Proportion Resulting in HWI") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

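Interpretation: The summary table ranks human activities by the proportion of focal observations classified as HWI. The Chi-squared test was not significant (χ² = 7.98, df = 6, p = 0.24), so we fail to reject the null hypothesis that HWI proportion is independent of activity type. The highest proportions (ST and WADO at 1.000) rest on only 2 and 3 observations, and several activities have fewer than 5 observations in total, so the Chi-squared approximation is unreliable here and the ranking should be treated as descriptive.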
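
The code comments above note that Fisher's exact test may be needed when expected counts are low. As a sketch of that check, the block below rebuilds the table from the counts in the HWI summary, counts the cells with expected values below 5, and runs fisher.test() as an alternative. The object names hwi_tab, n_small, and fisher_res are introduced for illustration.

```r
# Counts copied from the HWI summary table (columns: yes, no)
hwi_tab <- matrix(c( 2,  0,
                     3,  0,
                     9,  6,
                     1,  1,
                     3,  4,
                    15, 21,
                     2,  5),
                  ncol = 2, byrow = TRUE,
                  dimnames = list(Activity = c("ST", "WADO", "PH", "ONBO", "WA", "BO", "TEBO"),
                                  HWI = c("yes", "no")))

# Expected counts under independence; chisq.test() warns when these are small
chisq_res <- suppressWarnings(chisq.test(hwi_tab))
n_small <- sum(chisq_res$expected < 5)
cat("Cells with expected count < 5:", n_small, "of", length(chisq_res$expected), "\n")

# Fisher's exact test does not rely on the large-sample approximation
fisher_res <- fisher.test(hwi_tab)
print(fisher_res$p.value)
```

Most cells fall below the usual threshold of 5, so the exact test is likely the safer choice for this table.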

Q2b: How does the proportion of time spent in different behaviors change with distance?

Hypothesis:

  • H0: The proportional time allocation across different behavioral states is independent of the distance bin to humans.
  • H1: The proportional time allocation across different behavioral states differs significantly based on the distance bin to humans.

Analysis: Aggregate durations per behavior within each distance bin, calculate proportions, visualize, and test with Chi-squared.

# --- SEOT ---
cat("\n--- SEOT: Time Budget vs Distance Bin ---\n")
## 
## --- SEOT: Time Budget vs Distance Bin ---
seot_time_budget_dist <- IFF_SEOT_dist %>%
  filter(Behavior %in% key_behaviors) %>%
  group_by(Dist.bin, Behavior) %>%
  summarise(TotalDuration = sum(State.Duration.sec, na.rm = TRUE)) %>%
  ungroup() %>%
  group_by(Dist.bin) %>%
  mutate(Proportion = TotalDuration / sum(TotalDuration, na.rm = TRUE)) %>%
  ungroup()

kable(seot_time_budget_dist %>% select(Dist.bin, Behavior, Proportion) %>%
        pivot_wider(names_from = Behavior, values_from = Proportion, values_fill = 0),
      digits = 3)
| Dist.bin | alert | foraging | grooming | resting | swimming |
|:--|--:|--:|--:|--:|--:|
| 1 | 0.148 | 0.370 | 0.481 | 0.000 | 0.000 |
| 2 | 0.317 | 0.061 | 0.048 | 0.537 | 0.037 |
| 3 | 0.053 | 0.178 | 0.340 | 0.157 | 0.272 |
| 4 | 0.066 | 0.198 | 0.192 | 0.468 | 0.076 |
# Chi-squared test on counts (or total durations as a proxy for counts if appropriate)
seot_time_budget_dist_table <- seot_time_budget_dist %>%
  select(Dist.bin, Behavior, TotalDuration) %>%
  pivot_wider(names_from = Behavior, values_from = TotalDuration, values_fill = 0) %>%
  select(-Dist.bin) # Keep only numeric counts/durations

# Caution: Chi-squared assumes counts. Using summed durations is an approximation.
# May need more sophisticated analysis (e.g., compositional data analysis) for rigorous testing.
cat("\nChi-squared Test: SEOT Behavior Counts vs Distance Bin (using duration sums as proxy)\n")
## 
## Chi-squared Test: SEOT Behavior Counts vs Distance Bin (using duration sums as proxy)
seot_tb_dist_chisq <- chisq.test(seot_time_budget_dist_table)
print(seot_tb_dist_chisq)
## 
##  Pearson's Chi-squared test
## 
## data:  seot_time_budget_dist_table
## X-squared = 1140, df = 12, p-value < 2.2e-16
# Visualization
ggplot(seot_time_budget_dist, aes(x = Dist.bin, y = Proportion, fill = Behavior)) +
  geom_col(position = "fill") + # Use position="fill" for proportions
  scale_y_continuous(labels = scales::percent_format()) +
  labs(title = "SEOT Time Budget Allocation by Distance Bin",
       x = "Distance Bin", y = "Proportion of Time", fill = "Behavior") +
  theme_minimal()

# --- HASE ---
cat("\n--- HASE: Time Budget vs Distance Bin ---\n")
## 
## --- HASE: Time Budget vs Distance Bin ---
hase_time_budget_dist <- IFF_HASE_dist %>%
 filter(Behavior %in% key_behaviors) %>%
  group_by(Dist.bin, Behavior) %>%
  summarise(TotalDuration = sum(State.Duration.sec, na.rm = TRUE)) %>%
  ungroup() %>%
  group_by(Dist.bin) %>%
  mutate(Proportion = TotalDuration / sum(TotalDuration, na.rm = TRUE)) %>%
  ungroup()

kable(hase_time_budget_dist %>% select(Dist.bin, Behavior, Proportion) %>%
        pivot_wider(names_from = Behavior, values_from = Proportion, values_fill = 0),
      digits = 3)
| Dist.bin | alert | resting |
|:--|--:|--:|
| 2 | 0.023 | 0.977 |
# Chi-squared test
hase_time_budget_dist_table <- hase_time_budget_dist %>%
  select(Dist.bin, Behavior, TotalDuration) %>%
  pivot_wider(names_from = Behavior, values_from = TotalDuration, values_fill = 0) %>%
  select(-Dist.bin)

cat("\nChi-squared Test: HASE Behavior Counts vs Distance Bin (using duration sums as proxy)\n")
## 
## Chi-squared Test: HASE Behavior Counts vs Distance Bin (using duration sums as proxy)
hase_tb_dist_chisq <- chisq.test(hase_time_budget_dist_table)
print(hase_tb_dist_chisq)
## 
##  Chi-squared test for given probabilities
## 
## data:  hase_time_budget_dist_table
## X-squared = 432.02, df = 1, p-value < 2.2e-16
# Visualization
ggplot(hase_time_budget_dist, aes(x = Dist.bin, y = Proportion, fill = Behavior)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(title = "HASE Time Budget Allocation by Distance Bin",
       x = "Distance Bin", y = "Proportion of Time", fill = "Behavior") +
  theme_minimal()

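Interpretation: The SEOT proportions vary considerably across distance bins (for example, resting accounts for 0.000 of recorded time in bin 1 and 0.537 in bin 2), but no consistent trend with distance is apparent. The SEOT Chi-squared result (χ² = 1140, df = 12, p < 2.2e-16) should not be read as a formal test: the cells are summed seconds rather than independent counts, so the statistic scales with the time unit used. For HASE, only distance bin 2 is represented, so chisq.test() received a single row and ran a goodness-of-fit test of equal time in alert and resting; that result does not address whether distance influences the HASE time budget.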
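
The caution about using summed durations as counts can be made concrete. In the sketch below, a hypothetical table of durations (not taken from the IFF data) is tested once in seconds and once in minutes. The proportions are identical in both versions, yet the Chi-squared statistic differs by the conversion factor.

```r
# Hypothetical summed durations in seconds for two proximity classes
dur_sec <- matrix(c(600, 1200,  300,
                    900,  600, 1500),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Proximity = c("Near", "Far"),
                                  Behavior = c("alert", "resting", "foraging")))

# Same time budget expressed in seconds and in minutes
stat_sec <- unname(chisq.test(dur_sec)$statistic)
stat_min <- unname(chisq.test(dur_sec / 60)$statistic)

cat("X-squared (seconds):", round(stat_sec, 2), "\n")
cat("X-squared (minutes):", round(stat_min, 2), "\n")
cat("Ratio:", round(stat_sec / stat_min, 2), "\n")
```

Because the statistic depends on an arbitrary choice of unit, a count-based table (for example, number of behavioral bouts per bin) or a method designed for compositional data may be a more defensible basis for inference.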

Question 3: Deeper Analyses

Q8: Are there species-specific differences in responses?

Analysis: Compare the results from Q2a/Q5 between SEOT and HASE. Perform direct tests where appropriate.

Hypothesis Example (Alert Duration vs. Boats):

  • H0: There is no difference in the median alert duration between SEOT and HASE when boats (BO activity code) are present.
  • H1: There is a significant difference in the median alert duration between SEOT and HASE when boats are present.
# Compare KW p-values and visualizations from Q2a/Q5 between species.

# Example Direct Test: Alert duration when Boats (BO) are present
iff_alert_boats <- IFF_analysis %>%
  filter(Behavior == "alert", Activity == "BO")

# Check if enough data for both species under this condition
cat("\nCounts for Alert behavior during BO activity:\n")
## 
## Counts for Alert behavior during BO activity:
print(table(iff_alert_boats$Species))
## 
## HASE SEOT 
##    1   14
if(nrow(iff_alert_boats) > 5 && length(unique(iff_alert_boats$Species)) == 2) {
  cat("\nWilcoxon Rank-Sum Test: Alert Duration (SEOT vs HASE) during Boat Activity\n")
  wilcox_boats_alert <- wilcox.test(State.Duration.sec ~ Species, data = iff_alert_boats)
  print(wilcox_boats_alert)

  # Visualization
  ggplot(iff_alert_boats, aes(x=Species, y=State.Duration.sec, fill=Species)) +
    geom_boxplot() +
    labs(title="Alert Duration during Boat Activity: SEOT vs HASE", y="Duration (sec)") +
    theme_bw()

} else {
  cat("\nNot enough data for direct comparison of alert duration during Boat activity.\n")
}
## 
## Wilcoxon Rank-Sum Test: Alert Duration (SEOT vs HASE) during Boat Activity
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  State.Duration.sec by Species
## W = 10, p-value = 0.5625
## alternative hypothesis: true location shift is not equal to 0

# Repeat similar comparisons for other behaviors/activities/distances of interest.

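Interpretation: This comparison tests whether the two species respond differently to the same human activity. Here it is uninformative: alert behavior during boat activity was recorded 14 times for SEOT but only once for HASE, so the Wilcoxon test (W = 10, p = 0.56) has essentially no power and the non-significant result is not evidence that the species respond alike. More HASE focal data are needed before species-specific sensitivities can be compared.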
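
The guard used above checks only the total number of rows and that both species appear, which lets a comparison run with a single observation in one group. A stricter check could require a minimum count per species. The sketch below applies such a check to the counts reported above; the threshold min_per_group is a hypothetical choice, not a value taken from the study design.

```r
# Counts of alert observations during BO activity, copied from the output above
species_counts <- c(HASE = 1, SEOT = 14)

# Hypothetical minimum number of observations required in each group
min_per_group <- 5

# Require both species to be present and each to meet the minimum
enough_data <- length(species_counts) == 2 && all(species_counts >= min_per_group)

if (enough_data) {
  cat("Both species meet the minimum; a Wilcoxon rank-sum test is reasonable.\n")
} else {
  cat("At least one species has fewer than", min_per_group,
      "observations; skipping the direct comparison.\n")
}
```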

Q10: Does the overall behavioral time budget shift with human presence/proximity? (IFF Data)

Hypothesis:

  • H0: The overall proportional time budget of MMs does not differ significantly between conditions of human presence vs. absence (or near vs. far).
  • H1: The overall proportional time budget of MMs differs significantly between conditions.

Analysis: Aggregate total time per behavior under different conditions (e.g., any human activity present vs. none; Dist.bin 1-2 vs 3-4) and use Chi-squared on the counts (using durations as proxy).

# Define Human Presence for IFF data (based on Activity column)
IFF_analysis <- IFF_analysis %>%
  mutate(Humans_Present_IFF = ifelse(!is.na(Activity) & Activity != "", "Present", "Absent"))

# Define Proximity Bins
IFF_analysis <- IFF_analysis %>%
  mutate(Proximity = case_when(
    Dist.bin %in% c("1", "2") ~ "Near",
    Dist.bin %in% c("3", "4") ~ "Far",
    TRUE ~ NA_character_ # Handle cases where Dist.bin is NA
  ))

# --- SEOT Time Budget vs Human Presence ---
cat("\n--- SEOT Time Budget vs Human Presence (IFF Based) ---\n")
## 
## --- SEOT Time Budget vs Human Presence (IFF Based) ---
seot_budget_hum <- IFF_analysis %>%
  filter(Species == "SEOT") %>%
  group_by(Humans_Present_IFF, Behavior) %>%
  summarise(TotalDuration = sum(State.Duration.sec, na.rm = TRUE), .groups = "drop") %>% # na.rm: one missing duration should not blank a whole behavior
  filter(TotalDuration > 0) # Remove combinations with zero time

# Pivot for Chi-squared
seot_budget_hum_table <- seot_budget_hum %>%
  pivot_wider(names_from = Behavior, values_from = TotalDuration, values_fill = 0) %>%
  select(-Humans_Present_IFF)

if(nrow(seot_budget_hum_table) == 2) { # Ensure both present/absent categories exist
  print(chisq.test(seot_budget_hum_table))
} else {
  cat("Cannot perform Chi-squared test; only one level of Human Presence found for SEOT.\n")
}
## Cannot perform Chi-squared test; only one level of Human Presence found for SEOT.
# Visualization (optional)
# ggplot(seot_budget_hum %>% group_by(Humans_Present_IFF) %>% mutate(Prop = TotalDuration/sum(TotalDuration)),
#        aes(x=Humans_Present_IFF, y=Prop, fill=Behavior)) + geom_col(position="fill")

# --- SEOT Time Budget vs Proximity ---
cat("\n--- SEOT Time Budget vs Proximity (Near vs Far) ---\n")
## 
## --- SEOT Time Budget vs Proximity (Near vs Far) ---
seot_budget_prox <- IFF_analysis %>%
  filter(Species == "SEOT", !is.na(Proximity)) %>%
  group_by(Proximity, Behavior) %>%
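  summarise(TotalDuration = sum(State.Duration.sec, na.rm = TRUE), .groups = "drop") %>% # na.rm: one missing duration should not blank a whole behavior
  filter(TotalDuration > 0) # Remove combinations with zero time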

seot_budget_prox_table <- seot_budget_prox %>%
  pivot_wider(names_from = Behavior, values_from = TotalDuration, values_fill = 0) %>%
  select(-Proximity)

if(nrow(seot_budget_prox_table) == 2) {
  print(chisq.test(seot_budget_prox_table))
} else {
  cat("Cannot perform Chi-squared test; only one level of Proximity found for SEOT.\n")
}
## 
##  Pearson's Chi-squared test
## 
## data:  seot_budget_prox_table
## X-squared = 552.16, df = 4, p-value < 2.2e-16
# Repeat for HASE...
# --- HASE Time Budget vs Human Presence ---
cat("\n--- HASE Time Budget vs Human Presence (IFF Based) ---\n")
## 
## --- HASE Time Budget vs Human Presence (IFF Based) ---
hase_budget_hum <- IFF_analysis %>%
  filter(Species == "HASE") %>%
  group_by(Humans_Present_IFF, Behavior) %>%
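  summarise(TotalDuration = sum(State.Duration.sec, na.rm = TRUE), .groups = "drop") %>% # na.rm: one missing duration should not blank a whole behavior
  filter(TotalDuration > 0) # Remove combinations with zero time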

hase_budget_hum_table <- hase_budget_hum %>%
  pivot_wider(names_from = Behavior, values_from = TotalDuration, values_fill = 0) %>%
  select(-Humans_Present_IFF)

if(nrow(hase_budget_hum_table) == 2) {
  print(chisq.test(hase_budget_hum_table))
} else {
  cat("Cannot perform Chi-squared test; only one level of Human Presence found for HASE.\n")
}
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  hase_budget_hum_table
## X-squared = 30.624, df = 1, p-value = 3.132e-08
# --- HASE Time Budget vs Proximity ---
cat("\n--- HASE Time Budget vs Proximity (Near vs Far) ---\n")
## 
## --- HASE Time Budget vs Proximity (Near vs Far) ---
hase_budget_prox <- IFF_analysis %>%
  filter(Species == "HASE", !is.na(Proximity)) %>%
  group_by(Proximity, Behavior) %>%
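  summarise(TotalDuration = sum(State.Duration.sec, na.rm = TRUE), .groups = "drop") %>% # na.rm: one missing duration should not blank a whole behavior
  filter(TotalDuration > 0) # Remove combinations with zero time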

hase_budget_prox_table <- hase_budget_prox %>%
  pivot_wider(names_from = Behavior, values_from = TotalDuration, values_fill = 0) %>%
  select(-Proximity)

if(nrow(hase_budget_prox_table) == 2) {
  print(chisq.test(hase_budget_prox_table))
} else {
  cat("Cannot perform Chi-squared test; only one level of Proximity found for HASE.\n")
}
## Cannot perform Chi-squared test; only one level of Proximity found for HASE.

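Interpretation: A significant Chi-squared result suggests that the way MMs allocate time across behaviors differs between conditions. In the output above, the SEOT time budget differed between near and far proximity (X-squared = 552.16, df = 4, p-value < 2.2e-16), and the HASE time budget differed with human presence (X-squared = 30.624, df = 1, p-value = 3.132e-08). The SEOT human-presence and HASE proximity comparisons could not be run because the check found only one level of the grouping variable. The p-values should be read with caution: the test treats every second of observation as an independent count, so the statistic grows with the time unit and with total observation time, and consecutive seconds within a follow are not independent. The proportions of time per behavior are the more reliable summary of any shift.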
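
To make the caveat about durations as counts concrete, the sketch below runs the same Chi-squared test on one time budget expressed in seconds and in minutes. The matrix budget_sec and its values are hypothetical and invented for illustration; only chisq.test() and prop.table() from base R are used. It also prints standardized residuals, which indicate which behaviors contribute most to a shift.

```r
# Hypothetical total durations (seconds) per behavior under two proximity
# classes; the numbers are invented for illustration only.
budget_sec <- matrix(
  c(1200, 600, 300,
     600, 900, 600),
  nrow = 2, byrow = TRUE,
  dimnames = list(Proximity = c("Far", "Near"),
                  Behavior = c("rest", "forage", "alert"))
)

test_sec <- chisq.test(budget_sec)        # durations in seconds
test_min <- chisq.test(budget_sec / 60)   # the same budget in minutes

# The proportions are identical, but the statistic shrinks by the unit factor
# (60), so the p-value depends on the arbitrary choice of time unit.
stats <- c(seconds = unname(test_sec$statistic),
           minutes = unname(test_min$statistic))
print(stats)

# Row proportions (the time budget itself) do not depend on the unit.
budget_prop <- prop.table(budget_sec, margin = 1)
print(round(budget_prop, 2))

# Standardized residuals show which cells depart most from independence;
# their magnitude also scales with the unit, so compare them within one table.
print(round(test_sec$stdres, 1))
```

Because the statistic depends on the unit, one cautious option is to report the row proportions and residual pattern as the main result and to treat the p-value as descriptive.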

Q11: Do environmental factors influence MM presence or responses?

Analysis: Incorporate Site, Tide, Weather, Db into models predicting MM presence (ISS) or behavioral duration/response (IFF). Note: Db analysis is limited by missing data in the sample ISS.csv.

# Example: Logistic Regression for SEOT presence including environmental factors (ISS data)
# Need to handle potential NAs in predictor variables first
ISS_env <- ISS %>% filter(!is.na(Tide), !is.na(Weather), !is.na(Site)) # Add Db if using

seot_env_model <- glm(SEOT_present ~ Humans_present + Activity_Category + Tide + factor(Weather) + factor(Site), # + Db,
                      family = binomial(link="logit"), data = ISS_env)
summary(seot_env_model)

# Repeat for HASE
hase_env_model <- glm(HASE_present ~ Humans_present + Activity_Category + Tide + factor(Weather) + factor(Site), # + Db,
                      family = binomial(link="logit"), data = ISS_env)
summary(hase_env_model)
# Example: Kruskal-Wallis testing if SEOT alert duration differs by Site (IFF data)
seot_alert_site <- IFF_SEOT_analysis %>% filter(Behavior == "alert", !is.na(Site))
if(length(unique(seot_alert_site$Site)) > 1) {
  print(kruskal.test(State.Duration.sec ~ factor(Site), data = seot_alert_site))
}

# Could also include environmental terms in GLMs if using that approach for duration.

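Interpretation: Significant environmental terms in these models would indicate that tide, weather, or site (or noise level, if Db is included) is associated with the baseline presence of MMs or with the duration of their behaviors. Showing that an environmental factor modifies the response to human activity would require an interaction term (e.g., Humans_present:Tide), which the additive models above do not include.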
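
When reading the logistic models, odds ratios and a single test per predictor are often easier to report than the per-level coefficients from summary(). The sketch below is one possible way to obtain both, shown on simulated data: the data frame toy_iss, the Tide levels ("High", "Low"), and the Site labels ("A", "B", "C") are hypothetical and are not taken from ISS.csv.

```r
set.seed(42)  # reproducible toy data; not study data

# Hypothetical scan-level data reusing column names from ISS_env.
n <- 200
toy_iss <- data.frame(
  Humans_present = rbinom(n, 1, 0.5),
  Tide = factor(sample(c("High", "Low"), n, replace = TRUE)),
  Site = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
# Simulate presence with a lower probability when humans are present.
toy_iss$SEOT_present <- rbinom(n, 1, plogis(0.5 - 1.0 * toy_iss$Humans_present))

toy_model <- glm(SEOT_present ~ Humans_present + Tide + Site,
                 family = binomial(link = "logit"), data = toy_iss)

# Odds ratios with Wald 95% confidence intervals.
or_table <- exp(cbind(OR = coef(toy_model), confint.default(toy_model)))
print(round(or_table, 2))

# Likelihood-ratio test for each term as a whole (one row per predictor),
# which is convenient for factors with more than two levels such as Site.
lrt <- drop1(toy_model, test = "Chisq")
print(lrt)
```

An odds ratio below 1 for Humans_present would mean lower odds of SEOT presence in scans with humans, holding tide and site constant.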

Q13: Does noise level (Db) affect MM presence or interact with human presence?

Note: This analysis depends heavily on the quality and completeness of the Db data. In the sample ISS.csv, Db has many missing values and the recorded values span a large range, so both need careful checking before modelling. Assuming sufficient data were available:

# Prerequisite: Handle NAs in Db
ISS_db <- ISS %>% filter(!is.na(Db))

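# Correlation test (Spearman due to likely non-normal Db)
# exact = FALSE: the binary presence columns produce many ties, so an exact p-value cannot be computed
cor_db_seot <- cor.test(ISS_db$Db, ISS_db$SEOT_present, method="spearman", exact = FALSE)
print(cor_db_seot)
cor_db_hase <- cor.test(ISS_db$Db, ISS_db$HASE_present, method="spearman", exact = FALSE)
print(cor_db_hase)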

# Logistic regression including Db and Human presence
# Model 1: Additive effects
seot_db_model1 <- glm(SEOT_present ~ Db + Humans_present, data=ISS_db, family=binomial)
summary(seot_db_model1)
# Model 2: Interaction effect (does effect of humans depend on noise level?)
seot_db_model2 <- glm(SEOT_present ~ Db * Humans_present, data=ISS_db, family=binomial)
summary(seot_db_model2)
# Compare models if desired (e.g., using anova(seot_db_model1, seot_db_model2, test="Chisq"))

# Repeat for HASE

Interpretation: Significant correlation or Db term in regression suggests noise level influences MM presence. A significant interaction term (Db:Humans_present) would suggest the effect of human presence depends on the background noise level.
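
Two checks may help before interpreting the Db models: whether Db is missing more often under some conditions than others, and whether the interaction is worth keeping. The sketch below illustrates both on simulated data. The data frame toy, the 40% missingness rate, and the centred column Db_c are assumptions made for illustration and do not describe ISS.csv.

```r
set.seed(1)  # reproducible toy data; not study data

# Hypothetical scans with a Db column that is missing for 120 of 300 rows.
n <- 300
toy <- data.frame(
  Humans_present = rbinom(n, 1, 0.5),
  Db = round(rnorm(n, mean = 60, sd = 8), 1)
)
toy$SEOT_present <- rbinom(n, 1, plogis(3 - 0.05 * toy$Db))
toy$Db[sample(n, 120)] <- NA

# 1. Does missingness in Db depend on human presence? If it does, the
#    complete-case subset is not a random sample of scans.
toy$Db_missing <- is.na(toy$Db)
miss_tab <- table(Humans = toy$Humans_present, Db_missing = toy$Db_missing)
print(round(prop.table(miss_tab, margin = 1), 2))

# 2. Centre Db so that, in the interaction model, the Humans_present
#    coefficient is the effect at the average noise level rather than at 0 dB.
toy_db <- toy[!toy$Db_missing, ]
toy_db$Db_c <- toy_db$Db - mean(toy_db$Db)

m_add <- glm(SEOT_present ~ Db_c + Humans_present, data = toy_db, family = binomial)
m_int <- glm(SEOT_present ~ Db_c * Humans_present, data = toy_db, family = binomial)

# 3. Likelihood-ratio test of the interaction (additive vs. interaction model).
lr <- anova(m_add, m_int, test = "Chisq")
print(lr)
```

Centring does not change the fit or the interaction test; it only changes what the main-effect coefficients mean, which matters because no scan is recorded at 0 dB.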

Data Limitations

It is important to acknowledge the limitations inherent in this study:

  1. Distance Estimation: Distances (Dist, Dist.bin) in the IFF dataset are estimates and may have associated observer error. The use of bins helps mitigate this but reduces precision.
  2. HWI Classification: The classification of an interaction as an HWI (‘yes’/‘no’) can be subjective and depend on observer interpretation of subtle behavioral changes. Definitions should be clearly stated in the thesis methods.
  3. Individual Identification: While Focal.ID exists in IFF, it is unclear whether the same individuals were reliably identified across multiple days or weeks. This limits analyses of habituation or long-term effects on specific animals. Scan data (ISS) do not track individuals.
  4. Noise Data (Db): The decibel readings in ISS appear to have many missing values in the provided sample. Furthermore, a single Db reading per scan might not fully capture the acoustic environment experienced by MMs, especially if the source is localized or transient. Analysis involving Db must carefully consider data quality and interpretation scope.
  5. Confounding Factors: While we attempt to control for some environmental factors, other unmeasured variables (e.g., prey availability, specific boat types, time of day beyond tide) could influence MM behavior and their response to humans.
  6. “Activity” Coding: The Activity codes in IFF need clear definitions. Some codes might encompass multiple specific actions. The simple presence/absence based on Activity in IFF_analysis doesn’t capture the number of people involved in that activity.
  7. IFF vs ISS Temporal Scale: IFF focuses on duration within a follow, while ISS is a snapshot. Direct comparison requires care; they measure different aspects of presence and behavior.
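
One way to respect limitations 3 and 7 when comparing behavior across conditions is to treat each focal follow, rather than each second or each bout, as the unit of analysis. The sketch below is an illustration under stated assumptions, not the method used in this document: it assumes Focal.ID identifies a follow and that each follow has a single Proximity class, and the data frame toy_iff and its values are hypothetical.

```r
# Hypothetical focal-follow records using IFF column names; values are
# invented so the sketch runs.
toy_iff <- data.frame(
  Focal.ID = rep(c("F1", "F2", "F3", "F4"), each = 3),
  Proximity = rep(c("Near", "Near", "Far", "Far"), each = 3),
  Behavior = rep(c("rest", "forage", "alert"), times = 4),
  State.Duration.sec = c(60, 120, 120,  30, 150, 120,
                         200, 80, 20,   180, 90, 30)
)

# Total observed time per follow, then the share of that time spent alert.
total_by_follow <- tapply(toy_iff$State.Duration.sec, toy_iff$Focal.ID, sum)
alert_rows <- toy_iff[toy_iff$Behavior == "alert", ]
alert_rows$PropAlert <- alert_rows$State.Duration.sec /
  as.numeric(total_by_follow[as.character(alert_rows$Focal.ID)])

# One value per follow, so each follow contributes a single observation.
print(alert_rows[, c("Focal.ID", "Proximity", "PropAlert")])

# Compare follow-level proportions between proximity classes. With this few
# follows the test has very little power; it only shows the shape of the call.
follow_test <- wilcox.test(PropAlert ~ Proximity, data = alert_rows, exact = FALSE)
print(follow_test)
```

If the same animal could appear under several Focal.ID values, follows would still not be fully independent, which is the uncertainty described in limitation 3.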

Synthesis and Discussion

(This section would be written after completing and interpreting the analyses. It should synthesize the key findings)

Example Structure:

  • Overall Patterns: Briefly summarize the baseline presence of SEOT, HASE, and humans, and the most common human activities. Discuss the general rate of co-occurrence and whether it deviates significantly from random chance.
  • Human Impact on Behavior: Discuss the main findings regarding how human presence, specific activities, and proximity affect MM behavior.
    • Which activities are most likely to lead to HWIs?
    • Do MMs change the duration of key behaviors (vigilance, foraging, resting) when humans are near or specific activities occur?
    • Do MMs shift their overall time budgets in response to human presence/proximity?
  • Species Differences: Highlight any significant differences found between SEOT and HASE in their sensitivity or behavioral responses to human stimuli. Discuss potential reasons (e.g., baseline behavior, reliance on different habitats within the harbor, historical exposure).
  • Environmental Context: Discuss how factors like site, tide, or weather might mediate MM presence or their interactions with humans. Discuss limitations of noise analysis if applicable.
  • Conservation Implications: Translate the findings into potential management recommendations. For example, if specific activities or proximity ranges consistently elicit strong responses (like flushing or cessation of foraging), suggest potential guidelines or outreach efforts. Are certain sites more sensitive? Do species require different management approaches?
  • Future Directions: Suggest further research based on the findings and limitations (e.g., more detailed acoustic monitoring, studies on individual habituation, experimental approaches to test specific stimuli).
