This document outlines the analysis of marine mammal (Sea Otter - SEOT, Harbor Seal - HASE) behavior in response to human activities within the study harbor. Data were collected using Instantaneous Scan Sampling (ISS) to capture broad presence and co-occurrence patterns, and Instantaneous Focal Follows (IFF) to detail behavioral states, durations, and specific interactions. The goal is to understand how human presence, activity type, and proximity influence marine mammal behavior, providing insights relevant to local wildlife management and conservation.
(write up methods)
First, we load necessary libraries and the datasets.
# Load core libraries for data manipulation and visualization
library(dplyr)
library(tidyr)
library(ggplot2)
library(knitr) # For nice tables
# Load datasets (ensure CSV files are in the project directory or provide full path)
ISS <- read.csv("ISS.csv", stringsAsFactors = TRUE, na.strings=c("NA", "")) # Treat blank cells as NA
IFF <- read.csv("IFF.csv", stringsAsFactors = TRUE, na.strings=c("NA", ""))
# Initial inspection
# print("ISS Head:")
# kable(head(ISS))
# print("IFF Head:")
# kable(head(IFF))
# print("ISS Structure:")
# str(ISS)
# print("IFF Structure:")
# str(IFF)
We will prepare the datasets by creating necessary helper columns and handling missing values where appropriate for specific analyses.
# ISS Data Preparation
# Convert SEOT/HASE counts to presence/absence (0 or 1)
ISS <- ISS %>%
mutate(
SEOT_present = ifelse(SEOT > 0, 1, 0),
HASE_present = ifelse(HASE > 0, 1, 0),
MM_present = ifelse(SEOT_present > 0 | HASE_present > 0, 1, 0),
Humans_present = ifelse(Humans > 0, 1, 0)
)
# Define human activity columns for easier reference
human_activity_cols_iss <- c("ST", "WA", "WADO", "No.dogs", "BO", "ONBO", "TEBO", "PH", "OT")
# Check Db column for potential issues (e.g., if needed for Q13)
# summary(ISS$Db)
# sum(is.na(ISS$Db)) # Count NAs in Db - Note: Many NAs observed in sample data. Will limit noise analysis.
# IFF Data Preparation
# Check data types and potential issues
# summary(IFF)
# str(IFF)
# Convert relevant columns to factors if not already
IFF$Species <- factor(IFF$Species)
IFF$Behavior <- factor(IFF$Behavior)
IFF$Activity <- factor(IFF$Activity)
IFF$HWI <- factor(IFF$HWI)
IFF$Dist.bin <- factor(IFF$Dist.bin, levels = c("1", "2", "3", "4"), ordered = TRUE) # Ensure ordinal factor
# Filter out rows with NA state duration as they are not useful for duration analysis
IFF <- IFF %>% filter(!is.na(State.Duration.sec))
# Create separate datasets for SEOT and HASE focal follows for species-specific analysis
IFF_SEOT <- IFF %>% filter(Species == "SEOT")
IFF_HASE <- IFF %>% filter(Species == "HASE")
# Define key behavioral states for analysis (adjust as needed)
key_behaviors <- c("resting", "grooming", "swimming", "foraging", "alert", "periscoping", "dive", "locomoting", "mating", "socializing", "hauled out") # Based on IFF unique values
We will use a combination of descriptive statistics, contingency table analysis (Chi-squared tests), non-parametric tests for comparing distributions (Kruskal-Wallis, Wilcoxon rank-sum), and potentially generalized linear models (GLMs) where appropriate for count or duration data. Analyses will be performed separately for SEOT and HASE where relevant to investigate species-specific responses. Significance level is set at α = 0.05.
Hypothesis:
Analysis: We use the ISS dataset to calculate co-occurrence frequencies and perform Chi-squared tests for independence.
# Overall Human and MM Presence (from ISS)
total_scans <- nrow(ISS)
scans_with_humans <- sum(ISS$Humans_present, na.rm = TRUE)
scans_with_seot <- sum(ISS$SEOT_present, na.rm = TRUE)
scans_with_hase <- sum(ISS$HASE_present, na.rm = TRUE)
scans_with_mm <- sum(ISS$MM_present, na.rm = TRUE)
cat("Total Scans:", total_scans, "\n")
## Total Scans: 1457
cat("Scans with Humans:", scans_with_humans, "(", round(100*scans_with_humans/total_scans, 1), "%)\n")
## Scans with Humans: 769 ( 52.8 %)
cat("Scans with SEOT:", scans_with_seot, "(", round(100*scans_with_seot/total_scans, 1), "%)\n")
## Scans with SEOT: 103 ( 7.1 %)
cat("Scans with HASE:", scans_with_hase, "(", round(100*scans_with_hase/total_scans, 1), "%)\n")
## Scans with HASE: 56 ( 3.8 %)
cat("Scans with Any MM:", scans_with_mm, "(", round(100*scans_with_mm/total_scans, 1), "%)\n")
## Scans with Any MM: 156 ( 10.7 %)
# Contingency Table: SEOT vs Human Presence
seot_human_table <- table(SEOT_Present = ISS$SEOT_present, Humans_Present = ISS$Humans_present)
cat("\nContingency Table: SEOT vs Humans\n")
##
## Contingency Table: SEOT vs Humans
kable(seot_human_table)
SEOT_Present | Humans_Present = 0 | Humans_Present = 1 |
---|---|---|
0 | 645 | 709 |
1 | 43 | 60 |
# Proportion Table: SEOT given Human Presence/Absence
cat("\nProportion of Scans with SEOT (Given Human Presence/Absence)\n")
##
## Proportion of Scans with SEOT (Given Human Presence/Absence)
kable(prop.table(seot_human_table, margin = 2), digits = 3) # margin=2 for column proportions
SEOT_Present | Humans_Present = 0 | Humans_Present = 1 |
---|---|---|
0 | 0.938 | 0.922 |
1 | 0.062 | 0.078 |
# Chi-squared Test: SEOT vs Humans
cat("\nChi-squared Test: SEOT vs Humans\n")
##
## Chi-squared Test: SEOT vs Humans
seot_human_chisq <- chisq.test(seot_human_table)
print(seot_human_chisq)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: seot_human_table
## X-squared = 1.1062, df = 1, p-value = 0.2929
# Contingency Table: HASE vs Human Presence
hase_human_table <- table(HASE_Present = ISS$HASE_present, Humans_Present = ISS$Humans_present)
cat("\nContingency Table: HASE vs Humans\n")
##
## Contingency Table: HASE vs Humans
kable(hase_human_table)
HASE_Present | Humans_Present = 0 | Humans_Present = 1 |
---|---|---|
0 | 646 | 755 |
1 | 42 | 14 |
# Proportion Table: HASE given Human Presence/Absence
cat("\nProportion of Scans with HASE (Given Human Presence/Absence)\n")
##
## Proportion of Scans with HASE (Given Human Presence/Absence)
kable(prop.table(hase_human_table, margin = 2), digits = 3)
HASE_Present | Humans_Present = 0 | Humans_Present = 1 |
---|---|---|
0 | 0.939 | 0.982 |
1 | 0.061 | 0.018 |
# Chi-squared Test: HASE vs Humans
cat("\nChi-squared Test: HASE vs Humans\n")
##
## Chi-squared Test: HASE vs Humans
hase_human_chisq <- chisq.test(hase_human_table)
print(hase_human_chisq)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: hase_human_table
## X-squared = 16.892, df = 1, p-value = 3.956e-05
# Contingency Table: Any MM vs Human Presence
mm_human_table <- table(MM_Present = ISS$MM_present, Humans_Present = ISS$Humans_present)
cat("\nContingency Table: Any MM vs Humans\n")
##
## Contingency Table: Any MM vs Humans
kable(mm_human_table)
MM_Present | Humans_Present = 0 | Humans_Present = 1 |
---|---|---|
0 | 604 | 697 |
1 | 84 | 72 |
# Proportion Table: Any MM given Human Presence/Absence
cat("\nProportion of Scans with Any MM (Given Human Presence/Absence)\n")
##
## Proportion of Scans with Any MM (Given Human Presence/Absence)
kable(prop.table(mm_human_table, margin = 2), digits = 3)
MM_Present | Humans_Present = 0 | Humans_Present = 1 |
---|---|---|
0 | 0.878 | 0.906 |
1 | 0.122 | 0.094 |
# Chi-squared Test: Any MM vs Humans
cat("\nChi-squared Test: Any MM vs Humans\n")
##
## Chi-squared Test: Any MM vs Humans
mm_human_chisq <- chisq.test(mm_human_table)
print(mm_human_chisq)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: mm_human_table
## X-squared = 2.7869, df = 1, p-value = 0.09504
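Interpretation: A p-value below 0.05 leads us to reject the null hypothesis that marine mammal presence is independent of human presence, and the proportion tables give the direction of any association. SEOT presence was not significantly associated with human presence (X-squared = 1.11, df = 1, p = 0.29): otters were recorded in 6.2% of scans without humans and 7.8% of scans with humans. HASE presence was significantly associated with human presence (X-squared = 16.89, df = 1, p < 0.001), with seals recorded in 6.1% of scans without humans but only 1.8% of scans with humans. For any MM, the association was not significant at α = 0.05 (X-squared = 2.79, df = 1, p = 0.095).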
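The HASE result can be cross-checked directly from the printed contingency table. The sketch below re-enters the cell counts by hand (it does not read ISS.csv) and uses `prop.test()`, which for two groups is equivalent to the Chi-squared test and also reports a confidence interval for the difference in proportions. The object names are illustrative only and are not part of the analysis above.

```r
# Cell counts copied by hand from the HASE vs Humans contingency table
hase_scans <- c(no_humans = 42, humans = 14)        # scans with HASE present
scans_per_group <- c(no_humans = 688, humans = 769) # all scans in each column (646 + 42, 755 + 14)

# Two-sample test of equal proportions (Yates' continuity correction is the default)
hase_prop_test <- prop.test(x = hase_scans, n = scans_per_group)

print(hase_prop_test$estimate)   # proportion of scans with HASE in each group
print(hase_prop_test$statistic)  # should agree with the Chi-squared statistic for HASE
print(hase_prop_test$conf.int)   # 95% CI for the difference in proportions
```

A confidence interval that excludes zero conveys the size of the difference as well as its significance, which is often more useful for management discussions than the p-value alone.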
Analysis: Simple summary counts from the ISS dataset.
# Total individuals counted per activity, summed across all scans
activity_counts <- ISS %>%
summarise(across(all_of(human_activity_cols_iss), ~ sum(.x, na.rm = TRUE))) %>%
pivot_longer(everything(), names_to = "Activity", values_to = "TotalOccurrences")
# Counts of scans where each activity was present (>=1 person doing it)
activity_presence_counts <- ISS %>%
summarise(across(all_of(human_activity_cols_iss), ~ sum(. > 0, na.rm = TRUE))) %>%
pivot_longer(everything(), names_to = "Activity", values_to = "ScansPresent")
activity_summary <- full_join(activity_counts, activity_presence_counts, by = "Activity")
cat("Summary of Human Activity Occurrences (ISS Data):\n")
## Summary of Human Activity Occurrences (ISS Data):
kable(arrange(activity_summary, desc(ScansPresent)))
Activity | TotalOccurrences | ScansPresent |
---|---|---|
TEBO | 1829 | 500 |
WA | 1220 | 339 |
ONBO | 450 | 113 |
ST | 444 | 85 |
No.dogs | 77 | 61 |
WADO | 74 | 50 |
OT | 25 | 11 |
PH | 10 | 5 |
BO | 18 | 4 |
# Basic bar chart visualization
ggplot(activity_summary, aes(x = reorder(Activity, -ScansPresent), y = ScansPresent)) +
geom_col() +
labs(title = "Number of Scans Where Each Human Activity Was Present",
x = "Human Activity Code",
y = "Number of Scans") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Interpretation: This provides a baseline understanding of which human activities are most common in the harbor during the study period based on scan samples.
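The two columns of the summary table measure different things: how many scans included an activity, and how many individuals were counted in total. As a rough sketch, their ratio gives the mean number of individuals per scan in which the activity occurred. The values below are copied by hand from the table, and `activity_summary_copy` and `MeanPerScan` are illustrative names, not objects from the analysis.

```r
# Values copied by hand from the activity summary table
activity_summary_copy <- data.frame(
  Activity = c("TEBO", "WA", "ONBO", "ST", "No.dogs", "WADO", "OT", "PH", "BO"),
  TotalOccurrences = c(1829, 1220, 450, 444, 77, 74, 25, 10, 18),
  ScansPresent = c(500, 339, 113, 85, 61, 50, 11, 5, 4)
)

# Mean count per scan, restricted to scans where the activity was recorded
activity_summary_copy$MeanPerScan <- round(
  activity_summary_copy$TotalOccurrences / activity_summary_copy$ScansPresent, 2
)

# Sort from largest to smallest mean group size
print(activity_summary_copy[order(-activity_summary_copy$MeanPerScan), ])
```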
This section focuses on the detailed focal follow data (IFF) to understand how MM behavior changes in response to human presence, activity type, and proximity.
Hypotheses:
Analysis: Due to non-normality of duration data (as explored previously), we use the Kruskal-Wallis test. We analyze SEOT and HASE separately. We focus on key behaviors where changes might be expected.
# Filter IFF data for key behaviors and valid activity/distance data
IFF_analysis <- IFF %>%
filter(Behavior %in% key_behaviors,
!is.na(State.Duration.sec),
!is.na(Activity) | !is.na(Dist.bin)) # Need at least one predictor
# Filter out activities with very few observations during focals if needed
activity_counts_iff <- IFF_analysis %>% count(Activity) %>% filter(!is.na(Activity))
common_activities <- activity_counts_iff %>% filter(n > 5) %>% pull(Activity) # Example threshold: >5 occurrences
IFF_analysis <- IFF_analysis %>% filter(Activity %in% common_activities | is.na(Activity))
# Separate by species
IFF_SEOT_analysis <- IFF_analysis %>% filter(Species == "SEOT")
IFF_HASE_analysis <- IFF_analysis %>% filter(Species == "HASE")
# Further filter for distance analysis (need non-NA Dist.bin)
IFF_SEOT_dist <- IFF_SEOT_analysis %>% filter(!is.na(Dist.bin))
IFF_HASE_dist <- IFF_HASE_analysis %>% filter(!is.na(Dist.bin))
# Further filter for activity analysis (need non-NA Activity)
IFF_SEOT_act <- IFF_SEOT_analysis %>% filter(!is.na(Activity) & Activity != "")
IFF_HASE_act <- IFF_HASE_analysis %>% filter(!is.na(Activity) & Activity != "")
Analysis: Duration vs. Distance Bin
# --- SEOT ---
cat("--- SEOT: Duration vs Distance Bin ---\n")
## --- SEOT: Duration vs Distance Bin ---
seot_kw_dist_results <- list()
# Loop through SEOT behaviors present in the distance-filtered data
for (beh in intersect(key_behaviors, unique(IFF_SEOT_dist$Behavior))) {
data_subset <- IFF_SEOT_dist %>% filter(Behavior == beh)
# Check if there are multiple distance bins represented and enough data points
if (length(unique(data_subset$Dist.bin)) > 1 && nrow(data_subset) > 5) {
test_result <- tryCatch(kruskal.test(State.Duration.sec ~ Dist.bin, data = data_subset), error = function(e) NULL)
# If test runs without error, store and print
if (!is.null(test_result)) {
cat("\nBehavior:", as.character(beh), "\n")
print(test_result)
seot_kw_dist_results[[as.character(beh)]] <- test_result
}
}
}
##
## Behavior: resting
##
## Kruskal-Wallis rank sum test
##
## data: State.Duration.sec by Dist.bin
## Kruskal-Wallis chi-squared = 1.325, df = 2, p-value = 0.5156
##
##
## Behavior: swimming
##
## Kruskal-Wallis rank sum test
##
## data: State.Duration.sec by Dist.bin
## Kruskal-Wallis chi-squared = 0.85714, df = 2, p-value = 0.6514
##
##
## Behavior: foraging
##
## Kruskal-Wallis rank sum test
##
## data: State.Duration.sec by Dist.bin
## Kruskal-Wallis chi-squared = 0.95455, df = 3, p-value = 0.8122
##
##
## Behavior: alert
##
## Kruskal-Wallis rank sum test
##
## data: State.Duration.sec by Dist.bin
## Kruskal-Wallis chi-squared = 2.1055, df = 3, p-value = 0.5508
# Visualization: SEOT Duration vs Distance Bin (ONLY if results exist)
if (length(seot_kw_dist_results) > 0) {
print( # Explicitly print ggplot object from within if statement
ggplot(IFF_SEOT_dist %>% filter(Behavior %in% names(seot_kw_dist_results)),
aes(x = Dist.bin, y = State.Duration.sec)) +
geom_boxplot() +
facet_wrap(~ Behavior, scales = "free_y") +
labs(title = "SEOT Behavior Duration by Binned Distance to Humans",
subtitle = "Only behaviors with successful Kruskal-Wallis test shown",
x = "Distance Bin (1=Closest, 4=Farthest)", y = "State Duration (sec)") +
theme_bw()
)
} else {
cat("\nNo Kruskal-Wallis tests could be run for SEOT duration vs distance; skipping plot.\n")
}
# --- HASE ---
cat("\n--- HASE: Duration vs Distance Bin ---\n")
##
## --- HASE: Duration vs Distance Bin ---
hase_kw_dist_results <- list()
# Loop through HASE behaviors present in the distance-filtered data
for (beh in intersect(key_behaviors, unique(IFF_HASE_dist$Behavior))) {
data_subset <- IFF_HASE_dist %>% filter(Behavior == beh)
# Check if there are multiple distance bins represented and enough data points
if (length(unique(data_subset$Dist.bin)) > 1 && nrow(data_subset) > 5) {
test_result <- tryCatch(kruskal.test(State.Duration.sec ~ Dist.bin, data = data_subset), error = function(e) NULL)
# If test runs without error, store and print
if (!is.null(test_result)) {
cat("\nBehavior:", as.character(beh), "\n")
print(test_result)
hase_kw_dist_results[[as.character(beh)]] <- test_result
}
}
}
# Visualization: HASE Duration vs Distance Bin (ONLY if results exist)
if (length(hase_kw_dist_results) > 0) {
print( # Explicitly print ggplot object
ggplot(IFF_HASE_dist %>% filter(Behavior %in% names(hase_kw_dist_results)),
aes(x = Dist.bin, y = State.Duration.sec)) +
geom_boxplot() +
facet_wrap(~ Behavior, scales = "free_y") +
labs(title = "HASE Behavior Duration by Binned Distance to Humans",
subtitle = "Only behaviors with successful Kruskal-Wallis test shown",
x = "Distance Bin (1=Closest, 4=Farthest)", y = "State Duration (sec)") +
theme_bw()
)
} else {
cat("\nNo Kruskal-Wallis tests could be run for HASE duration vs distance; skipping plot.\n")
}
##
## No Kruskal-Wallis tests could be run for HASE duration vs distance; skipping plot.
Analysis: Duration vs. Human Activity
# --- SEOT ---
cat("\n--- SEOT: Duration vs Human Activity ---\n")
##
## --- SEOT: Duration vs Human Activity ---
seot_kw_act_results <- list()
# Loop through SEOT behaviors present in the activity-filtered data
for (beh in intersect(key_behaviors, unique(IFF_SEOT_act$Behavior))) {
data_subset <- IFF_SEOT_act %>% filter(Behavior == beh)
# Check if there are multiple activity types represented and enough data points
if (length(unique(data_subset$Activity)) > 1 && nrow(data_subset) > 5) {
test_result <- tryCatch(kruskal.test(State.Duration.sec ~ Activity, data = data_subset), error = function(e) NULL)
# If test runs without error, store and print
if (!is.null(test_result)) {
cat("\nBehavior:", as.character(beh), "\n")
print(test_result)
seot_kw_act_results[[as.character(beh)]] <- test_result
}
}
}
##
## Behavior: resting
##
## Kruskal-Wallis rank sum test
##
## data: State.Duration.sec by Activity
## Kruskal-Wallis chi-squared = 7.2533, df = 3, p-value = 0.06425
##
##
## Behavior: grooming
##
## Kruskal-Wallis rank sum test
##
## data: State.Duration.sec by Activity
## Kruskal-Wallis chi-squared = 2.4643, df = 3, p-value = 0.4818
##
##
## Behavior: alert
##
## Kruskal-Wallis rank sum test
##
## data: State.Duration.sec by Activity
## Kruskal-Wallis chi-squared = 6.859, df = 3, p-value = 0.07653
# Visualization: SEOT Duration vs Activity (ONLY if results exist)
if(length(seot_kw_act_results) > 0) {
print( # Explicitly print ggplot object
ggplot(IFF_SEOT_act %>% filter(Behavior %in% names(seot_kw_act_results)),
aes(x = Activity, y = State.Duration.sec)) +
geom_boxplot() +
facet_wrap(~ Behavior, scales = "free_y") +
labs(title = "SEOT Behavior Duration by Human Activity",
subtitle = "Only behaviors with successful Kruskal-Wallis test shown",
x = "Human Activity Code", y = "State Duration (sec)") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
)
} else {
cat("\nNo Kruskal-Wallis tests could be run for SEOT duration vs activity; skipping plot.\n")
}
# --- HASE ---
cat("\n--- HASE: Duration vs Human Activity ---\n")
##
## --- HASE: Duration vs Human Activity ---
hase_kw_act_results <- list()
# Loop through HASE behaviors present in the activity-filtered data
for (beh in intersect(key_behaviors, unique(IFF_HASE_act$Behavior))) {
data_subset <- IFF_HASE_act %>% filter(Behavior == beh)
# Check if there are multiple activity types represented and enough data points
if (length(unique(data_subset$Activity)) > 1 && nrow(data_subset) > 5) {
test_result <- tryCatch(kruskal.test(State.Duration.sec ~ Activity, data = data_subset), error = function(e) NULL)
# If test runs without error, store and print
if (!is.null(test_result)) {
cat("\nBehavior:", as.character(beh), "\n")
print(test_result)
hase_kw_act_results[[as.character(beh)]] <- test_result
}
}
}
# Visualization: HASE Duration vs Activity (ONLY if results exist)
if(length(hase_kw_act_results) > 0) {
print( # Explicitly print ggplot object
ggplot(IFF_HASE_act %>% filter(Behavior %in% names(hase_kw_act_results)),
aes(x = Activity, y = State.Duration.sec)) +
geom_boxplot() +
facet_wrap(~ Behavior, scales = "free_y") +
labs(title = "HASE Behavior Duration by Human Activity",
subtitle = "Only behaviors with successful Kruskal-Wallis test shown",
x = "Human Activity Code", y = "State Duration (sec)") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
)
} else {
cat("\nNo Kruskal-Wallis tests could be run for HASE duration vs activity; skipping plot.\n")
}
##
## No Kruskal-Wallis tests could be run for HASE duration vs activity; skipping plot.
Post-Hoc Testing (If Kruskal-Wallis is Significant)
If the Kruskal-Wallis tests above show significant differences (p < 0.05) for a given behavior and factor (distance or activity), pairwise comparisons are needed to determine which specific groups differ.
# Example for SEOT Alert duration vs Distance Bin (only run if KW was significant)
beh_to_test <- "alert"
if (exists("seot_kw_dist_results") && !is.null(seot_kw_dist_results[[beh_to_test]]) && seot_kw_dist_results[[beh_to_test]]$p.value < 0.05) {
cat("\n--- Post-hoc Test: SEOT Alert Duration vs Distance Bin ---\n")
# Pairwise Wilcoxon rank-sum tests with Holm correction (note: this is not Dunn's test)
posthoc_dist <- pairwise.wilcox.test(IFF_SEOT_dist$State.Duration.sec[IFF_SEOT_dist$Behavior == beh_to_test],
IFF_SEOT_dist$Dist.bin[IFF_SEOT_dist$Behavior == beh_to_test],
p.adjust.method = "holm")
print(posthoc_dist)
}
# Example for SEOT Alert duration vs Activity (only run if KW was significant)
beh_to_test <- "alert"
activity_data_subset <- IFF_SEOT_act %>% filter(Behavior == beh_to_test)
if (exists("seot_kw_act_results") && !is.null(seot_kw_act_results[[beh_to_test]]) && seot_kw_act_results[[beh_to_test]]$p.value < 0.05) {
cat("\n--- Post-hoc Test: SEOT Alert Duration vs Activity ---\n")
# Note: pairwise.wilcox.test might give warnings if group sizes are small. Consider alternatives if needed.
posthoc_act <- pairwise.wilcox.test(activity_data_subset$State.Duration.sec,
activity_data_subset$Activity,
p.adjust.method = "holm")
print(posthoc_act)
}
# Repeat similar post-hoc tests for other significant KW results (for SEOT and HASE)
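Interpretation: The Kruskal-Wallis test indicates whether the duration of a behavior differs overall among distance bins or activity types; p < 0.05 rejects the null hypothesis of no difference. None of the SEOT tests reached significance: duration did not differ by distance bin for resting, swimming, foraging, or alert (all p > 0.5), and differences by human activity came closest to the threshold for resting (p = 0.064) and alert (p = 0.077). No HASE behavior met the criteria for testing (more than one group and more than 5 observations). Because no Kruskal-Wallis result was significant, the post-hoc comparisons were not triggered; the boxplots illustrate the observed distributions.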
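Because several Kruskal-Wallis tests are run per species, it may be worth adjusting the p-values for multiple testing before reading any of them against α = 0.05. The sketch below applies a Holm adjustment with base R's `p.adjust()` to the seven SEOT p-values printed in the output; the values are copied by hand and the vector names are illustrative. Treating these seven tests as one family is an assumption, not something the analysis specifies.

```r
# p-values copied by hand from the SEOT Kruskal-Wallis output
seot_kw_p <- c(
  dist_resting = 0.5156, dist_swimming = 0.6514,
  dist_foraging = 0.8122, dist_alert = 0.5508,
  act_resting = 0.06425, act_grooming = 0.4818, act_alert = 0.07653
)

# Holm step-down adjustment across the family of seven tests
seot_kw_p_holm <- p.adjust(seot_kw_p, method = "holm")
print(round(seot_kw_p_holm, 3))
```

After adjustment, the two near-threshold results (resting and alert duration by activity) move well away from 0.05, so they are better treated as leads for further data collection than as effects.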
Hypothesis:
Analysis: We calculate the proportion of ‘yes’ HWIs for each activity observed during focal follows and use a Chi-squared test.
# Use IFF data where Activity and HWI are recorded
IFF_hwi_act <- IFF %>%
filter(!is.na(Activity), Activity != "", HWI %in% c("yes", "no"))
# Calculate counts and proportions
hwi_activity_summary <- IFF_hwi_act %>%
count(Activity, HWI) %>%
pivot_wider(names_from = HWI, values_from = n, values_fill = 0) %>%
mutate(Total = yes + no,
Prop_HWI = ifelse(Total > 0, yes / Total, 0)) %>%
arrange(desc(Prop_HWI))
cat("Summary of HWI Proportions by Human Activity (IFF Data):\n")
## Summary of HWI Proportions by Human Activity (IFF Data):
kable(hwi_activity_summary, digits = 3)
Activity | no | yes | Total | Prop_HWI |
---|---|---|---|---|
ST | 0 | 2 | 2 | 1.000 |
WADO | 0 | 3 | 3 | 1.000 |
PH | 6 | 9 | 15 | 0.600 |
ONBO | 1 | 1 | 2 | 0.500 |
WA | 4 | 3 | 7 | 0.429 |
BO | 21 | 15 | 36 | 0.417 |
TEBO | 5 | 2 | 7 | 0.286 |
# Chi-squared test (requires expected counts >= 5, might need Fisher's test if not)
hwi_activity_table <- IFF_hwi_act %>%
count(Activity, HWI) %>%
pivot_wider(names_from = HWI, values_from = n, values_fill = 0) %>%
select(Activity, yes, no) %>%
tibble::column_to_rownames("Activity")
cat("\nChi-squared Test: HWI vs Activity\n")
##
## Chi-squared Test: HWI vs Activity
# Check assumptions for Chi-squared (may need fisher.test if expected counts are low)
hwi_act_chisq <- chisq.test(hwi_activity_table)
print(hwi_act_chisq)
##
## Pearson's Chi-squared test
##
## data: hwi_activity_table
## X-squared = 7.9792, df = 6, p-value = 0.2396
# print(hwi_act_chisq$expected) # Check expected counts
# Plot proportions
ggplot(hwi_activity_summary %>% filter(Total >= 5), aes(x = reorder(Activity, -Prop_HWI), y = Prop_HWI)) +
geom_col() +
geom_text(aes(label = paste0("n=", Total)), vjust = -0.5, size = 3) + # Add total counts
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
labs(title = "Proportion of Interactions Classified as HWI by Human Activity",
subtitle = "Based on IFF data, activities with >= 5 observations shown",
x = "Human Activity Code",
y = "Proportion Resulting in HWI") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
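Interpretation: This analysis identifies which human activities are most often associated with an HWI during focal follows. The Chi-squared test was not significant (X-squared = 7.98, df = 6, p = 0.24), so these data do not show that HWI likelihood depends on activity type. Several activities have very few observations (ST, WADO, and ONBO each have 2 or 3), so the Chi-squared approximation is unreliable here and the 100% HWI proportions for ST and WADO should not be over-interpreted.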
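The code comments note that Fisher's test may be needed when expected counts are low. A minimal sketch of that check follows, with the counts re-entered by hand from the HWI summary table rather than taken from `IFF_hwi_act`. The names `hwi_counts`, `hwi_expected`, and `hwi_fisher` are illustrative, and the simulated p-value (with an arbitrary seed) is one reasonable option for a sparse 7 x 2 table, not necessarily the choice the analysis would make.

```r
# Counts copied by hand from the HWI summary table (rows = activity, columns = HWI)
hwi_counts <- matrix(
  c( 2,  0,
     3,  0,
     9,  6,
     1,  1,
     3,  4,
    15, 21,
     2,  5),
  ncol = 2, byrow = TRUE,
  dimnames = list(Activity = c("ST", "WADO", "PH", "ONBO", "WA", "BO", "TEBO"),
                  HWI = c("yes", "no"))
)

# Expected counts under independence; values below 5 undermine the Chi-squared approximation
hwi_expected <- outer(rowSums(hwi_counts), colSums(hwi_counts)) / sum(hwi_counts)
print(round(hwi_expected, 2))

# Fisher's exact test with a Monte Carlo p-value
set.seed(1)
hwi_fisher <- fisher.test(hwi_counts, simulate.p.value = TRUE, B = 10000)
print(hwi_fisher$p.value)
```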
Hypothesis:
Analysis: Aggregate durations per behavior within each distance bin, calculate proportions, visualize, and test with Chi-squared.
# --- SEOT ---
cat("\n--- SEOT: Time Budget vs Distance Bin ---\n")
##
## --- SEOT: Time Budget vs Distance Bin ---
seot_time_budget_dist <- IFF_SEOT_dist %>%
filter(Behavior %in% key_behaviors) %>%
group_by(Dist.bin, Behavior) %>%
summarise(TotalDuration = sum(State.Duration.sec, na.rm = TRUE)) %>%
ungroup() %>%
group_by(Dist.bin) %>%
mutate(Proportion = TotalDuration / sum(TotalDuration, na.rm = TRUE)) %>%
ungroup()
kable(seot_time_budget_dist %>% select(Dist.bin, Behavior, Proportion) %>%
pivot_wider(names_from = Behavior, values_from = Proportion, values_fill = 0),
digits = 3)
Dist.bin | alert | foraging | grooming | resting | swimming |
---|---|---|---|---|---|
1 | 0.148 | 0.370 | 0.481 | 0.000 | 0.000 |
2 | 0.317 | 0.061 | 0.048 | 0.537 | 0.037 |
3 | 0.053 | 0.178 | 0.340 | 0.157 | 0.272 |
4 | 0.066 | 0.198 | 0.192 | 0.468 | 0.076 |
# Chi-squared test on counts (or total durations as a proxy for counts if appropriate)
seot_time_budget_dist_table <- seot_time_budget_dist %>%
select(Dist.bin, Behavior, TotalDuration) %>%
pivot_wider(names_from = Behavior, values_from = TotalDuration, values_fill = 0) %>%
select(-Dist.bin) # Keep only numeric counts/durations
# Caution: Chi-squared assumes counts. Using summed durations is an approximation.
# May need more sophisticated analysis (e.g., compositional data analysis) for rigorous testing.
cat("\nChi-squared Test: SEOT Behavior Counts vs Distance Bin (using duration sums as proxy)\n")
##
## Chi-squared Test: SEOT Behavior Counts vs Distance Bin (using duration sums as proxy)
seot_tb_dist_chisq <- chisq.test(seot_time_budget_dist_table)
print(seot_tb_dist_chisq)
##
## Pearson's Chi-squared test
##
## data: seot_time_budget_dist_table
## X-squared = 1140, df = 12, p-value < 2.2e-16
# Visualization
ggplot(seot_time_budget_dist, aes(x = Dist.bin, y = Proportion, fill = Behavior)) +
geom_col(position = "fill") + # Use position="fill" for proportions
scale_y_continuous(labels = scales::percent_format()) +
labs(title = "SEOT Time Budget Allocation by Distance Bin",
x = "Distance Bin", y = "Proportion of Time", fill = "Behavior") +
theme_minimal()
# --- HASE ---
cat("\n--- HASE: Time Budget vs Distance Bin ---\n")
##
## --- HASE: Time Budget vs Distance Bin ---
hase_time_budget_dist <- IFF_HASE_dist %>%
filter(Behavior %in% key_behaviors) %>%
group_by(Dist.bin, Behavior) %>%
summarise(TotalDuration = sum(State.Duration.sec, na.rm = TRUE)) %>%
ungroup() %>%
group_by(Dist.bin) %>%
mutate(Proportion = TotalDuration / sum(TotalDuration, na.rm = TRUE)) %>%
ungroup()
kable(hase_time_budget_dist %>% select(Dist.bin, Behavior, Proportion) %>%
pivot_wider(names_from = Behavior, values_from = Proportion, values_fill = 0),
digits = 3)
Dist.bin | alert | resting |
---|---|---|
2 | 0.023 | 0.977 |
# Chi-squared test
hase_time_budget_dist_table <- hase_time_budget_dist %>%
select(Dist.bin, Behavior, TotalDuration) %>%
pivot_wider(names_from = Behavior, values_from = TotalDuration, values_fill = 0) %>%
select(-Dist.bin)
cat("\nChi-squared Test: HASE Behavior Counts vs Distance Bin (using duration sums as proxy)\n")
##
## Chi-squared Test: HASE Behavior Counts vs Distance Bin (using duration sums as proxy)
hase_tb_dist_chisq <- chisq.test(hase_time_budget_dist_table)
print(hase_tb_dist_chisq)
##
## Chi-squared test for given probabilities
##
## data: hase_time_budget_dist_table
## X-squared = 432.02, df = 1, p-value < 2.2e-16
# Visualization
ggplot(hase_time_budget_dist, aes(x = Dist.bin, y = Proportion, fill = Behavior)) +
geom_col(position = "fill") +
scale_y_continuous(labels = scales::percent_format()) +
labs(title = "HASE Time Budget Allocation by Distance Bin",
x = "Distance Bin", y = "Proportion of Time", fill = "Behavior") +
theme_minimal()
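Interpretation: This analysis shows whether MMs shift their overall activity patterns (e.g., spending more time alert and less time resting) when humans are closer. For SEOT the Chi-squared test is highly significant (X-squared = 1140, df = 12, p < 2.2e-16), but it is computed on summed seconds rather than counts, so its magnitude depends on the time unit and it treats every second as an independent observation; it should be read as descriptive only. For HASE, focal data exist for a single distance bin (bin 2), so the test reduces to a goodness-of-fit test against equal proportions of alert and resting time and says nothing about the effect of distance.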
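The caution in the code about using summed durations in place of counts can be made concrete. The sketch below uses entirely hypothetical durations (not study data) to show that the Chi-squared statistic for the same time budget changes by a factor of 60 when seconds are converted to minutes, which is why the resulting p-value cannot be taken at face value.

```r
# Hypothetical summed durations in seconds (rows = distance bins, columns = behaviors)
dur_sec <- matrix(c(600, 1200,  300,
                    300,  900, 1500),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(Dist.bin = c("near", "far"),
                                  Behavior = c("alert", "foraging", "resting")))
dur_min <- dur_sec / 60  # identical time budget, expressed in minutes

# Same proportions, different units
x2_sec <- unname(chisq.test(dur_sec)$statistic)
x2_min <- unname(chisq.test(dur_min)$statistic)

cat("X-squared (seconds):", x2_sec, "\n")
cat("X-squared (minutes):", x2_min, "\n")
cat("Ratio:", x2_sec / x2_min, "\n")
```

One alternative, if the design allows it, is to compute time-budget proportions per focal follow and compare those between distance bins, so that the follow rather than the second is the unit of analysis.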
Analysis: Compare the results from Q2a/Q5 between SEOT and HASE. Perform direct tests where appropriate.
Hypothesis Example (Alert Duration vs. Boats):
... (BO activity code) are present.
# Compare KW p-values and visualizations from Q2a/Q5 between species.
# Example Direct Test: Alert duration when Boats (BO) are present
iff_alert_boats <- IFF_analysis %>%
filter(Behavior == "alert", Activity == "BO")
# Check if enough data for both species under this condition
cat("\nCounts for Alert behavior during BO activity:\n")
##
## Counts for Alert behavior during BO activity:
print(table(iff_alert_boats$Species))
##
## HASE SEOT
## 1 14
if(nrow(iff_alert_boats) > 5 && length(unique(iff_alert_boats$Species)) == 2) {
cat("\nWilcoxon Rank-Sum Test: Alert Duration (SEOT vs HASE) during Boat Activity\n")
wilcox_boats_alert <- wilcox.test(State.Duration.sec ~ Species, data = iff_alert_boats)
print(wilcox_boats_alert)
# Visualization
ggplot(iff_alert_boats, aes(x=Species, y=State.Duration.sec, fill=Species)) +
geom_boxplot() +
labs(title="Alert Duration during Boat Activity: SEOT vs HASE", y="Duration (sec)") +
theme_bw()
} else {
cat("\nNot enough data for direct comparison of alert duration during Boat activity.\n")
}
##
## Wilcoxon Rank-Sum Test: Alert Duration (SEOT vs HASE) during Boat Activity
##
## Wilcoxon rank sum test with continuity correction
##
## data: State.Duration.sec by Species
## W = 10, p-value = 0.5625
## alternative hypothesis: true location shift is not equal to 0
# Repeat similar comparisons for other behaviors/activities/distances of interest.
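Interpretation: This directly tests whether the two species react differently to the same type of human activity or proximity; significant differences would point to different sensitivities or behavioral strategies. Here the test was not significant (W = 10, p = 0.56), but with only 1 HASE observation against 14 SEOT observations it has essentially no power, so it cannot be read as evidence that the species respond alike.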
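Group sizes matter a great deal for the Wilcoxon rank-sum test. As a sketch with hypothetical values (not study data), the example below places a single observation above all 14 observations in the other group, which is the most extreme ordering possible for group sizes of 1 and 14, and shows the two-sided p-value that results.

```r
# Hypothetical durations: one observation in one group, 14 in the other, no ties
single_obs <- 100
other_obs <- 1:14  # every value is smaller than single_obs

# With no ties and small samples, wilcox.test() computes an exact p-value
extreme_test <- wilcox.test(single_obs, other_obs)
print(extreme_test$p.value)
```

Even this most extreme arrangement gives a p-value above 0.05, so a 1 versus 14 comparison cannot reach significance under the exact test. A minimum count per species would be a more informative guard than a minimum total row count.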
Hypothesis:
Analysis: Aggregate total time per behavior under different conditions (e.g., any human activity present vs. none; Dist.bin 1-2 vs 3-4) and use Chi-squared on the counts (using durations as proxy).
# Define Human Presence for IFF data (based on Activity column)
IFF_analysis <- IFF_analysis %>%
mutate(Humans_Present_IFF = ifelse(!is.na(Activity) & Activity != "", "Present", "Absent"))
# Define Proximity Bins
IFF_analysis <- IFF_analysis %>%
mutate(Proximity = case_when(
Dist.bin %in% c("1", "2") ~ "Near",
Dist.bin %in% c("3", "4") ~ "Far",
TRUE ~ NA_character_ # Handle cases where Dist.bin is NA
))
# --- SEOT Time Budget vs Human Presence ---
cat("\n--- SEOT Time Budget vs Human Presence (IFF Based) ---\n")
##
## --- SEOT Time Budget vs Human Presence (IFF Based) ---
seot_budget_hum <- IFF_analysis %>%
filter(Species == "SEOT") %>%
group_by(Humans_Present_IFF, Behavior) %>%
summarise(TotalDuration = sum(State.Duration.sec)) %>%
filter(TotalDuration > 0) %>% # Remove combinations with zero time
ungroup()
# Pivot for Chi-squared
seot_budget_hum_table <- seot_budget_hum %>%
pivot_wider(names_from = Behavior, values_from = TotalDuration, values_fill = 0) %>%
select(-Humans_Present_IFF)
if(nrow(seot_budget_hum_table) == 2) { # Ensure both present/absent categories exist
print(chisq.test(seot_budget_hum_table))
} else {
cat("Cannot perform Chi-squared test; only one level of Human Presence found for SEOT.\n")
}
## Cannot perform Chi-squared test; only one level of Human Presence found for SEOT.
# Visualization (optional)
# ggplot(seot_budget_hum %>% group_by(Humans_Present_IFF) %>% mutate(Prop = TotalDuration/sum(TotalDuration)),
# aes(x=Humans_Present_IFF, y=Prop, fill=Behavior)) + geom_col(position="fill")
# --- SEOT Time Budget vs Proximity ---
cat("\n--- SEOT Time Budget vs Proximity (Near vs Far) ---\n")
##
## --- SEOT Time Budget vs Proximity (Near vs Far) ---
seot_budget_prox <- IFF_analysis %>%
filter(Species == "SEOT", !is.na(Proximity)) %>%
group_by(Proximity, Behavior) %>%
summarise(TotalDuration = sum(State.Duration.sec)) %>%
filter(TotalDuration > 0) %>%
ungroup()
seot_budget_prox_table <- seot_budget_prox %>%
pivot_wider(names_from = Behavior, values_from = TotalDuration, values_fill = 0) %>%
select(-Proximity)
if(nrow(seot_budget_prox_table) == 2) {
print(chisq.test(seot_budget_prox_table))
} else {
cat("Cannot perform Chi-squared test; only one level of Proximity found for SEOT.\n")
}
##
## Pearson's Chi-squared test
##
## data: seot_budget_prox_table
## X-squared = 552.16, df = 4, p-value < 2.2e-16
# Repeat for HASE...
# --- HASE Time Budget vs Human Presence ---
cat("\n--- HASE Time Budget vs Human Presence (IFF Based) ---\n")
##
## --- HASE Time Budget vs Human Presence (IFF Based) ---
hase_budget_hum <- IFF_analysis %>%
filter(Species == "HASE") %>%
group_by(Humans_Present_IFF, Behavior) %>%
summarise(TotalDuration = sum(State.Duration.sec)) %>%
filter(TotalDuration > 0) %>%
ungroup()
hase_budget_hum_table <- hase_budget_hum %>%
pivot_wider(names_from = Behavior, values_from = TotalDuration, values_fill = 0) %>%
select(-Humans_Present_IFF)
if(nrow(hase_budget_hum_table) == 2) {
print(chisq.test(hase_budget_hum_table))
} else {
cat("Cannot perform Chi-squared test; only one level of Human Presence found for HASE.\n")
}
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: hase_budget_hum_table
## X-squared = 30.624, df = 1, p-value = 3.132e-08
# --- HASE Time Budget vs Proximity ---
cat("\n--- HASE Time Budget vs Proximity (Near vs Far) ---\n")
##
## --- HASE Time Budget vs Proximity (Near vs Far) ---
hase_budget_prox <- IFF_analysis %>%
filter(Species == "HASE", !is.na(Proximity)) %>%
group_by(Proximity, Behavior) %>%
summarise(TotalDuration = sum(State.Duration.sec)) %>%
filter(TotalDuration > 0) %>%
ungroup()
hase_budget_prox_table <- hase_budget_prox %>%
pivot_wider(names_from = Behavior, values_from = TotalDuration, values_fill = 0) %>%
select(-Proximity)
if(nrow(hase_budget_prox_table) == 2) {
print(chisq.test(hase_budget_prox_table))
} else {
cat("Cannot perform Chi-squared test; only one level of Proximity found for HASE.\n")
}
## Cannot perform Chi-squared test; only one level of Proximity found for HASE.
Interpretation: Significant chi-squared results suggest that the way MMs allocate their time across behaviors differs depending on whether humans are present or how close they are. For HASE, only one level of Proximity was observed, so the proximity comparison could not be tested.
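One caution when reading the statistics above: the tables hold durations in seconds, not independent counts, and Pearson's statistic scales with the table total. Recording the same durations in minutes would shrink X-squared sixty-fold without any change in behavior. The sketch below illustrates this on invented toy data (the data frame `toy_iff` and all of its values are hypothetical, not taken from `IFF.csv`). It also shows two unit-free summaries, time-budget proportions and a bout-count table, that could be reported alongside the test.

```r
# Hypothetical toy data standing in for IFF_analysis (all values invented)
toy_iff <- data.frame(
  Proximity = c("Near", "Near", "Near", "Far", "Far", "Far", "Near", "Far"),
  Behavior  = c("rest", "alert", "forage", "rest", "rest", "forage", "alert", "rest"),
  State.Duration.sec = c(120, 45, 200, 300, 260, 180, 30, 240)
)

# Total seconds per behavior within each proximity class
secs <- xtabs(State.Duration.sec ~ Proximity + Behavior, data = toy_iff)

# The same table expressed in minutes
mins <- secs / 60

# Pearson's statistic depends on the unit of measurement
# (warnings about small expected values are suppressed for this toy table)
x2_secs <- suppressWarnings(unname(chisq.test(secs)$statistic))
x2_mins <- suppressWarnings(unname(chisq.test(mins)$statistic))
unit_ratio <- x2_secs / x2_mins
print(unit_ratio)

# Unit-free summary 1: share of observed time per behavior, by proximity
budget_prop <- prop.table(secs, margin = 1)
print(round(budget_prop, 3))

# Unit-free summary 2: number of behavioral bouts (one row = one bout)
bouts <- table(toy_iff$Proximity, toy_iff$Behavior)
print(bouts)
```

Whether bouts can be treated as independent is itself an assumption, since consecutive states from the same focal animal are likely correlated. The bout table is a less inflated starting point, not a complete fix.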
Analysis: Incorporate `Site`, `Tide`, `Weather`, and `Db` into models predicting MM presence (ISS) or behavioral duration/response (IFF).
Note: `Db` analysis is limited by missing data in the sample `ISS.csv`.
# Example: Logistic Regression for SEOT presence including environmental factors (ISS data)
# Need to handle potential NAs in predictor variables first
ISS_env <- ISS %>% filter(!is.na(Tide), !is.na(Weather), !is.na(Site)) # Add Db if using
seot_env_model <- glm(SEOT_present ~ Humans_present + Activity_Category + Tide + factor(Weather) + factor(Site), # + Db,
family = binomial(link="logit"), data = ISS_env)
summary(seot_env_model)
# Repeat for HASE
hase_env_model <- glm(HASE_present ~ Humans_present + Activity_Category + Tide + factor(Weather) + factor(Site), # + Db,
family = binomial(link="logit"), data = ISS_env)
summary(hase_env_model)
# Example: Kruskal-Wallis testing if SEOT alert duration differs by Site (IFF data)
seot_alert_site <- IFF_SEOT_analysis %>% filter(Behavior == "alert", !is.na(Site))
if(length(unique(seot_alert_site$Site)) > 1) {
print(kruskal.test(State.Duration.sec ~ factor(Site), data = seot_alert_site))
}
# Could also include environmental terms in GLMs if using that approach for duration.
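Interpretation: Significant terms for environmental factors in these models would indicate that tide, weather, site, or noise level is associated with the baseline presence of MMs. Because the terms are additive, they do not show whether an environmental factor modifies the response to human activities; testing that requires an interaction term (for example `Humans_present * Tide`).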
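Factors such as `Weather` and `Site` enter the model as several dummy coefficients, so the per-coefficient z-tests in `summary()` do not directly answer whether the factor as a whole matters. A minimal sketch of one way to get a single test per term is a likelihood-ratio test via `drop1()`. The data frame `toy_iss`, its factor levels, and the simulated effect sizes are all hypothetical.

```r
set.seed(42)  # toy data only; every value below is simulated
n <- 200
toy_iss <- data.frame(
  Humans_present = rbinom(n, 1, 0.5),
  Tide    = factor(sample(c("High", "Low"), n, replace = TRUE)),
  Weather = factor(sample(c("Clear", "Cloudy", "Rain"), n, replace = TRUE)),
  Site    = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)

# Simulate presence with a tide effect only (an arbitrary choice for the demo)
toy_iss$SEOT_present <- rbinom(n, 1, plogis(-0.5 + 0.8 * (toy_iss$Tide == "High")))

toy_model <- glm(SEOT_present ~ Humans_present + Tide + Weather + Site,
                 family = binomial(link = "logit"), data = toy_iss)

# One likelihood-ratio test per model term (each term dropped in turn)
lrt <- drop1(toy_model, test = "Chisq")
print(lrt)

# Coefficients on the odds-ratio scale, often easier to report than log-odds
odds_ratios <- exp(coef(toy_model))
print(round(odds_ratios, 2))
```

Applied to the real models, the same call would be `drop1(seot_env_model, test = "Chisq")`.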
Does noise level (`Db`) affect MM presence or interact with human presence?
Note: This analysis depends heavily on the quality and completeness of the `Db` data. In the sample `ISS.csv`, `Db` has many missing values, and the recorded values span a large range, so they require careful checking.
Assuming sufficient data were available:
# Prerequisite: Handle NAs in Db
ISS_db <- ISS %>% filter(!is.na(Db))
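# Correlation test (Spearman due to likely non-normal Db)
# exact = FALSE: the binary presence variable produces ties, so an exact p-value cannot be computed
cor_db_seot <- cor.test(ISS_db$Db, ISS_db$SEOT_present, method = "spearman", exact = FALSE)
print(cor_db_seot)
cor_db_hase <- cor.test(ISS_db$Db, ISS_db$HASE_present, method = "spearman", exact = FALSE)
print(cor_db_hase)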
# Logistic regression including Db and Human presence
# Model 1: Additive effects
seot_db_model1 <- glm(SEOT_present ~ Db + Humans_present, data=ISS_db, family=binomial)
summary(seot_db_model1)
# Model 2: Interaction effect (does effect of humans depend on noise level?)
seot_db_model2 <- glm(SEOT_present ~ Db * Humans_present, data=ISS_db, family=binomial)
summary(seot_db_model2)
# Compare models if desired (e.g., using anova(seot_db_model1, seot_db_model2, test="Chisq"))
# Repeat for HASE
Interpretation: A significant correlation or `Db` term in the regression suggests that noise level is associated with MM presence. A significant interaction term (`Db:Humans_present`) would suggest that the effect of human presence depends on the background noise level.
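The comparison between the additive and interaction models can be sketched as a likelihood-ratio test on simulated data. The data frame `toy_db`, the `Db` range of 40 to 90, and the coefficients used to simulate presence are assumptions chosen purely for illustration. The simulation contains no true interaction.

```r
set.seed(1)  # toy data only; every value below is simulated
n <- 300
toy_db <- data.frame(
  Db = runif(n, 40, 90),              # hypothetical decibel range
  Humans_present = rbinom(n, 1, 0.5)
)

# Simulate presence from additive effects only (no interaction built in)
lin_pred <- 2 - 0.03 * toy_db$Db - 0.5 * toy_db$Humans_present
toy_db$SEOT_present <- rbinom(n, 1, plogis(lin_pred))

m_add <- glm(SEOT_present ~ Db + Humans_present, data = toy_db, family = binomial)
m_int <- glm(SEOT_present ~ Db * Humans_present, data = toy_db, family = binomial)

# Likelihood-ratio test: does adding Db:Humans_present improve the fit?
lrt <- anova(m_add, m_int, test = "Chisq")
print(lrt)

# AIC as a complementary comparison (lower is better)
aic_tab <- AIC(m_add, m_int)
print(aic_tab)
```

Both models must be fitted to the same rows for this comparison to be valid, which is why filtering out missing `Db` values once, before fitting either model, matters.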
It is important to acknowledge the limitations inherent in this study:
- Distance values (`Dist`, `Dist.bin`) in the IFF dataset are estimates and may carry observer error. Binning helps mitigate this but reduces precision.
- Although `Focal.ID` exists in IFF, it is unclear whether the same individuals were reliably identified across multiple days or weeks. This limits analyses of habituation or long-term effects on specific animals. Scan data (`ISS`) does not track individuals.
- Noise level (`Db`): The decibel readings in `ISS` appear to have many missing values in the provided sample. Furthermore, a single `Db` reading per scan might not fully capture the acoustic environment experienced by MMs, especially if the source is localized or transient. Analysis involving `Db` must carefully consider data quality and interpretation scope.
- `Activity` codes in `IFF` need clear definitions. Some codes might encompass multiple specific actions. The simple presence/absence based on `Activity` in `IFF_analysis` doesn't capture the number of people involved in that activity.

(This section would be written after completing and interpreting the analyses. It should synthesize the key findings.)
Example Structure:
This R Markdown structure provides a framework for the analysis, moving from basic descriptions to statistical testing and interpretation, suitable for a Master's thesis. The interpretations should be filled in from the actual output once the code has been run.