This analysis explores the parental behaviors of nesting birds, specifically focusing on sexual dimorphism (differences between males and females) in foraging strategies. The dataset includes observations of trip duration, time spent at the nest box, and the size of prey loads delivered to nestlings.
Objective: To determine if there is a statistically significant division of labor between male and female parents.
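# Load packages: tidyverse (read_csv, dplyr, ggplot2, stringr), knitr (kable), broom (tidy)
library(tidyverse); library(knitr); library(broom)
# Load the dataset
birds <- read_csv("birds.csv")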
# Clean and format data for analysis
# 1. Keep parents only (exclude 'S', likely intruders/starlings)
# 2. Convert measurement columns to numeric (non-numeric text becomes NA)
# 3. Store Sex as a factor with explicit levels
birds_clean <- birds %>%
  filter(Sex %in% c("M", "F")) %>%
  mutate(
    Sex      = factor(Sex, levels = c("F", "M")),
    TripSec  = as.numeric(TripSec),
    Loadmm   = as.numeric(Loadmm),
    AtBoxSec = as.numeric(AtBoxSec),
    PreyType = str_to_title(PreyType)  # Standardize capitalization
  ) %>%
  drop_na(TripSec)  # Remove rows where the primary variable is missing
# Display a preview of the clean data
head(birds_clean) %>% kable()
| Box | Year | Date | File | Time | Sex | Combo | Band | BreedID | Scorer | TapeType | Experiment | BroodSize | Nage | Visit | FileLength | FileStart | TimeON | TimeIn | TimeOut | TimeOFF | 0:48:20 | TripTime | TripSec | LogTripSec | AtBox | AtBoxSec | LogAtBox | InBox | InBoxSec | LogInBox | LoadSize | Loadmm | LogLoad | NumItems | PreyType | Fecalsac | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C2 | 2013 | 4/28/2013 | 1 | 08:37:00 | F | NA | 234101182 | 4072 | JL | Normal | Control | 3 | 1 | 2 | 00:58:46 | 00:00:00 | 00:19:08 | 00:19:10 | 00:25:36 | 00:25:39 | NA | 00:09:02 | 542 | 2.73 | 00:06:31 | 391 | 2.59 | 00:06:26 | 386 | 2.59 | NA | NA | NA | NA | Too Fast | NA | NA |
| C2 | 2013 | 4/28/2013 | 1 | 08:37:00 | F | NA | 234101182 | 4072 | JL | Normal | Control | 3 | 1 | 3 | 00:58:46 | 00:00:00 | 00:31:26 | 00:31:33 | 00:34:35 | 00:34:40 | NA | 00:05:47 | 347 | 2.54 | 00:03:14 | 194 | 2.29 | 00:03:02 | 182 | 2.26 | 0.3 | 65.4 | 1.8 | 1 | Bit | NA | NA |
| C2 | 2013 | 4/28/2013 | 1 | 08:37:00 | F | NA | 234101182 | 4072 | JL | Normal | Control | 3 | 1 | 4 | 00:58:46 | 00:00:00 | 00:38:10 | 00:38:20 | 00:44:35 | 00:44:36 | NA | 00:03:30 | 210 | 2.32 | 00:06:26 | 386 | 2.59 | 00:06:15 | 375 | 2.57 | NA | NA | NA | NA | Inside The Bill | NA | NA |
| C2 | 2013 | 4/28/2013 | 1 | 08:37:00 | F | NA | 234101182 | 4072 | JL | Normal | Control | 3 | 1 | 5 | 00:58:46 | 00:00:00 | 00:47:40 | 00:47:47 | 00:48:20 | 00:48:27 | NA | 00:03:04 | 184 | 2.26 | 00:00:47 | 47 | 1.67 | 00:00:33 | 33 | 1.52 | NA | NA | NA | NA | NA | NA | NA |
| C2 | 2013 | 4/28/2013 | 1 | 08:37:00 | F | NA | 234101182 | 4072 | JL | Normal | Control | 3 | 1 | 6 | 00:58:46 | 00:00:00 | 00:51:30 | 00:51:31 | 00:52:50 | 00:53:00 | NA | 00:03:03 | 183 | 2.26 | 00:01:30 | 90 | 1.95 | 00:01:19 | 79 | 1.90 | 0.3 | 65.4 | 1.8 | 1 | Bit | NA | NA |
| C2 | 2013 | 4/28/2013 | 1 | 08:37:00 | F | NA | 234101182 | 4072 | JL | Normal | Control | 3 | 1 | 7 | 00:58:46 | 00:00:00 | 00:57:12 | 00:57:30 | 00:10:17 | 00:10:19 | O | 00:04:08 | 248 | 2.39 | 00:11:53 | 713 | 2.85 | 00:11:33 | 693 | 2.84 | NA | NA | NA | NA | Inside The Bill | NA | NA |
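The cleaning step relies on `as.numeric()` turning non-numeric text into `NA`, which happens silently apart from a warning. The sketch below uses made-up strings (not values from `birds.csv`) to show one way such coercions could be counted before rows are dropped; the names `raw`, `parsed`, and `n_coerced` are illustrative only.

```r
# Hypothetical raw values standing in for a text-contaminated numeric column
raw <- c("542", "347", "n/a", "")

# as.numeric() returns NA for anything it cannot parse
parsed <- suppressWarnings(as.numeric(raw))

# Count entries that were present as text but became NA after conversion
n_coerced <- sum(is.na(parsed) & !is.na(raw))

parsed
n_coerced
```

Running the same count on `TripSec`, `Loadmm`, and `AtBoxSec` before `drop_na()` would show how much data the conversion discards.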
Before applying statistical tests, we visualize the distributions to understand the “shape” of our data.
We begin by comparing the raw volume of work. Who performs more foraging trips?
# Chart 1: Bar chart (position on a common scale, high on the Cleveland-McGill accuracy ranking)
ggplot(birds_clean, aes(x = Sex, fill = Sex)) +
  geom_bar(width = 0.6) +
  scale_fill_manual(values = c("F" = "#E76F51", "M" = "#264653")) +
  labs(
    title = "Total Foraging Trips by Parent",
    subtitle = "Females completed more trips during the observation period",
    y = "Number of Trips",
    x = "Parent Sex"
  ) +
  theme(legend.position = "none")
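The bar chart compares trip counts visually but does not test them. One possible check, sketched here under strong assumptions, is an exact binomial test of whether the female share of trips departs from 50%. The counts are hypothetical placeholders; in the real analysis they would come from something like `table(birds_clean$Sex)`. The test also treats trips as independent, which repeated visits by the same bird may violate.

```r
# Hypothetical trip counts, for illustration only (not the values in birds.csv)
trip_counts <- c(F = 30, M = 18)

# Exact binomial test: is the proportion of trips made by females different from 0.5?
bt <- binom.test(trip_counts[["F"]], sum(trip_counts), p = 0.5)

bt$estimate   # observed female share of trips
bt$p.value    # two-sided p-value
```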
Does spending more time foraging result in a larger food reward? We visualize this relationship using a scatterplot with best-fit lines.
# Chart 2: Scatterplot with Faceting
ggplot(birds_clean, aes(x = TripSec, y = Loadmm, color = Sex)) +
  geom_point(alpha = 0.6, size = 3) +
  geom_smooth(method = "lm", se = FALSE, linetype = "dashed") +
  facet_wrap(~Sex) +  # Faceting separates the trends
  scale_color_manual(values = c("F" = "#E76F51", "M" = "#264653")) +
  labs(
    title = "Foraging Efficiency: Time vs. Reward",
    subtitle = "Flat trend lines suggest load size is independent of trip duration",
    x = "Trip Duration (Seconds)",
    y = "Prey Load Size (mm)"
  ) +
  theme_light()
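The fitted lines in the scatterplot can be examined numerically by fitting the same linear model with a sex interaction and reading off the slope estimates. This is a minimal sketch on simulated data (the `toy` data frame is invented and load size is generated independently of trip duration), not a result from the bird dataset.

```r
set.seed(1)

# Simulated stand-in for birds_clean: load size drawn independently of trip duration
toy <- data.frame(
  Sex     = rep(c("F", "M"), each = 20),
  TripSec = runif(40, min = 60, max = 600)
)
toy$Loadmm <- rnorm(40, mean = 65, sd = 10)

# TripSec slope for the reference sex, plus how much the slope differs for the other sex
fit   <- lm(Loadmm ~ TripSec * Sex, data = toy)
coefs <- coef(summary(fit))
coefs
```

The `TripSec` row gives the slope for females (the reference level) and `TripSec:SexM` the difference in slope for males; slopes near zero with wide standard errors would be consistent with the flat lines in the chart.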
Visualizations suggest differences, but are they statistically significant? Here we apply inferential statistics to validate our observations.
Biological data is often noisy. We check for outliers in TripSec that might skew our mean calculations.
# Calculate IQR boundaries (pooled across both sexes)
trip_stats <- birds_clean %>%
  summarize(
    Q1 = quantile(TripSec, 0.25),
    Q3 = quantile(TripSec, 0.75),
    IQR = IQR(TripSec)
  )
lower_bound <- trip_stats$Q1 - 1.5 * trip_stats$IQR
upper_bound <- trip_stats$Q3 + 1.5 * trip_stats$IQR
# Identify outliers
outliers <- birds_clean %>%
  filter(TripSec < lower_bound | TripSec > upper_bound)
cat("Number of Outliers identified in Trip Duration:", nrow(outliers))
## Number of Outliers identified in Trip Duration: 0
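The check above pools both sexes into one set of fences, whereas a box plot grouped by sex draws its whiskers within each group, so the two can disagree. The sketch below applies the same 1.5 x IQR rule per sex on invented numbers; `toy` and `is_outlier()` are hypothetical names, not part of the original analysis.

```r
# Invented trip durations: one extreme female trip, tightly clustered male trips
toy <- data.frame(
  Sex     = rep(c("F", "M"), each = 6),
  TripSec = c(100, 120, 130, 140, 150, 900,
              200, 210, 220, 230, 240, 250)
)

# TRUE for values outside Q1 - 1.5*IQR or Q3 + 1.5*IQR of the vector passed in
is_outlier <- function(x) {
  q     <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# ave() applies the rule within each sex and returns results in the original row order
toy$outlier <- as.logical(ave(toy$TripSec, toy$Sex, FUN = is_outlier))
toy[toy$outlier, ]
```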
Research Question: Does mean trip duration differ between females and males? (The test below is two-sided, so it detects a difference in either direction.)
# We use a Welch Two Sample t-test (robust to unequal variances)
t_test_result <- t.test(TripSec ~ Sex, data = birds_clean)
# Print the tidy result
t_test_result %>% tidy() %>% kable()
| estimate | estimate1 | estimate2 | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|---|---|
| 70.51748 | 298.3636 | 227.8462 | 0.9709189 | 0.3421466 | 21.99265 | -80.11008 | 221.145 | Welch Two Sample t-test | two.sided |
# Visualization of the test
ggplot(birds_clean, aes(x = Sex, y = TripSec, fill = Sex)) +
  geom_boxplot(alpha = 0.5, outlier.color = "red", outlier.shape = 1) +
  geom_jitter(width = 0.1, alpha = 0.5) +  # Show actual data points
  labs(
    title = "Comparison of Trip Duration by Parent Sex",
    subtitle = paste("Welch t-test p-value =", format.pval(t_test_result$p.value, digits = 3)),
    caption = "Red circles indicate outliers beyond 1.5x IQR within each sex",
    y = "Duration (Seconds)"
  ) +
  scale_fill_manual(values = c("F" = "#E76F51", "M" = "#264653"))
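Interpretation: The Welch test gives t = 0.97 on about 22 degrees of freedom, with p = 0.342. Because p > 0.05, we fail to reject the null hypothesis: the 70.5-second difference in mean trip duration (298.4 s for females vs. 227.8 s for males) is not statistically significant, and the 95% confidence interval (-80.1 to 221.1 s) includes zero.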
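To make the test statistic and its fractional degrees of freedom less of a black box, the sketch below recomputes a Welch test by hand and compares it with `t.test()`. The data are simulated and the helper `welch_manual()` is a hypothetical name; the formulas are the standard Welch statistic and the Welch-Satterthwaite approximation for the degrees of freedom.

```r
# Welch t-test computed from first principles
welch_manual <- function(x, y) {
  vx <- var(x) / length(x)                  # squared standard error of mean(x)
  vy <- var(y) / length(y)                  # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite degrees of freedom (generally not an integer)
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p  <- 2 * pt(-abs(t_stat), df)            # two-sided p-value
  c(statistic = t_stat, df = df, p.value = p)
}

set.seed(42)
x <- rnorm(12, mean = 300, sd = 150)  # simulated durations, group 1
y <- rnorm(13, mean = 230, sd = 200)  # simulated durations, group 2

manual  <- welch_manual(x, y)
builtin <- t.test(x, y)               # Welch is the default (var.equal = FALSE)

manual
```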
To understand the “structure” of parental effort, we use PCA to reduce our dimensions (TripSec, AtBoxSec, Loadmm) into principal components. This helps us see if certain behaviors cluster together.
# 1. Prepare data (keep Sex alongside the numeric columns, then drop incomplete rows)
pca_data <- birds_clean %>%
  select(Sex, TripSec, AtBoxSec, Loadmm) %>%
  drop_na()
# 2. Run PCA on the numeric columns (scale. = TRUE standardizes the units)
pca_res <- prcomp(select(pca_data, -Sex), scale. = TRUE)
var_pct <- round(100 * pca_res$sdev^2 / sum(pca_res$sdev^2), 1)
# 3. Plot the PC scores, colored by Sex
# Sex is taken from pca_data so it stays aligned with the rows that survived drop_na()
pca_plot_data <- as.data.frame(pca_res$x)
pca_plot_data$Sex <- pca_data$Sex
ggplot(pca_plot_data, aes(x = PC1, y = PC2, color = Sex)) +
  geom_point(size = 3, alpha = 0.8) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray") +
  geom_vline(xintercept = 0, linetype = "dashed", color = "gray") +
  stat_ellipse(level = 0.95) +  # 95% data ellipses (multivariate t), one per sex
  scale_color_manual(values = c("F" = "#E76F51", "M" = "#264653")) +
  labs(
    title = "PCA of Parental Effort",
    subtitle = "Clustering shows distinct behavioral profiles for Males vs. Females",
    x = paste0("PC1 (", var_pct[1], "% of variance)"),
    y = paste0("PC2 (", var_pct[2], "% of variance)")
  )
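Whether behaviors "cluster together" is read from the loadings, the weights each original variable carries on each component, rather than from the score plot alone. The sketch below inspects them on simulated data in which time at the box is deliberately tied to trip duration; the `toy` data frame and its coefficients are assumptions made for illustration.

```r
set.seed(7)
n    <- 50
trip <- rnorm(n, mean = 250, sd = 80)

# Simulated measurements: AtBoxSec tracks TripSec, Loadmm is unrelated noise
toy <- data.frame(
  TripSec  = trip,
  AtBoxSec = 0.5 * trip + rnorm(n, mean = 0, sd = 20),
  Loadmm   = rnorm(n, mean = 65, sd = 10)
)

pca <- prcomp(toy, scale. = TRUE)

# Loadings: rows are variables, columns are components
rot <- pca$rotation
round(rot, 2)
```

Variables with large loadings of the same sign on one component vary together; in this simulation `TripSec` and `AtBoxSec` should load together on PC1 because they were constructed to be correlated.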
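Through visual and statistical analysis, we observed that the IQR check flagged no outliers in trip duration, and that the difference in mean trip duration between females (298.4 s) and males (227.8 s) was not statistically significant (Welch t-test, p = 0.342).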
This workflow demonstrates the use of Data Cleaning (dplyr), Visualization (ggplot2), and Statistical Inference (stats) to derive biological insights from raw observational data.