1. Introduction & Data Preparation

This analysis explores the parental behavior of nesting birds, focusing on differences between males and females in foraging strategy. The dataset records, for each nest visit, the trip duration, the time spent at the nest box, and the size of the prey load delivered to nestlings.

Objective: To determine if there is a statistically significant division of labor between male and female parents.

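# Load packages: tidyverse (readr, dplyr, stringr, ggplot2), knitr (kable), broom (tidy)
library(tidyverse)
library(knitr)
library(broom)

# Load the dataset
birds <- read_csv("birds.csv")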

# Clean and format data for analysis
# 1. Keep parents only (exclude 'S', likely intruders/starlings)
# 2. Convert measurement columns to numeric (unparseable text becomes NA)
# 3. Store Sex as a factor with a fixed level order
birds_clean <- birds %>%
  filter(Sex %in% c("M", "F")) %>%
  mutate(
    Sex = factor(Sex, levels = c("F", "M")),
    TripSec = as.numeric(TripSec),
    Loadmm = as.numeric(Loadmm),
    AtBoxSec = as.numeric(AtBoxSec),
    PreyType = str_to_title(PreyType) # Standardize text
  ) %>%
  drop_na(TripSec) # Remove rows where primary variable is missing

# Display a preview of the clean data
head(birds_clean) %>% kable()

Box	Year	Date	File	Time	Sex	Combo	Band	BreedID	Scorer	TapeType	Experiment	BroodSize	Nage	Visit	FileLength	FileStart	TimeON	TimeIn	TimeOut	TimeOFF	0:48:20	TripTime	TripSec	LogTripSec	AtBox	AtBoxSec	LogAtBox	InBox	InBoxSec	LogInBox	LoadSize	Loadmm	LogLoad	NumItems	PreyType	Fecalsac	Notes
C2	2013	4/28/2013	1	08:37:00	F	NA	234101182	4072	JL	Normal	Control	3	1	2	00:58:46	00:00:00	00:19:08	00:19:10	00:25:36	00:25:39	NA	00:09:02	542	2.73	00:06:31	391	2.59	00:06:26	386	2.59	NA	NA	NA	NA	Too Fast	NA	NA
C2	2013	4/28/2013	1	08:37:00	F	NA	234101182	4072	JL	Normal	Control	3	1	3	00:58:46	00:00:00	00:31:26	00:31:33	00:34:35	00:34:40	NA	00:05:47	347	2.54	00:03:14	194	2.29	00:03:02	182	2.26	0.3	65.4	1.8	1	Bit	NA	NA
C2	2013	4/28/2013	1	08:37:00	F	NA	234101182	4072	JL	Normal	Control	3	1	4	00:58:46	00:00:00	00:38:10	00:38:20	00:44:35	00:44:36	NA	00:03:30	210	2.32	00:06:26	386	2.59	00:06:15	375	2.57	NA	NA	NA	NA	Inside The Bill	NA	NA
C2	2013	4/28/2013	1	08:37:00	F	NA	234101182	4072	JL	Normal	Control	3	1	5	00:58:46	00:00:00	00:47:40	00:47:47	00:48:20	00:48:27	NA	00:03:04	184	2.26	00:00:47	47	1.67	00:00:33	33	1.52	NA	NA	NA	NA	NA	NA	NA
C2	2013	4/28/2013	1	08:37:00	F	NA	234101182	4072	JL	Normal	Control	3	1	6	00:58:46	00:00:00	00:51:30	00:51:31	00:52:50	00:53:00	NA	00:03:03	183	2.26	00:01:30	90	1.95	00:01:19	79	1.90	0.3	65.4	1.8	1	Bit	NA	NA
C2	2013	4/28/2013	1	08:37:00	F	NA	234101182	4072	JL	Normal	Control	3	1	7	00:58:46	00:00:00	00:57:12	00:57:30	00:10:17	00:10:19	O	00:04:08	248	2.39	00:11:53	713	2.85	00:11:33	693	2.84	NA	NA	NA	NA	Inside The Bill	NA	NA
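
Because as.numeric() silently turns unparseable text into NA (with only a warning), it can help to count how many entries are lost in that step. The following is a minimal sketch on a made-up column; the vector raw_load and its values are hypothetical and do not come from birds.csv.

```r
# Hypothetical raw text column standing in for a field such as Loadmm
raw_load <- c("65.4", "too fast", "unknown", "12.0", NA)

# Convert quietly, then compare missingness before and after
converted <- suppressWarnings(as.numeric(raw_load))
newly_missing <- is.na(converted) & !is.na(raw_load)

# Entries that held text in the raw data but could not be parsed as numbers
n_lost <- sum(newly_missing)
print(unique(raw_load[newly_missing]))
cat("Entries lost to coercion:", n_lost, "\n")
```

Applied to the real data, the same comparison would be made between a column of birds and the matching column of birds_clean, assuming the raw column was read as text.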

2. Visual Exploratory Analysis (Assignment Requirements)

Before applying statistical tests, we visualize the distributions to understand the “shape” of our data.

A. The Division of Labor (Simple Visualization)

We begin by comparing the raw volume of work. Who performs more foraging trips?

# Chart 1: Bar Chart (High on Cleveland Spectrum - Position)
ggplot(birds_clean, aes(x = Sex, fill = Sex)) +
  geom_bar(width = 0.6) +
  scale_fill_manual(values = c("F" = "#E76F51", "M" = "#264653")) +
  labs(
    title = "Total Foraging Trips by Parent",
    subtitle = "Number of recorded trips per sex in the cleaned data",
    y = "Number of Trips",
    x = "Parent Sex"
  ) +
  theme(legend.position = "none")
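
A bar chart shows the counts but cannot say whether they differ by more than chance would allow. One possible check, sketched here under assumptions, is an exact binomial test of whether one sex accounts for half of all trips. The counts in trip_counts are hypothetical, and the test treats trips as independent, which repeated visits by the same bird may violate.

```r
# Hypothetical trip counts per sex (illustrative values only, not from birds.csv)
trip_counts <- c(F = 18, M = 9)

# Exact binomial test: does the female share of trips differ from 50%?
count_test <- binom.test(trip_counts[["F"]], sum(trip_counts), p = 0.5)

female_share <- unname(count_test$estimate)
cat("Female share of trips:", round(female_share, 3), "\n")
cat("p-value:", format.pval(count_test$p.value, digits = 3), "\n")
```

With the real data, the counts would come from table(birds_clean$Sex).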

B. Efficiency Analysis (Complex Visualization)

Does spending more time foraging result in a larger food reward? We visualize this relationship using a scatterplot with best-fit lines.

# Chart 2: Scatterplot with Faceting
ggplot(birds_clean, aes(x = TripSec, y = Loadmm, color = Sex)) +
  geom_point(alpha = 0.6, size = 3) +
  geom_smooth(method = "lm", se = FALSE, linetype = "dashed") +
  facet_wrap(~Sex) + # Faceting separates the trends
  scale_color_manual(values = c("F" = "#E76F51", "M" = "#264653")) +
  labs(
    title = "Foraging Efficiency: Time vs. Reward",
    subtitle = "Dashed lines are per-sex linear fits; a flat line suggests load size does not depend on trip duration",
    x = "Trip Duration (Seconds)",
    y = "Prey Load Size (mm)"
  ) +
  theme_light()
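
The slope of each fitted line can also be read off numerically rather than judged by eye. The sketch below fits a separate linear model per sex and extracts the slope and its p-value; the data frame toy and all of its values are hypothetical.

```r
# Hypothetical toy data (illustrative values only)
toy <- data.frame(
  Sex     = rep(c("F", "M"), each = 6),
  TripSec = c(120, 180, 240, 300, 360, 420, 100, 160, 220, 280, 340, 400),
  Loadmm  = c(60, 66, 58, 64, 61, 65, 55, 57, 60, 54, 59, 56)
)

# Fit Loadmm ~ TripSec within each sex and keep the slope and its p-value
slope_by_sex <- lapply(split(toy, toy$Sex), function(d) {
  fit <- lm(Loadmm ~ TripSec, data = d)
  coef(summary(fit))["TripSec", c("Estimate", "Pr(>|t|)")]
})

slope_table <- do.call(rbind, slope_by_sex)
print(slope_table)
```

A slope near zero with a large p-value would be consistent with a flat trend line; rows with a missing Loadmm would need to be dropped first when using birds_clean.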

3. Statistical Inference (Portfolio Analysis)

Visualizations suggest differences, but are they statistically significant? Here we apply inferential statistics to validate our observations.

A. Outlier Analysis (IQR Method)

Biological data is often noisy. We check for outliers in TripSec that might skew our mean calculations.

# Calculate IQR boundaries
trip_stats <- birds_clean %>%
  summarize(
    Q1 = quantile(TripSec, 0.25),
    Q3 = quantile(TripSec, 0.75),
    IQR = IQR(TripSec)
  )

lower_bound <- trip_stats$Q1 - 1.5 * trip_stats$IQR
upper_bound <- trip_stats$Q3 + 1.5 * trip_stats$IQR

# Identify Outliers
outliers <- birds_clean %>%
  filter(TripSec < lower_bound | TripSec > upper_bound)

cat("Number of Outliers identified in Trip Duration:", nrow(outliers), "\n")

## Number of Outliers identified in Trip Duration: 0
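
The bounds above are computed on the pooled data, whereas a boxplot split by sex applies the 1.5 x IQR rule within each group, so the two can flag different points. A minimal sketch of the per-group version follows; the helper is_iqr_outlier and the toy values are hypothetical.

```r
# Flag values outside Q1 - 1.5*IQR or Q3 + 1.5*IQR (Tukey's rule)
is_iqr_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[[2]] - q[[1]])
  x < q[[1]] - fence | x > q[[2]] + fence
}

# Hypothetical toy durations in seconds (illustrative values only)
toy <- data.frame(
  Sex     = rep(c("F", "M"), each = 5),
  TripSec = c(180, 200, 210, 230, 900, 150, 160, 170, 180, 190)
)

# ave() applies the rule within each sex and returns flags in row order
toy$Outlier <- as.logical(ave(toy$TripSec, toy$Sex, FUN = is_iqr_outlier))
print(toy[toy$Outlier, ])
```

In this toy example only the 900-second female trip is flagged.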

B. Hypothesis Testing: The T-Test

Research Question: Does mean trip duration differ between females and males?

H0 (Null Hypothesis): True difference in means is equal to 0.
H1 (Alternative Hypothesis): True difference in means is not equal to 0.

# We use a Welch Two Sample t-test (robust to unequal variances)
t_test_result <- t.test(TripSec ~ Sex, data = birds_clean)

# Print the tidy result
t_test_result %>% tidy() %>% kable()

estimate	estimate1	estimate2	statistic	p.value	parameter	conf.low	conf.high	method	alternative
70.51748	298.3636	227.8462	0.9709189	0.3421466	21.99265	-80.11008	221.145	Welch Two Sample t-test	two.sided

# Visualization of the Test
ggplot(birds_clean, aes(x = Sex, y = TripSec, fill = Sex)) +
  geom_boxplot(alpha = 0.5, outlier.color = "red", outlier.shape = 1) +
  geom_jitter(width = 0.1, alpha = 0.5) + # Show actual data points
  labs(
    title = "Trip Duration by Parent Sex",
    subtitle = paste("p-value =", format.pval(t_test_result$p.value, digits = 3)),
    caption = "Red circles indicate outliers beyond 1.5x IQR",
    y = "Duration (Seconds)"
  ) +
  scale_fill_manual(values = c("F" = "#E76F51", "M" = "#264653"))

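Interpretation: The Welch t-test gives p = 0.342 (t = 0.97, df = 21.99), which is above 0.05, so we fail to reject the null hypothesis. Females averaged 298.4 seconds per trip and males 227.8 seconds, but the 95% confidence interval for the difference (-80.1 to 221.1 seconds) includes 0, so this sample does not show a significant difference in mean trip duration.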
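
To make the test less of a black box, the Welch statistic, its Welch-Satterthwaite degrees of freedom, and the two-sided p-value can be reproduced by hand and compared with t.test(). The vectors f and m below are hypothetical durations, not values from birds.csv.

```r
# Hypothetical toy durations in seconds (illustrative values only)
f <- c(310, 280, 450, 220, 390, 260)
m <- c(150, 320, 275, 190, 260)

# Welch statistic: difference in means over the unpooled standard error
vf <- var(f) / length(f)
vm <- var(m) / length(m)
t_stat <- (mean(f) - mean(m)) / sqrt(vf + vm)

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (vf + vm)^2 / (vf^2 / (length(f) - 1) + vm^2 / (length(m) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# Cross-check against the built-in Welch test
builtin <- t.test(f, m)
cat("by hand:  t =", t_stat, " df =", df, " p =", p_value, "\n")
cat("t.test(): t =", builtin$statistic, " df =", builtin$parameter,
    " p =", builtin$p.value, "\n")
```

The two printed lines should agree, which shows that the unequal-variance correction enters only through the standard error and the degrees of freedom.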

C. Principal Component Analysis (PCA)

To understand the “structure” of parental effort, we use PCA to reduce our dimensions (TripSec, AtBoxSec, Loadmm) into principal components. This helps us see if certain behaviors cluster together.

# 1. Prepare Data (Keep Sex alongside the numeric columns, remove NAs)
pca_data <- birds_clean %>%
  select(Sex, TripSec, AtBoxSec, Loadmm) %>%
  drop_na()

# 2. Run PCA on the numeric columns only (scale. = TRUE standardizes units)
pca_res <- prcomp(select(pca_data, -Sex), scale. = TRUE)

# 3. Create Score Plot
# Extract PC scores and add Sex back from the same filtered rows,
# so each score stays aligned with the correct parent
pca_plot_data <- as.data.frame(pca_res$x)
pca_plot_data$Sex <- pca_data$Sex

ggplot(pca_plot_data, aes(x = PC1, y = PC2, color = Sex)) +
  geom_point(size = 3, alpha = 0.8) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray") +
  geom_vline(xintercept = 0, linetype = "dashed", color = "gray") +
  stat_ellipse(level = 0.95) + # Adds a 95% data ellipse for each sex
  scale_color_manual(values = c("F" = "#E76F51", "M" = "#264653")) +
  labs(
    title = "PCA of Parental Effort",
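    subtitle = "Separated ellipses would indicate distinct behavioral profiles for males vs. females",
    x = paste0("PC1 (", round(100 * summary(pca_res)$importance[2, 1], 1), "% of variance)"),
    y = paste0("PC2 (", round(100 * summary(pca_res)$importance[2, 2], 1), "% of variance)")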
  )
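
Two quantities help in reading a PCA plot: the share of variance each component explains, and the loadings that show how the original variables combine into each component. The sketch below computes both from a prcomp() result; the data frame toy and its values are hypothetical.

```r
# Hypothetical toy measurements (illustrative values only)
toy <- data.frame(
  TripSec  = c(120, 180, 240, 300, 360, 420, 200, 260),
  AtBoxSec = c(40, 55, 90, 60, 130, 110, 75, 95),
  Loadmm   = c(60, 66, 58, 64, 61, 65, 55, 57)
)

pca_toy <- prcomp(toy, scale. = TRUE)

# Share of total variance captured by each component
var_explained <- pca_toy$sdev^2 / sum(pca_toy$sdev^2)
names(var_explained) <- colnames(pca_toy$x)
print(round(var_explained, 3))

# Loadings: contribution of each original variable to each component
print(round(pca_toy$rotation, 3))
```

If PC1 and PC2 together explain only a modest share of the variance, apparent clusters in the score plot should be interpreted with caution.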

4. Conclusion

Through visual and statistical analysis, we observed that:

Workload: Chart 1 shows the number of recorded trips per sex; whether that difference in trip frequency is statistically significant was not tested.
Strategy: Trip duration does not appear strongly related to load size (Chart 2), which is consistent with opportunistic foraging.
Statistical Significance: The Welch t-test found no significant difference in mean trip duration between females and males (p = 0.342).

This workflow demonstrates the use of Data Cleaning (dplyr), Visualization (ggplot2), and Statistical Inference (stats) to derive biological insights from raw observational data.

Parental Investment Strategies in Avian Care

A Statistical Analysis of Foraging Behaviors

Jimmy Lin

2026-02-13