In Golem Grad, an island in North Macedonia, male Hermann’s tortoises outnumber females by a ratio of 19:1, leaving the population in a precarious position. In this population, male tortoises are sexually coercive, or aggressive. This aggression puts the females at risk for fatal falls from the island’s plateau where the tortoises live. Males will pursue females in “copulatory trains” where multiples males pursue one female at a single time. A count known as BCI (Body Condition Index) is a measurement to compare the quality the tortoise is in physically. These observations were taken for each tortoise seen through the years in three different locations and in either spring or summer. The variable known as “recode” means that they were seen that season so 1 would be the first year 2 the second and so on. So what is driving these tortoises to be at such a skewed gender ratio, what is allowing certain females to survive in comparison to females that don’t. With this dataset we want to see what the effects of factors such as sex, season, and locality play on the BCI of Hermann’s Tortoises.
Data sets were already cleaned so no cleaning need to be done
tortoise_body_condition_cleaned <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-03-03/tortoise_body_condition_cleaned.csv')
## Rows: 10174 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): individual, season, locality, sex
## dbl (5): year, year_recode, body_mass_grams, body_condition_index, straight_...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Lets confirm our factors and variables
# Make sure variables are correct types
tortoise_body_condition_cleaned$sex <- as.factor(tortoise_body_condition_cleaned$sex)
tortoise_body_condition_cleaned$season <- as.factor(tortoise_body_condition_cleaned$season)
tortoise_body_condition_cleaned$locality <- as.factor(tortoise_body_condition_cleaned$locality)
tortoise_body_condition_cleaned$year_recode <- as.numeric(tortoise_body_condition_cleaned$year_recode)
What is the relationship to body condition index based on sex. Does the location and/or season also play an effect on body condition?
This groups by max recode year
last_seen <- tortoise_body_condition_cleaned %>%
group_by(individual) %>%
filter(year_recode == max(year_recode, na.rm = TRUE)) %>%
slice(1) %>%
ungroup()
last_seen
## # A tibble: 2,139 × 9
## individual year year_recode season locality sex body_mass_grams
## <chr> <dbl> <dbl> <fct> <fct> <fct> <dbl>
## 1 1 2020 13 Summer Plateau m 1194
## 2 10 2010 3 Spring Beach f 982
## 3 100 2017 10 Spring Plateau m 1063
## 4 1000 2009 2 Spring Plateau m 924
## 5 1001 2021 14 Summer Plateau m 1268
## 6 1003 2010 3 Spring Plateau m 898
## 7 1004 2009 2 Spring Plateau m 1391
## 8 1005 2022 15 Summer Plateau m 1025
## 9 1006 2018 11 Summer Plateau m 1186
## 10 1007 2023 16 Spring Plateau m 1135
## # ℹ 2,129 more rows
## # ℹ 2 more variables: body_condition_index <dbl>,
## # straight_carapace_length_mm <dbl>
Now Grapical interpretation of the dataset
#Sex x BCI
ggplot(last_seen, aes(x = sex, y = body_condition_index, fill = sex)) +
geom_boxplot(alpha = 0.7) +
theme_pubr() +
scale_fill_manual(values = c("m" = "#3384ea", "f" = "#e7298a")) +
theme(legend.position = "none") +
labs(
title = "Body Condition Index by Sex",
x = "Sex",
y = "BCI" )
#location x BCI?
ggplot(last_seen, aes(x = locality, y = body_condition_index, fill = sex)) + geom_boxplot(alpha = 0.7) +
theme_pubr() +
scale_fill_manual(values = c("m" = "#3384ea", "f" = "#e7298a")) +
labs(
title = "Body Condition Index by Location and Sex",
x = "Location",
y = "Body Condition Index",
fill = "sex"
)
##INTERACTION
ggplot(last_seen, aes(x = sex, y = body_condition_index, fill = locality)) +
geom_boxplot(alpha = 0.7) +
theme_pubr() +
labs(
title = "Interaction of Sex and Location on BCI",
x = "Sex",
y = "Body Condition Index",
fill = "locality"
)
##By Season
ggplot(last_seen, aes(x = season, y = body_condition_index, fill = sex)) +
geom_boxplot(alpha = 0.7) +
theme_pubr() +
scale_fill_manual(values = c("m" = "#3384ea", "f" = "#e7298a")) +
labs(
title = "BCI by Season and Sex",
x = "Season",
y = "Body Condition Index"
)
#all
ggplot(last_seen, aes(x = sex, y = body_condition_index, fill = sex)) +
geom_boxplot(alpha = 0.7) +
facet_grid(season ~ locality) +
theme_pubr() +
scale_fill_manual(values = c("m" = "#3384ea", "f" = "#e7298a")) +
theme(legend.position = "none") +
labs(
title = "Body Condition Index by Sex Across Season and Location",
x = "Sex",
y = "Body Condition Index"
)
Check data for Normailty
## Base model (untransformed)
tortoise_model <- lm(body_condition_index ~ sex * season, data = last_seen)
# Diagnostics for base model
hist(resid(tortoise_model))
ggqqplot(resid(tortoise_model))
shapiro.test(resid(tortoise_model))
##
## Shapiro-Wilk normality test
##
## data: resid(tortoise_model)
## W = 0.99528, p-value = 2.684e-06
leveneTest(body_condition_index ~ sex * season, data = last_seen)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 94.15 < 2.2e-16 ***
## 2135
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log Transform Done for Better Normailty
# Log-transformed model
last_seen$log_bci <- log(last_seen$body_condition_index)
tortoise_model_log <- lm(log_bci ~ sex * season, data = last_seen)
# nomality for log model
hist(resid(tortoise_model_log))
ggqqplot(resid(tortoise_model_log))
shapiro.test(resid(tortoise_model_log))
##
## Shapiro-Wilk normality test
##
## data: resid(tortoise_model_log)
## W = 0.99487, p-value = 9.672e-07
leveneTest(log_bci ~ sex * season, data = last_seen)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 37.047 < 2.2e-16 ***
## 2135
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Two-Way ANOVA done on the effect of BCI based on sex depending on what season
# Final ANOVA (use log version)
tortoise_anova <- aov(log_bci ~ sex * season, data = last_seen)
summary(tortoise_anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## sex 1 20.43 20.434 987.915 < 2e-16 ***
## season 1 0.17 0.175 8.441 0.00371 **
## sex:season 1 0.00 0.003 0.140 0.70835
## Residuals 2135 44.16 0.021
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#post-hoc
TukeyHSD(tortoise_anova)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = log_bci ~ sex * season, data = last_seen)
##
## $sex
## diff lwr upr p adj
## m-f -0.2134496 -0.2267674 -0.2001319 0
##
## $season
## diff lwr upr p adj
## Summer-Spring 0.01820878 0.005917457 0.03050011 0.0037079
##
## $`sex:season`
## diff lwr upr p adj
## m:Spring-f:Spring -0.21585199 -0.2390639636 -0.19264001 0.0000000
## f:Summer-f:Spring 0.01461242 -0.0149008529 0.04412569 0.5803051
## m:Summer-f:Spring -0.19611337 -0.2202481413 -0.17197861 0.0000000
## f:Summer-m:Spring 0.23046441 0.2048019237 0.25612689 0.0000000
## m:Summer-m:Spring 0.01973861 0.0005027933 0.03897443 0.0417434
## m:Summer-f:Summer -0.21072579 -0.2372258696 -0.18422572 0.0000000
The variables sex and season are significant predictors of body condition index (BCI). (sex: F = 987.92, p < 0.001; season: F = 8.44, p < 0.01; sex × season: F = 0.14, p > 0.05)
Linear Mixed Effects model done to test wheter also season had any effect on the final Two-Way ANOVA
# Mixed Effects model
tortoise_lmm <- lmer(log_bci ~ sex * season + (1 | locality), data = last_seen)
summary(tortoise_lmm)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: log_bci ~ sex * season + (1 | locality)
## Data: last_seen
##
## REML criterion at convergence: -2528.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.0124 -0.6496 0.0424 0.6962 3.0495
##
## Random effects:
## Groups Name Variance Std.Dev.
## locality (Intercept) 0.01181 0.1087
## Residual 0.01761 0.1327
## Number of obs: 2139, groups: locality, 3
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 1.949e+00 6.323e-02 2.035e+00 30.824 0.000954 ***
## sexm -1.630e-01 9.293e-03 2.134e+03 -17.542 < 2e-16 ***
## seasonSummer 1.279e-03 1.062e-02 2.133e+03 0.121 0.904079
## sexm:seasonSummer 1.333e-02 1.265e-02 2.133e+03 1.054 0.292080
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) sexm ssnSmm
## sexm -0.097
## seasonSummr -0.068 0.470
## sxm:ssnSmmr 0.059 -0.579 -0.838
anova(tortoise_lmm)
## Type III Analysis of Variance Table with Satterthwaite's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## sex 7.3912 7.3912 1 2134.9 419.6091 <2e-16 ***
## season 0.0277 0.0277 1 2133.1 1.5705 0.2103
## sex:season 0.0196 0.0196 1 2133.0 1.1106 0.2921
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#post-hoc
emmeans(tortoise_lmm, ~ sex)
## NOTE: Results may be misleading due to involvement in interactions
## sex emmean SE df lower.CL upper.CL
## f 1.95 0.0631 2.02 1.68 2.22
## m 1.79 0.0629 2.01 1.52 2.06
##
## Results are averaged over the levels of: season
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
emmeans(tortoise_lmm, ~ season)
## NOTE: Results may be misleading due to involvement in interactions
## season emmean SE df lower.CL upper.CL
## Spring 1.87 0.063 2.01 1.60 2.14
## Summer 1.88 0.063 2.01 1.61 2.14
##
## Results are averaged over the levels of: sex
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
The variables sex and season are significant predictors of body condition index (BCI). (sex: F = 419.61, p < 0.001; season: F = 1.57, p > 0.05; sex × season: F = 1.11, p > 0.05)
The results show a pattern across both the ANOVA and the linear mixed-effects model: sex is the dominant predictor of body condition index (BCI), with females having higher body condition than males. This difference remains strong even after accounting for seasonal variation and localities, suggesting that it reflects a robust biological pattern rather than a sampling artifact (Rugiero & Luiselli, 2006).
One possible explanation for higher BCI in females is survival-based selection. Females with lower body condition may be less likely to survive long enough to be detected in the population, meaning that the observed sample is biased toward higher-quality individuals (Sibeaux et al., 2016). This could create an apparent pattern where only the most robust females are present in the dataset, inflating the average female body condition relative to males. In addition, females may invest differently in energy allocation, prioritising storage of reserves needed for reproduction, such as egg production, which could contribute to consistently higher body condition values compared to males (Sibeaux et al., 2016).
Seasonal effects were weak in the two-way ANOVA and absent in the Linear Mixed Effect model, suggesting that short-term environmental variation between spring and summer in the ANOVA could just be the variation of data based on which of the three locations the data was recorded from (Rugiero & Luiselli, 2006). This may indicate that Hermann’s tortoise maintain relatively stable energy reserves across seasons, or that any seasonal changes are too small to detect given the variability among individuals.
In summary, this study has shown that the body condition in Hermann’s tortoises is primarily influenced by the biological sex of the organism. The findings from this research correlate well with ecological predictions, whereby female individuals may face different pressures concerning energy and survival. One possible explanation is that differences in energy allocation and reproductive investment lead to higher body condition in females compared to males. That these better body conditions could also reinforce the survival-based processes where only higher-quality females persist in the population.
Tidy tuesday data: https://github.com/rfordatascience/tidytuesday/blob/main/data/2026/2026-03-03/intro.md
background info and photo from nyt: https://www.nytimes.com/2026/02/14/science/tortoises-island-sex-cliff.html
Sources Sibeaux, A. et al. (2016), Sex-specific ecophysiological responses to environmental fluctuations of free-ranging Hermann’s tortoises https://pmc.ncbi.nlm.nih.gov/articles/PMC5142051/
Rugiero, L. & Luiselli, L. (2006), Ecological modelling of habitat use and the annual activity patterns in an urban population of the tortoise, Testudo hermanni https://www.tandfonline.com/doi/full/10.1080/11250000600700086