Part 1 - Introduction

Tortoises on Golem Grad
Tortoises on Golem Grad

In Golem Grad, an island in North Macedonia, male Hermann’s tortoises outnumber females by a ratio of 19:1, leaving the population in a precarious position. In this population, male tortoises are sexually coercive, or aggressive. This aggression puts the females at risk for fatal falls from the island’s plateau where the tortoises live. Males will pursue females in “copulatory trains” where multiples males pursue one female at a single time. A count known as BCI (Body Condition Index) is a measurement to compare the quality the tortoise is in physically. These observations were taken for each tortoise seen through the years in three different locations and in either spring or summer. The variable known as “recode” means that they were seen that season so 1 would be the first year 2 the second and so on. So what is driving these tortoises to be at such a skewed gender ratio, what is allowing certain females to survive in comparison to females that don’t. With this dataset we want to see what the effects of factors such as sex, season, and locality play on the BCI of Hermann’s Tortoises.

Data sets were already cleaned so no cleaning need to be done
tortoise_body_condition_cleaned <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-03-03/tortoise_body_condition_cleaned.csv')
## Rows: 10174 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): individual, season, locality, sex
## dbl (5): year, year_recode, body_mass_grams, body_condition_index, straight_...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Lets confirm our factors and variables
# Make sure variables are correct types
tortoise_body_condition_cleaned$sex <- as.factor(tortoise_body_condition_cleaned$sex)
tortoise_body_condition_cleaned$season <- as.factor(tortoise_body_condition_cleaned$season)
tortoise_body_condition_cleaned$locality <- as.factor(tortoise_body_condition_cleaned$locality)

tortoise_body_condition_cleaned$year_recode <- as.numeric(tortoise_body_condition_cleaned$year_recode)

Part 2 - Main Research Question

What is the relationship to body condition index based on sex. Does the location and/or season also play an effect on body condition?

Part 3 - Exploring the Data (Descriptive Statistics)

This groups by max recode year 
last_seen <- tortoise_body_condition_cleaned %>%
  group_by(individual) %>%
  filter(year_recode == max(year_recode, na.rm = TRUE)) %>%
  slice(1) %>%  
  ungroup() 
last_seen
## # A tibble: 2,139 × 9
##    individual  year year_recode season locality sex   body_mass_grams
##    <chr>      <dbl>       <dbl> <fct>  <fct>    <fct>           <dbl>
##  1 1           2020          13 Summer Plateau  m                1194
##  2 10          2010           3 Spring Beach    f                 982
##  3 100         2017          10 Spring Plateau  m                1063
##  4 1000        2009           2 Spring Plateau  m                 924
##  5 1001        2021          14 Summer Plateau  m                1268
##  6 1003        2010           3 Spring Plateau  m                 898
##  7 1004        2009           2 Spring Plateau  m                1391
##  8 1005        2022          15 Summer Plateau  m                1025
##  9 1006        2018          11 Summer Plateau  m                1186
## 10 1007        2023          16 Spring Plateau  m                1135
## # ℹ 2,129 more rows
## # ℹ 2 more variables: body_condition_index <dbl>,
## #   straight_carapace_length_mm <dbl>
Now Grapical interpretation of the dataset
#Sex x BCI
ggplot(last_seen, aes(x = sex, y = body_condition_index, fill = sex)) +
  geom_boxplot(alpha = 0.7) +
  theme_pubr() +
  scale_fill_manual(values = c("m" = "#3384ea", "f" = "#e7298a")) +
  theme(legend.position = "none") +
  labs(
    title = "Body Condition Index by Sex",
    x = "Sex",
    y = "BCI" )

#location x BCI?
ggplot(last_seen, aes(x = locality, y = body_condition_index, fill = sex)) + geom_boxplot(alpha = 0.7) +
  theme_pubr() +
  scale_fill_manual(values = c("m" = "#3384ea", "f" = "#e7298a")) +
  labs(
    title = "Body Condition Index by Location and Sex",
    x = "Location",
    y = "Body Condition Index",
    fill = "sex"
  )

##INTERACTION
ggplot(last_seen, aes(x = sex, y = body_condition_index, fill = locality)) +
  geom_boxplot(alpha = 0.7) +
  theme_pubr() +
  labs(
    title = "Interaction of Sex and Location on BCI",
    x = "Sex",
    y = "Body Condition Index",
    fill = "locality"
  )

##By Season
ggplot(last_seen, aes(x = season, y = body_condition_index, fill = sex)) +
  geom_boxplot(alpha = 0.7) +
  theme_pubr() +
  scale_fill_manual(values = c("m" = "#3384ea", "f" = "#e7298a")) +
  labs(
    title = "BCI by Season and Sex",
    x = "Season",
    y = "Body Condition Index"
  )

#all
ggplot(last_seen, aes(x = sex, y = body_condition_index, fill = sex)) +
  geom_boxplot(alpha = 0.7) +
  facet_grid(season ~ locality) +
  theme_pubr() +
  scale_fill_manual(values = c("m" = "#3384ea", "f" = "#e7298a")) +
  theme(legend.position = "none") +
  labs(
    title = "Body Condition Index by Sex Across Season and Location",
    x = "Sex",
    y = "Body Condition Index"
  )

Check data for Normailty
## Base model (untransformed)
tortoise_model <- lm(body_condition_index ~ sex * season, data = last_seen)

# Diagnostics for base model
hist(resid(tortoise_model))

ggqqplot(resid(tortoise_model))

shapiro.test(resid(tortoise_model))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(tortoise_model)
## W = 0.99528, p-value = 2.684e-06
leveneTest(body_condition_index ~ sex * season, data = last_seen)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    3   94.15 < 2.2e-16 ***
##       2135                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log Transform Done for Better Normailty
# Log-transformed model
last_seen$log_bci <- log(last_seen$body_condition_index)

tortoise_model_log <- lm(log_bci ~ sex * season, data = last_seen)

# nomality for log model
hist(resid(tortoise_model_log))

ggqqplot(resid(tortoise_model_log))

shapiro.test(resid(tortoise_model_log))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(tortoise_model_log)
## W = 0.99487, p-value = 9.672e-07
leveneTest(log_bci ~ sex * season, data = last_seen)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    3  37.047 < 2.2e-16 ***
##       2135                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Two-Way ANOVA done on the effect of BCI based on sex depending on what season
# Final ANOVA (use log version)
tortoise_anova <- aov(log_bci ~ sex * season, data = last_seen)
summary(tortoise_anova)
##               Df Sum Sq Mean Sq F value  Pr(>F)    
## sex            1  20.43  20.434 987.915 < 2e-16 ***
## season         1   0.17   0.175   8.441 0.00371 ** 
## sex:season     1   0.00   0.003   0.140 0.70835    
## Residuals   2135  44.16   0.021                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#post-hoc 
TukeyHSD(tortoise_anova)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = log_bci ~ sex * season, data = last_seen)
## 
## $sex
##           diff        lwr        upr p adj
## m-f -0.2134496 -0.2267674 -0.2001319     0
## 
## $season
##                     diff         lwr        upr     p adj
## Summer-Spring 0.01820878 0.005917457 0.03050011 0.0037079
## 
## $`sex:season`
##                          diff           lwr         upr     p adj
## m:Spring-f:Spring -0.21585199 -0.2390639636 -0.19264001 0.0000000
## f:Summer-f:Spring  0.01461242 -0.0149008529  0.04412569 0.5803051
## m:Summer-f:Spring -0.19611337 -0.2202481413 -0.17197861 0.0000000
## f:Summer-m:Spring  0.23046441  0.2048019237  0.25612689 0.0000000
## m:Summer-m:Spring  0.01973861  0.0005027933  0.03897443 0.0417434
## m:Summer-f:Summer -0.21072579 -0.2372258696 -0.18422572 0.0000000
The variables sex and season are significant predictors of body condition index (BCI). (sex: F = 987.92, p < 0.001; season: F = 8.44, p < 0.01; sex × season: F = 0.14, p > 0.05)
Linear Mixed Effects model done to test wheter also season had any effect on the final Two-Way ANOVA
# Mixed Effects model
tortoise_lmm <- lmer(log_bci ~ sex * season + (1 | locality), data = last_seen)


summary(tortoise_lmm)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: log_bci ~ sex * season + (1 | locality)
##    Data: last_seen
## 
## REML criterion at convergence: -2528.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.0124 -0.6496  0.0424  0.6962  3.0495 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  locality (Intercept) 0.01181  0.1087  
##  Residual             0.01761  0.1327  
## Number of obs: 2139, groups:  locality, 3
## 
## Fixed effects:
##                     Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)        1.949e+00  6.323e-02  2.035e+00  30.824 0.000954 ***
## sexm              -1.630e-01  9.293e-03  2.134e+03 -17.542  < 2e-16 ***
## seasonSummer       1.279e-03  1.062e-02  2.133e+03   0.121 0.904079    
## sexm:seasonSummer  1.333e-02  1.265e-02  2.133e+03   1.054 0.292080    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) sexm   ssnSmm
## sexm        -0.097              
## seasonSummr -0.068  0.470       
## sxm:ssnSmmr  0.059 -0.579 -0.838
anova(tortoise_lmm)
## Type III Analysis of Variance Table with Satterthwaite's method
##            Sum Sq Mean Sq NumDF  DenDF  F value Pr(>F)    
## sex        7.3912  7.3912     1 2134.9 419.6091 <2e-16 ***
## season     0.0277  0.0277     1 2133.1   1.5705 0.2103    
## sex:season 0.0196  0.0196     1 2133.0   1.1106 0.2921    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#post-hoc 
emmeans(tortoise_lmm, ~ sex)
## NOTE: Results may be misleading due to involvement in interactions
##  sex emmean     SE   df lower.CL upper.CL
##  f     1.95 0.0631 2.02     1.68     2.22
##  m     1.79 0.0629 2.01     1.52     2.06
## 
## Results are averaged over the levels of: season 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95
emmeans(tortoise_lmm, ~ season)
## NOTE: Results may be misleading due to involvement in interactions
##  season emmean    SE   df lower.CL upper.CL
##  Spring   1.87 0.063 2.01     1.60     2.14
##  Summer   1.88 0.063 2.01     1.61     2.14
## 
## Results are averaged over the levels of: sex 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95
The variables sex and season are significant predictors of body condition index (BCI). (sex: F = 419.61, p < 0.001; season: F = 1.57, p > 0.05; sex × season: F = 1.11, p > 0.05)

Part 4 - Discussion

The results show a pattern across both the ANOVA and the linear mixed-effects model: sex is the dominant predictor of body condition index (BCI), with females having higher body condition than males. This difference remains strong even after accounting for seasonal variation and localities, suggesting that it reflects a robust biological pattern rather than a sampling artifact (Rugiero & Luiselli, 2006).

One possible explanation for higher BCI in females is survival-based selection. Females with lower body condition may be less likely to survive long enough to be detected in the population, meaning that the observed sample is biased toward higher-quality individuals (Sibeaux et al., 2016). This could create an apparent pattern where only the most robust females are present in the dataset, inflating the average female body condition relative to males. In addition, females may invest differently in energy allocation, prioritising storage of reserves needed for reproduction, such as egg production, which could contribute to consistently higher body condition values compared to males (Sibeaux et al., 2016).

Seasonal effects were weak in the two-way ANOVA and absent in the Linear Mixed Effect model, suggesting that short-term environmental variation between spring and summer in the ANOVA could just be the variation of data based on which of the three locations the data was recorded from (Rugiero & Luiselli, 2006). This may indicate that Hermann’s tortoise maintain relatively stable energy reserves across seasons, or that any seasonal changes are too small to detect given the variability among individuals.

Part 5 - Conclusion

In summary, this study has shown that the body condition in Hermann’s tortoises is primarily influenced by the biological sex of the organism. The findings from this research correlate well with ecological predictions, whereby female individuals may face different pressures concerning energy and survival. One possible explanation is that differences in energy allocation and reproductive investment lead to higher body condition in females compared to males. That these better body conditions could also reinforce the survival-based processes where only higher-quality females persist in the population.

Part 6 - References

Tidy tuesday data: https://github.com/rfordatascience/tidytuesday/blob/main/data/2026/2026-03-03/intro.md

background info and photo from nyt: https://www.nytimes.com/2026/02/14/science/tortoises-island-sex-cliff.html

Sources Sibeaux, A. et al. (2016), Sex-specific ecophysiological responses to environmental fluctuations of free-ranging Hermann’s tortoises https://pmc.ncbi.nlm.nih.gov/articles/PMC5142051/

Rugiero, L. & Luiselli, L. (2006), Ecological modelling of habitat use and the annual activity patterns in an urban population of the tortoise, Testudo hermanni https://www.tandfonline.com/doi/full/10.1080/11250000600700086