Data Expeditions Project

EvAnth Summer ’25

Authors

Sammy Ciuni, Chelsea Nguyen

Hypotheses and Predictions

Hypothesis: A higher weaning age signifies a prolonged developmental period necessary for higher cognitive processes.

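  • In this report, higher cognitive processes are approximated by the ratio of brain mass to body mass, which we refer to as the encephalization quotient. (The conventional encephalization quotient instead compares observed brain mass with the brain mass expected for a given body size.)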

Prediction: Primates with a larger encephalization quotient are more likely to have a higher weaning age.
Null hypothesis: No correlation exists between developmental period length (weaning age) and encephalization quotient.
Alt hypothesis: A correlation exists between developmental period length (weaning age) and encephalization quotient.

We aim to reject the null hypothesis.

Variables we are interested in:

  • mean_BodyMass_kg: The mean body mass of primates in kilograms. In our data exploration, we will convert this to grams in a new column, mean_BodyMass_g.

  • Mean_brain_mass_g: The mean brain mass of primates in grams.

  • Age_Weaning_d: The age, in days, at which a primate is able to consume food other than its mother’s milk.

Data Exploration

We began by selecting the following columns from our dataset, primates:

  • Family

  • Genus

  • CommonName

  • Species

  • mean_BodyMass_kg

  • Mean_brain_mass_g

  • Age_Weaning_d

We then removed the rows with NA values in the numerical columns and saved the result as primates_selected. A glimpse of the dataset is below.

# A tibble: 90 × 7
   Family          Genus   CommonName Species mean_BodyMass_kg Mean_brain_mass_g
   <chr>           <chr>   <chr>      <chr>              <dbl>             <dbl>
 1 Cercopithecidae Alleno… Allen_s_S… Alleno…             5.3               58.0
 2 Daubentoniidae  Dauben… Aye_aye    Dauben…             2.54              45.2
 3 Pitheciidae     Cacajao Bald_Uaca… Cacaja…             3.17              74.3
 4 Cercopithecidae Macaca  Barbary_M… Macaca…            13.5               87.7
 5 Cercopithecidae Semnop… Bengal_Sa… Semnop…            11.4              112. 
 6 Lemuridae       Varecia Black_and… Vareci…             3.55              31.2
 7 Lemuridae       Eulemur Black_Lem… Eulemu…             2.06              22.6
 8 Cercopithecidae Cercop… Blue_Monk… Cercop…             4.89              75  
 9 Cercopithecidae Macaca  Bonnet_Ma… Macaca…             5.14              69.4
10 Hominidae       Pan     Bonobo     Pan_pa…            39.1              330. 
# ℹ 80 more rows
# ℹ 1 more variable: Age_Weaning_d <dbl>
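
The selection and NA-removal step described above could be sketched as follows. This is only an illustration: the toy primates tibble, its made-up values, and the Extra_column name are assumptions, and drop_na() from tidyr is one of several ways to remove missing values, not necessarily the one we used.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the real `primates` data; every value here is made up.
primates <- tibble(
  Family            = c("Fam_a", "Fam_a", "Fam_b", "Fam_b"),
  Genus             = c("Gen_a", "Gen_b", "Gen_c", "Gen_d"),
  CommonName        = c("Name_a", "Name_b", "Name_c", "Name_d"),
  Species           = c("Gen_a_sp", "Gen_b_sp", "Gen_c_sp", "Gen_d_sp"),
  mean_BodyMass_kg  = c(5.0, 2.5, NA, 12.0),
  Mean_brain_mass_g = c(60, 45, 30, 90),
  Age_Weaning_d     = c(200, 170, 150, NA),
  Extra_column      = c(1, 2, 3, 4)  # hypothetical column that gets dropped
)

primates_selected <- primates %>%
  # Keep only the seven columns of interest.
  select(Family, Genus, CommonName, Species,
         mean_BodyMass_kg, Mean_brain_mass_g, Age_Weaning_d) %>%
  # Drop any row with an NA in one of the three numerical columns.
  drop_na(mean_BodyMass_kg, Mean_brain_mass_g, Age_Weaning_d)

primates_selected
```

In this toy example, two of the four rows contain an NA and are removed.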

Next, we added two columns to the dataset and saved the result as primates_enceph:

  • mean_BodyMass_g: mean_BodyMass_kg converted from kilograms to grams.

  • enceph_quotient: the encephalization quotient, found by dividing Mean_brain_mass_g by mean_BodyMass_g.

A glimpse of the dataset is below.

# A tibble: 90 × 9
   Family          Genus   CommonName Species mean_BodyMass_kg Mean_brain_mass_g
   <chr>           <chr>   <chr>      <chr>              <dbl>             <dbl>
 1 Cercopithecidae Alleno… Allen_s_S… Alleno…             5.3               58.0
 2 Daubentoniidae  Dauben… Aye_aye    Dauben…             2.54              45.2
 3 Pitheciidae     Cacajao Bald_Uaca… Cacaja…             3.17              74.3
 4 Cercopithecidae Macaca  Barbary_M… Macaca…            13.5               87.7
 5 Cercopithecidae Semnop… Bengal_Sa… Semnop…            11.4              112. 
 6 Lemuridae       Varecia Black_and… Vareci…             3.55              31.2
 7 Lemuridae       Eulemur Black_Lem… Eulemu…             2.06              22.6
 8 Cercopithecidae Cercop… Blue_Monk… Cercop…             4.89              75  
 9 Cercopithecidae Macaca  Bonnet_Ma… Macaca…             5.14              69.4
10 Hominidae       Pan     Bonobo     Pan_pa…            39.1              330. 
# ℹ 80 more rows
# ℹ 3 more variables: Age_Weaning_d <dbl>, mean_BodyMass_g <dbl>,
#   enceph_quotient <dbl>
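
The unit conversion and ratio described above amount to a single mutate() call. The sketch below uses a small made-up stand-in for primates_selected, so the numbers are illustrative only.

```r
library(dplyr)

# Toy stand-in for primates_selected; the values are made up.
primates_selected <- tibble(
  CommonName        = c("Name_a", "Name_b"),
  mean_BodyMass_kg  = c(5.0, 2.5),
  Mean_brain_mass_g = c(60, 45)
)

primates_enceph <- primates_selected %>%
  mutate(
    # Kilograms to grams, so brain and body mass share a unit.
    mean_BodyMass_g = mean_BodyMass_kg * 1000,
    # Brain mass divided by body mass (both in grams).
    enceph_quotient = Mean_brain_mass_g / mean_BodyMass_g
  )

primates_enceph
```

Because mutate() evaluates its arguments in order, enceph_quotient can use mean_BodyMass_g in the same call.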

In making our figure, we took the log of the enceph_quotient and Age_Weaning_d columns so that the relationship is closer to linear on a scatter plot. We have added labels to the figure and a line of best fit.

[Figure: scatter plot of the log-transformed encephalization quotient and weaning age, with a line of best fit]
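
A figure like the one described above could be built roughly as follows. The data are made up, the name log_enceph_quotient is hypothetical, and placing weaning age on the x-axis is an assumption based on the predictor in our model; the axis labels are likewise placeholders.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for primates_enceph; the values are made up.
primates_enceph <- tibble(
  Age_Weaning_d   = c(90, 150, 240, 400, 700, 1100),
  enceph_quotient = c(0.020, 0.017, 0.016, 0.013, 0.012, 0.010)
)

# Natural-log transform of both variables.
primates_log <- primates_enceph %>%
  mutate(
    log_weaning_age     = log(Age_Weaning_d),
    log_enceph_quotient = log(enceph_quotient)
  )

enceph_plot <- ggplot(primates_log,
                      aes(x = log_weaning_age, y = log_enceph_quotient)) +
  geom_point() +
  # Straight line of best fit; giving the formula explicitly avoids the
  # "`geom_smooth()` using formula = 'y ~ x'" message.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(
    x = "log(weaning age in days)",
    y = "log(encephalization quotient)",
    title = "Encephalization quotient vs. weaning age"
  )

enceph_plot
```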

With the relationship visualized, we fit a linear model to the two log-transformed variables and examined the p-value of the slope. The p-value helps us test whether the relationship between weaning age and the encephalization quotient is statistically significant. If the p-value is less than an alpha of 0.05, we will reject the null hypothesis in favor of the alternative and conclude that there is a significant correlation between the variables. If the p-value is 0.05 or above, we will fail to reject the null, meaning the data do not provide evidence of a significant relationship.

# A tibble: 2 × 5
  term            estimate std.error statistic  p.value
  <chr>              <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)       -2.79     0.355      -7.85 9.21e-12
2 log_weaning_age   -0.281    0.0633     -4.44 2.56e- 5
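
A model table with these columns can be produced along the following lines. This is a sketch on made-up data: lm() with broom::tidy() is one way to get such a table and may differ from the exact calls we used, and log_enceph_quotient is a hypothetical name for the response.

```r
library(dplyr)
library(broom)

# Toy log-transformed data; the values are made up.
primates_log <- tibble(
  Age_Weaning_d   = c(90, 150, 240, 400, 700, 1100),
  enceph_quotient = c(0.020, 0.017, 0.016, 0.013, 0.012, 0.010)
) %>%
  mutate(
    log_weaning_age     = log(Age_Weaning_d),
    log_enceph_quotient = log(enceph_quotient)
  )

# Simple linear regression of log quotient on log weaning age.
enceph_fit <- lm(log_enceph_quotient ~ log_weaning_age, data = primates_log)

# One row per term: estimate, std.error, statistic, p.value.
fit_table <- tidy(enceph_fit)
fit_table

# Pull the slope's p-value and apply the alpha = 0.05 decision rule.
slope_p <- fit_table %>%
  filter(term == "log_weaning_age") %>%
  pull(p.value)

alpha <- 0.05
if (slope_p < alpha) {
  message("Reject the null hypothesis.")
} else {
  message("Fail to reject the null hypothesis.")
}
```

The decision printed here applies only to the toy data, not to the real dataset.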

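The p-value for the slope, 2.56e-5 (about 0.000026), is less than 0.05, so we reject the null in favor of the alternative hypothesis. Thus, there is a statistically significant relationship between weaning age and the encephalization quotient. The estimated slope is negative (-0.281), however, so primates with a higher weaning age tend to have a lower brain-to-body mass ratio, which is the opposite direction from our prediction.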
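
For reference, the fitted line from the model table can be written out and back-transformed to the original scale. This assumes the logs are natural logs (the default for log() in R) and that the response is the log of enceph_quotient; the coefficients are the rounded estimates from the table.

```latex
\log(\text{enceph\_quotient}) \approx -2.79 - 0.281\,\log(\text{Age\_Weaning\_d})
\quad\Longrightarrow\quad
\text{enceph\_quotient} \approx e^{-2.79}\,\text{Age\_Weaning\_d}^{-0.281}
\approx 0.061\,\text{Age\_Weaning\_d}^{-0.281}
```

Under these assumptions, doubling the weaning age multiplies the predicted quotient by about 2^(-0.281), or roughly 0.82.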