Introduction

Which body dimensions are the best indicators of gender?

This project explores the relationship between gender and several body dimensions (body girth measurements and skeletal diameter measurements). Logistic regression will be used because quantitative variables are being used to predict a binary outcome. The dataset used was built in 2003 and originally published in the Journal of Statistics Education. It contains 507 cases, 247 of which are male and 260 of which are female. It should be noted that only physically active individuals were included in this dataset, as obesity, pregnancy, and incapacitating conditions tend to unpredictably affect body dimensions. All measurements of length are in centimeters.
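For reference, the standard form of a logistic regression model can be sketched as below. The notation is introduced here for illustration and is not taken from the dataset documentation: \(p\) is the probability that a respondent is male, \(x_1, \dots, x_k\) are the body measurements used as predictors, and \(\beta_0, \dots, \beta_k\) are the coefficients to be estimated.

\[
\log\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
\]

Under this form, a positive \(\beta_j\) means that a larger value of \(x_j\) raises the odds of the outcome coded as 1, and a negative \(\beta_j\) lowers them, with the other predictors held fixed.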

The dataset has 25 columns: 21 body measurements plus four general variables (age, weight, height, and sex). This project uses only the following:

Variable  Type          Description
wri_di    Quantitative  Wrist diameter, measured as the sum of two wrists
kne_di    Quantitative  Knee diameter, measured as the sum of two knees
ank_di    Quantitative  Ankle diameter, measured as the sum of two ankles
wai_gi    Quantitative  Waist girth, measured at the narrowest part of the torso below the rib cage, as the average of the contracted and relaxed positions
hip_gi    Quantitative  Hip girth, measured at the level of the bitrochanteric diameter
sex       Categorical   1 if the respondent is male, 0 if female

“Body measurements of 507 physically active individuals.” OpenIntro. Originally published by Heinz, G. et al. in “Exploring Relationships in Body Dimensions,” Journal of Statistics Education 11(2). Retrieved from https://www.openintro.org/data/index.php?data=bdims on December 5, 2025.

Data Analysis

library(tidyverse)
## Warning: package 'ggplot2' was built under R version 4.4.3
bdims <- read_csv("D:/DATA 101/Datasets/bdims.csv")
colSums(is.na(bdims)) # no nulls!
## bia_di bii_di bit_di che_de che_di elb_di wri_di kne_di ank_di sho_gi che_gi 
##      0      0      0      0      0      0      0      0      0      0      0 
## wai_gi nav_gi hip_gi thi_gi bic_gi for_gi kne_gi cal_gi ank_gi wri_gi    age 
##      0      0      0      0      0      0      0      0      0      0      0 
##    wgt    hgt    sex 
##      0      0      0
head(bdims, 10)
## # A tibble: 10 × 25
##    bia_di bii_di bit_di che_de che_di elb_di wri_di kne_di ank_di sho_gi che_gi
##     <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
##  1   42.9   26     31.5   17.7   28     13.1   10.4   18.8   14.1   106.   89.5
##  2   43.7   28.5   33.5   16.9   30.8   14     11.8   20.6   15.1   110.   97  
##  3   40.1   28.2   33.3   20.9   31.7   13.9   10.9   19.7   14.1   115.   97.5
##  4   44.3   29.9   34     18.4   28.2   13.9   11.2   20.9   15     104.   97  
##  5   42.5   29.9   34     21.5   29.4   15.2   11.6   20.7   14.9   108.   97.5
##  6   43.3   27     31.5   19.6   31.3   14     11.5   18.8   13.9   120.   99.9
##  7   43.5   30     34     21.9   31.7   16.1   12.5   20.8   15.6   124.  107. 
##  8   44.4   29.8   33.2   21.8   28.8   15.1   11.9   21     14.6   120.  102. 
##  9   43.5   26.5   32.1   15.5   27.5   14.1   11.2   18.9   13.2   111    91  
## 10   42     28     34     22.5   28     15.6   12     21.1   15     120.   93.5
## # ℹ 14 more variables: wai_gi <dbl>, nav_gi <dbl>, hip_gi <dbl>, thi_gi <dbl>,
## #   bic_gi <dbl>, for_gi <dbl>, kne_gi <dbl>, cal_gi <dbl>, ank_gi <dbl>,
## #   wri_gi <dbl>, age <dbl>, wgt <dbl>, hgt <dbl>, sex <dbl>
data05 <- bdims |>
  select(wri_di, kne_di, ank_di, wai_gi, hip_gi, sex) |>
  rename(
    "wrist_diameter" = "wri_di",
    "knee_diameter" = "kne_di",
    "ankle_diameter" = "ank_di",
    "waist_girth" = "wai_gi",
    "hip_girth" = "hip_gi",
    "gender" = "sex"
  ) 

Regression Analysis

lgrm <- glm(gender ~ ., data=data05, family="binomial")
summary(lgrm)
## 
## Call:
## glm(formula = gender ~ ., family = "binomial", data = data05)
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -29.66706    6.62986  -4.475 7.65e-06 ***
## wrist_diameter   2.77070    0.62390   4.441 8.96e-06 ***
## knee_diameter   -0.26524    0.33065  -0.802    0.422    
## ankle_diameter   2.22222    0.48184   4.612 3.99e-06 ***
## waist_girth      0.48475    0.06864   7.062 1.64e-12 ***
## hip_girth       -0.64495    0.09009  -7.159 8.12e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 702.518  on 506  degrees of freedom
## Residual deviance:  95.767  on 501  degrees of freedom
## AIC: 107.77
## 
## Number of Fisher Scoring iterations: 8

The model indicates that wrist diameter, ankle diameter, waist girth, and hip girth are all statistically significant predictors of gender (\(p<0.001\) for each, so significant at any common significance level). The signs of the coefficients give the direction of each effect. Because gender is coded 1 for male and 0 for female, the positive coefficients on wrist diameter, ankle diameter, and waist girth mean that larger values raise the predicted odds of being male, while the negative coefficient on hip girth means that a larger hip girth raises the predicted odds of being female, holding the other measurements fixed. Knee diameter is not a significant predictor once the other four measurements are in the model (\(p=0.422\)).
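The coefficients in the summary are on the log-odds scale, which is hard to read directly. A minimal sketch of converting them to odds ratios follows; the estimates are copied by hand from the `summary(lgrm)` output above so that the block runs on its own, and the name `odds_ratios` is introduced here for illustration.

```r
# Coefficient estimates copied from the summary(lgrm) output above.
# Each is the change in log-odds of "male" per 1 cm increase in the
# measurement, with the other predictors held fixed.
coefs <- c(
  wrist_diameter = 2.77070,
  knee_diameter  = -0.26524,
  ankle_diameter = 2.22222,
  waist_girth    = 0.48475,
  hip_girth      = -0.64495
)

# Exponentiating a log-odds coefficient gives an odds ratio:
# values above 1 raise the odds of "male", values below 1 lower them.
odds_ratios <- exp(coefs)
print(round(odds_ratios, 3))
```

With the fitted model still in memory, `exp(coef(lgrm))` should give the same quantities (plus the intercept) without retyping the estimates.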

Significant variables: wrist diameter, ankle diameter, waist girth, and hip girth.

Model Assumptions and Diagnostics

predicted.probs <- lgrm$fitted.values
predicted.classes <- ifelse(predicted.probs > 0.8, 1, 0)

confusion <- table(
  Predicted = factor(predicted.classes, levels = c(0, 1)),
  Actual = factor(data05$gender, levels = c(0, 1))
)

confusion
##          Actual
## Predicted   0   1
##         0 254  23
##         1   6 224

The model correctly classified 254 women as women and 224 men as men. It misclassified 23 men as women and 6 women as men. I used a probability threshold of 0.8 rather than the usual 0.5, not because this is a high-stakes test but because I wanted to increase precision. The cost is lower sensitivity, since more men fall below the threshold and are classified as women.
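The effect of the threshold on precision and sensitivity can be sketched with a small helper. Everything in this block is hypothetical and for illustration only: the functions `confusion_at` and `precision_sensitivity` are not part of the original analysis, and the toy probabilities are made up rather than taken from `bdims`.

```r
# Hypothetical helper: tabulate predictions against actual labels
# at a given probability threshold (same layout as `confusion` above).
confusion_at <- function(probs, actual, threshold) {
  predicted <- ifelse(probs > threshold, 1, 0)
  table(
    Predicted = factor(predicted, levels = c(0, 1)),
    Actual    = factor(actual, levels = c(0, 1))
  )
}

# Hypothetical helper: precision = TP / (TP + FP),
# sensitivity = TP / (TP + FN).
precision_sensitivity <- function(cm) {
  c(
    precision   = cm["1", "1"] / sum(cm["1", ]),
    sensitivity = cm["1", "1"] / sum(cm[, "1"])
  )
}

# Toy data for illustration only; these are NOT values from bdims.
toy_probs  <- c(0.05, 0.20, 0.60, 0.85, 0.30, 0.70, 0.90, 0.95)
toy_actual <- c(0,    0,    0,    0,    1,    1,    1,    1)

# Compare the default 0.5 threshold with the stricter 0.8 threshold.
for (thr in c(0.5, 0.8)) {
  cat("Threshold", thr, "\n")
  print(round(precision_sensitivity(confusion_at(toy_probs, toy_actual, thr)), 3))
}
```

On the toy data, raising the threshold increases precision and lowers sensitivity. Passing `lgrm$fitted.values` and `data05$gender` to the same helpers would show the trade-off for the real model, though that comparison was not run here.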

library(pROC)
## Warning: package 'pROC' was built under R version 4.4.3
## Type 'citation("pROC")' for a citation.
## 
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var
roc1 <- roc(response = data05$gender, predictor = lgrm$fitted.values, levels = c("0", "1"), direction = "<")

auc1 <- auc(roc1)
auc1
## Area under the curve: 0.9947
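# legacy.axes = TRUE puts 1 - specificity on the x-axis, which matches the "False Positive Rate" label
plot.roc(roc1, print.auc = TRUE, legacy.axes = TRUE, xlab = "False Positive Rate", ylab = "True Positive Rate")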

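# Pull the counts from the confusion matrix (rows = Predicted, columns = Actual)
TP <- confusion["1", "1"]  # predicted male, actually male
TN <- confusion["0", "0"]  # predicted female, actually female
FP <- confusion["1", "0"]  # predicted male, actually female
FN <- confusion["0", "1"]  # predicted female, actually male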

accuracy <- (TP + TN) / (TP + TN + FP + FN)
sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)
precision <- TP / (TP + FP)

cat("Accuracy:", round(accuracy, 3), "\nSensitivity:", round(sensitivity, 3), "\nSpecificity:", round(specificity, 3), "\nPrecision:", round(precision, 3))
## Accuracy: 0.943 
## Sensitivity: 0.907 
## Specificity: 0.977 
## Precision: 0.974

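The AUC of 0.995 indicates that the model separates men from women almost perfectly across all thresholds. At the 0.8 threshold, the other performance metrics are also strong. Precision (positive predictive value) is 97.4%, so a respondent the model classifies as male is almost always male. Sensitivity (true positive rate) is lower at 90.7%, which reflects the strict threshold: 23 of the 247 men were classified as women. Specificity is 97.7%, and overall accuracy is 94.3%. All of these metrics were computed on the same data used to fit the model, so they likely overstate performance on new data.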

Conclusion and Future Directions

The logistic regression model found that larger wrist diameter, ankle diameter, and waist girth are significant indicators that a person is male, while larger hip girth indicates that a person is female. Knee diameter was not a statistically significant predictor when the other measurements were included. While there are certainly more conclusive and efficient ways to ascertain gender, relationships identified by this kind of analysis could inform a machine learning model that predicts gender from an image of a person and/or a set of body measurements.

The model performed very well, with high values on every performance metric. It is limited by the five predictors it was given; it is possible, or even likely, that other body measurements are even stronger indicators of gender, and future research could pursue this avenue. Because the dataset includes only physically active individuals, another direction would be to study obese people and identify which body measurements best indicate gender regardless of, or taking into account, body type.

References

“Body measurements of 507 physically active individuals.” OpenIntro. Originally published by Heinz, G. et al. in “Exploring Relationships in Body Dimensions,” Journal of Statistics Education 11(2). Retrieved from https://www.openintro.org/data/index.php?data=bdims on December 5, 2025.