Using Statistical Learning to Construct Data-Defined Size Charts for Stock Management Purposes

Technical Report

Author

John Spragg PhD CMILT, Retired Citizen Scientist jspragg@talktalk.net

Published

June 8, 2025

Abstract

The data workflow described in this technical report offers a predictive model for the construction of garment size charts using statistical learning techniques¹ (James et al. 2013). This contrasts with previous researchers, who have employed descriptive and exploratory statistics to summarize body measurements in support of better garment fit. The garment size charts constructed in this research are designed to support better stock management decisions rather than garment comfort. The size chart measurements were identified by applying regression models to 800 female subjects whose measurements were obtained by computer vision technology. The research found that the regression models are sensitive to body shape, and that body measurements are predicted more accurately when the subjects are divided into body shape subgroups. All models were developed in the statistical programming language R (see Section 3.2.1).

Introduction

This research report employs the BodyM set of body measurements (Ruiz et al. 2022), hosted on Amazon Web Services (AWS)², to construct size charts that exploit body shape metrics for inventory management purposes. The BodyM data sets are derived from scanned images of real people. This large public body measurement data set includes 8,978 frontal and lateral silhouettes of 2,505 real subjects, paired with height, weight, gender, and 14 body measurements in cm: ankle girth, arm length, bicep girth, calf girth, chest girth, forearm girth, height, hip girth, leg length, shoulder breadth, shoulder-to-crotch length, thigh girth, waist girth, and wrist girth. The ethnicity distribution of BodyM is: White 40%, Asian 30%, Black/African American 14%, American Indian or Alaska Native 1%, and Other 15%; 15% of the individuals also identify as Hispanic.

The BodyM data set is available from AWS and is split into three collections: Training, Test Set A, and Test Set B. In the Training and Test Set A collections, which are used in this study, subjects were photographed and 3D-scanned in a lab by technicians.

For the current study we extracted 800 female subjects from the data set, together with their height, weight, bust girth, hip girth, and waist girth measurements, to construct a set of garment size charts. We then estimated how well these size charts map onto the female population, in support of retail inventory management operations.

The study initially explored the mathematical relationship between subjects’ Height, Bust Girth, Hip Girth, and Waist Girth measurements (Boslaugh and Watters 2008).

BodyM Data Exploration

Table 1: Summary of measurements for the 800 female subjects (girths and height in cm, weight in kg)

   BustGirth         HipGirth       WaistGirth    
 Min.   : 77.00   Min.   : 80.5   Min.   : 64.00  
 1st Qu.: 88.50   1st Qu.: 94.0   1st Qu.: 76.00  
 Median : 94.00   Median : 99.0   Median : 82.00  
 Mean   : 97.16   Mean   :101.4   Mean   : 84.88  
 3rd Qu.:102.62   3rd Qu.:106.1   3rd Qu.: 90.50  
 Max.   :154.00   Max.   :158.5   Max.   :147.00

   Weight_kg        Height_cm    
 Min.   : 28.50   Min.   :141.0  
 1st Qu.: 56.00   1st Qu.:159.0  
 Median : 63.00   Median :163.0  
 Mean   : 66.65   Mean   :163.3  
 3rd Qu.: 73.00   3rd Qu.:167.6  
 Max.   :156.50   Max.   :186.5

The question to be addressed: Is there a linear relationship between Height, Bust Girth, Hip Girth, and Waist Girth measurements that can be exploited to produce garment size charts that will predict consumer demand for each size of garment?

Figure 1 suggests clear linear relationships between bust girth, hip girth, and waist girth measurements, which promise the existence of predictive models. Furthermore, these linear relationships become stronger once the subjects in the data set have been mapped to a body shape classification. The garment industry recognizes six body shape classifications: Rectangle, Hourglass, Top Hourglass, Bottom Hourglass, Triangle, and Inverted Triangle. These classifications are self-explanatory, and we will not spend space discussing their rationale for garment design and body image research; this is discussed by Chrimes et al. (2023) and Webster, Cornolo, and Kelkel (2012).

BodyM Body Shape Distribution

Table 2: Body Shape Percentages

Body Shapes	N	%
Rectangle	670	83.75
Bottom Hourglass	81	10.125
Triangle	22	2.75
Inverted Triangle	11	1.375
Hourglass	9	1.125
Top Hourglass	7	0.875

The 3D scatter plot in Figure 2 illustrates the improved coverage obtained when regression models are fitted to each body shape group rather than to the aggregated data set. Once decoupled, the regression lines fan out and pass close to the majority of data points.

Figure 2: 3D Scatter Plot with linear regression plots for each body shape.

However, these plots do not explain the impact of subject height on body shape.

A one-way analysis of variance (ANOVA) of height by body shape (Boslaugh and Watters 2008) provides insight.

Table 3: ANOVA Model Summary

term	df	sumsq	meansq	statistic	p.value
BodyShape	5	1313.647	262.72938	6.048538	1.66e-05
Residuals	794	34488.853	43.43684	NA	NA
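As a check on Table 3, the mean squares, F statistic and p-value can be recomputed from the reported sums of squares and degrees of freedom. The sketch below uses only base R and the numbers in Table 3. The column names in Table 3 (term, df, sumsq, meansq, statistic, p.value) match what broom::tidy() returns for an aov() fit, so the table was probably produced by something like broom::tidy(aov(Height_cm ~ BodyShape, data = working_female_df)); the report does not show that code, so treat this as an assumption.

```r
# Recompute the one-way ANOVA summary in Table 3 from its sums of squares.
# Values are copied from Table 3 (height by body shape, 800 subjects).
ss_between <- 1313.647   # BodyShape sum of squares
df_between <- 5          # six body shapes minus one
ss_within  <- 34488.853  # residual sum of squares
df_within  <- 794        # 800 subjects minus six groups

ms_between <- ss_between / df_between   # mean square between body shapes
ms_within  <- ss_within / df_within     # mean square within body shapes
f_stat     <- ms_between / ms_within    # F statistic
# Upper-tail probability of the F distribution gives the p-value
p_value    <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

signif(c(meansq_between = ms_between, meansq_within = ms_within,
         F = f_stat, p = p_value), 4)
```

A small p-value means the between-shape variation in height is large relative to the within-shape variation, which is the basis for treating height separately within each body shape.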

The very small p-value in Table 3 (1.66e-05) indicates a statistically significant association between body shape and height: mean height differs across the body shape groups. We therefore need to further divide the data set into distinct height categories. Figure 3 shows the height distribution for each body shape.

Figure 3: Height differences across body shapes

The thick line in the middle of each box is the median. The box itself spans the first to third quartiles. The whiskers extend from the box to the most extreme heights that are not outliers (in R’s default boxplots, those lying within 1.5 times the interquartile range of the box), and the dots show the outliers.
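The whisker and outlier rule can be made explicit. The sketch below assumes the conventional Tukey rule with quartiles from quantile(), which is what ggplot2’s geom_boxplot() uses by default (coef = 1.5); the report does not say which plotting function produced Figure 3. The helper name tukey_box() and the example heights are hypothetical.

```r
# Hypothetical helper: Tukey boxplot statistics, quartiles from quantile()
tukey_box <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)  # Q1, median, Q3
  iqr <- q[3] - q[1]                                    # height of the box
  lower_fence <- q[1] - coef * iqr
  upper_fence <- q[3] + coef * iqr
  inside <- x >= lower_fence & x <= upper_fence         # non-outlying points
  list(
    box      = q,                  # first quartile, median, third quartile
    whiskers = range(x[inside]),   # most extreme non-outlying values
    outliers = x[!inside]          # points drawn individually as dots
  )
}

# Illustrative heights in cm (made up, not BodyM data)
heights <- c(152, 158, 160, 161, 163, 164, 166, 168, 171, 186)
tukey_box(heights)
```

Here the box runs from 160.25 cm to 167.5 cm, the whiskers stop at 152 cm and 171 cm, and 186 cm is flagged as an outlier because it lies more than 1.5 × IQR above the third quartile.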

The rectangle body shape has some significant outliers that would distort conclusions drawn from a garment size chart that included the heights of all subjects.

To keep the analysis relatively simple, we use the percentile ranking of heights within each body shape group to separate each body shape into Short, Regular, and Tall classifications. Note that these rankings are unique to each body shape. For Rectangle, Short covers the interval from the minimum to the first quartile [141 cm, 159 cm], Regular the interval from the first to the third quartile [160 cm, 168 cm], and Tall the interval from the third quartile to the maximum [169 cm, 187 cm]. All body shapes are classified according to the percentile ranking of their height measurements.
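The within-shape quartile split can be sketched in base R as follows. The helper classify_length() is hypothetical, and the boundary rule (Short ≤ Q1 < Regular ≤ Q3 < Tall) is an assumption chosen to match the Rectangle intervals quoted above; the report’s own code is not shown. Because heights are recorded to rounded values, ties mean the three classes need not split exactly 25/50/25, as Table 4 shows for Rectangle.

```r
# Hypothetical helper: assign Short / Regular / Tall within each body shape,
# using that shape's own first and third quartiles of height.
# Boundary rule (an assumption): Short <= Q1 < Regular <= Q3 < Tall.
classify_length <- function(height, shape) {
  q1 <- ave(height, shape, FUN = function(h) quantile(h, 0.25, names = FALSE))
  q3 <- ave(height, shape, FUN = function(h) quantile(h, 0.75, names = FALSE))
  length_class <- ifelse(height <= q1, "Short",
                         ifelse(height <= q3, "Regular", "Tall"))
  factor(length_class, levels = c("Short", "Regular", "Tall"))
}

# Illustrative data (made up): two body shapes with different height ranges
shape  <- c(rep("Rectangle", 8), rep("Triangle", 4))
height <- c(150, 155, 159, 162, 164, 167, 170, 180,
            158, 160, 165, 172)
table(shape, classify_length(height, shape))
```

Computing the quartiles per shape with ave() means a 165 cm subject can be Regular in one body shape and Tall in another, which is the intent of the shape-specific ranking.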

Rectangle Height Classification

Figure 4: Rectangle Distribution of Height

Hourglass Height Classification

Figure 5: Hourglass Distribution of Height

Top Hourglass Height Classification

Figure 6: Top Hourglass Distribution of Height

Bottom Hourglass Height Classification

Figure 7: Bottom Hourglass Distribution of Height

Triangle Height Classification

Figure 8: Triangle Distribution of Height

Inverted Triangle Height Classification

Figure 9: Inverted Triangle Distribution of Height

This allows us to compute how many subjects fall into each body shape/height pairing and what percentage of that body shape’s subjects each pairing represents.

Table 4: Subjects by body shape and height class (Length); Percentage is the share of subjects within each body shape

BodyShape	Length	Count	Percentage
Rectangle	Regular	377	56.5
Rectangle	Short	157	23.5
Rectangle	Tall	136	20.5
BottomHourglass	Regular	42	52.0
BottomHourglass	Short	31	38.5
Triangle	Tall	11	50.0
Hourglass	Regular	9	100.0
BottomHourglass	Tall	8	10.0
Triangle	Regular	8	36.5
InvertedTriangle	Regular	5	45.5
InvertedTriangle	Short	5	45.5
TopHourglass	Regular	4	57.0
TopHourglass	Tall	3	43.0
Triangle	Short	3	13.5
InvertedTriangle	Tall	1	9.0

Regular Rectangle subjects make up 56% of the Rectangle group (377 of 670 subjects, or about 47% of the full data set), while the Tall Inverted Triangle group has only a single member. The Inverted Triangle shape is clearly under-represented in the data set and will therefore not be used for predictive models.
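The distinction between a share within a body shape and a share of the whole data set can be shown with the Table 4 counts. The sketch below copies only the Rectangle and Bottom Hourglass rows to keep it short; the column names PctWithinShape and PctOfDataSet are illustrative. Table 4’s Percentage column corresponds to the within-shape share, reported after rounding.

```r
# Counts copied from Table 4 (Rectangle and Bottom Hourglass rows only)
counts <- data.frame(
  BodyShape = c("Rectangle", "Rectangle", "Rectangle",
                "BottomHourglass", "BottomHourglass", "BottomHourglass"),
  Length    = c("Regular", "Short", "Tall", "Regular", "Short", "Tall"),
  Count     = c(377, 157, 136, 42, 31, 8)
)
n_total <- 800  # all female subjects in the study

# Share of subjects within each body shape (denominator: 670 or 81)
counts$PctWithinShape <- 100 * counts$Count /
  ave(counts$Count, counts$BodyShape, FUN = sum)
# Share of the whole data set (denominator: 800)
counts$PctOfDataSet <- 100 * counts$Count / n_total
counts
```

For Regular Rectangle this gives about 56.3% within the shape but about 47.1% of all 800 subjects.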

Linear Modelling

Linear regression models are fitted to each set of subjects defined by body shape and height. Waist girth is used to predict hip girth, and waist girth and hip girth together are used to predict bust girth. The resulting coefficients and statistics for each body shape and height group are shown in the following tables.
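The report fits these models through tidymodels workflows (see the Size Chart Construction section). An equivalent base-R sketch, fitting both models once per body shape/height group with lm(), is shown below. The data are synthetic stand-ins for working_female_df, generated with coefficients loosely based on Table 5, and the object names are hypothetical.

```r
set.seed(42)
# Synthetic stand-in for working_female_df (made-up numbers, not BodyM data)
n <- 200
sim <- data.frame(
  BodyShape  = "Rectangle",
  Length     = sample(c("Short", "Regular", "Tall"), n, replace = TRUE),
  WaistGirth = rnorm(n, mean = 85, sd = 10)
)
sim$HipGirth  <- 37 + 0.75 * sim$WaistGirth + rnorm(n, sd = 3.7)
sim$BustGirth <- 14 + 0.70 * sim$WaistGirth + 0.25 * sim$HipGirth +
  rnorm(n, sd = 3.9)

# Fit the two models separately for every body shape / height group
groups <- split(sim, list(sim$BodyShape, sim$Length), drop = TRUE)
models <- lapply(groups, function(g) list(
  hip  = lm(HipGirth ~ WaistGirth, data = g),              # hip from waist
  bust = lm(BustGirth ~ WaistGirth + HipGirth, data = g)   # bust from waist and hip
))

# Coefficients of the hip model in each group (one column per group)
sapply(models, function(m) coef(m$hip))
```

Splitting first and then fitting keeps each group’s coefficients independent, which is what allows the regression lines to fan out by body shape and height.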

The critical regression statistics for deciding whether the approach is worth pursuing are the residuals, R², and the F-statistic. The coefficients will eventually provide the predictive model, but at this stage we only need to verify the approach.

The residual standard error (RSE) estimates the standard deviation of the residuals, that is, the typical distance in cm between a subject’s measured girth and the model’s prediction. The residuals provide important information about the distribution of subjects around the regression model and should be approximately normally distributed. Marked departures from normality point to outliers or a mis-specified model, either of which puts the validity of the model in doubt.
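The RSE is the square root of the residual sum of squares divided by the residual degrees of freedom, which is what summary() reports as “Residual standard error”. The sketch below computes it by hand on synthetic data (made-up numbers, not BodyM) and adds a Shapiro–Wilk test as one common numeric check of residual normality; a normal Q-Q plot (qqnorm(), qqline()) is the usual visual check. The report does not state which normality check it used.

```r
set.seed(1)
# Synthetic data (made up): hip girth as a linear function of waist girth
waist <- rnorm(100, mean = 85, sd = 10)
hip   <- 37 + 0.75 * waist + rnorm(100, sd = 3.7)
fit   <- lm(hip ~ waist)

# Residual standard error: sqrt(residual sum of squares / residual df)
rse_manual <- sqrt(sum(residuals(fit)^2) / df.residual(fit))
rse_manual
sigma(fit)   # the same quantity, as reported by summary(fit)

# Numeric normality check: a small p-value would flag non-normal residuals
shapiro.test(residuals(fit))
```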

Regular Rectangle (n = 377)

Table 5


=========================================================================
                                     Dependent variable:                 
                    -----------------------------------------------------
                             HipGirth                  BustGirth         
                               (1)                        (2)            
-------------------------------------------------------------------------
HipGirth                                                0.245***         
                                                        (0.054)          
                                                                         
WaistGirth                   0.751***                   0.693***         
                             (0.016)                    (0.044)          
                                                                         
Constant                    37.117***                  13.636***         
                             (1.381)                    (2.491)          
                                                                         
-------------------------------------------------------------------------
Observations                   377                        377            
R2                            0.853                      0.878           
Adjusted R2                   0.852                      0.877           
Residual Std. Error      3.694 (df = 375)           3.895 (df = 374)     
F Statistic         2,168.631*** (df = 1; 375) 1,341.228*** (df = 2; 374)
=========================================================================
Note:                                         *p<0.1; **p<0.05; ***p<0.01

Figure 10 illustrates the importance of residuals. The graph suggests that, for Regular Rectangle subjects, 50% lie within 2.5 cm of the model’s prediction.
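If the residuals are roughly normal, the 2.5 cm figure can be checked against the residual standard errors in Table 5: the central 50% of a normal distribution lies within qnorm(0.75) ≈ 0.674 standard deviations of its mean. The sketch treats the RSE as that standard deviation, an approximation that ignores uncertainty in the fitted coefficients.

```r
# Residual standard errors copied from Table 5 (Regular Rectangle)
rse <- c(hip = 3.694, bust = 3.895)
# Half-width (cm) of the band containing the central 50% of a normal distribution
half_width_50 <- qnorm(0.75) * rse
round(half_width_50, 2)   # hip 2.49, bust 2.63
# The same idea for the central 95% band
round(qnorm(0.975) * rse, 2)
```

The hip model’s 50% band of about ±2.5 cm agrees with the reading of Figure 10.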

R² (the coefficient of determination) and adjusted R² measure the model’s quality. For Regular Rectangle subjects, the model lm(HipGirth ~ WaistGirth) explains 85% of the variance in hip girth, and lm(BustGirth ~ WaistGirth + HipGirth) explains 88% of the variance in bust girth (Table 5).
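Adjusted R² penalizes R² for the number of predictors p relative to the number of observations n, which matters for the small groups later in this section. The sketch below applies the standard formula to the rounded R² values in Table 5; the helper name adj_r2() is illustrative, and small discrepancies come from R² being reported to three decimals.

```r
# Adjusted R-squared for p predictors and n observations:
#   1 - (1 - R2) * (n - 1) / (n - p - 1)
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Check against Table 5 (Regular Rectangle, n = 377)
adj_r2(0.878, n = 377, p = 2)   # bust model: close to the reported 0.877
adj_r2(0.853, n = 377, p = 1)   # hip model: close to the reported 0.852

# The penalty grows quickly for small groups, e.g. Table 15 (n = 5, p = 2)
adj_r2(0.950, n = 5, p = 2)     # close to the reported 0.901
```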

The F-statistic tests whether a model is significant as a whole. Its null hypothesis is that all slope coefficients (excluding the intercept) are zero; a large F-statistic with a small p-value rejects this hypothesis, indicating that at least one predictor is related to the response. The F-statistic therefore tells us whether it is worth continuing to the next stage of the workflow, in which the models are used to generate size charts.

The p-values of p < 0.01 for both F-statistics in Table 5 indicate that both models are highly statistically significant.
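For an ordinary least squares model with an intercept, the overall F-statistic can be written in terms of R², which links the two statistics in Table 5. The sketch below applies that identity to the rounded R² for the Regular Rectangle bust model and then computes the p-value of the reported F under the null hypothesis that all slope coefficients are zero. The helper name f_from_r2() is illustrative.

```r
# Overall F statistic from R-squared, for p predictors and n observations:
#   F = (R2 / p) / ((1 - R2) / (n - p - 1))
f_from_r2 <- function(r2, n, p) (r2 / p) / ((1 - r2) / (n - p - 1))

# Table 5, bust model: R2 = 0.878 (rounded), n = 377, p = 2
f_bust <- f_from_r2(0.878, n = 377, p = 2)
f_bust   # close to the reported 1,341.228; the gap comes from rounding R2

# p-value of the reported F statistic with (2, 374) degrees of freedom
pf(1341.228, df1 = 2, df2 = 374, lower.tail = FALSE)
```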

The validity of the models will be clear when we employ the models to generate size charts and calculate the percentage of the sample population that each size chart entry covers.

Short Rectangle (n = 157)

Table 6


=====================================================================
                                   Dependent variable:               
                    -------------------------------------------------
                            HipGirth                BustGirth        
                              (1)                      (2)           
---------------------------------------------------------------------
HipGirth                                             0.261***        
                                                     (0.061)         
                                                                     
WaistGirth                  0.788***                 0.652***        
                            (0.028)                  (0.053)         
                                                                     
Constant                   33.151***                16.065***        
                            (2.491)                  (2.784)         
                                                                     
---------------------------------------------------------------------
Observations                  157                      157           
R2                           0.832                    0.911          
Adjusted R2                  0.831                    0.910          
Residual Std. Error     4.906 (df = 155)         3.746 (df = 154)    
F Statistic         765.842*** (df = 1; 155) 787.911*** (df = 2; 154)
=====================================================================
Note:                                     *p<0.1; **p<0.05; ***p<0.01

Tall Rectangle (n = 136)

Table 7


=======================================================================
                                    Dependent variable:                
                    ---------------------------------------------------
                             HipGirth                 BustGirth        
                               (1)                       (2)           
-----------------------------------------------------------------------
HipGirth                                               0.266***        
                                                       (0.088)         
                                                                       
WaistGirth                   0.855***                  0.634***        
                             (0.025)                   (0.079)         
                                                                       
Constant                    29.904***                 16.147***        
                             (2.259)                   (3.478)         
                                                                       
-----------------------------------------------------------------------
Observations                   136                       136           
R2                            0.894                     0.894          
Adjusted R2                   0.893                     0.892          
Residual Std. Error      3.578 (df = 134)          3.627 (df = 133)    
F Statistic         1,125.440*** (df = 1; 134) 560.652*** (df = 2; 133)
=======================================================================
Note:                                       *p<0.1; **p<0.05; ***p<0.01

Regular Bottom Hourglass (n = 42)

Table 8


===================================================================
                                  Dependent variable:              
                    -----------------------------------------------
                           HipGirth                BustGirth       
                              (1)                     (2)          
-------------------------------------------------------------------
HipGirth                                             0.039         
                                                    (0.127)        
                                                                   
WaistGirth                 1.289***                0.888***        
                            (0.070)                 (0.173)        
                                                                   
Constant                     0.398                 16.291***       
                            (5.192)                 (4.160)        
                                                                   
-------------------------------------------------------------------
Observations                  42                      42           
R2                           0.893                   0.876         
Adjusted R2                  0.891                   0.870         
Residual Std. Error     2.722 (df = 40)         2.180 (df = 39)    
F Statistic         334.506*** (df = 1; 40) 138.364*** (df = 2; 39)
===================================================================
Note:                                   *p<0.1; **p<0.05; ***p<0.01

Short Bottom Hourglass (n = 31)

Table 9


=================================================================
                                 Dependent variable:             
                    ---------------------------------------------
                           HipGirth              BustGirth       
                             (1)                    (2)          
-----------------------------------------------------------------
HipGirth                                          0.442***       
                                                  (0.086)        
                                                                 
WaistGirth                 1.254***                0.101         
                           (0.138)                (0.125)        
                                                                 
Constant                    2.044                35.491***       
                           (9.747)                (4.498)        
                                                                 
-----------------------------------------------------------------
Observations                  31                     31          
R2                          0.740                  0.826         
Adjusted R2                 0.731                  0.813         
Residual Std. Error    3.247 (df = 29)        1.497 (df = 28)    
F Statistic         82.382*** (df = 1; 29) 66.234*** (df = 2; 28)
=================================================================
Note:                                 *p<0.1; **p<0.05; ***p<0.01

Tall Triangle (n = 11)

Table 10


===============================================================
                                Dependent variable:            
                    -------------------------------------------
                          HipGirth              BustGirth      
                             (1)                   (2)         
---------------------------------------------------------------
HipGirth                                         0.571*        
                                                 (0.280)       
                                                               
WaistGirth                0.971***                0.436        
                           (0.137)               (0.296)       
                                                               
Constant                   23.596*               -3.835        
                          (11.784)              (11.922)       
                                                               
---------------------------------------------------------------
Observations                 11                    11          
R2                          0.848                 0.907        
Adjusted R2                 0.831                 0.884        
Residual Std. Error    2.325 (df = 9)        1.957 (df = 8)    
F Statistic         50.274*** (df = 1; 9) 39.054*** (df = 2; 8)
===============================================================
Note:                               *p<0.1; **p<0.05; ***p<0.01

Regular Hourglass (n = 9)

Table 11


=================================================================
                                 Dependent variable:             
                    ---------------------------------------------
                           HipGirth              BustGirth       
                             (1)                    (2)          
-----------------------------------------------------------------
HipGirth                                           0.520         
                                                  (0.380)        
                                                                 
WaistGirth                 0.844***                0.553         
                           (0.069)                (0.328)        
                                                                 
Constant                  35.679***                -0.375        
                           (5.849)                (14.786)       
                                                                 
-----------------------------------------------------------------
Observations                  9                      9           
R2                          0.955                  0.972         
Adjusted R2                 0.949                  0.962         
Residual Std. Error     0.918 (df = 7)         0.923 (df = 6)    
F Statistic         149.540*** (df = 1; 7) 103.196*** (df = 2; 6)
=================================================================
Note:                                 *p<0.1; **p<0.05; ***p<0.01

Tall Bottom Hourglass (n = 8)

Table 12


==============================================================
                               Dependent variable:            
                    ------------------------------------------
                          HipGirth             BustGirth      
                             (1)                  (2)         
--------------------------------------------------------------
HipGirth                                         1.091*       
                                                (0.445)       
                                                              
WaistGirth                0.879***               -0.409       
                           (0.127)              (0.415)       
                                                              
Constant                  32.238**               11.799       
                          (10.334)              (18.260)      
                                                              
--------------------------------------------------------------
Observations                  8                    8          
R2                          0.889                0.814        
Adjusted R2                 0.871                0.739        
Residual Std. Error    2.164 (df = 6)        2.361 (df = 5)   
F Statistic         48.152*** (df = 1; 6) 10.907** (df = 2; 5)
==============================================================
Note:                              *p<0.1; **p<0.05; ***p<0.01

Regular Triangle (n = 8)

Table 13


===============================================================
                                Dependent variable:            
                    -------------------------------------------
                           HipGirth             BustGirth      
                             (1)                   (2)         
---------------------------------------------------------------
HipGirth                                          1.038        
                                                 (0.755)       
                                                               
WaistGirth                 1.331***               -0.391       
                           (0.114)               (1.026)       
                                                               
Constant                    -8.733                18.216       
                           (10.028)              (19.672)      
                                                               
---------------------------------------------------------------
Observations                  8                     8          
R2                          0.958                 0.828        
Adjusted R2                 0.951                 0.760        
Residual Std. Error     1.766 (df = 6)        3.265 (df = 5)   
F Statistic         137.359*** (df = 1; 6) 12.076** (df = 2; 5)
===============================================================
Note:                               *p<0.1; **p<0.05; ***p<0.01

Regular Inverted Triangle (n = 5)

Table 14


==========================================================
                             Dependent variable:          
                    --------------------------------------
                         HipGirth           BustGirth     
                           (1)                 (2)        
----------------------------------------------------------
HipGirth                                      0.128       
                                             (0.295)      
                                                          
WaistGirth                0.932*              0.892       
                         (0.336)             (0.324)      
                                                          
Constant                  11.267             15.389       
                         (32.990)           (17.198)      
                                                          
----------------------------------------------------------
Observations                5                   5         
R2                        0.720               0.946       
Adjusted R2               0.627               0.892       
Residual Std. Error   5.754 (df = 3)     2.943 (df = 2)   
F Statistic         7.712* (df = 1; 3) 17.470* (df = 2; 2)
==========================================================
Note:                          *p<0.1; **p<0.05; ***p<0.01

Short Inverted Triangle (n = 5)

Table 15


=============================================================
                               Dependent variable:           
                    -----------------------------------------
                          HipGirth            BustGirth      
                            (1)                  (2)         
-------------------------------------------------------------
HipGirth                                        0.276        
                                               (0.608)       
                                                             
WaistGirth                0.665**               0.713        
                          (0.138)              (0.430)       
                                                             
Constant                  38.761*               16.767       
                          (12.298)             (26.910)      
                                                             
-------------------------------------------------------------
Observations                 5                    5          
R2                         0.886                0.950        
Adjusted R2                0.848                0.901        
Residual Std. Error    2.394 (df = 3)       2.522 (df = 2)   
F Statistic         23.303** (df = 1; 3) 19.132** (df = 2; 2)
=============================================================
Note:                             *p<0.1; **p<0.05; ***p<0.01

Regular Top Hourglass (n = 4)

Table 16


=======================================================
                            Dependent variable:        
                    -----------------------------------
                        HipGirth          BustGirth    
                           (1)               (2)       
-------------------------------------------------------
HipGirth                                    1.109      
                                           (0.369)     
                                                       
WaistGirth                0.194            -0.202      
                         (0.158)           (0.109)     
                                                       
Constant                105.073**          12.556      
                        (18.516)          (39.965)     
                                                       
-------------------------------------------------------
Observations                4                 4        
R2                        0.432             0.900      
Adjusted R2               0.148             0.701      
Residual Std. Error  2.394 (df = 2)    1.249 (df = 1)  
F Statistic         1.523 (df = 1; 2) 4.525 (df = 2; 1)
=======================================================
Note:                       *p<0.1; **p<0.05; ***p<0.01

Tall Top Hourglass (n = 3)

Table 17


=================================================
                         Dependent variable:     
                    -----------------------------
                         HipGirth       BustGirth
                            (1)            (2)   
-------------------------------------------------
HipGirth                                  0.407  
                                                 
                                                 
WaistGirth                0.765*          0.491  
                          (0.096)                
                                                 
Constant                  38.052         24.930  
                         (10.027)                
                                                 
-------------------------------------------------
Observations                 3              3    
R2                         0.985          1.000  
Adjusted R2                0.969                 
Residual Std. Error   4.022 (df = 1)             
F Statistic         63.587* (df = 1; 1)          
=================================================
Note:                 *p<0.1; **p<0.05; ***p<0.01

Short Triangle (n = 3)

Table 18


================================================
                        Dependent variable:     
                    ----------------------------
                         HipGirth      BustGirth
                           (1)            (2)   
------------------------------------------------
HipGirth                                 0.507  
                                                
                                                
WaistGirth                0.885          0.567  
                         (0.634)                
                                                
Constant                  30.087        -8.328  
                         (53.940)               
                                                
------------------------------------------------
Observations                3              3    
R2                        0.661          1.000  
Adjusted R2               0.322                 
Residual Std. Error   3.502 (df = 1)            
F Statistic         1.949 (df = 1; 1)           
================================================
Note:                *p<0.1; **p<0.05; ***p<0.01

Tall Inverted Triangle (n = 1)

Table 19


=========================================
                 Dependent variable:     
             ----------------------------
                HipGirth      BustGirth  
                  (1)            (2)     
-----------------------------------------
HipGirth                                 
                                         
                                         
WaistGirth                               
                                         
                                         
Constant        119.500        128.000   
                                         
                                         
-----------------------------------------
Observations       1              1      
R2               0.000          0.000    
Adjusted R2      0.000          0.000    
=========================================
Note:         *p<0.1; **p<0.05; ***p<0.01

The presentation of the linear regression models has been laborious, but the validity of these models is critical to the construction of size charts.

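The number of subjects in each body shape/height classification is critical to linear prediction models. Once the number of subjects drops below, say, 10, the usefulness of the model must be questioned, because it is doubtful whether such small sets of subjects are representative of the wider population. With three subjects, the bust girth model has as many coefficients as observations and fits them exactly, which is why Tables 17 and 18 report R² = 1.000 without standard errors.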
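The degenerate n = 3 case can be reproduced directly. The sketch below fits the bust model (intercept, waist girth, hip girth) to three made-up subjects: with three coefficients and three observations there are no residual degrees of freedom, so the fit is exact and R² is 1 regardless of how the subjects were chosen.

```r
# Three made-up subjects (not BodyM data)
tiny <- data.frame(
  WaistGirth = c(70, 78, 90),
  HipGirth   = c(95, 99, 108),
  BustGirth  = c(88, 91, 99)
)
fit <- lm(BustGirth ~ WaistGirth + HipGirth, data = tiny)

df.residual(fit)   # 0: three coefficients, three observations
# R-squared is 1 by construction; summary() may warn because no residual
# degrees of freedom remain for standard errors or p-values
suppressWarnings(summary(fit))$r.squared
```

An R² of 1 here says nothing about how well the model would predict other subjects, which is why such groups are not a sound basis for size charts.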

In a definitive anthropometric study for the U.S.A.F., Gilbert Daniels observed: “The ‘average [person]’ is a misleading and illusory concept as a basis for design criteria, and it is particularly so when more than one dimension is being considered” (Daniels 1952).

While this observation offers insight for garment designers, we stress that garment fit is not the focus of this study. Here, the goal is to anticipate the proportion of the consumer population whose measurements match those predicted by the charts.

Size Chart Construction

Size charts are constructed for each body shape / height group and are defined by the linear regression models discussed in Section 3: HipGirth is first predicted from WaistGirth, and BustGirth is then predicted from WaistGirth and the predicted HipGirth. The size charts are produced by a workflow built with Posit’s³ tidymodels framework (Kuhn and Wickham 2020).

# Packages: tidymodels (dplyr, recipes, parsnip, workflows) and janitor,
# which provides round_half_up(). working_female_df and the helper
# calculate_percentage() are defined elsewhere in the report.
library(tidymodels)
library(janitor)

sizeChartWorkflow <- function(bshape, ht, d) {
  # Select the subjects for one body shape / height group
  df <- working_female_df %>%
    select(BustGirth, HipGirth, WaistGirth, BodyShape, Length) %>%
    filter(BodyShape == bshape, Length == ht)

  # Ordinary least squares specification, reused by both stages
  lm_spec <- linear_reg() %>%
    set_engine("lm")

  # Stage 1: predict HipGirth from WaistGirth
  hip_recipe <- recipe(HipGirth ~ WaistGirth, data = df)
  hip_workflow <- workflow() %>%
    add_recipe(hip_recipe) %>%
    add_model(lm_spec)
  hip_fit <- fit(hip_workflow, data = df)

  # Seed the size chart with d evenly spaced waist girths spanning
  # the observed range of this group
  sequence <- round_half_up(
    seq(from = min(df$WaistGirth),
        to = max(df$WaistGirth),
        length.out = d))
  sizeChart <- data.frame(WaistGirth = sequence)
  sizeChart$HipGirth <- predict(hip_fit, sizeChart)$.pred

  # Stage 2: predict BustGirth from WaistGirth and the predicted HipGirth
  bust_recipe <- recipe(BustGirth ~ WaistGirth + HipGirth, data = df)
  bust_workflow <- workflow() %>%
    add_recipe(bust_recipe) %>%
    add_model(lm_spec)
  bust_fit <- fit(bust_workflow, data = df)
  sizeChart$BustGirth <- predict(bust_fit, sizeChart)$.pred

  # Sort by WaistGirth and round all three girths
  sorted_sizeChart <- sizeChart %>%
    arrange(WaistGirth) %>%
    mutate(across(c(BustGirth, HipGirth, WaistGirth), round_half_up))

  # calculate_percentage() returns one value per interval between
  # consecutive rows (d - 1 values), so the last row is padded with a blank
  percentages_df <- calculate_percentage(sorted_sizeChart, df) %>%
    mutate(Combined_Percentage = round_half_up(Combined_Percentage))
  Percent <- c(paste0(percentages_df$Combined_Percentage, "%"), " ")
  sorted_sizeChart <- cbind(sorted_sizeChart, Percent)

  # Format the girths for display
  sorted_sizeChart <- sorted_sizeChart %>%
    mutate(across(c(WaistGirth, HipGirth, BustGirth),
                  ~ paste0(.x, " cm ")))

  return(sorted_sizeChart)
}
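
The helper `calculate_percentage()` is called above but is not shown in this section. Judging from how the charts are read as a sieve in this report, and from the blank last entry in each Percent column, it appears to report the percentage of subjects whose waist, hip and bust girths all fall within the band between one chart row and the next. The base R sketch below implements that reading under stated assumptions: the name `calculate_percentage_sketch`, the half-open bands `[lower, upper)` and the toy subjects are hypothetical, and the author's function may differ.

```r
# Hypothetical stand-in for calculate_percentage(); the author's helper
# may treat band edges or missing values differently.
calculate_percentage_sketch <- function(chart, subjects) {
  # TRUE where x lies in the half-open band [lo, hi)
  in_band <- function(x, lo, hi) x >= lo & x < hi
  k <- nrow(chart) - 1  # one band between each pair of consecutive rows
  pct <- vapply(seq_len(k), function(i) {
    hit <- in_band(subjects$WaistGirth, chart$WaistGirth[i], chart$WaistGirth[i + 1]) &
      in_band(subjects$HipGirth, chart$HipGirth[i], chart$HipGirth[i + 1]) &
      in_band(subjects$BustGirth, chart$BustGirth[i], chart$BustGirth[i + 1])
    100 * mean(hit)  # percentage of subjects inside all three bands
  }, numeric(1))
  data.frame(Combined_Percentage = pct)
}

# First three rows of Table 20 and four invented subjects
chart <- data.frame(WaistGirth = c(67, 82.5, 98),
                    HipGirth   = c(87.5, 99, 110.5),
                    BustGirth  = c(81.5, 95, 108.5))
subjects <- data.frame(WaistGirth = c(70, 85, 90, 60),
                       HipGirth   = c(90, 100, 95, 85),
                       BustGirth  = c(85, 100, 100, 80))

res <- calculate_percentage_sketch(chart, subjects)
res$Combined_Percentage  # 25 25
```

Subjects 3 and 4 fall outside every band, which is one reason why the Percent column of a chart need not sum to 100%.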

Table 20: Size Chart Regular Rectangle

WaistGirth	HipGirth	BustGirth	Percent
67 cm	87.5 cm	81.5 cm	42%
82.5 cm	99 cm	95 cm	19%
98 cm	110.5 cm	108.5 cm	5%
114 cm	122.5 cm	122.5 cm	1%
129.5 cm	134.5 cm	136.5 cm

Table 21: Size Chart Short Rectangle

WaistGirth	HipGirth	BustGirth	Percent
68.5 cm	87 cm	83.5 cm	72.5%
94.5 cm	107.5 cm	106 cm	16%
121 cm	128.5 cm	128.5 cm	0.5%
147 cm	149 cm	151 cm

Table 22: Size Chart Tall Rectangle

WaistGirth	HipGirth	BustGirth	Percent
68.5 cm	88.5 cm	83 cm	25%
81.5 cm	99.5 cm	94.5 cm	23.5%
94.5 cm	110.5 cm	105.5 cm	8%
107.5 cm	122 cm	117 cm	3%
120.5 cm	133 cm	128 cm	0.5%
133.5 cm	144 cm	139 cm

Table 23: Size Chart Regular BottomHourglass

WaistGirth	HipGirth	BustGirth	Percent
64.5 cm	83.5 cm	77 cm	31%
70.5 cm	91 cm	82.5 cm	9.5%
76 cm	98.5 cm	87.5 cm	7%
82 cm	106 cm	93.5 cm	7%
88 cm	114 cm	99 cm

Table 24: Size Chart Short BottomHourglass

WaistGirth	HipGirth	BustGirth	Percent
64 cm	82.5 cm	78.5 cm	64.5%
71.5 cm	91.5 cm	83 cm	9.5%
79 cm	101 cm	88 cm	3%
86.5 cm	110.5 cm	93 cm

Table 25: Size Chart Tall Triangle

WaistGirth	HipGirth	BustGirth	Percent
80 cm	101.5 cm	89 cm	54.5%
89 cm	110 cm	98 cm	18%
97.5 cm	118.5 cm	106.5 cm

Table 26: Size Chart Regular Hourglass

WaistGirth	HipGirth	BustGirth	Percent
76.5 cm	100 cm	94 cm	22%
82.5 cm	105.5 cm	100 cm	44.5%
88.5 cm	110.5 cm	106 cm

Table 27: Size Chart Tall BottomHourglass

WaistGirth	HipGirth	BustGirth	Percent
73 cm	96.5 cm	87 cm	25%
81 cm	103.5 cm	91.5 cm	25%
88.5 cm	110 cm	95.5 cm

Table 28: Size Chart Regular Triangle

WaistGirth	HipGirth	BustGirth	Percent
81.5 cm	99.5 cm	90 cm	50%
86.5 cm	106.5 cm	95 cm	12.5%
91 cm	112.5 cm	99.5 cm	12.5%
96 cm	119 cm	104 cm

Table 29: Size Chart Regular InvertedTriangle

WaistGirth	HipGirth	BustGirth	Percent
90.5 cm	95.5 cm	108.5 cm	80%
112.5 cm	116 cm	130.5 cm

Table 30: Size Chart Short InvertedTriangle

WaistGirth	HipGirth	BustGirth	Percent
74.5 cm	88.5 cm	94 cm	80%
97 cm	103.5 cm	114.5 cm

Table 31: Size Chart Tall TopHourglass

WaistGirth	HipGirth	BustGirth	Percent
67.5 cm	89.5 cm	94.5 cm	33.5%
94 cm	110 cm	116 cm	33.5%
120.5 cm	130 cm	137 cm

Table 32: Size Chart Short Triangle

WaistGirth	HipGirth	BustGirth	Percent
82.5 cm	103 cm	91 cm	33.5%
86 cm	106 cm	94.5 cm	33.5%
89.5 cm	109.5 cm	98 cm

From an inventory management perspective, a size chart resembles a sieve. The Rectangle sieve screens 83.75% of female consumers, who are then sifted according to height: 56.5% fall between 159 cm and 168 cm. The sieve further predicts that 42% of those consumers will have waist measurements in [67 cm, 82.5 cm], hip measurements in [87.5 cm, 99 cm] and bust measurements in [81.5 cm, 95 cm] (Table 20). The charts are therefore part of the assortment planning process (Gurhan Kok, Fisher, and Vaidyanathan 2009) undertaken by retail operations managers. This operations management perspective is clearly not the way garment designers view size charts.
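
The sieve arithmetic can be made explicit. Assuming that each quoted percentage is conditional on the previous stage (Rectangle shape, then regular height, then the first size band), the share of all female consumers expected in the first Regular Rectangle size is the product of the three stages:

```r
# Shares quoted in the text; each assumed to be conditional on the previous stage
rectangle      <- 0.8375  # female consumers with a Rectangle body shape
regular_height <- 0.565   # of those, between 159 cm and 168 cm tall
first_band     <- 0.42    # of those, within the first waist/hip/bust band

overall <- rectangle * regular_height * first_band
round(100 * overall, 1)  # 19.9 (% of all female consumers)
```

Under that assumption, roughly one in five female consumers would be expected to fall in that first size.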

Alternative Approaches

Other researchers have explored statistical models for developing size charts, notably Gupta and Gangadhar (2004) and Otieno (2008). Interestingly, the statistical models they employ are exploratory rather than predictive. Gupta and Gangadhar (2004), for example, used Principal Component Analysis (PCA) to reduce the set of related body measurements to three key measurements: waist, hip and bust. The resulting size charts were then validated with a nearest neighbor heuristic that minimized the Euclidean distance between the measurements assigned to each subject and that subject's actual measurements. This method, aggregate loss, appears to be a version of the k-means clustering algorithm (Everitt and Hothorn 2011). However, it is not clear from the aggregate loss formula (Equation 1) whether the hip, waist and bust variables are treated as a structured record, with their relationship tied by constant coefficients, or as independent values that are allowed to wander freely in 3D vector space.

\[ \begin{align*} \small \text{Aggregate loss} &= \small\sqrt{\frac{\sum \left( (\text{assigned bust} - \text{actual bust})^2 + (\text{assigned hip} - \text{actual hip})^2 + (\text{assigned waist} - \text{actual waist})^2 \right)}{\text{N}}} \end{align*} \tag{1}\]
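
A minimal base R sketch of Equation 1 follows. It adopts the first of the two readings raised above: each size-chart row is treated as a structured (bust, hip, waist) record, each subject is assigned to the row at the smallest Euclidean distance, and the aggregate loss is the square root of the mean squared distance. This is our interpretation, not Gupta and Gangadhar's (2004) code; the chart rows are taken from Table 20 and the two subjects are invented.

```r
# Our reading of Equation 1: each chart row is a (bust, hip, waist) record
# and every subject is assigned to the nearest row in Euclidean distance.
aggregate_loss <- function(chart, subjects) {
  cols <- c("BustGirth", "HipGirth", "WaistGirth")
  C <- as.matrix(chart[, cols])     # k chart rows x 3 girths
  S <- as.matrix(subjects[, cols])  # N subjects x 3 girths
  # d2[j, i]: squared distance from subject i to chart row j
  d2 <- matrix(apply(S, 1, function(s) colSums((t(C) - s)^2)),
               nrow = nrow(C))
  assigned <- apply(d2, 2, which.min)  # nearest chart row per subject
  sq_err <- d2[cbind(assigned, seq_len(nrow(S)))]
  list(assigned = assigned, loss = sqrt(sum(sq_err) / nrow(S)))
}

chart <- data.frame(WaistGirth = c(67, 82.5, 98),
                    HipGirth   = c(87.5, 99, 110.5),
                    BustGirth  = c(81.5, 95, 108.5))
subjects <- data.frame(WaistGirth = c(70, 84),
                       HipGirth   = c(90, 100),
                       BustGirth  = c(83, 97))

res <- aggregate_loss(chart, subjects)
res$assigned  # 1 2
res$loss      # sqrt((17.5 + 7.25) / 2), about 3.52
```

Because the three girths of a row are compared together, a subject cannot be matched to the waist of one size and the hip of another; the alternative reading would score each girth independently.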

Inspired by Gupta and Gangadhar (2004), we applied PCA to the BodyM data; the result supported the decisions made in our analysis workflow, in particular the grouping of subjects into height categories.

In the PCA biplot, the height vector, together with the other measurements that correlate positively with height, is orthogonal to the waist, hip and chest vectors. A PCA biplot indicates correlations and illustrates the structure within a data set: the direction, length and angle between vectors are the key indicators of the relationships between variables.
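
To illustrate how the angle between biplot vectors is read, the sketch below applies base R's `prcomp()` to simulated data in which three girths share a common size factor and height is generated independently of them. Because the data are simulated rather than taken from BodyM, this shows how to read a biplot and does not reproduce the finding. The arrows are the loadings scaled by the component standard deviations; the cosine of the angle between two arrows approximates the correlation between the corresponding variables when the first two components capture most of the variance.

```r
# Simulated data (not BodyM): girths share a "size" factor, height is independent
set.seed(1)
n <- 800
size <- rnorm(n)
sim <- data.frame(
  WaistGirth = 80 + 10 * size + rnorm(n, sd = 3),
  HipGirth   = 100 + 9 * size + rnorm(n, sd = 3),
  BustGirth  = 95 + 9 * size + rnorm(n, sd = 3),
  Height     = 165 + rnorm(n, sd = 7)
)
pca <- prcomp(sim, scale. = TRUE)

# Biplot arrows on the first two components: loadings scaled by sdev
arrow_xy <- pca$rotation[, 1:2] %*% diag(pca$sdev[1:2])

# Cosine of the angle between two arrows
arrow_cos <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

arrow_cos(arrow_xy["Height", ], arrow_xy["WaistGirth", ])    # near 0: orthogonal
arrow_cos(arrow_xy["HipGirth", ], arrow_xy["WaistGirth", ])  # near 1: small angle
```

The sign of a principal component is arbitrary, but flipping it changes both arrows together, so the cosines, and hence the reading of the biplot, are unaffected.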

It should be recognized that the major obstacle to establishing predictive models in anthropometric research is the small size of the available body measurement data sets. Collecting human body measurements, either manually or via computer vision technology, is an expensive, time-consuming activity. Volunteers are typically self-selecting; not everybody wants to be measured. Measurements taken from random samples of people are not homogeneous even before girth measurements are analysed: the sample needs to be pre-processed according to gender, age, height, body shape and possibly ethnic origin. The BodyM data set contained 2,779 individuals, of whom 800 were females; these 800 form the sample for this research report. Of these 800, only one individual represented the Tall Inverted Triangle body shape.

The 3D graph in Figure 2 illustrates the importance of predicting size measurements from body shape classifications rather than from the unprocessed sample data.

The clustering methods typically employed by researchers in this field are based on nearest neighbor heuristics; they do not calculate the probability that a subject belongs to a given classification. Once large data sets become available, model-based clustering methods (Scrucca et al. 2023) may identify subtle subgroups that would enrich the regression analysis.
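
The difference between a nearest neighbor assignment and a model-based one can be seen for a single subject. The sketch below assumes a two-component Gaussian mixture for waist girth with fixed, illustrative parameters (not estimated from BodyM) and computes the membership probabilities with Bayes' rule, which is the E-step of a Gaussian mixture model. In practice a package such as mclust, the subject of Scrucca et al. (2023), would estimate the parameters from the data.

```r
# Illustrative mixture parameters for waist girth (cm); assumed, not fitted
means   <- c(70, 90)
sds     <- c(5, 10)
weights <- c(0.6, 0.4)
waist   <- 79  # a hypothetical subject

# Nearest-centroid assignment, as in k-means style heuristics
hard <- which.min(abs(waist - means))

# Membership probabilities: weighted component densities, normalised to sum to 1
dens      <- weights * dnorm(waist, mean = means, sd = sds)
posterior <- dens / sum(dens)

hard                 # 1
round(posterior, 2)  # 0.52 0.48
```

The hard rule places this subject firmly in the first group, whereas the mixture model rates membership as close to a coin toss, information that nearest neighbor heuristics discard.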

The approach taken here is an extendable data analysis workflow: additional body measurements can be fed into the regression models to enrich the charts and allow further validation of the models.

Acknowledgements

The statistical analyses and visualizations in this document were performed using R, a free software environment for statistical computing and graphics. R is available from the R Foundation for Statistical Computing at https://www.r-project.org.

R packages used:

R (R Core Team 2024)
broom (Robinson, Hayes, and Couch 2024)
devtools (Wickham et al. 2022)
ggfortify (Tang, Horikoshi, and Li 2016)
ggplot2 (Wickham 2016)
kableExtra (Zhu 2024)
knitr (Xie 2024)
tidyverse (Wickham et al. 2019)
tidymodels (Kuhn and Wickham 2020)

This document was created using Quarto, an open-source scientific and technical publishing system. Quarto provides tools for creating dynamic documents that can include text, code, and output. For more information about Quarto, visit quarto.org.

References

Boslaugh, Sarah, and Paul Andrew Watters. 2008. “Statistics in a Nutshell.” In, 55–63. O’Reilly.

Chrimes, C., R. Boardman, H. McCormick, and G. Vignali. 2023. “Investigating the Impact of Body Shape on Garment Fit.” Journal of Fashion Marketing and Management 27 (5): 741–59.

Daniels, Gilbert. 1952. “The Average Man?”

Everitt, Brian, and Torsten Hothorn. 2011. “Cluster Analysis.” In, 163–200. Use R! Springer.

Gupta, D., and B. R. Gangadhar. 2004. “A Statistical Model for Developing Body Size Charts for Garments.” International Journal of Clothing Science and Technology 16 (5): 458–69.

Gurhan Kok, A., Marshall Fisher, and Ramnath Vaidyanathan. 2009. “Assortment Planning: Review of Literature and Industry Practice.” In. International Series in Operations Research and Management Science. Springer.

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning with Applications in R. 1st ed. Springer Texts in Statistics. Springer.

Kuhn, Max, and Hadley Wickham. 2020. “Tidymodels: A Collection of Packages for Modeling and Machine Learning Using Tidyverse Principles.” https://www.tidymodels.org.

Otieno, Rose. 2008. “Approaches in Researching Human Measurement: MMU Model of Utilising Anthropometric Data to Create Size Charts.” EuroMed Journal of Business 3 (May): 63–82. https://doi.org/10.1108/14502190810873821.

R Core Team. 2024. “R: A Language and Environment for Statistical Computing.” https://www.R-project.org/.

Robinson, David, Alex Hayes, and Simon Couch. 2024. “Broom: Convert Statistical Objects into Tidy Tibbles.” https://CRAN.R-project.org/package=broom.

Ruiz, Nataniel, Miriam Bellver Bueno, Timo Bolkart, Ambuj Arora, Ming Lin, Javier Romero, and Raja Bala. 2022. “Human Body Measurement Estimation with Adversarial Augmentation.” In. https://www.amazon.science/publications/human-body-measurement-estimation-with-adversarial-augmentation.

Scrucca, L., C. Fraley, T. B. Murphy, and A. E. Raftery. 2023. Model-Based Clustering, Classification, and Density Estimation Using mclust in R. Chapman & Hall/CRC Press.

Tang, Yuan, Masaaki Horikoshi, and Wenxuan Li. 2016. “ggfortify: Unified Interface to Visualize Statistical Result of Popular R Packages.” The R Journal 8. https://doi.org/10.32614/RJ-2016-060.

Webster, James M., Jeremy Cornolo, and Yohann Kelkel. 2012. “Comparison of Female Shape Analysis Methods for the Development of a New Sizing System.” In. Lugano, Switzerland: https://doi.org/10.15221/12.280.

Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4: 1686. https://doi.org/10.21105/joss.01686.

Wickham, Hadley, Jim Hester, Winston Chang, and Jennifer Bryan. 2022. “devtools: Tools to Make Developing R Packages Easier.” https://CRAN.R-project.org/package=devtools.

Xie, Yihui. 2024. “knitr: A General-Purpose Package for Dynamic Report Generation in R.” https://yihui.org/knitr/.

Zhu, Hao. 2024. “kableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax.” https://CRAN.R-project.org/package=kableExtra.

Footnotes

1. Statistical learning is also called machine learning.
2. The BodyM data set was accessed on 17/11/2024 from https://registry.opendata.aws/bodym
3. https://posit.co/