Using Statistical Learning to Construct Data Defined Size Charts for Stock Management Purposes
Technical Report
Abstract
The data workflow described in this technical report offers a predictive model for the construction of garment size charts using statistical learning techniques (James et al. 2013). This contrasts with previous research, which employed descriptive and exploratory statistics to summarize various body measurements in support of better garment fit. The garment size charts constructed by this research are designed to support better stock management decisions rather than garment comfort. The size chart measurements were derived by applying regression models to 800 female subjects whose measurements were obtained by computer vision technology. The research found that the regression models are sensitive to body shape and that body measurements are predicted more accurately when the data set of subjects is divided into body shape subgroups. All models were developed in the statistical programming language R (see Section 3.2.1).
Introduction
This research report employs the BodyM set of body measurements (Ruiz et al. 2022), hosted on Amazon Web Services (AWS), to construct a set of size charts that exploit body shape metrics for inventory management purposes. The BodyM data sets are derived from scanned images of real people. This large public data set includes 8,978 frontal and lateral silhouettes for 2,505 real subjects, paired with height, weight, gender, and 14 body measurements in cm: ankle girth, arm-length, bicep girth, calf girth, chest girth, forearm girth, height, hip girth, leg-length, shoulder-breadth, shoulder-to-crotch length, thigh girth, waist girth, and wrist girth. The ethnicity distribution of BodyM is: White 40%, Asian 30%, Black/African American 14%, American Indian or Alaska Native 1%, Other 15%; with 15% of the individuals also indicating Hispanic.
The BodyM data set is available from AWS. The data sets are split into 3 collections: Training, Test Set A, Test Set B. For the training and Test-A sets, used in this study, subjects are photographed and 3D-scanned in a lab by technicians.
For the current study we extracted 800 female subjects from the data set, with their height, weight, bust girth, hip girth, and waist girth measurements, to construct a set of garment size charts. How well these size charts map onto the female population was then estimated to support retail inventory management operations.
The study initially explored the mathematical relationship between subjects’ Height, Bust Girth, Hip Girth, and Waist Girth measurements (Boslaugh and Watters 2008).
BodyM Data Exploration
| Statistic | BustGirth | HipGirth | WaistGirth | Weight_kg | Height_cm |
|---|---|---|---|---|---|
| Min. | 77.00 | 80.5 | 64.00 | 28.50 | 141.0 |
| 1st Qu. | 88.50 | 94.0 | 76.00 | 56.00 | 159.0 |
| Median | 94.00 | 99.0 | 82.00 | 63.00 | 163.0 |
| Mean | 97.16 | 101.4 | 84.88 | 66.65 | 163.3 |
| 3rd Qu. | 102.62 | 106.1 | 90.50 | 73.00 | 167.6 |
| Max. | 154.00 | 158.5 | 147.00 | 156.50 | 186.5 |
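Summary statistics of this kind, together with pairwise correlations, are a quick first check for linear relationships. The following is a minimal sketch in base R; the data frame `measurements` and its values are synthetic stand-ins, not the BodyM data.

```r
# Synthetic stand-in for the female subset; the values are made up.
set.seed(1)
measurements <- data.frame(
  BustGirth  = round(rnorm(50, mean = 97, sd = 11), 1),
  HipGirth   = round(rnorm(50, mean = 101, sd = 10), 1),
  WaistGirth = round(rnorm(50, mean = 85, sd = 12), 1)
)

# Five-number summary plus mean for every column
girth_summary <- summary(measurements)
print(girth_summary)

# Pairwise Pearson correlations between the girth measurements
girth_cor <- cor(measurements)
print(round(girth_cor, 2))
```

Because the synthetic columns are drawn independently, the correlations here will be near zero; on real body measurements they would be expected to be strongly positive.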
The question to be addressed: Is there a linear relationship between Height, Bust Girth, Hip Girth, and Waist Girth measurements that can be exploited to produce garment size charts that will predict consumer demand for each size of garment?
Figure 1 suggests there are clear linear relationships between the bust girth, hip girth, and waist girth measurements, which promise the existence of predictive models. Furthermore, these linear relationships become stronger once the subjects in the data set have been mapped to a body shape classification. The garment industry recognizes 6 different body shape classifications: Rectangle, Hourglass, Top Hourglass, Bottom Hourglass, Triangle, and Inverted Triangle. These classifications are self-explanatory and we will not spend space discussing their rationale for garment design and body image research, which is covered by Chrimes et al. (2023) and Webster, Cornolo, and Kelkel (2012).
BodyM Body Shape Distribution
| Body Shapes | N | % |
|---|---|---|
| Rectangle | 670 | 83.75 |
| Bottom Hourglass | 81 | 10.125 |
| Triangle | 22 | 2.75 |
| Inverted Triangle | 11 | 1.375 |
| Hourglass | 9 | 1.125 |
| Top Hourglass | 7 | 0.875 |
The 3D scatter plot, Figure 2, illustrates the increased coverage once regression models have been applied to body shape groups rather than the aggregated data set of subjects. The regression lines, once decoupled, fan out to intersect the majority of data points.
However, these plots do not explain the impact of subject height on body shape.
An ANOVA (Boslaugh and Watters 2008) provides insight.
| term | df | sumsq | meansq | statistic | p.value |
|---|---|---|---|---|---|
| BodyShape | 5 | 1313.647 | 262.72938 | 6.048538 | 1.66e-05 |
| Residuals | 794 | 34488.853 | 43.43684 | NA | NA |
The low p-value (1.66e-05) shown in Table 3 indicates a statistically significant association between body shape and height. We therefore need to further divide the data set into distinct height categories. Figure 3 allows us to observe height distributions according to body shape.
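A one-way ANOVA of this kind can be sketched in base R as follows. The data frame `subjects` is synthetic and only illustrates the call; the last lines recompute the F statistic and p-value from the mean squares and degrees of freedom reported in the table.

```r
# Synthetic data: three body shape groups with slightly different mean heights.
set.seed(42)
subjects <- data.frame(
  BodyShape = factor(rep(c("Rectangle", "BottomHourglass", "Triangle"),
                         times = c(60, 25, 15))),
  Height_cm = c(rnorm(60, 163, 6.5), rnorm(25, 161, 6.5), rnorm(15, 167, 6.5))
)

# One-way ANOVA: does mean height differ between body shape groups?
height_aov  <- aov(Height_cm ~ BodyShape, data = subjects)
anova_table <- summary(height_aov)[[1]]
print(anova_table)

# The F statistic is the ratio of the two mean squares in the reported table
f_reported <- 262.72938 / 43.43684
# Upper-tail probability of that F value on (5, 794) degrees of freedom
p_reported <- pf(f_reported, df1 = 5, df2 = 794, lower.tail = FALSE)
print(c(F = f_reported, p = p_reported))
```

A p-value this small means the observed differences in mean height between body shapes would be very unlikely if all groups shared the same mean height.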
The thick line in the middle of the box is the median value. The box itself shows the first and third quartiles. The whiskers extending from the bottom and top of the box identify the range of the data excluding outliers. The dots show the outliers.
The rectangle body shape has some significant outliers that would distort conclusions drawn from a garment size chart that included the heights of all subjects.
To keep the analysis relatively simple, we use the percentile ranking of heights within each body shape group to separate body shapes into Short, Regular, and Tall classifications. Note that these rankings are unique to each body shape. For Rectangle, Short is the interval from the minimum to the 1st quartile [141cm, 159cm], Regular is the interval from the 1st quartile to the 3rd quartile [160cm, 168cm], and Tall is the interval from the 3rd quartile to the maximum [169cm, 187cm]. All body shapes are classified according to the percentile ranking of their height measurements.
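The quartile-based height classification can be sketched in base R as below. The data are synthetic, and `classify_height` is a hypothetical helper written for this illustration; how boundary values are assigned (here a height equal to the 1st quartile counts as Short) is an assumption.

```r
# Synthetic heights for two body shapes; the values are made up.
set.seed(7)
subjects <- data.frame(
  BodyShape = rep(c("Rectangle", "Triangle"), times = c(80, 20)),
  Height_cm = round(c(rnorm(80, 163, 6.5), rnorm(20, 166, 6)), 1)
)

# Label heights relative to the quartiles of the vector they are passed in:
# Short = [min, Q1], Regular = (Q1, Q3], Tall = (Q3, max].
classify_height <- function(h) {
  q <- quantile(h, probs = c(0.25, 0.75))
  cut(h,
      breaks = c(-Inf, q[1], q[2], Inf),
      labels = c("Short", "Regular", "Tall"))
}

# ave() applies the classification separately within each body shape group
length_code <- ave(subjects$Height_cm, subjects$BodyShape,
                   FUN = function(h) as.integer(classify_height(h)))
subjects$Length <- factor(length_code, levels = 1:3,
                          labels = c("Short", "Regular", "Tall"))

print(table(subjects$BodyShape, subjects$Length))
```

Because the quartiles are computed per group, the same height can be Regular for one body shape and Short or Tall for another.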
Rectangle Height Classification
Hourglass Height Classification
Top Hourglass Height Classification
Bottom Hourglass Height Classification
Triangle Height Classification
Inverted Triangle Height Classification
This now allows us to compute the percentage that each body shape/height pairing contributes to its body shape group.
| BodyShape | Length | Count | Percentage |
|---|---|---|---|
| Rectangle | Regular | 377 | 56.5 |
| Rectangle | Short | 157 | 23.5 |
| Rectangle | Tall | 136 | 20.5 |
| BottomHourglass | Regular | 42 | 52.0 |
| BottomHourglass | Short | 31 | 38.5 |
| Triangle | Tall | 11 | 50.0 |
| Hourglass | Regular | 9 | 100.0 |
| BottomHourglass | Tall | 8 | 10.0 |
| Triangle | Regular | 8 | 36.5 |
| InvertedTriangle | Regular | 5 | 45.5 |
| InvertedTriangle | Short | 5 | 45.5 |
| TopHourglass | Regular | 4 | 57.0 |
| TopHourglass | Tall | 3 | 43.0 |
| Triangle | Short | 3 | 13.5 |
| InvertedTriangle | Tall | 1 | 9.0 |
Regular-height Rectangle subjects make up 56.5% of the Rectangle group (377 of 670), or about 47% of the 800 subjects, while the Tall Inverted Triangle group has only a single member. The Inverted Triangle is clearly under-represented in the data set and will therefore not be used for predictive models.
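The percentages in the table are computed within each body shape. The sketch below reproduces the Rectangle rows from the counts and contrasts them with the share of all 800 subjects. Rounding to the nearest 0.5 appears to be the convention used in the table; `round_to_half` is a hypothetical helper, not the report's code.

```r
# Counts for the Rectangle body shape, taken from the table
rectangle_counts <- c(Regular = 377, Short = 157, Tall = 136)

# Round to the nearest 0.5 (assumed rounding convention)
round_to_half <- function(x) round(x * 2) / 2

# Share of each height class within the Rectangle group
share_within_shape <- round_to_half(100 * rectangle_counts / sum(rectangle_counts))
# Share of each height class among all 800 subjects
share_of_all <- round_to_half(100 * rectangle_counts / 800)

print(rbind(share_within_shape, share_of_all))
```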
Linear Modelling
Linear regression models are generated for each set of subjects defined by body shape and height. The waist girth measurement will predict the hip girth measurement, and the waist girth measurement and the hip girth measurements combined will predict the bust girth measurement. The resulting coefficients and statistics for body shape and height groups are shown in the following tables.
The critical regression statistics that decide whether the approach is worth pursuing are the residuals, R², and the F-statistic. The coefficients will eventually provide the predictive model, but for now we only need to verify the approach.
The residual standard error estimates the standard deviation of the residuals, that is, the typical distance in cm between a subject's measurement and the model's prediction. The residuals describe how subjects are distributed around the regression model and should be approximately normally distributed. Marked deviations from normality point to outliers or a misspecified model and cast doubt on the validity of the model.
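The two models fitted for each group can be sketched with base R's `lm()`. The data below are synthetic, generated with coefficients loosely resembling those reported for the Regular Rectangle group; the sketch only shows how the residual standard error, R², and F-statistic are obtained.

```r
# Synthetic group of subjects; the data are made up.
set.seed(123)
n <- 200
waist <- runif(n, 65, 130)
hip   <- 37 + 0.75 * waist + rnorm(n, sd = 3.7)
bust  <- 14 + 0.69 * waist + 0.25 * hip + rnorm(n, sd = 3.9)
group_df <- data.frame(WaistGirth = waist, HipGirth = hip, BustGirth = bust)

# Model (1): waist girth predicts hip girth
hip_lm <- lm(HipGirth ~ WaistGirth, data = group_df)
# Model (2): waist girth and hip girth together predict bust girth
bust_lm <- lm(BustGirth ~ WaistGirth + HipGirth, data = group_df)

# Residual standard error (sigma), R-squared and F statistic for model (1)
hip_summary <- summary(hip_lm)
print(c(sigma     = hip_summary$sigma,
        r.squared = hip_summary$r.squared,
        f         = unname(hip_summary$fstatistic["value"])))

# Quartiles of the residuals: half of the subjects lie between Q1 and Q3
print(quantile(residuals(hip_lm), probs = c(0.25, 0.5, 0.75)))
```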
Regular Rectangle (n = 377)
| | Dependent variable: HipGirth (1) | Dependent variable: BustGirth (2) |
|---|---|---|
| HipGirth | | 0.245*** (0.054) |
| WaistGirth | 0.751*** (0.016) | 0.693*** (0.044) |
| Constant | 37.117*** (1.381) | 13.636*** (2.491) |
| Observations | 377 | 377 |
| R² | 0.853 | 0.878 |
| Adjusted R² | 0.852 | 0.877 |
| Residual Std. Error | 3.694 (df = 375) | 3.895 (df = 374) |
| F Statistic | 2,168.631*** (df = 1; 375) | 1,341.228*** (df = 2; 374) |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
Figure 10 illustrates the importance of residuals. The graph suggests that for regular rectangle body shapes 50% of subjects are within 2.5 cm of the model’s prediction.
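The 2.5 cm figure is consistent with the reported residual standard error if the residuals are roughly normal. The following check assumes that Figure 10 refers to the hip girth model (residual standard error 3.694) and that its residuals are normally distributed.

```r
# Residual standard error of the Regular Rectangle hip girth model
sigma_hip <- 3.694

# For a normal distribution the central 50% lies within
# qnorm(0.75) standard deviations of the mean
half_width_50 <- qnorm(0.75) * sigma_hip
print(round(half_width_50, 2))

# Conversely: expected share of subjects within 2.5 cm of the prediction
share_within_2_5 <- pnorm(2.5 / sigma_hip) - pnorm(-2.5 / sigma_hip)
print(round(share_within_2_5, 3))
```

Under these assumptions the half-width of the central 50% is about 2.49 cm, which agrees with the reading of the figure.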
R² (the coefficient of determination) and adjusted R² measure the model’s quality. For the Regular Rectangle group, the model lm(HipGirth ~ WaistGirth) explains 85% of the variance in hip girth, and the model lm(BustGirth ~ WaistGirth + HipGirth) explains 88% of the variance in bust girth.
The F-statistic tells us whether a model is statistically significant. It tests the null hypothesis that all slope coefficients are zero; the model is significant if the data give strong evidence that at least one coefficient is non-zero. The F-statistic therefore tells us whether it is worth continuing to the next stage of the workflow, in which the models are used to generate size charts.
The p-value of p<0.01 indicates high statistical significance for the model.
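The F-statistic and R² are two views of the same fit: for a model with k predictors and n observations, F = (R²/k) / ((1 − R²)/(n − k − 1)). The sketch below applies this identity to the rounded R² values reported for the Regular Rectangle group; because the inputs are rounded, the results only approximate the reported F-statistics. `f_from_r2` is a helper introduced for this illustration.

```r
# F statistic of a linear model from R-squared, n observations and k predictors
f_from_r2 <- function(r2, n, k) (r2 / k) / ((1 - r2) / (n - k - 1))

# Regular Rectangle, model (1): R2 = 0.853, n = 377, one predictor
f_hip <- f_from_r2(0.853, n = 377, k = 1)
# Regular Rectangle, model (2): R2 = 0.878, n = 377, two predictors
f_bust <- f_from_r2(0.878, n = 377, k = 2)
print(round(c(f_hip, f_bust), 1))

# p-value of model (1) under the null hypothesis that the slope is zero
p_hip <- pf(f_hip, df1 = 1, df2 = 375, lower.tail = FALSE)
print(p_hip)
```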
The validity of the models will be clear when we employ the models to generate size charts and calculate the percentage of the sample population that each size chart entry covers.
Short Rectangle (n = 157)
| | Dependent variable: HipGirth (1) | Dependent variable: BustGirth (2) |
|---|---|---|
| HipGirth | | 0.261*** (0.061) |
| WaistGirth | 0.788*** (0.028) | 0.652*** (0.053) |
| Constant | 33.151*** (2.491) | 16.065*** (2.784) |
| Observations | 157 | 157 |
| R² | 0.832 | 0.911 |
| Adjusted R² | 0.831 | 0.910 |
| Residual Std. Error | 4.906 (df = 155) | 3.746 (df = 154) |
| F Statistic | 765.842*** (df = 1; 155) | 787.911*** (df = 2; 154) |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
Tall Rectangle (n = 136)
| | Dependent variable: HipGirth (1) | Dependent variable: BustGirth (2) |
|---|---|---|
| HipGirth | | 0.266*** (0.088) |
| WaistGirth | 0.855*** (0.025) | 0.634*** (0.079) |
| Constant | 29.904*** (2.259) | 16.147*** (3.478) |
| Observations | 136 | 136 |
| R² | 0.894 | 0.894 |
| Adjusted R² | 0.893 | 0.892 |
| Residual Std. Error | 3.578 (df = 134) | 3.627 (df = 133) |
| F Statistic | 1,125.440*** (df = 1; 134) | 560.652*** (df = 2; 133) |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
Regular Bottom Hourglass (n = 42)
| | Dependent variable: HipGirth (1) | Dependent variable: BustGirth (2) |
|---|---|---|
| HipGirth | | 0.039 (0.127) |
| WaistGirth | 1.289*** (0.070) | 0.888*** (0.173) |
| Constant | 0.398 (5.192) | 16.291*** (4.160) |
| Observations | 42 | 42 |
| R² | 0.893 | 0.876 |
| Adjusted R² | 0.891 | 0.870 |
| Residual Std. Error | 2.722 (df = 40) | 2.180 (df = 39) |
| F Statistic | 334.506*** (df = 1; 40) | 138.364*** (df = 2; 39) |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
Short Bottom Hourglass (n = 31)
| | Dependent variable: HipGirth (1) | Dependent variable: BustGirth (2) |
|---|---|---|
| HipGirth | | 0.442*** (0.086) |
| WaistGirth | 1.254*** (0.138) | 0.101 (0.125) |
| Constant | 2.044 (9.747) | 35.491*** (4.498) |
| Observations | 31 | 31 |
| R² | 0.740 | 0.826 |
| Adjusted R² | 0.731 | 0.813 |
| Residual Std. Error | 3.247 (df = 29) | 1.497 (df = 28) |
| F Statistic | 82.382*** (df = 1; 29) | 66.234*** (df = 2; 28) |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
Tall Triangle (n = 11)
| | Dependent variable: HipGirth (1) | Dependent variable: BustGirth (2) |
|---|---|---|
| HipGirth | | 0.571* (0.280) |
| WaistGirth | 0.971*** (0.137) | 0.436 (0.296) |
| Constant | 23.596* (11.784) | -3.835 (11.922) |
| Observations | 11 | 11 |
| R² | 0.848 | 0.907 |
| Adjusted R² | 0.831 | 0.884 |
| Residual Std. Error | 2.325 (df = 9) | 1.957 (df = 8) |
| F Statistic | 50.274*** (df = 1; 9) | 39.054*** (df = 2; 8) |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
Regular Hourglass (n = 9)
| | Dependent variable: HipGirth (1) | Dependent variable: BustGirth (2) |
|---|---|---|
| HipGirth | | 0.520 (0.380) |
| WaistGirth | 0.844*** (0.069) | 0.553 (0.328) |
| Constant | 35.679*** (5.849) | -0.375 (14.786) |
| Observations | 9 | 9 |
| R² | 0.955 | 0.972 |
| Adjusted R² | 0.949 | 0.962 |
| Residual Std. Error | 0.918 (df = 7) | 0.923 (df = 6) |
| F Statistic | 149.540*** (df = 1; 7) | 103.196*** (df = 2; 6) |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
Tall Bottom Hourglass (n = 8)
| | Dependent variable: HipGirth (1) | Dependent variable: BustGirth (2) |
|---|---|---|
| HipGirth | | 1.091* (0.445) |
| WaistGirth | 0.879*** (0.127) | -0.409 (0.415) |
| Constant | 32.238** (10.334) | 11.799 (18.260) |
| Observations | 8 | 8 |
| R² | 0.889 | 0.814 |
| Adjusted R² | 0.871 | 0.739 |
| Residual Std. Error | 2.164 (df = 6) | 2.361 (df = 5) |
| F Statistic | 48.152*** (df = 1; 6) | 10.907** (df = 2; 5) |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
Regular Triangle (n = 8)
| | Dependent variable: HipGirth (1) | Dependent variable: BustGirth (2) |
|---|---|---|
| HipGirth | | 1.038 (0.755) |
| WaistGirth | 1.331*** (0.114) | -0.391 (1.026) |
| Constant | -8.733 (10.028) | 18.216 (19.672) |
| Observations | 8 | 8 |
| R² | 0.958 | 0.828 |
| Adjusted R² | 0.951 | 0.760 |
| Residual Std. Error | 1.766 (df = 6) | 3.265 (df = 5) |
| F Statistic | 137.359*** (df = 1; 6) | 12.076** (df = 2; 5) |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
Regular Inverted Triangle (n = 5)
| | Dependent variable: HipGirth (1) | Dependent variable: BustGirth (2) |
|---|---|---|
| HipGirth | | 0.128 (0.295) |
| WaistGirth | 0.932* (0.336) | 0.892 (0.324) |
| Constant | 11.267 (32.990) | 15.389 (17.198) |
| Observations | 5 | 5 |
| R² | 0.720 | 0.946 |
| Adjusted R² | 0.627 | 0.892 |
| Residual Std. Error | 5.754 (df = 3) | 2.943 (df = 2) |
| F Statistic | 7.712* (df = 1; 3) | 17.470* (df = 2; 2) |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
Short Inverted Triangle (n = 5)
| | Dependent variable: HipGirth (1) | Dependent variable: BustGirth (2) |
|---|---|---|
| HipGirth | | 0.276 (0.608) |
| WaistGirth | 0.665** (0.138) | 0.713 (0.430) |
| Constant | 38.761* (12.298) | 16.767 (26.910) |
| Observations | 5 | 5 |
| R² | 0.886 | 0.950 |
| Adjusted R² | 0.848 | 0.901 |
| Residual Std. Error | 2.394 (df = 3) | 2.522 (df = 2) |
| F Statistic | 23.303** (df = 1; 3) | 19.132** (df = 2; 2) |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
Regular Top Hourglass (n = 4)
| | Dependent variable: HipGirth (1) | Dependent variable: BustGirth (2) |
|---|---|---|
| HipGirth | | 1.109 (0.369) |
| WaistGirth | 0.194 (0.158) | -0.202 (0.109) |
| Constant | 105.073** (18.516) | 12.556 (39.965) |
| Observations | 4 | 4 |
| R² | 0.432 | 0.900 |
| Adjusted R² | 0.148 | 0.701 |
| Residual Std. Error | 2.394 (df = 2) | 1.249 (df = 1) |
| F Statistic | 1.523 (df = 1; 2) | 4.525 (df = 2; 1) |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
Tall Top Hourglass (n = 3)
| | Dependent variable: HipGirth (1) | Dependent variable: BustGirth (2) |
|---|---|---|
| HipGirth | | 0.407 |
| WaistGirth | 0.765* (0.096) | 0.491 |
| Constant | 38.052 (10.027) | 24.930 |
| Observations | 3 | 3 |
| R² | 0.985 | 1.000 |
| Adjusted R² | 0.969 | |
| Residual Std. Error | 4.022 (df = 1) | |
| F Statistic | 63.587* (df = 1; 1) | |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
Short Triangle (n = 3)
| | Dependent variable: HipGirth (1) | Dependent variable: BustGirth (2) |
|---|---|---|
| HipGirth | | 0.507 |
| WaistGirth | 0.885 (0.634) | 0.567 |
| Constant | 30.087 (53.940) | -8.328 |
| Observations | 3 | 3 |
| R² | 0.661 | 1.000 |
| Adjusted R² | 0.322 | |
| Residual Std. Error | 3.502 (df = 1) | |
| F Statistic | 1.949 (df = 1; 1) | |

Note: standard errors in parentheses; *p<0.1; **p<0.05; ***p<0.01
Tall Inverted Triangle (n = 1)
| | Dependent variable: HipGirth (1) | Dependent variable: BustGirth (2) |
|---|---|---|
| HipGirth | | |
| WaistGirth | | |
| Constant | 119.500 | 128.000 |
| Observations | 1 | 1 |
| R² | 0.000 | 0.000 |
| Adjusted R² | 0.000 | 0.000 |

Note: *p<0.1; **p<0.05; ***p<0.01
The presentation of the linear regression models has been labored, but the validity of these models is critical to the construction of size charts.
The number of subjects in each body shape/height classification is critical to linear prediction models. Once the number of subjects drops below, say, 10, we need to question the usefulness of the model, because it is doubtful whether such small sets of subjects are representative of the wider population.
In a definitive anthropometric study for the U.S. Air Force, Gilbert Daniels observed: “The ‘average [person]’ is a misleading and illusory concept as a basis for design criteria, and it is particularly so when more than one dimension is being considered” (Daniels 1952).
While this observation provides insight for garment designers, we stress that garment design is not the focus of this study. For this study, the goal is to anticipate the proportion of the consumer population that meets the measurements predicted by the charts.
Size Chart Construction
Size charts are constructed for each body shape/height group. Size charts are defined by the linear regression models discussed in Section 3. The machinery for producing the size charts employs a workflow made possible by Posit’s tidymodels framework (Kuhn and Wickham 2020).
    # Requires tidymodels (recipes, parsnip, workflows; it also attaches dplyr).
    # round_half_up() and calculate_percentage() are helper functions that are
    # not shown here; working_female_df holds the classified female subjects.
    library(tidymodels)

    sizeChartWorkflow <- function(bshape, ht, d) {
      # Extract the body shape / height group for size chart prediction
      df <- working_female_df %>%
        select(BustGirth, HipGirth, WaistGirth, BodyShape, Length) %>%
        filter(BodyShape == bshape & Length == ht)

      # Recipe for predicting HipGirth from WaistGirth
      hip_recipe <-
        recipe(HipGirth ~ WaistGirth, data = df)

      # Define a linear regression model (reused for the bust model below)
      hip_model <- linear_reg() %>%
        set_engine("lm")

      # Create a workflow
      hip_workflow <- workflow() %>%
        add_recipe(hip_recipe) %>%
        add_model(hip_model)

      # Fit the model
      hip_fit <- fit(hip_workflow, data = df)

      # d evenly spaced WaistGirth values across the group's observed range
      sequence <-
        round_half_up(
          seq(from = min(df$WaistGirth),
              to = max(df$WaistGirth), length.out = d))

      # Seed the size chart with the d waist girth divisions
      sizeChart <- data.frame(WaistGirth = sequence)

      # Predict HipGirth
      predicted_hip <- predict(hip_fit, sizeChart)
      sizeChart$HipGirth <- predicted_hip$.pred

      # Recipe for predicting BustGirth from
      # WaistGirth and predicted HipGirth
      bust_recipe <-
        recipe(BustGirth ~ WaistGirth + HipGirth,
               data = df)

      # Create a workflow
      bust_workflow <- workflow() %>%
        add_recipe(bust_recipe) %>%
        add_model(hip_model)

      # Fit the model
      bust_fit <- fit(bust_workflow, data = df)

      # Predict BustGirth using new WaistGirth and predicted HipGirth
      final_predictions <- predict(bust_fit, sizeChart)
      sizeChart$BustGirth <- final_predictions$.pred

      # Sort by WaistGirth and round all three measurements
      sorted_sizeChart <- sizeChart %>%
        arrange(WaistGirth) %>%
        mutate_at(vars(BustGirth,
                       HipGirth,
                       WaistGirth), round_half_up)

      # Percentage of the group's subjects covered by each size chart entry
      percentages_df <-
        calculate_percentage(sorted_sizeChart, df) %>%
        mutate_at(vars(Combined_Percentage), round_half_up)
      percentages_df$Combined_Percentage <-
        paste0(percentages_df$Combined_Percentage, "%")

      # The last chart row has no percentage, so pad with a blank entry
      Percent <- c(percentages_df$Combined_Percentage, " ")
      sorted_sizeChart <- cbind(sorted_sizeChart, Percent)

      # Append units for display
      sorted_sizeChart$WaistGirth <-
        paste0(sorted_sizeChart$WaistGirth, " cm ")
      sorted_sizeChart$HipGirth <-
        paste0(sorted_sizeChart$HipGirth, " cm ")
      sorted_sizeChart$BustGirth <-
        paste0(sorted_sizeChart$BustGirth, " cm ")

      return(sorted_sizeChart)
    }

| WaistGirth | HipGirth | BustGirth | Percent |
|---|---|---|---|
| 67 cm | 87.5 cm | 81.5 cm | 42% |
| 82.5 cm | 99 cm | 95 cm | 19% |
| 98 cm | 110.5 cm | 108.5 cm | 5% |
| 114 cm | 122.5 cm | 122.5 cm | 1% |
| 129.5 cm | 134.5 cm | 136.5 cm | |
| WaistGirth | HipGirth | BustGirth | Percent |
|---|---|---|---|
| 68.5 cm | 87 cm | 83.5 cm | 72.5% |
| 94.5 cm | 107.5 cm | 106 cm | 16% |
| 121 cm | 128.5 cm | 128.5 cm | 0.5% |
| 147 cm | 149 cm | 151 cm | |
| WaistGirth | HipGirth | BustGirth | Percent |
|---|---|---|---|
| 68.5 cm | 88.5 cm | 83 cm | 25% |
| 81.5 cm | 99.5 cm | 94.5 cm | 23.5% |
| 94.5 cm | 110.5 cm | 105.5 cm | 8% |
| 107.5 cm | 122 cm | 117 cm | 3% |
| 120.5 cm | 133 cm | 128 cm | 0.5% |
| 133.5 cm | 144 cm | 139 cm | |
| WaistGirth | HipGirth | BustGirth | Percent |
|---|---|---|---|
| 64.5 cm | 83.5 cm | 77 cm | 31% |
| 70.5 cm | 91 cm | 82.5 cm | 9.5% |
| 76 cm | 98.5 cm | 87.5 cm | 7% |
| 82 cm | 106 cm | 93.5 cm | 7% |
| 88 cm | 114 cm | 99 cm | |
| WaistGirth | HipGirth | BustGirth | Percent |
|---|---|---|---|
| 64 cm | 82.5 cm | 78.5 cm | 64.5% |
| 71.5 cm | 91.5 cm | 83 cm | 9.5% |
| 79 cm | 101 cm | 88 cm | 3% |
| 86.5 cm | 110.5 cm | 93 cm | |
| WaistGirth | HipGirth | BustGirth | Percent |
|---|---|---|---|
| 80 cm | 101.5 cm | 89 cm | 54.5% |
| 89 cm | 110 cm | 98 cm | 18% |
| 97.5 cm | 118.5 cm | 106.5 cm | |
| WaistGirth | HipGirth | BustGirth | Percent |
|---|---|---|---|
| 76.5 cm | 100 cm | 94 cm | 22% |
| 82.5 cm | 105.5 cm | 100 cm | 44.5% |
| 88.5 cm | 110.5 cm | 106 cm | |
| WaistGirth | HipGirth | BustGirth | Percent |
|---|---|---|---|
| 73 cm | 96.5 cm | 87 cm | 25% |
| 81 cm | 103.5 cm | 91.5 cm | 25% |
| 88.5 cm | 110 cm | 95.5 cm | |
| WaistGirth | HipGirth | BustGirth | Percent |
|---|---|---|---|
| 81.5 cm | 99.5 cm | 90 cm | 50% |
| 86.5 cm | 106.5 cm | 95 cm | 12.5% |
| 91 cm | 112.5 cm | 99.5 cm | 12.5% |
| 96 cm | 119 cm | 104 cm | |
| WaistGirth | HipGirth | BustGirth | Percent |
|---|---|---|---|
| 90.5 cm | 95.5 cm | 108.5 cm | 80% |
| 112.5 cm | 116 cm | 130.5 cm | |
| WaistGirth | HipGirth | BustGirth | Percent |
|---|---|---|---|
| 74.5 cm | 88.5 cm | 94 cm | 80% |
| 97 cm | 103.5 cm | 114.5 cm | |
| WaistGirth | HipGirth | BustGirth | Percent |
|---|---|---|---|
| 67.5 cm | 89.5 cm | 94.5 cm | 33.5% |
| 94 cm | 110 cm | 116 cm | 33.5% |
| 120.5 cm | 130 cm | 137 cm | |
| WaistGirth | HipGirth | BustGirth | Percent |
|---|---|---|---|
| 82.5 cm | 103 cm | 91 cm | 33.5% |
| 86 cm | 106 cm | 94.5 cm | 33.5% |
| 89.5 cm | 109.5 cm | 98 cm | |
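The helper `calculate_percentage()` called by `sizeChartWorkflow` is not shown in this report. One plausible reading of the Percent column, given the column name `Combined_Percentage`, is the share of subjects whose waist, hip, and bust girths all fall between two consecutive chart rows. The sketch below implements that reading; `coverage_between_rows` is a hypothetical stand-in and the four subjects are made up.

```r
# Hypothetical stand-in for calculate_percentage(); NOT the report's code.
# For each pair of consecutive chart rows, return the share of subjects whose
# waist, hip and bust girths all lie in the half-open interval [row i, row i + 1).
coverage_between_rows <- function(chart, subjects) {
  in_range <- function(x, lo, hi) x >= lo & x < hi
  pct <- vapply(seq_len(nrow(chart) - 1), function(i) {
    covered <-
      in_range(subjects$WaistGirth, chart$WaistGirth[i], chart$WaistGirth[i + 1]) &
      in_range(subjects$HipGirth,   chart$HipGirth[i],   chart$HipGirth[i + 1]) &
      in_range(subjects$BustGirth,  chart$BustGirth[i],  chart$BustGirth[i + 1])
    100 * mean(covered)
  }, numeric(1))
  data.frame(Combined_Percentage = pct)
}

# First three rows of the first size chart
chart <- data.frame(WaistGirth = c(67, 82.5, 98),
                    HipGirth   = c(87.5, 99, 110.5),
                    BustGirth  = c(81.5, 95, 108.5))

# Four made-up subjects: two fit the first size, one the second, one neither
subjects <- data.frame(WaistGirth = c(70, 80, 90, 75),
                       HipGirth   = c(90, 95, 105, 120),
                       BustGirth  = c(85, 90, 100, 85))

coverage <- coverage_between_rows(chart, subjects)
print(coverage)
```

Under this reading the percentages need not sum to 100%, because a subject whose three measurements fall into different chart intervals is not counted for any size.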
From an inventory management perspective, size charts resemble a kind of sieve. The Rectangle sieve screens 83.75% of female consumers, who are then sifted according to height: 56.5% are between 159 cm and 168 cm. The sieving further predicts that 42% of those consumers have waist measurements in [67 cm, 82.5 cm], hip measurements in [87.5 cm, 99 cm], and bust measurements in [81.5 cm, 95 cm]. The charts are therefore part of the assortment planning process (Gurhan Kok, Fisher, and Vaidyanathan 2009) undertaken by retail operations managers. Clearly, this operations management perspective is not the way garment designers view size charts.
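The sieve can be read as a chain of conditional shares that multiply. The following sketch uses the three shares quoted in this paragraph; the order size of 1,000 garments is a hypothetical figure chosen only to show how a share translates into stock units.

```r
# Shares taken from the tables in this report
share_rectangle  <- 0.8375  # Rectangle body shape among all subjects
share_regular    <- 0.565   # Regular height among Rectangle subjects
share_first_size <- 0.42    # first size chart entry among Regular Rectangle subjects

# Successive sieving multiplies the conditional shares
share_of_population <- share_rectangle * share_regular * share_first_size
print(round(100 * share_of_population, 1))

# Hypothetical order of 1,000 garments: units to allocate to this size
order_size <- 1000
units <- round(order_size * share_of_population)
print(units)
```

Under these figures, roughly one in five female consumers would be expected to fall into the first Regular Rectangle size.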
Alternative Approaches
Other researchers have explored statistical models to develop size charts. Both Gupta and Gangadhar (2004) and Otieno (2008) employ statistical techniques. Interestingly, the statistical models they employ are exploratory rather than predictive. Gupta and Gangadhar (2004), for example, used Principal Component Analysis (PCA) to reduce the relationships between the various body measurements and identify the key measurements: Waist, Hip, and Bust. The resulting size charts were then validated with a nearest neighbor heuristic that minimized the Euclidean distance between the assigned measurements and the actual measurements given in the data. The method used, aggregate loss, appears to be a version of the k-means clustering algorithm (Everitt and Hothorn 2011). However, it is not clear from the aggregate loss formula (Equation 1) whether the hip, waist, and bust variables are treated as a structured record, with their relationship tied by a constant coefficient, or as unary values that are allowed to wander independently in 3D vector space.
\[ \text{Aggregate loss} = \sqrt{\frac{\sum \left[ (\text{assigned bust} - \text{actual bust})^2 + (\text{assigned hip} - \text{actual hip})^2 + (\text{assigned waist} - \text{actual waist})^2 \right]}{N}} \tag{1} \]
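Equation 1 can be sketched in base R as follows. The nearest-size assignment treats bust, hip, and waist as one 3D point per subject, which is one of the two readings discussed above; it is an illustration, not Gupta and Gangadhar's implementation. The chart rows are taken from the first size chart in this report and the three subjects are made up.

```r
# Aggregate loss as in Equation 1: root of the mean squared Euclidean distance
# between assigned and actual (bust, hip, waist) measurements.
aggregate_loss <- function(assigned, actual) {
  sq_dist <- rowSums((as.matrix(assigned) - as.matrix(actual))^2)
  sqrt(sum(sq_dist) / nrow(actual))
}

# Assign each subject the chart size nearest in Euclidean distance
assign_nearest_size <- function(actual, chart) {
  actual <- as.matrix(actual)
  chart  <- as.matrix(chart)
  nearest <- apply(actual, 1, function(subject) {
    # squared distance from this subject to every chart size
    which.min(colSums((t(chart) - subject)^2))
  })
  chart[nearest, , drop = FALSE]
}

chart <- data.frame(Bust  = c(81.5, 95, 108.5),
                    Hip   = c(87.5, 99, 110.5),
                    Waist = c(67, 82.5, 98))
actual <- data.frame(Bust  = c(84.5, 95, 106.5),
                     Hip   = c(87.5, 103, 110.5),
                     Waist = c(71, 82.5, 98))

assigned <- assign_nearest_size(actual, chart)
loss <- aggregate_loss(assigned, actual)
print(loss)
```

Here the squared distances are 25, 16, and 4, so the aggregate loss is the square root of 45/3 = 15, about 3.87 cm.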
Inspired by Gupta and Gangadhar (2004), we applied PCA to the BodyM data, which confirmed the decisions made in our analysis workflow, in particular the grouping of data into height categories.
The height vector, together with the other measurements that correlate positively with height, is orthogonal to the waist, hip, and chest vectors. The PCA biplot indicates correlations and illustrates the structure within a data set. The direction, length, and angular distance between vectors are the key indicators of the relationship between variables.
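A PCA of this kind can be sketched with base R's `prcomp()`. The data below are synthetic and built on the assumption that the three girths share a common factor while height is independent of them, which mirrors the orthogonality described above; the sketch does not use the BodyM data.

```r
# Synthetic data: three correlated girths plus an independent height.
set.seed(2024)
n <- 300
size_factor <- rnorm(n)  # shared "overall girth" factor (assumption)
body <- data.frame(
  WaistGirth = 85  + 12 * size_factor + rnorm(n, sd = 4),
  HipGirth   = 101 + 10 * size_factor + rnorm(n, sd = 4),
  BustGirth  = 97  + 11 * size_factor + rnorm(n, sd = 4),
  Height_cm  = rnorm(n, mean = 163, sd = 6.5)
)

# PCA on centred and standardised variables
pca <- prcomp(body, center = TRUE, scale. = TRUE)
print(summary(pca)$importance)

# Loadings on the first two components: variables that point the same way
# in the biplot have similar loadings
print(round(pca$rotation[, 1:2], 2))

# Draw the biplot only in an interactive session
if (interactive()) biplot(pca, cex = 0.6)
```

In the loadings, the three girths load heavily on the first component and height loads mainly on the second, which is what a near right angle between the height arrow and the girth arrows in a biplot expresses.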
It should be recognized that the major obstacle to establishing predictive models in anthropometric research is the small size of the available body measurement data sets. Collecting human body measurements, either manually or via computer vision technology, is an expensive, time-consuming activity. The volunteers are typically self-selecting: not everybody wants to be measured. Measurements taken from random human samples are clearly not homogeneous, even before girth measurements are analysed. The sample needs to be pre-processed according to gender, age, height, body shape, and maybe even ethnic origin. The BodyM data set contained 2,779 individuals, of which 800 females were used for this research report. Of those 800, only one individual represented the Tall Inverted Triangle body shape.
The 3D graph in Figure 2 illustrates the importance of predicting size measurements from body shape classification rather than from unprocessed sampled data.
The clustering methods typically employed by researchers in this field are based on nearest neighbor heuristics. They do not calculate the probability that a subject belongs to a certain classification. Once large data sets become available model-based clustering methods (Scrucca et al. 2023) may identify subtle subgroups that would enrich regression analysis.
The approach undertaken here developed a data analysis workflow that is extendable. Additional body measurements can be fed into the regression models to enrich the quality of the charts and allow further validation of the models.
Acknowledgements
The statistical analyses and visualizations in this document were performed using R, a free software environment for statistical computing and graphics. R is available from the R Foundation for Statistical Computing at https://www.r-project.org.
R packages used:
R base (2024)
broom (Robinson, Hayes, and Couch 2024)
Development tools (Wickham et al. 2022)
ggfortify (Tang, Horikoshi, and Li 2016)
ggplot2 (Wickham 2016)
kableExtra (Zhu 2024)
knitr (Xie 2024)
tidyverse (Wickham et al. 2019)
tidymodels (Kuhn and Wickham 2020)
This document was created using Quarto, an open-source scientific and technical publishing system. Quarto provides tools for creating dynamic documents that can include text, code, and output. For more information about Quarto, visit quarto.org.