Background

Body Mass Index (BMI) is an indicator of the amount of body fat of a person. It is a measure of ones body weight in relation to their height. It is a tool that can assess a person’s risk to diseases such as obesity and being underweight. According to the National Heart, Lung, and Blood Institute, BMI values

People with BMI above 25.0 kg/m2 have an increased risk of heart disease, type 2 diabetes, hypertension, cardiovascular disease, and many more health risk [1, 3].

Many people are concerned about their weight and health. Talking to their doctors, an exercise coach or even when researching on the internet for diet tips, BMI is very likely to be mentioned. Thus, people tend to check what their BMI is by themselves. However, when compared to measurements taken by trained professionals, discrepancies can occur when one takes these measurements themselves. One example of a source of inconsistency in measurements is the sensitivity and type of home-scales being used. Digital home-scales have been found to provide more accurate and consistent measurements than dial scales [4]. Moreover, according to a systematic review by Gorber, S.C. et al, both men and women are overwhelmingly seen overestimating their height and underestimating their weight [2]. This underestimation of weight is more pronounced in women [1, 3]. As a result, estimation that deviates further than the actual measurements of the weight and height can lead to incorrect calculation of one’s BMI.

The research question
Therefore, this project aims to answer:

  1. How common is it that an individual can be mislead by their BMI measurements because of estimated measurements of either their weight or height or both?
  2. Is there any trend between a self-reported BMI and the actual BMI, distinguished by gender?

Retrieving the Dataset

The dataset used in this project is called Self-Reports of Height and Weight. The dataset Davis.csv was retrieved from http://vincentarelbundock.github.io/Rdatasets/ on 3rd January, 2019 and was then save in my Github as BodyHW.

Note
Upon visual inspection of the dataset, ID # 12 had input for a weight of 166 kg with a height of 57 cm which is very peculiar, while their reported weight and height were 56 kg and 163 cm respectively, therefore I decided to swap these numbers, strongly convinced they were entered in the wrong columns.

# Retrieving the dataset from my Github
theURL <- "https://raw.githubusercontent.com/greeneyefirefly/sps-msds-BodyHW/master/dataset.csv"
BodyHW <- read.csv (file = theURL, header = TRUE, sep = ",")

Summary of the Dataset

The dataset contains 200 unique observations with 5 variables:

  • sex - A factor with levels: F, female; M, male.
  • weight - Measured weight in kg.
  • height - Measured height in cm.
  • repwt - Reported weight in kg. This variable has some missing values.
  • repht - Reported height in cm. This variable has some missing values
Missing values are excluded in calculations.
head(BodyHW)
##   X sex weight height repwt repht
## 1 1   M     77    182    77   180
## 2 2   F     58    161    51   159
## 3 3   F     53    161    54   158
## 4 4   M     68    177    70   175
## 5 5   F     59    157    59   155
## 6 6   M     76    170    76   165
summary(BodyHW)
##        X          sex         weight           height     
##  Min.   :  1.00   F:112   Min.   : 39.00   Min.   :148.0  
##  1st Qu.: 50.75   M: 88   1st Qu.: 55.00   1st Qu.:164.0  
##  Median :100.50           Median : 63.00   Median :169.5  
##  Mean   :100.50           Mean   : 65.25   Mean   :170.6  
##  3rd Qu.:150.25           3rd Qu.: 73.25   3rd Qu.:177.2  
##  Max.   :200.00           Max.   :119.00   Max.   :197.0  
##                                                           
##      repwt            repht      
##  Min.   : 41.00   Min.   :148.0  
##  1st Qu.: 55.00   1st Qu.:160.5  
##  Median : 63.00   Median :168.0  
##  Mean   : 65.62   Mean   :168.5  
##  3rd Qu.: 73.50   3rd Qu.:175.0  
##  Max.   :124.00   Max.   :200.0  
##  NA's   :17       NA's   :17
aggregate(BodyHW$weight, by=list(BodyHW$sex), FUN=mean, na.rm=TRUE)
##   Group.1        x
## 1       F 56.89286
## 2       M 75.89773
aggregate(BodyHW$height, by=list(BodyHW$sex), FUN=mean, na.rm=TRUE)
##   Group.1        x
## 1       F 164.7143
## 2       M 178.0114
aggregate(BodyHW$repwt, by=list(BodyHW$sex), FUN=mean, na.rm=TRUE)
##   Group.1        x
## 1       F 56.74257
## 2       M 76.56098
aggregate(BodyHW$repht, by=list(BodyHW$sex), FUN=mean, na.rm=TRUE)
##   Group.1        x
## 1       F 162.1980
## 2       M 176.2561

The dataset consist of 112 females and 88 males. The table below shows the actual and the reported measurements for females and males.

Measures Overall Females Males
Actual Weight 65.25 kg 56.89 kg 75.90 kg
Reported Weight 65.62 kg 56.74 kg 76.56 kg
Weight difference -0.37 kg 0.15 kg -0.66 kg
Actual Height 170.6 cm 164.7 cm 178.0 cm
Reported Height 168.5 cm 162.2 cm 176.3 cm
Height difference 2.1 cm 2.5 cm 1.7 cm

The table suggest that both the self-reported weight and height for females were underestimated when compared to their actual measurements, which is consistent with the literature. Whereas, for males, only height is underestimated when compared to their actual height.

From the scatter plots below, it is clearer that there are discrepancies among the self-reported measurements and the actual weight and height.

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.5.2
# A scatter plot of the self-reported weight and height of the participants
ggRHW <- ggplot(BodyHW, aes(x=repwt, y=repht, na.rm = TRUE))
ggRHW + geom_point(na.rm = TRUE) + labs(title="Self-Reported Measurements", x="Self-Reported Weight", y="Self-Reported Height") + geom_smooth(method='lm', na.rm = TRUE)

# A scatter plot of the actual weight and height of the participants
ggAHW <- ggplot(BodyHW, aes(x=weight, y=height))
ggAHW + geom_point() + labs(title="Actual Measurements", x="Weight", y="Height") + geom_smooth(method='lm')


Investigating the Body Mass Index

The formula for calculating the Body Mass Index is: \(BMI = weight (kg) / height^{2} (m^{2})\)

# Converting centimeter into meter for the actual and the reported height
BodyHW[["mheight"]] <- BodyHW$height/100
BodyHW[["repmht"]] <- BodyHW$repht/100
# Calculating the BMI from the actual and the reported measurements
BodyHW[["BMI"]] <- BodyHW$weight/(BodyHW$mheight^2)
BodyHW[["repBMI"]] <- BodyHW$repwt/(BodyHW$repmht^2)

The variablity of the actual and self-reported BMI by gender.

Looking at the box plots for the both the actual and self-reported BMI, males presented to have a greater variability than females. Moreover, the median for the self-reported BMI by males has a much noticeable shift when compared to actual BMI by males.

ggrepBMI <- ggplot(BodyHW, aes(x=sex, y=repBMI, na.rm = TRUE))
ggrepBMI + geom_boxplot(na.rm = TRUE) + labs(title="BMI based on Self-Reported Measurements distributed by Gender", x="sex", y="Self reported BMI") 

ggABMI <- ggplot(BodyHW, aes(x=sex, y=BMI))
ggABMI + geom_boxplot() + labs(title="BMI based on Actual Measurements distributed by Gender", x="sex", y="Actual BMI") 


Identifiying the health status of participants based on their BMIs calculated from the actual and self-reported measurements.

# Categorizing the BMI calculated from the actual measurements.
BodyHW[["ActualCategory"]] <- ifelse(BodyHW$BMI < 18.50,"underweight", 
                              ifelse(BodyHW$BMI >= 18.50 & BodyHW$BMI < 25.00, "healthy",
                              ifelse(BodyHW$BMI >= 25.00 & BodyHW$BMI < 30.00, "overweight", "obese")))
# Categorizing the BMI calculated from the self-reported measurements.
BodyHW[["repCategory"]] <- ifelse(BodyHW$repBMI < 18.50,"underweight", 
                           ifelse(BodyHW$repBMI >= 18.50 & BodyHW$repBMI < 25.00, "healthy",
                           ifelse(BodyHW$repBMI >= 25.00 & BodyHW$repBMI < 30.00, "overweight", "obese")))

After determining what a participant’s health status would be based on their BMIs, the histograms show that are many self-reported BMI that differs from the actual BMI significantly.

ggrepBMI1 <- ggplot(BodyHW, aes(x=repBMI, fill=repCategory, na.rm = TRUE))
ggrepBMI1 + geom_histogram(bins=40, na.rm = TRUE)  + labs(title="BMI based on Self-Reported Measurements", x="Self-reported BMI", y="Count")

ggABMI2 <- ggplot(BodyHW, aes(x=BMI, fill=ActualCategory))
ggABMI2 + geom_histogram(bins=40) + labs(title="BMI based on Actual Measurements", x="Actual BMI", y="Count")

As a result, the table below highlights that there are 8 reports of healthy BMI when in fact they are actually 3 overweight and 5 underweight participants. Having the belief that one is healthy when in fact they are not, can lead to more serious risk because they might not pay attention to their health if there is not a need. Other cases included 9 reports of overweight and 1 report of underweight when in fact they are healthy, and 2 reports of obese when they are overweight.

Moreover, the table reveals that males are more likely to be under the impression that they are overweight when in fact they are healthy, and vice versa. Females, however, would have thought to be healthy when they are in fact underweight.

rows <- which(BodyHW$ActualCategory!=BodyHW$repCategory)
table(BodyHW[rows, c("sex","repCategory","ActualCategory")])
## , , ActualCategory = healthy
## 
##    repCategory
## sex healthy obese overweight underweight
##   F       0     0          1           1
##   M       0     0          8           0
## 
## , , ActualCategory = overweight
## 
##    repCategory
## sex healthy obese overweight underweight
##   F       0     1          0           0
##   M       3     1          0           0
## 
## , , ActualCategory = underweight
## 
##    repCategory
## sex healthy obese overweight underweight
##   F       5     0          0           0
##   M       0     0          0           0

Conclusion

In conclusion, 11% of the individuals can be mislead by their BMI measurements because of incorrect estimation of either their weight or height or both. Four percent of males could have been under the impression that they are overweight when they are instead healthy, and 2% of the females could have been misled to be healthy when they are instead underweight. However, this does not imply that, for example, an incorrect BMI categorization of obese for overweight, is not as alarming. In such cases, individuals should still pay close attention to their consumption and exercise activities. This is also the recommendation for individuals who had incorrect BMI categorization of underweight but are healthy. It is key that every individual, no matter their BMI category, should remain determined to maintain a BMI status of healthy, i.e. well between 18.5 kg/m2 to 24.9 kg/m2, particularly those at the cut-offs between categories and especially those who are obese.



Works Cited
1. Engstrom, J.L.; Paterson, S.A.; Doherty, A.; Trabulsi, M.; Speer, K.L. Accuracy of self-reported height and weight in women: An integrative review of the literature. J. Midwifery Women’s Health 2003, 48, 338-345.
2. Gorber, S.C.; Tremblay, M.; Moher, D.; Gorber, B. A comparison of direct vs. self-report measures for assessing height, weight and body mass index: A systematic review. Obes. Rev. 2007, 8, 307-326.
3. National Heart, Lung, and Blood Insitute. Aim for a Healthy Weight. Retrieved January 4, 2019, https://www.nhlbi.nih.gov/health/educational/lose_wt/
4. Yorkin, M.; Spaccarotella, K.; Martin-Biggers, J.; Quick, V.; Byrd-Bredbenner, C. Accuracy and consistency of weights provided by home bathroom scales. BMC Public Health 2013, 13, 1194.