Body Mass Index (BMI) is an indicator of the amount of body fat of a person. It is a measure of ones body weight in relation to their height. It is a tool that can assess a person’s risk to diseases such as obesity and being underweight. According to the National Heart, Lung, and Blood Institute, BMI values
People with BMI above 25.0 kg/m2 have an increased risk of heart disease, type 2 diabetes, hypertension, cardiovascular disease, and many more health risk [1, 3].
Many people are concerned about their weight and health. Talking to their doctors, an exercise coach or even when researching on the internet for diet tips, BMI is very likely to be mentioned. Thus, people tend to check what their BMI is by themselves. However, when compared to measurements taken by trained professionals, discrepancies can occur when one takes these measurements themselves. One example of a source of inconsistency in measurements is the sensitivity and type of home-scales being used. Digital home-scales have been found to provide more accurate and consistent measurements than dial scales [4]. Moreover, according to a systematic review by Gorber, S.C. et al, both men and women are overwhelmingly seen overestimating their height and underestimating their weight [2]. This underestimation of weight is more pronounced in women [1, 3]. As a result, estimation that deviates further than the actual measurements of the weight and height can lead to incorrect calculation of one’s BMI.
The research question
Therefore, this project aims to answer:
The dataset used in this project is called Self-Reports of Height and Weight. The dataset Davis.csv was retrieved from http://vincentarelbundock.github.io/Rdatasets/ on 3rd January, 2019 and was then save in my Github as BodyHW.
Note
Upon visual inspection of the dataset, ID # 12 had input for a weight of 166 kg with a height of 57 cm which is very peculiar, while their reported weight and height were 56 kg and 163 cm respectively, therefore I decided to swap these numbers, strongly convinced they were entered in the wrong columns.
# Retrieving the dataset from my Github
theURL <- "https://raw.githubusercontent.com/greeneyefirefly/sps-msds-BodyHW/master/dataset.csv"
BodyHW <- read.csv (file = theURL, header = TRUE, sep = ",")
The dataset contains 200 unique observations with 5 variables:
head(BodyHW)
## X sex weight height repwt repht
## 1 1 M 77 182 77 180
## 2 2 F 58 161 51 159
## 3 3 F 53 161 54 158
## 4 4 M 68 177 70 175
## 5 5 F 59 157 59 155
## 6 6 M 76 170 76 165
summary(BodyHW)
## X sex weight height
## Min. : 1.00 F:112 Min. : 39.00 Min. :148.0
## 1st Qu.: 50.75 M: 88 1st Qu.: 55.00 1st Qu.:164.0
## Median :100.50 Median : 63.00 Median :169.5
## Mean :100.50 Mean : 65.25 Mean :170.6
## 3rd Qu.:150.25 3rd Qu.: 73.25 3rd Qu.:177.2
## Max. :200.00 Max. :119.00 Max. :197.0
##
## repwt repht
## Min. : 41.00 Min. :148.0
## 1st Qu.: 55.00 1st Qu.:160.5
## Median : 63.00 Median :168.0
## Mean : 65.62 Mean :168.5
## 3rd Qu.: 73.50 3rd Qu.:175.0
## Max. :124.00 Max. :200.0
## NA's :17 NA's :17
aggregate(BodyHW$weight, by=list(BodyHW$sex), FUN=mean, na.rm=TRUE)
## Group.1 x
## 1 F 56.89286
## 2 M 75.89773
aggregate(BodyHW$height, by=list(BodyHW$sex), FUN=mean, na.rm=TRUE)
## Group.1 x
## 1 F 164.7143
## 2 M 178.0114
aggregate(BodyHW$repwt, by=list(BodyHW$sex), FUN=mean, na.rm=TRUE)
## Group.1 x
## 1 F 56.74257
## 2 M 76.56098
aggregate(BodyHW$repht, by=list(BodyHW$sex), FUN=mean, na.rm=TRUE)
## Group.1 x
## 1 F 162.1980
## 2 M 176.2561
The dataset consist of 112 females and 88 males. The table below shows the actual and the reported measurements for females and males.
| Measures | Overall | Females | Males |
|---|---|---|---|
| Actual Weight | 65.25 kg | 56.89 kg | 75.90 kg |
| Reported Weight | 65.62 kg | 56.74 kg | 76.56 kg |
| Weight difference | -0.37 kg | 0.15 kg | -0.66 kg |
| Actual Height | 170.6 cm | 164.7 cm | 178.0 cm |
| Reported Height | 168.5 cm | 162.2 cm | 176.3 cm |
| Height difference | 2.1 cm | 2.5 cm | 1.7 cm |
The table suggest that both the self-reported weight and height for females were underestimated when compared to their actual measurements, which is consistent with the literature. Whereas, for males, only height is underestimated when compared to their actual height.
From the scatter plots below, it is clearer that there are discrepancies among the self-reported measurements and the actual weight and height.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.5.2
# A scatter plot of the self-reported weight and height of the participants
ggRHW <- ggplot(BodyHW, aes(x=repwt, y=repht, na.rm = TRUE))
ggRHW + geom_point(na.rm = TRUE) + labs(title="Self-Reported Measurements", x="Self-Reported Weight", y="Self-Reported Height") + geom_smooth(method='lm', na.rm = TRUE)
# A scatter plot of the actual weight and height of the participants
ggAHW <- ggplot(BodyHW, aes(x=weight, y=height))
ggAHW + geom_point() + labs(title="Actual Measurements", x="Weight", y="Height") + geom_smooth(method='lm')
The formula for calculating the Body Mass Index is: \(BMI = weight (kg) / height^{2} (m^{2})\)
# Converting centimeter into meter for the actual and the reported height
BodyHW[["mheight"]] <- BodyHW$height/100
BodyHW[["repmht"]] <- BodyHW$repht/100
# Calculating the BMI from the actual and the reported measurements
BodyHW[["BMI"]] <- BodyHW$weight/(BodyHW$mheight^2)
BodyHW[["repBMI"]] <- BodyHW$repwt/(BodyHW$repmht^2)
Looking at the box plots for the both the actual and self-reported BMI, males presented to have a greater variability than females. Moreover, the median for the self-reported BMI by males has a much noticeable shift when compared to actual BMI by males.
ggrepBMI <- ggplot(BodyHW, aes(x=sex, y=repBMI, na.rm = TRUE))
ggrepBMI + geom_boxplot(na.rm = TRUE) + labs(title="BMI based on Self-Reported Measurements distributed by Gender", x="sex", y="Self reported BMI")
ggABMI <- ggplot(BodyHW, aes(x=sex, y=BMI))
ggABMI + geom_boxplot() + labs(title="BMI based on Actual Measurements distributed by Gender", x="sex", y="Actual BMI")
# Categorizing the BMI calculated from the actual measurements.
BodyHW[["ActualCategory"]] <- ifelse(BodyHW$BMI < 18.50,"underweight",
ifelse(BodyHW$BMI >= 18.50 & BodyHW$BMI < 25.00, "healthy",
ifelse(BodyHW$BMI >= 25.00 & BodyHW$BMI < 30.00, "overweight", "obese")))
# Categorizing the BMI calculated from the self-reported measurements.
BodyHW[["repCategory"]] <- ifelse(BodyHW$repBMI < 18.50,"underweight",
ifelse(BodyHW$repBMI >= 18.50 & BodyHW$repBMI < 25.00, "healthy",
ifelse(BodyHW$repBMI >= 25.00 & BodyHW$repBMI < 30.00, "overweight", "obese")))
After determining what a participant’s health status would be based on their BMIs, the histograms show that are many self-reported BMI that differs from the actual BMI significantly.
ggrepBMI1 <- ggplot(BodyHW, aes(x=repBMI, fill=repCategory, na.rm = TRUE))
ggrepBMI1 + geom_histogram(bins=40, na.rm = TRUE) + labs(title="BMI based on Self-Reported Measurements", x="Self-reported BMI", y="Count")
ggABMI2 <- ggplot(BodyHW, aes(x=BMI, fill=ActualCategory))
ggABMI2 + geom_histogram(bins=40) + labs(title="BMI based on Actual Measurements", x="Actual BMI", y="Count")
As a result, the table below highlights that there are 8 reports of healthy BMI when in fact they are actually 3 overweight and 5 underweight participants. Having the belief that one is healthy when in fact they are not, can lead to more serious risk because they might not pay attention to their health if there is not a need. Other cases included 9 reports of overweight and 1 report of underweight when in fact they are healthy, and 2 reports of obese when they are overweight.
Moreover, the table reveals that males are more likely to be under the impression that they are overweight when in fact they are healthy, and vice versa. Females, however, would have thought to be healthy when they are in fact underweight.
rows <- which(BodyHW$ActualCategory!=BodyHW$repCategory)
table(BodyHW[rows, c("sex","repCategory","ActualCategory")])
## , , ActualCategory = healthy
##
## repCategory
## sex healthy obese overweight underweight
## F 0 0 1 1
## M 0 0 8 0
##
## , , ActualCategory = overweight
##
## repCategory
## sex healthy obese overweight underweight
## F 0 1 0 0
## M 3 1 0 0
##
## , , ActualCategory = underweight
##
## repCategory
## sex healthy obese overweight underweight
## F 5 0 0 0
## M 0 0 0 0
In conclusion, 11% of the individuals can be mislead by their BMI measurements because of incorrect estimation of either their weight or height or both. Four percent of males could have been under the impression that they are overweight when they are instead healthy, and 2% of the females could have been misled to be healthy when they are instead underweight. However, this does not imply that, for example, an incorrect BMI categorization of obese for overweight, is not as alarming. In such cases, individuals should still pay close attention to their consumption and exercise activities. This is also the recommendation for individuals who had incorrect BMI categorization of underweight but are healthy. It is key that every individual, no matter their BMI category, should remain determined to maintain a BMI status of healthy, i.e. well between 18.5 kg/m2 to 24.9 kg/m2, particularly those at the cut-offs between categories and especially those who are obese.