Body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender, are given for 507 physically active individuals - 247 men and 260 women.
Is there a relationship between skeletal diameter (biiliac) and BMI in women?
Evaluates whether BMI is likely to be flawed with respect to the largely uncontrollable, genetically determined diameter of one’s bones.
Is there a relationship between body girth (waist) and BMI in women?
Evaluates the accuracy of BMI with respect to waist girth, a measurement on a woman’s body that is largely not increased by muscle mass but is increased by fat.
Is there a relationship between body girth (thigh) and BMI in women?
Evaluates the accuracy of BMI with respect to a measurement on a woman’s body that generally changes dramatically with muscle mass in athletic women and less so with body fat.
What percentages of men versus women are considered overweight given their BMI despite being physically active?
For further study, we want to evaluate the consistency of BMI between men and women to determine whether the flaws we find in BMI measurements for women carry over to men as well.
A categorical vector, 1 if the respondent is male, 0 if female. It is also the variable we held constant by only using female data.
Differences in body fat between the sexes change BMI ratings.
A numerical vector, respondent's biiliac diameter (pelvic breadth) in centimeters.
A woman’s pelvic breadth correlates with her weight but not necessarily with her physical fitness or health.
A numerical vector, respondent's waist girth in centimeters, measured at the narrowest part of torso below the rib cage as average of contracted and relaxed position.
A woman’s waist girth is usually a good measure of her weight and overall health, given that there is little opportunity to build up muscle mass there.
A numerical vector, respondent's thigh girth in centimeters, measured below gluteal fold as the average of right and left girths.
The girth of a woman’s thigh is correlated with the muscle mass and strength of the quadriceps, which is generally a healthy trait yet raises the individual’s BMI.
This is one of the categorical variables that we created and used in our hypothesis tests. A BMI of 25 or greater is categorized as overweight, and a BMI under 25 as not overweight.
We are trying to show that BMI is an inaccurate measurement for physically active people because the scale cannot distinguish between fat and muscle mass. Given that muscle weighs more per unit volume than fat, active individuals with larger amounts of muscle mass are likely to be given a higher BMI and to be categorized as overweight or obese despite their physical fitness and general health. This is especially true for women, in that physically active women have a greater tendency to have larger thigh muscles, which would greatly increase their BMI calculation.
Important Data: Calculating BMI
BMI equation:
wgt/((hgt)^2)
hgt: A numerical vector, respondent’s height in centimeters → converted to meters
wgt: A numerical vector, respondent’s weight in kilograms.
BMI Categories:
Underweight = <18.5
Normal weight = 18.5–24.9
Overweight = 25–29.9
Obesity = BMI of 30 or greater
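The calculation and categories above can be sketched in R as follows. This is a minimal illustration on made-up rows, not the study data; the column names `hgt`, `wgt`, `htmeters` and `bmi_over` follow the names used in this report, while `toy` and `bmi_cat` are hypothetical names, and the cut point of 25 or greater for "overweight" is an assumption taken from the category list.

```r
# Toy measurements (made-up values, not from the study data).
toy <- data.frame(
  hgt = c(160, 170, 180, 165),   # height in centimeters
  wgt = c(47, 65, 85, 90)        # weight in kilograms
)

# Convert height to meters, then apply BMI = weight / height^2.
toy$htmeters <- toy$hgt / 100
toy$bmi <- toy$wgt / toy$htmeters^2

# Category cut points: <18.5, 18.5 to <25, 25 to <30, 30 and above.
toy$bmi_cat <- cut(
  toy$bmi,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("Underweight", "Normal weight", "Overweight", "Obesity"),
  right = FALSE                  # intervals are closed on the left, e.g. [18.5, 25)
)

# Two-level flag in the style of the report's bmi_over variable
# (assumption: 25 or greater counts as overweight).
toy$bmi_over <- ifelse(toy$bmi >= 25, "overweight", "not overweight")

print(toy)
```

Using `right = FALSE` keeps a BMI of exactly 25 in the "Overweight" group, which matches the 25–29.9 range listed above.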
[Plot: BMI versus biiliac diameter, women only]
Vars Relationship:
sex = f; relationship between biiliac diameter (pelvic breadth) and BMI
Description:
This plot shows considerable variability in the BMI of women with the same biiliac diameter, suggesting that the correlation between the two variables is not strong. Even so, there is a slight upward trend once the outliers are taken into account, which indicates that a woman with a wider pelvis (due to genetics) is more likely to be classified as overweight despite being healthy.
[Plot: BMI versus waist girth, women only]
Vars Relationship:
sex = f; relationship between waist girth and BMI
Description:
As waist girth increases, so does BMI, which shows that a wider waist (a typically fat-prone area on women) is associated with a higher BMI.
[Plot: BMI versus thigh girth, women only]
Vars Relationship:
sex = f; relationship between thigh girth and BMI
Description:
This plot shows a strong correlation between the variables, with greater thigh girth accompanying a higher BMI. Assuming these women are healthy, growth in thigh girth is mostly due to an increase in muscle mass, which should not make a person count as less healthy; yet the data show that it coincides with a higher BMI.
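One way to put a number on relationships like the ones described above is a correlation coefficient and a least-squares slope computed on the female rows only. The sketch below uses a handful of made-up rows (the data frame `toy` and its values are hypothetical); only the column names `sex`, `thi.gi` and `bmi` come from this report.

```r
# Toy rows (made-up values); the real analysis would use the study data frame.
toy <- data.frame(
  sex    = c("f", "f", "f", "f", "m", "m"),
  thi.gi = c(52, 55, 58, 61, 54, 57),          # thigh girth in centimeters
  bmi    = c(20.1, 21.5, 23.0, 24.8, 23.5, 25.0)
)

# Keep women only, mirroring the "sex = f" restriction.
women <- toy[toy$sex == "f", ]

# Pearson correlation: values near 1 indicate a strong positive linear relationship.
r_thigh <- cor(women$thi.gi, women$bmi)

# Slope of the least-squares line: change in BMI per extra centimeter of girth.
slope_thigh <- coef(lm(bmi ~ thi.gi, data = women))[["thi.gi"]]

print(c(r = r_thigh, slope = slope_thigh))
```

The same two lines can be repeated with `wai.gi` or `bii.di` in place of `thi.gi` to compare the strength of the three relationships.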
## # A tibble: 2 x 2
##   bmi_over       mean_bii.di
##   <chr>                <dbl>
## 1 not overweight        27.4
## 2 overweight            28.9
## # A tibble: 2 x 2
##   bmi_over       mean_wai.gi
##   <chr>                <dbl>
## 1 not overweight        72.2
## 2 overweight            88.3
## # A tibble: 2 x 2
##   bmi_over       mean_thi.gi
##   <chr>                <dbl>
## 1 not overweight        55.2
## 2 overweight            60.8
## # A tibble: 2 x 2
##   bmi_over       conditions
##   <chr>               <int>
## 1 not overweight        357
## 2 overweight            150
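Group summaries of this shape (a mean per overweight group, plus a count per group) can be produced along the following lines. This is a sketch in base R on made-up rows; the report itself appears to use tibbles, so the functions here are a stand-in rather than the original code, and `toy`, `means` and `counts` are hypothetical names.

```r
# Toy rows (made-up values, not from the study data).
toy <- data.frame(
  bmi_over = c("not overweight", "overweight", "not overweight",
               "overweight", "not overweight"),
  wai.gi   = c(70, 86, 74, 90, 72)             # waist girth in centimeters
)

# Mean waist girth within each overweight group.
means <- aggregate(wai.gi ~ bmi_over, data = toy, FUN = mean)
names(means)[2] <- "mean_wai.gi"

# Number of rows in each group (the sample sizes used in the tests).
counts <- as.data.frame(table(bmi_over = toy$bmi_over), responseName = "n")

print(means)
print(counts)
```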
Explanation:
## Response variable: numerical
## Explanatory variable: categorical (2 levels)
## n_not overweight = 357, y_bar_not overweight = 27.3832, s_not overweight = 2.1377
## n_overweight = 150, y_bar_overweight = 28.8933, s_overweight = 1.9989
## H0: mu_not overweight = mu_overweight
## HA: mu_not overweight != mu_overweight
## t = -7.6043, df = 149
## p_value = < 0.0001
## Response variable: numerical
## Explanatory variable: categorical (2 levels)
## n_not overweight = 357, y_bar_not overweight = 72.2269, s_not overweight = 7.9122
## n_overweight = 150, y_bar_overweight = 88.2907, s_overweight = 8.9088
## H0: mu_not overweight = mu_overweight
## HA: mu_not overweight != mu_overweight
## t = -19.1389, df = 149
## p_value = < 0.0001
## Response variable: numerical
## Explanatory variable: categorical (2 levels)
## n_not overweight = 357, y_bar_not overweight = 55.191, s_not overweight = 3.3025
## n_overweight = 150, y_bar_overweight = 60.818, s_overweight = 4.364
## H0: mu_not overweight = mu_overweight
## HA: mu_not overweight != mu_overweight
## t = -14.1779, df = 149
## p_value = < 0.0001
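The test statistics above can be checked by hand from the printed group sizes, means, and standard deviations. The sketch below assumes an unpooled standard error and the conservative degrees of freedom min(n1, n2) - 1, which is consistent with the reported df = 149; the function name `two_sample_t` is hypothetical and not part of the original analysis.

```r
# Two-sample t statistic from summary statistics.
# Assumption: unpooled standard error, df = min(n1, n2) - 1.
two_sample_t <- function(n1, mean1, sd1, n2, mean2, sd2) {
  se <- sqrt(sd1^2 / n1 + sd2^2 / n2)      # standard error of the difference in means
  t_stat <- (mean1 - mean2) / se
  df <- min(n1, n2) - 1
  p_value <- 2 * pt(-abs(t_stat), df)      # two-sided p-value
  list(t = t_stat, df = df, p_value = p_value)
}

# Summary statistics copied from the output above
# (group 1 = not overweight, group 2 = overweight).
bii <- two_sample_t(357, 27.3832, 2.1377, 150, 28.8933, 1.9989)
wai <- two_sample_t(357, 72.2269, 7.9122, 150, 88.2907, 8.9088)
thi <- two_sample_t(357, 55.191,  3.3025, 150, 60.818,  4.364)

print(round(c(bii = bii$t, wai = wai$t, thi = thi$t), 4))
```

The results agree with the printed t values up to rounding of the inputs, and all three p-values fall below 0.0001.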
When creating our multiple regression model, we removed the “height” and “weight” variables from consideration because it seemed counterintuitive to include these factors, which are used to calculate BMI, when asking which factors influence BMI. From there, we found that during backward elimination the adjusted R-squared value decreased whenever a variable was removed, so the best linear regression model includes all of the original variables. Conceptually, this makes sense because BMI is a very sensitive calculation, and each additional measurement makes the model’s estimate of it more accurate.
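Adjusted R-squared, the criterion used in the backward elimination described above, penalizes R-squared for the number of predictors. As a small check, the sketch below recomputes it from the R-squared values, sample sizes, and predictor counts printed in the two model summaries in this report; the function name `adjusted_r2` is hypothetical.

```r
# Adjusted R-squared from R-squared (r2), sample size (n), and number of predictors (p).
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Model with 20 predictors: residual df 486, so n = 486 + 20 + 1 = 507.
adj_all <- adjusted_r2(0.8772, n = 507, p = 20)

# Training-set model with 23 predictors: residual df 381, so n = 381 + 23 + 1 = 405.
adj_train <- adjusted_r2(0.8855, n = 405, p = 23)

print(round(c(adj_all = adj_all, adj_train = adj_train), 4))
```

Because the penalty grows with `p`, removing a predictor raises adjusted R-squared only when that predictor contributes little to the fit.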
## Warning in data(bmif): data set 'bmif' not found
m_all <- lm(bmi ~ bia.di + bii.di + bit.di + che.di + elb.di + wri.di + kne.di + ank.di +
              sho.gi + che.gi + wai.gi + nav.gi + hip.gi + thi.gi + bic.gi + for.gi +
              kne.gi + cal.gi + wri.gi + che.de, data = bmif)
summary(m_all)
##
## Call:
## lm(formula = bmi ~ bia.di + bii.di + bit.di + che.di + elb.di +
## wri.di + kne.di + ank.di + sho.gi + che.gi + wai.gi + nav.gi +
## hip.gi + thi.gi + bic.gi + for.gi + kne.gi + cal.gi + wri.gi +
## che.de, data = bmif)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6624 -0.7392 -0.0393 0.7245 4.0551
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.6502651 1.2183026 -4.638 4.53e-06 ***
## bia.di -0.1833430 0.0347547 -5.275 2.00e-07 ***
## bii.di -0.0845945 0.0351749 -2.405 0.016547 *
## bit.di -0.0686560 0.0503838 -1.363 0.173621
## che.di 0.0309861 0.0455375 0.680 0.496542
## elb.di -0.2625983 0.1015438 -2.586 0.009998 **
## wri.di -0.0126985 0.1241563 -0.102 0.918578
## kne.di 0.3677408 0.0754217 4.876 1.47e-06 ***
## ank.di -0.2843622 0.0820444 -3.466 0.000575 ***
## sho.gi -0.0003324 0.0172157 -0.019 0.984605
## che.gi 0.0705312 0.0212135 3.325 0.000952 ***
## wai.gi 0.1136994 0.0136685 8.318 9.05e-16 ***
## nav.gi 0.0324945 0.0129724 2.505 0.012575 *
## hip.gi 0.0745908 0.0263443 2.831 0.004827 **
## thi.gi 0.1348821 0.0282729 4.771 2.43e-06 ***
## bic.gi 0.0569426 0.0459191 1.240 0.215551
## for.gi 0.2101902 0.0753335 2.790 0.005476 **
## kne.gi -0.0797466 0.0414291 -1.925 0.054826 .
## cal.gi 0.1953225 0.0352945 5.534 5.12e-08 ***
## wri.gi -0.1965782 0.1117607 -1.759 0.079220 .
## che.de -0.0312200 0.0383934 -0.813 0.416524
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.171 on 486 degrees of freedom
## Multiple R-squared: 0.8772, Adjusted R-squared: 0.8721
## F-statistic: 173.5 on 20 and 486 DF, p-value: < 2.2e-16
## (Intercept) bia.di bii.di bit.di che.di
## -5.6502650557 -0.1833430389 -0.0845945281 -0.0686560331 0.0309861079
## elb.di wri.di kne.di ank.di sho.gi
## -0.2625982845 -0.0126985221 0.3677407889 -0.2843621592 -0.0003323548
## che.gi wai.gi nav.gi hip.gi thi.gi
## 0.0705311659 0.1136993843 0.0324944847 0.0745908106 0.1348821144
## bic.gi for.gi kne.gi cal.gi wri.gi
## 0.0569425657 0.2101902262 -0.0797465708 0.1953224615 -0.1965781982
## che.de
## -0.0312200081
All of the conditions for linear regression are met:
Linearity: The residuals do not vary much around the zero line and show no clear pattern, so a linear model is appropriate and m_all is a good predictor of BMI.
Nearly Normal Residuals: The normal probability plot shows residuals centered around 0.
Constant Variability: Based on the residuals vs. fitted plot, the constant variability condition appears to be met because the spread of the residuals does not change substantially as x varies.
Each variable is linearly related to the outcome: The residuals are randomly scattered around 0, so this condition is met.
qplot(x = .fitted, y = .resid, data = m_all) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  xlab("Fitted values") +
  ylab("Residuals")
## Warning: `stat` is deprecated
[Plot: residuals versus fitted values for m_all]
[Plot: second diagnostic plot for m_all]
Running the selection in RStudio produced the same best-fit model as ours: one that includes all variables that contribute to BMI (excluding weight and height).
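# Drop height, weight, and the variables derived from them (needs dplyr for %>% and select)
bmif.data <- bmif %>% select(-c('wgt','hgt', 'htmeters', 'bmi_over'))
n.obs <- dim(bmif.data)[1]
# Random 80/20 train/test split; no seed is set here, so the split changes from run to run
train.index <- sample(1:n.obs,floor(.8*n.obs),replace=FALSE)
bmif.train <- bmif.data[train.index,]
bmif.test <- bmif.data[-train.index,]
sel.data <- bmif.train
# Full model: BMI regressed on every remaining variable
m0 <- lm(bmi ~ ., data=sel.data)
summary(m0)
##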
## Call:
## lm(formula = bmi ~ ., data = sel.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7138 -0.7504 0.0154 0.7147 3.8772
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.840368 1.463351 -6.041 3.64e-09 ***
## bia.di -0.138248 0.040310 -3.430 0.00067 ***
## bii.di -0.085033 0.038978 -2.182 0.02975 *
## bit.di -0.107745 0.056698 -1.900 0.05815 .
## che.de -0.043746 0.043490 -1.006 0.31511
## che.di 0.020517 0.051950 0.395 0.69311
## elb.di -0.164255 0.113702 -1.445 0.14939
## wri.di -0.153904 0.137758 -1.117 0.26461
## kne.di 0.415292 0.084381 4.922 1.28e-06 ***
## ank.di -0.147536 0.094164 -1.567 0.11799
## sho.gi 0.008842 0.019440 0.455 0.64948
## che.gi 0.062484 0.023941 2.610 0.00941 **
## wai.gi 0.146250 0.016955 8.626 < 2e-16 ***
## nav.gi 0.009049 0.015800 0.573 0.56720
## hip.gi 0.070091 0.029785 2.353 0.01912 *
## thi.gi 0.108844 0.033110 3.287 0.00111 **
## bic.gi 0.089441 0.050676 1.765 0.07837 .
## for.gi 0.237720 0.084470 2.814 0.00514 **
## kne.gi -0.080924 0.047155 -1.716 0.08695 .
## cal.gi 0.176656 0.042109 4.195 3.39e-05 ***
## ank.gi -0.027815 0.061941 -0.449 0.65365
## wri.gi -0.075842 0.130461 -0.581 0.56136
## age 0.003859 0.007642 0.505 0.61387
## sexm -1.454909 0.317857 -4.577 6.39e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.164 on 381 degrees of freedom
## Multiple R-squared: 0.8855, Adjusted R-squared: 0.8786
## F-statistic: 128.2 on 23 and 381 DF, p-value: < 2.2e-16
# Print the adjusted R-squared obtained after dropping each predictor in turn
single.step.backwards <- function(data, response){
  resp.indx <- which(names(data) == response)
  y <- data[[resp.indx]]                  # response as a plain vector (also works for tibbles)
  X <- data[, -resp.indx, drop = FALSE]   # predictors, kept as a data frame
  n.pred <- dim(X)[2]
  if(n.pred > 1){
    for(i in 1:n.pred){
      print(paste0("Variable ", names(X)[i], " removed: Adjusted R-squared = ",
                   round(summary(lm(y ~ ., data = as.data.frame(X[, -i, drop = FALSE])))$adj.r.squared, 5)))
    }
  } else {
    print("Model only contains one variable.")
  }
}
Our Root Mean Square Error (RMSE) is 1.091793, which means that a BMI predicted by our model for a new individual would typically be off by about 1.1 BMI units. For one randomly sampled individual, our model predicted a BMI of 24.97022, with a lower limit of 22.63397 and an upper limit of 27.30646. Our model overestimated the real BMI by 3.81851, and our prediction interval did not include the real value of 21.0918.
##
## Call:
## lm(formula = bmi ~ ., data = bmif.train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7138 -0.7504 0.0154 0.7147 3.8772
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.840368 1.463351 -6.041 3.64e-09 ***
## bia.di -0.138248 0.040310 -3.430 0.00067 ***
## bii.di -0.085033 0.038978 -2.182 0.02975 *
## bit.di -0.107745 0.056698 -1.900 0.05815 .
## che.de -0.043746 0.043490 -1.006 0.31511
## che.di 0.020517 0.051950 0.395 0.69311
## elb.di -0.164255 0.113702 -1.445 0.14939
## wri.di -0.153904 0.137758 -1.117 0.26461
## kne.di 0.415292 0.084381 4.922 1.28e-06 ***
## ank.di -0.147536 0.094164 -1.567 0.11799
## sho.gi 0.008842 0.019440 0.455 0.64948
## che.gi 0.062484 0.023941 2.610 0.00941 **
## wai.gi 0.146250 0.016955 8.626 < 2e-16 ***
## nav.gi 0.009049 0.015800 0.573 0.56720
## hip.gi 0.070091 0.029785 2.353 0.01912 *
## thi.gi 0.108844 0.033110 3.287 0.00111 **
## bic.gi 0.089441 0.050676 1.765 0.07837 .
## for.gi 0.237720 0.084470 2.814 0.00514 **
## kne.gi -0.080924 0.047155 -1.716 0.08695 .
## cal.gi 0.176656 0.042109 4.195 3.39e-05 ***
## ank.gi -0.027815 0.061941 -0.449 0.65365
## wri.gi -0.075842 0.130461 -0.581 0.56136
## age 0.003859 0.007642 0.505 0.61387
## sexm -1.454909 0.317857 -4.577 6.39e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.164 on 381 degrees of freedom
## Multiple R-squared: 0.8855, Adjusted R-squared: 0.8786
## F-statistic: 128.2 on 23 and 381 DF, p-value: < 2.2e-16
qplot(x = .fitted, y = .resid, data = model.best) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  xlab("Fitted values") +
  ylab("Residuals")
## Warning: `stat` is deprecated
predictions.test <- predict(model.best,bmif.test)
y <- bmif.test$bmi
mse <- mean((y-predictions.test)^2)
(rmse <- sqrt(mse))
## [1] 1.140995
## # A tibble: 1 x 24
## bia.di bii.di bit.di che.de che.di elb.di wri.di kne.di ank.di sho.gi
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 44.3 29.9 34 18.4 28.2 13.9 11.2 20.9 15 104.
## # ... with 14 more variables: che.gi <dbl>, wai.gi <dbl>, nav.gi <dbl>,
## # hip.gi <dbl>, thi.gi <dbl>, bic.gi <dbl>, for.gi <dbl>, kne.gi <dbl>,
## # cal.gi <dbl>, ank.gi <dbl>, wri.gi <dbl>, age <int>, sex <fct>,
## # bmi <dbl>
## fit lwr upr
## 1 21.38582 19.00634 23.76531
RMSE: 1.091793
Conditions Checks: [two diagnostic plots]
Test:
##        fit      lwr      upr
## 1 24.97022 22.63397 27.30646
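The evaluation steps used in this report (random train/test split, test-set RMSE, and a prediction interval for one held-out observation) can be reproduced end to end on any data set. The sketch below uses R's built-in `mtcars` data as a stand-in, since the study data frame is not bundled with this report; the model formula, the seed, and all object names are assumptions chosen for illustration.

```r
set.seed(1)                                   # fixed seed so the split is reproducible

# 80/20 train/test split on a stand-in data set.
n_obs <- nrow(mtcars)
train_index <- sample(seq_len(n_obs), floor(0.8 * n_obs))
train <- mtcars[train_index, ]
test  <- mtcars[-train_index, ]

# Fit on the training rows only.
fit <- lm(mpg ~ wt + hp, data = train)

# Root mean square error on the held-out rows.
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))

# 95% prediction interval for one held-out observation,
# then check whether the observed value falls inside it.
new_row <- test[1, ]
interval <- predict(fit, newdata = new_row, interval = "prediction", level = 0.95)
covered <- new_row$mpg >= interval[, "lwr"] & new_row$mpg <= interval[, "upr"]

print(rmse)
print(interval)
print(covered)
```

Fixing the seed keeps the RMSE and the prediction interval identical across runs, which makes the numbers quoted in the text easier to keep in sync with the printed output.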