People are motivated to exercise for different reasons. However, most people expect to lose fat, gain muscle, or stay healthy by maintaining their own workout routine. Improving health and getting into good shape might help increase their confidence and performance at work. This analysis examines how people’s body shape is related to their habits and lifestyle and seeks to answer the following questions.
Because people tend to be concerned about becoming overweight and fat, this study focuses on heavy people and compares a group of obese people with a group of heavy people who stay healthy.
The National Survey on Drug Use and Health (NSDUH) offers a dataset on health-related issues in the United States. To investigate the questions outlined above, I use its 2016 survey data.
The NSDUH survey data does not contain information about respondents’ body shape. Instead, its demographic variables include weight, height, and body mass index (BMI). BMI, calculated as weight in kilograms divided by the square of height in meters, is a useful measure for approximating a person’s body shape from height and weight. A BMI of 25 to 29.9 is considered “overweight,” and 30 and higher is considered “obese.” However, BMI is often criticized as a poor measure of a person’s health because it does not distinguish body fat from muscle. Some people have a BMI above 30 because they have built muscle through intense workouts. Therefore, the BMI variable alone should not be used as an indicator of body shape.
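To make the cut-offs concrete, here is a minimal R sketch that computes BMI in U.S. units (703 × weight in pounds ÷ height in inches², the imperial form of kg/m²) and labels it with the categories quoted above. The helper names `bmi_us()` and `bmi_category()` and the example weights are hypothetical; they are not NSDUH variables or part of the analysis code.

```r
# Hypothetical helpers for illustration; not part of the NSDUH data or the analysis code.
# BMI in U.S. units: 703 * weight (lb) / height (in)^2
bmi_us <- function(weight_lb, height_in) 703 * weight_lb / height_in^2

# Label BMI with the cut-offs quoted in the text:
# under 25, 25 to 29.9 "overweight", 30 and higher "obese"
bmi_category <- function(bmi) {
  cut(bmi,
      breaks = c(-Inf, 25, 30, Inf),
      labels = c("Under 25", "Overweight", "Obese"),
      right = FALSE)  # intervals [a, b): 25 falls in Overweight, 30 in Obese
}

# Three made-up people, all 69 inches (5 feet 9 inches) tall
bmi <- bmi_us(weight_lb = c(150, 190, 230), height_in = 69)
round(bmi, 1)
bmi_category(bmi)
```

Using `right = FALSE` makes each interval closed on the left, so a BMI of exactly 30 is labeled “Obese,” matching “30 and higher.”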
For the purpose of the analysis, I create a new variable BodyShape that represents a person’s body shape by combining two variables from the NSDUH variable list. The first is BMI2. Although a BMI score does not directly indicate a person’s body shape, as explained above, it is still useful for determining whether a person’s body weight is too heavy in terms of physical health. The second is health, which records a respondent’s answer to a question about their overall health (“Would you say your health in general is excellent, very good, good, fair, or poor?”) on a five-point scale from 1 (=Excellent) to 5 (=Poor). Putting them together, I define body shape as follows. If a person has a BMI score greater than 30 and describes their health as “Fair” or “Poor,” they are coded as “Obese.” If a person has a BMI score greater than 30 but still believes their health is “Excellent” or “Very Good,” they are coded as “Muscular.” All other respondents, including those who rate their health as “Good,” are left uncoded and drop out of the BodyShape comparisons.
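The coding rule can be sketched in base R as follows. The helper `body_shape()` and the six example respondents are hypothetical, for illustration only; the rule itself follows the definition above, with any respondent who is not clearly in one group returned as `NA`.

```r
# Hypothetical helper applying the BodyShape rule described above.
# health: 1 = Excellent, 2 = Very good, 3 = Good, 4 = Fair, 5 = Poor
body_shape <- function(health, bmi) {
  heavy <- !is.na(bmi) & bmi > 30
  ifelse(heavy & health %in% 1:2, "Muscular",
         ifelse(heavy & health %in% 4:5, "Obese", NA_character_))
}

# Six made-up respondents
health <- c(1, 2, 3, 4, 5, 1)
bmi    <- c(32.0, 35.5, 31.0, 33.2, 40.1, 24.0)
body_shape(health, bmi)
# expected: "Muscular" "Muscular" NA "Obese" "Obese" NA
```

The third respondent (“Good”) and the sixth (BMI of 24) are both left as `NA`, which is why only two groups remain in the comparisons.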
The NSDUH survey data has ten variables indicating whether a respondent used a particular drug in the past month: mrjmon (marijuana), crkmon (crack), cocmon (cocaine), hermon (heroin), hallucmon (hallucinogens), inhalmon (inhalants), methammon (methamphetamine), pnrnmmon (pain relievers), trqnmmon (tranquilizers), and stmnmmon (stimulants). Because I am interested in respondents’ general drug use, I build a new variable DrugUse as the sum of these ten variables. If a person used only cocaine in the past month, the score is one; if a person used all ten drugs in the past month, the score is ten.
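A minimal sketch of the DrugUse sum on three made-up respondents, assuming (as the examples above imply) that each of the ten variables is coded 1 for past-month use and 0 otherwise. `rowSums()` gives the same result as adding the columns with `+`, including returning `NA` when any item is missing.

```r
# The ten past-month drug-use variables listed above
drugs <- c("mrjmon", "crkmon", "cocmon", "hermon", "hallucmon",
           "inhalmon", "methammon", "pnrnmmon", "trqnmmon", "stmnmmon")

# Made-up data for three respondents, assuming 1 = used in past month, 0 = did not
toy <- as.data.frame(matrix(0, nrow = 3, ncol = length(drugs),
                            dimnames = list(NULL, drugs)))
toy[1, "cocmon"] <- 1  # respondent 1: cocaine only
toy[2, drugs] <- 1     # respondent 2: all ten drugs
                       # respondent 3: none

# Count of different drugs used; like `+`, rowSums() returns NA if any item is NA
toy$DrugUse <- unname(rowSums(toy[, drugs]))
toy$DrugUse  # 1 10 0
```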
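As for mental health, I create a new variable MentalHealth by combining the following six variables about psychological distress: DSTNRV30 (“HOW OFTEN FELT NERVOUS PAST 30 DAYS”); DSTHOP30 (“HOW OFTEN FELT HOPELESS PAST 30 DAYS”); DSTRST30 (“HOW OFTEN FELT RESTLESS/FIDGETY PAST 30 DAYS”); DSTCHR30 (“HOW OFTEN FELT SAD NOTHING COULD CHEER YOU UP”); DSTEFF30 (“HOW OFTEN FELT EVERYTHING EFFORT PAST 30 DAYS”); and DSTNGD30 (how often the respondent felt worthless in the past 30 days). Each item is reverse-coded from 0 (none of the time) to 4 (all of the time) and the six items are summed, so MentalHealth ranges from 0 to 24, with higher scores indicating greater distress. The combined measure is known as the Kessler psychological distress scale (K6) and is widely used in psychological research.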
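Assuming the NSDUH response codes run from 1 (“all of the time”) to 5 (“none of the time”), reversing an item amounts to computing 5 minus the response. The sketch below uses a hypothetical helper, `reverse_k6()`, and answers from one made-up respondent; codes outside 1 to 5 are treated as missing.

```r
# Hypothetical helper: reverse-code one K6 item, assuming 1 = "all of the time"
# and 5 = "none of the time"; any other code (e.g. a missing-data code) becomes NA
reverse_k6 <- function(x) ifelse(x %in% 1:5, 5 - x, NA)

# Made-up answers from one respondent to the six items
answers <- c(DSTNRV30 = 2, DSTHOP30 = 5, DSTRST30 = 3,
             DSTCHR30 = 4, DSTEFF30 = 1, DSTNGD30 = 5)

item_scores <- reverse_k6(answers)  # item scores 3, 0, 2, 1, 4, 0
sum(item_scores)                    # K6 score: 10 (possible range 0 to 24)
```

Because `5 - x` maps 1 to 4 and 5 to 0, a respondent who answers “all of the time” to every item scores 24, and one who answers “none of the time” to every item scores 0.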
IncomeLevel is coded from IRPINC3, a variable for a respondent’s personal income in 2016. IRPINC3 consists of seven income groups ranging from “Less than $10,000” to “$75,000 or more.” The Pew Research Center defines households with an income below two-thirds of the median as low-income, those above double the median as high-income, and those in between as middle-income. Although this definition is based on median household income, it is still useful for sorting respondents into three income groups by personal income. According to the U.S. Census Bureau, the median personal income in 2016 was $31,099. Double that is $62,198, and two-thirds of it is $20,733. However, IRPINC3 is categorical rather than continuous, so these cut-offs cannot be applied exactly; I approximate them with the nearest category boundaries. Respondents whose personal income is below $20,000 (1 and 2 in IRPINC3) are coded as low-income, those with $50,000 or more (6 and 7 in IRPINC3) as high-income, and the rest (3, 4, and 5 in IRPINC3) as middle-income.
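The arithmetic and the category mapping above can be checked with a short sketch. The function name `income_level()` is hypothetical; it uses `cut()` with breaks at 2 and 5, so IRPINC3 codes 1 and 2 become low income, 3 to 5 middle income, and 6 and 7 high income, and any code outside 1 to 7 becomes `NA`.

```r
# Cut-offs from the text, based on the 2016 median personal income quoted above
median_income <- 31099
low_cut  <- median_income * 2 / 3  # about 20,733
high_cut <- median_income * 2      # 62,198

# Hypothetical helper: IRPINC3 codes 1-2 -> low, 3-5 -> middle, 6-7 -> high;
# codes outside 1-7 fall outside the breaks and become NA
income_level <- function(irpinc3) {
  cut(irpinc3,
      breaks = c(0, 2, 5, 7),  # intervals (0, 2], (2, 5], (5, 7]
      labels = c("Low Income", "Middle Income", "High Income"))
}

round(c(low_cut, high_cut))
income_level(1:7)
```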
Finally, I exclude respondents under 18 years old from the analysis by using CATAG3. They are still growing, so adult BMI cut-offs do not apply to them, and they may not think about their health in the same way adults do.
Some might object that it is unreasonable to classify people with a BMI over 30 as “Muscular” just because they believe they are healthy. Some respondents may hide their real health status and claim to be healthy when they are not. Moreover, because the question asks about overall health, respondents may be describing their mental health rather than their physical condition.
However, I think BodyShape is a valid variable for the following reasons. First and foremost, reaching a BMI above 30 is not easy for most people. A person who is 5 feet 9 inches tall needs to weigh about 203 pounds or more to reach a BMI of 30. Such a large body mass makes a person’s body shape distinctive and gives them many occasions to think about their body. If a person has an exercise routine and has built muscle through it, they can confidently describe their health as “Excellent” or “Very good.” On the other hand, a person whose high BMI comes mostly from fat is unlikely to pretend to be in good shape and misreport their health when their condition is obvious to everyone from their appearance. Moreover, I assume that people who are overweight or obese tend to be more conscious of their physical health than of their mental health when answering this question.
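The 5 feet 9 inches example can be verified by solving the U.S.-unit BMI formula for weight: weight = BMI × height² ÷ 703. The helper name `weight_for_bmi()` is hypothetical.

```r
# Hypothetical helper: weight (lb) at which a person of a given height (in)
# reaches a target BMI, from BMI = 703 * weight_lb / height_in^2
weight_for_bmi <- function(bmi, height_in) bmi * height_in^2 / 703

height_in <- 5 * 12 + 9  # 5 feet 9 inches = 69 inches
round(weight_for_bmi(30, height_in), 1)
# 203.2
```

At 69 inches, a BMI of 30 therefore corresponds to roughly 203 pounds.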
First, I look at how the mean of BMI2 varies by BodyShape.
Second, to answer the first question, I look at how the mean of DrugUse varies by BodyShape by comparing group means and standard deviations, plotting the distributions, simulating the sampling distribution of the mean for each group, and running a Welch two-sample t-test.
Third, to answer the second question, I look at how the mean of MentalHealth varies by BodyShape using the same steps: comparing group means and standard deviations, plotting the distributions, simulating the sampling distribution of the mean for each group, and running a Welch two-sample t-test.
Finally, to answer the third question, I look at how IncomeLevel varies by BodyShape by taking the following measures:
* Create a crosstab to investigate how people are distributed across categories
* Run a chi-squared test to determine whether BodyShape and IncomeLevel are independent of one another or related
#install.packages("readr")
#install.packages("dplyr")
#install.packages("ggplot2")
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(knitr)
library(ggplot2)
Data <- read_csv("~/Tsukasa/NY/CUNY/Class/Spring 2019/Programming for Social Research/SOC333_NSDUH_2016(1).csv")
## Parsed with column specification:
## cols(
## .default = col_double()
## )
## See spec(...) for full column specifications.
# Reverse-code a K6 item: 1 (all of the time) -> 4, ..., 5 (none of the time) -> 0;
# any other value (including missing codes) becomes NA
reverse_k6 <- function(x) ifelse(x %in% 1:5, 5 - x, NA)

NewData <- Data %>%
  # keep adults only (CATAG3 == 1 is ages 12-17)
  filter(CATAG3 > 1) %>%
  # rename variables
  rename("Health" = health,
         "BMI" = BMI2,
         "Income" = IRPINC3) %>%
  # recode and create variables; unmatched codes become NA
  mutate(Health = case_when(Health %in% 4:5 ~ "Not well",
                            Health == 3 ~ "Moderately well",
                            Health %in% 1:2 ~ "Well"),
         Nervous   = reverse_k6(DSTNRV30),
         Hopeless  = reverse_k6(DSTHOP30),
         Restless  = reverse_k6(DSTRST30),
         Effort    = reverse_k6(DSTEFF30),
         Sad       = reverse_k6(DSTCHR30),
         Worthless = reverse_k6(DSTNGD30),
         # number of different drugs used in the past month (0-10)
         DrugUse = mrjmon + crkmon + cocmon + hermon + hallucmon + inhalmon +
                   methammon + pnrnmmon + trqnmmon + stmnmmon,
         # K6 psychological distress score (0-24; higher = more distress)
         MentalHealth = Nervous + Hopeless + Restless + Effort + Sad + Worthless,
         BodyShape = case_when(Health == "Well" & BMI > 30 ~ "Muscular",
                               Health == "Not well" & BMI > 30 ~ "Obese"),
         IncomeLevel = case_when(Income < 3 ~ "Low Income",
                                 Income > 5 ~ "High Income",
                                 Income %in% 3:5 ~ "Middle Income"))
NewData%>%
filter(!is.na(BodyShape)) %>%
group_by(BodyShape) %>%
summarize(BMIMean=mean(BMI, na.rm=TRUE), BMISD=sd(BMI, na.rm=TRUE)) %>%
kable()
| BodyShape | BMIMean | BMISD |
|---|---|---|
| Muscular | 34.73374 | 4.362656 |
| Obese | 37.00643 | 5.437219 |
Do muscular people and obese people differ in their drug use?
NewData%>%
filter(!is.na(BodyShape)) %>%
group_by(BodyShape) %>%
summarize(DrugUseMean=mean(DrugUse, na.rm=TRUE), DrugUseSD=sd(DrugUse, na.rm=TRUE)) %>%
kable
| BodyShape | DrugUseMean | DrugUseSD |
|---|---|---|
| Muscular | 0.1223906 | 0.4200767 |
| Obese | 0.1785252 | 0.5534777 |
NewData%>%
filter(!is.na(BodyShape)) %>%
group_by(BodyShape) %>%
summarize(DrugUseMean=mean(DrugUse, na.rm=TRUE), DrugUseSD=sd(DrugUse, na.rm=TRUE)) %>%
ggplot()+geom_col(aes(x=BodyShape, y=DrugUseMean, fill=BodyShape))+geom_segment(aes(x=BodyShape, xend=BodyShape, y=DrugUseMean+DrugUseSD, yend=DrugUseMean-DrugUseSD))+geom_label(aes(x=BodyShape, y=DrugUseMean, label=DrugUseMean))
Visualizing the distribution of DrugUse shows how people in each group responded to the questions on drug use.
NewData%>%
filter(!is.na(BodyShape)) %>%
group_by(BodyShape) %>%
ggplot()+geom_histogram(aes(x=DrugUse,fill=BodyShape))+facet_wrap(~BodyShape)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
RepMuscular <- NewData %>%
filter(BodyShape=="Muscular")
MSample <- replicate(10000, mean(sample(RepMuscular$DrugUse, 50), na.rm = TRUE)) %>%
data.frame() %>%
rename("mean"=1)
RepObese <- NewData %>%
filter(BodyShape=="Obese")
OSample <- replicate(10000, mean(sample(RepObese$DrugUse, 50), na.rm = TRUE)) %>%
data.frame() %>%
rename("mean"=1)
The simulated sampling distribution of the mean DrugUse for the obese group (blue) sits to the right of that for the muscular group (red), which suggests that the two group means may differ statistically.
ggplot()+geom_histogram(data=MSample, aes(x=mean), fill="#FF0000"#red
)+geom_histogram(data=OSample, aes(x=mean), fill="#0000FF" #blue
)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
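The simulations above draw 10,000 samples of 50 respondents and plot the sample means. As a sanity check on that approach, the sketch below applies it to a made-up, skewed population of counts (`rpois()` draws standing in for DrugUse, an assumption for illustration only) and compares the spread of the simulated means with the theoretical standard error: σ/√n multiplied by the finite-population correction for sampling without replacement.

```r
set.seed(2019)

# Hypothetical population of counts, mostly zeros like DrugUse (not NSDUH data)
pop <- rpois(5000, lambda = 0.15)
N <- length(pop)
n <- 50

# Simulated sampling distribution: means of 10,000 samples of size 50,
# drawn without replacement as in the replicate()/sample() code above
sim_means <- replicate(10000, mean(sample(pop, n)))

# Theoretical standard error of the mean under sampling without replacement:
# population SD / sqrt(n), times the finite-population correction
pop_sd <- sqrt(mean((pop - mean(pop))^2))
se_theory <- pop_sd / sqrt(n) * sqrt((N - n) / (N - 1))

c(simulated = sd(sim_means), theoretical = se_theory)
```

The two values should be close, and the simulated means should center on the population mean, which is what justifies reading the histograms of `MSample` and `OSample` as approximate sampling distributions.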
A p-value < .001 in the test result below indicates that the two groups differ significantly in mean DrugUse.
t.test(DrugUse~BodyShape, data=NewData)
##
## Welch Two Sample t-test
##
## data: DrugUse by BodyShape
## t = -4.413, df = 3410.5, p-value = 1.051e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.08107490 -0.03119441
## sample estimates:
## mean in group Muscular mean in group Obese
## 0.1223906 0.1785252
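By default, `t.test()` runs Welch’s test, which does not assume equal variances: the statistic is the difference in means divided by √(s₁²/n₁ + s₂²/n₂), and the degrees of freedom come from the Welch-Satterthwaite approximation, which is why df above is not a whole number. The sketch below computes both by hand on hypothetical Poisson-distributed data and checks them against `t.test()`; `welch_t()` is an illustrative helper, not part of the analysis.

```r
# Welch's t statistic and Welch-Satterthwaite degrees of freedom
welch_t <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  vx <- var(x) / length(x)  # squared standard error of mean(x)
  vy <- var(y) / length(y)  # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  dof <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(t = t_stat, df = dof)
}

set.seed(333)
x <- rpois(200, lambda = 0.12)  # hypothetical scores for one group
y <- rpois(150, lambda = 0.18)  # hypothetical scores for another group

welch_t(x, y)
t.test(x, y)[c("statistic", "parameter")]  # Welch's test is the default (var.equal = FALSE)
```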
Do muscular people and obese people differ in mental health?
NewData%>%
filter(!is.na(BodyShape)) %>%
group_by(BodyShape) %>%
summarize(MentalHealthMean=mean(MentalHealth, na.rm=TRUE), MentalHealthSD=sd(MentalHealth, na.rm=TRUE)) %>%
kable
| BodyShape | MentalHealthMean | MentalHealthSD |
|---|---|---|
| Muscular | 3.671859 | 4.230010 |
| Obese | 6.482322 | 6.024623 |
NewData%>%
filter(!is.na(BodyShape)) %>%
group_by(BodyShape) %>%
summarize(MentalHealthMean=mean(MentalHealth, na.rm=TRUE), MentalHealthSD=sd(MentalHealth, na.rm=TRUE)) %>%
ggplot()+geom_col(aes(x=BodyShape, y=MentalHealthMean, fill=BodyShape))+geom_segment(aes(x=BodyShape, xend=BodyShape, y=MentalHealthMean+MentalHealthSD, yend=MentalHealthMean-MentalHealthSD))+geom_label(aes(x=BodyShape, y=MentalHealthMean, label=MentalHealthMean))
Visualizing the distribution of MentalHealth shows how people in each group responded to the questions about psychological distress.
NewData%>%
filter(!is.na(BodyShape)) %>%
group_by(BodyShape) %>%
ggplot()+geom_histogram(aes(x=MentalHealth,fill=BodyShape))+facet_wrap(~BodyShape)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 62 rows containing non-finite values (stat_bin).
RepMuscular <- NewData %>%
filter(BodyShape=="Muscular")
MSample <- replicate(10000, mean(sample(RepMuscular$MentalHealth, 50), na.rm = TRUE)) %>%
data.frame() %>%
rename("mean"=1)
RepObese <- NewData %>%
filter(BodyShape=="Obese")
OSample <- replicate(10000, mean(sample(RepObese$MentalHealth, 50), na.rm = TRUE)) %>%
data.frame() %>%
rename("mean"=1)
The simulated sampling distribution of the mean MentalHealth for the obese group (blue) sits to the right of that for the muscular group (red), which suggests that the two group means may differ statistically.
ggplot()+geom_histogram(data=MSample, aes(x=mean), fill="#FF0000"#red
)+geom_histogram(data=OSample, aes(x=mean), fill="#0000FF"#blue
)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
A p-value < .001 in the test result below indicates that the two groups differ significantly in mean MentalHealth.
t.test(MentalHealth~BodyShape, data=NewData)
##
## Welch Two Sample t-test
##
## data: MentalHealth by BodyShape
## t = -20.458, df = 3204.1, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.079819 -2.541107
## sample estimates:
## mean in group Muscular mean in group Obese
## 3.671859 6.482322
Do muscular people and obese people differ in their income level?
table(NewData$IncomeLevel, NewData$BodyShape) %>%
prop.table(2) %>%
round(2) %>%
kable()
|  | Muscular | Obese |
|---|---|---|
| High Income | 0.22 | 0.08 |
| Low Income | 0.43 | 0.67 |
| Middle Income | 0.35 | 0.25 |
NewData%>%
group_by(BodyShape,IncomeLevel)%>%
summarize(n=n())%>%
filter(!is.na(BodyShape))%>%
filter(!is.na(IncomeLevel))%>%
mutate(PercentOfSamples = n/sum(n))%>%
ggplot()+
geom_col(aes(x=BodyShape, y=PercentOfSamples, fill=IncomeLevel))
These are the counts we would expect in each combination of categories if BodyShape and IncomeLevel were completely independent of one another.
chisq.test(NewData$BodyShape, NewData$IncomeLevel)$expected %>%
kable()
This is how many people are actually in each category of response.
chisq.test(NewData$BodyShape, NewData$IncomeLevel)$observed %>%
kable()
A p-value <.001 in the result below indicates that there is a statistically significant relationship between these two variables.
chisq.test(NewData$BodyShape, NewData$IncomeLevel)
##
## Pearson's Chi-squared test
##
## data: NewData$BodyShape and NewData$IncomeLevel
## X-squared = 431.98, df = 2, p-value < 2.2e-16
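Pearson’s chi-squared statistic compares each observed count O with the count expected under independence, E = (row total × column total) ÷ grand total, and sums (O − E)²/E over all cells, with (rows − 1) × (columns − 1) degrees of freedom (2 here, for a 2 × 3 table). The sketch below reproduces this on a small table of made-up counts and checks it against `chisq.test()`, which applies no continuity correction to tables larger than 2 × 2.

```r
# Hypothetical 2 x 3 table of counts (made-up numbers, not NSDUH results)
obs <- matrix(c(20, 35, 30,
                 8, 60, 25),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              income = c("High", "Low", "Middle")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson's statistic, degrees of freedom, and upper-tail p-value
x2 <- sum((obs - expected)^2 / expected)
dof <- (nrow(obs) - 1) * (ncol(obs) - 1)
p_value <- pchisq(x2, dof, lower.tail = FALSE)

res <- chisq.test(obs)
c(by_hand = x2, chisq_test = unname(res$statistic))
```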
Is a person’s body shape related to their use of drugs?
Is a person’s body shape related to their mental health?
Is a person’s body shape related to their income?
The analytical results suggest that, compared to muscular people, obese people tend to use more drugs, report more psychological distress, and earn lower incomes. While this analysis does not identify the causal direction of these three relationships, the findings are consistent with the idea that maintaining a muscular body shape can benefit a person in many ways.