Introduction

People are motivated to exercise for different reasons, but most expect to lose fat, gain muscle, or stay healthy by maintaining their own workout routine. Improving health and getting into good shape might also increase their confidence and performance at work. This analysis examines how people’s body shape is related to their habits and lifestyle and seeks to answer the following questions.

Question 1: Is a person’s body shape related to their use of drugs?
Question 2: Is a person’s body shape related to their mental health?
Question 3: Is a person’s body shape related to their income?

Because people tend to worry about becoming overweight, this study focuses on people with a high body weight and compares a group of obese people with a group of heavy people who stay healthy.

Data

The National Survey on Drug Use and Health (NSDUH) offers datasets on health-related issues in the United States. To investigate the questions outlined above, I use its 2016 survey data.

Variables

The NSDUH survey data does not contain direct information about respondents’ body shape. Instead, the demographic variable list includes weight, height, and body mass index (BMI). BMI is a useful measure for estimating a person’s body shape from height and weight. A BMI of 25 to 29.9 is considered “overweight,” and a BMI of 30 or higher is considered “obese.” However, BMI is often criticized as a measure of a person’s health because it does not take body fat into account. Some people have a BMI higher than 30 because they have gained muscle through intense workouts. Therefore, the BMI variable alone should not be used as an indicator of body shape.

For the purpose of the analysis, I create a new variable, BodyShape, that represents a person’s body shape by combining two variables from the NSDUH variable list. The first variable is BMI2. While a BMI score does not directly indicate a person’s body shape, as explained above, it is still useful for determining whether a person’s body weight is too heavy in terms of physical health. The second is the health variable. It records a respondent’s answer to a question about their overall health (Would you say your health in general is excellent, very good, good, fair, or poor?) and is measured on a five-point scale from 1 (=Excellent) to 5 (=Poor). Putting them together, I define body shape as follows. If a person has a BMI greater than 30 and describes their health as “Fair” or “Poor,” they are coded as “Obese.” If a person has a BMI greater than 30 but still believes their health is “Excellent” or “Very Good,” they are coded as “Muscular.” Respondents who answer “Good” or whose BMI is 30 or lower are left uncoded and do not enter the comparisons.
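The coding rule can be sketched on a few made-up respondents. This is only an illustration in base R: the toy values and the helper name `code_body_shape` are assumptions, not part of the NSDUH data or of the data-preparation code later in this report.

```r
# Toy data: hypothetical respondents, not NSDUH records.
toy <- data.frame(
  BMI    = c(32.1, 35.4, 31.0, 27.5, 33.8),
  health = c(1, 5, 3, 2, 4)   # 1 = Excellent ... 5 = Poor
)

# Hypothetical helper: BMI above 30 plus self-rated health decides the label.
code_body_shape <- function(bmi, health) {
  ifelse(bmi > 30 & health %in% c(1, 2), "Muscular",
  ifelse(bmi > 30 & health %in% c(4, 5), "Obese", NA))
}

toy$BodyShape <- code_body_shape(toy$BMI, toy$health)
print(toy)
```

The third respondent (BMI above 30, health “Good”) and the fourth (BMI below 30) both stay `NA`, which is how such cases drop out of the later group comparisons.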

The NSDUH survey data has ten variables indicating whether a person used a particular drug in the past month: mrjmon (marijuana), crkmon (crack), cocmon (cocaine), hermon (heroin), hallucmon (hallucinogens), inhalmon (inhalants), methammon (methamphetamine), pnrnmmon (pain relievers), trqnmmon (tranquilizers), and stmnmmon (stimulants). Because I am interested in respondents’ general use of drugs, I build a new variable, DrugUse, as the sum of these ten variables. If a person used only cocaine in the past month, the score is one. If a person used all ten drugs in the past month, the score is ten.
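As a minimal sketch of how such a sum behaves, the toy data frame below reproduces the two examples from the paragraph. It assumes each past-month variable is coded 0 (no) or 1 (yes); the three rows are invented for illustration.

```r
drug_cols <- c("mrjmon", "crkmon", "cocmon", "hermon", "hallucmon",
               "inhalmon", "methammon", "pnrnmmon", "trqnmmon", "stmnmmon")

# Three hypothetical respondents, all indicators starting at 0.
toy <- as.data.frame(matrix(0, nrow = 3, ncol = length(drug_cols),
                            dimnames = list(NULL, drug_cols)))
toy[2, "cocmon"] <- 1   # respondent 2 used only cocaine
toy[3, ] <- 1           # respondent 3 used all ten drugs

# DrugUse is the number of drug types used in the past month.
toy$DrugUse <- rowSums(toy[, drug_cols])
print(toy$DrugUse)
```

Read this way, DrugUse counts drug types rather than occasions of use.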

As for mental health, I make a new variable, MentalHealth, by combining the following six variables about psychological distress: DSTNRV30 (“HOW OFTEN FELT NERVOUS PAST 30 DAYS”); DSTHOP30 (“HOW OFTEN FELT HOPELESS PAST 30 DAYS”); DSTRST30 (“HOW OFTEN FELT RESTLESS/FIDGETY PAST 30 DAYS”); DSTCHR30 (“HOW OFTEN FELT SAD NOTHING COULD CHEER YOU UP”); DSTEFF30 (“HOW OFTEN FELT EVERYTHING EFFORT PAST 30 DAYS”); and DSTNGD30 (how often the respondent felt worthless in the past 30 days). The compound variable is often called the K6 scale and is widely used in psychological research.
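The scoring used later in the data preparation maps each item from 1 through 5 onto 4 through 0 and then adds the six items. A compact sketch of that recoding, assuming any code outside 1 to 5 is a missing-value code (the value 99 below is a hypothetical example, and `reverse_k6` is an illustrative helper name):

```r
# Hypothetical helper: 1 -> 4, 2 -> 3, ..., 5 -> 0; anything else -> NA.
reverse_k6 <- function(x) ifelse(x %in% 1:5, 5 - x, NA)

# Three made-up respondents; 99 stands in for a missing-value code.
items <- data.frame(
  DSTNRV30 = c(1, 5, 3), DSTHOP30 = c(1, 5, 4), DSTRST30 = c(1, 5, 99),
  DSTEFF30 = c(1, 5, 2), DSTCHR30 = c(1, 5, 5), DSTNGD30 = c(1, 5, 4)
)

scored <- as.data.frame(lapply(items, reverse_k6))

# Sum of the six items: 0 (no distress) to 24 (highest distress).
# A respondent with any missing item gets NA, as with `+` in the main code.
scored$MentalHealth <- rowSums(scored)
print(scored$MentalHealth)
```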

IncomeLevel is coded based on IRPINC3, a variable for a respondent’s personal income in 2016. IRPINC3 consists of seven income groups ranging from “Less than $10,000” to “$75,000 or more.” The Pew Research Center defines a household income lower than two-thirds of the median income as low income, one higher than double the median as high income, and one between the two as middle income. While that definition is based on household income, it is still useful for categorizing respondents into three income groups by personal income. According to the U.S. Census Bureau, the median personal income in 2016 was $31,099. Double that figure is $62,198, and two-thirds of it is $20,733. However, IRPINC3 is a categorical rather than a continuous variable, so I approximate these cutoffs with the nearest category boundaries: respondents whose personal income is lower than $20,000 (1 and 2 in IRPINC3) are low income, those whose income is $50,000 or higher (6 and 7 in IRPINC3) are high income, and the others (3, 4, and 5 in IRPINC3) are middle income.
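The cutoff arithmetic and the category mapping can be checked with a short sketch. The variable names and the helper `code_income` are illustrative assumptions; only the $31,099 figure and the IRPINC3 groupings come from the text.

```r
reference_income <- 31099                       # 2016 personal income figure cited above

low_cutoff  <- round(reference_income * 2 / 3)  # two-thirds of the figure
high_cutoff <- reference_income * 2             # double the figure
print(c(low = low_cutoff, high = high_cutoff))

# Hypothetical helper mapping IRPINC3 codes to the three income classes.
code_income <- function(irpinc3) {
  ifelse(irpinc3 %in% 1:2, "Low Income",
  ifelse(irpinc3 %in% 3:5, "Middle Income",
  ifelse(irpinc3 %in% 6:7, "High Income", NA)))
}

print(code_income(1:7))
```

Because the categories are fixed, the high-income cutoff used in practice ($50,000) sits below the computed $62,198.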

Finally, I use CATAG3 to exclude respondents under eighteen years old from the analysis. They are still growing and may not think about their health in the same way adults do.

Validity of the BodyShape variable

Some people might think it is unreasonable to categorize people with a BMI over 30 as “Muscular” just because they believe they are healthy. It is possible that some respondents hide their real health status and claim to be healthy even if they are not. Moreover, because the question is about overall health, respondents may be reporting on their mental health rather than on what they think of their physical condition.

However, I think BodyShape is a valid variable for the following reasons. First and foremost, reaching a BMI above 30 is not easy for most people. If you are 5 feet 9 inches tall, your body weight needs to be about 203 pounds or more to reach a BMI of 30. Such a large body mass gives you a distinctive body shape and many occasions to think about your body. If a person has an exercise routine and has built muscle from it, they can confidently answer “Excellent” or “Very good” about their health. On the other hand, a person whose BMI is that high only because of fat is unlikely to claim to be in good shape when their health status is apparent from their appearance. Moreover, people classified as overweight or obese are likely to be more conscious of their physical health than of their mental health when they answer the question.
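The height and weight arithmetic can be checked with the standard imperial formula, BMI = 703 × weight (lb) / height (in)². The function names below are illustrative, not part of the analysis code.

```r
# BMI from pounds and inches (703 is the usual imperial conversion factor).
bmi_imperial <- function(weight_lb, height_in) 703 * weight_lb / height_in^2

# Weight needed to reach a given BMI at a given height.
weight_for_bmi <- function(bmi, height_in) bmi * height_in^2 / 703

height_in <- 5 * 12 + 9                     # 5 feet 9 inches = 69 inches
threshold <- weight_for_bmi(30, height_in)  # pounds needed for a BMI of 30

print(round(threshold, 1))                    # about 203 pounds
print(round(bmi_imperial(210, height_in), 1)) # BMI at 210 pounds
```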

Analysis Plan

First, I look at how the average BMI2 score varies by BodyShape.

Secondly, in order to answer the first question, I look at how the average score of DrugUse varies based on BodyShape by taking the following measures.

Calculate the mean and standard deviation of DrugUse and look at the differences between “Muscular” people and “Obese” people
Visualize the distribution as a bar chart and histogram
Plot a sampling distribution to check whether a t-test on the difference is likely to reach statistical significance.
Run a t-test to determine whether the difference in the average DrugUse between the two groups is statistically significant or not

Thirdly, in order to answer the second question, I look at how the average score of MentalHealth varies based on BodyShape by taking the following measures.

Calculate the mean and standard deviation of MentalHealth and look at the differences between “Muscular” people and “Obese” people
Visualize the distribution as a bar chart and histogram
Plot a sampling distribution to check whether a t-test on the difference is likely to reach statistical significance.
Run a t-test to determine whether the difference in the average MentalHealth between the two groups is statistically significant or not

Finally, in order to answer the third question, I look at how IncomeLevel varies based on BodyShape by taking the following measures.

Create a crosstab to investigate how people are distributed across categories
Run a chi-squared test to determine whether BodyShape and IncomeLevel are independent of one another

Data Preparation

#install.packages("readr")
#install.packages("dplyr")
#install.packages("ggplot2")

library(readr) 
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(knitr)
library(ggplot2)

Data <- read_csv("~/Tsukasa/NY/CUNY/Class/Spring 2019/Programming for Social Research/SOC333_NSDUH_2016(1).csv")

## Parsed with column specification:
## cols(
##   .default = col_double()
## )

## See spec(...) for full column specifications.

NewData <- Data %>%
  #select samples over 17 years old
  filter(CATAG3 > 1) %>%
  #change variables' name 
  rename("Health" = health, 
         "BMI" = BMI2,
         "Income"= IRPINC3) %>% 
  #recode and create variables
  mutate(Health = ifelse(Health==5,"Not well",
                  ifelse(Health==4,"Not well",
                  ifelse(Health==3,"Moderately well",
                  ifelse(Health==2, "Well", 
                  ifelse(Health==1, "Well", NA))))),
        Nervous = ifelse(DSTNRV30 ==1,4,
                   ifelse(DSTNRV30 ==2,3,
                   ifelse(DSTNRV30 ==3,2,
                   ifelse(DSTNRV30 ==4,1,
                   ifelse(DSTNRV30 ==5,0,NA))))),
         Hopeless = ifelse(DSTHOP30 ==1,4,
                   ifelse(DSTHOP30 ==2,3,
                   ifelse(DSTHOP30 ==3,2,
                   ifelse(DSTHOP30 ==4,1,
                   ifelse(DSTHOP30 ==5,0,NA))))),
         Restless = ifelse(DSTRST30 ==1,4,
                   ifelse(DSTRST30 ==2,3,
                   ifelse(DSTRST30 ==3,2,
                   ifelse(DSTRST30 ==4,1,
                   ifelse(DSTRST30 ==5,0,NA))))),
         Effort  = ifelse(DSTEFF30 ==1,4,
                   ifelse(DSTEFF30 ==2,3,
                   ifelse(DSTEFF30 ==3,2,
                   ifelse(DSTEFF30 ==4,1,
                   ifelse(DSTEFF30 ==5,0,NA))))),
         Sad  = ifelse(DSTCHR30 ==1,4,
                   ifelse(DSTCHR30 ==2,3,
                   ifelse(DSTCHR30 ==3,2,
                   ifelse(DSTCHR30 ==4,1,
                   ifelse(DSTCHR30 ==5,0,NA))))),
         Worthless = ifelse(DSTNGD30 ==1,4,
                   ifelse(DSTNGD30 ==2,3,
                   ifelse(DSTNGD30 ==3,2,
                   ifelse(DSTNGD30 ==4,1,
                   ifelse(DSTNGD30 ==5,0,NA))))),
        DrugUse = mrjmon + crkmon + cocmon + hermon + hallucmon + inhalmon + methammon + 
                  pnrnmmon + 
                   trqnmmon + stmnmmon,
        MentalHealth = Nervous + Hopeless + Restless + Effort + Sad + Worthless,
        BodyShape = ifelse( Health == "Well" & BMI > 30, "Muscular", 
                    ifelse( Health == "Not well" & BMI > 30, "Obese", NA)),
        IncomeLevel = ifelse(Income < 3, "Low Income",
                      ifelse(Income > 5, "High Income", 
                      ifelse(Income == 3 | Income == 4 | Income == 5, "Middle Income", NA))))

BMI score by BodyShape

The average BMI score of muscular respondents is 34.7, and that of obese respondents is 37.0.
The BMI of obese respondents is somewhat more dispersed (standard deviation 5.4) than that of muscular respondents (4.4).

NewData%>%
   filter(!is.na(BodyShape)) %>%
   group_by(BodyShape) %>%
   summarize(BMIMean=mean(BMI, na.rm=TRUE), BMISD=sd(BMI, na.rm=TRUE)) %>%
   kable()

BodyShape	BMIMean	BMISD
Muscular	34.73374	4.362656
Obese	37.00643	5.437219

Q1. Drug Use

Do muscular people and obese people differ in use of drugs?

Average Monthly Frequency of Drug Use (Mean and Standard Deviation)

The average DrugUse of muscular respondents is about 0.12, while that of obese respondents is about 0.18.
Because DrugUse counts how many of the ten drug types a respondent used in the past 30 days, both groups used far fewer than one type on average, but obese respondents used slightly more types than muscular respondents.

NewData%>%
   filter(!is.na(BodyShape)) %>%
   group_by(BodyShape) %>%
   summarize(DrugUseMean=mean(DrugUse, na.rm=TRUE), DrugUseSD=sd(DrugUse, na.rm=TRUE)) %>%
   kable

BodyShape	DrugUseMean	DrugUseSD
Muscular	0.1223906	0.4200767
Obese	0.1785252	0.5534777

Average Monthly Frequency of Drug Use (Bar Chart)

NewData%>%
   filter(!is.na(BodyShape)) %>%
   group_by(BodyShape) %>%
   summarize(DrugUseMean=mean(DrugUse, na.rm=TRUE), DrugUseSD=sd(DrugUse, na.rm=TRUE)) %>%
   ggplot() +
   geom_col(aes(x=BodyShape, y=DrugUseMean, fill=BodyShape)) +
   geom_segment(aes(x=BodyShape, xend=BodyShape,
                    y=DrugUseMean+DrugUseSD, yend=DrugUseMean-DrugUseSD)) +
   geom_label(aes(x=BodyShape, y=DrugUseMean, label=DrugUseMean))

Average Monthly Frequency of Drug Use (Population Distribution)

Visualizing the distribution of DrugUse allows us to see how people in each group responded to the questions on drug use.

In both groups, the most common answer is that they did not use any drugs in the past month.
The percentage of people who used at least one drug is higher in the obese group than in the muscular group.

NewData%>%
   filter(!is.na(BodyShape)) %>%
   group_by(BodyShape) %>%
   ggplot()+geom_histogram(aes(x=DrugUse,fill=BodyShape))+facet_wrap(~BodyShape)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Average Monthly Frequency of Drug Use (Sampling Distribution)

Preparing Sampling Distribution Data

RepMuscular <- NewData %>%
  filter(BodyShape=="Muscular")
MSample <- replicate(10000, mean(sample(RepMuscular$DrugUse, 50), na.rm = TRUE)) %>%
  data.frame() %>%
  rename("mean"=1)

RepObese <- NewData %>%
  filter(BodyShape=="Obese")
OSample <- replicate(10000, mean(sample(RepObese$DrugUse, 50), na.rm = TRUE)) %>%
  data.frame() %>%
  rename("mean"=1)

Plotting Histogram of Sampling Distribution

The sampling distribution of DrugUse for the obese group sits to the right of that for the muscular group. This suggests that the mean DrugUse of the two groups is likely to differ.

ggplot() +
  geom_histogram(data=MSample, aes(x=mean), fill="#FF0000") + # red: Muscular
  geom_histogram(data=OSample, aes(x=mean), fill="#0000FF")   # blue: Obese

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
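To make the idea of a sampling distribution concrete, here is a small self-contained sketch on synthetic data. The simulated population, the seed, and the number of replications are all assumptions for illustration; they are not the NSDUH values used in this report.

```r
set.seed(333)  # arbitrary seed, only so the sketch is reproducible

# Synthetic stand-in for DrugUse: counts between 0 and 10, mostly zero.
population <- rbinom(5000, size = 10, prob = 0.015)

# Draw 2,000 samples of 50 respondents and keep each sample's mean.
sample_means <- replicate(2000, mean(sample(population, 50)))

# The spread of the sample means should be close to sd / sqrt(n).
observed_se    <- sd(sample_means)
theoretical_se <- sd(population) / sqrt(50)
print(c(observed = observed_se, theoretical = theoretical_se))
```

The sample means cluster around the population mean, and their spread shrinks as the sample size grows, which is why two groups with different means produce histograms that sit apart.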

T-test: Comparing Average Monthly Frequency of Drug Use by BodyShape

A p-value below .001 in the test result below indicates that the difference in mean DrugUse between the two groups is statistically significant.

t.test(DrugUse~BodyShape, data=NewData)

## 
##  Welch Two Sample t-test
## 
## data:  DrugUse by BodyShape
## t = -4.413, df = 3410.5, p-value = 1.051e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.08107490 -0.03119441
## sample estimates:
## mean in group Muscular    mean in group Obese 
##              0.1223906              0.1785252
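For readers who want to see what the Welch test computes, the sketch below derives the t statistic and degrees of freedom by hand and compares them with `t.test()`. The two groups are simulated from normal distributions purely to illustrate the formula; the group sizes, means, and standard deviations are assumptions, not the survey data.

```r
set.seed(2016)  # arbitrary seed for reproducibility

# Simulated scores for two groups with unequal sizes and variances.
muscular <- rnorm(200, mean = 3.7, sd = 4.2)
obese    <- rnorm(150, mean = 6.5, sd = 6.0)

# Squared standard error of each group mean.
se2_m <- var(muscular) / length(muscular)
se2_o <- var(obese) / length(obese)

# Welch t statistic and Welch-Satterthwaite degrees of freedom.
t_manual  <- (mean(muscular) - mean(obese)) / sqrt(se2_m + se2_o)
df_manual <- (se2_m + se2_o)^2 /
  (se2_m^2 / (length(muscular) - 1) + se2_o^2 / (length(obese) - 1))

welch <- t.test(muscular, obese)  # var.equal = FALSE is the default
print(c(manual_t = t_manual, t_test = unname(welch$statistic)))
print(c(manual_df = df_manual, t_test = unname(welch$parameter)))
```

The negative sign of t simply reflects the order of the groups: the first group's mean is lower than the second's.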

Q2. Mental Health

Do muscular people and obese people differ in mental health?

Average K6-MentalHealth score (Mean and Standard Deviation)

The average MentalHealth of muscular respondents is about 3.67 while that of obese people is approximately 6.48.

NewData%>%
   filter(!is.na(BodyShape)) %>%
   group_by(BodyShape) %>%
   summarize(MentalHealthMean=mean(MentalHealth, na.rm=TRUE), MentalHealthSD=sd(MentalHealth, na.rm=TRUE)) %>%
   kable

BodyShape	MentalHealthMean	MentalHealthSD
Muscular	3.671859	4.230010
Obese	6.482322	6.024623

Average K6-MentalHealth score (Bar Chart)

NewData%>%
   filter(!is.na(BodyShape)) %>%
   group_by(BodyShape) %>%
   summarize(MentalHealthMean=mean(MentalHealth, na.rm=TRUE), MentalHealthSD=sd(MentalHealth, na.rm=TRUE)) %>%
   ggplot() +
   geom_col(aes(x=BodyShape, y=MentalHealthMean, fill=BodyShape)) +
   geom_segment(aes(x=BodyShape, xend=BodyShape,
                    y=MentalHealthMean+MentalHealthSD, yend=MentalHealthMean-MentalHealthSD)) +
   geom_label(aes(x=BodyShape, y=MentalHealthMean, label=MentalHealthMean))

Average K6-MentalHealth score (Population Distribution)

Visualizing the distribution of MentalHealth allows us to see how people in each group responded to the questions about psychological distress.

The share of respondents who reported no psychological distress is noticeably higher in the muscular group than in the obese group.

NewData%>%
   filter(!is.na(BodyShape)) %>%
   group_by(BodyShape) %>%
   ggplot()+geom_histogram(aes(x=MentalHealth,fill=BodyShape))+facet_wrap(~BodyShape)

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 62 rows containing non-finite values (stat_bin).

Average K6-MentalHealth score (Sampling Distribution)

Preparing Sampling Distribution Data

RepMuscular <- NewData %>%
  filter(BodyShape=="Muscular")
MSample <- replicate(10000, mean(sample(RepMuscular$MentalHealth, 50), na.rm = TRUE)) %>%
  data.frame() %>%
  rename("mean"=1)

RepObese <- NewData %>%
  filter(BodyShape=="Obese")
OSample <- replicate(10000, mean(sample(RepObese$MentalHealth, 50), na.rm = TRUE)) %>%
  data.frame() %>%
  rename("mean"=1)

Plotting Histogram of Sampling Distribution

The sampling distribution of MentalHealth for the obese group sits to the right of that for the muscular group. This suggests that the mean MentalHealth of the two groups is likely to differ.

ggplot() +
  geom_histogram(data=MSample, aes(x=mean), fill="#FF0000") + # red: Muscular
  geom_histogram(data=OSample, aes(x=mean), fill="#0000FF")   # blue: Obese

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

T-test: Comparing Average K6-MentalHealth score by BodyShape

A p-value below .001 in the test result below indicates that the difference in mean MentalHealth between the two groups is statistically significant.

t.test(MentalHealth~BodyShape, data=NewData)

## 
##  Welch Two Sample t-test
## 
## data:  MentalHealth by BodyShape
## t = -20.458, df = 3204.1, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.079819 -2.541107
## sample estimates:
## mean in group Muscular    mean in group Obese 
##               3.671859               6.482322

Q3. Income Level

Do muscular people and obese people differ in their income level?

IncomeLevel by BodyShape (Crosstab)

Within both groups, low income has the highest percentage, middle income comes second, and high income comes last.
The percentage of low-income respondents is lower in the muscular group than in the obese group, while the percentages of middle- and high-income respondents are both lower in the obese group than in the muscular group.

  table(NewData$IncomeLevel, NewData$BodyShape) %>%
  prop.table(2) %>%
  round(2) %>%
  kable()

	Muscular	Obese
High Income	0.22	0.08
Low Income	0.43	0.67
Middle Income	0.35	0.25

IncomeLevel by BodyShape (Bar Chart)

NewData%>%
  group_by(BodyShape,IncomeLevel)%>%
  summarize(n=n())%>%
  filter(!is.na(BodyShape))%>% 
  filter(!is.na(IncomeLevel))%>% 
  mutate(PercentOfSamples = n/sum(n))%>% 
  ggplot()+
  geom_col(aes(x=BodyShape, y=PercentOfSamples, fill=IncomeLevel))

IncomeLevel by BodyShape (Chi-Squared Test)

Null Hypothesis

These are the expected counts: how many people would fall into each category if the two variables were completely independent of one another.

# $expected holds the counts expected under independence
chisq.test(NewData$BodyShape, NewData$IncomeLevel)$expected %>%
  kable()

	High Income	Low Income	Middle Income
Muscular	1092.4882	2935.837	1911.6745
Obese	426.5118	1146.163	746.3255

Actual Observations

This is how many people are actually in each category of response.

# $observed holds the actual cell counts
chisq.test(NewData$BodyShape, NewData$IncomeLevel)$observed %>%
  kable()

	High Income	Low Income	Middle Income
Muscular	1332	2534	2074
Obese	187	1548	584

Chi-Squared Test

A p-value <.001 in the result below indicates that there is a statistically significant relationship between these two variables.

chisq.test(NewData$BodyShape, NewData$IncomeLevel)

## 
##  Pearson's Chi-squared test
## 
## data:  NewData$BodyShape and NewData$IncomeLevel
## X-squared = 431.98, df = 2, p-value < 2.2e-16
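The test statistic can be reproduced from the observed counts reported above. The sketch below rebuilds the table by hand (the matrix layout and object names are illustrative), computes the expected counts as row total × column total / grand total, and sums the squared deviations.

```r
# Observed counts copied from the "Actual Observations" table.
observed <- matrix(c(1332, 2534, 2074,
                      187, 1548,  584),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(BodyShape = c("Muscular", "Obese"),
                                   IncomeLevel = c("High Income", "Low Income",
                                                   "Middle Income")))

# Expected counts under independence.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-squared statistic: sum of (O - E)^2 / E over all cells.
chi_sq <- sum((observed - expected)^2 / expected)

result <- chisq.test(observed)
print(round(expected, 2))
print(c(manual = chi_sq, chisq_test = unname(result$statistic)))
```

The largest contributions come from the obese row, where low-income respondents are over-represented and high-income respondents are under-represented relative to independence.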

Conclusion

Question 1

Is a person’s body shape related to their use of drugs?

Muscular respondents used an average of 0.12 of the ten drug types in the past month.
Obese respondents used an average of 0.18 of the ten drug types in the past month.
A t-test confirms that the difference between the two groups in average past-month drug use is statistically significant.
Obese respondents used slightly more types of drugs than muscular respondents.

Question 2

Is a person’s body shape related to their mental health?

The average K6 score is 3.67 in the group of muscular respondents.
The average K6 score is 6.48 in the group of obese respondents.
A t-test confirms that the difference between the two groups in the average K6 score is statistically significant.
Obese respondents report more serious psychological distress than muscular respondents.

Question 3

Is a person’s body shape related to their income?

Forty-three percent of muscular respondents and sixty-seven percent of obese respondents are classified as low income.
Thirty-five percent of muscular respondents and twenty-five percent of obese respondents are classified as middle income.
Twenty-two percent of muscular respondents and eight percent of obese respondents are categorized as high income.
A chi-squared test confirms that whether a person has a muscular or obese body shape and their income level are not independent of one another.

The results suggest that, compared to muscular people, obese people tend to use more types of drugs, report more psychological distress, and earn a lower income. This analysis does not identify the causal direction of these three relationships, but the findings are consistent with the view that maintaining a muscular body shape goes together with benefits in several areas of life.

NSDUH Analysis

Tsukasa Inoue
