Setup

Part 1: Data

What is the data?
  • the Behavioral Risk Factor Surveillance System is a survey collecting data on health characteristics of the non-institutionalized adult population in the US & some of its Territories
  • each row contains one participant's responses to the 2013 survey
How was the data collected?
  • households were selected for participation using Random Digit Dialing in order to create a randomized sample
  • an adult household member at this number is then randomly selected to participate in the survey over the phone
Is the data generalizable?
  • the data is generalizable to the participating US States & Territories
  • the data is generalizable to non-institutionalized adults aged 18 and older
  • the sample does not include households outside of the US States and selected Territories, and does not include anyone who is under 18 years old or institutionalized, so it cannot be generalized to these populations
Can the data imply causality?
  • the data collected in the survey is not experimental, meaning respondents were not randomly assigned to groups and other variables were not controlled
  • for example, if you were comparing respondents in Alabama to respondents in California, there would be many other potential variables aside from state of residence that could explain a difference in the metric of interest
  • because the data is observational, it cannot imply causality
  • however, it can show correlation

Part 2: Research questions

Research question 1: What is the relationship between perceived physical health and mental health? To answer this question, I will look at the following variables:
genhlth : General Health rated Poor –> Excellent
menthlth : Number of Days (out of the past 30) Mental Health Not Good

Research question 2: Does the number of not-good mental health days differ by gender? Does it differ by state? To answer this question, I will look at the following variables:
sex : Respondent’s sex
X_state : Respondent’s state where they were surveyed
menthlth : Number of Days (out of the past 30) Mental Health Not Good

Research question 3: Finally, I will explore the relationship between bad mental health days, gender and reported general health. Are females more likely to report good general health despite having more bad mental health days?


Part 3: Exploratory data analysis

Research question 1:

##            
##                   high        low     medium
##   Excellent 0.06467717 0.19821348 0.11355926
##   Very good 0.18121018 0.35110083 0.31544238
##   Good      0.29034201 0.30806971 0.33469294
##   Fair      0.27195791 0.10936918 0.17676209
##   Poor      0.19181274 0.03324680 0.05954332

The table shows the proportion of participants giving each general health rating, stratified by a low, medium or high number of “mental health not good” days (each column sums to 1). Of the 3 groups, participants with a low number of bad mental health days rate their general health as excellent most frequently and as poor least frequently.
The opposite is true for participants in the high bad mental health days group.
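
A table of this kind can be sketched as follows. The report's output appears to come from R, and the original code is not shown, so this is only a minimal Python illustration. The cut-offs used to form the low, medium and high groups are not stated in the report; the `low_max` and `medium_max` values below are hypothetical placeholders, and the toy records are invented for the example.

```python
from collections import Counter

HEALTH_LEVELS = ["Excellent", "Very good", "Good", "Fair", "Poor"]


def mental_health_group(days, low_max=0, medium_max=13):
    """Bin a day count into 'low', 'medium' or 'high'.

    The cut-offs are hypothetical; the report does not state the ones it used.
    """
    if days <= low_max:
        return "low"
    if days <= medium_max:
        return "medium"
    return "high"


def column_proportions(records):
    """Return {group: {health level: proportion within that group}}.

    `records` is an iterable of (genhlth, menthlth_days) pairs.
    Proportions are computed within each group, so each group sums to 1.
    """
    counts = Counter()
    totals = Counter()
    for genhlth, days in records:
        group = mental_health_group(days)
        counts[(group, genhlth)] += 1
        totals[group] += 1
    return {
        group: {level: counts[(group, level)] / totals[group] for level in HEALTH_LEVELS}
        for group in totals
    }


# Invented toy records, not BRFSS data.
toy_records = [
    ("Excellent", 0), ("Good", 0), ("Very good", 0), ("Excellent", 0),
    ("Good", 5), ("Fair", 20), ("Poor", 30),
]
props = column_proportions(toy_records)
for group, row in props.items():
    print(group, row)
```

Normalizing within each group, rather than over the whole table, is what makes the groups comparable even though they contain very different numbers of respondents.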



You can see in the stacked bar chart above that participants who report the most bad mental health days also report poor general health most frequently.
Conversely, participants who reported the fewest bad mental health days reported the best overall general health.
It seems as though reported mental health is correlated with reported general health.

Research question 2:

## [1] 198066      1
## [1] 285079      1



The distributions look pretty similar, but I also want to compare the average # of bad mental health days in a month for each group:

## # A tibble: 2 x 5
##   sex    mean_days median_days sd_days      n
##   <fct>      <dbl>       <dbl>   <dbl>  <int>
## 1 Male        2.78           0    7.13 198066
## 2 Female      3.78           0    8.05 285079



Though the male and female distributions look fairly similar, the female distribution has a heavier right tail, so the female mean number of days is higher. The median is 0 for both groups, which means at least half of the respondents in each group reported no bad mental health days. The median is a more robust statistic than the mean, meaning it is less affected by skewness.
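
The gap between mean and median in a right-skewed distribution can be illustrated with a small sketch. It is written in Python using only the standard library, and the values are invented for the example rather than taken from the survey.

```python
from statistics import mean, median

# Invented sample: most respondents report 0 days, a few report many.
days = [0, 0, 0, 0, 0, 0, 2, 5, 15, 30]

sample_mean = mean(days)      # pulled upward by the few large values
sample_median = median(days)  # middle of the sorted values, unaffected by the tail

print("mean:", sample_mean)
print("median:", sample_median)
```

Because more than half of the invented values are 0, the median stays at 0 no matter how large the values in the tail become, while the mean grows with them.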



States with the highest average percent of bad mental health days appear to be:
- Alabama
- West Virginia
- Kentucky

## # A tibble: 10 x 4
##    X_state       mean_days     n pct_bad_menthlth_days
##    <fct>             <dbl> <int>                 <dbl>
##  1 Alabama            4.44  6334                  14.8
##  2 West Virginia      4.40  5798                  14.7
##  3 Kentucky           4.31 10717                  14.4
##  4 Oklahoma           4.01  8114                  13.4
##  5 Arkansas           3.92  5139                  13.1
##  6 Puerto Rico        3.91  5945                  13.0
##  7 Mississippi        3.88  7298                  12.9
##  8 Oregon             3.85  5861                  12.8
##  9 Tennessee          3.85  5703                  12.8
## 10 California         3.76 11410                  12.5



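This table shows the 10 states with the highest average percent of bad mental health days in a month. The pct_bad_menthlth_days column is consistent with mean_days expressed as a percentage of a 30-day month (for example, 4.44 / 30 ≈ 14.8%).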
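
A per-state summary like this one can be sketched as below. This is a Python illustration, not the report's own code, which is not shown. The 30-day divisor is an assumption inferred from the table values (4.44 / 30 is about 14.8%), and the toy records use invented state labels.

```python
from collections import defaultdict

DAYS_IN_PERIOD = 30  # assumption inferred from the table: 4.44 / 30 is about 14.8%


def summarise_by_state(records):
    """Return one summary row per state, sorted by mean_days, highest first.

    `records` is an iterable of (state, menthlth_days) pairs;
    missing day counts (None) are skipped.
    """
    days_by_state = defaultdict(list)
    for state, days in records:
        if days is not None:
            days_by_state[state].append(days)

    rows = []
    for state, values in days_by_state.items():
        mean_days = sum(values) / len(values)
        rows.append({
            "state": state,
            "mean_days": mean_days,
            "n": len(values),
            "pct_bad_menthlth_days": 100 * mean_days / DAYS_IN_PERIOD,
        })
    rows.sort(key=lambda row: row["mean_days"], reverse=True)
    return rows


# Invented toy records, not BRFSS data.
toy_records = [("A", 0), ("A", 6), ("B", 3), ("B", 9), ("B", None)]
summary = summarise_by_state(toy_records)
for row in summary:
    print(row)
```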

Research question 3: Are females more likely to report good general health despite having more bad mental health days?

## [1] 481487      3



Though reported general health does vary by group (low, medium and high # of bad mental health days), it does not seem to vary much by gender. It seems that of participants who reported a high number of bad mental health days, participants were slightly less likely to report poor general health if they were female, but this difference may be due to sampling error & would require statistical testing to determine if this effect of gender is real.
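
One possible form of the statistical testing mentioned above is a two-proportion z-test comparing the share of males and females reporting poor general health within the high group. The sketch below is in Python with the standard library only. It is one option among several (a chi-square test would be another), it is not something the report itself ran, and the counts passed in are hypothetical, not values from the survey.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two independent proportions.

    Uses the pooled proportion for the standard error, which is the usual
    choice under the null hypothesis that both proportions are equal.
    Returns (z statistic, two-sided p-value).
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 180 of 1000 females and 210 of 1000 males
# in the high group reporting poor general health.
z, p_value = two_proportion_z_test(180, 1000, 210, 1000)
print("z:", round(z, 3), "p-value:", round(p_value, 3))
```

With samples as large as this survey's, even very small differences can come out statistically significant, so the size of the difference would need to be reported alongside any p-value.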