Author: Kristen Phan

Setup

This R Markdown file performs exploratory data analysis (EDA) on the “Behavioral Risk Factor Surveillance System” (BRFSS) dataset, which is collected by the CDC together with the participating states and territories, and prepares the data for answering the three research questions outlined below.

The Behavioral Risk Factor Surveillance System (BRFSS) is a collaborative project between all of the states in the United States (US) and participating US territories and the Centers for Disease Control and Prevention (CDC). The BRFSS is administered and supported by CDC’s Population Health Surveillance Branch, under the Division of Population Health at the National Center for Chronic Disease Prevention and Health Promotion. BRFSS is an ongoing surveillance system designed to measure behavioral risk factors for the non-institutionalized adult population (18 years of age and older) residing in the US. The BRFSS was initiated in 1984, with 15 states collecting surveillance data on risk behaviors through monthly telephone interviews. Over time, the number of states participating in the survey increased; by 2001, 50 states, the District of Columbia, Puerto Rico, Guam, and the US Virgin Islands were participating in the BRFSS. Today, all 50 states, the District of Columbia, Puerto Rico, and Guam collect data annually and American Samoa, Federated States of Micronesia, and Palau collect survey data over a limited point-in-time (usually one to three months). In this document, the term “state” is used to refer to all areas participating in BRFSS, including the District of Columbia, Guam, and the Commonwealth of Puerto Rico.

The BRFSS objective is to collect uniform, state-specific data on preventive health practices and risk behaviors that are linked to chronic diseases, injuries, and preventable infectious diseases that affect the adult population. Factors assessed by the BRFSS in 2013 include tobacco use, HIV/AIDS knowledge and prevention, exercise, immunization, health status, healthy days — health-related quality of life, health care access, inadequate sleep, hypertension awareness, cholesterol awareness, chronic health conditions, alcohol consumption, fruits and vegetables consumption, arthritis burden, and seatbelt use.

Since 2011, BRFSS has conducted both landline telephone- and cellular telephone-based surveys. In conducting the BRFSS landline telephone survey, interviewers collect data from a randomly selected adult in a household. In conducting the cellular telephone version of the BRFSS questionnaire, interviewers collect data from an adult who participates by using a cellular telephone and resides in a private residence or college housing.

Health characteristics estimated from the BRFSS pertain to the non-institutionalized adult population, aged 18 years or older, who reside in the US. In 2013, additional question sets were included as optional modules to provide a measure for several childhood health and wellness indicators, including asthma prevalence for people aged 17 years or younger.

Source: https://d3c33hcgiwev3.cloudfront.net/_e34476fda339107329fc316d1f98e042_brfss_codebook.html?Expires=1590192000&Signature=G~dZbm4Uv5ZwU7bzmV7-2ept33hAUXHUX7uVZQR2ogVPdMaMqMJIv5p3vyW29rDPuyYdPyrUuiEM8xO8nSidktjrsNEZpTcxoJzLCDxGjr6dSX8~FgD--td3OwKSvJH3Y8Y9ppTO1Kp8h8pw1XKizXIG3H7n4CSmeuGrCsWGNRU_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A

Part 1: Data

There are three observations that we can draw from the way the data was collected.

  1. Observational Data and Correlations: Because this is an observational study and not an experiment, relationships observed between different behavioral risk factors (e.g. tobacco use, HIV/AIDS knowledge and prevention, exercise, immunization, health status, etc.) indicate correlation, not causation.

  2. Random Sampling and Non-response Bias: Interviewers collected data from a randomly selected adult in each household. However, there is no indication of the response rate, so the dataset may be affected by non-response bias.

  3. Representative Sample: All 50 states, the District of Columbia, and participating territories took part in the survey. If the non-response bias discussed above is negligible, the sample should therefore be representative of the non-institutionalized US adult population aged 18 years or older.


Part 2: Research questions

Research question 1:

Objective: Examine the correlation between physical activity (measured by the number of minutes or hours spent on physical exercise) and mental health (measured by the number of days full of energy and the number of days depressed in the past 30 days).

Input variables:

- qlhlth2: How Many Days Full Of Energy In Past 30 Days (364 obs excl. NA and 0)
- qlmentl2: How Many Days Depressed In Past 30 Days
- exerhmm1: Minutes Or Hours Walking, Running, Jogging, Or Swimming
- exerhmm2: Minutes Or Hours Walking, Running, Jogging, Or Swimming

Research question 2:

Objective: Examine the correlation between veterans’ perception of mental health treatment and their satisfaction with life.

Input variables:

- veteran3: Are You A Veteran (“Yes”, “No”, …)
- mistrhlp: Mental Health Treatment Can Help People Lead Normal Life (1-5 with 1 = agree strongly, 5 = disagree strongly)
- lsatisfy: Satisfaction With Life (1-4 with 1 = very satisfied, 4 = very dissatisfied)

Research question 3:

Objective: Examine the relationship between mental health, emotional support received, and political engagement.

Input variables:

- menthlth: Number Of Days Mental Health Not Good
- emtsuprt: How Often Get Emotional Support Needed
- scntvot1: Did You Vote In The Last Presidential Election?


Part 3: Exploratory data analysis

Research question 1:

Objective: Examine the correlation between physical activity (measured by the number of minutes or hours spent on physical exercise) and mental health (measured by the number of days full of energy and the number of days depressed in the past 30 days).

Input variables:

- qlhlth2: How Many Days Full Of Energy In Past 30 Days (364 obs excl. NA and 0)
- qlmentl2: How Many Days Depressed In Past 30 Days
- exerhmm1: Minutes Or Hours Walking, Running, Jogging, Or Swimming
- exerhmm2: Minutes Or Hours Walking, Running, Jogging, Or Swimming

First, we will select the four target variables (number of days full of energy, number of days depressed, and the two measures of minutes or hours of physical exercise).

## 'data.frame':    491775 obs. of  4 variables:
##  $ exerhmm1: int  NA 20 NA 30 NA 15 100 15 100 30 ...
##  $ exerhmm2: int  NA 10 NA NA NA 30 NA NA NA 100 ...
##  $ qlhlth2 : int  0 25 2 20 NA NA NA NA NA NA ...
##  $ qlmentl2: int  30 2 2 6 NA NA NA NA NA NA ...
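The selection step can be sketched in base R as follows. This is a minimal illustration, not the code that produced the output above: the data frame `toy`, its values, and the extra column `other` are made up so that the example runs on its own.

```r
# Toy stand-in for the survey data; the values are invented for illustration.
toy <- data.frame(
  exerhmm1 = c(NA, 20, NA, 30),
  exerhmm2 = c(NA, 10, NA, NA),
  qlhlth2  = c(0, 25, 2, 20),
  qlmentl2 = c(30, 2, 2, 6),
  other    = c("a", "b", "c", "d")  # a column that is not needed
)

# Keep only the four target variables, in a fixed order.
target_vars <- c("exerhmm1", "exerhmm2", "qlhlth2", "qlmentl2")
toy_q1 <- toy[, target_vars]

str(toy_q1)
```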

Next, we will remove incomplete records, i.e. those with an NA in at least one of the four columns of the data frame ‘brfss2013_q1’.

## 'data.frame':    140 obs. of  4 variables:
##  $ exerhmm1: int  20 15 30 130 45 30 100 30 300 100 ...
##  $ exerhmm2: int  10 100 20 200 25 30 300 30 600 30 ...
##  $ qlhlth2 : int  25 30 0 20 30 25 30 15 2 15 ...
##  $ qlmentl2: int  2 0 0 0 0 0 0 0 15 0 ...
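A minimal sketch of dropping incomplete rows, assuming a small made-up data frame `toy_q1` in place of ‘brfss2013_q1’. `complete.cases()` returns TRUE only for rows that have no NA in any column.

```r
# Toy stand-in with the same four columns; values are invented.
toy_q1 <- data.frame(
  exerhmm1 = c(NA, 20, NA, 30, 15),
  exerhmm2 = c(NA, 10, NA, NA, 100),
  qlhlth2  = c(0, 25, 2, 20, 30),
  qlmentl2 = c(30, 2, 2, 6, 0)
)

# Keep only rows where all four variables are present.
toy_complete <- toy_q1[complete.cases(toy_q1), ]

nrow(toy_complete)
```

Because every variable must be present at once, this step can shrink the sample sharply, as the drop from 491775 to 140 observations above shows.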

For each of the 140 observations we obtained, we will compute the total number of minutes spent on physical exercise.

## 'data.frame':    140 obs. of  3 variables:
##  $ exercise_time: int  30 115 50 330 70 60 400 60 900 130 ...
##  $ qlhlth2      : int  25 30 0 20 30 25 30 15 2 15 ...
##  $ qlmentl2     : int  2 0 0 0 0 0 0 0 15 0 ...
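The total above is consistent with a plain sum of the two columns (for example 20 + 10 = 30). The sketch below reproduces that sum on three rows. It also shows a hypothetical helper, `hmm_to_minutes()`, for the case where the values are encoded as hours and minutes (so that 130 would mean 1 hour 30 minutes). That encoding is an assumption suggested by the variable label “Minutes Or Hours” and should be checked against the codebook before use.

```r
# Three rows taken from the output above.
toy <- data.frame(
  exerhmm1 = c(20, 15, 130),
  exerhmm2 = c(10, 100, 200)
)

# Plain sum of the two columns, matching the exercise_time values shown above.
toy$exercise_time <- toy$exerhmm1 + toy$exerhmm2

# Hypothetical helper: ASSUMES an HMM encoding (hundreds = hours,
# last two digits = minutes). Verify against the codebook first.
hmm_to_minutes <- function(x) (x %/% 100) * 60 + (x %% 100)

toy$exercise_minutes <- hmm_to_minutes(toy$exerhmm1) + hmm_to_minutes(toy$exerhmm2)

toy
```

If the assumption holds, the two totals differ whenever a value is 100 or more, which would change the quartile thresholds used later.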

Now let’s plot the exercise time observed in our sample. The distribution is right-skewed: most respondents report short exercise times, with a long tail of large values.

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
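The message above is the default note printed when a ggplot2 histogram is drawn without an explicit bin width. A minimal base R sketch with explicit bins is shown below. It uses only the first ten `exercise_time` values printed earlier, and the bin width of 100 is an arbitrary choice for illustration.

```r
# First ten exercise_time values from the output above (not the full sample).
exercise_time <- c(30, 115, 50, 330, 70, 60, 400, 60, 900, 130)

bin_width <- 100  # assumed width, chosen only for illustration
breaks <- seq(0, ceiling(max(exercise_time) / bin_width) * bin_width,
              by = bin_width)

# Draw the histogram and keep the binned counts for inspection.
h <- hist(exercise_time, breaks = breaks,
          main = "Exercise time", xlab = "exercise_time")
h$counts

# A mean well above the median is one simple sign of right skew.
c(mean = mean(exercise_time), median = median(exercise_time))
```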

Next, we will rank each individual’s exercise time as low, medium, high, or very high, using the sample quartiles as thresholds:

- low: exercise_time < first quartile (Q1); the bottom 25% of the sample should fall in this group
- medium: Q1 <= exercise_time < median
- high: median <= exercise_time < third quartile (Q3)
- very high: exercise_time >= Q3; the top 25% of the sample should fall in this group

## [1] "q1 = 69"
## [1] "med = 234"
## [1] "q3 = 399"
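One way to obtain the three thresholds is `quantile()`, sketched here on the first ten `exercise_time` values printed earlier rather than the full sample, so the numbers will not match the output above. When the cut points are true sample quartiles, each of the four groups should hold roughly a quarter of the observations, which is a useful sanity check on the ranking step.

```r
# First ten exercise_time values from the output above (not the full sample).
exercise_time <- c(30, 115, 50, 330, 70, 60, 400, 60, 900, 130)

# Sample quartiles: 25th, 50th and 75th percentiles.
cuts <- quantile(exercise_time, probs = c(0.25, 0.50, 0.75))

print(paste("q1 =", cuts[["25%"]]))
print(paste("med =", cuts[["50%"]]))
print(paste("q3 =", cuts[["75%"]]))
```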

Now let’s rank exercise times!

## # A tibble: 4 x 2
##   exercise_time_rank count
##   <chr>              <int>
## 1 high                  25
## 2 low                   39
## 3 medium                51
## 4 very high             25
## 'data.frame':    140 obs. of  4 variables:
##  $ exercise_time     : int  30 115 50 330 70 60 400 60 900 130 ...
##  $ qlhlth2           : int  25 30 0 20 30 25 30 15 2 15 ...
##  $ qlmentl2          : int  2 0 0 0 0 0 0 0 15 0 ...
##  $ exercise_time_rank: chr  "low" "medium" "low" "high" ...
## [1] "days_depressed statistics:"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    0.00    2.55    2.00   30.00
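The ranking rule can be sketched with `cut()`, again on the first ten `exercise_time` values only. Setting `right = FALSE` makes each interval closed on the left and open on the right, which matches the "lower <= exercise_time < upper" thresholds described above. The variable names are chosen for this example.

```r
# First ten exercise_time values from the output above (not the full sample).
exercise_time <- c(30, 115, 50, 330, 70, 60, 400, 60, 900, 130)
cuts <- quantile(exercise_time, probs = c(0.25, 0.50, 0.75))

# Assign each value to one of four groups: [-Inf, Q1), [Q1, median),
# [median, Q3), [Q3, Inf).
exercise_time_rank <- cut(
  exercise_time,
  breaks = c(-Inf, cuts, Inf),
  labels = c("low", "medium", "high", "very high"),
  right = FALSE
)

# Group sizes; with true quartiles these should be roughly equal.
table(exercise_time_rank)
```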

Lastly, we will calculate the average number of days when the respondents report having full energy and the average number of days the respondents report feeling depressed for each group with different rankings in exercise time.

## # A tibble: 4 x 3
##   exercise_time_rank avg_days_full_energy avg_days_depressed
##   <chr>                             <int>              <int>
## 1 high                                 20                  0
## 2 low                                  20                  0
## 3 medium                               25                  0
## 4 very high                            25                  0
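A per-group summary like the one above can be sketched with `aggregate()`. The data frame `toy` and its values are invented for illustration. Note that `mean()` returns unrounded doubles, unlike the integer columns in the output above.

```r
# Toy data: one rank label and two day counts per respondent (invented values).
toy <- data.frame(
  exercise_time_rank = c("low", "low", "medium", "high", "very high", "very high"),
  qlhlth2  = c(20, 10, 25, 20, 30, 20),
  qlmentl2 = c(2, 0, 0, 0, 0, 4)
)

# Mean of both day counts within each exercise-time group.
by_rank <- aggregate(
  toy[, c("qlhlth2", "qlmentl2")],
  by = list(exercise_time_rank = toy$exercise_time_rank),
  FUN = mean
)
names(by_rank)[2:3] <- c("avg_days_full_energy", "avg_days_depressed")

by_rank
```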

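The table gives at most weak support for a positive association between exercise time and the number of days with full energy in the last 30 days: the very high group reports more such days (25) than the low group (20), but the pattern is not monotonic, since the medium group also reports 25 and the high group 20.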

There were some outliers with a high number of days on which the respondents reported feeling depressed. However, at least 50% of the respondents reported 0 days of depression, regardless of the level of exercise they engaged in.
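A grouped table can hide or exaggerate a relationship, so a rank correlation on the ungrouped values is a natural complement. The sketch below is not part of the original analysis. It uses Spearman’s coefficient, which suits skewed data, and only the first ten rows printed earlier, so its result says nothing about the full sample of 140.

```r
# First ten rows from the output above (not the full sample).
exercise_time <- c(30, 115, 50, 330, 70, 60, 400, 60, 900, 130)
qlhlth2       <- c(25, 30, 0, 20, 30, 25, 30, 15, 2, 15)

# Spearman's rho: correlation of the ranks, robust to skew and outliers.
rho <- cor(exercise_time, qlhlth2, method = "spearman")
rho
```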

Research question 2:

Objective: Examine the correlation between veterans’ perception of mental health treatment and their satisfaction with life.

Input variables:

- veteran3: Are You A Veteran (“Yes”, “No”, …)
- mistrhlp: Mental Health Treatment Can Help People Lead Normal Life (1-5 with 1 = agree strongly, 5 = disagree strongly)
- lsatisfy: Satisfaction With Life (1-4 with 1 = very satisfied, 4 = very dissatisfied)

First, we keep the records of respondents who are veterans, along with their perception of mental health treatment and their satisfaction with life.

##             menthlth_treatment_perception         life_satisfaction
##  Agree strongly            :419           Very satisfied   :303    
##  Agree slightly            :112           Satisfied        :277    
##  Neither agree nor disagree: 50           Dissatisfied     : 28    
##  Disagree slightly         : 16           Very dissatisfied:  1    
##  Disagree strongly         : 12
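The veteran subset can be sketched as below. The data frame `toy` and its values are invented. Dropping records with a missing answer is an assumption based on the summary above showing no NA counts.

```r
# Toy stand-in for the survey data; values are invented.
toy <- data.frame(
  veteran3 = c("Yes", "No", "Yes", NA, "Yes"),
  mistrhlp = c("Agree strongly", "Agree slightly", NA,
               "Agree strongly", "Disagree slightly"),
  lsatisfy = c("Very satisfied", "Satisfied", "Satisfied",
               "Dissatisfied", "Satisfied"),
  stringsAsFactors = FALSE
)

# Veterans only; the NA check keeps the logical index free of NA.
is_vet <- !is.na(toy$veteran3) & toy$veteran3 == "Yes"

# ASSUMPTION: records with a missing answer on either question are dropped.
has_answers <- !is.na(toy$mistrhlp) & !is.na(toy$lsatisfy)

vets <- toy[is_vet & has_answers, c("mistrhlp", "lsatisfy")]
names(vets) <- c("menthlth_treatment_perception", "life_satisfaction")

vets
```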

Now we calculate the average (mean) life satisfaction measure among all respondents and among the veteran respondents with different perceptions toward mental health treatment.

## [1] 1.616828
## # A tibble: 5 x 4
##   menthlth_treatment_pe~ vet_lsatis_avg national_lsatis_~ higher_lsatis_than_na~
##   <fct>                           <dbl>             <dbl> <chr>                 
## 1 Agree strongly                   1.50              1.62 yes                   
## 2 Agree slightly                   1.62              1.62 yes                   
## 3 Neither agree nor dis~           1.7               1.62 no                    
## 4 Disagree slightly                1.81              1.62 no                    
## 5 Disagree strongly                1.92              1.62 no
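The comparison above can be sketched as follows, on invented data. Satisfaction is scored 1 to 4 by factor level, so a lower mean means greater satisfaction. The column names and the strict `<` comparison are choices made for this example, not taken from the original code.

```r
satisfaction_levels <- c("Very satisfied", "Satisfied",
                         "Dissatisfied", "Very dissatisfied")

# Overall mean score across all (toy) respondents; 1 = very satisfied.
all_scores <- as.integer(factor(
  c("Very satisfied", "Satisfied", "Satisfied",
    "Dissatisfied", "Very satisfied", "Satisfied"),
  levels = satisfaction_levels
))
overall_avg <- mean(all_scores)

# Toy veteran subset (invented values).
vets <- data.frame(
  menthlth_treatment_perception = c("Agree strongly", "Agree strongly",
                                    "Disagree slightly"),
  life_satisfaction = factor(c("Very satisfied", "Satisfied", "Dissatisfied"),
                             levels = satisfaction_levels)
)
vets$score <- as.integer(vets$life_satisfaction)

# Mean score per perception group, compared with the overall mean.
vet_avg <- aggregate(score ~ menthlth_treatment_perception, data = vets, FUN = mean)
names(vet_avg)[2] <- "vet_lsatis_avg"
vet_avg$overall_lsatis_avg <- overall_avg
# ASSUMPTION: "more satisfied" means a strictly lower mean score.
vet_avg$more_satisfied_than_overall <-
  ifelse(vet_avg$vet_lsatis_avg < overall_avg, "yes", "no")

vet_avg
```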

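The results indicate a positive correlation between a favorable perception of mental health treatment and life satisfaction among veteran respondents: the mean satisfaction score rises (that is, satisfaction falls) from 1.50 among those who agree strongly to 1.92 among those who disagree strongly.

Additionally, veteran respondents who agree (strongly or slightly) that mental health treatment can help people lead a normal life report average life satisfaction at least as high as that of all respondents: their mean scores (1.50 and 1.62) are at or below the overall mean of 1.62, and a lower score means greater satisfaction.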

Research question 3:

Objective: Examine the relationship between mental health, emotional support received, and political engagement.

Input variables:

- menthlth: Number Of Days Mental Health Not Good in the past 30 days (0-30, NA)
- emtsuprt: How Often Get Emotional Support Needed (1-5, 1=Always, 5=Never, NA)
- scntvot1: Did You Vote In The Last Presidential Election? (1=Yes, 2=No, 8=Not Applicable, NA)

First, we keep only the records that have complete information on mental health status, emotional support received, and voting record (yes/no).

##  voted     days_bad_mental_health emotional_support_received
##  Yes:361   Min.   : 0.000         Always   :257             
##  No : 86   1st Qu.: 0.000         Usually  : 81             
##            Median : 0.000         Sometimes: 71             
##            Mean   : 4.539         Rarely   : 20             
##            3rd Qu.: 3.500         Never    : 18             
##            Max.   :30.000
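A minimal sketch of this filter on invented data. The label "Not applicable" for the third voting answer is an assumption about how that level is spelled in the data. Only "Yes" and "No" are kept, in line with the summary above.

```r
# Toy stand-in for the survey data; values and labels are invented.
toy <- data.frame(
  scntvot1 = c("Yes", "No", NA, "Yes", "Not applicable"),
  menthlth = c(0, 10, 3, NA, 5),
  emtsuprt = c("Always", "Rarely", "Usually", "Always", "Never"),
  stringsAsFactors = FALSE
)

# Complete rows whose voting answer is a plain yes or no.
keep <- complete.cases(toy) & toy$scntvot1 %in% c("Yes", "No")
q3 <- toy[keep, ]

# Rename to the column names used in the summary above.
names(q3) <- c("voted", "days_bad_mental_health", "emotional_support_received")
q3$voted <- factor(q3$voted, levels = c("Yes", "No"))

summary(q3)
```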

Now we take a closer look at the mental health status and emotional support received by the respondents who did and did not vote in the last election

## # A tibble: 2 x 2
##   voted avg_days_bad_mental_health
##   <fct>                      <dbl>
## 1 Yes                         3.44
## 2 No                          9.14
## # A tibble: 2 x 2
##   voted `emotional_support_received (smaller = support received more often)`
##   <fct>                                                                <dbl>
## 1 Yes                                                                   1.73
## 2 No                                                                    2.06
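The two per-group means above can be sketched in a single `aggregate()` call on invented data. Emotional support is scored 1 to 5 by factor level (1 = Always, 5 = Never), so a smaller mean indicates support received more often. The column `support_score` is a name introduced for this example.

```r
support_levels <- c("Always", "Usually", "Sometimes", "Rarely", "Never")

# Toy data with the column names from the summary above (invented values).
q3 <- data.frame(
  voted = factor(c("Yes", "Yes", "No", "No"), levels = c("Yes", "No")),
  days_bad_mental_health = c(0, 4, 10, 6),
  emotional_support_received = factor(
    c("Always", "Usually", "Sometimes", "Always"),
    levels = support_levels
  )
)

# Numeric score for the ordered support answers: 1 = Always ... 5 = Never.
q3$support_score <- as.integer(q3$emotional_support_received)

# Mean of both measures for voters and non-voters.
by_vote <- aggregate(
  cbind(days_bad_mental_health, support_score) ~ voted,
  data = q3, FUN = mean
)

by_vote
```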

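The results show that better mental health (fewer days of poor mental health) and more frequent emotional support are both associated with political engagement, measured by whether the respondent voted in the last presidential election. Respondents who voted reported on average 3.44 days of poor mental health, compared with 9.14 for those who did not, and a mean emotional support score of 1.73 compared with 2.06 (a lower score means support is received more often). In other words, the better a person’s mental health and the more emotional support s/he receives, the more likely s/he is to engage politically through voting.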