An additional package of stats was added for further analysis
The Behavioral Risk Factor Surveillance System, otherwise known as the BRFSS, is a collaborative observational project between all the United States (US) states - including the District of Columbia (DC) and three US territories. Currently, the BRFSS is sponsored by the CDC National Center for Chronic Disease Prevention and Health Promotion, other CDC centers, as well as federal agencies such as Health Resources and Services Administration, Administration on Aging, Department of Veteran Affairs, and the Substance Abuse and Mental Health Services Administration. The BRFSS is the largest continuously conducted health survey in the world with more than 400,000 randomly selected adults interviewed annually. The BRFSS is designed to measure behavioral risk factors for non-institutionalized adults (individuals 18 years of age or older) residing in the US and territories.
The BRFSS began in 1984 with 15 states collecting surveillance data on major risk behaviors through monthly telephone interviews. The CDC developed a standard questionnaire for states to use with the initial topics including smoking, alcohol use, physical inactivity, diet, hypertension, and seat-belt use. By 1988, standardized sets of specific questions were added as optional modules. The BRFSS became a nationwide system by 1993, with a sample size of over 100,000. Currently, the survey is conducted by, with the assistance of the CDC, state health departments’ in-house interviewers, or contracted telephone call centers and universities continuously throughout the year. States (referring to the US states and territories) use a standardized main survey with optional modules with states additionally having the option to add questions as well that pertain to their region. The survey is conducted using a Random Digit Dialing (RDD) system for both landlines and cell phones. Cell phones were implemented in 2011. In conducting landline interviews, interviewers collect data from a randomly selected adult within the household; cellphone interviews are conducted with someone who resides in a private residence or college housing.
The BRFSS states its objective is to collect uniform, state-specific data on preventative health practices and risk behaviors that are linked to chronic diseases, injuries, and preventable infectious diseases that affect the adult population.
The 2013 BRFSS assessed tobacco use, HIV/AIDS knowledge, and prevention, exercise, immunization, health status, healthy days, health-related quality of life, healthcare access, inadequate sleep, hypertension awareness, cholesterol awareness, chronic health conditions, alcohol consumption, produce consumption, arthritis burden, and seat-belt use. In 2013, there were 22 optional standardized modules a state could choose to additionally use, a few of these optional modules are: sugar drink consumption, breast and cervical cancer screening, and reactions to race.
Since cellphone lines are conducted with individuals who reside in a private residence or college housing, the data may be biased to those who can afford a mobile line and reside in a private or college residence; meaning, data may exclude individuals who cannot afford a cellular line or landline, individuals who cannot afford a college education, and individuals who do not live in a private residence. Thus, results concluded from this study may only be generalizable to part of the population, those who have a landline or cellphone (and are in college or private residence).
An additional concern can be raised with the way the survey was conducted, respondents’ answers were not validated. This can lead to under-reporting, over-reporting, or respondents forgetting key information.
References
Behavioral Risk Factor Surveillance System Overview: BRFSS 2013
About BRFSS
BRFSS FAQs
BRFSS History
BRFSS History Fact Sheet
BRFSS 2013 Calculated Variables
BRFSS Questionaire
BRFSS 2013 Optional Modules by State
hlthpln1
Description: Do you have any kind of health care coverage, including health insurance, prepaid plans such as HMOs, or government plans such as Medicare, or Indian Health Service?
income2
Description: Respondant’s income level
_X_raceg21
Description: Respondant’s race classfied as Non-hispanic White or Non-White
Do education and gender affect general health?
genhlth
Description: Would you say that in general your health is: Excellant, Very Good, Good, Fair, Poor, Don’t Know/Not Sure, Refused, Not asked or Missing
educa
Description: What’s the highest grade or year of school you complete?
sex
Description: Respondant’s sex
mistrhlp
Description: Treatment can help people with mental illness lead normal lives. Do you agree slightly or strongly, or disagree slightly or strongly?
mistrhlpf
People are generally caring and sympathetic to people with mental illness. Do you agree slightly or strongly or disagree slightly or strongly?
mistmnt
Description: Are you now taking medicine or receiving treatment from a doctor or other health professional for any type of mental health condition or emotional problem?
_state
Description: State code
# Selection of relavant variables into a new dataset and omit NA responses
brfss2013_a <- brfss2013 %>% select(hlthpln1, income2, X_raceg21) %>% na.omit
# Checking to make sure the correct data columns were selected
colnames(brfss2013_a)## [1] "hlthpln1" "income2" "X_raceg21"
#Get counts for each variable
c_raceg21 <- brfss2013_a %>%
group_by(X_raceg21) %>%
summarise(counts = n())
c_raceg21## # A tibble: 2 x 2
## X_raceg21 counts
## <fct> <int>
## 1 Non-Hispanic White 322322
## 2 Non-White or Hispanic 91355
## # A tibble: 8 x 2
## income2 counts
## <fct> <int>
## 1 Less than $10,000 24774
## 2 Less than $15,000 26207
## 3 Less than $20,000 34133
## 4 Less than $25,000 40941
## 5 Less than $35,000 48068
## 6 Less than $50,000 60652
## 7 Less than $75,000 64344
## 8 $75,000 or more 114558
## # A tibble: 2 x 2
## hlthpln1 counts
## <fct> <int>
## 1 Yes 367077
## 2 No 46600
# Viewing the frequencies for income vs healthcare status
protable1 <- prop.table(table(brfss2013_a$income2, brfss2013_a$hlthpln1),2)
protable1##
## Yes No
## Less than $10,000 0.05004127 0.13744635
## Less than $15,000 0.05653037 0.11708155
## Less than $20,000 0.07149726 0.16927039
## Less than $25,000 0.08932458 0.17493562
## Less than $35,000 0.11215358 0.14804721
## Less than $50,000 0.14960621 0.12306867
## Less than $75,000 0.16613953 0.07206009
## $75,000 or more 0.30470719 0.05809013
Here we can see there appears to be some association between income level and if someone has healthcare. Let’s view a graphic representation.
#Graphical Representation
graph1 <- ggplot(data = brfss2013_a,
aes(x=income2, fill=hlthpln1)) +
geom_bar(position = "fill") +
coord_flip()+
ggtitle("Healthcare Status by Income Level")
graph1 <- graph1 +
xlab("Income Level") +
ylab("Proportions") +
scale_fill_discrete("Do you have healthcare coverage?")
graph1It would seem the lower your income, the greater your chances are of not having healthcare, but throughout each income bracket more than 50% have healthcare. What about race and healthcare status?
#Graphical Representation of Income Level vs Healthcare Coverage = Yes vs Race (divided into White vs Non-white)
plotdat = brfss2013_a %>%
#Taking a subset of the data so the RMarkdown "Knit" function doesn't take over an hour to process
sample_frac(.1) %>%
group_by(X_raceg21, income2) %>%
#Creating dummy variable for Healthcare, if they have healthcare it is coded as 1, if not 0.
mutate(prop = sum(hlthpln1 == "Yes")/n()) %>%
filter(row_number(hlthpln1) == 1)
plotdat %>%
group_by(X_raceg21, income2) %>%
ggplot(aes(income2,
prop,
group = X_raceg21,
color = X_raceg21)) +
geom_jitter(width = .2) +
geom_line() +
ggtitle("Healthcare Status stratified by Income and Race") +
theme(axis.text.x = element_text(angle = 90)) +
labs(x = "Income Levels", y = "Proportion with Healthcare", color = "Population")Here we can see that Non-Whites or Hispanics on average are less likely to have healthcare, with a greater divide at income levels “Less than $15,000” and “Less than $25,000”. What is interesting is the smallest divide is at the lowest level; this could potentially be caused by people qualifying for government-run healthcare. Let’s explore more and create a Logistic Regression because the dataset has a binary response (outcome, dependent) variable called hlthpln1.
#Creating A Fresh Dataset so we can set some of the variables as numeric rather than factors
brfss2013_catreg <- brfss2013 %>%
select(hlthpln1, income2, X_raceg21) %>%
na.omit
#Changing factor variables to numeric
#Where 1=yes, 2=no
brfss2013_catreg$hlthpln1=as.factor(brfss2013_catreg$hlthpln1)
#Where 1=White, 2=Non-White
brfss2013_catreg$X_raceg21=as.numeric(brfss2013_catreg$X_raceg21)
#Where 1=Less than $10,000, 2=Less than $15,000, 3=Less than $20,000, 4=Less than $25,000, 5=Less than $35,000, 6=Less than $50,000, 7=Less than $75,000, 8=$75,000 or more
brfss2013_catreg$income2=as.numeric(brfss2013_catreg$income2)
#Running a Logistic Regression
catreg <- glm (hlthpln1~income2 + X_raceg21,
data = brfss2013_catreg,
family=binomial)
catreg##
## Call: glm(formula = hlthpln1 ~ income2 + X_raceg21, family = binomial,
## data = brfss2013_catreg)
##
## Coefficients:
## (Intercept) income2 X_raceg21
## -1.1765 -0.3195 0.5372
##
## Degrees of Freedom: 413676 Total (i.e. Null); 413674 Residual
## Null Deviance: 291200
## Residual Deviance: 263900 AIC: 263900
##
## Call:
## glm(formula = hlthpln1 ~ income2 + X_raceg21, family = binomial,
## data = brfss2013_catreg)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0044 -0.5238 -0.3681 -0.2834 2.5437
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.176479 0.020010 -58.80 <2e-16 ***
## income2 -0.319476 0.002286 -139.78 <2e-16 ***
## X_raceg21 0.537219 0.010999 48.84 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 291242 on 413676 degrees of freedom
## Residual deviance: 263892 on 413674 degrees of freedom
## AIC: 263898
##
## Number of Fisher Scoring iterations: 5
From the Logistic Regression model we see that income has a negative effect and race has a positive effect. Both the coefficient of race and income are significant (p>0.05). Looking at the plots we can see the residuals are far from the line, and this makes sense healthcare status is determined by more than just income and race, other factors that possibly contribute to healthcare status are age, employment status, marriage status, etc. While, this is a good start to creating a predictive model for healthcare status, more variables will have to be added. However, we can say that race and income are significant when building a model for healthcare status.
# Selection of relavant variables into a new dataset and omit NA responses
brfss2013_b <- brfss2013 %>% select(genhlth, sex, educa) %>% na.omit
# Checking to make sure the correct data columns were selected
colnames(brfss2013_b)## [1] "genhlth" "sex" "educa"
#Creating A Fresh Dataset so we can set some of the variables as numeric rather than factors
brfss2013_cor <- brfss2013 %>%
select(genhlth, sex, educa) %>%
na.omit
#Changing factor variables to numeric
#Where 1=Excellant, 2=Very Good, 3=Good,4= Fair,5=Poor
brfss2013_cor$genhlth=as.numeric(brfss2013_cor$genhlth)
#Where 0=Male, 1=Female
brfss2013_cor$sex=as.numeric(brfss2013_cor$sex==
"Female", 0, 1)
#Where 1=Never attended school or only kindergarten , 2=Grades 1 through 8 (Elementary), 3=Grades 9 through 11 (Some high school) , 4=Grade 12 or GED (High school graduate) , 5=College 1 year to 3 years (Some college or technical school), 6=College 4 years or more (College graduate),
brfss2013_cor$educa=as.numeric(brfss2013_cor$educa)Now that we have our variables as numeric representations (dummy variables), we can run a correlation to see if these variables correlate in any way.
## genhlth sex educa
## genhlth 1.00000000 0.01673846 -0.29371654
## sex 0.01673846 1.00000000 -0.01702893
## educa -0.29371654 -0.01702893 1.00000000
Now let’s plot the correlation to better view it.
## vars n mean sd median trimmed mad min max range skew kurtosis
## genhlth* 1 487561 2.58 1.10 3 2.52 1.48 1 5 4 0.36 -0.50
## sex* 2 487561 1.59 0.49 2 1.61 0.00 1 2 1 -0.37 -1.86
## educa* 3 487561 4.85 1.06 5 4.96 1.48 1 6 5 -0.62 -0.15
## se
## genhlth* 0
## sex* 0
## educa* 0
As we can see here, education and sex have almost no correlation. Genhlth and sex have almost no correlation. Interesting enough there is a weak negative correlation between education level and genhlth. We can infer that there is a weak association between being more education and poorer health. As for the summary statistics, since we are working with catagorical variables, the R package psych converts them to numerical values which should be interpreted cautiously (if at all).
# Selection of relavant variables into a new dataset and omit NA responses
brfss2013_c <- brfss2013 %>% select(mistrhlp, mistmnt, misphlpf, X_state) %>% na.omit
# Checking to make sure the correct data columns were selected
colnames(brfss2013_c)## [1] "mistrhlp" "mistmnt" "misphlpf" "X_state"
#Creating a proportion table
prop <- with(brfss2013_c, table(mistrhlp, mistmnt))%>%
prop.table(margin=2)
prop## mistmnt
## mistrhlp Yes No
## Agree strongly 0.81523437 0.71261822
## Agree slightly 0.13203125 0.21583518
## Neither agree nor disagree 0.01679687 0.02763422
## Disagree slightly 0.02363281 0.02928664
## Disagree strongly 0.01230469 0.01462574
## vars n mean sd median trimmed mad min max range skew kurtosis se
## Yes 1 5 0.2 0.35 0.02 0.2 0.02 0.01 0.82 0.8 1.02 -0.99 0.16
## No 2 5 0.2 0.30 0.03 0.2 0.02 0.01 0.71 0.7 0.89 -1.19 0.13
Looking at the proportion table (comparing the Yes and No columns), it would seem that regardless of if one receives mental healthcare help from a professional the majority of people agree that seeking help from a mental health professional can help live a normal life. However, of our group of interest it looks that the majority of individuals seeking help believe that it can lead to a normal life. Let’s look at the other statement that people are generally caring and sympathetic to people with mental illness and if one is seeking mental health care. Looking at the summary statistics we can tell that the data is heavily skewed, the positive skewness indicates that the right-handed tail is larger than the left-handed tail. The kurtosis tells us that the data is “light-tailed” meaning there is less in the tails than the normal distribution. The data’s “skewness” is probably from the differences in agree strongly.
#Creating a second proportion table
prop2 <- with(brfss2013_c, table(misphlpf, mistmnt))%>%
prop.table(margin=2)
prop2## mistmnt
## misphlpf Yes No
## Agree strongly 0.19003906 0.24983300
## Agree slightly 0.28906250 0.34876771
## Neither agree nor disagree 0.04355469 0.04918609
## Disagree slightly 0.28867188 0.24364519
## Disagree strongly 0.18867187 0.10856801
Looking at this proportion table, we can see that of the individuals seeking mental healthcare help from a professional that it is roughly split on agreeing or disagreeing with this statement. Since, this data was collected in 2013, it would be interesting to see how it compares to 2019’s data, to see if the stigma around mental health has changed. Let’s filter the data now so agreeing slightly or strongly is just agree, same with disagree, and we are looking at only those who are seeking mental health help from a professional.
#Filtering Data to only those seeking mental healthcare from a professional
brfss2013_mh <- select(brfss2013_c, misphlpf, mistmnt, mistrhlp, X_state) %>% filter(mistmnt=="Yes")
#Mutating variavles so strongly and slightly agree are now just agree and the same with disagree, and filtering out those who nethier agree or disagree
brfss2013_mh <- brfss2013_mh %>%
mutate(misphlpf = recode(misphlpf, "Agree strongly" = "Agree",
"Agree slightly" = "Agree",
"Disagree strongly"="Disagree",
"Disagree slightly"="Disagree"),
mistrhlp = recode(mistrhlp, "Agree strongly" = "Agree",
"Agree slightly" = "Agree",
"Disagree strongly"="Disagree",
"Disagree slightly"="Disagree")) %>%
filter(mistrhlp !="Neither agree nor disagree", misphlpf!="Neither agree nor disagree")
#Grouping by states, calculate counts and percentages
brfss2013_mh <- group_by(brfss2013_mh, X_state, mistrhlp, misphlpf) %>%
summarise(count=n()) %>%
mutate(pct=prop.table(count)*100)
brfss2013_mh## # A tibble: 24 x 5
## # Groups: X_state, mistrhlp [12]
## X_state mistrhlp misphlpf count pct
## <fct> <fct> <fct> <int> <dbl>
## 1 Alabama Agree Agree 47 72.3
## 2 Alabama Agree Disagree 18 27.7
## 3 Alabama Disagree Agree 1 33.3
## 4 Alabama Disagree Disagree 2 66.7
## 5 Massachusetts Agree Agree 292 51.9
## 6 Massachusetts Agree Disagree 271 48.1
## 7 Massachusetts Disagree Agree 9 29.0
## 8 Massachusetts Disagree Disagree 22 71.0
## 9 Minnesota Agree Agree 825 50.6
## 10 Minnesota Agree Disagree 806 49.4
## # ... with 14 more rows
So what is this telling us? In most states, if you agree to the first question, you have almost a 50/50 chance of agreeing to the second question that people are caring towards those with mental health issues, except for Alabama where it is more likely you will agree to both statements. In terms of disagreeing to the first question, across the states it looks as though you are more likely to disagree to the second statement.
graph2 <- ggplot(brfss2013_mh)+
aes(y=pct,
x=mistrhlp,
fill=misphlpf,
)+
geom_bar(stat="identity", position = "fill",
color="grey40")+
facet_grid(.~X_state)
graph2 <- graph2 + ggtitle("Beliefs About Mental Health Care By State") +
xlab("Can Those with Menthal Health Issues Seeking \nProfessional Mental Healthcare Lead a Normal Life? By State") +
ylab("Proportions") +
scale_fill_discrete("Are Individuals Caring to Those with Mental Health Issues?") +
theme(axis.text.x = element_text(colour="grey40", size=8, angle=90, hjust=.5, vjust=.5),
axis.text.y = element_text(colour="grey40", size=8),
axis.title.x= element_text(size=8, vjust=-1),
axis.title.y=element_text(size=8),
plot.title = element_text (hjust = 0.5, face="bold"),
legend.text = element_text(size=8, color="white"),
legend.position = "bottom",
legend.box = "vertical",
legend.title = element_text (size = 8, color="white"),
legend.background = element_rect(fill="grey40", color = "gray40")
)
graph2
Interestingly, individuals who are seeking mental health help from a professional and agree that seeking such help leads to a normal life are almost split down the middle when it comes to agreeing or disagreeing that individuals are caring to those with mental health issues across the states that did participate in this optional module, except for Alabama where individuals are more likely to agree to both statements. Those seeking mental health help from a professional that disagree that ‘seeking such help leads to a normal life’ seem more likely to disagree that individuals are caring to those with mental health issues. It would have been helpful for analysis if this module asked an additional question “Are you satisfied with your mental health care provider?” to understand if the individuals who disagree with both questions are potentially disagreeing because of dissatisfaction with care. With mental illness affecting 1 in 5 adults, the fact that so little states participated in this module is disappointing as it could give insight into which states need to improve on their mental health system.