To access the Dataset and Codebook click the text.

INTRODUCTION

Background of BRFSS

The Behavioral Risk Factor Surveillance System (BRFSS) is a collaborative project between all the states in the United States and participating US territories and the Centers for Disease Control and Prevention (CDC). The BRFSS is administered and supported by CDC’s Population Health Surveillance Branch, under the Division of Population Health at CDC’s National Center for Chronic Disease Prevention and Health Promotion. The BRFSS is a system of ongoing health-related telephone surveys designed to collect data on health-related risk behaviors, chronic health conditions, health-care access, and use of preventive services from the non-institutionalized adult population (≥ 18 years) residing in the United States and participating areas.

BRFSS Work

BRFSS’s objective is to collect uniform state-specific data on health risk behaviors, chronic diseases and conditions, access to health care, and use of preventive health services related to the leading causes of death and disability in the United States. Factors assessed by the BRFSS in 2021 included health status and healthy days, exercise, hypertension awareness, cholesterol awareness, chronic health conditions, arthritis, tobacco use, fruits and vegetables consumption, and health-care access.

What’s new in BRFSS 2021 Data

Introduction of additional demographic characteristics (e.g., education level, marital status, home renter/owner) in addition to age-race/ethnicity-gender that improve the degree and extent to which the BRFSS sample properly reflects the socio-demographic make-up of individual states.

Data Description

During 2021, all 50 states, the District of Columbia, Guam, Puerto Rico, and the US Virgin Islands collected BRFSS data. Data were collected by telephone survey. The landline samples for the 50 states used a disproportionate stratified sample (DSS) design, while Guam, Puerto Rico, and the US Virgin Islands used a simple random-sample design.

When was the data collected

Data were collected for the year 2021 and published on August 12, 2021.

How many observations and variables are there?

The data contain 438693 observations and 303 variables.

Target Population

The target population (aged 18 years and older) for cellular telephone samples in 2021 consists of people residing in a private residence or college housing who have a working cellular telephone.

Research Question

The aim of this research is to see whether the number of bad mental health days is associated with marital status, home status (renter/owner), cholesterol medication use, exercise status, and the ability to afford the cost of seeing a doctor.

The Outcome Variable and the Predictors

The outcome Variable

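Mental health status in the 30 days before the interview (MENTHLTH): the number of days on which the respondent's mental health was not good.

How the outcome variable is measured (data types, categories, values)

MENTHLTH is numeric and, after cleaning, ranges from 1 to 30 days. The codes 88, 77, and 99 are set to missing in the cleaning step, so the analysis covers only respondents who reported at least one bad mental health day.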

The Predictors-

Marital status of people.
Home (renter/owner): whether people own their home or rent.
Taking Cholesterol Medication
Exercise status
Insurance Status

How the predictor variables are measured (data types, categories, values)

All predictors are categorical and are recoded as factors in the cleaning step: marital status (six categories), home status (own, rent, other arrangement), and yes/no indicators for exercise, cholesterol medication, and being unable to afford a doctor's visit.

What is the purpose of the research?

The purpose of the research is to compare mental health status across marital status, cholesterol medication use, and insurance status.

Import data

Clean Data

library(tidyverse)  # dplyr verbs (select, mutate, na_if), tidyr::drop_na, ggplot2

# `health` is the imported BRFSS 2021 data frame.
# Keep the analysis variables, set the special codes that are not real
# values (e.g. 77/88/99 for MENTHLTH, 7/9 for the yes/no items) to NA,
# then drop every row that has a missing value.
ment_health <- health %>%
  select(WEIGHT2, SMOKE100, MENTHLTH, PRIMINSR, MEDCOST1, EXERANY2, CHOLMED3, MARITAL, RENTHOM1) %>%
  mutate(MENTHLTH = as.numeric(MENTHLTH)) %>%
  mutate(MENTHLTH = na_if(x = MENTHLTH, y = 88)) %>%
  mutate(MENTHLTH = na_if(x = MENTHLTH, y = 77)) %>%
  mutate(MENTHLTH = na_if(x = MENTHLTH, y = 99)) %>%
  mutate(EXERANY2 = na_if(x = EXERANY2, y = 7)) %>%
  mutate(EXERANY2 = na_if(x = EXERANY2, y = 9)) %>%
  mutate(PRIMINSR = na_if(x = PRIMINSR, y = 77)) %>%
  mutate(PRIMINSR = na_if(x = PRIMINSR, y = 99)) %>%
  mutate(RENTHOM1 = na_if(x = RENTHOM1, y = 7)) %>%
  mutate(RENTHOM1 = na_if(x = RENTHOM1, y = 9)) %>%
  mutate(MARITAL = na_if(x = MARITAL, y = 9)) %>%
  mutate(MEDCOST1 = na_if(x = MEDCOST1, y = 7)) %>%
  mutate(MEDCOST1 = na_if(x = MEDCOST1, y = 9)) %>%
  mutate(CHOLMED3 = na_if(x = CHOLMED3, y = 7)) %>%
  mutate(CHOLMED3 = na_if(x = CHOLMED3, y = 9)) %>%
  mutate(SMOKE100 = na_if(x = SMOKE100, y = 7)) %>%
  mutate(SMOKE100 = na_if(x = SMOKE100, y = 9)) %>%
  mutate(WEIGHT2 = na_if(x = WEIGHT2, y = 7777)) %>%
  mutate(WEIGHT2 = na_if(x = WEIGHT2, y = 9999)) %>%
  drop_na() %>%
  mutate(MARITAL = recode_factor(MARITAL, 
                                 `1` = "Married",
                                 `2` = "Divorced",
                                 `3` = "Widowed",
                                 `4` = "Seperated",
                                 `5` = "Never ,married",
                                 `6` = "A member of an unmarried couple")) %>%
  mutate(EXERANY2 = recode_factor(EXERANY2,
                                  `1` = "yes",
                                  `2` = "No")) %>%
  mutate(CHOLMED3 = recode_factor(CHOLMED3,
                                  `1` = "yes",
                                  `2` = "No")) %>%
  mutate(RENTHOM1 = recode_factor(RENTHOM1,
                                  `1` = "Own",
                                  `2` = "Rent",
                                  `3` = "Other aggangement")) %>%
  mutate(MEDCOST1 = recode_factor(MEDCOST1,
                                  `1` = "yes",
                                  `2` = "No"))

mental_health <- rename(ment_health, "Exercise" ="EXERANY2", "Mental_status" = "MENTHLTH", "Cholesterol_Medication" = "CHOLMED3", "Home_status" = "RENTHOM1", "Non_Medical_affordability"= "MEDCOST1")
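The repeated na_if() calls in the cleaning step all follow one pattern: a set of special codes is turned into NA and incomplete rows are then dropped. A minimal base R sketch of that pattern on a made-up data frame is shown here. The helper name set_missing and the toy values are assumptions for illustration and are not part of the original analysis.

```r
# Hypothetical helper: replace every value of x that appears in `codes` with NA.
set_missing <- function(x, codes) {
  replace(x, x %in% codes, NA)
}

# Toy data that mimics the BRFSS coding (values are made up).
toy <- data.frame(
  MENTHLTH = c(5, 88, 30, 77, 99, 2),
  EXERANY2 = c(1, 2, 7, 1, 9, 2)
)

toy$MENTHLTH <- set_missing(toy$MENTHLTH, c(77, 88, 99))
toy$EXERANY2 <- set_missing(toy$EXERANY2, c(7, 9))

# Keep only rows with no missing values, as drop_na() does in the pipeline.
toy_complete <- toy[complete.cases(toy), ]
print(toy_complete)
```

The same helper could be used inside mutate() to shorten the pipeline, for example mutate(MENTHLTH = set_missing(MENTHLTH, c(77, 88, 99))).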

Describing the Variables

library(table1)

# Default summaries: counts (%) for factors, mean (SD) and median [min, max] for Mental_status
table1(~ MARITAL + Exercise + Home_status + Cholesterol_Medication + Mental_status, data = mental_health)

	Overall (N=115091)
MARITAL
Married	56368 (49.0%)
Divorced	17271 (15.0%)
Widowed	10658 (9.3%)
Seperated	2857 (2.5%)
Never ,married	22229 (19.3%)
A member of an unmarried couple	5708 (5.0%)
Exercise
yes	86028 (74.7%)
No	29063 (25.3%)
Home_status
Own	77297 (67.2%)
Rent	32248 (28.0%)
Other aggangement	5546 (4.8%)
Cholesterol_Medication
yes	30533 (26.5%)
No	84558 (73.5%)
Mental_status
Mean (SD)	10.7 (10.0)
Median [Min, Max]	6.00 [1.00, 30.0]

CrossTable(mental_health$MARITAL, prop.t=TRUE, prop.r=TRUE, prop.c=TRUE)

##    Cell Contents 
## |-------------------------|
## |                       N | 
## |           N / Row Total | 
## |-------------------------|
## 
## |                         Married |                        Divorced |
## |---------------------------------|---------------------------------|
## |                           56368 |                           17271 |
## |                           0.490 |                           0.150 |
## |---------------------------------|---------------------------------|
## 
## |                         Widowed |                       Seperated |
## |---------------------------------|---------------------------------|
## |                           10658 |                            2857 |
## |                           0.093 |                           0.025 |
## |---------------------------------|---------------------------------|
## 
## |                  Never ,married | A member of an unmarried couple |
## |---------------------------------|---------------------------------|
## |                           22229 |                            5708 |
## |                           0.193 |                           0.050 |
## |---------------------------------|---------------------------------|

CrossTable(mental_health$Home_status, prop.t=TRUE, prop.r=TRUE, prop.c=TRUE)

##    Cell Contents 
## |-------------------------|
## |                       N | 
## |           N / Row Total | 
## |-------------------------|
## 
## |               Own |              Rent | Other aggangement |
## |-------------------|-------------------|-------------------|
## |             77297 |             32248 |              5546 |
## |             0.672 |             0.280 |             0.048 |
## |-------------------|-------------------|-------------------|

ggplot(mental_health, aes(x= Mental_status)) +
  geom_histogram(binwidth=1, fill="#f03b20", color="#f03b20", alpha=0.5) +
   labs(title  = "Distribution of Mental Status Data ")

The histogram of the outcome variable is skewed rather than bell-shaped, so the outcome is not normally distributed. We therefore summarize it with the median instead of the mean and use non-parametric tests.

summary(mental_health$Mental_status)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    3.00    6.00   10.75   15.00   30.00
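Because the distribution is skewed, group comparisons are easier to read as medians with interquartile ranges. The following is a small sketch on made-up values. The data frame toy and its numbers are assumptions for illustration, not BRFSS data.

```r
# Made-up data: four respondents per exercise group.
toy <- data.frame(
  Mental_status = c(1, 2, 3, 30, 5, 10, 15, 20),
  Exercise = rep(c("yes", "No"), each = 4)
)

# Median and interquartile range of bad mental health days within each group.
group_median <- tapply(toy$Mental_status, toy$Exercise, median)
group_iqr <- tapply(toy$Mental_status, toy$Exercise, IQR)

print(group_median)
print(group_iqr)
```

On the cleaned data the same idea would be tapply(mental_health$Mental_status, mental_health$Exercise, median), and likewise for the other predictors.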

Plotting the continuous outcome variable against each categorical predictor variable

The first graph is for the number of bad mental health days and the marital status of the population.

Hypothesis

H0 - There is no relation between bad mental health days and the marital status of a person.

HA - There is a relation between bad mental health days and the marital status of a person.

ggplot(mental_health, aes(x = MARITAL, y = Mental_status)) +
  geom_jitter(aes(shape = MARITAL, color = MARITAL), alpha = .4) +
  scale_colour_manual(values = c("#5ab4ac", "#d8b365", "#e9a3c9", "#a1d76a", "#fec44f", "#addd8e")) +
  geom_boxplot(aes(fill = MARITAL), alpha = .1) +
  theme_minimal() +
  labs(x = "Marital Status", y = "Bad Mental health", title = "Bad Mental health and Marital status") +
  theme(axis.title.x = element_text(size = 11), axis.title.y = element_text(size = 11), text = element_text(size = 8, colour = "Black")) +
  theme(legend.position = "bottom")

The graph shows that separated respondents report the most bad mental health days and married respondents the fewest.

The second graph is between the number of bad mental health days and the exercise status of the people.

Hypothesis

H0 - There is no relation between bad mental health days and exercise.

HA - There is a relation between bad mental health days and exercise.

ggplot(mental_health,aes(x = Exercise, y = Mental_status)) + 
  geom_jitter(aes(color = Exercise), alpha = .3)+
   scale_colour_manual(values = c("#edf8b1","#bcbddc"))+
  geom_boxplot(aes(fill = Exercise), alpha = .1) +
  theme_minimal()+
  labs(x = "Exercise Status", y = "Bad Mental Health", title = "Bad Mental Days Vs Exercise")

The graph shows that those who exercise have fewer bad mental health days than those who do not.

The third graph is between the number of bad mental health days and cholesterol medication use.

Hypothesis

H0 - There is no relation between bad mental health days and cholesterol medication prescription.

HA - There is a relation between bad mental health days and cholesterol medication prescription.

ggplot(mental_health,aes(x = Cholesterol_Medication, y = Mental_status)) + 
  geom_jitter(aes(color = Cholesterol_Medication), alpha = .4)+
  scale_colour_manual(values = c("#9ebcda","#feb24c"))+
  geom_boxplot(aes(fill = Cholesterol_Medication), alpha = .1) +
  theme_minimal()+
  labs(x = "Cholesterol Medication Prescription", y = "Bad Mental Health", title = "Bad Mental Days Vs Cholesterol Medication")

The plot indicates that people on cholesterol medication have more bad mental health days than those who are not.

The fourth graph is between the number of bad mental health days and the home status of people.

Hypothesis

H0 - There is no relation between bad mental health days and the home status of people.

HA - There is a relation between bad mental health days and the home status of people.

ggplot(mental_health,aes(x = Home_status, y = Mental_status)) + 
  geom_jitter(aes(color = Home_status), alpha = .4)+
   scale_colour_manual(values = c("#99d8c9","#bcbddc", "#fc9272"))+
  geom_boxplot(aes(fill = Home_status), alpha = .1) +
  theme_minimal()+
  labs(x = "Home Status", y = "Bad Mental Health", title = "Bad Mental Days Vs Home Status")

The plot shows that people who own their home have fewer bad mental health days than those who rent or have another arrangement.

The fifth graph is between the number of bad mental health days and the inability to afford medical costs.

Hypothesis

H0 - There is no relation between bad mental health days and the inability to afford medical costs.

HA - There is a relation between bad mental health days and the inability to afford medical costs.

ggplot(mental_health,aes(x = Non_Medical_affordability, y = Mental_status)) + 
  geom_jitter(aes(colour = Non_Medical_affordability), alpha = .4)+
  scale_colour_manual(values = c("#feb24c","#addd8e"))+
  geom_boxplot(aes(fill = Non_Medical_affordability), alpha = .1) +
  theme_minimal()+
  labs(x = "Not able to afford medical cost", y = "Bad Mental Health", title = "Bad Mental Days Vs Not able to afford medical cost")

The plot indicates that people who are unable to afford the cost of seeing a doctor have more bad mental health days than those who can afford it.

Statistical Tests

We have concluded that the number of bad mental health days is not normally distributed, and outliers exist. A Kruskal-Wallis rank sum test is therefore appropriate for comparing the continuous outcome across the groups of a predictor with three or more categories, and a Mann-Whitney U test for comparing it across the groups of a predictor with two categories.
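The same rule also covers the two comparisons plotted in the fourth and fifth graphs: home status has three categories (Kruskal-Wallis) and medical cost affordability has two (Mann-Whitney U). The sketch below runs both tests on simulated data. The data frame sim, its random values, and the seed are assumptions for illustration, so the printed results say nothing about the BRFSS data.

```r
set.seed(1)  # arbitrary seed so the simulated data are reproducible

n <- 300
# Simulated stand-in for mental_health; the values are random, not BRFSS data.
sim <- data.frame(
  Mental_status = sample(1:30, n, replace = TRUE),
  Home_status = factor(sample(c("Own", "Rent", "Other"), n, replace = TRUE)),
  Non_Medical_affordability = factor(sample(c("yes", "No"), n, replace = TRUE))
)

# Three groups: Kruskal-Wallis rank sum test.
kw_home <- kruskal.test(Mental_status ~ Home_status, data = sim)

# Two groups: Mann-Whitney U (Wilcoxon rank sum) test.
mw_cost <- wilcox.test(Mental_status ~ Non_Medical_affordability,
                       data = sim, exact = FALSE)

print(kw_home)
print(mw_cost)
```

Replacing sim with mental_health would apply the same two calls to the cleaned data.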

Kruskal-Wallis rank sum test (Graph 1)

    kruskal.test(Mental_status ~ MARITAL, data = mental_health)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  Mental_status by MARITAL
## Kruskal-Wallis chi-squared = 2471.3, df = 5, p-value < 2.2e-16

Interpretation

As the p-value (< 2.2e-16) is less than the significance level of 0.05, we reject the null hypothesis and conclude that there is a significant relation between the number of bad mental health days and the marital status of a person.
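With more than 100,000 observations, even small differences produce very small p-values, so an effect size helps show how strong the relation is. One common choice for the Kruskal-Wallis test is epsilon squared, H / (n - 1). Using it here is an addition to the original analysis. The sketch plugs in the reported Kruskal-Wallis statistic and the sample size from the descriptive table.

```r
H <- 2471.3   # Kruskal-Wallis chi-squared reported in the test output
n <- 115091   # number of observations in mental_health (from the table1 output)

# Epsilon squared: share of the variability in ranks attributed to the grouping.
epsilon_sq <- H / (n - 1)
print(round(epsilon_sq, 3))
```

The value is roughly 0.02, which is usually read as a small effect: marital status is clearly related to bad mental health days, but it accounts for only a small part of their variation.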

Perform the Mann-Whitney U test (Graph 2)

# na.rm is not an argument of wilcox.test(); missing values were already removed by drop_na()
Stat_Result <- wilcox.test(Mental_status ~ Exercise, data = mental_health, paired = FALSE, exact = FALSE, conf.int = TRUE)
print(Stat_Result)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Mental_status by Exercise
## W = 943857102, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -2.999996 -2.999963
## sample estimates:
## difference in location 
##              -2.999996

Interpretation

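The Mann-Whitney U test gives a two-sided p-value < 2.2e-16, very close to zero. We therefore reject the null hypothesis and conclude that the number of bad mental health days differs significantly between the exercise groups. The estimated location shift of about -3 days indicates that respondents who exercise report about three fewer bad mental health days than those who do not.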
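The "difference in location" reported by wilcox.test() with conf.int = TRUE estimates the median of all pairwise differences between the two groups (the Hodges-Lehmann estimator), not the difference between the two group medians. The sketch below checks this on made-up numbers. The vectors x and y are assumptions, chosen small and without ties so that R uses its exact method.

```r
x <- c(3, 8, 12, 15, 21)   # made-up values for group 1
y <- c(1, 5, 6, 10, 11)    # made-up values for group 2

res <- wilcox.test(x, y, conf.int = TRUE)

# Median of all 25 pairwise differences x_i - y_j.
hl_by_hand <- median(outer(x, y, "-"))

print(unname(res$estimate))
print(hl_by_hand)
```

With large samples and many ties, as in the BRFSS data, R approximates this estimate numerically, which is why the reported value is -2.999996 rather than exactly -3.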

Perform the Mann-Whitney U test (Graph 3)

# na.rm is not an argument of wilcox.test(); missing values were already removed by drop_na()
Stat_Result <- wilcox.test(Mental_status ~ Cholesterol_Medication, data = mental_health, paired = FALSE, exact = FALSE, conf.int = TRUE)
print(Stat_Result)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Mental_status by Cholesterol_Medication
## W = 1.349e+09, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  7.635844e-05 3.720061e-05
## sample estimates:
## difference in location 
##           1.448262e-05

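Interpretation

The Mann-Whitney U test gives a two-sided p-value < 2.2e-16, so we reject the null hypothesis of no relation between bad mental health days and cholesterol medication. The estimated difference in location is close to zero (about 1.4e-05 days), so the difference between the two groups is statistically significant but very small.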

Discussion

The research question asked whether the predictors listed in the analysis are associated with the number of bad mental health days. For each of the three predictors tested (marital status, exercise, and cholesterol medication), the null hypothesis was rejected, indicating a statistically significant relation between the predictor and the outcome.


Biostats Checkpoint 2

Keshav Kumar

Nov 18, 2022
