The code below prepares the data for survey-design analysis. Some variables were also recoded to address the research questions.

# Load packages and data for analysis (car is called as car::Recode, so it only needs to be installed)
library(dplyr)      # %>% and mutate()
library(survey)     # svydesign()
library(tableone)   # CreateTableOne(), svyCreateTableOne()
brfss <- readRDS("brfss_19.rds")
# Clean variable names: remove underscores and convert to lowercase
renam <- names(brfss)
newnames <- tolower(gsub(pattern = "_", replacement = "", x = renam))
names(brfss) <- newnames
# Subset the variables needed for the analysis
brfsshm2 <- c("state", "llcpwt", "dispcode", "seqno", "psu", "genhlth",
              "hlthpln1", "persdoc2", "racegr3", "educa", "ststr")
brfss_19 <- brfss[brfsshm2]
# Recode variables for analysis purposes
homewk2 <- brfss_19 %>%
  mutate(
    genhlth  = car::Recode(genhlth, recodes = "1:3='Good'; 4:5='Bad'; else=NA", as.factor = TRUE),
    hlthpln1 = car::Recode(hlthpln1, recodes = "1='Yes'; 2='No'; else=NA", as.factor = TRUE),
    racegr3  = car::Recode(racegr3,
                           recodes = "1='NH-White'; 2='NH-Black'; 3:4='Other'; 5='Hispanic'; else=NA",
                           as.factor = TRUE)
  )
# Survey design
options(survey.lonely.psu = "adjust")

# Stratified design with sampling weights; each respondent is treated as its own PSU (ids = ~1)
des <- svydesign(ids = ~1, strata = ~ststr, weights = ~llcpwt, data = homewk2)

a. Define a binary outcome variable of your choosing and define how you recode the original variable.

For this assignment, the outcome variable is self-rated general health (genhlth), which measures an individual's perception of their own health. The original variable has five valid responses (1 = Excellent, 2 = Very good, 3 = Good, 4 = Fair, 5 = Poor) and was recoded as a binary variable: excellent, very good, and good were recoded as Good, and fair and poor were recoded as Bad; don't know, refused, and missing responses were set to NA. I therefore assume that respondents who rated their health as excellent, very good, or good have good health status, and those who rated it as fair or poor have bad health status.
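The recoding rule can be sketched in base R without `car`. This is a minimal illustration on a hypothetical vector of raw codes, assuming the BRFSS coding 1 = Excellent through 5 = Poor, with 7 (Don't know) and 9 (Refused) treated as missing; `recode_genhlth()` is a hypothetical helper, not part of any package.

```r
# Hypothetical raw GENHLTH codes (1-5 valid; 7 = Don't know, 9 = Refused)
raw_genhlth <- c(1, 2, 3, 4, 5, 7, 9, NA)

# Hypothetical helper: map codes 1-3 to "Good", 4-5 to "Bad", anything else to NA,
# mirroring the car::Recode rule "1:3='Good'; 4:5='Bad'; else=NA"
recode_genhlth <- function(x) {
  out <- rep(NA_character_, length(x))
  out[x %in% 1:3] <- "Good"   # NA %in% 1:3 is FALSE, so missing stays NA
  out[x %in% 4:5] <- "Bad"
  factor(out, levels = c("Bad", "Good"))
}

genhlth_bin <- recode_genhlth(raw_genhlth)
print(table(genhlth_bin, useNA = "ifany"))
```

Listing `Bad` before `Good` in the factor levels matches the alphabetical level order that `car::Recode(..., as.factor = TRUE)` produces, which is why `Bad` is the first column in the tables below.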

b. State a research question about what factors you believe will affect your outcome variable.

  1. What percentage of Hispanics perceived their health as bad?

  2. What percentage of people with health insurance perceived their health to be bad?
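Both questions ask for a row percentage (the share of Hispanics, or of insured respondents, who report bad health), whereas a table stratified by `genhlth` reports column percentages (the share of bad-health respondents who are Hispanic). The sketch below contrasts the two on a small hypothetical data frame; the data are invented for illustration only.

```r
# Hypothetical toy data: 10 respondents
toy <- data.frame(
  racegr3 = c("Hispanic", "Hispanic", "Hispanic", "Hispanic",
              "NH-White", "NH-White", "NH-White", "NH-White", "NH-White", "NH-White"),
  genhlth = c("Bad", "Good", "Good", "Good",
              "Bad", "Bad", "Good", "Good", "Good", "Good")
)

tab <- table(toy$racegr3, toy$genhlth)

# Row percentages: share of each race group reporting Bad/Good health
row_pct <- 100 * prop.table(tab, margin = 1)

# Column percentages: race composition within each health level
col_pct <- 100 * prop.table(tab, margin = 2)

print(round(row_pct, 1))  # Hispanic row: Bad 25.0, Good 75.0
print(round(col_pct, 1))  # Bad column: Hispanic 33.3, NH-White 66.7
```

With the design object `des` defined above, weighted row percentages could likely be obtained with something like `svyby(~genhlth, ~racegr3, des, svymean, na.rm = TRUE)` from the survey package.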

c. Define at least 2 predictor variables, based on your research question. For this assignment, it’s best if these are categorical variables.

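  • racegr3 - Race/ethnicity of the respondent, recoded into four groups: NH-White, NH-Black, Hispanic, and Other (other single race or multiracial, non-Hispanic).
  • hlthpln1 - Health insurance status (Yes/No). This variable is used as a proxy for access to health care.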

d. Perform a descriptive analysis of the outcome variable by each of the variables you defined in part b. (e.g. 2 x 2 table, 2 x k table). Follow a similar approach to presenting your statistics as presented in Sparks 2009 (in the Google drive). This can be done easily using the tableone package!

I. Calculate descriptive statistics (mean or percentages) for each variable using no weights or survey design, as well as with full survey design and weights.

# Not using survey design (unweighted)
twsd <- CreateTableOne(vars = c("racegr3", "hlthpln1"), strata = "genhlth", test = TRUE, data = homewk2)
# Print column percentages for the categorical variables
print(twsd, format = "p")
##                     Stratified by genhlth
##                      Bad   Good   p      test
##   n                  81830 335391            
##   racegr3 (%)                     <0.001     
##      Hispanic         13.0    8.2            
##      NH-Black         10.0    7.1            
##      NH-White         68.8   77.7            
##      Other             8.2    7.0            
##   hlthpln1 = Yes (%)  89.4   91.9 <0.001
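The unweighted table treats every respondent equally, so its percentages are ordinary column proportions of a contingency table, and by default tableone tests categorical variables with `chisq.test`. A base-R sketch of that computation on hypothetical toy counts (not the BRFSS data):

```r
# Hypothetical unweighted counts: race group (rows) by genhlth (columns)
counts <- as.table(matrix(
  c(30, 70,    # Hispanic: Bad, Good
    20, 180),  # NH-White: Bad, Good
  nrow = 2, byrow = TRUE,
  dimnames = list(racegr3 = c("Hispanic", "NH-White"),
                  genhlth = c("Bad", "Good"))
))

# Column percentages, the quantity print(..., format = "p") reports
col_pct <- round(100 * prop.table(counts, margin = 2), 1)
print(col_pct)  # Bad: Hispanic 60, NH-White 40; Good: Hispanic 28, NH-White 72

# Pearson chi-square test of independence; R applies Yates's correction
# to 2 x 2 tables by default. No survey correction is involved.
chi <- chisq.test(counts)
print(chi)
```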
II. Calculate percentages, or means, for each of your independent variables for each level of your outcome variable and present this in a table, with appropriate survey-corrected test statistics. (tableone package helps)
# Using survey design (weighted, stratified)
tsd <- svyCreateTableOne(vars = c("racegr3", "hlthpln1"), strata = "genhlth", test = TRUE, data = des)
# Print weighted column percentages; p-values come from survey-adjusted chi-square tests
print(tsd, format = "p")
##                     Stratified by genhlth
##                      Bad        Good        p      test
##   n                  47367607.9 204492860.2            
##   racegr3 (%)                               <0.001     
##      Hispanic              25.4        16.0            
##      NH-Black              13.5        11.4            
##      NH-White              54.2        63.6            
##      Other                  6.8         8.9            
##   hlthpln1 = Yes (%)       83.1        88.0 <0.001
III. Are there substantive differences in the descriptive results between the analysis using survey design and that not using survey design?
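  • Yes, there are substantive differences between the two tables. Among respondents reporting bad health, Hispanics make up 25.4% of the weighted estimate but only 13.0% of the unweighted sample, while NH-Whites fall from 68.8% (unweighted) to 54.2% (weighted). Insurance coverage is also lower once weights are applied (83.1% vs 89.4% among those in bad health; 88.0% vs 91.9% among those in good health), so the unweighted sample over-represents NH-White and insured respondents. For Hispanic, NH-Black, NH-White, and insurance status, the direction of the Bad vs Good difference is the same in both tables, and both sets of tests give p < 0.001. For the Other group, however, the direction reverses (8.2% vs 7.0% unweighted; 6.8% vs 8.9% weighted).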
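Why the two tables disagree can be seen in a small hypothetical example: when a group is under-represented in the sample relative to its survey weight, its unweighted share understates its weighted (population) share. A weighted percentage is the sum of the weights in a cell divided by the sum of the weights in the column, and the weighted n is the column sum of the weights. The data and weights below are invented for illustration; with the real design object, survey functions such as `svytable()` or `svyCreateTableOne()` perform these calculations.

```r
# Hypothetical sample of 6 respondents in the "Bad" health column,
# each with an invented final weight (llcpwt)
toy <- data.frame(
  racegr3 = c("Hispanic", "NH-White", "NH-White", "NH-White", "NH-White", "NH-White"),
  llcpwt  = c(500, 100, 100, 100, 100, 100)
)

# Unweighted share: every respondent counts once
unweighted_pct <- 100 * prop.table(table(toy$racegr3))

# Weighted share: each respondent stands for llcpwt people in the population
weighted_n   <- tapply(toy$llcpwt, toy$racegr3, sum)
weighted_pct <- 100 * weighted_n / sum(toy$llcpwt)

print(round(unweighted_pct, 1))  # Hispanic 16.7, NH-White 83.3
print(round(weighted_pct, 1))    # Hispanic 50.0, NH-White 50.0
print(sum(toy$llcpwt))           # weighted n for this column: 1000
```

This is a simplification: the stratification in `des` affects the standard errors and test statistics, but the point estimates of the percentages depend only on the weights.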