Behavioral Risk Factor Surveillance System (BRFSS): Week 5 project for the course Introduction to Data and Probability in the Statistics with R course track.
Submitted by Olusola Afuwape
March 1st 2019
library(ggplot2)
library(dplyr)
load("brfss2013.RData")
The Behavioral Risk Factor Surveillance System (BRFSS) draws two samples: one of landline telephone respondents and one of cellular telephone respondents.
The landline sample uses disproportionate stratified sampling (DSS) with two strata: high density (listed + block telephone numbers) and medium density (not listed + block telephone numbers).
The cellular telephone sample is drawn by simple random sampling (SRS) from confirmed cellular area code and prefix combinations.
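To make the difference between the two designs concrete, here is a toy simulation in base R. It is an illustrative sketch, not the BRFSS procedure: the frame sizes, the per-stratum sample sizes and the choice to weight each respondent by the inverse of its stratum's sampling fraction are all assumptions made for the example.

```r
set.seed(2013)

# Hypothetical frame of 10,000 telephone numbers split into two density strata.
frame <- data.frame(
  id      = 1:10000,
  stratum = rep(c("high", "medium"), times = c(2000, 8000))
)

# Disproportionate stratified sampling: a fixed (hypothetical) sample size per
# stratum, so the smaller high-density stratum is over-sampled.
n_per_stratum <- c(high = 500, medium = 500)

dss <- do.call(rbind, lapply(names(n_per_stratum), function(s) {
  units  <- frame[frame$stratum == s, ]
  picked <- units[sample(nrow(units), n_per_stratum[[s]]), ]
  # Design weight = stratum size / stratum sample size (inverse sampling fraction).
  picked$weight <- nrow(units) / n_per_stratum[[s]]
  picked
}))

# Simple random sampling: every number has the same chance of selection.
srs <- frame[sample(nrow(frame), 1000), ]
srs$weight <- nrow(frame) / 1000

table(dss$stratum)  # 500 high and 500 medium, although "high" is only 20% of the frame
sum(dss$weight)     # the weights add back up to the frame size, 10000
```

Weighting by the inverse sampling fraction is what allows an over-sampled stratum to be combined with the others without distorting population estimates; BRFSS supplies its own final weights (for example `X_llcpwt` in the column list).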
Scope of inference refers to the extent to which a study's conclusions can be applied. BRFSS used random sampling but no random assignment, so it is an observational study. Observational studies are open to confounding variables and therefore cannot establish cause-and-effect relationships. The scope of inference of BRFSS is generalizability: because respondents were randomly sampled, results can be generalized to the population that was sampled. BRFSS can also provide evidence of association, but not of causation: it lacks random assignment and an explanatory variable imposed by the researcher, which causal conclusions require, while it has random sampling and potential confounding variables, the hallmarks of an observational study.
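A short simulation illustrates why an observational association need not be causal. In this hypothetical setup a confounder `z` drives both `x` and `y`, while `x` has no effect on `y` at all; the variable names, sample size and effect sizes are assumptions chosen for illustration.

```r
set.seed(42)
n <- 100000

# Hypothetical confounder that influences both the "exposure" and the "outcome".
z <- rnorm(n)
x <- z + rnorm(n)  # exposure depends only on z
y <- z + rnorm(n)  # outcome depends only on z, never on x

# Marginal association: x and y are clearly correlated (0.5 in theory).
cor(x, y)

# Adjusting for the confounder removes the association: the slope on x is near 0.
coef(lm(y ~ x + z))["x"]
```

Randomly assigning `x` would break its link with `z`, which is why experiments, unlike surveys such as BRFSS, can support causal conclusions.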
Research question 1: What is the association between gender, age and HIV/AIDS test status? How does this test status vary across states?
Research question 2: What is the relationship between gender, fruit consumption and general health status? What can be inferred from this relationship?
Research question 3: What geographical pattern do race and income show in the United States?
# Explore some details of the BRFSS data
brfss <- brfss2013
# summary(brfss)  # Summary statistics for every variable
# str(brfss)      # Compact view of each variable's type and first values
dim(brfss)        # Number of observations (rows) and variables (columns)
[1] 491775 330
colnames(brfss)  # Get the column (variable) names
[1] "X_state" "fmonth" "idate" "imonth" "iday"
[6] "iyear" "dispcode" "seqno" "X_psu" "ctelenum"
[11] "pvtresd1" "colghous" "stateres" "cellfon3" "ladult"
[16] "numadult" "nummen" "numwomen" "genhlth" "physhlth"
[21] "menthlth" "poorhlth" "hlthpln1" "persdoc2" "medcost"
[26] "checkup1" "sleptim1" "bphigh4" "bpmeds" "bloodcho"
[31] "cholchk" "toldhi2" "cvdinfr4" "cvdcrhd4" "cvdstrk3"
[36] "asthma3" "asthnow" "chcscncr" "chcocncr" "chccopd1"
[41] "havarth3" "addepev2" "chckidny" "diabete3" "veteran3"
[46] "marital" "children" "educa" "employ1" "income2"
[51] "weight2" "height3" "numhhol2" "numphon2" "cpdemo1"
[56] "cpdemo4" "internet" "renthom1" "sex" "pregnant"
[61] "qlactlm2" "useequip" "blind" "decide" "diffwalk"
[66] "diffdres" "diffalon" "smoke100" "smokday2" "stopsmk2"
[71] "lastsmk2" "usenow3" "alcday5" "avedrnk2" "drnk3ge5"
[76] "maxdrnks" "fruitju1" "fruit1" "fvbeans" "fvgreen"
[81] "fvorang" "vegetab1" "exerany2" "exract11" "exeroft1"
[86] "exerhmm1" "exract21" "exeroft2" "exerhmm2" "strength"
[91] "lmtjoin3" "arthdis2" "arthsocl" "joinpain" "seatbelt"
[96] "flushot6" "flshtmy2" "tetanus" "pneuvac3" "hivtst6"
[101] "hivtstd3" "whrtst10" "pdiabtst" "prediab1" "diabage2"
[106] "insulin" "bldsugar" "feetchk2" "doctdiab" "chkhemo3"
[111] "feetchk" "eyeexam" "diabeye" "diabedu" "painact2"
[116] "qlmentl2" "qlstres2" "qlhlth2" "medicare" "hlthcvrg"
[121] "delaymed" "dlyother" "nocov121" "lstcovrg" "drvisits"
[126] "medscost" "carercvd" "medbills" "ssbsugar" "ssbfrut2"
[131] "wtchsalt" "longwtch" "dradvise" "asthmage" "asattack"
[136] "aservist" "asdrvist" "asrchkup" "asactlim" "asymptom"
[141] "asnoslep" "asthmed3" "asinhalr" "harehab1" "strehab1"
[146] "cvdasprn" "aspunsaf" "rlivpain" "rduchart" "rducstrk"
[151] "arttoday" "arthwgt" "arthexer" "arthedu" "imfvplac"
[156] "hpvadvc2" "hpvadsht" "hadmam" "howlong" "profexam"
[161] "lengexam" "hadpap2" "lastpap2" "hadhyst2" "bldstool"
[166] "lstblds3" "hadsigm3" "hadsgco1" "lastsig3" "pcpsaad2"
[171] "pcpsadi1" "pcpsare1" "psatest1" "psatime" "pcpsars1"
[176] "pcpsade1" "pcdmdecn" "rrclass2" "rrcognt2" "rratwrk2"
[181] "rrhcare3" "rrphysm2" "rremtsm2" "misnervs" "mishopls"
[186] "misrstls" "misdeprd" "miseffrt" "miswtles" "misnowrk"
[191] "mistmnt" "mistrhlp" "misphlpf" "scntmony" "scntmeal"
[196] "scntpaid" "scntwrk1" "scntlpad" "scntlwk1" "scntvot1"
[201] "rcsgendr" "rcsrltn2" "casthdx2" "casthno2" "emtsuprt"
[206] "lsatisfy" "ctelnum1" "cellfon2" "cadult" "pvtresd2"
[211] "cclghous" "cstate" "landline" "pctcell" "qstver"
[216] "qstlang" "mscode" "X_ststr" "X_strwt" "X_rawrake"
[221] "X_wt2rake" "X_imprace" "X_impnph" "X_impeduc" "X_impmrtl"
[226] "X_imphome" "X_chispnc" "X_crace1" "X_impcage" "X_impcrac"
[231] "X_impcsex" "X_cllcpwt" "X_dualuse" "X_dualcor" "X_llcpwt2"
[236] "X_llcpwt" "X_rfhlth" "X_hcvu651" "X_rfhype5" "X_cholchk"
[241] "X_rfchol" "X_ltasth1" "X_casthm1" "X_asthms1" "X_drdxar1"
[246] "X_prace1" "X_mrace1" "X_hispanc" "X_race" "X_raceg21"
[251] "X_racegr3" "X_race_g1" "X_ageg5yr" "X_age65yr" "X_age_g"
[256] "htin4" "htm4" "wtkg3" "X_bmi5" "X_bmi5cat"
[261] "X_rfbmi5" "X_chldcnt" "X_educag" "X_incomg" "X_smoker3"
[266] "X_rfsmok3" "drnkany5" "drocdy3_" "X_rfbing5" "X_drnkdy4"
[271] "X_drnkmo4" "X_rfdrhv4" "X_rfdrmn4" "X_rfdrwm4" "ftjuda1_"
[276] "frutda1_" "beanday_" "grenday_" "orngday_" "vegeda1_"
[281] "X_misfrtn" "X_misvegn" "X_frtresp" "X_vegresp" "X_frutsum"
[286] "X_vegesum" "X_frtlt1" "X_veglt1" "X_frt16" "X_veg23"
[291] "X_fruitex" "X_vegetex" "X_totinda" "metvl11_" "metvl21_"
[296] "maxvo2_" "fc60_" "actin11_" "actin21_" "padur1_"
[301] "padur2_" "pafreq1_" "pafreq2_" "X_minac11" "X_minac21"
[306] "strfreq_" "pamiss1_" "pamin11_" "pamin21_" "pa1min_"
[311] "pavig11_" "pavig21_" "pa1vigm_" "X_pacat1" "X_paindx1"
[316] "X_pa150r2" "X_pa300r2" "X_pa30021" "X_pastrng" "X_parec1"
[321] "X_pastae1" "X_lmtact1" "X_lmtwrk1" "X_lmtscl1" "X_rfseat2"
[326] "X_rfseat3" "X_flshot6" "X_pneumo2" "X_aidtst3" "X_age80"
# head(brfss, 2)  # View the first two rows
Research question 1:
# Find the HIV-related variables
brfsshiv <- select(brfss2013, contains("hiv"))
colnames(brfsshiv)
## [1] "hivtst6" "hivtstd3"
# The variable needed is hivtst6 (ever tested for HIV); hivtstd3 is the month/year of the last test
# Other columns required are sex, X_state and X_ageg5yr
brfss_test <- brfss2013 %>% filter(!is.na(sex), !is.na(hivtst6), !is.na(X_ageg5yr)) %>%
select(sex, hivtst6, X_ageg5yr, X_state)
# plot() on two factors draws a spine plot of the proportions within each group
plot(brfss_test$X_ageg5yr, brfss_test$hivtst6, xlab = "Age group", ylab = "Ever tested for HIV", main = "HIV test status by age group")
plot(brfss_test$sex, brfss_test$hivtst6, xlab = "Gender", ylab = "Ever tested for HIV", main = "HIV test status by gender")
plot(brfss_test$X_state, brfss_test$hivtst6, xlab = "State", ylab = "Ever tested for HIV", main = "HIV test status across states")
Respondents aged 30 to 39 show the highest proportion who have ever been tested for HIV, and the proportion is almost equal for males and females. Connecticut has the highest proportion of Yes responses among the states. Note that hivtst6 records whether a respondent has ever been tested for HIV, not whether they are HIV-positive.
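Spine plots show proportions visually; a table makes claims such as "highest among 30 to 39 year olds" or "highest in Connecticut" easier to check. The helper `share_tested()` below is a hypothetical name, and it assumes `hivtst6` is a factor with a "Yes" level, as in `brfss2013`. The toy data frame only demonstrates the call and is not BRFSS data.

```r
library(dplyr)

# Hypothetical helper: number of respondents and share answering "Yes" to
# hivtst6 (ever tested for HIV) within each level of a grouping column,
# sorted from highest to lowest share. Requires dplyr >= 1.0 for `.groups`.
share_tested <- function(df, group) {
  df %>%
    group_by({{ group }}) %>%
    summarise(n = n(), prop_yes = mean(hivtst6 == "Yes"), .groups = "drop") %>%
    arrange(desc(prop_yes))
}

# Toy input for illustration only (not BRFSS data).
toy <- data.frame(
  sex     = c("Male", "Male", "Female", "Female", "Female"),
  hivtst6 = factor(c("Yes", "No", "Yes", "Yes", "No"), levels = c("Yes", "No"))
)

share_tested(toy, sex)
```

On the real data the same helper would be called as `share_tested(brfss_test, X_ageg5yr)`, `share_tested(brfss_test, sex)` or `share_tested(brfss_test, X_state)`.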
Research question 2:
# Variables required: sex, fruitju1 (fruit juice), fruit1 (fruit) and genhlth (general health)
brfss_fruit_health <- brfss2013 %>%
  filter(!is.na(sex), !is.na(fruitju1), !is.na(fruit1), !is.na(genhlth)) %>%
  select(sex, fruitju1, fruit1, genhlth)
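# Stacked histograms of the response codes, coloured by general health
ggplot(brfss_fruit_health, aes(x = fruitju1, fill = genhlth)) +
  geom_histogram(binwidth = 8) +
  labs(x = "fruitju1 (fruit juice response code)", fill = "General health")
ggplot(brfss_fruit_health, aes(x = fruit1, fill = genhlth)) +
  geom_histogram(binwidth = 8) +
  labs(x = "fruit1 (fruit response code)", fill = "General health")
There is no clear association between fruit consumption and general health status. However, fruitju1 and fruit1 store coded responses rather than frequencies, so the x axis of these histograms is a response code, and the plots may hide an association that does exist.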
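Because `fruit1` and `fruitju1` hold coded answers, one option is to convert them to times per day before comparing health groups. The function `fruit_per_day()` below is a hypothetical sketch. Its coding rules (101 to 199 = times per day, 201 to 299 = times per week, 301 to 399 = times per month, 300 = less than once a month, 555 = never) are assumed from the BRFSS 2013 codebook and should be checked against it; treating a month as 30 days is also an assumption.

```r
# Hypothetical helper: convert BRFSS-style frequency codes to times per day.
# Assumed coding: 101-199 per day, 201-299 per week, 301-399 per month,
# 300 = less than once a month, 555 = never; any other value becomes NA.
fruit_per_day <- function(code) {
  out <- rep(NA_real_, length(code))
  ok <- !is.na(code)
  per_day   <- ok & code >= 101 & code <= 199
  per_week  <- ok & code >= 201 & code <= 299
  per_month <- ok & code >= 301 & code <= 399
  out[per_day]   <- code[per_day] - 100
  out[per_week]  <- (code[per_week] - 200) / 7
  out[per_month] <- (code[per_month] - 300) / 30  # assumes a 30-day month
  out[ok & (code == 300 | code == 555)] <- 0       # rarely or never
  out
}

fruit_per_day(c(102, 207, 330, 555, NA))  # 2 1 1 0 NA
```

Applied to the data, for example `brfss_fruit_health %>% mutate(fruit_day = fruit_per_day(fruit1))`, the converted frequency could then be compared across `genhlth` levels with `geom_boxplot()`.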
Research question 3:
brfss_race_income <- brfss2013 %>% filter(!is.na(rrclass2), !is.na(income2)) %>% select(rrclass2, income2)
plot(brfss_race_income$rrclass2, brfss_race_income$income2, xlab = "Race", ylab = "Income", main = "Relationship between income and race")
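Research question 3 asks about geography, but the code above does not use `X_state`. One way to bring it in, sketched here under assumptions, is to keep `X_state` and compute, for each state and race category, the share of respondents in the top income level. The helper name `top_income_share()` and the level label "$75,000 or more" are assumptions (check `levels(brfss2013$income2)`). `rrclass2` may come from an optional module asked in only some states, so many states may have no rows. The toy data only demonstrates the call and is not BRFSS data.

```r
library(dplyr)

# Hypothetical helper: share of respondents in the top income level,
# by state and race category. `top_level` must match a level of income2.
top_income_share <- function(df, top_level = "$75,000 or more") {
  df %>%
    filter(!is.na(X_state), !is.na(rrclass2), !is.na(income2)) %>%
    group_by(X_state, rrclass2) %>%
    summarise(n = n(), share_top = mean(income2 == top_level), .groups = "drop")
}

# Toy input for illustration only (not BRFSS data).
toy <- data.frame(
  X_state  = c("Georgia", "Georgia", "Georgia", "Ohio"),
  rrclass2 = c("White", "White", "Black", "White"),
  income2  = c("$75,000 or more", "Less than $10,000", "$75,000 or more", NA)
)

top_income_share(toy)
```

With the real data, `brfss2013 %>% select(X_state, rrclass2, income2) %>% top_income_share()` would give one row per state and race category, which could then be plotted by state.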