Brief Summary on the topic
What is risk factor surveillance?
Risk factor surveillance means keeping track of the rates of risk factors, the things or states in our daily lives that confer risk to our health.
There are two main surveillance systems in the United States:
- National Health and Nutrition Examination Survey (NHANES).
- Behavioral Risk Factor Surveillance System (BRFSS).
We will be using the BRFSS dataset:
- BRFSS is a federal and state collaboration
- Data collectors call randomly generated landline phone numbers or cell phone numbers.
- Data from BRFSS are publicly available. CDC uses them widely for healthcare program planning, and independent researchers can use them for in-depth research and analytics.
Types of BRFSS Analytics
Descriptive Analysis
- Aimed at developing population-based rates.
- Dependent upon the sampling approach–uses “weights”.
- More often done by CDC and states.
Cross-Sectional Analysis
- Aimed at exploring cross-sectional associations (hinting at potential causes).
- Weighting is generally not used.
- More often done by independent researchers.
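To see why descriptive analyses depend on weights, here is a minimal sketch with made-up numbers. The data frame `toy`, its columns, and the weight values are illustrative assumptions, not BRFSS data. It compares an unweighted rate with a weighted one.

```r
# Hypothetical toy sample: 1 = has the risk factor, 0 = does not.
# 'wt' stands in for a survey weight (how many people each respondent represents).
toy <- data.frame(
  risk = c(1, 1, 0, 0, 0),
  wt   = c(100, 100, 400, 200, 200)
)

# Unweighted rate: every respondent counts equally.
unweighted_prev <- mean(toy$risk)

# Weighted rate: each respondent counts in proportion to the weight.
weighted_prev <- weighted.mean(toy$risk, w = toy$wt)

unweighted_prev
weighted_prev
```

When respondents with the risk factor carry smaller weights, as in this toy sample, the weighted rate is lower than the raw sample rate.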
Resources
BRFSS Resource Provided by CDC.
Environment Setup
# Load the required packages (if packages are not available, install them first)
for (package in c('foreign','gtools','questionr','MASS','caret','readr','ggplot2','magrittr','ggthemes','dplyr','corrplot','caTools')) {
if (!require(package, character.only=TRUE, quietly=TRUE)) {
install.packages(package)
library(package,character.only=TRUE)
}
}
# Package Description
# foreign : Reads in "foreign" data types.
# gtools : Allows you to make macros.
# dplyr : Helper package for data manipulation; the group means and standard deviations later on use ddply() from plyr.
# questionr : Allows you to do a weighted analysis.
# MASS : Allows you to do bivariate tests.
Reading and Cleaning the Data
##Get the Data
library(foreign)
# Read the SAS transport (.xpt) file and save it in an object called "BRFSS_a"
BRFSS_a <- read.xport("C:/CompleteMLProjects/Healthcare/BRFSS/Analytics/Data/LLCP2014.xpt")
colnames(BRFSS_a)
## [1] "X_STATE" "FMONTH" "IDATE" "IMONTH" "IDAY"
## [6] "IYEAR" "DISPCODE" "SEQNO" "X_PSU" "CTELENUM"
## [11] "PVTRESD1" "COLGHOUS" "STATERES" "LADULT" "NUMADULT"
## [16] "NUMMEN" "NUMWOMEN" "GENHLTH" "PHYSHLTH" "MENTHLTH"
## [21] "POORHLTH" "HLTHPLN1" "PERSDOC2" "MEDCOST" "CHECKUP1"
## [26] "EXERANY2" "SLEPTIM1" "CVDINFR4" "CVDCRHD4" "CVDSTRK3"
## [31] "ASTHMA3" "ASTHNOW" "CHCSCNCR" "CHCOCNCR" "CHCCOPD1"
## [36] "HAVARTH3" "ADDEPEV2" "CHCKIDNY" "DIABETE3" "DIABAGE2"
## [41] "LASTDEN3" "RMVTETH3" "VETERAN3" "MARITAL" "CHILDREN"
## [46] "EDUCA" "EMPLOY1" "INCOME2" "WEIGHT2" "HEIGHT3"
## [51] "NUMHHOL2" "NUMPHON2" "CPDEMO1" "INTERNET" "RENTHOM1"
## [56] "SEX" "PREGNANT" "QLACTLM2" "USEEQUIP" "BLIND"
## [61] "DECIDE" "DIFFWALK" "DIFFDRES" "DIFFALON" "SMOKE100"
## [66] "SMOKDAY2" "STOPSMK2" "LASTSMK2" "USENOW3" "ALCDAY5"
## [71] "AVEDRNK2" "DRNK3GE5" "MAXDRNKS" "FLUSHOT6" "FLSHTMY2"
## [76] "PNEUVAC3" "SHINGLE2" "FALL12MN" "FALLINJ2" "SEATBELT"
## [81] "DRNKDRI2" "HADMAM" "HOWLONG" "PROFEXAM" "LENGEXAM"
## [86] "HADPAP2" "LASTPAP2" "HADHYST2" "PCPSAAD2" "PCPSADI1"
## [91] "PCPSARE1" "PSATEST1" "PSATIME" "PCPSARS1" "BLDSTOOL"
## [96] "LSTBLDS3" "HADSIGM3" "HADSGCO1" "LASTSIG3" "HIVTST6"
## [101] "HIVTSTD3" "WHRTST10" "PDIABTST" "PREDIAB1" "INSULIN"
## [106] "BLDSUGAR" "FEETCHK2" "DOCTDIAB" "CHKHEMO3" "FEETCHK"
## [111] "EYEEXAM" "DIABEYE" "DIABEDU" "PAINACT2" "QLMENTL2"
## [116] "QLSTRES2" "QLHLTH2" "MEDICARE" "HLTHCVR1" "DELAYMED"
## [121] "DLYOTHER" "NOCOV121" "LSTCOVRG" "DRVISITS" "MEDSCOST"
## [126] "CARERCVD" "MEDBILL1" "ASBIALCH" "ASBIDRNK" "ASBIBING"
## [131] "ASBIADVC" "ASBIRDUC" "WTCHSALT" "LONGWTCH" "DRADVISE"
## [136] "ASTHMAGE" "ASATTACK" "ASERVIST" "ASDRVIST" "ASRCHKUP"
## [141] "ASACTLIM" "ASYMPTOM" "ASNOSLEP" "ASTHMED3" "ASINHALR"
## [146] "IMFVPLAC" "TETANUS" "HPVTEST" "HPLSTTST" "HPVADVC2"
## [151] "HPVADSHT" "CNCRDIFF" "CNCRAGE" "CNCRTYP1" "CSRVTRT1"
## [156] "CSRVDOC1" "CSRVSUM" "CSRVRTRN" "CSRVINST" "CSRVINSR"
## [161] "CSRVDEIN" "CSRVCLIN" "CSRVPAIN" "CSRVCTL1" "RRCLASS2"
## [166] "RRCOGNT2" "RRATWRK2" "RRHCARE3" "RRPHYSM2" "RREMTSM2"
## [171] "SCNTMNY1" "SCNTMEL1" "SCNTPAID" "SCNTWRK1" "SCNTLPAD"
## [176] "SCNTLWK1" "SCNTVOT1" "SXORIENT" "TRNSGNDR" "RCSGENDR"
## [181] "RCSRLTN2" "CASTHDX2" "CASTHNO2" "EMTSUPRT" "LSATISFY"
## [186] "CTELNUM1" "CELLFON2" "CADULT" "PVTRESD2" "CCLGHOUS"
## [191] "CSTATE" "LANDLINE" "HHADULT" "QSTVER" "QSTLANG"
## [196] "MSCODE" "X_STSTR" "X_STRWT" "X_RAWRAKE" "X_WT2RAKE"
## [201] "X_AGE80" "X_IMPRACE" "X_IMPNPH" "X_CHISPNC" "X_CPRACE"
## [206] "X_CRACE1" "X_IMPCAGE" "X_IMPCRAC" "X_IMPCSEX" "X_CLLCPWT"
## [211] "X_DUALUSE" "X_DUALCOR" "X_LLCPWT2" "X_LLCPWT" "X_RFHLTH"
## [216] "X_HCVU651" "X_TOTINDA" "X_LTASTH1" "X_CASTHM1" "X_ASTHMS1"
## [221] "X_DRDXAR1" "X_EXTETH2" "X_ALTETH2" "X_DENVST2" "X_PRACE1"
## [226] "X_MRACE1" "X_HISPANC" "X_RACE" "X_RACEG21" "X_RACEGR3"
## [231] "X_RACE_G1" "X_AGEG5YR" "X_AGE65YR" "X_AGE_G" "HTIN4"
## [236] "HTM4" "WTKG3" "X_BMI5" "X_BMI5CAT" "X_RFBMI5"
## [241] "X_CHLDCNT" "X_EDUCAG" "X_INCOMG" "X_SMOKER3" "X_RFSMOK3"
## [246] "DRNKANY5" "DROCDY3_" "X_RFBING5" "X_DRNKDY4" "X_DRNKMO4"
## [251] "X_RFDRHV4" "X_RFDRMN4" "X_RFDRWM4" "X_FLSHOT6" "X_PNEUMO2"
## [256] "X_RFSEAT2" "X_RFSEAT3" "X_RFMAM2Y" "X_MAM502Y" "X_MAM5021"
## [261] "X_RFPAP32" "X_RFPAP33" "X_RFPSA21" "X_RFBLDS2" "X_RFBLDS3"
## [266] "X_RFSIGM2" "X_COL10YR" "X_HFOB3YR" "X_FS5YR" "X_FOBTFS"
## [271] "X_CRCREC" "X_AIDTST3" "X_IMPEDUC" "X_IMPMRTL" "X_IMPHOME"
## [276] "RCSBRAC1" "RCSRACE1" "RCHISLA1" "RCSBIRTH"
Subset the dataset to the variables we need
#define object list of variables to be kept
BRFSSVarList <- c("VETERAN3",
"ALCDAY5",
"SLEPTIM1",
"ASTHMA3",
"X_AGE_G",
"SMOKE100",
"SMOKDAY2",
"SEX",
"X_HISPANC",
"X_MRACE1",
"MARITAL",
"GENHLTH",
"HLTHPLN1",
"EDUCA",
"INCOME2",
"X_BMI5CAT",
"EXERANY2")
# subset by varlist
BRFSS_b <- BRFSS_a[BRFSSVarList]
# check columns
colnames(BRFSS_b)
## [1] "VETERAN3" "ALCDAY5" "SLEPTIM1" "ASTHMA3" "X_AGE_G"
## [6] "SMOKE100" "SMOKDAY2" "SEX" "X_HISPANC" "X_MRACE1"
## [11] "MARITAL" "GENHLTH" "HLTHPLN1" "EDUCA" "INCOME2"
## [16] "X_BMI5CAT" "EXERANY2"
# check rows
nrow(BRFSS_b)
## [1] 464664
# SUBSET THE DATASET TO KEEP ONLY VETERANS (VETERAN3 == 1)
BRFSS_c <- subset(BRFSS_b,VETERAN3==1)
# Check the variable value
# BRFSS_c$VETERAN3
# Check the number of rows for BRFSS_c
nrow(BRFSS_c)
## [1] 62120
# 62120 veterans are in the dataset.
# The remaining 464664 - 62120 = 402544 rows are non-veterans or have a missing or invalid veteran status.
# ONLY KEEP ROWS WITH VALID ALCOHOL/EXPOSURE VARIABLE.
BRFSS_d <- subset(BRFSS_c, ALCDAY5 < 777 | ALCDAY5 == 888)
# Take a look at the data
# BRFSS_d$ALCDAY5
# Check the number of rows for BRFSS_d
nrow(BRFSS_d)
## [1] 58991
# 58991 veterans have valid alcohol data (including the 888 code for no drinks).
# So 62120 - 58991 = 3129 veterans were dropped for missing or invalid alcohol data.
# EXCLUDING ROWS WITH INVALID SLEEP TIME
# Only keep rows with valid sleep data
BRFSS_e <- subset(BRFSS_d,SLEPTIM1 < 77)
# Check the number of rows for BRFSS_e
nrow(BRFSS_e)
## [1] 58321
# 58321 veterans have valid sleep data.
# So 58991 - 58321 = 670 veterans were dropped for missing or invalid sleep data.
# EXCLUDING ROWS WITH INVALID ASTHMA DATA
# Only keep rows with valid asthma data
BRFSS_f <- subset(BRFSS_e, ASTHMA3 < 7)
# Check the number of rows for BRFSS_f
nrow(BRFSS_f)
## [1] 58131
# 58131 veterans have valid asthma data.
# So 58321 - 58131 = 190 veterans were dropped for missing or invalid asthma data.
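The exclusion steps above can be tracked in one small table so that the row counts are not computed by hand. This is a minimal sketch on a hypothetical toy data frame that uses the same special codes; the object names `toy`, `step1` to `step3`, and `flow` are illustrative only.

```r
# Hypothetical toy data using BRFSS-style special codes:
# ALCDAY5: 777 = don't know, 999 = refused, 888 = no drinks
# SLEPTIM1: 77 = don't know, 99 = refused
toy <- data.frame(
  VETERAN3 = c(1, 1, 1, 1, 2, 1),
  ALCDAY5  = c(101, 888, 777, 205, 101, NA),
  SLEPTIM1 = c(7, 99, 8, 6, 7, 8)
)

# Apply the same kind of filters, one step at a time.
# subset() also drops rows where the condition evaluates to NA.
step1 <- subset(toy, VETERAN3 == 1)
step2 <- subset(step1, ALCDAY5 < 777 | ALCDAY5 == 888)
step3 <- subset(step2, SLEPTIM1 < 77)

# Rows remaining after each step and rows dropped by each step.
flow <- data.frame(
  step      = c("start", "veterans", "valid alcohol", "valid sleep"),
  remaining = c(nrow(toy), nrow(step1), nrow(step2), nrow(step3))
)
flow$dropped <- c(0, -diff(flow$remaining))
flow
```

Reading the `dropped` column makes it clear that each difference is the number of rows excluded at that step, not the number kept.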
Generating Exposure Variable
First, we turn to our exposure, alcohol. We make a grouping variable for alcohol, plus indicator variables for drinking monthly and drinking weekly.
From the ALCDAY5 entry in the Data Dictionary, we see that values in the range 101 to 199 should get an ALCGRP of 3 (drink weekly), values in the range 201 to 299 get 2 (drink monthly), 888 gets 1 (no drinks), and everything else gets 9 (unknown).
If ALCGRP is 2, the drink monthly flag is 1 and everyone else gets 0. If ALCGRP is 3, the drink weekly flag is 1 and everyone else gets 0.
# Add grouping variable for alcohol
# First make copy of the dataset
BRFSS_g <- BRFSS_f
# add the categorical variable set to 9 to the dataset
BRFSS_g$ALCGRP <- 9
# update according to data Dictionary
BRFSS_g$ALCGRP[BRFSS_g$ALCDAY5 < 200] <- 3
BRFSS_g$ALCGRP[BRFSS_g$ALCDAY5 >= 200 & BRFSS_g$ALCDAY5 <777] <- 2
BRFSS_g$ALCGRP[BRFSS_g$ALCDAY5 == 888] <- 1
# Check the variable
table(BRFSS_g$ALCGRP, BRFSS_g$ALCDAY5)
##
## 101 102 103 104 105 106 107 201 202 203 204
## 1 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 4353 3271 1890 1576
## 3 2454 1888 1334 634 686 309 2011 0 0 0 0
##
## 205 206 207 208 209 210 211 212 213 214 215
## 1 0 0 0 0 0 0 0 0 0 0 0
## 2 1431 564 362 415 28 1152 9 259 12 102 1090
## 3 0 0 0 0 0 0 0 0 0 0 0
##
## 216 217 218 219 220 221 222 223 224 225 226
## 1 0 0 0 0 0 0 0 0 0 0 0
## 2 27 14 28 1 1087 31 28 9 32 504 31
## 3 0 0 0 0 0 0 0 0 0 0 0
##
## 227 228 229 230 888
## 1 0 0 0 0 26169
## 2 40 149 81 4070 0
## 3 0 0 0 0 0
# Add flags
# Flags for Monthly drinkers
BRFSS_g$DRKMONTHLY <- 0
BRFSS_g$DRKMONTHLY[BRFSS_g$ALCGRP == 2] <- 1
table(BRFSS_g$ALCGRP,BRFSS_g$DRKMONTHLY)
##
## 0 1
## 1 26169 0
## 2 0 22646
## 3 9316 0
# Flags for Weekly drinkers (ALCGRP 3 is the weekly group)
BRFSS_g$DRKWEEKLY <- 0
BRFSS_g$DRKWEEKLY[BRFSS_g$ALCGRP == 3] <- 1
table(BRFSS_g$ALCGRP,BRFSS_g$DRKWEEKLY)
##
## 0 1
## 1 26169 0
## 2 22646 0
## 3 0 9316
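The recode described in this section can also be written as one small function, which makes it easy to test on a handful of values before running it on the full file. This is a sketch only: `recode_alcgrp` and the toy vectors are hypothetical names, and the ranges follow the coding stated in the text (101 to 199 weekly, 201 to 299 monthly, 888 none, everything else unknown).

```r
# Sketch of the ALCDAY5 -> ALCGRP recode as a reusable function.
recode_alcgrp <- function(alcday) {
  grp <- rep(9, length(alcday))                              # default: unknown
  grp[!is.na(alcday) & alcday >= 101 & alcday <= 199] <- 3   # drinks weekly
  grp[!is.na(alcday) & alcday >= 201 & alcday <= 299] <- 2   # drinks monthly
  grp[!is.na(alcday) & alcday == 888] <- 1                   # no drinks
  grp
}

alcday_toy <- c(101, 107, 201, 230, 888, 777, 999, NA)  # hypothetical values
alcgrp_toy <- recode_alcgrp(alcday_toy)

# Indicator flags derived from the group.
drkmonthly_toy <- as.integer(alcgrp_toy == 2)
drkweekly_toy  <- as.integer(alcgrp_toy == 3)

data.frame(alcday_toy, alcgrp_toy, drkmonthly_toy, drkweekly_toy)
```

Checking each flag against the group it is meant to mark, as the cross-tabulations in this section do, catches a flag that was accidentally tied to the wrong group code.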
Generate outcome variables from data dictionary
First, we are going to clean up our outcome variable for sleep duration. Next, we will make sure we have a valid binary variable (flag) for our asthma outcome.
# We need to remove the rows with no information on sleep time and we want to turn our asthma variable into an indicator variable with only ones and zeroes.
# First make copy of the dataset
BRFSS_h <- BRFSS_g
# Make and test sleep variable
# First generate a SLEPTIM2 variable, a continuous version of SLEPTIM1, and set every value to NA
BRFSS_h$SLEPTIM2 <- NA
# Flag rows where SLEPTIM1 is not NA, not 77 and not 99
valid_sleep <- !is.na(BRFSS_h$SLEPTIM1) & BRFSS_h$SLEPTIM1 !=77 & BRFSS_h$SLEPTIM1 !=99
# Copy SLEPTIM1 for those rows only, indexing both sides so the values stay aligned
BRFSS_h$SLEPTIM2[valid_sleep] <- BRFSS_h$SLEPTIM1[valid_sleep]
# Check the variable
table(BRFSS_h$SLEPTIM1,BRFSS_h$SLEPTIM2)
##
## 1 2 3 4 5 6 7 8 9 10 11
## 1 38 0 0 0 0 0 0 0 0 0 0
## 2 0 134 0 0 0 0 0 0 0 0 0
## 3 0 0 465 0 0 0 0 0 0 0 0
## 4 0 0 0 1687 0 0 0 0 0 0 0
## 5 0 0 0 0 3690 0 0 0 0 0 0
## 6 0 0 0 0 0 11854 0 0 0 0 0
## 7 0 0 0 0 0 0 16557 0 0 0 0
## 8 0 0 0 0 0 0 0 17889 0 0 0
## 9 0 0 0 0 0 0 0 0 3426 0 0
## 10 0 0 0 0 0 0 0 0 0 1705 0
## 11 0 0 0 0 0 0 0 0 0 0 111
## 12 0 0 0 0 0 0 0 0 0 0 0
## 13 0 0 0 0 0 0 0 0 0 0 0
## 14 0 0 0 0 0 0 0 0 0 0 0
## 15 0 0 0 0 0 0 0 0 0 0 0
## 16 0 0 0 0 0 0 0 0 0 0 0
## 17 0 0 0 0 0 0 0 0 0 0 0
## 18 0 0 0 0 0 0 0 0 0 0 0
## 20 0 0 0 0 0 0 0 0 0 0 0
## 21 0 0 0 0 0 0 0 0 0 0 0
## 22 0 0 0 0 0 0 0 0 0 0 0
## 24 0 0 0 0 0 0 0 0 0 0 0
##
## 12 13 14 15 16 17 18 20 21 22 24
## 1 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0
## 7 0 0 0 0 0 0 0 0 0 0 0
## 8 0 0 0 0 0 0 0 0 0 0 0
## 9 0 0 0 0 0 0 0 0 0 0 0
## 10 0 0 0 0 0 0 0 0 0 0 0
## 11 0 0 0 0 0 0 0 0 0 0 0
## 12 411 0 0 0 0 0 0 0 0 0 0
## 13 0 19 0 0 0 0 0 0 0 0 0
## 14 0 0 38 0 0 0 0 0 0 0 0
## 15 0 0 0 32 0 0 0 0 0 0 0
## 16 0 0 0 0 35 0 0 0 0 0 0
## 17 0 0 0 0 0 3 0 0 0 0 0
## 18 0 0 0 0 0 0 24 0 0 0 0
## 20 0 0 0 0 0 0 0 8 0 0 0
## 21 0 0 0 0 0 0 0 0 1 0 0
## 22 0 0 0 0 0 0 0 0 0 2 0
## 24 0 0 0 0 0 0 0 0 0 0 2
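When a cleaned copy of a variable is built with a logical index, the index belongs on both sides of the assignment. If only the left side is indexed, R takes replacement values from the start of the right-hand vector, so values shift as soon as any row is invalid. A minimal sketch on hypothetical values (`sleep_raw`, `valid`, and `sleep_clean` are illustrative names):

```r
sleep_raw <- c(7, 77, 8, 99, NA, 6)   # hypothetical SLEPTIM1-style values

# TRUE only for real hour counts: not missing, not 77 (don't know), not 99 (refused).
valid <- !is.na(sleep_raw) & !(sleep_raw %in% c(77, 99))

# Start from all-missing, then copy the valid values into matching positions.
sleep_clean <- rep(NA_real_, length(sleep_raw))
sleep_clean[valid] <- sleep_raw[valid]   # index both sides so positions line up

sleep_clean
mean(sleep_clean, na.rm = TRUE)
```

In the analysis above every remaining row is already valid, so the distinction does not change the result there, but the two-sided form stays correct if the earlier filters are ever relaxed.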
Make and test asthma variable
# Assign 9 to ASTHMA4
BRFSS_h$ASTHMA4 <- 9
# Then assign 1 to all who have reported ASTHMA
BRFSS_h$ASTHMA4[BRFSS_h$ASTHMA3 == 1] <- 1
# Then assign 0 to all who have not reported ASTHMA
BRFSS_h$ASTHMA4[BRFSS_h$ASTHMA3 == 2] <- 0
# Check the variable
table(BRFSS_h$ASTHMA3,BRFSS_h$ASTHMA4)
##
## 0 1
## 1 0 5343
## 2 52788 0
Generating Age variables
# First make copy of the dataset
BRFSS_i <- BRFSS_h
# Set every age-group flag to 0 by default; the groups follow the data dictionary.
# Age group 18 to 24 (X_AGE_G == 1) is the reference group, so it gets no flag
BRFSS_i$AGE2 <- 0 # Age 25 to 34
BRFSS_i$AGE3 <- 0 # Age 35 to 44
BRFSS_i$AGE4 <- 0 # Age 45 to 54
BRFSS_i$AGE5 <- 0 # Age 55 to 64
BRFSS_i$AGE6 <- 0 # Age 65 and older
# set conditions to update the flags
BRFSS_i$AGE2[BRFSS_i$X_AGE_G == 2] <- 1
table(BRFSS_i$X_AGE_G,BRFSS_i$AGE2)
##
## 0 1
## 1 899 0
## 2 0 2657
## 3 3589 0
## 4 6543 0
## 5 10724 0
## 6 33719 0
BRFSS_i$AGE3[BRFSS_i$X_AGE_G == 3] <- 1
table(BRFSS_i$X_AGE_G,BRFSS_i$AGE3)
##
## 0 1
## 1 899 0
## 2 2657 0
## 3 0 3589
## 4 6543 0
## 5 10724 0
## 6 33719 0
BRFSS_i$AGE4[BRFSS_i$X_AGE_G == 4] <- 1
table(BRFSS_i$X_AGE_G,BRFSS_i$AGE4)
##
## 0 1
## 1 899 0
## 2 2657 0
## 3 3589 0
## 4 0 6543
## 5 10724 0
## 6 33719 0
BRFSS_i$AGE5[BRFSS_i$X_AGE_G == 5] <- 1
table(BRFSS_i$X_AGE_G,BRFSS_i$AGE5)
##
## 0 1
## 1 899 0
## 2 2657 0
## 3 3589 0
## 4 6543 0
## 5 0 10724
## 6 33719 0
BRFSS_i$AGE6[BRFSS_i$X_AGE_G == 6] <- 1
table(BRFSS_i$X_AGE_G,BRFSS_i$AGE6)
##
## 0 1
## 1 899 0
## 2 2657 0
## 3 3589 0
## 4 6543 0
## 5 10724 0
## 6 0 33719
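The same pattern of one 0/1 flag per non-reference level recurs throughout this section, so it can be sketched as a loop. The data frame `toy` and its values are hypothetical; only the column naming (AGE2 to AGE6) mirrors the code above.

```r
# Hypothetical X_AGE_G-style codes (1 = reference group, 2..6 = older groups).
toy <- data.frame(X_AGE_G = c(1, 2, 3, 6, 2, 5))

# One 0/1 column per non-reference level, named AGE2 ... AGE6.
for (lvl in 2:6) {
  toy[[paste0("AGE", lvl)]] <- as.integer(toy$X_AGE_G == lvl)
}
toy

# Sanity check: each row has at most one flag set (none for the reference group).
flag_count <- rowSums(toy[paste0("AGE", 2:6)])
flag_count
```

One difference from the assign-zero-then-update approach: a missing X_AGE_G would produce NA flags here rather than 0, which may or may not be what an analysis wants.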
Make smoking variables
# Make smoking variables
BRFSS_i$NEVERSMK <- 0
BRFSS_i$NEVERSMK [BRFSS_i$SMOKE100 == 2] <- 1
table(BRFSS_i$SMOKE100,BRFSS_i$NEVERSMK)
##
## 0 1
## 1 35267 0
## 2 0 22622
## 7 208 0
## 9 33 0
# Make grouping variable
BRFSS_i$SMOKGRP <- 9
BRFSS_i$SMOKGRP[BRFSS_i$SMOKDAY2 == 1 | BRFSS_i$SMOKDAY2 == 2] <- 1
BRFSS_i$SMOKGRP[BRFSS_i$SMOKDAY2 == 3 | BRFSS_i$NEVERSMK == 1] <- 2
table(BRFSS_i$SMOKGRP,BRFSS_i$SMOKDAY2)
##
## 1 2 3 7 9
## 1 6476 2095 0 0 0
## 2 0 0 26639 0 0
## 9 0 0 0 26 31
table(BRFSS_i$SMOKGRP,BRFSS_i$SMOKE100)
##
## 1 2 7 9
## 1 8571 0 0 0
## 2 26639 22622 0 0
## 9 57 0 208 33
BRFSS_i$SMOKER <- 0
BRFSS_i$SMOKER[BRFSS_i$SMOKGRP == 1] <- 1
table(BRFSS_i$SMOKGRP, BRFSS_i$SMOKER)
##
## 0 1
## 1 0 8571
## 2 49261 0
## 9 299 0
Make Sex variable
BRFSS_i$MALE <- 0
BRFSS_i$MALE[BRFSS_i$SEX == 1] <- 1
table(BRFSS_i$MALE, BRFSS_i$SEX)
##
## 1 2
## 0 0 5160
## 1 52971 0
Make Hispanic variable
BRFSS_i$HISPANIC <- 0
BRFSS_i$HISPANIC[BRFSS_i$X_HISPANC == 1] <- 1
table(BRFSS_i$HISPANIC, BRFSS_i$X_HISPANC)
##
## 1 2 9
## 0 0 55262 607
## 1 2262 0 0
Make Race variables
BRFSS_i$RACEGRP <- 9
BRFSS_i$RACEGRP[BRFSS_i$X_MRACE1 == 1] <- 1
BRFSS_i$RACEGRP[BRFSS_i$X_MRACE1 == 2] <- 2
BRFSS_i$RACEGRP[BRFSS_i$X_MRACE1 == 3] <- 3
BRFSS_i$RACEGRP[BRFSS_i$X_MRACE1 == 4] <- 4
BRFSS_i$RACEGRP[BRFSS_i$X_MRACE1 == 5] <- 5
BRFSS_i$RACEGRP[BRFSS_i$X_MRACE1 == 6 | BRFSS_i$X_MRACE1 == 7] <- 6
table(BRFSS_i$RACEGRP , BRFSS_i$X_MRACE1)
##
## 1 2 3 4 5 6 7 77 99
## 1 49394 0 0 0 0 0 0 0 0
## 2 0 3939 0 0 0 0 0 0 0
## 3 0 0 930 0 0 0 0 0 0
## 4 0 0 0 557 0 0 0 0 0
## 5 0 0 0 0 261 0 0 0 0
## 6 0 0 0 0 0 656 1400 0 0
## 9 0 0 0 0 0 0 0 182 797
BRFSS_i$BLACK <- 0
BRFSS_i$ASIAN <- 0
BRFSS_i$OTHRACE <- 0
BRFSS_i$BLACK[BRFSS_i$RACEGRP == 2] <- 1
table(BRFSS_i$RACEGRP, BRFSS_i$BLACK)
##
## 0 1
## 1 49394 0
## 2 0 3939
## 3 930 0
## 4 557 0
## 5 261 0
## 6 2056 0
## 9 994 0
BRFSS_i$ASIAN[BRFSS_i$RACEGRP == 4] <- 1
table(BRFSS_i$RACEGRP, BRFSS_i$ASIAN)
##
## 0 1
## 1 49394 0
## 2 3939 0
## 3 930 0
## 4 0 557
## 5 261 0
## 6 2056 0
## 9 994 0
BRFSS_i$OTHRACE[BRFSS_i$RACEGRP == 3 | BRFSS_i$RACEGRP == 5 | BRFSS_i$RACEGRP == 6 | BRFSS_i$RACEGRP == 7] <- 1
table(BRFSS_i$RACEGRP, BRFSS_i$OTHRACE)
##
## 0 1
## 1 49394 0
## 2 3939 0
## 3 0 930
## 4 557 0
## 5 0 261
## 6 0 2056
## 9 994 0
Make Marital variables
BRFSS_i$MARGRP <- 9
BRFSS_i$MARGRP[BRFSS_i$MARITAL == 1 | BRFSS_i$MARITAL == 5] <- 1
BRFSS_i$MARGRP[BRFSS_i$MARITAL == 2 | BRFSS_i$MARITAL == 3 ] <- 2
BRFSS_i$MARGRP[BRFSS_i$MARITAL == 4] <- 3
table(BRFSS_i$MARGRP, BRFSS_i$MARITAL)
##
## 1 2 3 4 5 6 9
## 1 35855 0 0 0 4696 0 0
## 2 0 8396 7192 0 0 0 0
## 3 0 0 0 982 0 0 0
## 9 0 0 0 0 0 796 214
BRFSS_i$NEVERMAR <- 0
BRFSS_i$FORMERMAR <- 0
BRFSS_i$NEVERMAR[BRFSS_i$MARGRP == 3] <- 1
table(BRFSS_i$MARGRP, BRFSS_i$NEVERMAR)
##
## 0 1
## 1 40551 0
## 2 15588 0
## 3 0 982
## 9 1010 0
BRFSS_i$FORMERMAR[BRFSS_i$MARGRP == 2] <- 1
table(BRFSS_i$MARGRP, BRFSS_i$FORMERMAR)
##
## 0 1
## 1 40551 0
## 2 0 15588
## 3 982 0
## 9 1010 0
Make Genhealth variables
BRFSS_i$GENHLTH2 <- 9
BRFSS_i$GENHLTH2[BRFSS_i$GENHLTH == 1] <- 1
BRFSS_i$GENHLTH2[BRFSS_i$GENHLTH == 2] <- 2
BRFSS_i$GENHLTH2[BRFSS_i$GENHLTH == 3] <- 3
BRFSS_i$GENHLTH2[BRFSS_i$GENHLTH == 4] <- 4
BRFSS_i$GENHLTH2[BRFSS_i$GENHLTH == 5] <- 5
table(BRFSS_i$GENHLTH2, BRFSS_i$GENHLTH)
##
## 1 2 3 4 5 7 9
## 1 9016 0 0 0 0 0 0
## 2 0 18111 0 0 0 0 0
## 3 0 0 18797 0 0 0 0
## 4 0 0 0 8436 0 0 0
## 5 0 0 0 0 3569 0 0
## 9 0 0 0 0 0 100 102
BRFSS_i$FAIRHLTH <- 0
BRFSS_i$POORHLTH <- 0
BRFSS_i$FAIRHLTH [BRFSS_i$GENHLTH2 == 4] <- 1
table(BRFSS_i$FAIRHLTH, BRFSS_i$GENHLTH2)
##
## 1 2 3 4 5 9
## 0 9016 18111 18797 0 3569 202
## 1 0 0 0 8436 0 0
BRFSS_i$POORHLTH [BRFSS_i$GENHLTH2 == 5] <- 1
table(BRFSS_i$POORHLTH, BRFSS_i$GENHLTH2)
##
## 1 2 3 4 5 9
## 0 9016 18111 18797 8436 0 202
## 1 0 0 0 0 3569 0
Make health plan variables
BRFSS_i$HLTHPLN2 <- 9
BRFSS_i$HLTHPLN2[BRFSS_i$HLTHPLN1 == 1] <- 1
BRFSS_i$HLTHPLN2[BRFSS_i$HLTHPLN1 == 2] <- 2
table(BRFSS_i$HLTHPLN1, BRFSS_i$HLTHPLN2)
##
## 1 2 9
## 1 55795 0 0
## 2 0 2203 0
## 7 0 0 47
## 9 0 0 86
BRFSS_i$NOPLAN <- 0
BRFSS_i$NOPLAN [BRFSS_i$HLTHPLN2== 2] <- 1
table(BRFSS_i$NOPLAN, BRFSS_i$HLTHPLN2)
##
## 1 2 9
## 0 55795 0 133
## 1 0 2203 0
Make education variables
BRFSS_i$EDGROUP <- 9
BRFSS_i$EDGROUP[BRFSS_i$EDUCA == 1 | BRFSS_i$EDUCA == 2 | BRFSS_i$EDUCA == 3] <- 1
BRFSS_i$EDGROUP[BRFSS_i$EDUCA == 4] <- 2
BRFSS_i$EDGROUP[BRFSS_i$EDUCA == 5] <- 3
BRFSS_i$EDGROUP[BRFSS_i$EDUCA == 6] <- 4
table(BRFSS_i$EDGROUP, BRFSS_i$EDUCA)
##
## 1 2 3 4 5 6 9
## 1 33 704 1746 0 0 0 0
## 2 0 0 0 16241 0 0 0
## 3 0 0 0 0 17559 0 0
## 4 0 0 0 0 0 21742 0
## 9 0 0 0 0 0 0 106
BRFSS_i$LOWED <- 0
BRFSS_i$SOMECOLL <- 0
BRFSS_i$LOWED[BRFSS_i$EDGROUP == 1 | BRFSS_i$EDGROUP == 2 ] <- 1
table(BRFSS_i$LOWED, BRFSS_i$EDGROUP)
##
## 1 2 3 4 9
## 0 0 0 17559 21742 106
## 1 2483 16241 0 0 0
BRFSS_i$SOMECOLL [BRFSS_i$EDGROUP == 3] <- 1
table(BRFSS_i$SOMECOLL, BRFSS_i$EDGROUP)
##
## 1 2 3 4 9
## 0 2483 16241 0 21742 106
## 1 0 0 17559 0 0
Make income variables
BRFSS_i$INCOME3 <- BRFSS_i$INCOME2
BRFSS_i$INCOME3[BRFSS_i$INCOME2 >=77] <- 9
table(BRFSS_i$INCOME2, BRFSS_i$INCOME3)
##
## 1 2 3 4 5 6 7 8 9
## 1 1165 0 0 0 0 0 0 0 0
## 2 0 2111 0 0 0 0 0 0 0
## 3 0 0 3148 0 0 0 0 0 0
## 4 0 0 0 4774 0 0 0 0 0
## 5 0 0 0 0 6491 0 0 0 0
## 6 0 0 0 0 0 9305 0 0 0
## 7 0 0 0 0 0 0 9636 0 0
## 8 0 0 0 0 0 0 0 15230 0
## 77 0 0 0 0 0 0 0 0 2132
## 99 0 0 0 0 0 0 0 0 4139
BRFSS_i$INC1 <- 0
BRFSS_i$INC2 <- 0
BRFSS_i$INC3 <- 0
BRFSS_i$INC4 <- 0
BRFSS_i$INC5 <- 0
BRFSS_i$INC6 <- 0
BRFSS_i$INC7 <- 0
BRFSS_i$INC1[BRFSS_i$INCOME3 == 1] <- 1
table(BRFSS_i$INC1, BRFSS_i$INCOME3)
##
## 1 2 3 4 5 6 7 8 9
## 0 0 2111 3148 4774 6491 9305 9636 15230 6271
## 1 1165 0 0 0 0 0 0 0 0
BRFSS_i$INC2[BRFSS_i$INCOME3 == 2] <- 1
table(BRFSS_i$INC2, BRFSS_i$INCOME3)
##
## 1 2 3 4 5 6 7 8 9
## 0 1165 0 3148 4774 6491 9305 9636 15230 6271
## 1 0 2111 0 0 0 0 0 0 0
BRFSS_i$INC3[BRFSS_i$INCOME3 == 3] <- 1
table(BRFSS_i$INC3, BRFSS_i$INCOME3)
##
## 1 2 3 4 5 6 7 8 9
## 0 1165 2111 0 4774 6491 9305 9636 15230 6271
## 1 0 0 3148 0 0 0 0 0 0
BRFSS_i$INC4[BRFSS_i$INCOME3 == 4] <- 1
table(BRFSS_i$INC4, BRFSS_i$INCOME3)
##
## 1 2 3 4 5 6 7 8 9
## 0 1165 2111 3148 0 6491 9305 9636 15230 6271
## 1 0 0 0 4774 0 0 0 0 0
BRFSS_i$INC5[BRFSS_i$INCOME3 == 5] <- 1
table(BRFSS_i$INC5, BRFSS_i$INCOME3)
##
## 1 2 3 4 5 6 7 8 9
## 0 1165 2111 3148 4774 0 9305 9636 15230 6271
## 1 0 0 0 0 6491 0 0 0 0
BRFSS_i$INC6[BRFSS_i$INCOME3 == 6] <- 1
table(BRFSS_i$INC6, BRFSS_i$INCOME3)
##
## 1 2 3 4 5 6 7 8 9
## 0 1165 2111 3148 4774 6491 0 9636 15230 6271
## 1 0 0 0 0 0 9305 0 0 0
BRFSS_i$INC7[BRFSS_i$INCOME3 == 7] <- 1
table(BRFSS_i$INC7, BRFSS_i$INCOME3)
##
## 1 2 3 4 5 6 7 8 9
## 0 1165 2111 3148 4774 6491 9305 0 15230 6271
## 1 0 0 0 0 0 0 9636 0 0
Make BMI variables
BRFSS_i$BMICAT<- 9
BRFSS_i$BMICAT[BRFSS_i$X_BMI5CAT ==1] <- 1
BRFSS_i$BMICAT[BRFSS_i$X_BMI5CAT ==2] <- 2
BRFSS_i$BMICAT[BRFSS_i$X_BMI5CAT ==3] <- 3
BRFSS_i$BMICAT[BRFSS_i$X_BMI5CAT ==4] <- 4
table(BRFSS_i$BMICAT, BRFSS_i$X_BMI5CAT)
##
## 1 2 3 4
## 1 478 0 0 0
## 2 0 14340 0 0
## 3 0 0 25572 0
## 4 0 0 0 16871
## 9 0 0 0 0
BRFSS_i$UNDWT <- 0
BRFSS_i$OVWT <- 0
BRFSS_i$OBESE <- 0
BRFSS_i$UNDWT[BRFSS_i$BMICAT== 1] <- 1
table(BRFSS_i$UNDWT, BRFSS_i$BMICAT)
##
## 1 2 3 4 9
## 0 0 14340 25572 16871 870
## 1 478 0 0 0 0
BRFSS_i$OVWT[BRFSS_i$BMICAT== 3] <- 1
table(BRFSS_i$OVWT, BRFSS_i$BMICAT)
##
## 1 2 3 4 9
## 0 478 14340 0 16871 870
## 1 0 0 25572 0 0
BRFSS_i$OBESE[BRFSS_i$BMICAT== 4] <- 1
table(BRFSS_i$OBESE, BRFSS_i$BMICAT)
##
## 1 2 3 4 9
## 0 478 14340 25572 0 870
## 1 0 0 0 16871 0
Make exercise variables
BRFSS_i$EXERANY3<- 9
BRFSS_i$EXERANY3[BRFSS_i$EXERANY2 ==1] <- 1
BRFSS_i$EXERANY3[BRFSS_i$EXERANY2 ==2] <- 2
table(BRFSS_i$EXERANY3, BRFSS_i$EXERANY2)
##
## 1 2 7 9
## 1 44357 0 0 0
## 2 0 13641 0 0
## 9 0 0 57 75
BRFSS_i$NOEXER <- 0
BRFSS_i$NOEXER[BRFSS_i$EXERANY3 ==2] <- 1
table(BRFSS_i$NOEXER, BRFSS_i$EXERANY3)
##
## 1 2 9
## 0 44357 0 133
## 1 0 13641 0
nrow(BRFSS_i)
## [1] 58131
Write out analytic dataset
write.csv(BRFSS_i, file = "analytic.csv")
Now that we have a clean dataset in hand, let us analyze it.
#read in analytic table
analytic <- read.csv(file="C:/CompleteMLProjects/Healthcare/BRFSS/Analytics/Code/analytic.csv", header=TRUE, sep=",")
#Look at distribution of categorical outcome asthma
AsthmaFreq <- table(analytic$ASTHMA4)
AsthmaFreq
##
## 0 1
## 52788 5343
write.csv(AsthmaFreq, file = "AsthmaFreq.csv")
#what proportion of our dataset has asthma? (asthma count divided by all respondents)
PropAsthma <- 5343/(52788 + 5343)
PropAsthma
## [1] 0.09191309
#Look at categorical outcome asthma by exposure, ALCGRP
AsthmaAlcFreq <- table(analytic$ASTHMA4, analytic$ALCGRP)
AsthmaAlcFreq
##
## 1 2 3
## 0 23498 20749 8541
## 1 2671 1897 775
write.csv(AsthmaAlcFreq, file = "AsthmaAlcFreq.csv")
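Proportions can be computed directly from the table instead of typing counts by hand. The sketch below rebuilds the ASTHMA4 by ALCGRP counts printed above as a matrix (the object names `AsthmaAlc`, `prop_asthma`, and `col_props` are illustrative) and uses `prop.table()`. Note that a proportion divides by all respondents, whereas dividing the asthma count by the no-asthma count would give the odds.

```r
# Counts copied from the ASTHMA4 x ALCGRP table above.
AsthmaAlc <- matrix(
  c(23498, 20749, 8541,
     2671,  1897,  775),
  nrow = 2, byrow = TRUE,
  dimnames = list(ASTHMA4 = c("0", "1"), ALCGRP = c("1", "2", "3"))
)

# Overall proportion with asthma: asthma count over ALL respondents.
prop_asthma <- sum(AsthmaAlc["1", ]) / sum(AsthmaAlc)

# Proportion with asthma within each alcohol group (column proportions).
col_props <- prop.table(AsthmaAlc, margin = 2)

prop_asthma
round(col_props["1", ], 3)
```

Using `margin = 2` makes each column sum to 1, so the second row reads as the share of each alcohol group reporting asthma.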
Look at distribution of sleep duration
#summary statistics
summary(analytic$SLEPTIM2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 6.000 7.000 7.116 8.000 24.000
#look at histogram and box plot of total file
hist(analytic$SLEPTIM2,
main = "Histogram of SLEPTIM2",
xlab = "Class SLEPTIM2",
ylab = "Frequency",
xlim=c(0,15),
ylim=c(0,20000),
border = "red",
col= "yellow",
las = 1,
breaks = 24)

boxplot(analytic$SLEPTIM2, main="Box Plot of SLEPTIM2",
xlab="Total File", ylab="SLEPTIM2")

#See box plots of groups next to each other
boxplot(SLEPTIM2~ALCGRP, data=analytic, main="Box Plot of SLEPTIM2 by ALCGRP",
xlab="ALCGRP", ylab="SLEPTIM2")

Making frequencies per category
AsthmaFreq <- table(analytic$ASTHMA4)
AsthmaFreq
##
## 0 1
## 52788 5343
write.csv(AsthmaFreq, file = "AsthmaFreq.csv")
AlcFreq <- table(analytic$ALCGRP)
AlcFreq
##
## 1 2 3
## 26169 22646 9316
write.csv(AlcFreq , file = "AlcFreq.csv")
#USING MACROS
#install package gtools
#then call up library
library(gtools)
#use defmacro to define the macro
FreqTbl <-defmacro(OutputTable, InputVar, CSVTable,
expr={
OutputTable <- table(InputVar);
write.csv(OutputTable, file = paste0(CSVTable, ".csv"))
})
FreqTbl (AlcFreq, analytic$ALCGRP, "Alc")
FreqTbl (AgeFreq, analytic$X_AGE_G, "Age")
FreqTbl (SexFreq, analytic$SEX, "Sex")
FreqTbl (HispFreq, analytic$X_HISPANC, "Hisp")
FreqTbl (RaceFreq, analytic$RACEGRP, "Race")
FreqTbl (MaritalFreq, analytic$MARGRP, "Mar")
FreqTbl (EdFreq, analytic$EDGROUP, "Ed")
FreqTbl (IncFreq, analytic$INCOME3, "Inc")
FreqTbl (BMIFreq, analytic$BMICAT, "BMI")
FreqTbl (SmokeFreq, analytic$SMOKGRP, "Smok")
FreqTbl (ExerFreq, analytic$EXERANY3, "Exer")
FreqTbl (HlthPlanFreq, analytic$HLTHPLN2, "HlthPln")
FreqTbl (GenHlthFreq, analytic$GENHLTH2, "GenHlth")
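As an alternative to a gtools macro, the same frequency tables can be produced with an ordinary function and `lapply()`. This is a sketch under assumptions: `freq_tbl`, `toy`, and `vars` are hypothetical names, and the files go to `tempdir()` here so the example does not touch the project folder.

```r
# A plain function: builds the table, writes it to CSV, and returns it.
freq_tbl <- function(x, csv_name, out_dir = tempdir()) {
  tbl <- table(x)
  write.csv(tbl, file = file.path(out_dir, paste0(csv_name, ".csv")))
  tbl
}

# Hypothetical stand-in for the analytic dataset.
toy <- data.frame(ALCGRP = c(1, 1, 2, 3, 3, 3), SEX = c(1, 2, 1, 1, 1, 2))

# Map each variable name to a file stem and build all tables in one pass.
vars <- c(ALCGRP = "Alc", SEX = "Sex")
freqs <- lapply(names(vars), function(v) freq_tbl(toy[[v]], vars[[v]]))
names(freqs) <- names(vars)

freqs$ALCGRP
```

A function returns its result explicitly, so the tables end up in one named list rather than as variables created in the calling environment, which is what the macro does.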
Checking for Asthma frequencies
### Subset dataset to only people with asthma
asthmaonly <- subset(analytic, ASTHMA4 == 1)
table(asthmaonly$ASTHMA4)
##
## 1
## 5343
nrow(asthmaonly)
## [1] 5343
AsthmaFreq <- table(asthmaonly$ASTHMA4)
AsthmaFreq
##
## 1
## 5343
write.csv(AsthmaFreq, file = "AsthmaOnly.csv")
#USING MACROS
library(gtools)
#use defmacro to define the macro
FreqTbl <-defmacro(OutputTable, InputVar, CSVTable,
expr={
OutputTable <- table(InputVar);
write.csv(OutputTable, file = paste0(CSVTable, ".csv"))
})
#file names carry an "Asthma" prefix so they do not overwrite the whole-sample CSVs
FreqTbl (AlcGrpFreq, asthmaonly$ALCGRP, "AsthmaAlc")
FreqTbl (AgeGrpFreq, asthmaonly$X_AGE_G, "AsthmaAge")
FreqTbl (SexFreq, asthmaonly$SEX, "AsthmaSex")
FreqTbl (HispFreq, asthmaonly$X_HISPANC, "AsthmaHisp")
FreqTbl (RaceFreq, asthmaonly$RACEGRP, "AsthmaRace")
FreqTbl (MaritalFreq, asthmaonly$MARGRP, "AsthmaMar")
FreqTbl (EdFreq, asthmaonly$EDGROUP, "AsthmaEd")
FreqTbl (IncFreq, asthmaonly$INCOME3, "AsthmaInc")
FreqTbl (BMIFreq, asthmaonly$BMICAT, "AsthmaBMI")
FreqTbl (SmokeFreq, asthmaonly$SMOKGRP, "AsthmaSmok")
FreqTbl (ExerFreq, asthmaonly$EXERANY3, "AsthmaExer")
FreqTbl (HlthPlanFreq, asthmaonly$HLTHPLN2, "AsthmaHlthPln")
FreqTbl (GenHlthFreq, asthmaonly$GENHLTH2, "AsthmaGenHlth")
Checking for No Asthma frequencies
#subset dataset to only people without asthma
noasthmaonly <- subset(analytic, ASTHMA4 != 1)
table(noasthmaonly $ASTHMA4)
##
## 0
## 52788
nrow(noasthmaonly)
## [1] 52788
AsthmaFreq <- table(noasthmaonly$ASTHMA4)
AsthmaFreq
##
## 0
## 52788
write.csv(AsthmaFreq, file = "NoAsthma.csv")
#USING MACROS
library(gtools)
#use defmacro to define the macro
FreqTbl <-defmacro(OutputTable, InputVar, CSVTable,
expr={
OutputTable <- table(InputVar);
write.csv(OutputTable, file = paste0(CSVTable, ".csv"))
})
#file names carry a "NoAsthma" prefix so they do not overwrite the other CSVs
FreqTbl (AlcGrpFreq, noasthmaonly$ALCGRP, "NoAsthmaAlc")
FreqTbl (AgeGrpFreq, noasthmaonly$X_AGE_G, "NoAsthmaAge")
FreqTbl (SexFreq, noasthmaonly$SEX, "NoAsthmaSex")
FreqTbl (HispFreq, noasthmaonly$X_HISPANC, "NoAsthmaHisp")
FreqTbl (RaceFreq, noasthmaonly$RACEGRP, "NoAsthmaRace")
FreqTbl (MaritalFreq, noasthmaonly$MARGRP, "NoAsthmaMar")
FreqTbl (EdFreq, noasthmaonly$EDGROUP, "NoAsthmaEd")
FreqTbl (IncFreq, noasthmaonly$INCOME3, "NoAsthmaInc")
FreqTbl (BMIFreq, noasthmaonly$BMICAT, "NoAsthmaBMI")
FreqTbl (SmokeFreq, noasthmaonly$SMOKGRP, "NoAsthmaSmok")
FreqTbl (ExerFreq, noasthmaonly$EXERANY3, "NoAsthmaExer")
FreqTbl (HlthPlanFreq, noasthmaonly$HLTHPLN2, "NoAsthmaHlthPln")
FreqTbl (GenHlthFreq, noasthmaonly$GENHLTH2, "NoAsthmaGenHlth")
Means and Standard Deviations
mean(analytic$SLEPTIM2)
## [1] 7.115756
sd(analytic$SLEPTIM2)
## [1] 1.468601
#load package plyr
library(plyr)
## -------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## -------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
#example
ddply(analytic,~ALCGRP,summarise,mean=mean(SLEPTIM2),sd=sd(SLEPTIM2))
## ALCGRP mean sd
## 1 1 7.126103 1.593871
## 2 2 7.090259 1.375262
## 3 3 7.148669 1.312195
#USING MACROS
library(gtools)
SumTbl <- defmacro(OutputTable, GroupVar, CSVTable,
expr={
OutputTable <- ddply(analytic,~GroupVar,summarise,mean=mean(SLEPTIM2),sd=sd(SLEPTIM2));
write.csv(OutputTable, file = paste0(CSVTable, ".csv"))
})
SumTbl (AlcGrpSum, analytic$ALCGRP, "Alc")
SumTbl (AgeGrpSum, analytic$X_AGE_G, "Age")
SumTbl (SexSum, analytic$SEX, "Sex")
SumTbl (HispSum, analytic$X_HISPANC, "Hisp")
SumTbl (RaceSum, analytic$RACEGRP, "Race")
SumTbl (MaritalSum, analytic$MARGRP, "Mar")
SumTbl (EdSum, analytic$EDGROUP, "Ed")
SumTbl (IncSum, analytic$INCOME3, "Inc")
SumTbl (BMISum, analytic$BMICAT, "BMI")
SumTbl (SmokeSum, analytic$SMOKGRP, "Smok")
SumTbl (ExerSum, analytic$EXERANY3, "Exer")
SumTbl (HlthPlanSum, analytic$HLTHPLN2, "HlthPln")
SumTbl (GenHlthSum, analytic$GENHLTH2, "GenHlth")
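Group means and standard deviations can also be obtained with base R's `aggregate()`, which avoids the plyr and dplyr load-order conflict shown in the console messages above. This is a sketch on a hypothetical toy data frame; `toy` and `sum_tbl` are illustrative names.

```r
# Hypothetical stand-in for the analytic dataset.
toy <- data.frame(
  ALCGRP   = c(1, 1, 2, 2, 3, 3),
  SLEPTIM2 = c(6, 8, 7, 7, 5, 9)
)

# Mean and standard deviation of sleep time within each alcohol group.
sum_tbl <- aggregate(SLEPTIM2 ~ ALCGRP, data = toy,
                     FUN = function(v) c(mean = mean(v), sd = sd(v)))

# aggregate() stores the two statistics as one matrix column; flatten it
# into ordinary columns named SLEPTIM2.mean and SLEPTIM2.sd.
sum_tbl <- do.call(data.frame, sum_tbl)
sum_tbl
```

The flattened result is a regular data frame, so it can be passed straight to `write.csv()` in the same way as the ddply output.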
Weights example
WeightVarList <- c("X_STATE", "X_LLCPWT", "ASTHMA3")
BRFSS_weights <- BRFSS_a[WeightVarList]
colnames(BRFSS_weights)
## [1] "X_STATE" "X_LLCPWT" "ASTHMA3"
nrow(BRFSS_weights)
## [1] 464664
#use questionr package
library(questionr)
WeightedAsthma <- wtd.table(BRFSS_weights$ASTHMA3,
y=BRFSS_weights$X_STATE, weights = BRFSS_weights$X_LLCPWT, normwt = FALSE, na.rm = TRUE,
na.show = FALSE)
write.csv(WeightedAsthma, file = "WeightedAsthma.csv")
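A weighted cross-tabulation is just the sum of the weights in each cell, which base R's `xtabs()` can compute when the weight column is placed on the left of the formula. The sketch below uses a hypothetical five-row data frame with the same column names as above; it is meant as a way to see what a weighted table contains, not as a replacement for the questionr call.

```r
# Hypothetical respondents: state code, asthma answer (1 = yes, 2 = no), weight.
toy <- data.frame(
  X_STATE  = c(1, 1, 1, 2, 2),
  ASTHMA3  = c(1, 2, 2, 1, 2),
  X_LLCPWT = c(50, 150, 100, 30, 70)
)

# Sum of weights in each ASTHMA3 x X_STATE cell.
wtd <- xtabs(X_LLCPWT ~ ASTHMA3 + X_STATE, data = toy)
wtd

# Weighted share of each answer within a state (columns sum to 1).
wtd_share <- prop.table(wtd, margin = 2)
wtd_share
```

Dividing each cell by its column total turns the weighted counts into state-level weighted rates, which is the population-based figure that descriptive analyses aim for.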
Table 1 Chi-square tests
#load MASS library (chisq.test itself comes from the stats package that ships with R)
library(MASS)
#make table
AlcTbl = table(analytic$ASTHMA4, analytic$ALCGRP)
#run test
chisq.test(AlcTbl)
##
## Pearson's Chi-squared test
##
## data: AlcTbl
## X-squared = 58.823, df = 2, p-value = 1.686e-13
#make macro
library(gtools)
ChiTest <- defmacro(VarName, TblName, expr={
TblName = table(analytic$ASTHMA4, analytic$VarName);
chisq.test(TblName)})
ChiTest(ALCGRP, AlcTbl)
##
## Pearson's Chi-squared test
##
## data: AlcTbl
## X-squared = 58.823, df = 2, p-value = 1.686e-13
ChiTest(X_AGE_G, AgeTbl)
##
## Pearson's Chi-squared test
##
## data: AgeTbl
## X-squared = 54.193, df = 5, p-value = 1.913e-10
ChiTest(SEX, SexTbl)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: SexTbl
## X-squared = 250, df = 1, p-value < 2.2e-16
ChiTest(X_HISPANC, HispTbl)
##
## Pearson's Chi-squared test
##
## data: HispTbl
## X-squared = 4.7509, df = 2, p-value = 0.09297
ChiTest(RACEGRP, RaceTbl)
##
## Pearson's Chi-squared test
##
## data: RaceTbl
## X-squared = 97.668, df = 6, p-value < 2.2e-16
ChiTest(MARGRP, MarTbl)
##
## Pearson's Chi-squared test
##
## data: MarTbl
## X-squared = 51.822, df = 3, p-value = 3.269e-11
ChiTest(EDGROUP, EdTbl)
##
## Pearson's Chi-squared test
##
## data: EdTbl
## X-squared = 59.697, df = 4, p-value = 3.359e-12
ChiTest(INCOME3, IncTbl)
##
## Pearson's Chi-squared test
##
## data: IncTbl
## X-squared = 269.59, df = 8, p-value < 2.2e-16
ChiTest(BMICAT, BMITbl)
##
## Pearson's Chi-squared test
##
## data: BMITbl
## X-squared = 154.35, df = 4, p-value < 2.2e-16
ChiTest(SMOKGRP, SmokTbl)
##
## Pearson's Chi-squared test
##
## data: SmokTbl
## X-squared = 34.156, df = 2, p-value = 3.829e-08
ChiTest(EXERANY3, ExerTbl)
##
## Pearson's Chi-squared test
##
## data: ExerTbl
## X-squared = 116.25, df = 2, p-value < 2.2e-16
ChiTest(HLTHPLN2, HlthPlnTbl)
##
## Pearson's Chi-squared test
##
## data: HlthPlnTbl
## X-squared = 6.3515, df = 2, p-value = 0.04176
ChiTest(GENHLTH2, GenHlthTbl)
##
## Pearson's Chi-squared test
##
## data: GenHlthTbl
## X-squared = 929.84, df = 5, p-value < 2.2e-16
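To make the test statistic less of a black box, the first test in this section (ASTHMA4 by ALCGRP) can be reproduced by hand from the counts printed earlier. The sketch below computes the expected counts under independence, the Pearson statistic, and the p-value, then cross-checks against `chisq.test()`. The object names are illustrative.

```r
# Observed counts copied from the ASTHMA4 x ALCGRP table earlier in the document.
obs <- matrix(
  c(23498, 20749, 8541,
     2671,  1897,  775),
  nrow = 2, byrow = TRUE,
  dimnames = list(ASTHMA4 = c("0", "1"), ALCGRP = c("1", "2", "3"))
)

# Expected count in each cell under independence: row total * column total / N.
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected.
x2 <- sum((obs - expected)^2 / expected)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)
p  <- pchisq(x2, df = df, lower.tail = FALSE)
c(X2 = x2, df = df, p = p)

# Cross-check against the built-in test (no continuity correction for a 2 x 3 table).
res <- chisq.test(obs)
res
```

With samples this large, even small differences between groups yield very small p-values, so the cell proportions are worth reading alongside the test result.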
ANOVAS for Table 1
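#example ANOVA
#note: ALCGRP is read in as a number, so lm() fits one linear slope across the codes (a single ALCGRP coefficient); wrapping it in factor() would give a one-way ANOVA across groups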
AlcANOVA <- lm(formula = SLEPTIM2 ~ ALCGRP, data = analytic)
summary(AlcANOVA)
##
## Call:
## lm(formula = SLEPTIM2 ~ ALCGRP, data = analytic)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1173 -1.1149 -0.1149 0.8839 16.8851
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.113753 0.015596 456.129 <2e-16 ***
## ALCGRP 0.001171 0.008396 0.139 0.889
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.469 on 58129 degrees of freedom
## Multiple R-squared: 3.347e-07, Adjusted R-squared: -1.687e-05
## F-statistic: 0.01945 on 1 and 58129 DF, p-value: 0.8891
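The output above has a single ALCGRP coefficient because the grouping variable is numeric, so the model tests a linear trend across the codes rather than differences between group means. The following sketch, on hypothetical data, contrasts that with a factor-based fit; `toy`, `fit_trend`, and `fit_anova` are illustrative names.

```r
# Hypothetical hours of sleep for three alcohol groups (4 people each).
toy <- data.frame(
  ALCGRP   = rep(c(1, 2, 3), each = 4),
  SLEPTIM2 = c(6, 7, 8, 7,  8, 9, 8, 9,  6, 7, 6, 7)
)

# Numeric predictor: one slope, i.e. a linear trend across the codes 1, 2, 3.
fit_trend <- lm(SLEPTIM2 ~ ALCGRP, data = toy)

# Factor predictor: one mean per group, i.e. a one-way ANOVA.
fit_anova <- lm(SLEPTIM2 ~ factor(ALCGRP), data = toy)

anova(fit_trend)   # 1 degree of freedom for ALCGRP
anova(fit_anova)   # (number of groups - 1) = 2 degrees of freedom
```

The distinction matters most for unordered codes and for groups that include a 9 for unknown, where a straight-line trend across the code values has no natural interpretation.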
#make macro
library(gtools)
ANOVATest <- defmacro(VarName, TblName, expr={
TblName<- lm(formula = SLEPTIM2 ~ VarName, data = analytic);
summary(TblName)})
#call macro
ANOVATest (ALCGRP, AlcANOVA)
##
## Call:
## lm(formula = SLEPTIM2 ~ ALCGRP, data = analytic)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1173 -1.1149 -0.1149 0.8839 16.8851
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.113753 0.015596 456.129 <2e-16 ***
## ALCGRP 0.001171 0.008396 0.139 0.889
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.469 on 58129 degrees of freedom
## Multiple R-squared: 3.347e-07, Adjusted R-squared: -1.687e-05
## F-statistic: 0.01945 on 1 and 58129 DF, p-value: 0.8891
ANOVATest (X_AGE_G, AgeANOVA)
##
## Call:
## lm(formula = SLEPTIM2 ~ X_AGE_G, data = analytic)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.3304 -0.8282 -0.0793 0.6696 17.4229
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.823829 0.025087 232.15 <2e-16 ***
## X_AGE_G 0.251102 0.004737 53.01 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.434 on 58129 degrees of freedom
## Multiple R-squared: 0.04611, Adjusted R-squared: 0.0461
## F-statistic: 2810 on 1 and 58129 DF, p-value: < 2.2e-16
ANOVATest (X_HISPANC, HispANOVA)
##
## Call:
## lm(formula = SLEPTIM2 ~ X_HISPANC, data = analytic)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1151 -1.1151 -0.1151 0.8849 16.8849
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.074865 0.017791 397.667 <2e-16 ***
## X_HISPANC 0.020102 0.008217 2.446 0.0144 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.469 on 58129 degrees of freedom
## Multiple R-squared: 0.0001029, Adjusted R-squared: 8.573e-05
## F-statistic: 5.984 on 1 and 58129 DF, p-value: 0.01444
ANOVATest (RACEGRP, RaceANOVA)
##
## Call:
## lm(formula = SLEPTIM2 ~ RACEGRP, data = analytic)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1447 -1.1447 -0.1447 0.8553 16.9811
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.207593 0.008676 830.73 <2e-16 ***
## RACEGRP -0.062898 0.004239 -14.84 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.466 on 58129 degrees of freedom
## Multiple R-squared: 0.003773, Adjusted R-squared: 0.003756
## F-statistic: 220.1 on 1 and 58129 DF, p-value: < 2.2e-16
ANOVATest (MARGRP, MarANOVA)
##
## Call:
## lm(formula = SLEPTIM2 ~ MARGRP, data = analytic)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1312 -1.0962 -0.1312 0.8688 16.9389
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.166251 0.009925 722.058 < 2e-16 ***
## MARGRP -0.035043 0.005439 -6.443 1.18e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.468 on 58129 degrees of freedom
## Multiple R-squared: 0.0007136, Adjusted R-squared: 0.0006964
## F-statistic: 41.51 on 1 and 58129 DF, p-value: 1.181e-10
ANOVATest (EDGROUP, EdANOVA)
##
## Call:
## lm(formula = SLEPTIM2 ~ EDGROUP, data = analytic)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1284 -1.1026 -0.1155 0.8845 16.8974
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.076713 0.020434 346.323 <2e-16 ***
## EDGROUP 0.012928 0.006458 2.002 0.0453 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.469 on 58129 degrees of freedom
## Multiple R-squared: 6.893e-05, Adjusted R-squared: 5.172e-05
## F-statistic: 4.007 on 1 and 58129 DF, p-value: 0.04532
ANOVATest (INCOME3, IncANOVA)
##
## Call:
## lm(formula = SLEPTIM2 ~ INCOME3, data = analytic)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1601 -1.0937 -0.1103 0.8731 16.9229
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.010660 0.020018 350.218 < 2e-16 ***
## INCOME3 0.016604 0.003013 5.511 3.58e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.468 on 58129 degrees of freedom
## Multiple R-squared: 0.0005223, Adjusted R-squared: 0.0005051
## F-statistic: 30.37 on 1 and 58129 DF, p-value: 3.578e-08
ANOVATest (BMICAT, BMIANOVA)
##
## Call:
## lm(formula = SLEPTIM2 ~ BMICAT, data = analytic)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.2210 -1.0718 -0.1216 0.8784 16.8784
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.270774 0.019127 380.128 <2e-16 ***
## BMICAT -0.049735 0.005818 -8.549 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.468 on 58129 degrees of freedom
## Multiple R-squared: 0.001256, Adjusted R-squared: 0.001239
## F-statistic: 73.09 on 1 and 58129 DF, p-value: < 2.2e-16
ANOVATest (SMOKGRP, SmokANOVA)
##
## Call:
## lm(formula = SLEPTIM2 ~ SMOKGRP, data = analytic)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1311 -1.1311 -0.1311 0.8689 17.0067
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.855398 0.019435 352.74 <2e-16 ***
## SMOKGRP 0.137860 0.009774 14.11 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.466 on 58129 degrees of freedom
## Multiple R-squared: 0.003411, Adjusted R-squared: 0.003394
## F-statistic: 198.9 on 1 and 58129 DF, p-value: < 2.2e-16
ANOVATest (EXERANY3, ExerANOVA)
##
## Call:
## lm(formula = SLEPTIM2 ~ EXERANY3, data = analytic)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1284 -1.1115 -0.1115 0.8885 16.8885
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.09463 0.01486 477.468 <2e-16 ***
## EXERANY3 0.01686 0.01082 1.559 0.119
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.469 on 58129 degrees of freedom
## Multiple R-squared: 4.18e-05, Adjusted R-squared: 2.46e-05
## F-statistic: 2.43 on 1 and 58129 DF, p-value: 0.119
ANOVATest (HLTHPLN2, HlthPlnANOVA)
##
## Call:
## lm(formula = SLEPTIM2 ~ HLTHPLN2, data = analytic)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1191 -1.1191 -0.1191 0.8809 16.8809
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.17916 0.01629 440.606 < 2e-16 ***
## HLTHPLN2 -0.06003 0.01431 -4.195 2.73e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.468 on 58129 degrees of freedom
## Multiple R-squared: 0.0003027, Adjusted R-squared: 0.0002855
## F-statistic: 17.6 on 1 and 58129 DF, p-value: 2.73e-05
ANOVATest (GENHLTH2, GenHlthANOVA)
##
## Call:
## lm(formula = SLEPTIM2 ~ GENHLTH2, data = analytic)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1852 -1.1019 -0.1019 0.8981 16.9397
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.226787 0.015306 472.167 < 2e-16 ***
## GENHLTH2 -0.041631 0.005265 -7.907 2.69e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.468 on 58129 degrees of freedom
## Multiple R-squared: 0.001074, Adjusted R-squared: 0.001057
## F-statistic: 62.52 on 1 and 58129 DF, p-value: 2.689e-15
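The macro calls above print one full summary per variable. When only the overall p-value is needed for Table 1, an ordinary function can collect them in one pass. This is a sketch of an alternative, not the approach used in this analysis. The helper f_test_p and the simulated data frame toy are hypothetical.

```r
# Simulated stand-in for `analytic` (hypothetical values, not BRFSS)
set.seed(1)
toy <- data.frame(
  SLEPTIM2 = round(rnorm(200, mean = 7, sd = 1.5)),
  ALCGRP   = sample(1:3, 200, replace = TRUE),
  SMOKGRP  = sample(1:2, 200, replace = TRUE)
)

# Hypothetical helper: fit outcome ~ predictor and return the overall F-test p-value
f_test_p <- function(var_name, data, outcome = "SLEPTIM2") {
  fit <- lm(reformulate(var_name, response = outcome), data = data)
  f <- summary(fit)$fstatistic            # named vector: value, numdf, dendf
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

# One p-value per predictor, returned as a named numeric vector
predictors <- c("ALCGRP", "SMOKGRP")
p_values <- sapply(predictors, f_test_p, data = toy)
p_values
```

reformulate() builds the formula from a character string, so the variable names can be kept in a plain vector and extended without writing a new call for each one.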
t-tests for Table 1
t.test(analytic$SLEPTIM2~analytic$ASTHMA4)
##
## Welch Two Sample t-test
##
## data: analytic$SLEPTIM2 by analytic$ASTHMA4
## t = 5.8738, df = 6060, p-value = 4.485e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.09852347 0.19722952
## sample estimates:
## mean in group 0 mean in group 1
## 7.129348 6.981471
t.test(analytic$SLEPTIM2~analytic$SEX)
##
## Welch Two Sample t-test
##
## data: analytic$SLEPTIM2 by analytic$SEX
## t = 12.658, df = 6146.6, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.2342547 0.3201083
## sample estimates:
## mean in group 1 mean in group 2
## 7.140360 6.863178
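By default t.test() runs Welch's test, which does not assume equal variances in the two groups. That is why the degrees of freedom in the outputs above are not simply n - 2. The sketch below uses simulated data (the data frame toy and its values are hypothetical) to show how the pieces needed for a Table 1 row can be pulled out of the result, and how the pooled-variance version differs.

```r
# Simulated two-group data (hypothetical values, not BRFSS)
set.seed(7)
toy <- data.frame(
  ASTHMA4  = rep(c(0, 1), times = c(150, 50)),
  SLEPTIM2 = c(rnorm(150, mean = 7.1, sd = 1.4),
               rnorm(50,  mean = 6.9, sd = 1.8))
)

# Welch two-sample t-test (the default, var.equal = FALSE)
welch <- t.test(SLEPTIM2 ~ ASTHMA4, data = toy)

# Collect the values typically reported in a descriptive table
table1_row <- data.frame(
  mean_group0 = unname(welch$estimate[1]),
  mean_group1 = unname(welch$estimate[2]),
  t           = unname(welch$statistic),
  df          = unname(welch$parameter),
  p_value     = welch$p.value
)
table1_row

# Pooled-variance (Student) version for comparison: df is n - 2
pooled <- t.test(SLEPTIM2 ~ ASTHMA4, data = toy, var.equal = TRUE)
pooled$parameter
```

The Welch degrees of freedom never exceed the pooled value and shrink as the group sizes or variances become more unequal.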
The descriptive analysis of the BRFSS data is now complete. Descriptive results like these are a natural starting point for regression modeling, so the next part proceeds with linear regression for this analysis.