Primary Recoding
We make the following recodes.
.STATE: Not all states and territories are surveyed every year, which creates gaps. We therefore group the state FIPS codes into the nine Census divisions, plus one category for territories, so that each region is represented every year.
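The same grouping can be written as a lookup table, which keeps each division's FIPS codes in one place. This is a minimal sketch, not the function used below: `division_fips` and `fips_to_division()` are hypothetical names, the code lists are copied from the recoding function below, and, unlike that function, any code that appears in no list becomes `NA` instead of being left unchanged.

```r
# Hypothetical lookup table: Census division -> FIPS codes
# (lists copied from the recoding function below)
division_fips <- list(
  NEW.ENGLAND        = c(9, 23, 25, 33, 44, 50),
  EAST.NORTH.CENTRAL = c(17, 18, 26, 39, 55),
  EAST.SOUTH.CENTRAL = c(1, 21, 28, 47),
  MIDDLE.ATLANTIC    = c(34, 36, 42),
  MOUNTAIN           = c(8, 30, 32, 35, 49, 56, 4, 16),
  PACIFIC            = c(2, 6, 15, 41, 53),
  SOUTH.ATLANTIC     = c(10, 11, 12, 13, 24, 37, 45, 51, 54),
  WEST.NORTH.CENTRAL = c(19, 20, 27, 29, 31, 38, 46),
  WEST.SOUTH.CENTRAL = c(5, 22, 40, 48),
  TERRITORIES        = c(60, 3, 81, 7, 64, 14, 66, 84, 86, 67, 89, 68, 71,
                         76, 69, 70, 95, 43, 72, 74, 52, 78, 79)
)

# Hypothetical helper: map FIPS codes to a division factor; unlisted codes become NA
fips_to_division <- function(fips) {
  # Named character vector: names are FIPS codes, values are division labels
  lookup <- setNames(rep(names(division_fips), lengths(division_fips)),
                     as.character(unlist(division_fips)))
  factor(unname(lookup[as.character(fips)]))
}

fips_to_division(c(9, 48, 72))
```

Keeping the mapping as data rather than as ten separate assignments also makes it easy to check that no FIPS code is listed under two divisions.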
.RFBMI5: Adults who have a body mass index greater than 25.00 Values: 1=No, 2=Yes, 9=Don’t Know/Refused/Missing Recode: 0=No, 1=Yes or Don’t Know/Refused/Missing (modal imputation, 8% missing)
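Modal imputation replaces the missing code with the most common observed response. A minimal sketch, assuming the missing code is 9 (as for `.RFBMI5`) and that `NA` is treated the same way; `impute_mode()` is a hypothetical helper, not part of the recoding function below. Ties go to the smallest code, because `table()` sorts the codes and `which.max()` returns the first maximum.

```r
# Hypothetical helper: replace missing codes (and NA) with the modal observed value
impute_mode <- function(x, missing_codes = 9) {
  is_missing <- is.na(x) | x %in% missing_codes
  counts <- table(x[!is_missing])                   # frequency of each observed code
  x[is_missing] <- as.numeric(names(which.max(counts)))
  x
}

bmi <- c(1, 2, 2, 9, NA, 2, 1)                      # 1=No, 2=Yes, 9=Missing
overweight <- as.integer(impute_mode(bmi) == 2)     # 0=No, 1=Yes (including imputed)
overweight
```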
.AGE.G: Six-level imputed age category Values: 1=18 to 24, 2=25 to 34, 3=35 to 44, 4=45 to 54, 5=55 to 64, 6=65 or older
.IMPRACE: Imputed race/ethnicity value Values: 1=White Non-Hispanic, 2=Black Non-Hispanic, 3=Asian Non-Hispanic, 4=American Indian/Alaskan Native Non-Hispanic, 5=Hispanic, 6=Other Race, Non-Hispanic Recode: 0=Any Other Group, 1=White Non-Hispanic
HISPANIC: Derived from .IMPRACE (5=Hispanic) before .IMPRACE is recoded Recode: 0=Non-Hispanic, 1=Hispanic
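Because `HISPANIC` and the race indicator both come from the same source column, the Hispanic flag has to be built before `.IMPRACE` is overwritten. A minimal sketch on a hypothetical vector of `.IMPRACE` codes, building both indicators from the untouched values and coding `NA` as 0, as the recoding function below does:

```r
imprace <- c(1, 2, 5, 3, NA, 5, 6)   # hypothetical .IMPRACE codes

# Both flags are computed from the original codes, so their order does not matter
hispanic  <- as.integer(!is.na(imprace) & imprace == 5)  # 5=Hispanic
caucasian <- as.integer(!is.na(imprace) & imprace == 1)  # 1=White Non-Hispanic

data.frame(imprace, caucasian, hispanic)
```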
SEX: Indicate sex of respondent Values: 1=Male, 2=Female, 9=Refused (0.06%) Recode: 0=Not Known to Be Male, 1=Known to Be Male (modal imputation: Refused is coded 0)
MARITAL: Are you…? Values: 1=Married, 2=Divorced, 3=Widowed, 4=Separated, 5=Never Married, 6=Unmarried Couple, 9=Refused, BLANK=Not Asked or Missing Recode: 0=Did Not Identify as Married, 1=Self-Identified as Married
INCOME2: Is your annual household income from all sources: Values (LT=less than, GE=greater than or equal to): 1=LT 10K, 2=LT 15K, 3=LT 20K, 4=LT 25K, 5=LT 35K, 6=LT 50K, 7=LT 75K, 8=GE 75K, 77=Don’t Know/Not Sure, 99=Refused, Not Asked or Missing Recode: 0=Not Identified as 75K or More, 1=Identified as 75K or More
EDUCA: What is the highest grade or year of school you completed? Values: 1=Never Attended School or Only Kindergarten, 2=Grades 1-8, 3=Grades 9-11, 4=Grade 12 or GED, 5=College 1-3 Years, 6=College 4 or more (graduate), 9=Refused, BLANK=Not Asked/Missing Recode: 0=Not Identified as College Grad, 1=College Graduate
EMPLOY1: Are you currently…? Values: 1=Employed for Wages, 2=Self-Employed, 3=Out of Work for 1 Year or More, 4=Out of Work for Less Than 1 Year, 5=A Homemaker, 6=A Student, 7=Retired, 8=Unable to Work, 9=Refused, BLANK=Not Asked/Missing Recode: 0=Did Not Identify as Employed for Wages, 1=Identified as Employed for Wages
VETERAN3: Have you ever served on active duty in the United States Armed Forces, either in the regular military or in a National Guard or military reserve unit? Values: 1=Yes, 2=No, 7=Don’t Know/Not Sure, 9=Refused, BLANK=Not Asked/Missing Recode: 0=Not Yes, 1=Yes
The next several variables share the same recode.
CVDCRHD4: (Ever told) you had angina or coronary heart disease?
CVDSTRK3: (Ever told) you had a stroke?
CHCSCNCR: (Ever told) you had skin cancer?
CHCOCNCR: (Ever told) you had any other types of cancer?
CHCCOPD1: (Ever told) you have chronic obstructive pulmonary disease, C.O.P.D., emphysema or chronic bronchitis?
HAVARTH3: (Ever told) you have some form of arthritis, rheumatoid arthritis, gout, lupus, or fibromyalgia? (Arthritis diagnoses include: rheumatism, polymyalgia rheumatica; osteoarthritis (not osteoporosis); tendonitis, bursitis, bunion, tennis elbow; carpal tunnel syndrome, tarsal tunnel syndrome; joint infection, etc.)
CHCKIDNY: (Ever told) you have kidney disease? Do NOT include kidney stones, bladder infection or incontinence. (Incontinence is not being able to control urine flow.)
Values for all of the above: 1=Yes, 2=No, 7=Don’t Know/Not Sure, 9=Refused, BLANK=Not Asked/Missing Recode: 0=Not Yes, 1=Yes
DIABETE3: (Ever told) you have diabetes? (If ‘Yes’ and respondent is female, ask ‘Was this only when you were pregnant?’) Values: 1=Yes, 2=Yes, Gestational Only, 3=No, 4=No, Pre-Diabetes, 7=Don’t Know/Not Sure, 9=Refused, BLANK=Not Asked/Missing Recode: 0=Not Yes (including Gestational Only and Pre-Diabetes), 1=Yes
MENTHLTH: Now thinking about your mental health, which includes stress, depression, and problems with emotions, for how many days during the past 30 days was your mental health not good? Values: 1-30=Days, 88=None, 77=Don’t Know/Not Sure, 99=Refused, BLANK=Not Asked/Missing Recode: 0=Not 1 to 30 Days (None, Don’t Know, Refused or Missing), 1=1 to 30 Days
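Every binary recode above reduces to "1 if the response equals one target code, otherwise 0", with missing values coded 0. A table-driven sketch of that rule, assuming a data frame that still holds the raw BRFSS columns; the `targets` vector and the `to_indicator()` and `recode_binary()` helpers are hypothetical, with target codes taken from the recodes listed above:

```r
# Target code that maps to 1 for each variable; every other value, including NA, maps to 0
targets <- c(MARITAL = 1, INCOME2 = 8, EDUCA = 6, EMPLOY1 = 1, VETERAN3 = 1,
             CVDCRHD4 = 1, CVDSTRK3 = 1, CHCSCNCR = 1, CHCOCNCR = 1,
             CHCCOPD1 = 1, HAVARTH3 = 1, CHCKIDNY = 1, DIABETE3 = 1)

to_indicator <- function(x, target) as.integer(!is.na(x) & x == target)

# Recode only the target variables that are present in df
recode_binary <- function(df, targets) {
  for (v in intersect(names(targets), names(df))) {
    df[[v]] <- to_indicator(df[[v]], targets[[v]])
  }
  df
}

# Hypothetical raw responses for three respondents
raw <- data.frame(INCOME2 = c(8, 77, 3), EDUCA = c(6, 4, 9),
                  VETERAN3 = c(2, 1, 7), DIABETE3 = c(1, 2, NA))
recode_binary(raw, targets)
```

Keeping the target codes in one vector puts each recode on one line, in place of the pairs of subset assignments used in the function below.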
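The `MENTHLTH` recode can also be written as a single range check, which makes it explicit that 88 (None), 77, 99 and `NA` all become 0. A minimal sketch on hypothetical responses:

```r
menthlth <- c(5, 88, 30, 77, 99, NA, 1)   # hypothetical raw MENTHLTH values

# 1 = between 1 and 30 poor mental health days; 0 = None, Don't Know, Refused or Missing
poor_mental_health <- as.integer(!is.na(menthlth) & menthlth >= 1 & menthlth <= 30)
poor_mental_health
```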
#Pre-processing Function
myf=function(mydata, myvars){
# myvars: the 22 source columns, in the same order as the names assigned by colnames() below
mydata=mydata[myvars] #reduce data set
mydata$X.STATE[mydata$X.STATE %in% c(9, 23, 25, 33, 44, 50)==TRUE]="NEW.ENGLAND"
mydata$X.STATE[mydata$X.STATE %in% c(17, 18, 26, 39, 55)==TRUE]="EAST.NORTH.CENTRAL"
mydata$X.STATE[mydata$X.STATE %in% c(1, 21, 28, 47)==TRUE]="EAST.SOUTH.CENTRAL"
mydata$X.STATE[mydata$X.STATE %in% c(34, 36, 42)==TRUE]="MIDDLE.ATLANTIC"
mydata$X.STATE[mydata$X.STATE %in% c(8, 30, 32, 35, 49, 56, 4, 16)==TRUE]="MOUNTAIN"
mydata$X.STATE[mydata$X.STATE %in% c(2, 6, 15, 41, 53)==TRUE]="PACIFIC"
mydata$X.STATE[mydata$X.STATE %in% c(10, 11, 12, 13, 24, 37, 45, 51, 54)==TRUE]="SOUTH.ATLANTIC"
mydata$X.STATE[mydata$X.STATE %in% c(19, 20, 27, 29, 31, 38, 46)==TRUE]="WEST.NORTH.CENTRAL"
mydata$X.STATE[mydata$X.STATE %in% c(5, 22, 40, 48)==TRUE]="WEST.SOUTH.CENTRAL"
mydata$X.STATE[mydata$X.STATE %in% c(60, 3, 81, 7, 64, 14, 66, 84, 86,
67, 89, 68, 71, 76, 69, 70, 95, 43, 72, 74, 52, 78, 79)==TRUE]="TERRITORIES"
mydata$X.STATE=as.factor(mydata$X.STATE)
mydata$X.RFBMI5=ifelse(mydata$X.RFBMI5 %in% 1, 0, 1) #1=No -> 0; 2=Yes, 9 and NA -> 1 (modal imputation)
mydata$HISPANIC=mydata$X.IMPRACE #Create Hispanic Variable
mydata$HISPANIC[mydata$HISPANIC!=5]=0
mydata$HISPANIC[mydata$HISPANIC==5]=1
mydata$HISPANIC[is.na(mydata$HISPANIC)==TRUE]=0 #NA's
mydata$X.IMPRACE[mydata$X.IMPRACE!=1]=0 #White Non-Hispanic stays 1, all other groups = 0
mydata$X.IMPRACE[is.na(mydata$X.IMPRACE)==TRUE]=0
mydata$SEX[mydata$SEX!=1]=0 #Male=1, Others=0
mydata$SEX[is.na(mydata$SEX)==TRUE]=0
mydata$MARITAL[mydata$MARITAL!=1]=0 #2 or higher = 0
mydata$MARITAL[is.na(mydata$MARITAL)==TRUE]=0
mydata$INCOME2[mydata$INCOME2!=8]=0 #Below 75K, 77 and 99 = 0
mydata$INCOME2[mydata$INCOME2==8]=1 #GE 75K = 1
mydata$INCOME2[is.na(mydata$INCOME2)==TRUE]=0
mydata$EDUCA[mydata$EDUCA!=6]=0 #Not College Grad
mydata$EDUCA[mydata$EDUCA==6]=1 #College Grad
mydata$EDUCA[is.na(mydata$EDUCA)==TRUE]=0 #Missing = 0
mydata$EMPLOY1[mydata$EMPLOY1!=1]=0 #Not Employed for Wages
mydata$EMPLOY1[is.na(mydata$EMPLOY1)==TRUE]=0
mydata$VETERAN3[mydata$VETERAN3!=1]=0 #Not Veteran
mydata$VETERAN3[is.na(mydata$VETERAN3)==TRUE]=0
mydata$CVDCRHD4[mydata$CVDCRHD4!=1]=0 #Heart Disease
mydata$CVDCRHD4[is.na(mydata$CVDCRHD4)==TRUE]=0
mydata$CVDSTRK3[mydata$CVDSTRK3!=1]=0 #Stroke
mydata$CVDSTRK3[is.na(mydata$CVDSTRK3)==TRUE]=0
mydata$CHCSCNCR[mydata$CHCSCNCR!=1]=0 #Skin Cancer
mydata$CHCSCNCR[is.na(mydata$CHCSCNCR)==TRUE]=0
mydata$CHCOCNCR[mydata$CHCOCNCR!=1]=0 #other Cancer
mydata$CHCOCNCR[is.na(mydata$CHCOCNCR)==TRUE]=0
mydata$CHCCOPD1[mydata$CHCCOPD1!=1]=0 #COPD
mydata$CHCCOPD1[is.na(mydata$CHCCOPD1)==TRUE]=0
mydata$HAVARTH3[mydata$HAVARTH3!=1]=0 #Arthritis
mydata$HAVARTH3[is.na(mydata$HAVARTH3)==TRUE]=0
mydata$CHCKIDNY[mydata$CHCKIDNY!=1]=0 #Kidney
mydata$CHCKIDNY[is.na(mydata$CHCKIDNY)==TRUE]=0
mydata$DIABETE3[mydata$DIABETE3!=1]=0 #Diabetes
mydata$DIABETE3[is.na(mydata$DIABETE3)==TRUE]=0
mydata$MENTHLTH[mydata$MENTHLTH<=30]=1 #1-30 days = 1
mydata$MENTHLTH[mydata$MENTHLTH>30]=0 #88 (None), 77 (Don't Know), 99 (Refused) = 0
mydata$MENTHLTH[is.na(mydata$MENTHLTH)==TRUE]=0 #Missing = 0
colnames(mydata)=c( "Age", "Caucasian", "Male" ,"Married", "Income.75K",
"College.Graduate" ,"Employed.for.Wages", "Veteran","Overweight.Obese", "Heart.Disease",
"Stroke", "Skin.Cancer","Cancer" , "COPD" , "Arthritis",
"Depression", "Kidney.Disease" , "Diabetes","Stratum" , "Weights" ,
"Region", "Year", "Hispanic")
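mydata=mydata[, c(1:2, 23, 3:22)] #Put Hispanic next to race (Caucasian)
num.cols=setdiff(names(mydata), c("Region", "Year")) #keep Region as a factor
for (i in num.cols){mydata[[i]]=as.numeric(mydata[[i]])} # for checking descriptives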
return(mydata)
}
print("Variable Recoding Function Loaded....")
## [1] "Variable Recoding Function Loaded...."
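Since the numeric conversion at the end of the function is there for checking descriptives, a complementary check is that every recoded indicator takes only the values 0 and 1. A minimal sketch with a hypothetical helper and a toy data frame; in practice it would be run on the function's output, using the indicator names assigned there:

```r
# Hypothetical helper: names of columns holding anything other than 0 or 1 (NA included)
non_binary_columns <- function(df, cols) {
  ok <- vapply(cols, function(v) all(df[[v]] %in% c(0, 1)), logical(1))
  cols[!ok]
}

toy <- data.frame(Male = c(0, 1, 1), Married = c(1, 0, 9), Diabetes = c(0, 0, 1))
non_binary_columns(toy, c("Male", "Married", "Diabetes"))  # flags "Married"
colMeans(toy[c("Male", "Diabetes")])                        # share coded 1 in each indicator
```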