Author: Elhakim Ibrahim
Instructor: Corey Sparks, PhD1
March 30, 2020
The exercise is based on data for the State of Texas extracted from nationally representative 2016 Behavioral Risk Factor Surveillance System (BRFSS) SMART metro area survey data set were employed2.
setwd("C:/A/01/02S20/D283/H/04")
suppressPackageStartupMessages({
library(car)
library(VGAM)
library(dplyr)
library(gtools)
library(survey)
library(sjPlot)
library(sjmisc)
library(forcats)
library(stargazer)
library(questionr)
library(htmlTable)
library(sjlabelled)
})load(url("https://github.com/coreysparks/data/blob/master/brfss_2017.Rdata?raw=true"))
mydata <- brfss_17
rawnames <- names(mydata)
newnames <- tolower(gsub(pattern = "_", replacement = "", x = rawnames))
# above identifies and removes '_' from the variable names and changes all
# to lowercase
names(mydata) <- newnames
head(newnames, n = 48) [1] "dispcode" "statere1" "safetime" "hhadult" "genhlth" "physhlth"
[7] "menthlth" "poorhlth" "hlthpln1" "persdoc2" "medcost" "checkup1"
[13] "bphigh4" "bpmeds" "cholchk1" "toldhi2" "cholmed1" "cvdinfr4"
[19] "cvdcrhd4" "cvdstrk3" "asthma3" "asthnow" "chcscncr" "chcocncr"
[25] "chccopd1" "havarth3" "addepev2" "chckidny" "diabete3" "diabage2"
[31] "lmtjoin3" "arthdis2" "arthsocl" "joinpai1" "sex" "marital"
[37] "educa" "renthom1" "numhhol2" "numphon2" "cpdemo1a" "veteran3"
[43] "employ1" "children" "income2" "internet" "weight2" "height3"
Obesity status
Our outcome of interest is obesity status. Based on Centers for Disease Control and Prevention’s categorization, the outcome is operationalized as follows:
In the data, the bmi variable was computed as with implied 2 decimal places as shown above. Hence, the variable is divided by 100 to get the values that represent actual bmi’s.
The new bmi is then dummied as 0 = Not obese for bmi below 30 and 1 = Obese for bmi 30 and above
mydata$bmi <- mydata$bmi5/100
# define descriptive ordinal obesity status outcome variable,
mydata$obstatDES = Recode(mydata$bmi, recodes = "0:18.4999=NA;18.5:24.99999=\"Normal\";
25:29.99999=\"Overweight\";else=\"Obese\"",
as.factor = T)
summarytools::freq(mydata$obstatDES, round.digits = 1, cumul = T, report.nas = T)Frequencies
mydata$obstatDES
Type: Factor
Freq % Valid % Valid Cum. % Total % Total Cum.
---------------- -------- --------- -------------- --------- --------------
Normal 68577 30.2 30.2 29.7 29.7
Obese 82347 36.2 66.4 35.7 65.4
Overweight 76477 33.6 100.0 33.1 98.5
<NA> 3474 1.5 100.0
Total 230875 100.0 100.0 100.0 100.0
# define numeric ordinal obesity status outcome variable
mydata$obstatNUM = Recode(mydata$bmi, recodes = "0:18.4999=NA;18.5:24.99999=1;25:29.99999=2;else=3",
as.factor = F)
summarytools::freq(mydata$obstatNUM, round.digits = 1, cumul = T, report.nas = T)Frequencies
mydata$obstatNUM
Label: COMPUTED BODY MASS INDEX
Type: Numeric
Freq % Valid % Valid Cum. % Total % Total Cum.
----------- -------- --------- -------------- --------- --------------
1 68577 30.2 30.2 29.7 29.7
2 76477 33.6 63.8 33.1 62.8
3 82347 36.2 100.0 35.7 98.5
<NA> 3474 1.5 100.0
Total 230875 100.0 100.0 100.0 100.0
It is hypothesized that obesity status will not differ signifcantly by selected sociodemographic characteristics. To test the hypothesis, we would include age, race/ethnicity, education and income covariates in each of the empirical models.
Age group
mydata$agegrp <- cut(mydata$age80, breaks = c(0, 24, 39, 59, 79, 99))
# continuous age variable categorized
levels(mydata$agegrp)[1] "(0,24]" "(24,39]" "(39,59]" "(59,79]" "(79,99]"
Race/ethnic identity
[1] "1" "2" "3" "4" "5" "9"
mydata <- mydata %>% mutate(racegrp = factor(racegr3, levels = c(1, 2, 3, 4,
5), labels = c("NH White", "NH Black", "Other", "Other", "Hispanic")))
mydata$racegrp <- fct_relevel(mydata$racegrp, "NH White", "NH Black", "Hispanic")
levels(mydata$racegrp)[1] "NH White" "NH Black" "Hispanic" "Other"
Education attainment
mydata$edu <- Recode(mydata$educa, recodes = "1:2='Primary';3='Some HS';4='HS Grad';
5='Some Coll';6='Coll Grad';9=NA",
as.factor = T)
mydata$edu <- relevel(mydata$edu, "Coll Grad", "Primary", "Some HS", "HS Grad",
"Some Coll")
levels(mydata$edu)[1] "Coll Grad" "HS Grad" "Primary" "Some Coll" "Some HS"
Income group
mydata$income <- quantcut(mydata$incomg, q = seq(0, 1, by = 0.25), na.rm = TRUE)
# income factorized into quantile income group
mydata <- mydata %>% mutate(incomegrp = factor(income, levels = c("[1,3]", "(3,5)",
"5", "(5,9]"), labels = c("Quantile 1", "Quantile 2", "Quantile 3", "Quantile 4")))
levels(mydata$incomegrp)[1] "Quantile 1" "Quantile 2" "Quantile 3" "Quantile 4"
mydata <- mydata %>% select(obstatDES, obstatNUM, agegrp, racegrp, edu, incomegrp,
mmsawt, ststr)
# check number of observations with incomplete information
sum(!complete.cases(mydata))[1] 8878
# filter only observations with with complete records
mydata = mydata[complete.cases(mydata), ]
# recheck number of observations with incomplete information
sum(!complete.cases(mydata))[1] 0
I create a survey design object from complex survey parameters ids = PSU identifers, strata=strata identifiers, weights=case weights, data=data frame. In doing this, respondents with missing case weights are excluded from the analysis, while options(survey.lonely.psu = "adjust") was applied to facilitate calculation of within stratum variance for stratum with single PSU.
olr.fit <- svyolr(obstatDES ~ agegrp + racegrp + edu + incomegrp, design = design.object)
round(cbind(OR = exp(coef((olr.fit))), exp(confint((olr.fit)))), digits = 2) OR 2.5 % 97.5 %
agegrp(24,39] 1.86 1.72 2.01
agegrp(39,59] 2.40 2.23 2.58
agegrp(59,79] 2.53 2.35 2.72
agegrp(79,99] 1.71 1.54 1.91
racegrpNH Black 1.19 1.14 1.25
racegrpHispanic 1.32 1.25 1.39
racegrpOther 0.74 0.68 0.80
eduHS Grad 1.21 1.16 1.27
eduPrimary 1.15 1.05 1.27
eduSome Coll 1.15 1.10 1.19
eduSome HS 1.08 1.00 1.17
incomegrpQuantile 2 1.12 1.05 1.19
incomegrpQuantile 3 1.09 1.05 1.14
incomegrpQuantile 4 0.93 0.89 0.98
Normal|Obese 1.11 1.02 1.20
Obese|Overweight 5.20 5.13 5.27
Approach 1: Fit separate binary logit regression models and examine plots and coefficients of the models
Plot the coefficients
blr.fit1 <- svyglm(I(obstatNUM > 1) ~ agegrp + racegrp + edu + incomegrp, design = design.object,
family = "binomial")
blr.fit2 <- svyglm(I(obstatNUM > 2) ~ agegrp + racegrp + edu + incomegrp, design = design.object,
family = "binomial")
# define plot labels
labels = c("25-39", "40-59", "60-79", "80+", "NH Black", "Hispanic", "Other",
"Primary", "Some HS", "HS Grad", "Some Coll", "Quantile 2", "Quantile 3",
"Quantile 4")
plot(coef(blr.fit1)[-1], ylim = c(-3, 3), type = "l", xaxt = "n", ylab = "Beta",
xlab = "", main = c("Comparison of betas for proportional odds assumption"))
lines(coef(blr.fit2)[-1], col = 2, lty = 2)
axis(side = 1, at = 1:14, labels = F)
text(x = 1:14, y = -4, srt = 90, pos = 1, xpd = T, cex = 0.8, labels = labels)
legend("bottomright", col = c(1, 2), lty = c(1, 2), legend = c(">1", ">2"))
lines(coef(olr.fit)[c(-13:-26)], col = 4, lwd = 3)The figure shows that the assumption of proportionality is considerably violated. The deviation from porportional odds assumption is particulalrly striking per age and education patterns of the outcome.
Examine models odds ratios
# round(exp(cbind(coef(blr.fit1)[-1], coef(blr.fit2)[-1])),3)
round(cbind(OR = exp(coef((blr.fit1))), exp(confint((blr.fit1)))), digits = 2) OR 2.5 % 97.5 %
(Intercept) 0.61 0.56 0.66
agegrp(24,39] 2.40 2.23 2.59
agegrp(39,59] 3.56 3.31 3.83
agegrp(59,79] 3.34 3.10 3.59
agegrp(79,99] 1.54 1.39 1.70
racegrpNH Black 1.52 1.42 1.62
racegrpHispanic 1.53 1.42 1.64
racegrpOther 0.69 0.63 0.75
eduHS Grad 1.51 1.43 1.59
eduPrimary 1.81 1.55 2.10
eduSome Coll 1.38 1.32 1.45
eduSome HS 1.36 1.23 1.50
incomegrpQuantile 2 1.04 0.96 1.12
incomegrpQuantile 3 0.96 0.91 1.01
incomegrpQuantile 4 1.12 1.05 1.19
Approach 2: Fit comparative porportional odds, nonproportional odds and multinomial regression models and evaluate model fits
Fit porportional odds model
# fit proportional odds model
prop.fit <- vglm(as.ordered(obstatDES) ~ agegrp + racegrp + edu + incomegrp,
mydata, weights = mmsawt/mean(mmsawt, na.rm = T), family = cumulative(parallel = T,
reverse = T))
# round(cbind(OR=exp(coef((prop.fit))), exp(confint((prop.fit)))), digits=2)Fit nonporportional odds model
# fit nonproportional odds model
nprop.fit <- vglm(as.ordered(obstatDES) ~ agegrp + racegrp + edu + incomegrp,
mydata, weights = mmsawt/mean(mmsawt, na.rm = T), family = cumulative(parallel = F,
reverse = T))
# round(cbind(OR=exp(coef((nprop.fit))), exp(confint((nprop.fit)))),
# digits=2)Fit multinomial regression models
Compare AIC values of the models to determine the best fitting model
[1] 479964
[1] 469792
[1] 469754
The AIC values indicate that multinomial model, with the least value, is the best fitting model of the three models being evaluated
OR 2.5 % 97.5 %
(Intercept):1 0.26 0.24 0.27
(Intercept):2 0.36 0.34 0.37
agegrp(24,39]:1 2.87 2.77 2.98
agegrp(24,39]:2 1.99 1.92 2.07
agegrp(39,59]:1 4.29 4.13 4.45
agegrp(39,59]:2 2.93 2.82 3.04
agegrp(59,79]:1 3.63 3.49 3.77
agegrp(59,79]:2 3.04 2.93 3.16
agegrp(79,99]:1 1.20 1.13 1.28
agegrp(79,99]:2 1.86 1.76 1.97
racegrpNH Black:1 1.69 1.63 1.74
racegrpNH Black:2 1.34 1.29 1.39
racegrpHispanic:1 1.54 1.49 1.59
racegrpHispanic:2 1.51 1.46 1.56
racegrpOther:1 0.66 0.64 0.69
racegrpOther:2 0.71 0.69 0.74
eduHS Grad:1 1.74 1.68 1.79
eduHS Grad:2 1.32 1.28 1.36
eduPrimary:1 2.27 2.14 2.41
eduPrimary:2 1.39 1.30 1.48
eduSome Coll:1 1.59 1.55 1.64
eduSome Coll:2 1.21 1.18 1.25
eduSome HS:1 1.63 1.55 1.70
eduSome HS:2 1.12 1.07 1.17
incomegrpQuantile 2:1 0.94 0.90 0.97
incomegrpQuantile 2:2 1.16 1.11 1.20
incomegrpQuantile 3:1 0.82 0.80 0.85
incomegrpQuantile 3:2 1.11 1.08 1.14
incomegrpQuantile 4:1 1.34 1.29 1.38
incomegrpQuantile 4:2 0.89 0.86 0.92
The results indicating variations in obesity status among Texas residents are presented in the above table. The multinomial estimates compare factors predisposing individuals in Texas to risks of being overweight and obese relatice to normal weight. The results indicate curvilnear relationship between age and obesity status with the odds of being overweight (vs. normal) peaking at age 39-59 years and odds of being obese (vs. normal) at highest at age 59-79 years. Compared with NH Whites, the NH Blacks and the Hispanics were at signifcantly greater risks of abnormal weights (overweight and obese) versus normal weight, whereas those of Other racial/ethnic groups have significantly lower risk than the NH Whites. Attainment of level of education lower than completion of college graduation exposes an individual to greater risks of being overweight and obese (vs. normal weight). Lastly, those in second and third wealth quintile spectrum were less likely to be overweight but more likely to be obese compared to those in first wealth quintile category. Meanwhile, those in highest wealth quintile group were more likely to be overweight but less likely to be obese compared to those in first wealth quintile category.
This content adapts steps and examples from instructional materials authored by Dr. Corey Sparks and made available at https://rpubs.com/corey_sparks.
The 2016 BRFSS SMART metro area survey data are open source resources made available by the US Centers for Disease Control and Prevention at https://www.cdc.gov/brfss/smart/smart_2016.html.
Stargazer is authored by Marek Hlavac and full document can be downloaded from https://cran.r-project.org/web/packages/stargazer/stargazer.pdf