Author: Elhakim Ibrahim
Instructor: Corey Sparks, PhD1
February 24, 2020
This exercise seeks to evaluate if risk of obesity in the United States differ substantially by socioeconomic characteristics (age, race/ethinicity and income). Data for the State of Texas extracted from nationally representative 2016 Behavioral Risk Factor Surveillance System (BRFSS) SMART metro area survey data set were employed2. Comparison by the hypothesized characteristics will be based on odds ratios from logistic regression models. In addition, predicted values will be fitted to facilitate more intuitive understanding of the logistic regression model estimates.
setwd("C:/A/01/02S20/D283/H/03")
suppressPackageStartupMessages({
library(car)
library(dplyr)
library(gtools)
library(survey)
library(sjPlot)
library(sjmisc)
library(forcats)
library(stargazer)
library(questionr)
library(htmlTable)
library(kableExtra)
library(sjlabelled)
options(scipen = 5)
})load(url("https://github.com/coreysparks/data/blob/master/brfss_2017.Rdata?raw=true"))
mydata <- brfss_17
rawnames <- names(mydata)
newnames <- tolower(gsub(pattern = "_", replacement = "", x = rawnames))
# above identifies and removes '_' from the variable names and changes all
# to lowercase
names(mydata) <- newnames
head(newnames, n = 48) [1] "dispcode" "statere1" "safetime" "hhadult" "genhlth" "physhlth"
[7] "menthlth" "poorhlth" "hlthpln1" "persdoc2" "medcost" "checkup1"
[13] "bphigh4" "bpmeds" "cholchk1" "toldhi2" "cholmed1" "cvdinfr4"
[19] "cvdcrhd4" "cvdstrk3" "asthma3" "asthnow" "chcscncr" "chcocncr"
[25] "chccopd1" "havarth3" "addepev2" "chckidny" "diabete3" "diabage2"
[31] "lmtjoin3" "arthdis2" "arthsocl" "joinpai1" "sex" "marital"
[37] "educa" "renthom1" "numhhol2" "numphon2" "cpdemo1a" "veteran3"
[43] "employ1" "children" "income2" "internet" "weight2" "height3"
Obesity status
In the data, the bmi variable was computed as with implied 2 decimal places as shown above. Hence, the variable is divided by 100 to get the values that represent actual bmi’s. The new bmi is then dummied as 0 = Not obese for bmi below 30 and 1 = Obese for bmi 30 and above
mydata$bmi <- mydata$bmi5/100
# print(mydata$bmi)
mydata$bmigrp <- ifelse(mydata$bmi >= 30, 1, 0)
mydata$bmigrp <- as.factor(ifelse(mydata$bmigrp == 1, "Obese", "Not obese"))
# continuous bmi factorizedAge group
mydata$agegrp <- cut(mydata$age80, breaks = c(0, 24, 39, 59, 79, 99))
# continuous age variable categorized
levels(mydata$agegrp)[1] "(0,24]" "(24,39]" "(39,59]" "(59,79]" "(79,99]"
Race/ethnic identity
[1] "1" "2" "3" "4" "5" "9"
mydata <- mydata %>% mutate(racegrp = factor(racegr3, levels = c(1, 2, 3, 4,
5), labels = c("Non-Hispanic White", "Non-Hispanic Black", "Other", "Other",
"Hispanic")))
mydata$racegrp <- fct_relevel(mydata$racegrp, "Non-Hispanic White", "Non-Hispanic Black",
"Hispanic")
levels(mydata$racegrp)[1] "Non-Hispanic White" "Non-Hispanic Black" "Hispanic"
[4] "Other"
Income group
mydata$income <- quantcut(mydata$incomg, q = seq(0, 1, by = 0.25), na.rm = TRUE)
# income factorized into quantile income group
mydata <- mydata %>% mutate(incomegrp = factor(income, levels = c("[1,3]", "(3,5)",
"5", "(5,9]"), labels = c("Quantile 1", "Quantile 2", "Quantile 3", "Quantile 4")))
levels(mydata$incomegrp)[1] "Quantile 1" "Quantile 2" "Quantile 3" "Quantile 4"
I create a survey design object from complex survey parameters ids = PSU identifers, strata=strata identifiers, weights=case weights, data=data frame. In doing this, respondents with missing case weights are excluded from the analysis, while options(survey.lonely.psu = "adjust") was applied to facilitate calculation of within stratum variance for stratum with single PSU.
mydata$tx <- NA
mydata$tx[grep(pattern = "TX", mydata$mmsaname)] <- 1
# above extracts only cities in the State of Texas by looking for 'TX' in
# the MSA's name field in the mydata
mydata <- mydata %>% filter(tx == 1, is.na(mmsawt) == F)
options(survey.lonely.psu = "adjust")
design.object <- svydesign(ids = ~1, strata = ~ststr, weights = ~mmsawt, data = mydata)logit.model <- svyglm(bmigrp ~ agegrp + racegrp + incomegrp, design = design.object,
family = binomial)
summary(logit.model)
Call:
svyglm(formula = bmigrp ~ agegrp + racegrp + incomegrp, design = design.object,
family = binomial)
Survey design:
svydesign(ids = ~1, strata = ~ststr, weights = ~mmsawt, data = mydata)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.9675 0.2561 -7.682 1.77e-14 ***
agegrp(24,39] 1.1495 0.2363 4.866 1.16e-06 ***
agegrp(39,59] 1.6848 0.2342 7.194 6.90e-13 ***
agegrp(59,79] 1.3892 0.2438 5.699 1.25e-08 ***
agegrp(79,99] 0.4582 0.3891 1.178 0.2389
racegrpNon-Hispanic Black 0.2261 0.1792 1.262 0.2070
racegrpHispanic 0.2699 0.1310 2.060 0.0394 *
racegrpOther -0.4718 0.2594 -1.819 0.0690 .
incomegrpQuantile 2 -0.3750 0.1812 -2.070 0.0385 *
incomegrpQuantile 3 -0.2890 0.1326 -2.179 0.0293 *
incomegrpQuantile 4 -0.1285 0.1885 -0.682 0.4955
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 0.9723225)
Number of Fisher Scoring iterations: 4
In the above table, the summary function is applied to generate parameter coefficients with accompanying s.e., t-statistic, and p-value for each. But, we are equally interested in relative size of difference in gradients across categories compared with the reference category, not only in the direction of difference in the gradients. Hence, same results are represented below in odds ratios and accompanying confidence intervals using stargazer3 package.
# first set the rule for coefficients exponentiation
myexp <- function(x) {
exp(x)
}
stargazer(logit.model, ci = T, ci.level = 0.95, ci.separator = ",", digits = 2,
font.size = "normalsize", dep.var.caption = c("Outcome: Obesity likelihood"),
dep.var.labels.include = F, intercept.bottom = F, align = F, type = "text",
single.row = T, no.space = T, column.sep.width = "4pt", p.auto = F, t.auto = F,
covariate.labels = c("Intercept", "25-39", "40-59", "60-79", "80+", "NH Black",
"Hispanic", "Other", "Quantile 2", "Quantile 3", "Quantile 4"), column.labels = c("Odds ratios (25% CI,95% CI)"),
star.cutoffs = c(0.05, 0.01, 0.001), object.names = F, model.numbers = F,
apply.coef = myexp, report = "vcs*", title = "Odds ratios of Obesity by selected characteristics, BRFSS 2016",
notes = "Variables selected for exercise purpose only", notes.align = "l",
notes.label = "Notes:", keep.stat = c("n", "rsq", "ll", "aic", "bic", "adj.rsq"),
multicolumn = T, out = "tab002.htm")
Odds ratios of Obesity by selected characteristics, BRFSS 2016
==============================================================
Outcome: Obesity likelihood
--------------------------------------------
Odds ratios (25% CI,95% CI)
--------------------------------------------------------------
Intercept 0.14 (-0.36,0.64)***
25-39 3.16 (2.69,3.62)***
40-59 5.39 (4.93,5.85)***
60-79 4.01 (3.53,4.49)***
80+ 1.58 (0.82,2.34)
NH Black 1.25 (0.90,1.60)
Hispanic 1.31 (1.05,1.57)*
Other 0.62 (0.12,1.13)
Quantile 2 0.69 (0.33,1.04)*
Quantile 3 0.75 (0.49,1.01)*
Quantile 4 0.88 (0.51,1.25)
--------------------------------------------------------------
Observations 7,750
Log Likelihood -4,265.40
Akaike Inf. Crit. 8,552.80
==============================================================
Notes: *p<0.05; **p<0.01; ***p<0.001
Variables selected for exercise purpose only
The odds ratios indicating variations in likelihood of obesity among Texas residents are presented in the above table. Considering differentials by age groups, the model suggests a curvilinear relation between age and risk of obesity with the highest odds found among population aged 39-59 years (OR: 5.39; CI: 3.41-8.53), followed by those aged 59-79 years (OR: 4.01; CI: 2.49-6.47) compared with those aged less than 24 years. Also, income appears to be a siginficant factor: compared with those in lower 25% of income ladder, obesity propensity is signifcantly lower by approximately 31% and 25% for those in upper 50% and 75% of income ladder. Meanwhile, the likelihood of being obese is 1.31 times higher (p<0.05) for an average Hispanic relative to a Non-Hispanic White. Similar risk gradient is exhibited by an average Non-Hispanic Black, but the probability is falls outside significant thresholds.
mydata2 <- expand.grid(agegrp = levels(mydata$agegrp), racegrp = levels(mydata$racegrp),
incomegrp = levels(mydata$incomegrp))
fitted <- predict(logit.model, newdata = mydata2, type = "response")
mydata2$fitted <- round(fitted, 3) * 100mydata2[which(mydata2$racegrp == "Non-Hispanic White" & mydata2$agegrp == "(24,39]" &
mydata2$incomegrp == "Quantile 1"), ]mydata2[which(mydata2$racegrp == "Hispanic" & mydata2$agegrp == "(24,39]" &
mydata2$incomegrp == "Quantile 1"), ]mydata2[which(mydata2$racegrp == "Non-Hispanic White" & mydata2$agegrp == "(24,39]" &
mydata2$incomegrp == "Quantile 4"), ]mydata2[which(mydata2$racegrp == "Hispanic" & mydata2$agegrp == "(24,39]" &
mydata2$incomegrp == "Quantile 4"), ]Attemp is made to understand inherent variations in likelihood of obesity within the diverse groups of Texas population. Obesity risk disparities are examined between a hypothetical Non-Hispanic White and Hispanic person by specific age category and income status. As shown in the tables, a low-income (income quantile 1, lower 25%), middle-aged (24-39 years) Non-Hispanic White is averagely 6% point less likely to be obese than a Hispanic of similar income and age characteristics (i.e. estimated probability of being obese: Non-Hispanic White, 30.6% vs. Hispanic, 36.6%). Further, almost equal difference is observed between the two hypothetical individuals (in favor of the Non-Hispanic White) in the same age group but of highest income status (estimated probability of being obese: Non-Hispanic White, 20.0% vs. Hispanic, 33.7%).
This content adapts steps and examples from instructional materials authored by Dr. Corey Sparks and made available at https://rpubs.com/corey_sparks.
The 2016 BRFSS SMART metro area survey data are open source resources made available by the US Centers for Disease Control and Prevention at https://www.cdc.gov/brfss/smart/smart_2016.html.
Stargazer is authored by Marek Hlavac and full document can be downloaded from https://cran.r-project.org/web/packages/stargazer/stargazer.pdf