April 24, 2015

Presentation Structure

  • Introduction to Mortality Disparities in the US
  • The role of residential segregation
  • Research Questions
  • Data
  • Methods - The INLA approach to Bayesian analysis
  • Results & visualizations
  • Wrap up

Introduction

  • Black-White disparities in mortality rates persist
  • Most research focuses on individual level factors
    • SES, Health behaviors
  • More recent work is multilevel
    • Context of health, neighborhood conditions
  • Role of residential segregation on aggregate mortality rates still poorly understood

Segregation & Mortality

  • Williams and Collins (2001) offer one of the first conceptual pieces to link segregation to poor health.
  • Segregation spatially and socially patterns:
    • Poverty
    • Economic and educational opportunities
    • Social order or disorder
    • Access to resources
  • Segregation could lead to better health outcomes (political representation, social support, cohesion)

Research Questions

  • Does the effect of segregation produce the same disparity in black and white mortality rates over time?
  • Do counties with persistently high segregation show the same mortality disadvantage for both black and white mortality rates?
  • Does segregation have any protective advantage on county-level mortality rates?
    • For black mortality specifically

Data

  • NCHS Compressed Mortality File
    • County-level counts of deaths by year, age, sex, race/ethnicity and cause of death
    • 1980 to 2010
    • Age, sex and race (white & black) specific rates for all US counties

    • In total: 35,748,276 deaths in the data
    • Standardized to 2000 Standard US population age structure
    • Rates stratified by race and sex for each county by year
    • n = 2 sexes * 2 races * 3,106 counties * 31 years = 385,144 observations
    • Analytic n = 315,808 nonzero rates
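
Direct age standardization, as described above, can be sketched in a few lines of R. The age groups, death counts, populations and weights below are made up for illustration (the real 2000 US standard uses finer age groups), and the per-1,000 scaling is an assumption.

```r
# Direct age standardization: weight age-specific rates by a standard population.
# All inputs are hypothetical; they are NOT taken from the Compressed Mortality File.
direct_std_rate <- function(deaths, population, std_weights, per = 1000) {
  stopifnot(length(deaths) == length(population),
            length(deaths) == length(std_weights),
            abs(sum(std_weights) - 1) < 1e-8)
  age_specific <- deaths / population      # age-specific death rates
  sum(std_weights * age_specific) * per    # weighted sum, scaled
}

deaths      <- c(5, 20, 150)           # ages 0-24, 25-64, 65+ (made up)
population  <- c(40000, 50000, 10000)  # population at risk (made up)
std_weights <- c(0.35, 0.52, 0.13)     # hypothetical standard age shares

std_rate   <- direct_std_rate(deaths, population, std_weights)
crude_rate <- sum(deaths) / sum(population) * 1000

# Number of race-by-sex-by-county-by-year cells quoted on the slide
n_cells <- 2 * 2 * 3106 * 31
```

The standardized rate differs from the crude rate because the hypothetical county is younger than the hypothetical standard population.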

Data - Access

  • These data are available from the CDC WONDER website
  • CDC WONDER suppresses counts where the number of deaths is less than 10
  • Rates are labeled as "unreliable" when the rate is calculated with a numerator of 20 or fewer
    • Big problem for small population counties
    • Still a problem for large population counties!
  • Restricted use data allows access to ALL data
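
To see how these rules bite, here is a minimal sketch that flags cells using the two thresholds stated above. The function name `flag_wonder_cell` and the example counts are hypothetical.

```r
# Flag death counts using the thresholds quoted on this slide:
#   fewer than 10 deaths -> count is suppressed
#   20 or fewer deaths   -> rate is labeled "unreliable"
# `flag_wonder_cell` is an illustrative helper, not a CDC WONDER function.
flag_wonder_cell <- function(deaths) {
  ifelse(deaths < 10, "suppressed",
         ifelse(deaths <= 20, "unreliable", "ok"))
}

example_deaths <- c(3, 10, 20, 21, 150)  # made-up cell counts
flags <- flag_wonder_cell(example_deaths)
```

Because the data are stratified by race, sex, county and year, even populous counties can have cells that fall under these thresholds.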

Data - Example

Bexar County, TX 1980 - 1982

##  cofips year     race_sex mortality
##   48029 1980 White Female       7.9
##   48029 1980 Black Female      10.0
##   48029 1980   White Male      13.5
##   48029 1980   Black Male      17.2
##   48029 1981 White Female       8.2
##   48029 1981 Black Female       9.8
##   48029 1981   White Male      12.9
##   48029 1981   Black Male      15.4
##   48029 1982 White Female       7.6
##   48029 1982 Black Female      10.1
##   48029 1982   White Male      12.9
##   48029 1982   Black Male      16.7

Data - Example

Bexar County, TX Temporal Trends 1980 - 2010

Data - Example

Spatial Distribution of White & Black Mortality in TX: 1980-2010 Average

Methods - Hierarchical Model

  • I specify a Bayesian hierarchical model for the standardized mortality rate: \[ \begin{aligned} Y_{ij} & \sim N(\mu_{ij}, \sigma_y^2) \\ \mu_{ij} & = \beta_{0} + x'\beta + v_j + u_j + \tau_t \\ v_j & \sim N(0, \sigma_v^2) \\ u_j & \sim \text{CAR}(\bar{u}_j, \sigma_u^2/n_j) \\ \tau_t & \sim N(0, \sigma_{\tau}^2) \end{aligned} \]
  • We assume vague conjugate gamma priors for all the precisions \(1/\sigma^2\)
  • We assume vague Normal priors for all the fixed effect \(\beta\)'s
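
The CAR term says that, given its neighbors, each county's spatial effect is centered on the neighbors' average \(\bar u_j\) with variance \(\sigma_u^2/n_j\), where \(n_j\) is the number of neighbors. A toy sketch with four hypothetical counties in a row (the adjacency matrix, effects and variance are all made up):

```r
# Toy adjacency for 4 counties arranged in a line: 1 - 2 - 3 - 4 (hypothetical)
W <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

u        <- c(0.2, -0.1, 0.4, 0.0)  # made-up spatial effects
sigma_u2 <- 0.5                     # made-up CAR variance parameter

n_j      <- rowSums(W)              # number of neighbors of each county
u_bar    <- as.vector(W %*% u) / n_j  # neighbor average: conditional mean
cond_var <- sigma_u2 / n_j            # conditional variance

car_conditionals <- data.frame(county = 1:4, n_j, u_bar, cond_var)
```

Counties with more neighbors get a tighter conditional distribution, which is how the prior borrows strength across space.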

Methods - Bayesian analysis

  • This type of model is commonly used in epidemiology and public health
  • Various types of data likelihoods may be used
  • Need to get at: \(p(\theta|y) \propto p(y|\theta)p(\theta)\)
  • Traditionally, we would get \(p(\theta|y)\) by either:
    • Deriving the full conditionals for all our model parameters (hard)
    • Using some form of MCMC to arrive at the posterior marginal distributions for our parameters (time consuming)
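
For a single parameter, the proportionality \(p(\theta|y) \propto p(y|\theta)p(\theta)\) can be evaluated directly on a grid. The sketch below does this for a Normal mean and checks it against the conjugate closed form. The three observations reuse the Bexar County White Female rates shown earlier purely as illustrative numbers; the known standard deviation and the prior are assumptions.

```r
y     <- c(7.9, 8.2, 7.6)  # illustrative values only
sigma <- 0.5               # assumed known observation sd
m0    <- 10                # assumed prior mean
s0    <- 5                 # assumed prior sd (vague)

step  <- 0.001
theta <- seq(0, 20, by = step)

# log p(y | theta) + log p(theta), evaluated at every grid point
log_lik  <- sapply(theta, function(th) sum(dnorm(y, mean = th, sd = sigma, log = TRUE)))
log_post <- log_lik + dnorm(theta, mean = m0, sd = s0, log = TRUE)

# Normalize so the grid density integrates to 1
post <- exp(log_post - max(log_post))
post <- post / sum(post * step)
grid_mean <- sum(theta * post * step)

# Conjugate Normal-Normal posterior mean for comparison
post_prec  <- 1 / s0^2 + length(y) / sigma^2
exact_mean <- (m0 / s0^2 + sum(y) / sigma^2) / post_prec
```

The grid approach stops being practical once the parameter vector has more than a handful of dimensions, which is what motivates MCMC and approximations such as INLA.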

Methods - INLA approach

  • Integrated Nested Laplace Approximation - Rue, Martino & Chopin (2009)
  • One of several techniques that approximate the marginal and conditional posterior densities
    • Laplace, PQL, E-M, Variational Bayes
  • Assumes all random effects in the model form a latent, zero-mean Gaussian random field \(x\) with some precision matrix
    • The precision matrix depends on a small set of hyperparameters
  • Attempts to construct a joint Gaussian approximation for \(p(x | \theta, y)\)
    • where \(\theta\) is a small set of hyperparameters
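
The idea of a Gaussian approximation to \(p(x | \theta, y)\) can be illustrated with a single latent value. This is a one-dimensional sketch, not INLA's implementation: it assumes Poisson counts with log-mean \(x\), a \(N(0, 1/\tau)\) prior with \(\tau\) held fixed, finds the posterior mode by Newton's method, and uses the curvature there as the precision of the approximating Gaussian. The data and \(\tau\) are made up.

```r
y   <- c(3, 5, 4, 6, 2)  # made-up counts, y_k ~ Poisson(exp(x))
tau <- 1                 # assumed fixed prior precision of x

# log p(x | tau, y) up to an additive constant, with first and second derivatives
log_post <- function(x) sum(y) * x - length(y) * exp(x) - 0.5 * tau * x^2
grad     <- function(x) sum(y) - length(y) * exp(x) - tau * x
neg_hess <- function(x) length(y) * exp(x) + tau

# Newton iterations to locate the mode
x_mode <- 0
for (iter in 1:50) {
  newton_step <- grad(x_mode) / neg_hess(x_mode)
  x_mode <- x_mode + newton_step
  if (abs(newton_step) < 1e-12) break
}

# Gaussian approximation: N(x_mode, 1 / neg_hess(x_mode))
approx_sd <- 1 / sqrt(neg_hess(x_mode))

# Brute-force reference on a grid, to see how close the approximation is
step   <- 0.001
x_grid <- seq(-2, 4, by = step)
dens   <- exp(log_post(x_grid) - log_post(x_mode))
dens   <- dens / sum(dens * step)
exact_mean <- sum(x_grid * dens * step)
```

The small gap between `x_mode` and `exact_mean` reflects skewness that a plain Gaussian approximation cannot capture.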

Methods - INLA approach

  • Apply these approximations to arrive at:
  • \(\tilde{\pi}(x_i | y) = \int \tilde{\pi}(x_i |\theta, y)\tilde{\pi}(\theta| y) d\theta\)

  • \(\tilde{\pi}(\theta_j | y) = \int \tilde{\pi}(\theta| y) d\theta_{-j}\)

  • where each \(\tilde{\pi}(. |.)\) is an approximated conditional density of its parameters

  • The approximation to \(\pi(x_i | y)\) is built by approximating both \(\pi(\theta| y)\) and \(\pi(x_i| \theta, y)\), then using numerical integration to integrate out the nuisance parameters \(\theta\).
    • This is possible if the dimension of \(\theta\) is small.
  • The approximation \(\tilde{\pi}(\theta|y)\) is a Laplace approximation to the marginal posterior of \(\theta\), obtained from the joint density \(\pi(x,\theta|y)\)
  • Their approach relies on numerical integration of the posterior of the latent field, as opposed to a pure Gaussian approximation of it
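
The nested structure can be sketched on a toy model where every piece is available in closed form. Assumptions, all mine: \(y_k \sim N(x, 1)\), one latent \(x \sim N(0, 1/\theta)\), and a Gamma(1, 0.1) prior on the precision \(\theta\). Here \(\pi(x | \theta, y)\) is exactly Gaussian, so the Laplace step is exact, and the marginal of \(x\) is a weighted sum over a grid of \(\theta\) values.

```r
y <- c(1.2, 0.7, 1.9, 1.4)  # made-up data
n <- length(y)
a <- 1; b <- 0.1            # hypothetical gamma prior (shape, rate) on theta

# log pi(theta | y) up to a constant, using the identity
#   pi(theta | y)  proportional to  p(y | x, theta) p(x | theta) p(theta) / p(x | theta, y)
# evaluated at the conditional mode x_star(theta)
log_post_theta <- function(theta) {
  cond_prec <- n + theta
  x_star <- sum(y) / cond_prec
  sum(dnorm(y, mean = x_star, sd = 1, log = TRUE)) +
    dnorm(x_star, mean = 0, sd = 1 / sqrt(theta), log = TRUE) +
    dgamma(theta, shape = a, rate = b, log = TRUE) -
    dnorm(x_star, mean = x_star, sd = 1 / sqrt(cond_prec), log = TRUE)
}

# Step 1: explore pi(theta | y) on a grid and normalize
d_theta    <- 0.01
theta_grid <- seq(0.01, 60, by = d_theta)
w <- exp(sapply(theta_grid, log_post_theta))
w <- w / sum(w * d_theta)

# Step 2: integrate theta out of pi(x | theta, y) numerically
d_x    <- 0.01
x_grid <- seq(-2, 4, by = d_x)
cond_mean <- sum(y) / (n + theta_grid)
cond_sd   <- 1 / sqrt(n + theta_grid)
marg_x <- sapply(x_grid, function(x) {
  sum(dnorm(x, mean = cond_mean, sd = cond_sd) * w * d_theta)
})
```

With many latent effects and non-Gaussian likelihoods the conditional is no longer exact, but the two-step pattern (grid over \(\theta\), then a weighted sum) is the same.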

INLA in R

library(INLA)

Unstructured Model

mod1 <- std_rate ~ male + black + lths + gini + pershigdis + black*pershigdis + f(year, model = "iid") + f(conum, model = "iid")

fit1 <- inla(mod1, data = sdadata2, family = "gaussian", num.threads = 2)

Spatially structured BYM model

mod2 <- std_rate ~ male + black + lths + gini + pershigdis + black*pershigdis + f(conum, model = "bym", graph = "usagraph.gra") + f(year, model = "iid")

fit2 <- inla(mod2, data = sdadata2, family = "gaussian", num.threads = 2)
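
The BYM term reads county adjacency from "usagraph.gra"; how that file was built is not shown here. As a rough sketch of the plain-text graph layout INLA accepts, as I understand it (first line: number of nodes; then one line per node with its id, its number of neighbors, and the neighbor ids), the base-R helper below writes a toy four-county graph. The helper name `write_inla_graph` and the adjacency matrix are hypothetical; in practice a function such as `spdep::nb2INLA()` is commonly used instead.

```r
# Write a 0/1 adjacency matrix in a node / neighbor-count / neighbors text layout.
# `write_inla_graph` is an illustrative helper, not part of the INLA package.
write_inla_graph <- function(adj, file) {
  stopifnot(is.matrix(adj), nrow(adj) == ncol(adj))
  n <- nrow(adj)
  node_lines <- vapply(seq_len(n), function(j) {
    nbrs <- which(adj[j, ] != 0 & seq_len(n) != j)  # neighbors, excluding self
    paste(c(j, length(nbrs), nbrs), collapse = " ")
  }, character(1))
  out <- c(as.character(n), node_lines)
  writeLines(out, file)
  invisible(out)
}

# Toy adjacency: 4 counties in a line, 1 - 2 - 3 - 4
adj <- matrix(c(0, 1, 0, 0,
                1, 0, 1, 0,
                0, 1, 0, 1,
                0, 0, 1, 0), nrow = 4, byrow = TRUE)

graph_file  <- tempfile(fileext = ".gra")
graph_lines <- write_inla_graph(adj, graph_file)
```

The node ids in the graph need to line up with the values of the index variable (`conum` above) used in the `f()` term.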

Model Results

Fixed Effects Parameter Estimates
Parameter            beta   2.5% BCI   97.5% BCI
(Intercept)        -0.586     -0.623      -0.549
male                0.593      0.586       0.599
black               0.589      0.582       0.595
lths                0.006      0.006       0.006
gini               -0.003     -0.004      -0.002
pershigdis          0.052      0.030       0.074
black:pershigdis   -0.176     -0.209      -0.143
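
Because the model includes a black by pershigdis interaction, the association between persistent high segregation and mortality differs by race. A small worked calculation from the point estimates in the table (point estimates only; a credible interval for the sum cannot be obtained by adding the interval endpoints):

```r
# Posterior mean estimates copied from the fixed effects table
beta_pershigdis  <- 0.052   # pershigdis (applies when black = 0)
beta_interaction <- -0.176  # black:pershigdis

# Implied segregation coefficient within each race group
slope_white <- beta_pershigdis
slope_black <- beta_pershigdis + beta_interaction

slopes <- c(white = slope_white, black = slope_black)
```

Under these estimates the coefficient is positive for white rates and negative for black rates.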

Model Results

Hyperparameter Estimates
Hyperparameter               Estimate   2.5% BCI   97.5% BCI
Gaussian Variance              2.0977     2.0805      2.1150
County_IID Variance            0.0004     0.0001      0.0047
County_Spatial Variance        0.0003     0.0001      0.0023
Segregation_Slope Variance     0.0000     0.0000      0.0004
Time_Intercept Variance        0.0036     0.0021      0.0066
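
INLA reports these hyperparameters on the precision scale, so variances like those in the table come from inverting precisions. Inverting reverses the order of the quantiles, which is worth keeping in mind when labeling interval columns. A sketch with a hypothetical gamma posterior for a precision (the shape and rate are made up, not taken from the fitted model):

```r
shape <- 2; rate <- 0.5  # hypothetical posterior for a precision
probs <- c(0.025, 0.5, 0.975)

prec_q <- qgamma(probs, shape = shape, rate = rate)  # precision quantiles

# The p-th quantile of variance = 1 / precision is 1 / ((1 - p)-th precision quantile),
# so the inverted quantiles must be reversed to line up with `probs`.
var_q <- 1 / rev(prec_q)
names(var_q) <- c("2.5%", "50%", "97.5%")

# Check: P(precision >= 1 / v) should equal 0.025 at the lower variance bound v
check_lower <- pgamma(1 / var_q[["2.5%"]], shape = shape, rate = rate, lower.tail = FALSE)
```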

Discussion

  • While the black-white mortality gap persists:
    • Black and white mortality rates appear to be converging
    • The convergence is faster in highly segregated areas
    • This offers some evidence to support the Williams and Collins (2001) perspective
  • INLA allows for rapid deployment of Bayesian statistical models with latent Gaussian random effects
    • Faster than MCMC and generally as accurate
    • Potentially an attractive solution for problems where large data/complex models may make MCMC less desirable

Thank you!