Introduction

This document presents summary plots of the estimated demographic distributions of MSM in WA State, by age, race and region. It also explores some implications of the observed patterns.

The age distributions of MSM in Washington State display a set of features that complicate the task of modeling HIV transmission dynamics over time. We describe the problem, identify why the observed patterns matter for the dynamics of HIV transmission, outline the possible solutions, and explain the choices we made.

The demographics

The data we present here are the estimated age distributions of MSM in WA state, broken down by race and region.

The estimates are based on the number of men in WA state by age, race and county (Census estimates), adjusted by the fraction of men who are MSM in each county (CAMP project estimates). County values are then aggregated up to region.

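library(dplyr)
library(ggplot2)

# Estimated MSM counts by 5-year age group, race and region (WApopdata package)
#data("msm_all_age5_race_region")
all <- WApopdata::msm_all_age5_race_region

# Within each race x region subgroup, compute the share in each age group.
# cond.sum is a check: it should equal 1 in every row.
# (joint.pct, plotted below, is not computed here; it is presumably already a column of df2019.)
df2019 <- all$df2019 %>%
  group_by(race, region) %>%
  mutate(cond.pct = num / sum(num),
         cond.sum = sum(cond.pct)) %>%
  ungroup()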

Joint percents

These sum to 1 across all race x region x age group cells. Each value is the share of the total population that falls in that race x region x age group cell.

In these plots you can see the overall age distributions, and the relative sizes of the groups. The size differences are so large that comparing the age distributions by group is difficult.
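The difference between the joint percents and the within-group percents (cond.pct) can be seen in a small toy example. This is only a sketch: the counts are made up, region is left out to keep it short, and base R is used in place of dplyr so the block runs on its own.

```r
# Toy counts (hypothetical numbers, not WA estimates)
toy <- data.frame(
  race    = rep(c("A", "B"), each = 3),
  age.grp = rep(c("15-19", "20-24", "25-29"), times = 2),
  num     = c(100, 300, 600, 10, 20, 70)
)

# Joint percent: share of the whole population in each race x age cell
toy$joint.pct <- toy$num / sum(toy$num)

# Within-group percent: share of each race group in each age group
toy$cond.pct <- toy$num / ave(toy$num, toy$race, FUN = sum)

sum(toy$joint.pct)                    # joint percents sum to 1 overall
tapply(toy$cond.pct, toy$race, sum)   # within-group percents sum to 1 per race
```

Group B is a tenth the size of group A, so its joint percents are all small, while its within-group percents are on the same scale as group A's.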

Region

df2019 %>%
  ggplot(aes(x = age.grp5, y = joint.pct, 
             group = race, color = race)) +
  geom_point() +
  geom_line() +
  facet_wrap(~ region) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) 

Race

df2019 %>%
  ggplot(aes(x = age.grp5, y = joint.pct, 
             group = region, color = region)) +
  geom_point() +
  geom_line() +
  facet_wrap(~ race) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Percents within group

These sum to 1 within each race x region subgroup. Each value is the share of that race x region subgroup that falls in a given age group.

In these plots you can directly compare the age distributions across each group; they are all on the same scale.

Region

df2019 %>%
  ggplot(aes(x = age.grp5, y = cond.pct, 
             group = race, color = race)) +
  geom_point() +
  geom_line() +
  facet_wrap(~ region) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) 

Race

df2019 %>%
  ggplot(aes(x = age.grp5, y = cond.pct, 
             group = region, color = region)) +
  geom_point() +
  geom_line() +
  facet_wrap(~ race) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) 

The problem

Essentially, the problem is that the age distributions are far from an equilibrium distribution that can be easily reproduced.

What do we expect an age distribution to look like?

That depends on what kinds of changes are happening in the population.

Stable populations (equilibrium)

Stable here means no growth or decline (what demographers call a stationary population). In this case the equilibrium age distribution would be roughly uniform in the 15-65 range: all people enter at birth, and mortality is pretty low for a long time. Mortality begins to reduce the fraction in groups after 65, so this part of the distribution will be monotonically declining.

Changing populations

Population size changes when entries and exits are not balanced. There are two ways this can happen: through vital dynamics (births and deaths), or migration.

If the population is changing only through vital dynamics, the uniform distribution is replaced by a monotonic one. For a growing population the distribution declines with age (because each entering cohort is bigger than the one before it), and for a declining population it increases with age.
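These expected shapes can be illustrated with a short sketch. It assumes entry at age 15, a constant annual mortality rate, and entering cohorts that change at a constant rate r per year; the mortality value and the growth rates are made up for illustration and are not WA estimates.

```r
ages <- 15:64

# Hypothetical constant annual mortality of 0.2% (assumed, not an observed rate)
mort <- 0.002
surv <- (1 - mort)^(ages - 15)   # probability of surviving from entry to each age

# Age distribution when entering cohorts change at a constant rate r per year:
# people aged a entered (a - 15) years ago, when cohorts were exp(-r * (a - 15))
# times the size of the current entering cohort
age_dist <- function(r) {
  w <- surv * exp(-r * (ages - 15))
  w / sum(w)
}

flat      <- age_dist(0)       # no growth: roughly uniform
growing   <- age_dist(0.02)    # cohorts grow 2% per year: declines with age
shrinking <- age_dist(-0.02)   # cohorts shrink 2% per year: increases with age

round(range(flat), 4)
all(diff(growing) < 0)
all(diff(shrinking) > 0)
```

None of these three shapes has a peak in the middle of the age range, which is why a hump in the observed data points to something other than vital dynamics.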

If the population is changing by migration, the age distribution can be almost anything, depending on where, in the age range, entries and exits are happening.

And what do we see in WA State?

  1. The age distributions are not uniform across the age range – This means they cannot be reproduced with a simple “everyone enters at age 15” process.

  2. The deviations are non-monotonic – This means they reflect both in- and out-migration.

  3. The deviations vary substantially by race and region – This means that each race x region subgroup would require different treatment.
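The second point can be checked mechanically. The helper below is a hypothetical sketch, not part of the original analysis: it flags whether a vector of age-ordered percents ever changes direction. The two example vectors are made up; in practice it could be applied to cond.pct within each race x region subgroup, assuming the rows are sorted by age group.

```r
# TRUE if the values never increase or never decrease along the age axis
is_monotonic <- function(p) {
  d <- diff(p)
  all(d >= 0) || all(d <= 0)
}

# Hypothetical within-group percents for five age groups (not WA estimates)
declining <- c(0.30, 0.25, 0.20, 0.15, 0.10)
humped    <- c(0.10, 0.30, 0.25, 0.15, 0.20)

is_monotonic(declining)   # TRUE
is_monotonic(humped)      # FALSE
```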

Why this matters

The incidence of HIV, like other STIs, is highest in the 25-34 year old age group (see this report on WA State HIV incidence). For our model to reproduce the observed incidence, and differences in incidence by race and region, it needs to be able to reproduce the observed race x region specific age curves.

Solutions

Add migration to the model

It might appear that this would be relatively straightforward to implement, using the changes across age groups directly to increase or decrease the population as needed to match the observed data. But it is considerably more complicated than that.

  • Introducing migrant flows would also require deciding what fraction of in- and out-migrants are HIV positive, for each race and region. That is something we don’t have data on, and it does not have a natural baseline assumption.

  • In addition, some of that migration will be within-WA, between regions, while some will be entering and exiting WA. It is more likely that we have data on this, but again, the HIV status of these flows is unknown.

If we wanted to do this, we would need to add migration to the data collection. But even if we did collect the data, those migration patterns are not stable, and they are difficult to project into the future.

Bottom line: not feasible for this project

Restrict the age range

This is commonly done in modeling studies of MSM – age is restricted to a relatively narrow range, where the uniform distribution better approximates the observed data.

We can’t do this because we need the full age range for costing.

Bottom line: not feasible for this project

What we did

  • Represent vital dynamics only

    Entry at 15yrs old, observed mortality rates generate exits, no migration.

  • Match the observed 2019 distribution in the 2019 model starting conditions

    We did this by reverse engineering that distribution: starting the demographic part of the model in 1940 and changing the sizes of the entering cohorts so that by 2019 the age distribution had the correct shape by race and region.

  • Restrict the forward projection period

    Because all entry is at 15 yrs old, projecting forward will begin to distort the population age curves. If run for many years (say 90), the distribution will eventually become flat (if there is no population growth), monotonically increasing with age (if the population declines) or monotonically decreasing with age (if the population grows). But the distortion is minimal if the projection period is relatively short, say 5-10 yrs.

  • Adjust for population change post hoc

    This is needed because the model only has entry at age 15. Under these conditions, to match projected population growth or composition changes in the dynamic model, all of the population-level changes would have to occur in the 15 year old entering cohort: all of the population growth, and all of the composition changes. That would distort the distribution as well.

    So we instead implement population growth and composition changes via post hoc reweighting of the model results. In the model we use the 2019 distribution of 15 yr olds as the entering cohort distribution in each subsequent year. For the model outputs (incidence, prevalence and costs), we reweight by the projected actual distribution by age, race and region.
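A minimal sketch of what post hoc reweighting can look like is given below. It uses a standard post-stratification weight (projected share divided by model share) on made-up numbers for three age strata; the exact procedure and all values here are assumptions for illustration, and race and region are left out.

```r
# Hypothetical strata and numbers, for illustration only
strata <- data.frame(
  age.grp     = c("15-24", "25-34", "35-44"),
  model_n     = c(300, 400, 300),      # model population in each stratum
  model_cases = c(3, 8, 3),            # model incident cases in each stratum
  target_pct  = c(0.25, 0.45, 0.30)    # projected share of the actual population
)

# Stratum-specific incidence from the model
strata$rate <- strata$model_cases / strata$model_n

# Weight = projected share / share in the model population
strata$model_pct <- strata$model_n / sum(strata$model_n)
strata$weight    <- strata$target_pct / strata$model_pct

# Overall incidence before and after reweighting
raw_incidence <- sum(strata$model_cases) / sum(strata$model_n)
reweighted_incidence <-
  sum(strata$weight * strata$model_cases) / sum(strata$weight * strata$model_n)

raw_incidence          # 0.014
reweighted_incidence   # 0.0145
```

The reweighted value equals the stratum rates averaged with the projected shares, so the stratum-specific results from the model are left untouched and only their mix changes.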

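The reverse engineering of the 2019 starting distribution can be sketched in the same spirit. The block assumes a single group, single-year ages, a constant annual mortality rate and a made-up target age distribution; the model itself uses observed mortality rates and works by race and region, so every number and name here is hypothetical.

```r
ages <- 15:94                       # ages alive in 2019 if entry starts in 1940
entry_year <- 2019 - (ages - 15)    # year in which each age entered at 15

# Hypothetical constant annual mortality (assumed, not an observed rate)
mort <- 0.005
surv <- (1 - mort)^(ages - 15)      # survival from entry to 2019

# Hypothetical non-uniform 2019 target counts, peaking near age 32
target_2019 <- 50 + 30000 * dnorm(ages, mean = 32, sd = 12)

# Entering cohort size needed in each entry year to hit the target in 2019
entrants <- target_2019 / surv

# Check by simulating forward: vital dynamics only, entry at 15, no migration
pop <- numeric(length(ages))        # population by age, element 1 = age 15
for (yr in 1940:2019) {
  if (yr > 1940) {
    pop <- c(0, pop[-length(pop)] * (1 - mort))   # age one year, apply mortality
  }
  pop[1] <- entrants[entry_year == yr]            # new 15 year olds enter
}

all.equal(pop, target_2019)         # simulated 2019 distribution matches the target
```

Because each age in 2019 traces back to exactly one entering cohort, the required cohort size is just the target count divided by survival to that age.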