msm.pop.totals_2019 <- WApopdata::msm.pop.totals_2019
PLWH_2019 <- msm.pop.totals_2019$MSM_PLWH_WA
pop2019 <- WApopdata::msm_all_age10_race_region_2019
df_2019 <- pop2019 %>%
mutate(age.grp10 = ifelse(age.grp10 == "55-65",
"55-64", age.grp10)) %>% left_join(PLWH_2019) %>%
mutate(prev2019 = UB/num)
msm.pop.totals_2015 <- WApopdata::msm.pop.totals_2015
PLWH_2015 <- msm.pop.totals_2015$MSM_PLWH_WA
pop2015 <- WApopdata::msm_all_age10_race_region_2015
df_2015 <- pop2015 %>%
mutate(age.grp10 = ifelse(age.grp10 == "55-65",
"55-64", age.grp10)) %>% left_join(PLWH_2015)%>%
mutate(prev2015 = UB/num)
msm.pop.totals_2014 <- WApopdata::msm.pop.totals_2014
PLWH_2014 <- msm.pop.totals_2014$MSM_PLWH_WA
pop2014 <- WApopdata::msm_all_age10_race_region_2014
df_2014 <- pop2014 %>%
mutate(age.grp10 = ifelse(age.grp10 == "55-65",
"55-64", age.grp10)) %>%
left_join(PLWH_2014) %>%
mutate(prev2014 = UB/num)
df <- df_2014 %>%
select(age.grp10, race,
region, pop_2014=num, HIVUB_2014=UB, prev2014) %>%
left_join(df_2015 %>%
select(age.grp10, race, region,
pop_2015=num, HIVUB_2015=UB, prev2015)) %>%
left_join(df_2019 %>%
select(age.grp10, race, region,
pop_2019=num, HIVUB_2019=UB, prev2019))
# mutate(HIVdiff19_15 = HIVUB_2019 - HIVUB_2015,
# pctdiff = HIVdiff19_15/HIVUB_2015)
We need HIV prevalence targets to calibrate and validate our epidemic model. Prevalence calculations are based on data from WADOH on PLWH (for 2014, 15 and 19), and estimates of the WA MSM population from a combination of census and survey data. The “Numerator” sections of this document focus on the WADOH data, the “Denominator” sections focus on the MSM population size estimates.
The purpose of this report is to get WADOH feedback on:
Whether our HIV prevalence estimates, and the trends they suggest, seem reasonable.
The 2019 estimates suggest Hispanic MSM have the highest overall prevalence. Question: Is this consistent with what you expect?
There is considerable uncertainty in the denominator estimates, with alternative values available from different sources, that has a very large impact on the resulting HIV prevalence estimates when broken down by age and race. Question: Which of these estimates seems more reasonable?
Note the interactive TOC on the left for help navigating this document. If you click a section, the subsections become visible.
HIV prevalence, as a rate, is defined by
\[ Prevalence = \frac{PLWH}{Population} \] To calibrate our model, we break this overall prevalence down by age group, race and region, and use the group specific estimates as targets for our simulated outcomes.
There is uncertainty in both the numerator and denominator of these estimates.
The numerator for us comes from the WA DOH HIV surveillance system, and represents persons who have been diagnosed with HIV, are living in the state, and whose risk category is identified as MSM. There are two aspects of these estimates worth noting:
Risk category is not known with certainty. We represent this here by having a lower bound estimate that includes the MSM and MSM/IDU groups, and an upper bound estimate that adds the CDC imputed MSM risk assignment from the NIR/NRR group.
These estimates do not include the undiagnosed cases.
Steve Erly provided the aggregate totals by age, race and region.
For this we need an estimate of the number of MSM living in WA state, broken down by age, race and region. We use two data sources for this estimate:
Our estimates of the number of MSM (the “Grey” estimates) are then
\[ MSM_{(cnty, age, race)}(2019) = Males_{(cnty, age, race)}(2019) * MSM.percent_{(cnty)}(2013) \]
This assumes the age and race profile for MSM is the same as that for all men within each county.
The county estimates are then aggregated up into 3 regions: King Co, Western Washington and Eastern Washington.
The Emory CAMP folks have published two new papers with estimates of MSM by State (rather than county): by age group (Jones et al., 2018) and race (Rosenberg et al., 2018). Links to all of these estimates are given in the References section.
These new estimates start with the same county-level estimates as the Grey estimates we have been using. But the age-specific estimates from Jones include modifications based on the General Social Survey estimates of same sex behavior by age, and the race-specific estimates of Rosenberg include modifications based on the ACS distribution of race by county.
Our Grey-based estimates of MSM in WA by age group and race are constructed assuming that the age/race distribution of MSM in each county matches the distribution for all males in that county. The new CAMP estimates do not make this assumption. So comparing our Grey-based WA state estimates to the new CAMP state estimates may highlight additional uncertainties in our our HIV prevalence targets.
All of the CAMP estimates (including Grey) are based on the 2014 population counts, so the first step is to update these for 2019. This is, in principle, straightforward. They report the number of MSM broken down by attribute, so we can take the percent in each attribute group, and multiply this by our 2019 Grey estimate total.
These are the data from the WA DOH that we use as numerators to construct prevalence.
The upper and lower bounds are plotted.
colors <- c("2019" = RColorBrewer::brewer.pal(3, "Spectral")[1],
"2015" = RColorBrewer::brewer.pal(3, "Spectral")[2],
"2014" = RColorBrewer::brewer.pal(3, "Spectral")[3])
# Raw PLWH data for WA
ggplot(data = PLWH_2014, aes(x=as.numeric(factor(age.grp10)))) +
geom_ribbon(aes(ymin = LB, ymax = UB, color="2014"),
size=.9, alpha=0.2) +
geom_ribbon(data = PLWH_2015,
aes(ymin = LB, ymax = UB, color="2015"),
size=.9, alpha=0.2) +
geom_ribbon(data = PLWH_2019,
aes(ymin = LB, ymax = UB, color="2019"),
size=.9, alpha=0.2) +
facet_grid(rows = vars(race), cols=vars(region),
scales = "free_y") +
labs(title = "PLWH by age, race and region",
caption = "Source: WADOH; Note Y axis scale changes by row",
x = "Age Group",
y = "Count",
color = "Year") +
scale_color_manual(values = colors) +
scale_x_continuous(breaks = 1:8,
labels = unique(PLWH_2014$age.grp10)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
# +
# guides(colour = guide_legend(override.aes = list(alpha = .1)))
“Group Index” means the list of age x race x region subgroups that we are estimating prevalence for. There are 10 x 3 x 3 = 90 index levels. Below we plot the 2014 HIV counts for each group on the x-axis and the 2015 and 2019 HIV counts on the y-axis. The diagonal line shows what we would expect if the 2015 and 2019 counts were equal to 2014. Deviations show which groups (and year) have changed the most.
Note the plot is interactive, so if you hover on a point, it will pop up information on the attributes of that group.
colors <- c("2019" = "red",
"2015" = "blue")
p <- ggplot(data=df,
aes(x=HIVUB_2014,
text = paste("age:", age.grp10,
"</br>race:", race,
"</br>region:", region))) +
geom_point(aes(y = HIVUB_2015, color = "2015"),
size=2, alpha = 0.5) +
geom_point(aes(y = HIVUB_2019, color = "2019"),
size=4, alpha = 0.2) +
geom_abline(aes(intercept=0, slope=1)) +
labs(title = "Group PLWH Count Comparisons Over Time",
x = "2014 HIV Counts",
y = "2015 or 2019 HIV Counts",
color = "Year") +
scale_color_manual(values = colors)
ggplotly(p)
#GGally::ggpairs(df)
These are based on Census estimates of the WA male population, and Grey estimates from the 2013 Emory CAMP survey of MSM by county in the US.
colors <- c("2019" = RColorBrewer::brewer.pal(3, "Spectral")[1],
"2015" = RColorBrewer::brewer.pal(3, "Spectral")[2],
"2014" = RColorBrewer::brewer.pal(3, "Spectral")[3])
ggplot(data = pop2014, aes(x=as.numeric(factor(age.grp10)))) +
geom_line(aes(y = num, color="2014"),
size=.9, alpha=1) +
geom_line(data = pop2015,
aes(y = num, color="2015"),
size=.9, alpha=1) +
geom_line(data = pop2019,
aes(y = num, color="2019"),
size=.9, alpha=1) +
facet_grid(rows = vars(race), cols=vars(region),
scales = "free_y") +
labs(title = "MSM Population by age, race and region",
caption = "Source: Census and Grey 2013 data; Note Y axis scale changes by row",
x = "Age Group",
y = "Count",
color = "Year") +
scale_color_manual(values = colors) +
scale_x_continuous(breaks = 1:8,
labels = unique(pop2014$age.grp10)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
WADOH estimates of PLWH in the numerator, Grey based estimates of MSM population in the denominator.
ggplot(data = df, aes(x=as.numeric(factor(age.grp10)))) +
geom_line(aes(y = prev2014, color="2014"),
size=.9, alpha=1) +
geom_line(aes(y = prev2015, color="2015"),
size=.9, alpha=1) +
geom_line(aes(y = prev2019, color="2019"),
size=.9, alpha=1) +
facet_grid(rows = vars(race), cols=vars(region),
scales = "free_y") +
labs(title = "MSM HIV Prevalence Rate by age, race and region",
caption = "Source: WADOH, Census and Grey 2013 data; Note Y axis scale changes by row",
x = "Age Group",
y = "Percent",
color = "Year") +
scale_color_manual(values = colors) +
scale_x_continuous(breaks = 1:8,
labels = unique(df$age.grp10)) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
The following plots compare our original Grey estimates to thee “alternate estimates” from CAMP. See the Introduction for more detail on these alternate estimates.
There is a large difference in the age distributions for our Grey-based estimates, and the new Jones estimate from CAMP.
The differences in the Grey estimates (2015 vs 2019) reflect the changing distribution of males by age in WA state – the %MSM is the same for these two estimates.
The differences in the 2019 estimates (Grey vs. Jones) reflect the difference in the estimated %MSM from these two analyses – the number of males by age is the same for these two estimates.
pop.age <- msm.pop.totals_2015$pop.age.all
p <- ggplot(data = msm.pop.totals_2015$pop.age.all,
aes(x = as.numeric(factor(age.grp10)))) +
geom_line(aes(y = num, color = "Grey 2015")) +
geom_point(aes(y = num, color = "Grey 2015")) +
scale_x_continuous(breaks = 1:8,
labels = msm.pop.totals_2015$pop.age.all$age.grp10) +
labs(title = "Estimated Number of MSM in WA by age",
x = "Age",
y = "Number",
color = "Estimate") +
geom_line(data = msm.pop.totals_2019$pop.age.all,
aes(x = as.numeric(factor(age.grp10)),
y = num, color = "Grey 2019")) +
geom_point(data = msm.pop.totals_2019$pop.age.all,
aes(x = as.numeric(factor(age.grp10)),
y = num, color = "Grey 2019")) +
geom_line(data = msm.pop.totals_2019$pop.age.all,
aes(x = as.numeric(factor(age.grp10)),
y = num.alt, color = "Jones 2019")) +
geom_point(data = msm.pop.totals_2019$pop.age.all,
aes(x = as.numeric(factor(age.grp10)),
y = num.alt, color = "Jones 2019"))
ggplotly(p)
The age comparison complicated by the fact that Jones et al. have a slightly different grouping: the bottom age group is 18-24, rather than 15-24, and they consolidate the top age groups into 55+. To address this, I adjusted their youngest group by 10/7, re-estimated the percentages in each group with the new total, and redistributed the count in 55+ proportional to the Grey estimates of the percent in each group (55-64, 65-74, 75-84 and 85+).
The differences in estimates of the number of MSM by race minimal. Comparison requires adjustment for the fact that Rosenberg’s estimates are for “Black alone” (BA), while our Grey-based estimates are for “Black alone or in combination” (BAC). Analysis of the WA census data for 2014-2019 shows a consistent fraction of 23% of the BAC group is in the “Combination” category. So we inflate Rosenberg’s BA estimates by 1/(1-0.23) (about 37%) to account for this.
dfbar <- msm.pop.totals_2019$pop.race.all %>%
left_join(msm.pop.totals_2015$pop.race.all %>%
select(race, num.2015=num)) %>%
select(race, Grey2019 = num, Grey2015 = num.2015,
Rosenberg = num.alt) %>%
pivot_longer(-race,
names_to = "source",
values_to = "num")
ggplot(dfbar, aes(x=race, y=num, fill=source)) +
geom_col(position = "dodge2", alpha = 0.5) +
labs(title = "Estimated Number of MSM in WA by race",
x = "Race",
y = "Number",
fill = "Estimate")
Note that we represent both forms of uncertainty here. The uncertainty in the numerator (number of PLWH) is shown as the upper and lower bounds; the uncertainty in the denominator (number of MSM) is shown as the comparison between estimates.
colors <- c("Grey" = "gray", "Jones" = "blue", "Rosenberg" = "green")
ggplot(msm.pop.totals_2019$pop.age.pos,
aes(x = as.numeric(factor(age.grp10)))) +
geom_ribbon(aes(ymin = prev_lb, ymax = prev_ub,
fill = "Grey"), alpha = 0.5) +
geom_ribbon(aes(ymin = prev_lb_alt, ymax = prev_ub_alt,
fill = "Jones"), alpha = 0.5) +
scale_x_continuous(breaks = 1:8,
labels = msm.pop.totals_2019$pop.age.pos$age.grp10) +
labs(title = "HIV prevalence estimates by age (2019)",
x = "Age",
y = "Prevalence",
caption = "2019: WADOH PLWH numerator, alternate denominators",
fill = "Estimate")
upper <- msm.pop.totals_2019$pop.race.pos %>%
select(race, Grey = prev_ub, Rosenberg = prev_ub_alt) %>%
pivot_longer(-race,
names_to = "source",
values_to = "upper")
lower <- msm.pop.totals_2019$pop.race.pos %>%
select(race, Grey = prev_lb, Rosenberg = prev_lb_alt) %>%
pivot_longer(-race,
names_to = "source",
values_to = "lower")
dfbar <- msm.pop.totals_2019$pop.race.pos %>%
rowwise() %>%
mutate(prev_mid = mean(c(prev_lb, prev_ub)),
prev_mid_alt = mean(c(prev_lb_alt, prev_ub_alt))) %>%
select(race, Grey = prev_mid, Rosenberg = prev_mid_alt) %>%
pivot_longer(-race,
names_to = "source",
values_to = "prev") %>%
left_join(lower) %>%
left_join(upper)
ggplot(dfbar, aes(x=race, y=prev, fill=source)) +
geom_col(position = "dodge2", alpha = 0.5) +
geom_errorbar(
aes(ymin = lower, ymax = upper),
position = position_dodge2(width = 0.5, padding = 0.5)
) +
labs(title = "HIV prevalence estimates by race (2019)",
x = "Race",
y = "Prevalence",
caption = "2019: WADOH PLWH numerator, alternate denominators",
fill = "Estimate")
Grey estimates by county:
Jones estimates by age:
Rosenberg estimates by race: