1 Overview

In this document, I will be reviewing some of the main functionality of the EpiModelWHAMPDX package

2 Understanding the epi object

Before we get into looking at the EpiModelWHAMPDX functions, it will be valuable to understand the structure of the epi object. The epi object is an object inside of the net sim object. For each epi parameter that is tracked, there is a dataframe, and each row corresponds to the time step, and each column corresponds to a different simulation:

## Reading in some example data
simnet <- readRDS(here("data", "examp_simnet.rds"))
one_examp <- simnet$epi$num.undx.ovr
one_examp[nrow(one_examp) - c(1:10), ]
##      sim1 sim2
## 4681  182  169
## 4680  182  169
## 4679  182  169
## 4678  183  168
## 4677  181  168
## 4676  181  169
## 4675  181  168
## 4674  181  169
## 4673  181  169
## 4672  180  167

Looking at the names of dataframes inside of the epi object, we see many similarities between the names

head(names(simnet$epi))
## [1] "i.num.employer"     "i.num.individual"   "i.num.medicarecaid"
## [4] "i.num.none"         "i.num.wship"        "i.num.ovr"

Here we see for i.num that we have i.num.employer for employer health insurance. So, since i.num is the number infected, i.num.employer is the number who are infected who are on employer health insurance. For this example, we are only considering a subset of the epi values that are tracked. They are:

map(strsplit(names(simnet$epi), "\\."), function(x){
  if (length(x) > 1) {
    return(paste0(x[-length(x)], collapse = "."))
  }else{
    return(x)
  }
}) %>% as.character() %>% unique()
## [1] "i.num"    "num"      "new.dx"   "num.undx"

So, we are only looking at the infected number i.num, number total num, newly diagnosed new.dx, and number undiagnosed num.undx.

3 Creating plot data

A major functionality of this package is to be able to provide summaries from the EpiModel epi object, broken down across different attributes. The specification for the desired plot is made with the use of a list object. For now, we will simply specify a plot with the number of individuals in the population:

simnet$epi$num <- simnet$epi$num.ovr

plot.list <- list(name = "Number",
                  # Name of the variable of interest.  This should
                  # match the name of the target variable name if you 
                  # have a target for the plot
                  sec_title = "Section Title",
                  plot_name = "Plot Name",
                  plot_cap = "Example Plot Caption",
                  plot_ylab = "Y-Label",
                  plt_type = "line",
                  vars = c("num."),
                  # variables as they appear in the epi object (without endings)
                  sum_fun = function(x) {x}
                  # A function of the variables above. The function here should
                  # take as many arguments as the length of the vars argument.
                  )

One of the main functions inside the package is called make_epi_plot_data. This function takes in the netsim object and the plot list and returns the object that will be used to create our plot:

plot_dat <- make_epi_plot_data(simnet, plot.list)
plot_dat
## $plot_title
## [1] "Plot Name"
## 
## $plot_cap
## [1] "Example Plot Caption"
## 
## $plot_ylab
## [1] "Y-Label"
## 
## $epi_data
## # A tibble: 87,932 x 7
##     year name             value meas   sub_cat_name       cat_name simno
##    <dbl> <chr>            <dbl> <chr>  <chr>              <chr>    <int>
##  1 1944. Number.ovr       13989 Number Overall            ovr          1
##  2 1944. Number.H          1595 Number Hispanic           race         1
##  3 1944. Number.B           949 Number Black              race         1
##  4 1944. Number.O         11445 Number Other              race         1
##  5 1944. Number.EasternWA  1350 Number Eastern Washington region       1
##  6 1944. Number.WesternWA  4638 Number Western Washington region       1
##  7 1944. Number.King       8001 Number King County        region       1
##  8 1944. Number.1          1967 Number 15-24              age.grp      1
##  9 1944. Number.2          3634 Number 25-34              age.grp      1
## 10 1944. Number.3          3112 Number 35-44              age.grp      1
## # … with 87,922 more rows
## 
## $impt_years
##         name year
## 1 mtch.demog 2019
## 2 adap.start 2014
## 3 pdap.start 2017
## 4 prep.start 2012
## 5 cure.start 1988
## 6   cure.end 1990

4 Creating the Plot

This plot data is in turn passed to the plot_epi function. It is worth noting that the plot_dat object will have data for breakdowns by attributes, so it is important to specify which attribute you wish to break over. Here we don’t break down across an attribute, so we specify brk_across == "ovr". Two lines appear here because the simulation we use here ran two separate simulations.

plot_epi(plot_dat, year_range = c(1980, 2030), brk_across = "ovr")

It is also possible look at this plot broken down across race:

plot_epi(plot_dat, year_range = c(1980, 2030), brk_across = "race")

5 A Second Example

Now, lets say we want to look at the proportion of individuals in the population who are infected. Because we only have i.num, num, new.dx, and num.undx, we cannot look at the proportion of individuals who are infected directly. However, we can create a new variable by specifying it in our plot list.

plot.list2 <- list(name = "prop.inf",
                   sec_title = "Prevalence",
                   plot_name = "Proportion Infected",
                   plot_cap = "Among individuals who have ever had a partner.",
                   plot_ylab = "Proportion",
                   plt_type = "line",
                   vars = c("num.", "i.num."),
                   sum_fun = function(x, y) {y / x}
)
plot_dat2 <- make_epi_plot_data(simnet, plot.list2)
plot_epi(plot_dat2, year_range = c(1980, 2030), brk_across = "region")

Because plot_epi returns a ggplot object, we can add additional arguments to this returned plot if we want:

plot_epi(plot_dat2, year_range = c(1980, 2030), brk_across = "region") + 
  coord_cartesian(ylim = c(0, 0.15))

6 A Third Example

Now, say we want to look at how close our simulation comes to a given target. The EpiModelWHAMPDX package expects the targets to be given in the following format:

target.df <- readRDS(here("data", "WHAMP.dx.targs.rds"))
target.df %>% filter(measure == "prev")
##       subcat         low        targ        high measure     cat
## 1        ovr 0.072436668 0.074466238 0.076495807    prev     ovr
## 2          1 0.005359092 0.005490565 0.005622038    prev age.grp
## 3          2 0.045395856 0.046493042 0.047590228    prev age.grp
## 4          3 0.078133932 0.080358534 0.082583136    prev age.grp
## 5          4 0.124441520 0.128515597 0.132589674    prev age.grp
## 6          5 0.138160197 0.141530050 0.144899902    prev age.grp
## 7          6 0.069692406 0.071692388 0.073692370    prev age.grp
## 8          B 0.106104879 0.111300485 0.116496091    prev    race
## 9          H 0.111336730 0.115685881 0.120035031    prev    race
## 10         O 0.065386448 0.066910658 0.068434867    prev    race
## 11 EasternWA 0.089665846 0.093321679 0.096977512    prev  region
## 12      King 0.072088232 0.073490228 0.074892225    prev  region
## 13 WesternWA 0.067831613 0.070396462 0.072961310    prev  region
##               sub_cat
## 1             Overall
## 2               15-24
## 3               25-34
## 4               35-44
## 5               45-54
## 6               55-65
## 7                 66+
## 8               Black
## 9            Hispanic
## 10              Other
## 11 Eastern Washington
## 12        King County
## 13 Western Washington

The columns:

  • subcat: The subcategory desired (as an example 1 meaning the lowest age group)
  • low: The lower bound for the target
  • targ: the best estimate for the target
  • high: the upper bound for the target
  • measure: the measure for which the target is given. This should match the name element in the plot list object.
  • cat: the overall category name (race / region / age.grp, etc.)
  • sub_cat: A prettied up version of subcat

Now, we wish to display this target against what was observed in the simulation. This measure is slightly different from what we saw above since we are now only interested in the diagnosed prevalence:

plot.list3 <- list(name = "prev",
                   sec_title = "Diagnosed Prevalence",
                   plot_name = "Proportion with diagnosed infection",
                   plot_cap = "Among individuals who have ever had a partner.",
                   plot_ylab = "Proportion",
                   plt_type = "line",
                   vars = c("num.", "i.num.", "num.undx."),
                   sum_fun = function(x, y, z) {(y - z) / x}
)
plot_dat3 <- make_epi_plot_data(simnet, plot.list3)
plot_epi(plot_dat3, year_range = c(1980, 2030),
         brk_across = "ovr", targets = target.df) +
  coord_cartesian(ylim = c(0, 0.15))

Above, we can see that there is a confidence interval shown for the overall display. To make sure the plot does not get too busy, this is not done when summaries are broken down by attribute:

plot_epi(plot_dat3, year_range = c(1980, 2030),
         brk_across = "race", targets = target.df) +
  coord_cartesian(ylim = c(0, 0.15))

7 A Fourth Example

In some cases, we want to use more complicated functions of our epi parameters. This is possible with the current functionality of the package. Here we first look at the number of newly diagnosed individuals at each time step:

plot.list4 <- list(name = "newdx",
                   sec_title = "Number of Newly Diagnosed",
                   plot_name = "Newly Diagnosed",
                   plot_cap = "",
                   plot_ylab = "Count",
                   plt_type = "line",
                   vars = c("new.dx."),
                   sum_fun = function(x) {x}
)
plot_dat4 <- make_epi_plot_data(simnet, plot.list4)
plot_epi(plot_dat4, year_range = c(1980, 2030),
         brk_across = "ovr", targets = target.df)

We see above that this summary is far too noisy to be useful. So instead, we can look at the number of individuals diagnosed in the last year:

rolling_year_sum <- function(x, years = 1, avg = FALSE){
  val <- data.table::frollsum(fill(data.frame(x), "x")$x, years * 52)
  if (avg) {val <- val / (years * 52)}
  return(val)
}

plot.list5 <- list(name = "newdx",
                   sec_title = "Number of Newly Diagnosed in the Last Year",
                   plot_name = "Newly Diagnosed",
                   plot_cap = "In the last year. The spike here is caused by the initiation \n of PrEP which forces individuals to test at a higher rate.",
                   plot_ylab = "Count",
                   plt_type = "line",
                   vars = c("new.dx."),
                   sum_fun = function(x) {rolling_year_sum(x)}
)
plot_dat5 <- make_epi_plot_data(simnet, plot.list5)
plot_epi(plot_dat5, year_range = c(1980, 2030),
         brk_across = "ovr", targets = target.df)