1 Notes on calculation of cumulative number of partner numbers

This document details the current procedure used to calculate the number of partners each node had during the previous year across all of the years of the simulation. The code used to make these calculations is found in two places.

  • Information used to calculate the cumulative number of partners is created inside of the mod.prevalence.R script and the prevalence_msm function.
  • This information is then compiled and reformatted inside of a separate script called cumulativeyearlypartnereships.R

1.1 EpiModel Data storage

Inside of EpiModel, there are three sources of information used to calculate the cumulative number of partners.

  • Every relationship that is formed during the Epimodel simulation is stored inside of a matrix which has the uid(universal id) of both the nodes, the relationship type, and the start and end time of each relationship.

  • At the end of each year in the Epimodel simulation, the cumulative number of partners over the past year for each node is calculated. For each nodeid / partner type that is listed in the matrix above, the total number of partnerships with the node of the specified type is calculated. This is stored inside of dat\(temp\)cuml_pnums.

  • The second source of information is the expanded attribute list. The usual attribute list contains information on all nodes that are currently inside of the network population. The attribute information on the individuals who exited the population is stored in a second list. The expanded attribute list is the combination of these two lists which gives the attribute information on all nodes that were ever in the network population.

  • The last source of information is a dataframe that has the total number of individuals in each age/race/region group at each time step.

1.2 Compilation and formatting

To calculate the cumulative number of partners during a given year, the following steps are completed:

  • The dataset with the cumulative number of partners (cuml_pnum) of the given year is selected. However at this point, all that is known is the uids (not the attributes of interest).

  • Next the three attributes (age, race, and region) are attached to these uids.

  • After this, for each attribute separately, each row in cuml_pnum dataset is sorted by attribute of the node, partnership type, and cumulative number of partners. The total number of observations in each attribute type, partnership type, and cumulative number of partners is tabulated.

  • The above tabulation will not include zero counts because only ties are counted (not non-ties). To calculate the number of zeros, another calculation is made. Here, we compare the total number of individuals with a cumulative number of partners greater than zero to the overall total (for a given attribute subgroup). The difference between these two values is the number of individuals with a cumulative number of partners of zero.

  • When the attribute is age group, a back calculation is done to find the age of each node during the year of interest (rather than their age the year they left the network or the simulation ended).

1.3 Code for construction

suppressPackageStartupMessages(library(tidyverse))
options(dplyr.summarise.inform = FALSE)
cuml_pnum_plot <- function(dat, attr_var = NULL, 
                           exl_zeros = FALSE, 
                           w_years = NULL,
                           jst_dat = FALSE){
  plist <- dat$temp[[1]]$cuml_pnums
  plist$uid <- plist$id 
  ## Calculate Year
  plist$year <- round((plist$at - dat$control$start) /
                              52 + dat$control$year_start)
  
  if (!is.null(w_years)) {
    plist <- plist %>% filter(year %in% w_years)
  }
  combined <- plist %>%
    mutate(mc = as.integer(ptype < 3), 
           mc_tot = mc * tot) %>%
    group_by(uid, at, year) %>%
    summarise(All = pmin(10, sum(tot)),
              MC = pmin(10, sum(mc_tot))) %>%
    pivot_longer(cols = c(All, MC), names_to = "ptype",
                 values_to = "tot") %>% 
    mutate(id = uid)
  combined$ptype <- dplyr::recode(combined$ptype, "All" = 4, 
                                  "MC" = 5)
  plist <- bind_rows(plist, combined)
  
  all_attr_df <- rbind(
    cbind(do.call(data.frame, dat$attr[[1]]),
          "rem_at" = dat$control$nsteps) ,
    dat$temp[[1]]$rem_atrs
  )
  edge_w_attr <- left_join(plist, all_attr_df, by = "uid")
  ## Fix age so it is age at given year (not at the end of the simulation)
  edge_w_attr$age <- edge_w_attr$age +
    (edge_w_attr$at - edge_w_attr$rem_at) / 52 #Time between measure (at) 
                                               # and when age saved (rem_at)
  age.breaks <- sim$param$netstats$demog$age.breaks
  edge_w_attr$age.grp <- cut(edge_w_attr$age, age.breaks,
                              labels = FALSE, include.lowest = TRUE)
  demog_inf <- dat$epi[[1]]$demog_edge_counts %>%
    ungroup() %>%
    filter(ptype == 0)
  demog_inf$year <- round((demog_inf$at - dat$control$start) / 
                            52 + dat$control$year_start)
  if (!is.null(w_years)) {
    demog_inf <- demog_inf %>% filter(year %in% w_years)
  }
  if (!is.null(attr_var)) {
    edge_w_attr$spl_var <- edge_w_attr[, attr_var, drop = TRUE]
    demog_inf$spl_var <- demog_inf[, attr_var, drop = TRUE] 
  }else{
    edge_w_attr$spl_var <- "Combined"
    demog_inf$spl_var <- "Combined"
  }
  max_id <- max(edge_w_attr$id)
  
  all_yrs <- unique(edge_w_attr$year)
  edge_w_attr <- edge_w_attr %>% select(year, spl_var, at, age, ptype, id, tot) %>%
    filter(age < 66)
  ## Adding zero counts for each network
  for (at_idx in 1:length(all_yrs)) {
    this_yr <- all_yrs[at_idx]
    num_obs <- demog_inf %>% filter(year == this_yr, age < 66) %>% 
      group_by(spl_var) %>% summarize(nodes = sum(count)) %>% 
      ungroup()
    ex_ties <- edge_w_attr %>% filter(year == this_yr, age < 66) 
    for (net_type in 1:length(unique(ex_ties$ptype))) {
      zeros <- ex_ties %>% filter(ptype == net_type) %>% 
        group_by(spl_var) %>% count() %>% ungroup()
      num_ad <- left_join(num_obs, zeros, by = "spl_var") %>%
        mutate(rows_to_add = pmax(0, nodes - n), 
               tot = 0, year = this_yr, 
               at = min(ex_ties$at), age = NA_real_,
               ptype = net_type) %>%
        tidyr::uncount(weights = rows_to_add, .id = "id") %>% 
        select(-nodes, -n)
      num_ad$id <- max_id + num_ad$id
      max_id <- max_id + nrow(num_ad)
      edge_w_attr <- rbind(edge_w_attr, num_ad)
    }
  }
  uni_year <- unique(edge_w_attr$year)
  edge_w_attr$year <- factor(x = edge_w_attr$year,
                       levels = sort(uni_year, decreasing = TRUE),
                       labels = sort(uni_year, decreasing = TRUE))
  bin_vals <-  0:max(edge_w_attr$tot)
  bin_labs <- c(0:(max(edge_w_attr$tot) - 1),
                paste0(max(edge_w_attr$tot), "+"))
  counts <- edge_w_attr %>% group_by(ptype, tot, at, spl_var, year) %>%
    count() %>% ungroup() %>% group_by(spl_var, year, ptype) %>% 
    mutate(Percentage = n / sum(n)) %>% ungroup()
  all_vals <- counts %>% expand(spl_var, year, ptype, tot)
  counts <- left_join(all_vals, counts)  %>% 
    mutate(ptype = c("Main", "Casl", "Inst", "All", "MC")[ptype])
  counts$Percentage[is.na(counts$Percentage)] <- 0
  if (exl_zeros) {counts <- counts %>% filter(tot > 0)}
  if (jst_dat) {
    return(counts)
  }else{
    ggplot(counts, aes(x = tot, y = Percentage,
                       fill = as.factor(ptype))) +
      geom_col(position = "dodge", alpha = 1) + 
      facet_grid(year ~ spl_var) + 
      scale_x_continuous(
        breaks = bin_vals, 
        labels = bin_labs 
      ) +
      xlab("Cumulative Partners in the Last Year") +
      ylab("Percent of Individuals") +
      theme(legend.position = "bottom")
  }
}

1.4 Overall cumulative number of partners

tbl_dat <- cuml_pnum_plot(sim, w_years = years_of_interest, jst_dat = TRUE)

expnd_dat <- tbl_dat %>% select(-n, -at) %>% mutate(Percentage = round(Percentage * 100, 2)) %>% pivot_wider(names_from = tot, values_from = Percentage) 

expnd_dat %>%
  select(-spl_var) %>%
  group_by("Network" = ptype) %>% 
  gt::gt(rowname_col = "year") %>% 
  gt::tab_spanner(
    label = "Cumulative Number of Partners", 
    columns = 3:13
  )
ptype Cumulative Number of Partners
0 1 2 3 4 5 6 7 8 9 10
Main
2030 Main 66.13 30.44 2.84 0.50 0.07 0.02 0.01 0.00 0.00 0.00 0.00
2025 Main 67.90 29.01 2.50 0.47 0.09 0.01 0.01 0.00 0.00 0.00 0.00
2020 Main 66.13 30.48 2.77 0.53 0.08 0.01 0.00 0.00 0.00 0.00 0.00
2019 Main 66.80 29.84 2.83 0.44 0.08 0.01 0.00 0.00 0.00 0.00 0.00
2014 Main 67.16 29.71 2.57 0.48 0.08 0.01 0.00 0.00 0.00 0.00 0.00
2011 Main 65.72 30.63 2.94 0.60 0.11 0.01 0.00 0.00 0.00 0.00 0.00
Casl
2030 Casl 33.54 31.35 19.61 9.54 3.68 1.35 0.44 0.16 0.09 0.10 0.14
2025 Casl 33.10 30.99 20.57 9.36 3.64 1.35 0.41 0.26 0.11 0.05 0.16
2020 Casl 31.29 32.03 21.16 9.66 3.82 1.15 0.44 0.17 0.15 0.05 0.08
2019 Casl 31.70 32.06 20.52 9.97 3.65 1.27 0.46 0.15 0.08 0.03 0.11
2014 Casl 32.91 31.86 20.56 9.54 3.39 0.99 0.40 0.13 0.08 0.03 0.13
2011 Casl 33.06 30.83 19.85 9.76 4.15 1.50 0.44 0.16 0.10 0.06 0.08
Inst
2030 Inst 43.23 17.69 6.61 3.84 2.62 2.29 1.70 1.48 1.22 1.17 18.15
2025 Inst 43.78 17.55 7.18 3.66 2.83 1.94 1.62 1.60 1.17 1.12 17.55
2020 Inst 45.48 17.05 6.31 3.74 2.75 2.16 1.89 1.57 1.01 1.05 16.98
2019 Inst 45.51 17.05 6.43 3.45 2.87 2.20 1.65 1.39 1.30 1.11 17.04
2014 Inst 45.63 17.19 5.97 3.33 2.74 2.42 1.98 1.51 1.21 1.00 17.02
2011 Inst 44.96 16.78 6.89 3.56 2.91 2.31 1.89 1.39 1.12 1.16 17.03
All
2030 All 6.46 19.40 18.70 13.04 8.04 5.11 3.17 2.44 1.67 1.62 20.36
2025 All 7.07 18.85 18.93 13.43 8.10 4.92 3.30 2.33 1.92 1.50 19.65
2020 All 6.95 19.06 18.97 13.43 8.42 4.98 3.26 2.53 1.87 1.45 19.08
2019 All 7.01 19.02 19.50 13.54 7.76 4.88 3.43 2.34 1.78 1.40 19.36
2014 All 7.31 19.12 19.58 13.08 8.13 4.67 3.26 2.39 1.85 1.54 19.06
2011 All 7.08 18.65 18.87 13.01 8.63 5.22 3.55 2.44 1.91 1.39 19.24
MC
2030 MC 19.39 33.86 24.39 13.09 5.72 2.18 0.70 0.29 0.12 0.11 0.15
2025 MC 20.45 32.27 25.05 13.14 5.57 2.04 0.78 0.34 0.14 0.05 0.18
2020 MC 18.59 32.33 26.06 13.64 6.02 2.08 0.70 0.25 0.17 0.06 0.09
2019 MC 18.83 32.90 25.40 13.82 5.64 2.22 0.72 0.23 0.09 0.03 0.12
2014 MC 19.72 32.99 25.76 13.33 5.39 1.77 0.53 0.24 0.08 0.04 0.13
2011 MC 19.34 32.75 24.09 13.82 6.26 2.34 0.79 0.30 0.13 0.08 0.08

1.5 Save Tables

tbl_dat_all <- cuml_pnum_plot(sim, w_years = years_of_interest,
                              jst_dat = TRUE)

tbl_dat_race <- cuml_pnum_plot(sim, w_years = years_of_interest,
                               attr_var = "race", jst_dat = TRUE)

tbl_dat_region <- cuml_pnum_plot(sim, w_years = years_of_interest,
                                 attr_var = "region", jst_dat = TRUE)

tbl_dat_age.grp <- cuml_pnum_plot(sim, w_years = years_of_interest,
                                  attr_var = "age.grp", jst_dat = TRUE)

saveRDS(list("all" = tbl_dat_all, "race" = tbl_dat_race,
             "region" = tbl_dat_region, "age.grp" = tbl_dat_age.grp),
        "../../Data/EpiModelSims/cuml_pnum_counts.rds")

1.6 Session Info

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS  10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] forcats_0.5.0   stringr_1.4.0   dplyr_1.0.2     purrr_0.3.4    
## [5] readr_1.3.1     tidyr_1.1.2     tibble_3.0.4    ggplot2_3.3.2  
## [9] tidyverse_1.3.0
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-148             fs_1.4.2                 lubridate_1.7.9         
##  [4] doParallel_1.0.15        RColorBrewer_1.1-2       httr_1.4.2              
##  [7] rprojroot_1.3-2          tools_4.0.2              backports_1.1.10        
## [10] R6_2.5.0                 DBI_1.1.0                lazyeval_0.2.2          
## [13] colorspace_1.4-1         withr_2.3.0              tidyselect_1.1.0        
## [16] compiler_4.0.2           ergm.multi_4.0-3965      cli_2.2.0               
## [19] rvest_0.3.6              gt_0.2.2                 xml2_1.3.2              
## [22] network_1.17.0-585       sass_0.2.0               scales_1.1.1            
## [25] checkmate_2.0.0          DEoptimR_1.0-8           robustbase_0.93-6       
## [28] digest_0.6.27            rmarkdown_2.4            networkDynamic_0.10.2   
## [31] pkgconfig_2.0.3          htmltools_0.5.0          egonet_0.0.1.0          
## [34] dbplyr_1.4.4             rlang_0.4.9              readxl_1.3.1            
## [37] rstudioapi_0.11          generics_0.1.0           jsonlite_1.7.1          
## [40] statnet.common_4.4.0-300 magrittr_2.0.1           Matrix_1.2-18           
## [43] Rcpp_1.0.5               munsell_0.5.0            fansi_0.4.1             
## [46] ape_5.4-1                lifecycle_0.2.0          stringi_1.5.3           
## [49] yaml_2.2.1               MASS_7.3-51.6            grid_4.0.2              
## [52] blob_1.2.1               parallel_4.0.2           crayon_1.3.4            
## [55] lattice_0.20-41          haven_2.3.1              splines_4.0.2           
## [58] hms_0.5.3                knitr_1.30               pillar_1.4.7            
## [61] EpiModel_2.0.3           codetools_0.2-16         lpSolve_5.6.15          
## [64] rle_0.9.2-223            srvyr_0.3.10             reprex_0.3.0            
## [67] glue_1.4.2               evaluate_0.14            trust_0.1-8             
## [70] mitools_2.4              modelr_0.1.8             deSolve_1.28            
## [73] vctrs_0.3.6              foreach_1.5.0            cellranger_1.1.0        
## [76] gtable_0.3.0             assertthat_0.2.1         tergmLite_2.2.0         
## [79] tergm_4.0.0-2013         xfun_0.18                broom_0.7.1             
## [82] survey_4.0               ergm_4.0-5824            coda_0.19-4             
## [85] survival_3.2-7           iterators_1.0.12         ellipsis_0.3.1          
## [88] here_0.1