Arrival Module Summary

1.1 EpiModel Data storage

Inside of EpiModel, there are three sources of information used to calculate the cumulative number of partners.

Every relationship that is formed during the Epimodel simulation is stored inside of a matrix which has the uid(universal id) of both the nodes, the relationship type, and the start and end time of each relationship.
At the end of each year in the Epimodel simulation, the cumulative number of partners over the past year for each node is calculated. For each nodeid / partner type that is listed in the matrix above, the total number of partnerships with the node of the specified type is calculated. This is stored inside of dat\(temp\)cuml_pnums.
The second source of information is the expanded attribute list. The usual attribute list contains information on all nodes that are currently inside of the network population. The attribute information on the individuals who exited the population is stored in a second list. The expanded attribute list is the combination of these two lists which gives the attribute information on all nodes that were ever in the network population.
The last source of information is a dataframe that has the total number of individuals in each age/race/region group at each time step.

1.2 Compilation and formatting

To calculate the cumulative number of partners during a given year, the following steps are completed:

The dataset with the cumulative number of partners (cuml_pnum) of the given year is selected. However at this point, all that is known is the uids (not the attributes of interest).
Next the three attributes (age, race, and region) are attached to these uids.
After this, for each attribute separately, each row in cuml_pnum dataset is sorted by attribute of the node, partnership type, and cumulative number of partners. The total number of observations in each attribute type, partnership type, and cumulative number of partners is tabulated.
The above tabulation will not include zero counts because only ties are counted (not non-ties). To calculate the number of zeros, another calculation is made. Here, we compare the total number of individuals with a cumulative number of partners greater than zero to the overall total (for a given attribute subgroup). The difference between these two values is the number of individuals with a cumulative number of partners of zero.
When the attribute is age group, a back calculation is done to find the age of each node during the year of interest (rather than their age the year they left the network or the simulation ended).

1.3 Code for construction

suppressPackageStartupMessages(library(tidyverse))
options(dplyr.summarise.inform = FALSE)
cuml_pnum_plot <- function(dat, attr_var = NULL, 
                           exl_zeros = FALSE, 
                           w_years = NULL,
                           jst_dat = FALSE){
  plist <- dat$temp[[1]]$cuml_pnums
  plist$uid <- plist$id 
  ## Calculate Year
  plist$year <- round((plist$at - dat$control$start) /
                              52 + dat$control$year_start)
  
  if (!is.null(w_years)) {
    plist <- plist %>% filter(year %in% w_years)
  }
  combined <- plist %>%
    mutate(mc = as.integer(ptype < 3), 
           mc_tot = mc * tot) %>%
    group_by(uid, at, year) %>%
    summarise(All = pmin(10, sum(tot)),
              MC = pmin(10, sum(mc_tot))) %>%
    pivot_longer(cols = c(All, MC), names_to = "ptype",
                 values_to = "tot") %>% 
    mutate(id = uid)
  combined$ptype <- dplyr::recode(combined$ptype, "All" = 4, 
                                  "MC" = 5)
  plist <- bind_rows(plist, combined)
  
  all_attr_df <- rbind(
    cbind(do.call(data.frame, dat$attr[[1]]),
          "rem_at" = dat$control$nsteps) ,
    dat$temp[[1]]$rem_atrs
  )
  edge_w_attr <- left_join(plist, all_attr_df, by = "uid")
  ## Fix age so it is age at given year (not at the end of the simulation)
  edge_w_attr$age <- edge_w_attr$age +
    (edge_w_attr$at - edge_w_attr$rem_at) / 52 #Time between measure (at) 
                                               # and when age saved (rem_at)
  age.breaks <- sim$param$netstats$demog$age.breaks
  edge_w_attr$age.grp <- cut(edge_w_attr$age, age.breaks,
                              labels = FALSE, include.lowest = TRUE)
  demog_inf <- dat$epi[[1]]$demog_edge_counts %>%
    ungroup() %>%
    filter(ptype == 0)
  demog_inf$year <- round((demog_inf$at - dat$control$start) / 
                            52 + dat$control$year_start)
  if (!is.null(w_years)) {
    demog_inf <- demog_inf %>% filter(year %in% w_years)
  }
  if (!is.null(attr_var)) {
    edge_w_attr$spl_var <- edge_w_attr[, attr_var, drop = TRUE]
    demog_inf$spl_var <- demog_inf[, attr_var, drop = TRUE] 
  }else{
    edge_w_attr$spl_var <- "Combined"
    demog_inf$spl_var <- "Combined"
  }
  max_id <- max(edge_w_attr$id)
  
  all_yrs <- unique(edge_w_attr$year)
  edge_w_attr <- edge_w_attr %>% select(year, spl_var, at, age, ptype, id, tot) %>%
    filter(age < 66)
  ## Adding zero counts for each network
  for (at_idx in 1:length(all_yrs)) {
    this_yr <- all_yrs[at_idx]
    num_obs <- demog_inf %>% filter(year == this_yr, age < 66) %>% 
      group_by(spl_var) %>% summarize(nodes = sum(count)) %>% 
      ungroup()
    ex_ties <- edge_w_attr %>% filter(year == this_yr, age < 66) 
    for (net_type in 1:length(unique(ex_ties$ptype))) {
      zeros <- ex_ties %>% filter(ptype == net_type) %>% 
        group_by(spl_var) %>% count() %>% ungroup()
      num_ad <- left_join(num_obs, zeros, by = "spl_var") %>%
        mutate(rows_to_add = pmax(0, nodes - n), 
               tot = 0, year = this_yr, 
               at = min(ex_ties$at), age = NA_real_,
               ptype = net_type) %>%
        tidyr::uncount(weights = rows_to_add, .id = "id") %>% 
        select(-nodes, -n)
      num_ad$id <- max_id + num_ad$id
      max_id <- max_id + nrow(num_ad)
      edge_w_attr <- rbind(edge_w_attr, num_ad)
    }
  }
  uni_year <- unique(edge_w_attr$year)
  edge_w_attr$year <- factor(x = edge_w_attr$year,
                       levels = sort(uni_year, decreasing = TRUE),
                       labels = sort(uni_year, decreasing = TRUE))
  bin_vals <-  0:max(edge_w_attr$tot)
  bin_labs <- c(0:(max(edge_w_attr$tot) - 1),
                paste0(max(edge_w_attr$tot), "+"))
  counts <- edge_w_attr %>% group_by(ptype, tot, at, spl_var, year) %>%
    count() %>% ungroup() %>% group_by(spl_var, year, ptype) %>% 
    mutate(Percentage = n / sum(n)) %>% ungroup()
  all_vals <- counts %>% expand(spl_var, year, ptype, tot)
  counts <- left_join(all_vals, counts)  %>% 
    mutate(ptype = c("Main", "Casl", "Inst", "All", "MC")[ptype])
  counts$Percentage[is.na(counts$Percentage)] <- 0
  if (exl_zeros) {counts <- counts %>% filter(tot > 0)}
  if (jst_dat) {
    return(counts)
  }else{
    ggplot(counts, aes(x = tot, y = Percentage,
                       fill = as.factor(ptype))) +
      geom_col(position = "dodge", alpha = 1) + 
      facet_grid(year ~ spl_var) + 
      scale_x_continuous(
        breaks = bin_vals, 
        labels = bin_labs 
      ) +
      xlab("Cumulative Partners in the Last Year") +
      ylab("Percent of Individuals") +
      theme(legend.position = "bottom")
  }
}

1.4 Overall cumulative number of partners

tbl_dat <- cuml_pnum_plot(sim, w_years = years_of_interest, jst_dat = TRUE)

expnd_dat <- tbl_dat %>% select(-n, -at) %>% mutate(Percentage = round(Percentage * 100, 2)) %>% pivot_wider(names_from = tot, values_from = Percentage) 

expnd_dat %>%
  select(-spl_var) %>%
  group_by("Network" = ptype) %>% 
  gt::gt(rowname_col = "year") %>% 
  gt::tab_spanner(
    label = "Cumulative Number of Partners", 
    columns = 3:13
  )

	ptype	Cumulative Number of Partners
	ptype	0	1	2	3	4	5	6	7	8	9	10
Main
2030	Main	66.13	30.44	2.84	0.50	0.07	0.02	0.01	0.00	0.00	0.00	0.00
2025	Main	67.90	29.01	2.50	0.47	0.09	0.01	0.01	0.00	0.00	0.00	0.00
2020	Main	66.13	30.48	2.77	0.53	0.08	0.01	0.00	0.00	0.00	0.00	0.00
2019	Main	66.80	29.84	2.83	0.44	0.08	0.01	0.00	0.00	0.00	0.00	0.00
2014	Main	67.16	29.71	2.57	0.48	0.08	0.01	0.00	0.00	0.00	0.00	0.00
2011	Main	65.72	30.63	2.94	0.60	0.11	0.01	0.00	0.00	0.00	0.00	0.00
Casl
2030	Casl	33.54	31.35	19.61	9.54	3.68	1.35	0.44	0.16	0.09	0.10	0.14
2025	Casl	33.10	30.99	20.57	9.36	3.64	1.35	0.41	0.26	0.11	0.05	0.16
2020	Casl	31.29	32.03	21.16	9.66	3.82	1.15	0.44	0.17	0.15	0.05	0.08
2019	Casl	31.70	32.06	20.52	9.97	3.65	1.27	0.46	0.15	0.08	0.03	0.11
2014	Casl	32.91	31.86	20.56	9.54	3.39	0.99	0.40	0.13	0.08	0.03	0.13
2011	Casl	33.06	30.83	19.85	9.76	4.15	1.50	0.44	0.16	0.10	0.06	0.08
Inst
2030	Inst	43.23	17.69	6.61	3.84	2.62	2.29	1.70	1.48	1.22	1.17	18.15
2025	Inst	43.78	17.55	7.18	3.66	2.83	1.94	1.62	1.60	1.17	1.12	17.55
2020	Inst	45.48	17.05	6.31	3.74	2.75	2.16	1.89	1.57	1.01	1.05	16.98
2019	Inst	45.51	17.05	6.43	3.45	2.87	2.20	1.65	1.39	1.30	1.11	17.04
2014	Inst	45.63	17.19	5.97	3.33	2.74	2.42	1.98	1.51	1.21	1.00	17.02
2011	Inst	44.96	16.78	6.89	3.56	2.91	2.31	1.89	1.39	1.12	1.16	17.03
All
2030	All	6.46	19.40	18.70	13.04	8.04	5.11	3.17	2.44	1.67	1.62	20.36
2025	All	7.07	18.85	18.93	13.43	8.10	4.92	3.30	2.33	1.92	1.50	19.65
2020	All	6.95	19.06	18.97	13.43	8.42	4.98	3.26	2.53	1.87	1.45	19.08
2019	All	7.01	19.02	19.50	13.54	7.76	4.88	3.43	2.34	1.78	1.40	19.36
2014	All	7.31	19.12	19.58	13.08	8.13	4.67	3.26	2.39	1.85	1.54	19.06
2011	All	7.08	18.65	18.87	13.01	8.63	5.22	3.55	2.44	1.91	1.39	19.24
MC
2030	MC	19.39	33.86	24.39	13.09	5.72	2.18	0.70	0.29	0.12	0.11	0.15
2025	MC	20.45	32.27	25.05	13.14	5.57	2.04	0.78	0.34	0.14	0.05	0.18
2020	MC	18.59	32.33	26.06	13.64	6.02	2.08	0.70	0.25	0.17	0.06	0.09
2019	MC	18.83	32.90	25.40	13.82	5.64	2.22	0.72	0.23	0.09	0.03	0.12
2014	MC	19.72	32.99	25.76	13.33	5.39	1.77	0.53	0.24	0.08	0.04	0.13
2011	MC	19.34	32.75	24.09	13.82	6.26	2.34	0.79	0.30	0.13	0.08	0.08

1.5 Save Tables

tbl_dat_all <- cuml_pnum_plot(sim, w_years = years_of_interest,
                              jst_dat = TRUE)

tbl_dat_race <- cuml_pnum_plot(sim, w_years = years_of_interest,
                               attr_var = "race", jst_dat = TRUE)

tbl_dat_region <- cuml_pnum_plot(sim, w_years = years_of_interest,
                                 attr_var = "region", jst_dat = TRUE)

tbl_dat_age.grp <- cuml_pnum_plot(sim, w_years = years_of_interest,
                                  attr_var = "age.grp", jst_dat = TRUE)

saveRDS(list("all" = tbl_dat_all, "race" = tbl_dat_race,
             "region" = tbl_dat_region, "age.grp" = tbl_dat_age.grp),
        "../../Data/EpiModelSims/cuml_pnum_counts.rds")

1.6 Session Info

sessionInfo()

## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS  10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] forcats_0.5.0   stringr_1.4.0   dplyr_1.0.2     purrr_0.3.4    
## [5] readr_1.3.1     tidyr_1.1.2     tibble_3.0.4    ggplot2_3.3.2  
## [9] tidyverse_1.3.0
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-148             fs_1.4.2                 lubridate_1.7.9         
##  [4] doParallel_1.0.15        RColorBrewer_1.1-2       httr_1.4.2              
##  [7] rprojroot_1.3-2          tools_4.0.2              backports_1.1.10        
## [10] R6_2.5.0                 DBI_1.1.0                lazyeval_0.2.2          
## [13] colorspace_1.4-1         withr_2.3.0              tidyselect_1.1.0        
## [16] compiler_4.0.2           ergm.multi_4.0-3965      cli_2.2.0               
## [19] rvest_0.3.6              gt_0.2.2                 xml2_1.3.2              
## [22] network_1.17.0-585       sass_0.2.0               scales_1.1.1            
## [25] checkmate_2.0.0          DEoptimR_1.0-8           robustbase_0.93-6       
## [28] digest_0.6.27            rmarkdown_2.4            networkDynamic_0.10.2   
## [31] pkgconfig_2.0.3          htmltools_0.5.0          egonet_0.0.1.0          
## [34] dbplyr_1.4.4             rlang_0.4.9              readxl_1.3.1            
## [37] rstudioapi_0.11          generics_0.1.0           jsonlite_1.7.1          
## [40] statnet.common_4.4.0-300 magrittr_2.0.1           Matrix_1.2-18           
## [43] Rcpp_1.0.5               munsell_0.5.0            fansi_0.4.1             
## [46] ape_5.4-1                lifecycle_0.2.0          stringi_1.5.3           
## [49] yaml_2.2.1               MASS_7.3-51.6            grid_4.0.2              
## [52] blob_1.2.1               parallel_4.0.2           crayon_1.3.4            
## [55] lattice_0.20-41          haven_2.3.1              splines_4.0.2           
## [58] hms_0.5.3                knitr_1.30               pillar_1.4.7            
## [61] EpiModel_2.0.3           codetools_0.2-16         lpSolve_5.6.15          
## [64] rle_0.9.2-223            srvyr_0.3.10             reprex_0.3.0            
## [67] glue_1.4.2               evaluate_0.14            trust_0.1-8             
## [70] mitools_2.4              modelr_0.1.8             deSolve_1.28            
## [73] vctrs_0.3.6              foreach_1.5.0            cellranger_1.1.0        
## [76] gtable_0.3.0             assertthat_0.2.1         tergmLite_2.2.0         
## [79] tergm_4.0.0-2013         xfun_0.18                broom_0.7.1             
## [82] survey_4.0               ergm_4.0-5824            coda_0.19-4             
## [85] survival_3.2-7           iterators_1.0.12         ellipsis_0.3.1          
## [88] here_0.1