This document details the current procedure used to calculate the number of partners each node had during the previous year across all of the years of the simulation. The code used to make these calculations is found in two places.
Inside of EpiModel, there are three sources of information used to calculate the cumulative number of partners.
Every relationship that is formed during the Epimodel simulation is stored inside of a matrix which has the uid(universal id) of both the nodes, the relationship type, and the start and end time of each relationship.
At the end of each year in the Epimodel simulation, the cumulative number of partners over the past year for each node is calculated. For each nodeid / partner type that is listed in the matrix above, the total number of partnerships with the node of the specified type is calculated. This is stored inside of dat\(temp\)cuml_pnums.
The second source of information is the expanded attribute list. The usual attribute list contains information on all nodes that are currently inside of the network population. The attribute information on the individuals who exited the population is stored in a second list. The expanded attribute list is the combination of these two lists which gives the attribute information on all nodes that were ever in the network population.
The last source of information is a dataframe that has the total number of individuals in each age/race/region group at each time step.
To calculate the cumulative number of partners during a given year, the following steps are completed:
The dataset with the cumulative number of partners (cuml_pnum) of the given year is selected. However at this point, all that is known is the uids (not the attributes of interest).
Next the three attributes (age, race, and region) are attached to these uids.
After this, for each attribute separately, each row in cuml_pnum dataset is sorted by attribute of the node, partnership type, and cumulative number of partners. The total number of observations in each attribute type, partnership type, and cumulative number of partners is tabulated.
The above tabulation will not include zero counts because only ties are counted (not non-ties). To calculate the number of zeros, another calculation is made. Here, we compare the total number of individuals with a cumulative number of partners greater than zero to the overall total (for a given attribute subgroup). The difference between these two values is the number of individuals with a cumulative number of partners of zero.
When the attribute is age group, a back calculation is done to find the age of each node during the year of interest (rather than their age the year they left the network or the simulation ended).
suppressPackageStartupMessages(library(tidyverse))
options(dplyr.summarise.inform = FALSE)
cuml_pnum_plot <- function(dat, attr_var = NULL,
exl_zeros = FALSE,
w_years = NULL,
jst_dat = FALSE){
plist <- dat$temp[[1]]$cuml_pnums
plist$uid <- plist$id
## Calculate Year
plist$year <- round((plist$at - dat$control$start) /
52 + dat$control$year_start)
if (!is.null(w_years)) {
plist <- plist %>% filter(year %in% w_years)
}
combined <- plist %>%
mutate(mc = as.integer(ptype < 3),
mc_tot = mc * tot) %>%
group_by(uid, at, year) %>%
summarise(All = pmin(10, sum(tot)),
MC = pmin(10, sum(mc_tot))) %>%
pivot_longer(cols = c(All, MC), names_to = "ptype",
values_to = "tot") %>%
mutate(id = uid)
combined$ptype <- dplyr::recode(combined$ptype, "All" = 4,
"MC" = 5)
plist <- bind_rows(plist, combined)
all_attr_df <- rbind(
cbind(do.call(data.frame, dat$attr[[1]]),
"rem_at" = dat$control$nsteps) ,
dat$temp[[1]]$rem_atrs
)
edge_w_attr <- left_join(plist, all_attr_df, by = "uid")
## Fix age so it is age at given year (not at the end of the simulation)
edge_w_attr$age <- edge_w_attr$age +
(edge_w_attr$at - edge_w_attr$rem_at) / 52 #Time between measure (at)
# and when age saved (rem_at)
age.breaks <- sim$param$netstats$demog$age.breaks
edge_w_attr$age.grp <- cut(edge_w_attr$age, age.breaks,
labels = FALSE, include.lowest = TRUE)
demog_inf <- dat$epi[[1]]$demog_edge_counts %>%
ungroup() %>%
filter(ptype == 0)
demog_inf$year <- round((demog_inf$at - dat$control$start) /
52 + dat$control$year_start)
if (!is.null(w_years)) {
demog_inf <- demog_inf %>% filter(year %in% w_years)
}
if (!is.null(attr_var)) {
edge_w_attr$spl_var <- edge_w_attr[, attr_var, drop = TRUE]
demog_inf$spl_var <- demog_inf[, attr_var, drop = TRUE]
}else{
edge_w_attr$spl_var <- "Combined"
demog_inf$spl_var <- "Combined"
}
max_id <- max(edge_w_attr$id)
all_yrs <- unique(edge_w_attr$year)
edge_w_attr <- edge_w_attr %>% select(year, spl_var, at, age, ptype, id, tot) %>%
filter(age < 66)
## Adding zero counts for each network
for (at_idx in 1:length(all_yrs)) {
this_yr <- all_yrs[at_idx]
num_obs <- demog_inf %>% filter(year == this_yr, age < 66) %>%
group_by(spl_var) %>% summarize(nodes = sum(count)) %>%
ungroup()
ex_ties <- edge_w_attr %>% filter(year == this_yr, age < 66)
for (net_type in 1:length(unique(ex_ties$ptype))) {
zeros <- ex_ties %>% filter(ptype == net_type) %>%
group_by(spl_var) %>% count() %>% ungroup()
num_ad <- left_join(num_obs, zeros, by = "spl_var") %>%
mutate(rows_to_add = pmax(0, nodes - n),
tot = 0, year = this_yr,
at = min(ex_ties$at), age = NA_real_,
ptype = net_type) %>%
tidyr::uncount(weights = rows_to_add, .id = "id") %>%
select(-nodes, -n)
num_ad$id <- max_id + num_ad$id
max_id <- max_id + nrow(num_ad)
edge_w_attr <- rbind(edge_w_attr, num_ad)
}
}
uni_year <- unique(edge_w_attr$year)
edge_w_attr$year <- factor(x = edge_w_attr$year,
levels = sort(uni_year, decreasing = TRUE),
labels = sort(uni_year, decreasing = TRUE))
bin_vals <- 0:max(edge_w_attr$tot)
bin_labs <- c(0:(max(edge_w_attr$tot) - 1),
paste0(max(edge_w_attr$tot), "+"))
counts <- edge_w_attr %>% group_by(ptype, tot, at, spl_var, year) %>%
count() %>% ungroup() %>% group_by(spl_var, year, ptype) %>%
mutate(Percentage = n / sum(n)) %>% ungroup()
all_vals <- counts %>% expand(spl_var, year, ptype, tot)
counts <- left_join(all_vals, counts) %>%
mutate(ptype = c("Main", "Casl", "Inst", "All", "MC")[ptype])
counts$Percentage[is.na(counts$Percentage)] <- 0
if (exl_zeros) {counts <- counts %>% filter(tot > 0)}
if (jst_dat) {
return(counts)
}else{
ggplot(counts, aes(x = tot, y = Percentage,
fill = as.factor(ptype))) +
geom_col(position = "dodge", alpha = 1) +
facet_grid(year ~ spl_var) +
scale_x_continuous(
breaks = bin_vals,
labels = bin_labs
) +
xlab("Cumulative Partners in the Last Year") +
ylab("Percent of Individuals") +
theme(legend.position = "bottom")
}
}
tbl_dat <- cuml_pnum_plot(sim, w_years = years_of_interest, jst_dat = TRUE)
expnd_dat <- tbl_dat %>% select(-n, -at) %>% mutate(Percentage = round(Percentage * 100, 2)) %>% pivot_wider(names_from = tot, values_from = Percentage)
expnd_dat %>%
select(-spl_var) %>%
group_by("Network" = ptype) %>%
gt::gt(rowname_col = "year") %>%
gt::tab_spanner(
label = "Cumulative Number of Partners",
columns = 3:13
)
| ptype | Cumulative Number of Partners | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ||
| Main | ||||||||||||
| 2030 | Main | 66.13 | 30.44 | 2.84 | 0.50 | 0.07 | 0.02 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 |
| 2025 | Main | 67.90 | 29.01 | 2.50 | 0.47 | 0.09 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 |
| 2020 | Main | 66.13 | 30.48 | 2.77 | 0.53 | 0.08 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 2019 | Main | 66.80 | 29.84 | 2.83 | 0.44 | 0.08 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 2014 | Main | 67.16 | 29.71 | 2.57 | 0.48 | 0.08 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 2011 | Main | 65.72 | 30.63 | 2.94 | 0.60 | 0.11 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Casl | ||||||||||||
| 2030 | Casl | 33.54 | 31.35 | 19.61 | 9.54 | 3.68 | 1.35 | 0.44 | 0.16 | 0.09 | 0.10 | 0.14 |
| 2025 | Casl | 33.10 | 30.99 | 20.57 | 9.36 | 3.64 | 1.35 | 0.41 | 0.26 | 0.11 | 0.05 | 0.16 |
| 2020 | Casl | 31.29 | 32.03 | 21.16 | 9.66 | 3.82 | 1.15 | 0.44 | 0.17 | 0.15 | 0.05 | 0.08 |
| 2019 | Casl | 31.70 | 32.06 | 20.52 | 9.97 | 3.65 | 1.27 | 0.46 | 0.15 | 0.08 | 0.03 | 0.11 |
| 2014 | Casl | 32.91 | 31.86 | 20.56 | 9.54 | 3.39 | 0.99 | 0.40 | 0.13 | 0.08 | 0.03 | 0.13 |
| 2011 | Casl | 33.06 | 30.83 | 19.85 | 9.76 | 4.15 | 1.50 | 0.44 | 0.16 | 0.10 | 0.06 | 0.08 |
| Inst | ||||||||||||
| 2030 | Inst | 43.23 | 17.69 | 6.61 | 3.84 | 2.62 | 2.29 | 1.70 | 1.48 | 1.22 | 1.17 | 18.15 |
| 2025 | Inst | 43.78 | 17.55 | 7.18 | 3.66 | 2.83 | 1.94 | 1.62 | 1.60 | 1.17 | 1.12 | 17.55 |
| 2020 | Inst | 45.48 | 17.05 | 6.31 | 3.74 | 2.75 | 2.16 | 1.89 | 1.57 | 1.01 | 1.05 | 16.98 |
| 2019 | Inst | 45.51 | 17.05 | 6.43 | 3.45 | 2.87 | 2.20 | 1.65 | 1.39 | 1.30 | 1.11 | 17.04 |
| 2014 | Inst | 45.63 | 17.19 | 5.97 | 3.33 | 2.74 | 2.42 | 1.98 | 1.51 | 1.21 | 1.00 | 17.02 |
| 2011 | Inst | 44.96 | 16.78 | 6.89 | 3.56 | 2.91 | 2.31 | 1.89 | 1.39 | 1.12 | 1.16 | 17.03 |
| All | ||||||||||||
| 2030 | All | 6.46 | 19.40 | 18.70 | 13.04 | 8.04 | 5.11 | 3.17 | 2.44 | 1.67 | 1.62 | 20.36 |
| 2025 | All | 7.07 | 18.85 | 18.93 | 13.43 | 8.10 | 4.92 | 3.30 | 2.33 | 1.92 | 1.50 | 19.65 |
| 2020 | All | 6.95 | 19.06 | 18.97 | 13.43 | 8.42 | 4.98 | 3.26 | 2.53 | 1.87 | 1.45 | 19.08 |
| 2019 | All | 7.01 | 19.02 | 19.50 | 13.54 | 7.76 | 4.88 | 3.43 | 2.34 | 1.78 | 1.40 | 19.36 |
| 2014 | All | 7.31 | 19.12 | 19.58 | 13.08 | 8.13 | 4.67 | 3.26 | 2.39 | 1.85 | 1.54 | 19.06 |
| 2011 | All | 7.08 | 18.65 | 18.87 | 13.01 | 8.63 | 5.22 | 3.55 | 2.44 | 1.91 | 1.39 | 19.24 |
| MC | ||||||||||||
| 2030 | MC | 19.39 | 33.86 | 24.39 | 13.09 | 5.72 | 2.18 | 0.70 | 0.29 | 0.12 | 0.11 | 0.15 |
| 2025 | MC | 20.45 | 32.27 | 25.05 | 13.14 | 5.57 | 2.04 | 0.78 | 0.34 | 0.14 | 0.05 | 0.18 |
| 2020 | MC | 18.59 | 32.33 | 26.06 | 13.64 | 6.02 | 2.08 | 0.70 | 0.25 | 0.17 | 0.06 | 0.09 |
| 2019 | MC | 18.83 | 32.90 | 25.40 | 13.82 | 5.64 | 2.22 | 0.72 | 0.23 | 0.09 | 0.03 | 0.12 |
| 2014 | MC | 19.72 | 32.99 | 25.76 | 13.33 | 5.39 | 1.77 | 0.53 | 0.24 | 0.08 | 0.04 | 0.13 |
| 2011 | MC | 19.34 | 32.75 | 24.09 | 13.82 | 6.26 | 2.34 | 0.79 | 0.30 | 0.13 | 0.08 | 0.08 |
tbl_dat_all <- cuml_pnum_plot(sim, w_years = years_of_interest,
jst_dat = TRUE)
tbl_dat_race <- cuml_pnum_plot(sim, w_years = years_of_interest,
attr_var = "race", jst_dat = TRUE)
tbl_dat_region <- cuml_pnum_plot(sim, w_years = years_of_interest,
attr_var = "region", jst_dat = TRUE)
tbl_dat_age.grp <- cuml_pnum_plot(sim, w_years = years_of_interest,
attr_var = "age.grp", jst_dat = TRUE)
saveRDS(list("all" = tbl_dat_all, "race" = tbl_dat_race,
"region" = tbl_dat_region, "age.grp" = tbl_dat_age.grp),
"../../Data/EpiModelSims/cuml_pnum_counts.rds")
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] forcats_0.5.0 stringr_1.4.0 dplyr_1.0.2 purrr_0.3.4
## [5] readr_1.3.1 tidyr_1.1.2 tibble_3.0.4 ggplot2_3.3.2
## [9] tidyverse_1.3.0
##
## loaded via a namespace (and not attached):
## [1] nlme_3.1-148 fs_1.4.2 lubridate_1.7.9
## [4] doParallel_1.0.15 RColorBrewer_1.1-2 httr_1.4.2
## [7] rprojroot_1.3-2 tools_4.0.2 backports_1.1.10
## [10] R6_2.5.0 DBI_1.1.0 lazyeval_0.2.2
## [13] colorspace_1.4-1 withr_2.3.0 tidyselect_1.1.0
## [16] compiler_4.0.2 ergm.multi_4.0-3965 cli_2.2.0
## [19] rvest_0.3.6 gt_0.2.2 xml2_1.3.2
## [22] network_1.17.0-585 sass_0.2.0 scales_1.1.1
## [25] checkmate_2.0.0 DEoptimR_1.0-8 robustbase_0.93-6
## [28] digest_0.6.27 rmarkdown_2.4 networkDynamic_0.10.2
## [31] pkgconfig_2.0.3 htmltools_0.5.0 egonet_0.0.1.0
## [34] dbplyr_1.4.4 rlang_0.4.9 readxl_1.3.1
## [37] rstudioapi_0.11 generics_0.1.0 jsonlite_1.7.1
## [40] statnet.common_4.4.0-300 magrittr_2.0.1 Matrix_1.2-18
## [43] Rcpp_1.0.5 munsell_0.5.0 fansi_0.4.1
## [46] ape_5.4-1 lifecycle_0.2.0 stringi_1.5.3
## [49] yaml_2.2.1 MASS_7.3-51.6 grid_4.0.2
## [52] blob_1.2.1 parallel_4.0.2 crayon_1.3.4
## [55] lattice_0.20-41 haven_2.3.1 splines_4.0.2
## [58] hms_0.5.3 knitr_1.30 pillar_1.4.7
## [61] EpiModel_2.0.3 codetools_0.2-16 lpSolve_5.6.15
## [64] rle_0.9.2-223 srvyr_0.3.10 reprex_0.3.0
## [67] glue_1.4.2 evaluate_0.14 trust_0.1-8
## [70] mitools_2.4 modelr_0.1.8 deSolve_1.28
## [73] vctrs_0.3.6 foreach_1.5.0 cellranger_1.1.0
## [76] gtable_0.3.0 assertthat_0.2.1 tergmLite_2.2.0
## [79] tergm_4.0.0-2013 xfun_0.18 broom_0.7.1
## [82] survey_4.0 ergm_4.0-5824 coda_0.19-4
## [85] survival_3.2-7 iterators_1.0.12 ellipsis_0.3.1
## [88] here_0.1