In the realm of epidemiology and public health, understanding social mixing patterns is critical for developing effective strategies to control the spread of infectious diseases. Social mixing matrices provide a structured framework for quantifying interactions between different demographic groups within a population, enabling the simulation of disease transmission dynamics. This document focuses on constructing a social mixing matrix for Vietnam, with a specific emphasis on interactions between different age groups. Given the limited availability of detailed data, our efforts concentrate on elucidating the patterns of social mixing between various age cohorts within the Vietnamese population. By delineating these interactions, we aim to provide valuable insights into the dynamics of disease transmission and inform the design of targeted public health interventions in Vietnam.
The study conducted a diary-based survey in North Vietnam (2007) to explore social contact patterns, crucial for infectious disease spread modeling. It utilized generalized estimating equations, accounting for household sampling and demographic weighting towards the Vietnamese population. Key findings include assortative mixing by age and differences in contact intensity within household and community settings compared to European data. The study emphasizes the importance of context-specific data for accurate infectious disease modeling, particularly in developing countries. For a detailed understanding of the methods and results, please refer to the full article here.
This work will use socialmixr package and conmat.
Firstly, load the package socialmixr and required packges
Then load the data from the social contact matrix survey of Horby et. al (2011), available on Zenodo.
The survey object returns a dictionary with contacts and participants datatable
summary(vietnam$participants)
part_id hh_id part_age part_gender part_occupation_detail country hh_size sday_id
Length:865 Length:865 Min. : 0.00 Length:865 Length:865 Vietnam:865 Min. :1.000 Mode:logical
Class :character Class :character 1st Qu.:14.00 Class :character Class :character 1st Qu.:3.000 NA's:865
Mode :character Mode :character Median :29.00 Mode :character Mode :character Median :4.000
Mean :30.52 Mean :3.918
3rd Qu.:45.00 3rd Qu.:5.000
Max. :90.00 Max. :9.000
day month year dayofweek
Mode:logical Mode:logical Min. :2007 Min. :0.000
NA's:865 NA's:865 1st Qu.:2007 1st Qu.:3.000
Median :2007 Median :3.000
Mean :2007 Mean :3.451
3rd Qu.:2007 3rd Qu.:4.000
Max. :2007 Max. :6.000
NA's :6
summary(vietnam$contacts)
part_id cont_id cnt_age_exact cnt_age_est_min cnt_age_est_max cnt_gender cnt_home cnt_work
Length:6675 Length:6675 Mode:logical Min. : 0.00 Min. : 5.0 Length:6675 Mode :logical Mode :logical
Class :character Class :character NA's:6675 1st Qu.: 6.00 1st Qu.: 15.0 Class :character FALSE:2317 FALSE:6009
Mode :character Mode :character Median :26.00 Median : 34.0 Mode :character TRUE :4358 TRUE :666
Mean :25.95 Mean : 37.7
3rd Qu.:35.00 3rd Qu.: 49.0
Max. :65.00 Max. :100.0
cnt_school cnt_transport cnt_leisure cnt_otherplace frequency_multi phys_contact duration_multi
Mode :logical Mode :logical Mode :logical Mode :logical Min. :1.000 Min. :1.000 Min. :1.000
FALSE:5827 FALSE:6435 FALSE:6564 FALSE:6366 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:3.000
TRUE :848 TRUE :240 TRUE :111 TRUE :309 Median :1.000 Median :2.000 Median :5.000
Mean :1.217 Mean :1.678 Mean :3.766
3rd Qu.:1.000 3rd Qu.:2.000 3rd Qu.:5.000
Max. :5.000 Max. :2.000 Max. :5.000
NA's :139 NA's :652 NA's :103
The contact_matrix function in R is
used to construct contact matrices from survey data. Specifically, it
allows customization of age groups, weighting by age, and applying
various filters to refine the analysis. Input is survey data from
Vietnam with age limits of (0,5,15,35,50,70) and weighting by age
enabled (weigh.age = TRUE), taking into account the age distribution of
the population and computes the mean of contacts per day for each age
group. For detailed usage and argument explanations, please refer to the
documentation.
mr <- vncm$matrix
mr
contact.age.group
[0,5) [5,15) [15,35) [35,50) [50,70) 70+
[1,] 1.0081382 1.0345059 2.679943 1.015341 0.4792535 0.06372883
[2,] 0.4333715 3.6475693 1.559493 1.266675 0.3174873 0.17016764
[3,] 0.5748874 0.7985706 3.030877 1.438238 0.7411519 0.19228457
[4,] 0.4100289 1.2210695 2.707548 2.775660 1.2706702 0.46192042
[5,] 0.3516335 0.5560628 2.534982 2.308633 2.5107631 0.79998885
[6,] 0.1091823 0.6959308 1.535690 1.959661 1.8679943 1.20843314
matrix_plot(mr, main = "Contact matrix")
Then we use bootstrapfor 1000 times:
mb <- Reduce("+", lapply(m["matrix", ], function(x) x / ncol(m)))
matrix_plot(mb, main = "Contact matrix using 1000-time bootstrap")
Comparing two matrices:
dm <- mr - mb
matrix_plot(dm, main = "Difference between two contact matrices")
#| warning: false
popM1= read_csv("popM.csv")
Rows: 5229 Columns: 18── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (2): name, age
dbl (16): country_code, 1950, 1955, 1960, 1965, 1970, 1975, 1980, 1985, 1990, 1995, 2000, 2005, 2010, 2015, 2020
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
popF1= read_csv("popF.csv")
Rows: 5229 Columns: 18── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (2): name, age
dbl (16): country_code, 1950, 1955, 1960, 1965, 1970, 1975, 1980, 1985, 1990, 1995, 2000, 2005, 2010, 2015, 2020
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
vnpopM = popM1 %>% filter(name=='Viet Nam')
vnpopF = popF1 %>% filter(name=='Viet Nam')
combine_and_transform <- function(dfM, dfF) {
combined_df <- bind_rows(dfM, dfF) %>%
mutate(lower.age.limit = case_when(
age == "100+" ~ as.integer(100), # Handle the "100+" case explicitly as integer
TRUE ~ as.integer(as.numeric(sub("^(\\d+)-.*", "\\1", age))) # Convert other cases to integer
),
country = "Viet Nam", # Add the country column
year = 2020, # Add the year column
population = `2020`) %>% # Use the "2020" column for population
# Group by the new `lower.age.limit` (now an integer) and sum the populations
group_by(lower.age.limit) %>%
summarise(population = sum(population, na.rm = TRUE),
country = first(country),
year = first(year)) %>%
ungroup() %>%
select(country, lower.age.limit, year, population) # Arrange the columns as desired
return(combined_df)
}
# Assuming `vnpopM` and `vnpopF` are your male and female dataframes
final_dataset <- combine_and_transform(vnpopM, vnpopF)
# Display the transformed dataset
final_dataset$population = final_dataset$population * 1000
polymod_survey_data <- get_polymod_population()
polymod_contact_data <- get_polymod_contact_data()
contact_model <- fit_single_contact_model(
contact_data <- polymod_contact_data,
population <- polymod_survey_data
)
#vnp <-wpp_age("Vietnam", 2020)
vn_2020_pop <- as_conmat_population(
data = final_dataset,
age = lower.age.limit,
population = population
)
synthetic_contact_matrix = predict_contacts(
model = contact_model,
population = vn_2020_pop,
age_breaks = c(0,5,15,35,50,70, Inf)
)
df_wide <- synthetic_contact_matrix %>%
pivot_wider(names_from = age_group_to, values_from = contacts)
# Convert to matrix, excluding the first column which contains the row names
contact_matrix <- as.matrix(df_wide[, -1])
# Set the row names of the matrix to be the first column of df_wide
row.names(contact_matrix) <- df_wide$age_group_from
matrix_plot(contact_matrix, main = "Contact matrix from conmat")