EVEN 2909: Intro to Global Engineering and Resilience
Instructors: Evan Thomas, Taylor Sharpe
For the first part of this lab, you will attempt to answer a simple question: How would the burden of disease be different for the United States if household drinking water quality were similar to the water quality in Boulder Creek? To answer this question, you will need to break it down into several sub-questions, including:
What is the current burden of disease associated with drinking water illnesses in the United States?
How would the burden of disease change based on the quality of water in Boulder Creek?
How many people would be affected by the impacted drinking water quality?
How much water do these people drink on average each day?
How many bacteria would they be ingesting, and how many of them would get sick as a result of ingesting more pathogens?
What would be the total health impact, expressed in disability adjusted life years (DALYs)?
In other words, you will be conducting a quantitative microbial risk assessment (QMRA) to estimate the increased burden of disease associated with reduced drinking-water quality. We will rely heavily on these publications to guide this analysis:
What do we mean by “DALYs”? Watch these videos and make sure you understand the concept in terms of public health. After watching these videos, you should be able to verbally and mathematically explain what DALYs are.
Here is an overview video which explains what a DALY is with a specific example: https://www.youtube.com/watch?v=DPHDM7jSb2o
Here is a wider overview that talks about how DALYs are used to portray the full burden of disease associated with a particular disease, and how they are used to quantify the success of interventions: https://www.youtube.com/watch?v=Exce4gy7aOk
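As a compact reference for the mathematical explanation, the standard decomposition of a DALY can be written as below. The symbols are notation chosen here for illustration, and this is the simple form without age weighting or discounting:

\[
\mathrm{DALY} = \mathrm{YLL} + \mathrm{YLD}, \qquad \mathrm{YLL} = N \times L, \qquad \mathrm{YLD} = I \times DW \times D
\]

Here \(N\) is the number of deaths, \(L\) is the standard remaining life expectancy at the age of death (in years), \(I\) is the number of incident cases, \(DW\) is the disability weight (between 0 for full health and 1 for death), and \(D\) is the average duration of a case (in years).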
To determine the current burden of disease from unsafe water, you’ll want to check out the Global Burden of Disease (GBD) Visualization at the Institute of Health Metrics and Evaluation (IHME): https://vizhub.healthdata.org/gbd-compare/.
Select the third icon down on the far left-hand side (“Risks by Cause”), select level four on the slider, and select “United States of America” for location. Then select “Unsafe water, sanitation and hygiene” as the Category, and ensure “Overview level 1” is selected in the dropdown on the upper right side. Find “Unsafe water” in the rankings. What are the total number of DALYs, the percent of total DALYs, and the DALY rate (number of DALYs per 100,000)?
Now navigate to the arrow diagram showing the risks for each cause and make sure that level four is still selected on the slider and United States is selected for location. What are the total DALYs, the percent of total DALYs, the DALY rate, and the cause ranking for unsafe water in the United States? For ranking, change the Category back to “All Risk Factors” and look at the plot.
Now, do the same thing (determine the total DALYs, the percent of total DALYs, the DALY rate, and the cause ranking) for Sub-Saharan Africa in the most recent year available.
Enteric infections from unsafe water sources for United States in 2015:
Diarrheal diseases from unsafe water sources for Sub-Saharan Africa in 2015:
What was the population of the United States in the most recent year available?
See: http://ghdx.healthdata.org/geography/united-states-america
library(data.table)
population = 327978730
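The population figure is what links a total DALY count to a DALY rate per 100,000. Here is a minimal sketch of that conversion in both directions. The helper names and the example total are hypothetical placeholders, not values from GBD Compare:

```r
# US population, taken from the lab text above
population <- 327978730

# Convert a total DALY count into a rate per 100,000 people
daly_rate_per_100k <- function(total_dalys, population) {
  total_dalys / population * 1e5
}

# Convert a rate per 100,000 people back into a total for the population
total_from_rate <- function(rate_per_100k, population) {
  rate_per_100k * population / 1e5
}

# HYPOTHETICAL placeholder; replace with the value you read from GBD Compare
example_total_dalys <- 10000

example_rate <- daly_rate_per_100k(example_total_dalys, population)
print(example_rate)
print(total_from_rate(example_rate, population))
```

The same scaling lets you compare a simulated result for 100,000 people with national totals.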
How much water do people drink on average per day? Should the amount of water consumed be averaged or simulated? What are the benefits of conducting a simulation versus a single estimate based on average values?
If you’re not familiar with generating random numbers in R, here’s an example to simulate the approximate number of liters that a person could drink each day in a year. The function sample() returns a randomly-ordered version of its input. To see how this works, try running the command sample(c(1,2,3)) a few times in the console.
The function called generate_water() is defined below, and can be called to return a value that is either 1, 2, or 5. Each value is selected randomly, to simulate an individual person consuming that many liters of water in a day. The number provided inside the parentheses when the function is called determines how many values are drawn, for example one per day for a single person or one per person for a single day.
To generate a data table that randomly samples the amount of water 100,000 people would drink in a year, we will use the replicate() function. Think for a moment about why we are choosing the number 100,000 to generate these data tables.
# function to randomly select between 1, 2, and 5 liters of water per person per day
generate_water = function(n=1) {
sample(c(1,2,5),n,replace = TRUE)
}
#Generates one single year of water consumption
generate_water(365)
#Generates a 100,000 x 365 table (rows = people, columns = days)
dt_water = data.table(replicate(365,generate_water(100000)))
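To get a feel for the "average versus simulate" question, the sketch below compares the single overall average with the spread of per-person annual totals. It is only an illustration: it uses 1,000 people instead of 100,000, a plain base R matrix instead of a data.table, and a fixed random seed, none of which come from the lab itself.

```r
set.seed(42)  # fixed seed so the sketch is reproducible (an assumption)

generate_water <- function(n = 1) {
  sample(c(1, 2, 5), n, replace = TRUE)
}

n_people <- 1000  # smaller than the lab's 100,000 to keep this quick
n_days <- 365

# each call to generate_water() becomes one column: rows = people, columns = days
water <- replicate(n_days, generate_water(n_people))
print(dim(water))

# a single average hides the person-to-person variation
overall_mean <- mean(water)       # close to (1 + 2 + 5) / 3
annual_liters <- rowSums(water)   # liters per person per year

print(overall_mean)
print(summary(annual_liters))
```

The average alone gives one number, while the simulated annual totals show how much individuals differ from it.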
How many organisms would people be ingesting on average per liter of water? What is the probability of infection, given some number of microbes ingested? What is the probability of illness, given infection?
Here are two different ways to randomly select the number of organisms present in a liter of water: one samples from a log-normal distribution and the other samples from an empirical (in other words, experimental) distribution based on water quality testing of Boulder Creek. How would someone justify using one approach or the other? How do the two data tables compare?
# generate number of organisms per liter from a model (log-normal distribution)
# note: the `mean` argument is used for both meanlog and sdlog
organisms_lnorm <- function(n = 1, mean = 1) {
rlnorm(n, meanlog = mean, sdlog = mean)
}
# generate number of organisms consumed empirically (based on E. coli water testing)
organisms_empirical <- function(n = 1) {
sample(c(14,75,9,18,14,22,8,14,6,12,26,6,1,2,8,4),n,replace = TRUE)
}
# now create two new data tables with the number of organisms per liter for 100,000 people for both distributions
dt_lnorm <- data.table(replicate(365,organisms_lnorm(100000,1)))
dt_empirical <- data.table(replicate(365,organisms_empirical(100000)))
show(dt_lnorm); show(dt_empirical)
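One way to compare the two approaches is to look at a few summary statistics side by side. The following is a small sketch under stated assumptions: it draws 10,000 values from each distribution instead of building the full tables, and it uses a fixed seed.

```r
set.seed(1)  # fixed seed so the sketch is reproducible (an assumption)

organisms_lnorm <- function(n = 1, mean = 1) {
  rlnorm(n, meanlog = mean, sdlog = mean)
}

organisms_empirical <- function(n = 1) {
  sample(c(14, 75, 9, 18, 14, 22, 8, 14, 6, 12, 26, 6, 1, 2, 8, 4), n, replace = TRUE)
}

n_draws <- 10000  # far fewer draws than the lab's tables

x_lnorm <- organisms_lnorm(n_draws, 1)
x_emp <- organisms_empirical(n_draws)

# one row per distribution, one column per summary statistic
comparison <- rbind(
  lnorm = c(mean = mean(x_lnorm), median = median(x_lnorm), max = max(x_lnorm)),
  empirical = c(mean = mean(x_emp), median = median(x_emp), max = max(x_emp))
)
print(round(comparison, 2))
```

For reference, a log-normal distribution with meanlog = 1 and sdlog = 1 has a theoretical mean of exp(1.5), about 4.5, while the 16 Boulder Creek samples average about 14.9. The log-normal draws are also continuous and unbounded above, whereas the empirical draws can never exceed the largest observed value of 75.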
You now have two data tables that represent the number of organisms 100,000 people would ingest in each liter of water for a year. Going forward, we will use the modeled log-normal distribution numbers rather than the empirical ones.
First, we’ll assume that the number of organisms in the water represents the number of Campylobacter bacteria, which you can read more about here: https://www.who.int/news-room/fact-sheets/detail/campylobacter
Ingesting one or more organisms doesn’t necessarily mean someone will get sick; whether they do is a probabilistic question. Therefore, we need to determine each person’s daily dose of organisms, their probability of infection, and their probability of illness. This is done using a dose-response function called a “Beta Poisson” function, which gives the probability of infection for a given dose. The shape of the function is determined by two parameters, called “alpha” and “n50,” which differ based on the pathogen in question.
Make sure you understand what these values mean (hint: read the Brown/Clasen article at the top of the lab). In the example below, we are looking at the health outcomes of exposure to Campylobacter spp.
# number of microbes ingested per day
dose = dt_water * dt_lnorm
# beta-Poisson dose-response function: probability of infection for a given dose
beta_poisson = function(dose, alpha, n50) {
1 - ( 1 + (dose / n50) * ( 2^(1/alpha) - 1 ) )^(-alpha)
}
# generate probability of daily infection from campylobacter (beta-poisson function)
p_inf_daily <- beta_poisson(dose, 0.145, 896) #alpha = 0.145, n50 = 896
# generate probability of illness given infection (0.3 for campylobacter)
p_illness_inf = 0.3
p_illness <- p_inf_daily * p_illness_inf
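To build intuition for “alpha” and “n50,” it can help to evaluate the dose-response function at a handful of doses. The sketch below reuses the Campylobacter parameters from the lab. The list of doses is arbitrary and chosen only for illustration.

```r
beta_poisson <- function(dose, alpha, n50) {
  1 - (1 + (dose / n50) * (2^(1 / alpha) - 1))^(-alpha)
}

alpha <- 0.145  # Campylobacter value used in the lab
n50 <- 896      # Campylobacter value used in the lab

# arbitrary illustrative doses, including zero and the n50 dose itself
doses <- c(0, 1, 10, 100, n50, 10000)
p_inf <- beta_poisson(doses, alpha, n50)

print(data.frame(dose = doses, p_infection = signif(p_inf, 3)))
```

Two properties are worth checking in the output: a dose of zero gives a probability of zero, and a dose equal to n50 gives a probability of exactly 0.5, which is what the name "n50" refers to in this form of the function.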
You should now have a large matrix with the risk of daily infection due to ingesting different amounts of campylobacter bacteria over the course of one year for 100,000 individuals, and a matrix with the risk of daily illness.
Now we will use the disease burden associated with campylobacteriosis (try saying that five times really fast) to calculate the total number of disability adjusted life years associated with increased exposure to campylobacter bacteria. To do this, we have to make justifiable assumptions about the proportion of the population that is susceptible to infection and the disease burden associated with campylobacteriosis.
# percent of the population susceptible to infection (campylobacter)
susceptible = 1.00
# disease burden per case (campylobacter) - This is from TABLE 2.1
db_case = 4.6e-3
# expected DALYs for each person on each day (same dimensions as p_illness)
DALYs_year = p_illness * db_case * susceptible
# total DALYs per 100,000 people over the simulated year
DALYs_total <- sum(DALYs_year)
show(DALYs_total)
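The daily illness probabilities can also be summarized per person, which helps with the question of how many people would get sick. The sketch below uses a tiny HYPOTHETICAL matrix of daily risks in place of p_illness, and it assumes that days are independent of one another, which is a simplification.

```r
set.seed(7)  # fixed seed so the sketch is reproducible (an assumption)

# HYPOTHETICAL stand-in for p_illness: 5 people x 365 days of small daily risks
p_illness_demo <- matrix(runif(5 * 365, min = 0, max = 0.002), nrow = 5)

# probability of at least one illness episode in the year, per person,
# assuming independent days (a simplifying assumption)
p_annual <- 1 - apply(1 - p_illness_demo, 1, prod)

# expected number of illness episodes per person over the year
expected_cases <- rowSums(p_illness_demo)

print(round(cbind(p_annual, expected_cases), 4))
```

The expected number of episodes is what the DALY sum in the lab code is effectively built on, since it multiplies every daily illness probability by the disease burden per case.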
Recall that Campylobacter is one of four main pathogens associated with water-borne illness. We’re interested in a more complete picture, so we need to repeat this analysis for the other three primary drivers to finally answer our question.
You will need to modify the code above to re-run the simulation with parameters from the other three illnesses.
You should then have estimates of the total DALYs due to infections from Campylobacter, rotavirus, Shiga toxin-producing E. coli (STEC O157), and Cryptosporidium. Make notes of the total DALYs associated with each disease, and keep them for later use.
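One possible way to organize the re-runs is to wrap the steps in a function that takes the pathogen parameters as arguments. This is only a sketch: simulate_dalys() is a hypothetical helper, it uses 1,000 people and base R matrices to stay fast, and it assumes the same beta-Poisson form applies to every pathogen you pass in. Only the Campylobacter values from the lab are filled in; take the parameters for the other three pathogens from the lab's references.

```r
set.seed(123)  # fixed seed so the sketch is reproducible (an assumption)

beta_poisson <- function(dose, alpha, n50) {
  1 - (1 + (dose / n50) * (2^(1 / alpha) - 1))^(-alpha)
}

# Hypothetical helper: runs the whole chain for one pathogen and returns
# simulated DALYs per 100,000 people per year.
simulate_dalys <- function(alpha, n50, p_illness_inf, db_case,
                           susceptible = 1, n_people = 1000, n_days = 365) {
  n <- n_people * n_days
  # liters per person per day, and organisms per liter (log-normal, as in the lab)
  liters <- matrix(sample(c(1, 2, 5), n, replace = TRUE), nrow = n_people)
  organisms <- matrix(rlnorm(n, meanlog = 1, sdlog = 1), nrow = n_people)
  dose <- liters * organisms
  p_illness <- beta_poisson(dose, alpha, n50) * p_illness_inf
  # scale the simulated population up to 100,000 people
  sum(p_illness * db_case * susceptible) * (1e5 / n_people)
}

# Campylobacter parameters taken from the lab code
campy <- simulate_dalys(alpha = 0.145, n50 = 896, p_illness_inf = 0.3, db_case = 4.6e-3)
print(campy)
```

Because the simulated population is small, the result will vary somewhat from run to run without a fixed seed.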
How does the newly calculated disease burden assuming water quality similar to Boulder Creek compare to the disease burden reflected in IHME? How does it compare with other causes of disease burden in the United States (https://vizhub.healthdata.org/gbd-compare/)? How about in Low SDI countries?
In the previous part of this lab you were able to simulate a quantitative microbial risk assessment to estimate the change in disease burden resulting from reduced water quality. In the second part of this lab we are asking a similar question: How does disease burden change with varying levels of air quality?
In the same way that water quality contributes to both premature death and morbidity, air quality has a large impact on global public health. According to the WHO, in 2019, 99% of the world’s population lived in places where air quality levels were unhealthy, and ambient air pollution is estimated to have caused 4.2 million premature deaths that year. https://www.who.int/news-room/fact-sheets/detail/ambient-(outdoor)-air-quality-and-health
For this exercise, you will use the public-facing version of HAPIT, developed by Ajay Pillarisetti, located here: https://hapit.shinyapps.io/HAPIT/. We will examine the public health impact of replacing a Three-Stone Fire with either a Rocket Stove (high-efficiency cookstove) or an LPG stove, which is similar to many gas stoves in the US.
A very large contributor to DALYs associated with poor air quality is the technology used to cook food and boil water - specifically, the type of stove used. Here are some fine particulate matter concentrations from research:
Three-Stone Fire (pre-intervention): \(386 \mu g/m^3\), using wood cooking fuel (Balakrishnan et al., 2013)
Rocket Stove (post-intervention): \(216 \mu g/m^3\), using wood cooking fuel (HAPIT)
LPG Stove (post-intervention): \(54 \mu g/m^3\), using LPG cooking fuel (HAPIT)
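Before opening HAPIT, it can be useful to see how large each reduction is relative to the three-stone fire. The sketch below does that arithmetic with the three concentrations listed above. The variable names are illustrative.

```r
# PM2.5 concentrations from the list above, in micrograms per cubic meter
pm25 <- c(three_stone_fire = 386, rocket_stove = 216, lpg_stove = 54)

baseline <- pm25[["three_stone_fire"]]

# absolute and relative reduction versus the three-stone fire baseline
reduction_abs <- baseline - pm25
reduction_pct <- 100 * reduction_abs / baseline

print(reduction_abs)
print(round(reduction_pct, 1))
```

The rocket stove lowers the concentration by roughly 44% and the LPG stove by roughly 86%. A reduction in concentration does not necessarily translate one-to-one into DALYs averted, which is part of why a tool like HAPIT is used for the health estimate.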
You will use the Household Air Pollution Intervention Tool (HAPIT) to compare the relative impact of different technologies based on a variety of user-determined parameters. In particular, you will attempt to determine the most effective intervention in terms of DALYs averted, and think also about the cost-effectiveness of an intervention. You will use the following inputs:
It’s important to note that one of the reasons DALYs are a useful tool is that they can be used to more directly compare not just health impacts, but the cost-effectiveness of a potential intervention: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4339959/
“In recent years, the most common approach has involved the use of thresholds based on per capita gross domestic product (GDP). Under this approach – which has been promoted by the World Health Organization’s Choosing Interventions that are Cost–Effective (WHO-CHOICE) project – an intervention that, per disability-adjusted life-year (DALY) avoided, costs less than three times the national annual GDP per capita is considered cost–effective, whereas one that costs less than once the national annual GDP per capita is considered highly cost–effective.”
Often, decision-making in a public health or global engineering context will come down to exactly this: based on the number of DALYs averted by a particular intervention, and the COST of that intervention, how should money be spent in order to minimize the burden of morbidity and mortality?
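The GDP-based thresholds quoted above can be turned into a small helper for comparing interventions. This is a sketch only: classify_cost_effectiveness() is a hypothetical function, the label for interventions above three times GDP per capita is an assumption (the quote does not name that category), and every number in the example call is a placeholder.

```r
# Classify an intervention using the GDP-per-capita thresholds quoted above.
classify_cost_effectiveness <- function(total_cost, dalys_averted, gdp_per_capita) {
  cost_per_daly <- total_cost / dalys_averted
  category <- if (cost_per_daly < gdp_per_capita) {
    "highly cost-effective"
  } else if (cost_per_daly < 3 * gdp_per_capita) {
    "cost-effective"
  } else {
    "not cost-effective"  # assumed label; the quote does not name this case
  }
  list(cost_per_daly = cost_per_daly, category = category)
}

# All inputs below are HYPOTHETICAL placeholders
result <- classify_cost_effectiveness(
  total_cost = 50000,     # hypothetical program cost (USD)
  dalys_averted = 100,    # hypothetical DALYs averted, e.g. from HAPIT
  gdp_per_capita = 1000   # hypothetical national GDP per capita (USD)
)
print(result)
```

With these placeholder inputs the cost is 500 per DALY averted, which is below one times GDP per capita and is therefore classified as highly cost-effective.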