Introduction

In 2015, Flint, Michigan experienced a public water health crisis in
which the city’s drinking water became contaminated with lead, exposing
thousands of residents to toxic conditions.The crisis began in 2014 when
Flint switched it’s water supply from Lake Huron to the Flint River as a
means to cut costs, saving estimated 5 million USD over two years. To
swiftly switch the supplies, Officials skipped the corrosion control
treatment step. With the naturally high chloride levels within Flint
river, this corrodes the pipes causing lead to be exposed into the
water.
Lead is used across the ageing infrastructure across the United
States. Water systems add corrosion control chemicals to prevent pipe
corrosion and limit contamination. Although there is no completely safe
amount of lead consumption, the regulation limit allowed by the lead and
copper rule (LCR) is 15ppb (parts per billion). If more than 10% of
residential exceeds this limit, action is required.
Despite these enforced rules, Michigan Department of Environmental
Quality (MDEQ) initially reported that Flint’s drinking water met the
safety standards. However, Investigators and resident sampling reveled
significantly higher lead levels. In some cases lead concentrations were
6 times higher than the 15 ppb limit, in other cases, samples were 25
times higher than the limit. Further investigation showed that the
official MDEQ testing procedures underestimated contamination due to
flawed sampling methods, including the exclusion of high value samples
from their analysis.
Lead is a toxic heavy metal and can be extremely harmful towards
human health. It particularly towards children, as exposure can result
in irreversible neurological damage which include, learning
disabilities, behavioral problems, reduced IQ and slowed growth. In
adults, exposure to lead has been linked with kidney damage,
hypertension and cardiovascular disease. As lead accumulates within the
body overtime, it can lead to long term biological consequences. During
the Flint crisis, studies showed that the percentage of children with
elevated blood leads nearly double after the water supply switch,
showing the direct biological impact of environmental contamination.
Understanding how lead concentration across residential households
and conditions is crucial in identifying the different high risk areas
and in improving public health responses.This study specifically
compares lead concentration measurements collected by Michigan
Department of Environmental Quality (MPEQ) and the Virginia Tech (VT)
research team. Two sampling programs that have produced different
estimated contamination levels, which raises a crucial questions
regarding the sampling methodology and data validity.
Lead concentration in drinking water samples, measured in parts per
billion (ppb). This variable represents the level of contamination
present in each sample and is an indicator of the potential health risk.
The greater the value is corresponding to the greater the exposure to
lead and the increased danger to residents
Explanatory variables of this dataset includes factors that may
influence the lead concentration levels. These includes the sampling
location, sampling conditions and the testing period(before or after
interventions). The infrastructure of the water system can also
influence the lead concentration levels as the presence of lead pipes
and the difference in water chemistry may contribute towards differences
in lead levels.
Main Research Question:
Is there a significant difference in lead concentration samples
gathered in Virginia Tech and Michigan Department of Environmental
Quality (MDEQ)
Null Hypothesis = There is no difference in lead concentration
between matched MDEQ and Virginia Tech samples (mean paired difference =
0)
Alternative Hypothesis = There is a significant difference in lead
concentration between matched MDEQ and Virginia Tech samples (mean
paired difference is not 0)
Exploring the Data (Descriptive Statistics)
flint_mdeq %>% summarise(
n = sum(!is.na(lead2)),
mean = mean(lead2, na.rm = TRUE),
median = median(lead2, na.rm = TRUE),
sd = sd(lead2, na.rm = TRUE),
se = sd(lead2, na.rm = TRUE) / sqrt(n)
)
## # A tibble: 1 × 5
## n mean median sd se
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 69 5.72 3 8.34 1.00
flint_vt %>% summarize(
n = sum(!is.na(lead)),
mean = mean(lead, na.rm = TRUE),
median = median(lead, na.rm = TRUE),
sd = sd(lead, na.rm = TRUE),
se = sd(lead, na.rm = TRUE) / sqrt(n)
)
## # A tibble: 1 × 5
## n mean median sd se
## <int> <dbl> <dbl> <dbl> <dbl>
## 1 271 10.6 3.52 21.6 1.31
The descriptive statistics were used to calculate the summarized lead
concentration in MDEQ (specifically the lead2 dataset) and Virginia Tech
datasets. The MDEQ dataset contained 69 observation with a mean lead
level of 5.72 ppb and a median of 3.00 ppb whereas the Virginia Tech
dataset includes 271 observations and shows a higher mean lead level of
10.60 ppb with a median of 3.52 ppb. While the median values are
somewhat similar, the difference in mean values suggests that the higher
lead measurements occur more frequently in Virginia Tech Data.
in both datasets the mean exceeds the median, indicating right skewed
distributions. This amount of skewness is more evidence in Virginia Tech
data, which also has a larger standard deviation (21.60 vs 8.34)
reflecting the larger variability and more extreme values
#histogram
hist1 <- ggplot(flint_mdeq, aes(x = lead2)) + geom_histogram() +
labs(title = "Distribution of Lead levels (MDEQ)", x = "Lead concentration (ppb)", y = "Frequency")
hist2 <- ggplot(flint_vt, aes(x = lead)) + geom_histogram() +
labs(title = "Distribution of Lead levels (VT)", x = "Lead concentration (ppb)", y = "Frequency")
grid.arrange(hist1, hist2, ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_bin()`).
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

These plots show a strong right skewed and the presence of extreme
outliers particularly in the Virginia Tech dataset, suggesting that a
transformation may be necessary for further analysis
Both histograms show a strong right skew, with most of the samples
clustered at at low lead concentrations and a long tail of higher
values. The VT histogram extends to approximately 150 ppb, highlight the
presence of extreme outliers. These distribution properties confirms
that both dataset violates normality distributed in it’s raw form.
log1 <- ggplot(flint_mdeq, aes( x = log(lead2))) + geom_histogram() +
labs(title = "Distribution of Log Lead levels (MDEQ)", x = "Lead (ppb)", y = "Frequency")
log2 <-ggplot(flint_vt, aes( x = log(lead))) + geom_histogram()+
labs(title = "Distribution of Log Lead levels (VT)", x = "Lead (ppb)", y = "Frequency")
grid.arrange(log1, log2, ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## Warning: Removed 15 rows containing non-finite outside the scale range
## (`stat_bin()`).
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

After applying a log transformation, the Virginia Tech distribution
becomes more symmetric and roughly resembles a bell curve which suggests
that the raw VT data follows a roughly log normal pattern. The MDEQ log
histogram remains irregular likely due to a smaller sample size (n = 69)
opposed to a different underlying distribution.
box1 <- ggplot(flint_mdeq, aes(y = lead2)) + geom_boxplot() +
labs(title = "MDEQ Adjusted Lead Levels", y = "Lead Concentration (ppb)", x = "Sample")
box2 <-ggplot(flint_vt, aes(y = lead)) + geom_boxplot() +
labs(title = "Virginia Tech Adjusted Lead Levels", y = "Lead Concentration (ppb)", x = "Sample")
grid.arrange(box1, box2, ncol = 2)
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

The raw boxplot confirms the presence of substantial right skew and
extreme outliers in both datasets, particularly within the Virginia Tech
data. The log scale boxplot reveals that the interquartile ranges of the
datasets are broadly comparable. The VT data shows a slightly higher
median. The Log(Lead + 1) transformation is use here opposed to the
log() to handle any zero values within the data
box3 <- ggplot(flint_mdeq, aes(y = log(lead2 + 1))) + geom_boxplot() +
labs(title = "MDEQ Adjusted (log) Lead Levels", y = "Log of Lead Concentrations", x = "Samples")
box4 <-ggplot(flint_vt, aes(y = log(lead + 1))) + geom_boxplot() +
labs(title = "Virginia Tech Adjusted (log) Lead Levels", y = "Log of Lead Concentrations", x = "Samples")
grid.arrange(box3, box4, ncol = 2)
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

The log-scale boxplot allows clearer comparison of the interquartile
ranges between the two datasets. Even after transformation, both
distributions remain skewed with notable outliers present in the
Virginia Tech data which further supporting the decision to use a
nonparametric test as the primary analysis.
Statistical Tests (Inferential Statistics)
Because the research question compares MDEQ and Virginia Tech
measurements at the same sampling location, a pair analysis is
appropriate. The datasets were joined through a sample ID producing a
matched pair
combined <- flint_mdeq %>%
select(sample, lead2) %>%
rename(mdeq = lead2) %>%
inner_join(flint_vt, by = "sample") %>%
rename(vt = lead)
For a paired t-test to be valid, the differences between the paired
and observed data must be approximately normally distributed. The
Shapiro-Wilk test was used to the paired differences (MDEQ - VT) to
determine this assumption
diff <- combined$mdeq - combined$vt
shapiro.test(diff)
##
## Shapiro-Wilk normality test
##
## data: diff
## W = 0.6365, p-value = 8.623e-12
The Shapiro-Wilk test on the paired differences produced W = 0.6364,
P = 8.623e-12. This rejects the assumption of normality. The normality
assumption required for a paired t-test is therefore violated.
The normality assumption was violated and the distribution of paired
differences showed a strong right skew, a nonparametric Wilcoxon test
was selected as the primary test. The Wilcoxon test does not assume
normality therefore appropriate for the skewed paired data
wilcox.test(combined$mdeq, combined$vt, paired = TRUE)
##
## Wilcoxon signed rank test with continuity correction
##
## data: combined$mdeq and combined$vt
## V = 2184, p-value = 5.365e-09
## alternative hypothesis: true location shift is not equal to 0
The Wilcoxon test produced V = 2184, p = 5.365e-9, which is far below
the significance threshold of a = 0.05, thus rejecting the null
hypothesis. This concludes that there is a significant difference in
tlead concentration between the matched MDEQ and Virginia Tech
samples
Despite normality being violated, a paired t-test was conducted as a
robustness check, as it is relatively robust to moderate deviations from
normality given the sample size (n = 69 paired observations) thus
allowing the comparisons between parametric and nonparametric tests
(confirmed this)
t.test(combined$mdeq, combined$vt, paired = TRUE)
##
## Paired t-test
##
## data: combined$mdeq and combined$vt
## t = 4.7127, df = 68, p-value = 1.253e-05
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 2.740594 6.765899
## sample estimates:
## mean difference
## 4.753246
The paired t-test produced t(68) = 4.6127, p = 1.253e-05 with a 95%
confidence interval for the mean difference of (2.740529 , 6.765899)
ppb. The estimated mean paired difference (MDEQ - VT) is 4.75 ppb, which
indicates that within this matched subset, MDEQ values that are on
average higher than the VT values. This was supported between both tests
strengthening the confidence of the conclusion
Discussion
The Wilcoxon (V = 2184, p = 5.365e-9) and the paired t-test (t(68) =
4.713, p = 1.253e-5) both providing evidence to reject the null
hypothesis confirming a statistically significant difference in lead
concentration between MDEQ and Virginia Tech samples. the mean paired
differences from the t-test is 4.753246 ppb (MDEQ - VT), indicating that
within the subset of 69 sample locations, MDEQ values are higher on
average than the corresponding VT values.
These results are important both biologically and environmentally.
The variation in lead concentration across sampling programs may reflect
real difference in which households were tested, how the samples were
collected and which values were included in the official reports. Lead
exposure is particularly harmful to children, with documented damages
associated with irreversible neurological damage including reduced IQ,
learning disabilities and behavior problems. A proportion of children
residing in Flint, Michigan have been documented to have elevated blood
lead levels nearly doubled after the water supply switch. This
demonstrates the direct biological consequences of under-reporting
contamination and any underestimation of lead levels causing serious
public health issues.
The consistency between the parametric t-test and the nonparametric
Wilcoxon test strengthens the reliability of the conclusion. Despite the
assumption of normality being violated, the moderate sample size (69
pairs) and agreement between both tests suggest that the observed
difference is robust.
Previous research on the Flint water crisis has reported that
Virginia Tech measurements identified higher and more concerning lead
levels compared to official MDEQ data, a difference that is linked
variations within sampling method and the exclusion of high value
samples from the MDEQ reports. Despite this, MDEQ measurements appear
higher on average within the paired subset analyzed, the discrepancy is
potentially due to the differences within dataset construction which
includes the use of adjusted MDEQ values (variable lead2 to remove any
residence with filters) and the sample size being only 69, representing
only a specific subset of locations, not the full ranges of sites
sampled by Virginia Tech. Thus goes to highlight how data processing
choices and sampling design can influence conclusions.
Within this, there are several limitations that can affect the
interpretation of the analysis. The strong right skew and extreme
outliers in both datasets indicate the presence of highly elevated
individual measurements which could reflect localized contamination or a
local pipe specific conditions opposed to a system wide trend, which
could disproportionately influence the mean based comparisons.
The analysis is restricted to a moderate 69 sample locations present
in both datasets. This matched subset may not be representative of all
affected household in Flint, which limits the ability to create an
generalization of these findings. The MDEQ dataset uses an adjusted lead
variable (lead2), the precise values is not documented within the
dataset, making it difficult to fully interpret what the MDEQ values
represent relative to the raw measurements. Finally, this analysis
doesn’t account for any potential external variables such as pipe age,
standard of water treatment or the seasonal variation in water chemistry
which all could have the potential to influence the lead
concentration.
Conclusion
Lead concentrations were significantly different between matched MDEQ
and Virginia Tech samples as confirmed by both the Wilcoxon test (p =
5.365e-9) and the supporting paired t-test(p = 1.253e-5). While the
paired subset subset of 69 matched locations shows MDEQ values to be
higher on average (mean difference = 4.753246 ppb, MDEQ - VT), although
the full Virginia Tech dataset captures a broader range of contaminated
sites with a overall higher mean (10.60ppb vs 5.72ppb), this suggests
that MDEQ’s sampling program is underrepresented the most severely
affected locations. With both data, together these findings reveals how
differences in sampling methodology can produce significantly different
statistics of lead contamination with consequences for public health
decision making during environmental crisis.
References
Ruckart, P. Z., Ettinger, A. S., Hanna-Attisha, M., Jones, N., Davis,
S. I., & Breysse, P. N. (2019). The Flint water crisis: A
coordinated public health emergency response and recovery initiative.
Journal of Public Health Management and Practice, 25(Suppl 1), S84–S90.
https://doi.org/10.1097/PHH.0000000000000871
Langkjær-Bain, R. (2017). The murky tale of Flint’s deceptive water
data. Significance, 14(2), 16–21. https://doi.org/10.1111/j.1740-9713.2017.01016.x
Aquasana. (2018). [Image of water contamination]. https://www.aquasana.com/dw/image/v2/BDTV_PRD/on/demandware.static/-/Sites-aquasana-Library/default/dwd1dbbe41/images/blog/2018/07/iStock-500755634.jpg