Introduction

In 2015, Flint, Michigan experienced a public water health crisis in which the city’s drinking water became contaminated with lead, exposing thousands of residents to toxic conditions.The crisis began in 2014 when Flint switched it’s water supply from Lake Huron to the Flint River as a means to cut costs, saving estimated 5 million USD over two years. To swiftly switch the supplies, Officials skipped the corrosion control treatment step. With the naturally high chloride levels within Flint river, this corrodes the pipes causing lead to be exposed into the water.

Lead is used across the ageing infrastructure across the United States. Water systems add corrosion control chemicals to prevent pipe corrosion and limit contamination. Although there is no completely safe amount of lead consumption, the regulation limit allowed by the lead and copper rule (LCR) is 15ppb (parts per billion). If more than 10% of residential exceeds this limit, action is required.

Despite these enforced rules, Michigan Department of Environmental Quality (MDEQ) initially reported that Flint’s drinking water met the safety standards. However, Investigators and resident sampling reveled significantly higher lead levels. In some cases lead concentrations were 6 times higher than the 15 ppb limit, in other cases, samples were 25 times higher than the limit. Further investigation showed that the official MDEQ testing procedures underestimated contamination due to flawed sampling methods, including the exclusion of high value samples from their analysis.

Lead is a toxic heavy metal and can be extremely harmful towards human health. It particularly towards children, as exposure can result in irreversible neurological damage which include, learning disabilities, behavioral problems, reduced IQ and slowed growth. In adults, exposure to lead has been linked with kidney damage, hypertension and cardiovascular disease. As lead accumulates within the body overtime, it can lead to long term biological consequences. During the Flint crisis, studies showed that the percentage of children with elevated blood leads nearly double after the water supply switch, showing the direct biological impact of environmental contamination.

Understanding how lead concentration across residential households and conditions is crucial in identifying the different high risk areas and in improving public health responses.This study specifically compares lead concentration measurements collected by Michigan Department of Environmental Quality (MPEQ) and the Virginia Tech (VT) research team. Two sampling programs that have produced different estimated contamination levels, which raises a crucial questions regarding the sampling methodology and data validity.

Lead concentration in drinking water samples, measured in parts per billion (ppb). This variable represents the level of contamination present in each sample and is an indicator of the potential health risk. The greater the value is corresponding to the greater the exposure to lead and the increased danger to residents

Explanatory variables of this dataset includes factors that may influence the lead concentration levels. These includes the sampling location, sampling conditions and the testing period(before or after interventions). The infrastructure of the water system can also influence the lead concentration levels as the presence of lead pipes and the difference in water chemistry may contribute towards differences in lead levels.

Main Research Question:

Is there a significant difference in lead concentration samples gathered in Virginia Tech and Michigan Department of Environmental Quality (MDEQ)

Null Hypothesis = There is no difference in lead concentration between matched MDEQ and Virginia Tech samples (mean paired difference = 0)

Alternative Hypothesis = There is a significant difference in lead concentration between matched MDEQ and Virginia Tech samples (mean paired difference is not 0)

Exploring the Data (Descriptive Statistics)

flint_mdeq %>% summarise(
  n = sum(!is.na(lead2)),
  mean = mean(lead2, na.rm = TRUE),
  median = median(lead2, na.rm = TRUE),
  sd = sd(lead2, na.rm = TRUE),
  se = sd(lead2, na.rm = TRUE) / sqrt(n)
)
## # A tibble: 1 × 5
##       n  mean median    sd    se
##   <int> <dbl>  <dbl> <dbl> <dbl>
## 1    69  5.72      3  8.34  1.00
flint_vt %>% summarize(
  n = sum(!is.na(lead)),
  mean = mean(lead, na.rm = TRUE),
  median = median(lead, na.rm = TRUE),
  sd = sd(lead, na.rm = TRUE),
  se = sd(lead, na.rm = TRUE) / sqrt(n)
)
## # A tibble: 1 × 5
##       n  mean median    sd    se
##   <int> <dbl>  <dbl> <dbl> <dbl>
## 1   271  10.6   3.52  21.6  1.31

The descriptive statistics were used to calculate the summarized lead concentration in MDEQ (specifically the lead2 dataset) and Virginia Tech datasets. The MDEQ dataset contained 69 observation with a mean lead level of 5.72 ppb and a median of 3.00 ppb whereas the Virginia Tech dataset includes 271 observations and shows a higher mean lead level of 10.60 ppb with a median of 3.52 ppb. While the median values are somewhat similar, the difference in mean values suggests that the higher lead measurements occur more frequently in Virginia Tech Data.

in both datasets the mean exceeds the median, indicating right skewed distributions. This amount of skewness is more evidence in Virginia Tech data, which also has a larger standard deviation (21.60 vs 8.34) reflecting the larger variability and more extreme values

#histogram
hist1 <- ggplot(flint_mdeq, aes(x = lead2)) + geom_histogram() +  
  labs(title = "Distribution of Lead levels (MDEQ)", x = "Lead concentration (ppb)", y = "Frequency")
hist2 <- ggplot(flint_vt, aes(x = lead)) + geom_histogram() +
  labs(title = "Distribution of Lead levels (VT)", x = "Lead concentration (ppb)", y = "Frequency")
grid.arrange(hist1, hist2, ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_bin()`).
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

These plots show a strong right skewed and the presence of extreme outliers particularly in the Virginia Tech dataset, suggesting that a transformation may be necessary for further analysis

Both histograms show a strong right skew, with most of the samples clustered at at low lead concentrations and a long tail of higher values. The VT histogram extends to approximately 150 ppb, highlight the presence of extreme outliers. These distribution properties confirms that both dataset violates normality distributed in it’s raw form.

log1 <- ggplot(flint_mdeq, aes( x = log(lead2))) + geom_histogram() +
  labs(title = "Distribution of Log Lead levels (MDEQ)", x = "Lead (ppb)", y = "Frequency")
log2 <-ggplot(flint_vt, aes( x = log(lead))) + geom_histogram()+
  labs(title = "Distribution of Log Lead levels (VT)", x = "Lead (ppb)", y = "Frequency")
grid.arrange(log1, log2, ncol = 2)
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
## Warning: Removed 15 rows containing non-finite outside the scale range
## (`stat_bin()`).
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.

After applying a log transformation, the Virginia Tech distribution becomes more symmetric and roughly resembles a bell curve which suggests that the raw VT data follows a roughly log normal pattern. The MDEQ log histogram remains irregular likely due to a smaller sample size (n = 69) opposed to a different underlying distribution.

box1 <- ggplot(flint_mdeq, aes(y = lead2)) + geom_boxplot() + 
  labs(title = "MDEQ Adjusted Lead Levels", y = "Lead Concentration (ppb)", x = "Sample")
box2 <-ggplot(flint_vt, aes(y = lead)) + geom_boxplot() + 
  labs(title = "Virginia Tech Adjusted Lead Levels", y = "Lead Concentration (ppb)", x = "Sample")
grid.arrange(box1, box2, ncol = 2)
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

The raw boxplot confirms the presence of substantial right skew and extreme outliers in both datasets, particularly within the Virginia Tech data. The log scale boxplot reveals that the interquartile ranges of the datasets are broadly comparable. The VT data shows a slightly higher median. The Log(Lead + 1) transformation is use here opposed to the log() to handle any zero values within the data

box3 <- ggplot(flint_mdeq, aes(y = log(lead2 + 1))) + geom_boxplot() + 
  labs(title = "MDEQ Adjusted (log) Lead Levels", y = "Log of Lead Concentrations", x = "Samples")
box4 <-ggplot(flint_vt, aes(y = log(lead + 1))) + geom_boxplot() + 
  labs(title = "Virginia Tech Adjusted (log) Lead Levels", y = "Log of Lead Concentrations", x = "Samples")
grid.arrange(box3, box4, ncol = 2)
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

The log-scale boxplot allows clearer comparison of the interquartile ranges between the two datasets. Even after transformation, both distributions remain skewed with notable outliers present in the Virginia Tech data which further supporting the decision to use a nonparametric test as the primary analysis.

Statistical Tests (Inferential Statistics)

Because the research question compares MDEQ and Virginia Tech measurements at the same sampling location, a pair analysis is appropriate. The datasets were joined through a sample ID producing a matched pair

combined <- flint_mdeq %>% 
  select(sample, lead2) %>%
  rename(mdeq = lead2) %>%
  inner_join(flint_vt, by = "sample") %>%
  rename(vt = lead)

For a paired t-test to be valid, the differences between the paired and observed data must be approximately normally distributed. The Shapiro-Wilk test was used to the paired differences (MDEQ - VT) to determine this assumption

diff <- combined$mdeq - combined$vt
shapiro.test(diff)
## 
##  Shapiro-Wilk normality test
## 
## data:  diff
## W = 0.6365, p-value = 8.623e-12

The Shapiro-Wilk test on the paired differences produced W = 0.6364, P = 8.623e-12. This rejects the assumption of normality. The normality assumption required for a paired t-test is therefore violated.

The normality assumption was violated and the distribution of paired differences showed a strong right skew, a nonparametric Wilcoxon test was selected as the primary test. The Wilcoxon test does not assume normality therefore appropriate for the skewed paired data

wilcox.test(combined$mdeq, combined$vt, paired = TRUE)
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  combined$mdeq and combined$vt
## V = 2184, p-value = 5.365e-09
## alternative hypothesis: true location shift is not equal to 0

The Wilcoxon test produced V = 2184, p = 5.365e-9, which is far below the significance threshold of a = 0.05, thus rejecting the null hypothesis. This concludes that there is a significant difference in tlead concentration between the matched MDEQ and Virginia Tech samples

Despite normality being violated, a paired t-test was conducted as a robustness check, as it is relatively robust to moderate deviations from normality given the sample size (n = 69 paired observations) thus allowing the comparisons between parametric and nonparametric tests (confirmed this)

t.test(combined$mdeq, combined$vt, paired = TRUE)
## 
##  Paired t-test
## 
## data:  combined$mdeq and combined$vt
## t = 4.7127, df = 68, p-value = 1.253e-05
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  2.740594 6.765899
## sample estimates:
## mean difference 
##        4.753246

The paired t-test produced t(68) = 4.6127, p = 1.253e-05 with a 95% confidence interval for the mean difference of (2.740529 , 6.765899) ppb. The estimated mean paired difference (MDEQ - VT) is 4.75 ppb, which indicates that within this matched subset, MDEQ values that are on average higher than the VT values. This was supported between both tests strengthening the confidence of the conclusion

Discussion

The Wilcoxon (V = 2184, p = 5.365e-9) and the paired t-test (t(68) = 4.713, p = 1.253e-5) both providing evidence to reject the null hypothesis confirming a statistically significant difference in lead concentration between MDEQ and Virginia Tech samples. the mean paired differences from the t-test is 4.753246 ppb (MDEQ - VT), indicating that within the subset of 69 sample locations, MDEQ values are higher on average than the corresponding VT values.

These results are important both biologically and environmentally. The variation in lead concentration across sampling programs may reflect real difference in which households were tested, how the samples were collected and which values were included in the official reports. Lead exposure is particularly harmful to children, with documented damages associated with irreversible neurological damage including reduced IQ, learning disabilities and behavior problems. A proportion of children residing in Flint, Michigan have been documented to have elevated blood lead levels nearly doubled after the water supply switch. This demonstrates the direct biological consequences of under-reporting contamination and any underestimation of lead levels causing serious public health issues.

The consistency between the parametric t-test and the nonparametric Wilcoxon test strengthens the reliability of the conclusion. Despite the assumption of normality being violated, the moderate sample size (69 pairs) and agreement between both tests suggest that the observed difference is robust.

Previous research on the Flint water crisis has reported that Virginia Tech measurements identified higher and more concerning lead levels compared to official MDEQ data, a difference that is linked variations within sampling method and the exclusion of high value samples from the MDEQ reports. Despite this, MDEQ measurements appear higher on average within the paired subset analyzed, the discrepancy is potentially due to the differences within dataset construction which includes the use of adjusted MDEQ values (variable lead2 to remove any residence with filters) and the sample size being only 69, representing only a specific subset of locations, not the full ranges of sites sampled by Virginia Tech. Thus goes to highlight how data processing choices and sampling design can influence conclusions.

Within this, there are several limitations that can affect the interpretation of the analysis. The strong right skew and extreme outliers in both datasets indicate the presence of highly elevated individual measurements which could reflect localized contamination or a local pipe specific conditions opposed to a system wide trend, which could disproportionately influence the mean based comparisons.

The analysis is restricted to a moderate 69 sample locations present in both datasets. This matched subset may not be representative of all affected household in Flint, which limits the ability to create an generalization of these findings. The MDEQ dataset uses an adjusted lead variable (lead2), the precise values is not documented within the dataset, making it difficult to fully interpret what the MDEQ values represent relative to the raw measurements. Finally, this analysis doesn’t account for any potential external variables such as pipe age, standard of water treatment or the seasonal variation in water chemistry which all could have the potential to influence the lead concentration.

Conclusion

Lead concentrations were significantly different between matched MDEQ and Virginia Tech samples as confirmed by both the Wilcoxon test (p = 5.365e-9) and the supporting paired t-test(p = 1.253e-5). While the paired subset subset of 69 matched locations shows MDEQ values to be higher on average (mean difference = 4.753246 ppb, MDEQ - VT), although the full Virginia Tech dataset captures a broader range of contaminated sites with a overall higher mean (10.60ppb vs 5.72ppb), this suggests that MDEQ’s sampling program is underrepresented the most severely affected locations. With both data, together these findings reveals how differences in sampling methodology can produce significantly different statistics of lead contamination with consequences for public health decision making during environmental crisis.

References

Ruckart, P. Z., Ettinger, A. S., Hanna-Attisha, M., Jones, N., Davis, S. I., & Breysse, P. N. (2019). The Flint water crisis: A coordinated public health emergency response and recovery initiative. Journal of Public Health Management and Practice, 25(Suppl 1), S84–S90. https://doi.org/10.1097/PHH.0000000000000871

Langkjær-Bain, R. (2017). The murky tale of Flint’s deceptive water data. Significance, 14(2), 16–21. https://doi.org/10.1111/j.1740-9713.2017.01016.x

Aquasana. (2018). [Image of water contamination]. https://www.aquasana.com/dw/image/v2/BDTV_PRD/on/demandware.static/-/Sites-aquasana-Library/default/dwd1dbbe41/images/blog/2018/07/iStock-500755634.jpg