Analysis: What factors contribute to childhood lead poisoning in Milwaukee?

This analysis examined the correlation between childhood lead poisoning and eight factors:

Scatterplots

We compared lead poisoning rates with each factor one by one. Here’s what we found:

Census tracts with a higher share of Black residents have a higher likelihood of childhood lead poisoning:

## 
## Call:
## lm(formula = PERC_POISONED ~ black_pct, data = joined)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.1412 -3.3884 -0.9074  3.2466 14.1934 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.3413     0.5611   5.955 1.17e-08 ***
## black_pct     9.2120     1.0165   9.063  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.965 on 197 degrees of freedom
##   (12 observations deleted due to missingness)
## Multiple R-squared:  0.2943, Adjusted R-squared:  0.2907 
## F-statistic: 82.14 on 1 and 197 DF,  p-value: < 2.2e-16
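
Each comparison in this section is a one-predictor fit of the same form as the one above. As a rough sketch of how such fits could be run in a loop rather than one at a time, the example below uses a small simulated table in place of `joined`. The `toy` data frame, its values and the `comparison` table are all made up for illustration; only the column names `PERC_POISONED`, `black_pct` and `renter_occupied_pct` come from the output in this document.

```r
# Toy stand-in for the `joined` tract table. Every value is simulated, not real data.
set.seed(1)
n <- 50
toy <- data.frame(
  black_pct           = runif(n),
  renter_occupied_pct = runif(n)
)
toy$PERC_POISONED <- 3 + 9 * toy$black_pct + rnorm(n, sd = 2)

predictors <- c("black_pct", "renter_occupied_pct")

# Fit PERC_POISONED ~ <predictor> once per predictor.
fits <- lapply(predictors, function(p) {
  lm(reformulate(p, response = "PERC_POISONED"), data = toy)
})
names(fits) <- predictors

# Collect the slope and adjusted R-squared from each fit into one table.
comparison <- data.frame(
  predictor = predictors,
  slope     = sapply(fits, function(m) unname(coef(m)[2])),
  adj_r2    = sapply(fits, function(m) summary(m)$adj.r.squared),
  row.names = NULL
)
print(comparison)
```

Swapping `toy` for the real table and extending `predictors` would give one row per factor, which makes the fits easier to compare side by side.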

Census tracts with higher rates of renters show a similar correlation:

## 
## Call:
## lm(formula = PERC_POISONED ~ renter_occupied_pct, data = joined)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.249  -3.543  -1.281   1.495  16.383 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -0.07102    1.27144  -0.056    0.956    
## renter_occupied_pct 12.31999    2.02530   6.083 6.03e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.423 on 197 degrees of freedom
##   (12 observations deleted due to missingness)
## Multiple R-squared:  0.1581, Adjusted R-squared:  0.1539 
## F-statistic:    37 on 1 and 197 DF,  p-value: 6.029e-09

As do census tracts with higher rates of children on Medicaid:

## 
## Call:
## lm(formula = PERC_POISONED ~ under_19_medicaid_pct, data = joined)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.193 -3.990 -1.643  3.446 19.468 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             1.5312     0.8734   1.753   0.0811 .  
## under_19_medicaid_pct  11.4613     1.5699   7.300 6.91e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.243 on 197 degrees of freedom
##   (12 observations deleted due to missingness)
## Multiple R-squared:  0.2129, Adjusted R-squared:  0.2089 
## F-statistic:  53.3 on 1 and 197 DF,  p-value: 6.908e-12
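
The three share-based slopes so far are easier to read per 10 percentage points than per full swing from 0 to 1. The sketch below rescales them. It assumes each share is stored as a proportion between 0 and 1, which is an inference from the size of the coefficients and not something the output states.

```r
# Slopes copied from the three summaries above.
slope_per_unit <- c(
  black_pct             = 9.2120,
  renter_occupied_pct   = 12.31999,
  under_19_medicaid_pct = 11.4613
)

# Assumption: each share is a proportion from 0 to 1, so 0.10 is 10 percentage points.
slope_per_10_points <- slope_per_unit * 0.10
print(round(slope_per_10_points, 2))
```

Read this way, a tract whose share of Black residents is 10 points higher has a predicted poisoning rate roughly 0.9 points higher, with about 1.2 points for renters and about 1.1 points for children on Medicaid.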

Census tracts with a higher median household income show the opposite effect — they have lower levels of lead poisoning:

## 
## Call:
## lm(formula = PERC_POISONED ~ median_household_incomeE, data = joined)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.576  -3.909  -1.493   2.998  16.365 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               1.307e+01  9.349e-01  13.986  < 2e-16 ***
## median_household_incomeE -1.286e-04  1.905e-05  -6.751 1.61e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.326 on 197 degrees of freedom
##   (12 observations deleted due to missingness)
## Multiple R-squared:  0.1879, Adjusted R-squared:  0.1838 
## F-statistic: 45.57 on 1 and 197 DF,  p-value: 1.607e-10
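
The income slope looks tiny only because it is expressed per single dollar. Assuming `median_household_incomeE` is in dollars (the output does not state the unit), the fitted line and its slope per $10,000 work out as follows, where the hat denotes the predicted poisoning rate in percent:

```latex
\[
\begin{aligned}
\hat{y} &= 13.07 - 0.0001286 \times \text{income} \\
\Delta\hat{y}\ \text{per } \$10{,}000 &= -0.0001286 \times 10{,}000 = -1.286 \\
\hat{y}(\$30{,}000) &= 13.07 - 3.858 = 9.212 \\
\hat{y}(\$60{,}000) &= 13.07 - 7.716 = 5.354
\end{aligned}
\]
```

So each additional $10,000 in median household income is associated with a predicted poisoning rate about 1.3 percentage points lower.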

So do census tracts with newer residential housing:

## 
## Call:
## lm(formula = PERC_POISONED ~ avg_year_built, data = joined)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.086  -3.317  -0.816   2.474  13.795 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    250.31574   26.56952   9.421   <2e-16 ***
## avg_year_built  -0.12581    0.01376  -9.146   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.948 on 196 degrees of freedom
##   (13 observations deleted due to missingness)
## Multiple R-squared:  0.2991, Adjusted R-squared:  0.2955 
## F-statistic: 83.65 on 1 and 196 DF,  p-value: < 2.2e-16
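
The intercept of about 250 in this fit is the line extrapolated back to a build year of zero, so it has no practical meaning on its own. Evaluating the fitted line at realistic years is more informative. The two years below are hypothetical examples, not values taken from the data:

```latex
\[
\begin{aligned}
\hat{y}(1920) &= 250.31574 - 0.12581 \times 1920 \approx 8.76 \\
\hat{y}(1950) &= 250.31574 - 0.12581 \times 1950 \approx 4.99 \\
\hat{y}(1920) - \hat{y}(1950) &= 0.12581 \times 30 \approx 3.77
\end{aligned}
\]
```

In other words, a tract whose housing is on average 30 years older has a predicted poisoning rate close to 4 percentage points higher under this model.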

Census tracts with a higher share of Hispanic residents, by contrast, show only a weak negative correlation:

## 
## Call:
## lm(formula = PERC_POISONED ~ hisp_pct, data = joined)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.153 -4.817 -1.260  3.467 16.328 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   8.3519     0.5254  15.896  < 2e-16 ***
## hisp_pct     -5.5112     1.7326  -3.181  0.00171 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.764 on 197 degrees of freedom
##   (12 observations deleted due to missingness)
## Multiple R-squared:  0.04885,    Adjusted R-squared:  0.04402 
## F-statistic: 10.12 on 1 and 197 DF,  p-value: 0.001706
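
To put the six fits so far on one scale: when a model has a single predictor, R-squared is the square of Pearson's correlation coefficient, and the sign of the slope gives the direction. The sketch below recovers the correlations from the printed (rounded) output, so the results are approximate.

```r
# R-squared values copied from the six summaries above.
r_squared <- c(
  black_pct                = 0.2943,
  renter_occupied_pct      = 0.1581,
  under_19_medicaid_pct    = 0.2129,
  median_household_incomeE = 0.1879,
  avg_year_built           = 0.2991,
  hisp_pct                 = 0.04885
)

# Sign of each slope in the same order: positive for the first three, negative for the rest.
slope_sign <- c(1, 1, 1, -1, -1, -1)

# With one predictor, Pearson's r is the signed square root of R-squared.
r <- sqrt(r_squared) * slope_sign
print(round(sort(r), 2))
```

The Hispanic share comes out at about -0.22, the weakest of the six, even though its p-value of 0.0017 means the slope is statistically distinguishable from zero.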

But the strongest correlation is with the census tract’s violation rate, which I calculated as the number of individual DNS violations per rental unit:

## 
## Call:
## lm(formula = PERC_POISONED ~ violation_rate, data = joined)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.9516 -2.3384 -0.4347  2.0552  8.6141 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      1.6107     0.3794   4.246 3.36e-05 ***
## violation_rate   5.2439     0.2677  19.588  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.442 on 197 degrees of freedom
##   (12 observations deleted due to missingness)
## Multiple R-squared:  0.6607, Adjusted R-squared:  0.659 
## F-statistic: 383.7 on 1 and 197 DF,  p-value: < 2.2e-16

The output above shows that, under a simple linear regression model, there is a statistically significant correlation between violation rate and childhood lead poisoning. Specifically, each additional violation per rental unit is associated with an increase of about 5.2 percentage points in the childhood lead poisoning rate, and the model has an adjusted R-squared of .659.
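
As a quick check on the adjusted R-squared, it can be recomputed from the plain R-squared, the number of tracts and the number of predictors. The helper `adjusted_r2` below is a name made up for this sketch; the inputs are copied from the violation rate summary.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p.
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Residual df = n - p - 1, so 197 df with one predictor means n = 199 tracts.
n_tracts <- 197 + 1 + 1
adj <- adjusted_r2(r2 = 0.6607, n = n_tracts, p = 1)
print(round(adj, 3))
```

The adjustment penalizes each added predictor, which is why it is the fairer number to use when comparing this model with one that has more factors.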

fit <- lm(PERC_POISONED ~ black_pct + avg_year_built + violation_rate, data = joined)
summary(fit)

## 
## Call:
## lm(formula = PERC_POISONED ~ black_pct + avg_year_built + violation_rate, 
##     data = joined)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.2349 -1.6568 -0.3176  1.0227 11.4312 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    151.95748   19.70229   7.713 6.29e-13 ***
## black_pct        4.82653    0.80299   6.011 9.00e-09 ***
## avg_year_built  -0.07780    0.01016  -7.656 8.83e-13 ***
## violation_rate   3.25985    0.34030   9.579  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.982 on 194 degrees of freedom
##   (13 observations deleted due to missingness)
## Multiple R-squared:  0.7481, Adjusted R-squared:  0.7442 
## F-statistic: 192.1 on 3 and 194 DF,  p-value: < 2.2e-16

The model above is a multiple linear regression that includes the violation rate along with the share of Black residents and the average year that residential housing was built. It shows an even better fit, with an adjusted R-squared of .744, although it is fit on 198 tracts rather than 199 because one more tract is missing a value. Under this model, an increase of one violation per rental unit is associated with a 3.3 percentage point increase in the childhood lead poisoning rate when holding the share of Black residents and the average year built constant.
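
To make "holding constant" concrete, the sketch below plugs two hypothetical tracts into the fitted equation. They are identical except that the second has one more violation per rental unit. The coefficients are the rounded values printed in the summary, the tract values are invented, and `predict_rate` is a helper written for this example only.

```r
# Coefficients copied from the multiple regression summary above.
coefs <- c(
  intercept      = 151.95748,
  black_pct      = 4.82653,
  avg_year_built = -0.07780,
  violation_rate = 3.25985
)

# Predicted poisoning rate (percent) for given tract characteristics.
predict_rate <- function(black_pct, avg_year_built, violation_rate) {
  unname(
    coefs["intercept"] +
      coefs["black_pct"] * black_pct +
      coefs["avg_year_built"] * avg_year_built +
      coefs["violation_rate"] * violation_rate
  )
}

# Two hypothetical tracts that differ only in violation rate.
a <- predict_rate(black_pct = 0.5, avg_year_built = 1930, violation_rate = 1)
b <- predict_rate(black_pct = 0.5, avg_year_built = 1930, violation_rate = 2)
print(round(c(tract_a = a, tract_b = b, difference = b - a), 2))
```

Whatever values are chosen for the share of Black residents and the average year built, the gap between the two predictions stays at the violation rate coefficient, about 3.3 percentage points.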

Findings: Lead poisoning is higher in neighborhoods with more code violations, older homes and more Black residents

2023-05-08

Comparison maps