This analysis examines the relationship between childhood lead exposure from leaded gasoline and subsequent criminal behavior in adulthood using US crime rate data. Tetraethyl lead was widely used as a gasoline additive throughout the mid-20th century until its phaseout began in 1976 due to established health concerns. Prior research has demonstrated that childhood lead exposure negatively affects brain development, particularly in regions governing impulse control, decision-making, and aggression, all of which are cognitive and behavioral factors associated with criminal activity. To test whether this link appears at the population level, crime rates through 1994 (representing populations exposed during childhood) are compared with rates from 1995 onward (representing populations who matured after the phaseout began), employing an 18-year lag to align exposure periods with adult criminal behavior.
Let \(\mu_{\text{leaded}}\) represent the mean crime rate for cohorts exposed to leaded gasoline (years before 1995), and \(\mu_{\text{unleaded}}\) represent the mean crime rate for cohorts who grew up after the phaseout began (1995 onward).
Null Hypothesis:
\[H_0: \mu_{\text{leaded}} - \mu_{\text{unleaded}} = 0\]
Alternative Hypothesis:
\[H_a: \mu_{\text{leaded}} - \mu_{\text{unleaded}} > 0\]
The dataset is first reduced to the essential variables: year, total crime count, and population. Records containing missing values are removed so that every remaining year has a complete record. A normalized crime rate per 100,000 population is calculated to account for population changes over time, enabling valid comparisons across years. The data are then categorized into two eras based on the 18-year lag: the “Leaded” era (years before 1995, representing cohorts exposed to leaded gasoline during childhood) and the “Unleaded” era (1995 onward, beginning when those born in 1977, the first full year after the phaseout began, turned 18). Finally, mean crime rates are computed for each era to facilitate comparison between the two groups.
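As a worked instance of the normalization, the 1960 record in the dataset (3,384,200 total crimes, population 179,323,175) gives approximately the following rate; the symbol \(\text{rate}_{1960}\) is introduced here only for illustration:

\[\text{rate}_{1960} = \frac{3{,}384{,}200}{179{,}323{,}175} \times 100{,}000 \approx 1887.2 \text{ crimes per 100,000 population}\]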
library(tidyverse)  # readr, dplyr, and tidyr functions used below
us_crime_rates <- read_csv("us_crime_rates.csv")
## Rows: 60 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (12): year, population, total, violent, property, murder, forcible_rape,...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(us_crime_rates)
## # A tibble: 6 × 12
##    year population   total violent property murder forcible_rape robbery
##   <dbl>      <dbl>   <dbl>   <dbl>    <dbl>  <dbl>         <dbl>   <dbl>
## 1  1960  179323175 3384200  288460  3095700   9110         17190  107840
## 2  1961  182992000 3488000  289390  3198600   8740         17220  106670
## 3  1962  185771000 3752200  301510  3450700   8530         17550  110860
## 4  1963  188483000 4109500  316970  3792500   8640         17650  116470
## 5  1964  191141000 4564600  364220  4200400   9360         21420  130390
## 6  1965  193526000 4739400  387390  4352000   9960         23410  138690
## # ℹ 4 more variables: aggravated_assault <dbl>, burglary <dbl>,
## #   larceny_theft <dbl>, vehicle_theft <dbl>
summary(us_crime_rates)
##       year        population            total             violent
##  Min.   :1960   Min.   :179323175   Min.   : 3384200   Min.   : 288460
##  1st Qu.:1975   1st Qu.:212691000   1st Qu.: 8685625   1st Qu.: 996838
##  Median :1990   Median :248474436   Median :11272114   Median :1278578
##  Mean   :1990   Mean   :252716504   Mean   :10452904   Mean   :1194566
##  3rd Qu.:2004   3rd Qu.:294369397   3rd Qu.:12600326   3rd Qu.:1425626
##  Max.   :2019   Max.   :328239523   Max.   :14872900   Max.   :1932270
##     property            murder      forcible_rape       robbery
##  Min.   : 3095700   Min.   : 8530   Min.   : 17190   Min.   :106670
##  1st Qu.: 7749522   1st Qu.:15266   1st Qu.: 55918   1st Qu.:331625
##  Median :10053484   Median :16980   Median : 87132   Median :415836
##  Mean   : 9256650   Mean   :17263   Mean   : 77384   Mean   :407210
##  3rd Qu.:11216494   3rd Qu.:20200   3rd Qu.: 94385   3rd Qu.:500543
##  Max.   :12961100   Max.   :24700   Max.   :143765   Max.   :687730
##  aggravated_assault    burglary       larceny_theft     vehicle_theft
##  Min.   : 154320    Min.   : 912100   Min.   :1855400   Min.   : 328200
##  1st Qu.: 483518    1st Qu.:1913601   1st Qu.:5195649   1st Qu.: 748819
##  Median : 763033    Median :2204156   Median :6453334   Median :1006000
##  Mean   : 691072    Mean   :2335967   Mean   :5915788   Mean   :1004968
##  3rd Qu.: 869517    3rd Qu.:3071950   3rd Qu.:7138300   3rd Qu.:1236357
##  Max.   :1135610    Max.   :3795200   Max.   :8142200   Max.   :1661700
# Keep only the variables needed for the analysis and drop incomplete records
us_crime_rates <- us_crime_rates |>
  select(year, total, population) |>
  drop_na()
# Normalize to crimes per 100,000 population
us_crime_rates <- us_crime_rates |>
  mutate(crime_rate_normalized = total / population * 100000)
# Cutoff year: 1977 plus the 18-year lag = 1995; earlier years are "Leaded"
us_crime_rates <- us_crime_rates |>
  mutate(era = factor(ifelse(year < 1977 + 18, "Leaded", "Unleaded"),
                      levels = c("Leaded", "Unleaded")))
# Mean normalized crime rate for each era
era_means <- us_crime_rates |>
  group_by(era) |>
  summarize(mean_crime_rate = mean(crime_rate_normalized))
print(era_means)
## # A tibble: 2 × 2
##   era      mean_crime_rate
##   <fct>              <dbl>
## 1 Leaded             4473.
## 2 Unleaded           3711.
library(ggplot2)
ggplot(us_crime_rates, aes(x = year, y = crime_rate_normalized, color = era)) +
  geom_point() +
  # Red line: start of the phaseout; blue line: the same point shifted by the 18-year lag
  geom_vline(xintercept = 1976.5, linetype = "dashed", color = "red") +
  geom_vline(xintercept = 1976.5 + 18, linetype = "dashed", color = "blue") +
  labs(title = "US Crime Rate Before and After Lead Removal",
       x = "Year", y = "Crime Rate per 100,000") +
  theme_minimal()
The red dashed line marks the beginning of the lead phaseout. The blue dashed line, 18 years later, marks the lagged cutoff that separates the two eras and accounts for the delay between childhood exposure and adult behavior.
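A one-way ANOVA is used to determine whether mean crime rates differ significantly between the leaded and unleaded eras. The test compares the variation between the group means with the variation within the groups and assesses whether the observed difference exceeds what would be expected from random variation alone. With only two groups, it is equivalent to a pooled-variance two-sample t-test (F = t²), and the p-value it reports corresponds to a two-sided alternative.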
anova_result <- aov(crime_rate_normalized ~ era, data=us_crime_rates)
summary(anova_result)
##             Df   Sum Sq Mean Sq F value Pr(>F)
## era          1  8475608 8475608    6.31 0.0148 *
## Residuals   58 77909916 1343274
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
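The figures in the ANOVA table can be re-derived from its sums of squares, which also makes the link to the one-sided alternative hypothesis explicit. The following is a minimal sketch in base R, assuming only the rounded values printed in the table; the variable names are illustrative and not part of the original analysis.

```r
# Values copied from the ANOVA table (rounded as printed)
ss_era <- 8475608    # between-era sum of squares
ss_res <- 77909916   # residual sum of squares
df_era <- 1
df_res <- 58

# F statistic: ratio of the between-era mean square to the residual mean square
f_stat <- (ss_era / df_era) / (ss_res / df_res)

# Two-sided p-value, as reported by summary() on an aov fit
p_two_sided <- pf(f_stat, df_era, df_res, lower.tail = FALSE)

# With two groups, F = t^2. The leaded mean is the larger one, so t is positive
# and the one-sided p-value for H_a: mu_leaded - mu_unleaded > 0 is the upper tail.
t_stat <- sqrt(f_stat)
p_one_sided <- pt(t_stat, df_res, lower.tail = FALSE)

print(round(c(F = f_stat, p_two_sided = p_two_sided, p_one_sided = p_one_sided), 4))
```

Under these assumptions the F statistic and two-sided p-value should match the table up to rounding, and the one-sided p-value is half the two-sided one. With the full data frame available, the same one-sided test could be run directly with `t.test(crime_rate_normalized ~ era, data = us_crime_rates, alternative = "greater", var.equal = TRUE)`.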
The ANOVA test yielded a p-value of 0.0148, which is statistically significant at the α = 0.05 level, leading us to reject the null hypothesis. Because the observed difference lies in the hypothesized direction (a mean of 4473 per 100,000 in the leaded era versus 3711 in the unleaded era), the one-sided alternative is supported as well. The data support the claim that crime rates were significantly higher in the years when adult offenders would have grown up exposed to leaded gasoline (before 1995) than in the years after (1995 onward). The 18-year lag, based on the typical age for adult criminal charges, accounts for the latency between childhood exposure and adult behavioral outcomes. Notably, this lag is conservative: peak criminal activity occurs in the late teens through mid-twenties, so many offenders in the “unleaded” era were born before the phaseout and still experienced some childhood lead exposure. That a significant difference appears despite this overlap suggests a robust relationship. While correlation does not prove causation, the statistically significant result, combined with the biological plausibility of lead’s effects on brain development and impulse control, provides compelling evidence for lead exposure as a contributing factor to historical crime rate variations. Further research could control for additional confounding variables, such as changes in judicial policy, or adjust the data for the number of cars actually on the road.
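The overlap between the eras can be made concrete with a little arithmetic. The sketch below assumes an illustrative peak-offending age range of 18 to 25 (the text says only "late teens through mid-twenties") and uses the 1976 phaseout start given in the introduction; it lists the birth years of offenders in the first "unleaded" year.

```r
# Assumption: ages 18-25 stand in for "late teens through mid-twenties"
cutoff_year    <- 1995   # first year of the "Unleaded" era
phaseout_start <- 1976   # year the phaseout began
ages           <- 18:25

# Birth year of an offender of each age in the cutoff year
birth_years <- cutoff_year - ages

overlap <- data.frame(
  age_in_1995 = ages,
  birth_year = birth_years,
  born_before_phaseout = birth_years < phaseout_start
)
print(overlap)
```

Within this illustrative range, offenders aged 20 and above in 1995 were born before the phaseout began, which is the overlap the conclusion refers to.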