Summary Statistics

These are the summary statistics from the Burkhardt paper:

include_graphics("sum_burk.png")

These are my means:

mean(crime$mean)
## [1] 21.03657
sd(crime$mean)
## [1] 8.067214
min(crime$mean)
## [1] 1.91396
max(crime$mean)
## [1] 74.125
mean(crime$mean_oz, na.rm = TRUE)
## [1] 12.55691
sd(crime$mean_oz, na.rm = TRUE)
## [1] 5.575342
min(crime$mean_oz, na.rm = TRUE)
## [1] 0.2885415
max(crime$mean_oz, na.rm = TRUE)
## [1] 85.61695

I divided ozone by 1000, to put it in the same scale as the means in the Burkhardt paper. I do not know why my ozone means are so high (maybe not in parts per million, but parts per thousand? Another possibility is that Medellin has dramatically higher ozone levels than the US, or as a city vs the widespread counties they use in the paper). In any case, for this exploration that will do, with the understanding that I need to find out what the problem is here.

Regressions

I will use mean PM25 since that is the one they use in the paper.

This is their specification, in a Poisson Quasi-Maximum Likelihood:

\[ crime_{ct}^t = \gamma^j_{PM}PM25_{ct} + \gamma^j_{o}ozone_{ct} + X_{ct}\beta^j + \phi_{ct} + \epsilon_{ct}^j \] These are the results of their regressions:

include_graphics("reg_burk.png")

This is my approximation, using a Poisson Quasi-Maximum Likelihood estimation:

NOTE: I omit the coefficients for barrio, month and weekday, for readibility.

Model 1: crime = pm25 + oz

Model 2: crime = pm25 + oz with barrio Fixed Effects

Model 3: crime = pm25 + oz with barrio and year Fixed Effects

Model 4: crime = pm25 + oz with barrio, year and month Fixed Effects

Model 5: crime = pm25 + oz with barrio, year, month and day of the week Fixed Effects

Dependent variable:
crime
(1) (2) (3) (4) (5)
pm25 0.024*** 0.009* 0.014*** 0.023*** 0.015**
(0.005) (0.005) (0.005) (0.006) (0.007)
mean_oz -0.030*** -0.012 -0.018* -0.033*** -0.023**
(0.011) (0.009) (0.010) (0.011) (0.011)
factor(year)2018 0.324*** 0.429*** 0.349***
(0.115) (0.121) (0.129)
factor(year)2019 0.071 0.727** 0.587
(0.289) (0.348) (0.373)
Constant -3.104*** -1.706*** -1.922*** -2.635*** -2.538***
(0.199) (0.190) (0.208) (0.318) (0.364)
Observations 15,798 15,798 15,798 15,798 15,798
Note: p<0.1; p<0.05; p<0.01

Now, given that I don’t really trust my Ozone estimations, I’ll run the same models but without Ozone. Note that these have MUCH HIGHER, sample sizes, since I lose a lot of observations given how many NAs there are in the Ozone observations.

Model 1: crime = pm25

Model 2: crime = pm25 with barrio Fixed Effects

Model 3: crime = pm25 with barrio and year Fixed Effects

Model 4: crime = pm25 with barrio, year and month Fixed Effects

Model 5: crime = pm25 with barrio, year, month and day of the week Fixed Effects

Dependent variable:
crime
(1) (2) (3) (4) (5)
pm25 0.011*** 0.005*** 0.006*** 0.018*** 0.008***
(0.002) (0.002) (0.002) (0.002) (0.002)
factor(year)2018 0.275*** 0.342*** 0.308***
(0.038) (0.041) (0.042)
factor(year)2019 -0.270*** -0.011 -0.073
(0.082) (0.100) (0.103)
Constant -2.474*** -2.145*** -2.369*** -2.837*** -2.613***
(0.049) (0.095) (0.101) (0.123) (0.133)
Observations 85,698 85,698 85,698 85,698 85,698
Note: p<0.1; p<0.05; p<0.01

I couldn’t replicate his model because the amount of FEs crashes the computer.

Burkhardt FEs

Model 1: crime = pm25 with barrio-year-month-dow Fixed Effects Didn’t work

Model 1: crime = pm25 + oxone with barrio-year-month-dow Fixed Effects Didn’t work

model1c <- glm(crime ~ pm25 + factor(barrio) + factor(year) + factor(month) + factor(weekday), data = crime,family = stats::quasipoisson(link = "log"))
model2c <- glm(violence ~ pm25 + factor(barrio) + factor(year) + factor(month) + factor(weekday), data = crime,family = stats::quasipoisson(link = "log"))
model3c <- glm(property ~ pm25 + factor(barrio) + factor(year) + factor(month) + factor(weekday), data = crime,family = stats::quasipoisson(link = "log"))
model4c <- glm(viol_prop ~ pm25 + factor(barrio) + factor(year) + factor(month) + factor(weekday), data = crime,family = stats::quasipoisson(link = "log"))

Dependent Variable Comparison

Full FE Model without ozone different independent variables:

Model 1: crime = pm25 with barrio, year, month and day of the week Fixed Effects

Model 2: property_posession = pm25 with barrio, year, month and day of the week Fixed Effects

Model 3: life_integrity = pm25 with barrio, year, month and day of the week Fixed Effects

Model 4: economic_activity = pm25 with barrio, year, month and day of the week Fixed Effects

Model 5: public_spaceint = pm25 with barrio, year, month and day of the week Fixed Effects

Model 6: urban_prop = pm25 with barrio, year, month and day of the week Fixed Effects

stargazer(model1c, model2c,model3c, model4c, type = 'html', omit = c("barrio", "month", "weekday"))
Dependent variable:
crime violence property viol_prop
(1) (2) (3) (4)
pm25 0.008*** 0.006 0.013*** 0.010***
(0.002) (0.004) (0.003) (0.003)
factor(year)2018 0.308*** 0.181*** 0.374*** 0.295***
(0.042) (0.061) (0.059) (0.045)
factor(year)2019 -0.073 -0.219 -0.017 -0.100
(0.103) (0.143) (0.150) (0.110)
Constant -2.613*** -4.747*** -3.321*** -2.953***
(0.133) (0.304) (0.177) (0.147)
Observations 85,698 85,698 85,698 85,698
Note: p<0.1; p<0.05; p<0.01
model1d <- glm(crime ~ pm25 + mean_oz + factor(barrio) + factor(year) + factor(month) + factor(weekday), data = crime,family = stats::quasipoisson(link = "log"))
model2d <- glm(violence ~ pm25 + mean_oz +  factor(barrio) + factor(year) + factor(month) + factor(weekday), data = crime,family = stats::quasipoisson(link = "log"))
model3d <- glm(property ~ pm25 + mean_oz +  factor(barrio) + factor(year) + factor(month) + factor(weekday), data = crime,family = stats::quasipoisson(link = "log"))
model4d <- glm(viol_prop ~ pm25 + mean_oz +  factor(barrio) + factor(year) + factor(month) + factor(weekday), data = crime,family = stats::quasipoisson(link = "log"))

Same models as above but with ozone:

Dependent variable:
crime violence property viol_prop
(1) (2) (3) (4)
pm25 0.015** 0.016 0.012 0.013*
(0.007) (0.011) (0.008) (0.008)
mean_oz -0.023** -0.018 -0.026* -0.023*
(0.011) (0.017) (0.013) (0.012)
factor(year)2018 0.349*** 0.112 0.387** 0.264*
(0.129) (0.198) (0.154) (0.141)
factor(year)2019 0.587 -0.104 0.974** 0.582
(0.373) (0.681) (0.403) (0.403)
Constant -2.538*** -3.729*** -3.409*** -2.827***
(0.364) (0.567) (0.433) (0.400)
Observations 15,798 15,798 15,798 15,798
Note: p<0.1; p<0.05; p<0.01

Other Measures

Now, I will add new measures to expand the model. I will gradually add mean temperature, mean humidity, mean wind and mean precipitation. Additionally, I will add an indicator for the learning period of the bill, to observe if this confounding factor in any way alters the relationships:

This first group of models includes only total crime:

model1e <- glm(crime ~ pm25  + factor(barrio)  + factor(month) + factor(weekday) 
               + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))

model2e <- glm(crime ~ pm25  + mean_temp + factor(barrio)  + factor(month) + factor(weekday) + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))

model3e <- glm(crime ~ pm25  + mean_prec + factor(barrio)  + factor(month) + factor(weekday) + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))

model4e <- glm(crime ~ pm25 + mean_hum + factor(barrio) + factor(month) + factor(weekday) + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))

model5e <- glm(crime ~ pm25  + mean_wind + factor(barrio) + factor(month) + factor(weekday) + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))

model6e <- glm(crime ~ pm25  + mean_temp + mean_hum + mean_prec + mean_wind + factor(barrio) + factor(month) + factor(weekday) + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))
Dependent variable:
crime
(1) (2) (3) (4) (5) (6)
pm25 0.009*** 0.010*** 0.009*** 0.010*** 0.009*** 0.011***
(0.002) (0.003) (0.002) (0.003) (0.002) (0.003)
mean_temp 0.027** 0.028
(0.012) (0.022)
mean_prec -0.216 3.819
(2.207) (2.674)
mean_hum -0.003 -0.0002
(0.002) (0.005)
mean_wind 0.023 0.014
(0.039) (0.054)
factor(learning)1 -0.306*** -0.303*** -0.296*** -0.341*** -0.298*** -0.347***
(0.042) (0.042) (0.042) (0.042) (0.042) (0.043)
Constant -2.488*** -2.991*** -2.453*** -2.251*** -2.506*** -3.055***
(0.116) (0.272) (0.115) (0.197) (0.144) (0.700)
Observations 85,698 84,046 84,029 81,188 83,702 80,651
Note: p<0.1; p<0.05; p<0.01

This second group includes the same models but with Property Crime:

model1e2 <- glm(property ~ pm25  + factor(barrio)  + factor(month) + factor(weekday) 
               + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))

model2e2 <- glm(property ~ pm25  + mean_temp + factor(barrio)  + factor(month) + factor(weekday) + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))

model3e2 <- glm(property ~ pm25  + mean_prec + factor(barrio)  + factor(month) + factor(weekday) + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))

model4e2 <- glm(property ~ pm25 + mean_hum + factor(barrio) + factor(month) + factor(weekday) + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))

model5e2 <- glm(property ~ pm25  + mean_wind + factor(barrio) + factor(month) + factor(weekday) + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))

model6e2 <- glm(property ~ pm25  + mean_temp + mean_hum + mean_prec + mean_wind + factor(barrio) + factor(month) + factor(weekday) + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))
Dependent variable:
property
(1) (2) (3) (4) (5) (6)
pm25 0.013*** 0.014*** 0.013*** 0.014*** 0.013*** 0.015***
(0.003) (0.003) (0.003) (0.004) (0.003) (0.004)
mean_temp 0.022 0.034
(0.017) (0.031)
mean_prec -2.265 1.684
(3.092) (3.826)
mean_hum -0.001 0.005
(0.003) (0.007)
mean_wind -0.009 0.042
(0.055) (0.076)
factor(learning)1 -0.372*** -0.371*** -0.368*** -0.442*** -0.368*** -0.456***
(0.058) (0.059) (0.059) (0.060) (0.059) (0.061)
Constant -3.133*** -3.519*** -3.078*** -3.079*** -3.066*** -4.271***
(0.150) (0.381) (0.151) (0.276) (0.195) (1.001)
Observations 85,698 84,046 84,029 81,188 83,702 80,651
Note: p<0.1; p<0.05; p<0.01

This second group includes the same models but with Violent Crime:

model1e3 <- glm(violence ~ pm25  + factor(barrio)  + factor(month) + factor(weekday) 
               + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))

model2e3 <- glm(violence ~ pm25  + mean_temp + factor(barrio)  + factor(month) + factor(weekday) + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))

model3e3 <- glm(violence ~ pm25  + mean_prec + factor(barrio)  + factor(month) + factor(weekday) + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))

model4e3 <- glm(violence ~ pm25 + mean_hum + factor(barrio) + factor(month) + factor(weekday) + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))

model5e3 <- glm(violence ~ pm25  + mean_wind + factor(barrio) + factor(month) + factor(weekday) + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))

model6e3 <- glm(violence ~ pm25  + mean_temp + mean_hum + mean_prec + mean_wind + factor(barrio) + factor(month) + factor(weekday) + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))
Dependent variable:
violence
(1) (2) (3) (4) (5) (6)
pm25 0.007* 0.007** 0.006* 0.006* 0.007* 0.006*
(0.004) (0.004) (0.003) (0.004) (0.004) (0.004)
mean_temp 0.013 0.0002
(0.017) (0.030)
mean_prec -0.290 1.927
(3.168) (3.704)
mean_hum -0.003 -0.007
(0.003) (0.006)
mean_wind -0.018 -0.079
(0.055) (0.074)
factor(learning)1 -0.180*** -0.163*** -0.160*** -0.163*** -0.159*** -0.158***
(0.061) (0.060) (0.059) (0.059) (0.060) (0.060)
Constant -4.757*** -4.997*** -4.737*** -4.496*** -4.700*** -4.054***
(0.289) (0.443) (0.282) (0.351) (0.307) (0.974)
Observations 85,698 84,046 84,029 81,188 83,702 80,651
Note: p<0.1; p<0.05; p<0.01

Daily lags of PM25

Here I run the model with a one day lag on the three outcome variables:

model1f <- glm(crime ~ pm25_lag + ozone_lag  + mean_temp + mean_hum + mean_prec + mean_wind + factor(barrio) + factor(month) + factor(weekday) 
               + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))

model2f <- glm(violence ~ pm25_lag + ozone_lag  + mean_temp + mean_hum + mean_prec + mean_wind + factor(barrio)  + factor(month) + factor(weekday) + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))

model3f <- glm(property ~ pm25_lag + ozone_lag + mean_temp + mean_hum + mean_prec + mean_wind + factor(barrio)  + factor(month) + factor(weekday) + factor(learning), data = crime,family = stats::quasipoisson(link = "log"))
Dependent variable:
crime violence property
(1) (2) (3)
pm25_lag 0.014** 0.017 0.009
(0.007) (0.010) (0.008)
ozone_lag -0.019** -0.022 -0.024*
(0.010) (0.016) (0.013)
mean_temp 0.137*** 0.052 0.151**
(0.053) (0.092) (0.064)
mean_hum 0.002 -0.026 0.010
(0.013) (0.021) (0.017)
mean_prec -12.794 -2.176 -21.158*
(9.269) (13.120) (12.842)
mean_wind -0.256 -0.619** 0.024
(0.167) (0.267) (0.205)
factor(learning)1 -0.461*** -0.293 -0.500***
(0.119) (0.186) (0.157)
Constant -18.956 -18.113 -24.413
(1,594.568) (11,057.750) (10,844.160)
Observations 14,709 14,709 14,709
Note: p<0.1; p<0.05; p<0.01