I use a very basic approach: I compare the MSE of the basic naïve model with the MSE of variants in which one term at a time is squared.
This is the naïve model, with a quasi-Poisson family:
crime = pm25 + mean_temp + mean_hum + mean_prec + mean_wind
## [1] "Basic Model MSE:"
## [1] 26.16971
## [1] "PM Squared MSE:"
## [1] 26.0536
## [1] "Temp Squared MSE:"
## [1] 26.97275
## [1] "Humidity Squared MSE:"
## [1] 27.09407
## [1] "Precipitation Squared MSE:"
## [1] 25.55084
## [1] "Wind Squared MSE:"
## [1] 26.30129
Squaring any single variable moves the MSE by less than 1 in either direction (under 4% of the basic model's 26.17), so I'll stick with the non-quadratic specification.
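A minimal sketch of the kind of comparison described above, using base R `glm()` on simulated data. The `sim` data frame, its coefficients, and the `mse()` helper are invented for illustration; the real `crime` data is not reproduced here, and I am assuming an in-sample MSE on the response scale, which may not be exactly how the numbers above were computed.

```r
# Simulated stand-in data (hypothetical; not the real `crime` data frame).
set.seed(1)
n <- 500
sim <- data.frame(pm25 = runif(n, 5, 40), mean_temp = runif(n, 10, 30))
sim$crime <- rpois(n, exp(0.2 + 0.02 * sim$pm25 + 0.03 * sim$mean_temp))

# In-sample MSE on the response (count) scale for a fitted GLM.
mse <- function(fit) mean((fit$y - fitted(fit))^2)

# Basic model, then the same model with one squared term added.
base_fit  <- glm(crime ~ pm25 + mean_temp, data = sim, family = quasipoisson)
pm_sq_fit <- update(base_fit, . ~ . + I(pm25^2))

mse_base  <- mse(base_fit)
mse_pm_sq <- mse(pm_sq_fit)
print(c(basic = mse_base, pm_squared = mse_pm_sq))
```

Because the GLM minimises deviance rather than squared error, the MSE of the larger model is not guaranteed to be lower, which is consistent with some of the squared-term MSEs above being slightly higher than the basic one.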
Barrio, year and month fixed effects. Clustering errors by year:
basic_pois <- feglm(crime ~ pm25 + mean_temp + mean_hum + mean_prec + mean_wind | year + month + cod_barrio, data = crime, family=quasipoisson)
## NOTES: 15,443 observations removed because of NA values (LHS: 11,172, RHS: 15,443, Fixed-effects: 11,172).
## 0/0/1 fixed-effect (358 observations) removed because of only 0 outcomes.
Barrio, year and month fixed effects. Clustering errors by barrio:
basic_pois2 <- feglm(crime ~ pm25 + mean_temp + mean_hum + mean_prec + mean_wind | cod_barrio + year + month, data = crime, family=quasipoisson)
## NOTES: 15,443 observations removed because of NA values (LHS: 11,172, RHS: 15,443, Fixed-effects: 11,172).
## 1/0/0 fixed-effect (358 observations) removed because of only 0 outcomes.
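The two calls above differ only in the order of the fixed effects, and the results table reports them as clustered by `year` and by `cod_barrio` respectively, so the clustering appears to follow whichever fixed effect is listed first. A minimal sketch, on simulated data (the `sim` data frame and its coefficients are invented for illustration), of asking for the clustering explicitly through the `cluster` argument instead of relying on that ordering:

```r
library(fixest)

# Simulated stand-in panel (hypothetical; not the real `crime` data frame).
set.seed(42)
n <- 2000
sim <- data.frame(
  cod_barrio = sample(1:40, n, replace = TRUE),
  year       = sample(2015:2019, n, replace = TRUE),
  month      = sample(1:12, n, replace = TRUE),
  pm25       = runif(n, 5, 40)
)
sim$crime <- rpois(n, exp(0.1 + 0.02 * sim$pm25))

# Same formula and same fixed-effect order; only the clustering differs.
fit_barrio <- feglm(crime ~ pm25 | year + month + cod_barrio,
                    data = sim, family = quasipoisson, cluster = ~cod_barrio)
fit_year   <- feglm(crime ~ pm25 | year + month + cod_barrio,
                    data = sim, family = quasipoisson, cluster = ~year)

etable(fit_barrio, fit_year)
```

The point estimates are the same in both fits; only the standard errors change, which matches the identical coefficients in the first two columns of the results table.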
Year and barrio-month fixed effects. Clustering errors by year:
basic_pois3 <- feglm(crime ~ pm25 + mean_temp + mean_hum + mean_prec + mean_wind | year + cod_barrio^month, data = crime, family=quasipoisson)
## NOTES: 15,443 observations removed because of NA values (LHS: 11,172, RHS: 15,443, Fixed-effects: 11,172).
## 0/561 fixed-effects (19,366 observations) removed because of only 0 outcomes.
Barrio-month-year fixed effects. Clustering errors by barrio-month-year:
basic_pois4 <- feglm(crime ~ pm25 + mean_temp + mean_hum + mean_prec + mean_wind | cod_barrio^month^year, data = crime, family=quasipoisson)
## NOTES: 15,443 observations removed because of NA values (LHS: 11,172, RHS: 15,443, Fixed-effects: 11,172).
## 772 fixed-effects (21,029 observations) removed because of only 0 outcomes.
Barrio-month-year-weekday fixed effects. Clustering errors by barrio-month-year-weekday:
basic_pois5 <- feglm(crime ~ pm25 + mean_temp + mean_hum + mean_prec + mean_wind | cod_barrio^month^year^weekday, data = crime, family=quasipoisson)
## NOTES: 15,443 observations removed because of NA values (LHS: 11,172, RHS: 15,443, Fixed-effects: 11,172).
## 10,217 fixed-effects (46,318 observations) removed because of only 0 outcomes.
etable(basic_pois, basic_pois2, basic_pois3, basic_pois4, basic_pois5)
##                                     basic_pois        basic_pois2
## Dependent Var.:                          crime              crime
##
## pm25                          0.0252. (0.0036) 0.0252*** (0.0060)
## mean_temp                     0.0902. (0.0102) 0.0902*** (0.0241)
## mean_hum                       0.0054 (0.0044)    0.0054 (0.0079)
## mean_prec                       -1.183 (2.270)     -1.183 (4.332)
## mean_wind                      0.0449 (0.0208)    0.0449 (0.1017)
## Fixed-Effects:                ---------------- ------------------
## year                                       Yes                Yes
## month                                      Yes                Yes
## cod_barrio                                 Yes                Yes
## cod_barrio-month                            No                 No
## cod_barrio-month-year                       No                 No
## cod_barrio-month-year-weekday               No                 No
## _____________________________ ________________ __________________
## S.E.: Clustered                       by: year     by: cod_barrio
## Observations                            59,306             59,306
## Squared Cor.                           0.09670            0.09670
##                                    basic_pois3        basic_pois4
## Dependent Var.:                          crime              crime
##
## pm25                          0.0259. (0.0038) 0.0259*** (0.0063)
## mean_temp                     0.0846. (0.0130)  0.0839** (0.0262)
## mean_hum                       0.0053 (0.0050)    0.0048 (0.0094)
## mean_prec                       -2.400 (2.246)     -2.265 (5.119)
## mean_wind                      0.0513 (0.0155)    0.0509 (0.1137)
## Fixed-Effects:                ---------------- ------------------
## year                                       Yes                 No
## month                                       No                 No
## cod_barrio                                  No                 No
## cod_barrio-month                           Yes                 No
## cod_barrio-month-year                       No                Yes
## cod_barrio-month-year-weekday               No                 No
## _____________________________ ________________ __________________
## S.E.: Clustered                       by: year by: cod.^mon.^year
## Observations                            40,298             38,635
## Squared Cor.                           0.11879            0.11910
##                                           basic_pois5
## Dependent Var.:                                 crime
##
## pm25                                 0.0135* (0.0055)
## mean_temp                             0.0281 (0.0298)
## mean_hum                             -0.0053 (0.0090)
## mean_prec                               5.460 (4.660)
## mean_wind                             0.1324 (0.1051)
## Fixed-Effects:                -----------------------
## year                                               No
## month                                              No
## cod_barrio                                         No
## cod_barrio-month                                   No
## cod_barrio-month-year                              No
## cod_barrio-month-year-weekday                     Yes
## _____________________________ _______________________
## S.E.: Clustered               by: cod.^mon.^year^wee.
## Observations                                   13,346
## Squared Cor.                                  0.23389
viol_pois <- feglm(violence ~ pm25 + mean_temp + mean_hum + mean_prec + mean_wind | year + month + cod_barrio, data = crime, family=quasipoisson)
## NOTES: 15,443 observations removed because of NA values (LHS: 11,172, RHS: 15,443, Fixed-effects: 11,172).
## 0/0/16 fixed-effects (6,514 observations) removed because of only 0 outcomes.
viol_pois2 <- feglm(violence ~ pm25 + mean_temp + mean_hum + mean_prec + mean_wind | cod_barrio + year + month, data = crime, family=quasipoisson)
## NOTES: 15,443 observations removed because of NA values (LHS: 11,172, RHS: 15,443, Fixed-effects: 11,172).
## 16/0/0 fixed-effects (6,514 observations) removed because of only 0 outcomes.
viol_pois3 <- feglm(violence ~ pm25 + mean_temp + mean_hum + mean_prec + mean_wind | year + cod_barrio^month, data = crime, family=quasipoisson)
## NOTES: 15,443 observations removed because of NA values (LHS: 11,172, RHS: 15,443, Fixed-effects: 11,172).
## 0/927 fixed-effects (32,653 observations) removed because of only 0 outcomes.
viol_pois4 <- feglm(violence ~ pm25 + mean_temp + mean_hum + mean_prec + mean_wind | cod_barrio^month^year, data = crime, family=quasipoisson)
## NOTES: 15,443 observations removed because of NA values (LHS: 11,172, RHS: 15,443, Fixed-effects: 11,172).
## 1,178 fixed-effects (34,184 observations) removed because of only 0 outcomes.
viol_pois5 <- feglm(violence ~ pm25 + mean_temp + mean_hum + mean_prec + mean_wind | cod_barrio^month^year^weekday, data = crime, family=quasipoisson)
## NOTES: 15,443 observations removed because of NA values (LHS: 11,172, RHS: 15,443, Fixed-effects: 11,172).
## 11,559 fixed-effects (52,616 observations) removed because of only 0 outcomes.
etable(viol_pois, viol_pois2, viol_pois3, viol_pois4, viol_pois5)
##                                      viol_pois        viol_pois2
## Dependent Var.:                       violence          violence
##
## pm25                           0.0118 (0.0081) 0.0118** (0.0043)
## mean_temp                      0.0444 (0.0453)   0.0444 (0.0314)
## mean_hum                      -0.0059 (0.0042)  -0.0059 (0.0068)
## mean_prec                       -3.972 (3.124)    -3.972 (4.632)
## mean_wind                     -0.0718 (0.1063)  -0.0718 (0.0881)
## Fixed-Effects:                ---------------- -----------------
## year                                       Yes               Yes
## month                                      Yes               Yes
## cod_barrio                                 Yes               Yes
## cod_barrio-month                            No                No
## cod_barrio-month-year                       No                No
## cod_barrio-month-year-weekday               No                No
## _____________________________ ________________ _________________
## S.E.: Clustered                       by: year    by: cod_barrio
## Observations                            53,150            53,150
## Squared Cor.                           0.06381           0.06381
##                                     viol_pois3         viol_pois4
## Dependent Var.:                       violence           violence
##
## pm25                           0.0122 (0.0077)   0.0124* (0.0050)
## mean_temp                      0.0321 (0.0437)    0.0316 (0.0353)
## mean_hum                      -0.0089 (0.0047)   -0.0093 (0.0073)
## mean_prec                       -5.367 (2.925)     -5.190 (4.053)
## mean_wind                     -0.1089 (0.1128)   -0.1085 (0.1043)
## Fixed-Effects:                ---------------- ------------------
## year                                       Yes                 No
## month                                       No                 No
## cod_barrio                                  No                 No
## cod_barrio-month                           Yes                 No
## cod_barrio-month-year                       No                Yes
## cod_barrio-month-year-weekday               No                 No
## _____________________________ ________________ __________________
## S.E.: Clustered                       by: year by: cod.^mon.^year
## Observations                            27,011             25,480
## Squared Cor.                           0.07065            0.07253
##                                            viol_pois5
## Dependent Var.:                              violence
##
## pm25                                  0.0041 (0.0054)
## mean_temp                            -0.0394 (0.0389)
## mean_hum                            -0.0201* (0.0086)
## mean_prec                              0.1830 (4.431)
## mean_wind                            -0.0287 (0.1074)
## Fixed-Effects:                -----------------------
## year                                               No
## month                                              No
## cod_barrio                                         No
## cod_barrio-month                                   No
## cod_barrio-month-year                              No
## cod_barrio-month-year-weekday                     Yes
## _____________________________ _______________________
## S.E.: Clustered               by: cod.^mon.^year^wee.
## Observations                                    7,048
## Squared Cor.                                  0.15367
prop_pois <- feglm(property ~ pm25 + mean_temp + mean_hum + mean_prec + mean_wind | year + month + cod_barrio, data = crime, family=quasipoisson)
## NOTES: 15,443 observations removed because of NA values (LHS: 11,172, RHS: 15,443, Fixed-effects: 11,172).
## 0/0/9 fixed-effects (3,217 observations) removed because of only 0 outcomes.
prop_pois2 <- feglm(property ~ pm25 + mean_temp + mean_hum + mean_prec + mean_wind | cod_barrio + year + month, data = crime, family=quasipoisson)
## NOTES: 15,443 observations removed because of NA values (LHS: 11,172, RHS: 15,443, Fixed-effects: 11,172).
## 9/0/0 fixed-effects (3,217 observations) removed because of only 0 outcomes.
prop_pois3 <- feglm(property ~ pm25 + mean_temp + mean_hum + mean_prec + mean_wind | year + cod_barrio^month, data = crime, family=quasipoisson)
## NOTES: 15,443 observations removed because of NA values (LHS: 11,172, RHS: 15,443, Fixed-effects: 11,172).
## 0/848 fixed-effects (29,254 observations) removed because of only 0 outcomes.
prop_pois4 <- feglm(property ~ pm25 + mean_temp + mean_hum + mean_prec + mean_wind | cod_barrio^month^year, data = crime, family=quasipoisson)
## NOTES: 15,443 observations removed because of NA values (LHS: 11,172, RHS: 15,443, Fixed-effects: 11,172).
## 1,090 fixed-effects (31,010 observations) removed because of only 0 outcomes.
prop_pois5 <- feglm(property ~ pm25 + mean_temp + mean_hum + mean_prec + mean_wind | cod_barrio^month^year^weekday, data = crime, family=quasipoisson)
## NOTES: 15,443 observations removed because of NA values (LHS: 11,172, RHS: 15,443, Fixed-effects: 11,172).
## 11,441 fixed-effects (51,859 observations) removed because of only 0 outcomes.
etable(prop_pois, prop_pois2, prop_pois3, prop_pois4, prop_pois5)
##                                      prop_pois         prop_pois2
## Dependent Var.:                       property           property
##
## pm25                          0.0353* (0.0018) 0.0353*** (0.0097)
## mean_temp                      0.1033 (0.0380)  0.1033** (0.0364)
## mean_hum                       0.0125 (0.0094)    0.0125 (0.0100)
## mean_prec                       -1.976 (2.196)     -1.976 (7.135)
## mean_wind                      0.0853 (0.0481)    0.0853 (0.1305)
## Fixed-Effects:                ---------------- ------------------
## year                                       Yes                Yes
## month                                      Yes                Yes
## cod_barrio                                 Yes                Yes
## cod_barrio-month                            No                 No
## cod_barrio-month-year                       No                 No
## cod_barrio-month-year-weekday               No                 No
## _____________________________ ________________ __________________
## S.E.: Clustered                       by: year     by: cod_barrio
## Observations                            56,447             56,447
## Squared Cor.                           0.06388            0.06388
##                                     prop_pois3         prop_pois4
## Dependent Var.:                       property           property
##
## pm25                          0.0368* (0.0024) 0.0367*** (0.0099)
## mean_temp                      0.1064 (0.0427)  0.1055** (0.0393)
## mean_hum                       0.0135 (0.0111)    0.0129 (0.0150)
## mean_prec                       -2.715 (2.264)     -2.584 (7.226)
## mean_wind                      0.0962 (0.0688)    0.0962 (0.1676)
## Fixed-Effects:                ---------------- ------------------
## year                                       Yes                 No
## month                                       No                 No
## cod_barrio                                  No                 No
## cod_barrio-month                           Yes                 No
## cod_barrio-month-year                       No                Yes
## cod_barrio-month-year-weekday               No                 No
## _____________________________ ________________ __________________
## S.E.: Clustered                       by: year by: cod.^mon.^year
## Observations                            30,410             28,654
## Squared Cor.                           0.09000            0.08990
##                                            prop_pois5
## Dependent Var.:                              property
##
## pm25                                0.0215** (0.0083)
## mean_temp                             0.0524 (0.0436)
## mean_hum                              0.0025 (0.0151)
## mean_prec                               6.780 (7.315)
## mean_wind                             0.1961 (0.1728)
## Fixed-Effects:                -----------------------
## year                                               No
## month                                              No
## cod_barrio                                         No
## cod_barrio-month                                   No
## cod_barrio-month-year                              No
## cod_barrio-month-year-weekday                     Yes
## _____________________________ _______________________
## S.E.: Clustered               by: cod.^mon.^year^wee.
## Observations                                    7,805
## Squared Cor.                                  0.22422
The second lagged specification (lagged_learningperiod, fit on crime2) uses the “training period”.
pdat <- panel(crime, ~cod_barrio+date, duplicate.method = "first")
lagged_pois_all <- feglm(crime ~ l(pm25, 0:7) + mean_temp + mean_hum + mean_prec + mean_wind | cod_barrio^month^year, data = pdat, family=quasipoisson)
## NOTES: 17,877 observations removed because of NA values (LHS: 11,172, RHS: 17,877, Fixed-effects: 11,172).
## 781 fixed-effects (20,354 observations) removed because of only 0 outcomes.
lagged_pois_violent <- feglm(violence ~ l(pm25, 0:7) + mean_temp + mean_hum + mean_prec + mean_wind | cod_barrio^month^year, data = pdat, family=quasipoisson)
## NOTES: 17,877 observations removed because of NA values (LHS: 11,172, RHS: 17,877, Fixed-effects: 11,172).
## 1,185 fixed-effects (32,962 observations) removed because of only 0 outcomes.
lagged_pois_property <- feglm(property ~ l(pm25, 0:7) + mean_temp + mean_hum + mean_prec + mean_wind | cod_barrio^month^year, data = pdat, family=quasipoisson)
## NOTES: 17,877 observations removed because of NA values (LHS: 11,172, RHS: 17,877, Fixed-effects: 11,172).
## 1,100 fixed-effects (29,951 observations) removed because of only 0 outcomes.
pdat2 <- panel(crime2, ~cod_barrio+date, duplicate.method = "first")
lagged_learningperiod <- feglm(crime ~ l(pm25, 0:7) + mean_temp + mean_hum + mean_prec + mean_wind | cod_barrio^month, data = pdat2, family=quasipoisson)
## NOTES: 3,330 observations removed because of NA values (LHS: 1,055, RHS: 3,330, Fixed-effects: 1,055).
## 307 fixed-effects (8,512 observations) removed because of only 0 outcomes.
etable(lagged_pois_all, lagged_pois_violent, lagged_pois_property, lagged_learningperiod)
##                          lagged_pois_all lagged_pois_viol.. lagged_pois_prop..
## Dependent Var.:                    crime           violence           property
##
## pm25                  0.0330*** (0.0063) 0.0208*** (0.0061) 0.0423*** (0.0097)
## l(pm25,1)               -0.0017 (0.0062)   -0.0051 (0.0067)    0.0054 (0.0105)
## l(pm25,2)               -0.0067 (0.0057)   -0.0027 (0.0062)   -0.0220 (0.0136)
## l(pm25,3)              -0.0106* (0.0054)   -0.0061 (0.0060)   -0.0147 (0.0103)
## l(pm25,4)             -0.0162** (0.0051)  -0.0135* (0.0055) -0.0192** (0.0068)
## l(pm25,5)              0.0107** (0.0040)  0.0139** (0.0054)    0.0084 (0.0056)
## l(pm25,6)                0.0087 (0.0056)   0.0119* (0.0048)    0.0035 (0.0105)
## l(pm25,7)               -0.0028 (0.0033)  -0.0096. (0.0052)    0.0030 (0.0044)
## mean_temp               0.0576* (0.0279)    0.0153 (0.0375)    0.0599 (0.0477)
## mean_hum                -0.0012 (0.0088)  -0.0167* (0.0075)    0.0062 (0.0127)
## mean_prec                 0.5559 (5.167)     -2.808 (4.140)     0.2487 (7.044)
## mean_wind                0.0249 (0.1271)   -0.1622 (0.1092)    0.0871 (0.1857)
## Fixed-Effects:        ------------------ ------------------ ------------------
## cod_barrio-month-year                Yes                Yes                Yes
## cod_barrio-month                      No                 No                 No
## _____________________ __________________ __________________ __________________
## S.E.: Clustered       by: cod.^mon.^year by: cod.^mon.^year by: cod.^mon.^year
## Observations                      36,876             24,268             27,279
## Squared Cor.                     0.12927            0.07865            0.11406
##                       lagged_learningp..
## Dependent Var.:                    crime
##
## pm25                     0.0102 (0.0081)
## l(pm25,1)                0.0019 (0.0073)
## l(pm25,2)             -0.0153** (0.0058)
## l(pm25,3)                0.0029 (0.0062)
## l(pm25,4)               -0.0101 (0.0064)
## l(pm25,5)                0.0074 (0.0056)
## l(pm25,6)                0.0094 (0.0075)
## l(pm25,7)              -0.0152. (0.0078)
## mean_temp               -0.0561 (0.0696)
## mean_hum               -0.0315* (0.0146)
## mean_prec              -32.75*** (9.503)
## mean_wind               -0.0664 (0.1572)
## Fixed-Effects:        ------------------
## cod_barrio-month-year                 No
## cod_barrio-month                     Yes
## _____________________ __________________
## S.E.: Clustered            by: cod.^mon.
## Observations                      10,161
## Squared Cor.                     0.12036
coefplot(lagged_pois_all, drop = c("mean_temp", "mean_hum", "mean_prec", "mean_wind"))
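For reference, `l(pm25, 0:7)` in the lagged models stands for today's `pm25` plus its values from each of the previous seven days, lagged within each barrio. A minimal base-R sketch of what "lagged within barrio" means, on a toy data frame (`toy`, `day`, and `lag_within()` are invented for illustration; this is not fixest's implementation, which also uses the `date` variable of the panel to decide what one step is):

```r
# Toy panel: two barrios observed on four consecutive days (hypothetical data).
toy <- data.frame(
  cod_barrio = rep(c("A", "B"), each = 4),
  day        = rep(1:4, times = 2),
  pm25       = c(10, 12, 14, 16, 30, 32, 34, 36)
)

# Shift x back by k rows within each group g.
# Assumes rows are sorted by day within barrio and k is smaller than the group size.
lag_within <- function(x, g, k) {
  ave(x, g, FUN = function(v) c(rep(NA, k), head(v, -k)))
}

toy$pm25_lag1 <- lag_within(toy$pm25, toy$cod_barrio, 1)
toy$pm25_lag2 <- lag_within(toy$pm25, toy$cod_barrio, 2)
print(toy)
```

The first `k` days of each barrio have no lagged value and become `NA`, which is in line with the lagged models dropping more observations for missing right-hand-side values (17,877) than the unlagged ones (15,443).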
first_stage <- feols(crime ~ mean_temp + mean_hum + mean_prec + mean_wind | month^year | pm25 ~ wind_dir, data = crime)
## NOTE: 15,443 observations removed because of NA values (LHS: 11,172, RHS: 15,443).
etable(first_stage, fitstat = ~ rmse + r2 + wald + wf + ivwald)
##                              first_stage
## Dependent Var.:                    crime
##
## pm25                    0.0239. (0.0111)
## mean_temp               -0.0045 (0.0075)
## mean_hum               -0.0038* (0.0014)
## mean_prec                0.8374 (0.7476)
## mean_wind                0.0271 (0.0243)
## Fixed-Effects:         -----------------
## month-year                           Yes
## ______________________ _________________
## S.E.: Clustered           by: month^year
## RMSE                             0.54931
## R2                              -0.04286
## Wald (joint nullity)              6.1791
## F-test (projected)               -514.85
## Wald (1st stage), pm25           1,960.0
fitstat(first_stage, ~ ivwald)
## Wald (1st stage), pm25: stat = 1,960.0, p < 2.2e-16, on 19 and 59,640 DoF, VCOV: Clustered (month^year).
ivreg <- ivreg(crime ~ pm25 + mean_temp + mean_hum + mean_prec + mean_wind + factor(year) + factor(month) | wind_dir + mean_temp + mean_hum + mean_prec + mean_wind, data = crime)
summary(ivreg)
##
## Call:
## ivreg(formula = crime ~ pm25 + mean_temp + mean_hum + mean_prec +
##     mean_wind + factor(year) + factor(month) | wind_dir + mean_temp +
##     mean_hum + mean_prec + mean_wind, data = crime)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -17.8288  -0.7430   0.7593   1.9316  27.0340
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)       -3.721227  84.788516  -0.044    0.965
## pm25              -0.001575   0.264886  -0.006    0.995
## mean_temp          0.094367   2.266191   0.042    0.967
## mean_hum           0.023010   0.278794   0.083    0.934
## mean_prec          7.899111 188.151659   0.042    0.967
## mean_wind         -0.284337  14.059151  -0.020    0.984
## factor(year)2019  -0.974987  62.829535  -0.016    0.988
## factor(month)2    -0.868527  68.141569  -0.013    0.990
## factor(month)3   -12.837146 421.591624  -0.030    0.976
## factor(month)4    -7.730940 227.992562  -0.034    0.973
## factor(month)5    -0.010913 119.341982   0.000    1.000
## factor(month)6     0.040491  71.631395   0.001    1.000
## factor(month)7     0.982775  59.811068   0.016    0.987
## factor(month)8    12.298433 367.351318   0.033    0.973
## factor(month)9    -1.377287  74.408602  -0.019    0.985
## factor(month)10    1.641460  73.406332   0.022    0.982
## factor(month)11   17.943414 525.594183   0.034    0.973
## factor(month)12    1.576211  89.634675   0.018    0.986
##
## Residual standard error: 7.616 on 59646 degrees of freedom
## Multiple R-Squared: -199.4, Adjusted R-squared: -199.5
## Wald test: 0.04923 on 17 and 59646 DF, p-value: 1
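One thing to check in the `ivreg()` call above: `factor(year)` and `factor(month)` are among the regressors but do not appear after the `|`. As far as I understand the two-part formula, every exogenous regressor has to be repeated on the instrument side, otherwise it is treated as endogenous, which may be contributing to the very large standard errors and the negative R-squared. A minimal sketch on simulated data, assuming `ivreg()` comes from the AER package (the `sim` data frame, the confounder `u`, and all coefficients are invented for illustration):

```r
library(AER)

# Simulated stand-in data (hypothetical; not the real `crime` data frame).
set.seed(7)
n <- 1000
sim <- data.frame(
  wind_dir  = factor(sample(c("N", "E", "S", "W"), n, replace = TRUE)),
  month     = sample(1:12, n, replace = TRUE),
  mean_temp = rnorm(n, 20, 5)
)
u <- rnorm(n)  # unobserved confounder that makes pm25 endogenous
sim$pm25  <- 20 + 3 * as.integer(sim$wind_dir) + 0.2 * sim$mean_temp + u + rnorm(n)
sim$crime <- 1 + 0.05 * sim$pm25 + 0.02 * sim$mean_temp + 0.1 * sim$month + u + rnorm(n)

# Exogenous regressors (mean_temp, factor(month)) are repeated after the `|`;
# only pm25 is left out there and instrumented by wind_dir.
iv_fixed <- ivreg(
  crime ~ pm25 + mean_temp + factor(month) |
    wind_dir + mean_temp + factor(month),
  data = sim
)
summary(iv_fixed)
```

Applied to the model above, this would mean adding `factor(year) + factor(month)` to the instrument side; I have not rerun it that way here, so the output shown is still from the original call.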
m <- margins(ivreg)
summary(m)
##    factor      AME       SE       z      p      lower     upper
##  mean_hum   0.0230   0.2788  0.0825 0.9342    -0.5234    0.5694
## mean_prec   7.8991 188.1517  0.0420 0.9665  -360.8715  376.6698
## mean_temp   0.0944   2.2662  0.0416 0.9668    -4.3473    4.5360
## mean_wind  -0.2843  14.0591 -0.0202 0.9839   -27.8398   27.2711
##   month10   1.6415  73.4063  0.0224 0.9822  -142.2323  145.5152
##   month11  17.9434 525.5941  0.0341 0.9728 -1012.2021 1048.0889
##   month12   1.5762  89.6347  0.0176 0.9860  -174.1045  177.2569
##    month2  -0.8685  68.1416 -0.0127 0.9898  -134.4235  132.6865
##    month3 -12.8371 421.5917 -0.0304 0.9757  -839.1417  813.4674
##    month4  -7.7309 227.9926 -0.0339 0.9729  -454.5881  439.1263
##    month5  -0.0109 119.3420 -0.0001 0.9999  -233.9169  233.8951
##    month6   0.0405  71.6314  0.0006 0.9995  -140.3545  140.4354
##    month7   0.9828  59.8111  0.0164 0.9869  -116.2448  118.2103
##    month8  12.2984 367.3513  0.0335 0.9733  -707.6969  732.2937
##    month9  -1.3773  74.4086 -0.0185 0.9852  -147.2155  144.4609
##      pm25  -0.0016   0.2649 -0.0059 0.9953    -0.5207    0.5176
##  year2019  -0.9750  62.8295 -0.0155 0.9876  -124.1186  122.1686
plot(m)