Discussion

Author

Zixuan Yin

1. Setup

# Clear the workspace
rm(list = ls()) 
gc() 
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  592897 31.7    1350778 72.2   686425 36.7
Vcells 1079950  8.3    8388608 64.0  1875867 14.4
cat("\f")
graphics.off()

packages <- c("AER",          # load dataset
              "plm",          # for FE
              "stargazer",    # summary stats for sharing
              "ggplot2",      #for graphing,
              "psych",
              "naniar",      # for visualisation of missing data,
              "visdat",      # for visualisation of missing data,
              "VIM",         # for visualisation of missing data,
              "DataExplorer", # for visualisation of missing data,
              "dplyr",
              "magrittr",
              "pisaRT"
              )

for (i in 1:length(packages)) {
  if (!packages[i] %in% rownames(installed.packages())) {
    install.packages(packages[i]
                     , repos = "http://cran.rstudio.com/"
                     , dependencies = TRUE
                     )
  }
  library(packages[i], character.only = TRUE)
}
Loading required package: car
Loading required package: carData
Loading required package: lmtest
Loading required package: zoo

Attaching package: 'zoo'
The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
Loading required package: sandwich
Loading required package: survival

Please cite as: 
 Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2.3. https://CRAN.R-project.org/package=stargazer 

Attaching package: 'psych'
The following objects are masked from 'package:ggplot2':

    %+%, alpha
The following object is masked from 'package:car':

    logit
Loading required package: colorspace
Loading required package: grid
VIM is ready to use.
Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues

Attaching package: 'VIM'
The following object is masked from 'package:datasets':

    sleep

Attaching package: 'dplyr'
The following objects are masked from 'package:plm':

    between, lag, lead
The following object is masked from 'package:car':

    recode
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
rm(packages)

2. Data

2.1 Load Data

Crime Data for 90 Observation Units (Counties) in North Carolina, from 1981 to 1987. The data frame containing 630 observations on

data("Crime")
?Crime
starting httpd help server ... done

2.2 Summary Statistics

stargazer (Crime, 
          type = "text", # html, latex
          # out =
          # summary.stat = 
          # covariate.labels = labels,
          digits = 1)

==========================================
Statistic  N  Mean  St. Dev.  Min    Max  
------------------------------------------
county    630 100.6   58.0     1     197  
year      630 84.0    2.0     81     87   
crmrte    630 0.03    0.02   0.002   0.2  
prbarr    630  0.3    0.2     0.1    2.8  
prbconv   630  0.7    1.7     0.1   37.0  
prbpris   630  0.4    0.1     0.1    0.7  
avgsen    630  9.0    2.7     4.2   25.8  
polpc     630 0.002  0.003   0.000  0.04  
density   630  1.4    1.4     0.2    8.8  
taxpc     630 30.2    11.5   14.3   119.8 
pctmin    630 25.7    16.9    1.3   64.3  
wcon      630 245.7  122.0   65.6  2,324.6
wtuc      630 406.1  266.5   28.9  3,042.0
wtrd      630 192.8   88.4   16.9  2,242.7
wfir      630 272.1   55.8    3.5   509.5 
wser      630 224.7  104.9    1.8  2,177.1
wmfg      630 285.2   82.4   101.8  646.9 
wfed      630 403.9   63.1   255.4  598.0 
wsta      630 296.9   53.4   173.0  548.0 
wloc      630 258.0   41.4   163.6  388.1 
mix       630  0.1    0.2    0.002   4.0  
pctymle   630  0.1    0.02    0.1    0.3  
lcrmrte   630 -3.6    0.6    -6.3   -1.8  
lprbarr   630 -1.3    0.4    -2.8    1.0  
lprbconv  630 -0.7    0.6    -2.7    3.6  
lprbpris  630 -0.9    0.2    -1.9   -0.4  
lavgsen   630  2.2    0.3     1.4    3.3  
lpolpc    630 -6.5    0.5    -7.7   -3.3  
ldensity  630 -0.02   0.8    -1.6    2.2  
lwcon     630  5.5    0.2     4.2    7.8  
lwtuc     630  5.9    0.4     3.4    8.0  
lwtrd     630  5.2    0.2     2.8    7.7  
lwfir     630  5.6    0.3     1.3    6.2  
lwser     630  5.4    0.4     0.6    7.7  
lwmfg     630  5.6    0.3     4.6    6.5  
lwfed     630  6.0    0.2     5.5    6.4  
lwsta     630  5.7    0.2     5.2    6.3  
lwloc     630  5.5    0.2     5.1    6.0  
lpctymle  630 -2.4    0.2    -2.8   -1.3  
lpctmin   630  2.9    1.0     0.2    4.2  
ltaxpc    630  3.4    0.3     2.7    4.8  
lmix      630 -2.2    0.6    -6.0    1.4  
------------------------------------------

2.3 Balanced Panel

counts <- Crime %>%
  group_by(county) %>%
  summarise(n_time = n_distinct(year))

is_balanced <- all(counts$n_time == counts$n_time[1])

if (is_balanced) {
  cat("Balanced")
} else {
  cat("Unbalanced")
}
Balanced

3. Regression

I would expect a higher probability of arrest linked with less committed per person

3.1 Without FE

\[ lCrmrte_{it} = \beta_0 + \beta_1 lPrbarr _{it} \]

fatal_lm_mod <- lm(data   = Crime,
                  formula = lcrmrte ~ lprbarr
 , 
                     )

summary(fatal_lm_mod)

Call:
lm(formula = lcrmrte ~ lprbarr, data = Crime)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.87702 -0.31597  0.05811  0.35574  1.33322 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -4.39869    0.06579  -66.86   <2e-16 ***
lprbarr     -0.61954    0.04909  -12.62   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.512 on 628 degrees of freedom
Multiple R-squared:  0.2023,    Adjusted R-squared:  0.2011 
F-statistic: 159.3 on 1 and 628 DF,  p-value: < 2.2e-16

According to the result of OLS, if the probability of arrest increases by 1%, then the number of crimes committed per person will decrease by 0.62%. It is statistically significant and matches the previous expectation.

Although the regression results are as expected, variables may still be omitted. Some omitted variables vary across the county:

Income Level. Variations in average income levels across regions can influence crime rates, with lower-income areas typically experiencing higher crime rates.

Education Levels. Differences in educational attainment and resources across regions may affect the likelihood of youth engaging in criminal behavior. Areas with fewer educational opportunities often have higher crime rates.

Legal and Policy Environment. Variations in the severity of legal punishments for certain crimes may affect the probability of crime.

Other omitted variables that vary over time and thus cause a bias:

3.2 FE Model ( Dummy Variables )

\[ Crmrte_{it} = \beta_0 + \beta_1 Prbarr _{it} + \delta_1 County1_i +\delta_2 County2_i+...+\delta_n Countyn_i + u_{it} \]

fatal_fe_lm_mod   <- lm( data = Crime,
                      formula = lcrmrte ~ lprbarr + factor(county), 
                     )

# summary(fatal_fe_lm_mod)

stargazer(fatal_fe_lm_mod,
          type="text", 
          omit = "factor\\(county\\)"
          )

===============================================
                        Dependent variable:    
                    ---------------------------
                              lcrmrte          
-----------------------------------------------
lprbarr                      -0.092***         
                              (0.033)          
                                               
Constant                     -3.438***         
                              (0.078)          
                                               
-----------------------------------------------
Observations                    630            
R2                             0.914           
Adjusted R2                    0.900           
Residual Std. Error      0.181 (df = 539)      
F Statistic          63.715*** (df = 90; 539)  
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

3.3 FE Model ( Demeaned Regression )

\[ Crmrte_{it} - \bar Crmrte_i = \beta_1( Prbarr _{it}-\bar Prbarr_i) + (u_{it}-\bar u_i) \]

Crime_demeaned <- with(data = Crime,
                            expr = data.frame(lcrmrte = lcrmrte - ave(x = lcrmrte, county), 
                                              lprbarr = lprbarr - ave(x = lprbarr, county)      ))
          


# estimate the regression

fatal_demeaned_lm_mod <-
lm(data    = Crime_demeaned,
           formula = lcrmrte ~ lprbarr - 1 , 
           )

summary(fatal_demeaned_lm_mod)

Call:
lm(formula = lcrmrte ~ lprbarr - 1, data = Crime_demeaned)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.83748 -0.08467 -0.00421  0.08723  1.00171 

Coefficients:
        Estimate Std. Error t value Pr(>|t|)   
lprbarr -0.09171    0.03028  -3.029  0.00255 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1679 on 629 degrees of freedom
Multiple R-squared:  0.01438,   Adjusted R-squared:  0.01281 
F-statistic: 9.174 on 1 and 629 DF,  p-value: 0.002555

3.4 FE Model ( Panel linear FE )

fatal_fe_mod <- plm(formula = lcrmrte ~ lprbarr, 
                    data    = Crime,
                    index   = c("county", "year"), # declaring data to be panel
                    model   = "within"            # fixed effects mdodel
                    )

summary(fatal_fe_mod )
Oneway (individual) effect Within Model

Call:
plm(formula = lcrmrte ~ lprbarr, data = Crime, model = "within", 
    index = c("county", "year"))

Balanced Panel: n = 90, T = 7, N = 630

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-0.8374825 -0.0846744 -0.0042123  0.0872307  1.0017060 

Coefficients:
         Estimate Std. Error t-value Pr(>|t|)   
lprbarr -0.091714   0.032710 -2.8039 0.005231 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    17.991
Residual Sum of Squares: 17.732
R-Squared:      0.014376
Adj. R-Squared: -0.1502
F-statistic: 7.86164 on 1 and 539 DF, p-value: 0.0052315

3.5 Regression Result of FE Model

stargazer(#fatal_lm_mod,                                            # OLS
          fatal_fe_lm_mod, fatal_demeaned_lm_mod,  fatal_fe_mod,    # fixed effect
                  type  = "text",
          column.labels = c(#"OLS - no Dummy",
                            "OLS - State Dummy",
                            "Demeaned Reg",
                            "FE"
                            ),
          omit = "factor\\(county\\)"
          )

==========================================================================================
                                             Dependent variable:                          
                    ----------------------------------------------------------------------
                                                   lcrmrte                                
                                          OLS                               panel         
                                                                            linear        
                       OLS - State Dummy          Demeaned Reg                FE          
                              (1)                     (2)                    (3)          
------------------------------------------------------------------------------------------
lprbarr                    -0.092***               -0.092***              -0.092***       
                            (0.033)                 (0.030)                (0.033)        
                                                                                          
Constant                   -3.438***                                                      
                            (0.078)                                                       
                                                                                          
------------------------------------------------------------------------------------------
Observations                  630                     630                    630          
R2                           0.914                   0.014                  0.014         
Adjusted R2                  0.900                   0.013                  -0.150        
Residual Std. Error     0.181 (df = 539)        0.168 (df = 629)                          
F Statistic         63.715*** (df = 90; 539) 9.174*** (df = 1; 629) 7.862*** (df = 1; 539)
==========================================================================================
Note:                                                          *p<0.1; **p<0.05; ***p<0.01

Comparing the OLS coefficients, the coefficient for the FE Model decreases from -0.61954 to -0.092, which means the relationship between the probability of arrest and Crime is still negative but weakened. On the other hand, it also indicates that the OLS coefficients receive some bias from omitted variables and the fixed effect model controls for some biases that will not be observed.

3.6 One way and Two Way FE Implementation ( county and year )

fatal_fe_lm_mod2 <- lm(data    = Crime,
                       formula = lcrmrte ~ lprbarr + as.factor(county) + as.factor(year), 
                     )

fatal_fe_lm_mod_time   <- lm( data = Crime,
                      formula = lcrmrte ~ lprbarr + as.factor(year), )


# estimate the fixed effects regression with plm()
fatal_fe_mod2 <- plm(formula = lcrmrte ~ lprbarr + as.factor(year), 
                     data    = Crime,
                     index   = c("county", "year"), # declaring data to be panel
                     model   = "within"            # fixed effects mdodel
                     )
fatal_fe_mod_time <- plm(formula = lcrmrte ~ lprbarr + as.factor(year), 
                     data    = Crime,
                     index   = c("year"), # declaring data to be panel
                     model   = "within"            # fixed effects mdodel
                     )

stargazer(fatal_fe_lm_mod,# County - One Way OLS (FE)
          fatal_fe_mod_time, # County - One Way OLS (FE)
          fatal_fe_lm_mod2, # County - Time (Two Way) OLS
          fatal_fe_mod2,    # County - Time (Two Way) FE
                  type  = "text",
          column.labels = c("FE - One way (county)",
                            "FE - One way (time)",
                            "FE - Two way (county-time) OLS",
                            "FE - Two way (county-time) PLM"
                            ),
         add.lines = list(
              c('County Fixed effects', "Yes", "No", "Yes", "Yes"),
              c('Time Fixed effects', "No", "Yes", "Yes", "Yes")
          ),
          keep = c("lprbarr")
)

====================================================================================================================================
                                                                   Dependent variable:                                              
                     ---------------------------------------------------------------------------------------------------------------
                                                                         lcrmrte                                                    
                               OLS                     panel                        OLS                           panel             
                                                       linear                                                     linear            
                      FE - One way (county)     FE - One way (time)    FE - Two way (county-time) OLS FE - Two way (county-time) PLM
                               (1)                      (2)                         (3)                            (4)              
------------------------------------------------------------------------------------------------------------------------------------
lprbarr                     -0.092***                -0.617***                    -0.078**                       -0.078**           
                             (0.033)                  (0.049)                     (0.031)                        (0.031)            
                                                                                                                                    
------------------------------------------------------------------------------------------------------------------------------------
County Fixed effects           Yes                       No                         Yes                            Yes              
Time Fixed effects              No                      Yes                         Yes                            Yes              
Observations                   630                      630                         630                            630              
R2                            0.914                    0.202                       0.923                          0.114             
Adjusted R2                   0.900                    0.193                       0.909                          -0.046            
Residual Std. Error      0.181 (df = 539)                                     0.173 (df = 533)                                      
F Statistic          63.715*** (df = 90; 539) 157.559*** (df = 1; 622)    66.338*** (df = 96; 533)        9.803*** (df = 7; 533)    
====================================================================================================================================
Note:                                                                                                    *p<0.1; **p<0.05; ***p<0.01

In a fixed effects model, the fixed effects control for specific characteristics based on the specification used. For my model, it is mainly controlled by time-invariant characteristics.

Compared to the OLS coefficients, the coefficients from the individual county-fixed effects model show a significant decrease. Additionally, the R² of the county-fixed effects model reaches 0.914, which is much higher than that of the time-fixed effects model. The R² of the model with both time and entity fixed effects only improves by 0.009 compared to the county-fixed effects model, indicating that the county-fixed effects control for a significant portion of the bias.

The coefficient of the FE model depends on how the fixed effects are defined and what they control for. As shown in section 3.6, the one-way and two-way fixed effects models have different coefficients. Because the one-way fixed effect only controls for either time or county, while the two-way fixed effect controls for both. If you use different methods to fix the same conditions, such as demeaned regression and county-fixed effect model, the coefficients will be the same.