Introduction

Average arrest rates are often contested, in part because politicians sometimes misrepresent crime statistics to the public in order to win election. This analysis asks whether the percentage of white people in a state affects the average arrest rate in that state. Multilevel models are used to analyze the relationship between race and arrest rates, and how that relationship varies across states with larger white or non-white populations. After merging and removing missing values, 46 states remain in the data. The analysis begins with an ecological model, followed by a complete-pooling model and a no-pooling model, and concludes with multilevel models to determine which fits best.

Method

To conduct this analysis, county-level data on average arrest rates and on 2014 population by race were extracted from Social Explorer. The two datasets were merged into a single dataset called crimedatabyrace. The variables used in the analysis are statename, county, Arrestsrate_per100k, white, black, native, asian, pacifisland, and mixed.

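library(nlme)      # lme() for the multilevel models
library(dplyr)     # group_by(), summarize(), rename(), select()
library(magrittr)  # the %>% pipe
library(tidyr)
library(haven)
library(lmerTest)
library(ggplot2)   # plots
library(texreg)    # htmlreg() model comparison table
library(stringr)   # str_locate()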

Merging the Datasets

Populationbyrace <- read.csv("/Users/robertperez/Documents/Rstudio DataSets /population_race_2014.csv")
Crimedata <- read.csv("/Users/robertperez/Documents/Rstudio DataSets /crimedata.csv")
crimedatabyrace <- merge(Populationbyrace, Crimedata, 
                         by.x = "Geo_NAME",
                         by.y = "Geo_Name")
# Isolate the state name from Geo_QName: keep the text after "County,"
# Rows whose name lacks "County," (for example Louisiana parishes) get NA here
crimedatabyrace$loc <- str_locate(crimedatabyrace$Geo_QName, "County,")[, "end"]
crimedatabyrace1 <- crimedatabyrace %>%
  mutate(statename = str_trim(substr(Geo_QName, loc + 1, nchar(Geo_QName))))
head(crimedatabyrace1)
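The merge above matches rows on county name alone. Many county names repeat across states (several states have a Washington County), so each repeated name gets paired with every same-named row in the other table. The models below report 13253 observations, well above the roughly 3,100 counties in the US. That is consistent with this kind of duplication, or with repeated rows in one of the sources. A minimal check is to count repeated keys with duplicated() before merging. The sketch below uses made-up toy rows that only mimic the column names used here. A key that is unique in both files, such as a FIPS code, would avoid the problem if both exports include one; that is an assumption, because the available columns are not shown.

```r
# Hypothetical toy data: two different states each have a "Washington County"
pop <- data.frame(
  Geo_NAME  = c("Washington County", "Washington County", "Adams County"),
  Geo_QName = c("Washington County, Alabama", "Washington County, Ohio",
                "Adams County, Ohio"),
  white     = c(100, 200, 300)
)
crime <- data.frame(
  Geo_Name    = c("Washington County", "Washington County", "Adams County"),
  SE_T011_001 = c(1000, 2000, 3000)
)

# Count key values that occur more than once in each table
n_dup_pop   <- sum(duplicated(pop$Geo_NAME))
n_dup_crime <- sum(duplicated(crime$Geo_Name))

# Merging on the non-unique name pairs every Washington row with every other one:
# 2 x 2 Washington matches + 1 Adams match = 5 rows instead of 3
merged <- merge(pop, crime, by.x = "Geo_NAME", by.y = "Geo_Name")
nrow(merged)
```

If either count is above zero, the merged table contains rows that pair a county with a same-named county in another state.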

Manipulating The Data

# Give the Social Explorer codes readable names and keep only the analysis variables
crimedatabyrace1 <- crimedatabyrace1 %>%
  rename(county = Geo_NAME,
         Arrestsrate_per100k = SE_T011_001,
         white = SE_T020_002,
         black = SE_T020_003,
         native = SE_T020_004,
         asian = SE_T020_005,
         pacifisland = SE_T020_006,
         mixed = SE_T020_007) %>%
  select(statename, county, Arrestsrate_per100k, white, black, native, asian, pacifisland, mixed)
# Drop counties with any missing value (including those with no state name)
crimedatabyrace1 <- na.omit(crimedatabyrace1)
head(crimedatabyrace1)
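The research question is framed in terms of the percentage of white residents. However, white and black appear to be population counts: the fitted coefficients below are on the order of 1e-05 arrests per 100k per person. A share can be computed from the counts. The sketch below uses made-up toy counties. It approximates each county's total as the sum of the six race columns, which is an assumption; dividing by a total-population column would be more accurate if the export includes one. The names total_approx, pct_white and pct_black are hypothetical.

```r
# Hypothetical toy counties; column names mirror the renamed data
counties <- data.frame(
  county      = c("A County", "B County"),
  white       = c(800, 300),
  black       = c(150, 600),
  native      = c(10, 20),
  asian       = c(20, 50),
  pacifisland = c(0, 5),
  mixed       = c(20, 25)
)

race_cols <- c("white", "black", "native", "asian", "pacifisland", "mixed")

# Approximate total: assumes the six race columns cover the whole population
counties$total_approx <- rowSums(counties[, race_cols])
counties$pct_white    <- 100 * counties$white / counties$total_approx
counties$pct_black    <- 100 * counties$black / counties$total_approx
counties
```

A state-level percentage should sum the counts within each state before dividing, rather than averaging the county percentages.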

Looking at the Means

# Mean arrest rate and mean white and black population for each state
crimedatabyrace1%>%
  group_by(statename)%>%
  summarize(Arrestsrate_per100k = mean(Arrestsrate_per100k), white = mean(white), black = mean(black))

State-Level Analysis: Ecological Model

statenames <- crimedatabyrace1 %>% 
  group_by(statename) %>% 
  summarise(Arrestsrate_per100k = mean(Arrestsrate_per100k, na.rm = TRUE), white = mean(log(white), na.rm = TRUE))
head(statenames)

The Ecological Analysis

ecoreg <- lm(Arrestsrate_per100k ~ white, data = crimedatabyrace1)
summary(ecoreg)

Call:
lm(formula = Arrestsrate_per100k ~ white, data = crimedatabyrace1)

Residuals:
    Min      1Q  Median      3Q     Max 
-3198.4 -1177.1  -262.8   947.7 16228.0 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 3.163e+03  1.634e+01 193.656   <2e-16 ***
white       4.538e-05  8.838e-05   0.513    0.608    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1744 on 13251 degrees of freedom
Multiple R-squared:  1.99e-05,  Adjusted R-squared:  -5.557e-05 
F-statistic: 0.2637 on 1 and 13251 DF,  p-value: 0.6076
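# Mean county arrest rate per state (geom_col alone would sum the county rates)
ggplot(data = crimedatabyrace1, aes(x = statename, y = Arrestsrate_per100k)) +
  stat_summary(fun = mean, geom = "col", color = "red", fill = "black") +
  coord_flip()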
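As written, ecoreg is fitted to crimedatabyrace1, which holds the county-level rows. That is why its output is identical to the complete-pooling model that follows. An ecological regression would instead use one row per state, such as the statenames table built above: lm(Arrestsrate_per100k ~ white, data = statenames). Note that statenames averages log(white) rather than white. With the 46 states in this data, that model would have about 44 residual degrees of freedom rather than 13251. The sketch below shows the aggregation step on simulated toy data, not the report's data.

```r
# Hypothetical toy data: 3 states x 4 counties, simulated
set.seed(1)
toy <- data.frame(
  statename = rep(c("State A", "State B", "State C"), each = 4),
  white     = round(runif(12, 1000, 50000)),
  Arrestsrate_per100k = round(rnorm(12, mean = 3000, sd = 500))
)

# Collapse counties to one row per state (the ecological unit)
state_means <- aggregate(cbind(Arrestsrate_per100k, white) ~ statename,
                         data = toy, FUN = mean)

# Ecological regression: one observation per state
eco_state <- lm(Arrestsrate_per100k ~ white, data = state_means)
nrow(state_means)        # one row per state
df.residual(eco_state)   # 3 states - 2 coefficients = 1
```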

Regression Model Complete Pooling

cpooling1<- lm(Arrestsrate_per100k ~ white, data = crimedatabyrace1)
summary(cpooling1)

Call:
lm(formula = Arrestsrate_per100k ~ white, data = crimedatabyrace1)

Residuals:
    Min      1Q  Median      3Q     Max 
-3198.4 -1177.1  -262.8   947.7 16228.0 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 3.163e+03  1.634e+01 193.656   <2e-16 ***
white       4.538e-05  8.838e-05   0.513    0.608    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1744 on 13251 degrees of freedom
Multiple R-squared:  1.99e-05,  Adjusted R-squared:  -5.557e-05 
F-statistic: 0.2637 on 1 and 13251 DF,  p-value: 0.6076
ARwhite<- lm(Arrestsrate_per100k ~ white + black, data = crimedatabyrace1)
summary(ARwhite)

Call:
lm(formula = Arrestsrate_per100k ~ white + black, data = crimedatabyrace1)

Residuals:
    Min      1Q  Median      3Q     Max 
-3196.6 -1177.2  -263.1   947.8 16228.3 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.163e+03  1.638e+01 193.170   <2e-16 ***
white        5.673e-05  1.242e-04   0.457    0.648    
black       -5.484e-05  4.216e-04  -0.130    0.897    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1744 on 13250 degrees of freedom
Multiple R-squared:  2.117e-05, Adjusted R-squared:  -0.0001298 
F-statistic: 0.1403 on 2 and 13250 DF,  p-value: 0.8691

Intercept

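The intercepts and slopes in this section come from the no-pooling approach, in which a separate regression of arrest rate on the white population is fitted in each state. Most state intercepts fall between about 1000 and 3500 arrests per 100k. An intercept is the predicted arrest rate when white equals zero, so it describes each state's baseline arrest rate rather than the effect of the white population. That effect is captured by the slopes.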

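# No pooling: fit a separate regression of arrest rate on white in each state
dcoef <- crimedatabyrace1 %>% 
    group_by(statename) %>% 
    do(mod = lm(Arrestsrate_per100k ~ white, data = .))
# Extract each state's intercept and plot their distribution
intercepts <- dcoef %>% do(data.frame(intc = coef(.$mod)[1]))
ggplot(intercepts, aes(x = intc)) + geom_histogram()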

Slope

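The per-state slopes are consistently positive. A positive slope means that, within a state, counties with more white residents have slightly higher average arrest rates, not lower ones. Because white appears to be a count of residents, each slope is the change in arrests per 100k for one additional white resident, so the values are very small.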

# Refit the per-state regressions so this chunk runs on its own
dcoef <- crimedatabyrace1 %>% 
    group_by(statename) %>% 
    do(mod = lm(Arrestsrate_per100k ~ white, data = .))
# Extract each state's slope on white and plot their distribution
slopes <- dcoef %>% do(data.frame(slope = coef(.$mod)[2]))
ggplot(slopes, aes(x = slope)) + geom_histogram()
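The do() verb still works, but current dplyr marks it as superseded. The sketch below is a base R alternative. It fits the same kind of per-state regressions with split() and lapply() and keeps each state's intercept and slope side by side in one table. It runs on simulated toy data; the names fits and no_pool are hypothetical.

```r
set.seed(2)
# Hypothetical toy data: 3 states x 10 counties, simulated
toy <- data.frame(
  statename = rep(c("State A", "State B", "State C"), each = 10),
  white     = runif(30, 1000, 50000)
)
toy$Arrestsrate_per100k <- 3000 + 0.01 * toy$white + rnorm(30, sd = 300)

# No pooling: a separate regression in every state
fits <- lapply(split(toy, toy$statename),
               function(d) lm(Arrestsrate_per100k ~ white, data = d))

# One row per state with its intercept and slope
no_pool <- data.frame(
  statename = names(fits),
  intc      = sapply(fits, function(m) coef(m)[["(Intercept)"]]),
  slope     = sapply(fits, function(m) coef(m)[["white"]]),
  row.names = NULL
)
no_pool
```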

Race and Average Arrest Rates by State - A Random Intercept Model

m1_lme <- lme(Arrestsrate_per100k ~ white, data = crimedatabyrace1, random = ~1|statename, method = "ML")
summary(m1_lme)
Linear mixed-effects model fit by maximum likelihood
 Data: crimedatabyrace1 
       AIC      BIC    logLik
  231552.4 231582.3 -115772.2

Random effects:
 Formula: ~1 | statename
        (Intercept) Residual
StdDev:    908.1807  1494.13

Fixed effects: Arrestsrate_per100k ~ white 
               Value Std.Error    DF   t-value p-value
(Intercept) 3245.785 136.54066 13206 23.771565  0.0000
white          0.000   0.00008 13206 -0.279782  0.7796
 Correlation: 
      (Intr)
white -0.052

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-3.2451978 -0.6331617 -0.1160736  0.5145651 10.6186223 

Number of Observations: 13253
Number of Groups: 46 
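One way to read the random-effects block of m1_lme is the intraclass correlation (ICC): the between-state variance divided by the total variance. The sketch below plugs in the two standard deviations printed above. It is a descriptive reading of this output, not a step from the original analysis.

```r
# Standard deviations copied from the m1_lme output above
sd_state <- 908.1807   # between-state (Intercept) SD
sd_resid <- 1494.13    # within-state residual SD

# Intraclass correlation: share of total variance that lies between states
icc <- sd_state^2 / (sd_state^2 + sd_resid^2)
round(icc, 2)
```

A value of about 0.27 means that roughly a quarter of the variation in county arrest rates lies between states. That supports modeling states as groups, even though the white coefficient itself is not significant.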

Race and Average Arrest Rates by State - A Random Slope Model

m2_lme <- lme(Arrestsrate_per100k ~ white, data = crimedatabyrace1, random = ~ white|statename, method = "ML")
summary(m2_lme)
Linear mixed-effects model fit by maximum likelihood
 Data: crimedatabyrace1 
     AIC      BIC  logLik
  231548 231592.9 -115768

Random effects:
 Formula: ~white | statename
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev       Corr  
(Intercept) 9.101732e+02 (Intr)
white       4.278992e-04 0.016 
Residual    1.492528e+03       

Fixed effects: Arrestsrate_per100k ~ white 
               Value Std.Error    DF   t-value p-value
(Intercept) 3244.624 136.91245 13206 23.698533  0.0000
white          0.000   0.00011 13206  0.398053  0.6906
 Correlation: 
      (Intr)
white -0.04 

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-3.2501799 -0.6329070 -0.1188938  0.5100618 10.6371287 

Number of Observations: 13253
Number of Groups: 46 

Models Summary

Even after accounting for variation between states, there is no statistically significant relationship between the white population and the average arrest rate. The white coefficient has p = 0.78 in the random-intercept model and p = 0.69 in the random-slope model.
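Accounting for state variation also raises a second question: does letting the white slope vary by state improve the fit? m1_lme and m2_lme were both fitted with method = "ML" and share the same fixed effects, so a likelihood-ratio test can be computed from their reported log-likelihoods. The random-slope model adds two parameters, the slope SD and its correlation with the intercept. This matches the AIC difference in the output. nlme's anova(m1_lme, m2_lme) should report the same test; the sketch below only does the arithmetic.

```r
# Log-likelihoods copied from the models table (both fits use method = "ML")
ll_intercept <- -115772.18  # m1_lme: random intercept, 4 parameters
ll_slope     <- -115768.00  # m2_lme: random slope, 6 parameters

# Likelihood-ratio statistic and chi-square p-value on 6 - 4 = 2 df
lr <- 2 * (ll_slope - ll_intercept)
p  <- pchisq(lr, df = 2, lower.tail = FALSE)
round(c(LR = lr, p = p), 3)
```

This gives LR of about 8.36 and p of about 0.015. The test asks whether a variance equals zero, which lies on the boundary of its allowed values, so the chi-square p-value is conservative. The result suggests some state-to-state variation in the slope, even though the average slope is indistinguishable from zero.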

Introducing a Covariate - The Effect of Another Race

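Adding the Black population as a covariate does not change this conclusion. In the model below, neither predictor is statistically significant (white: p = 0.51; black: p = 0.56). The black coefficient is slightly negative (t = -0.59), so this model gives no evidence of higher average arrest rates where more Black residents live.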

m3_lme <- lme(Arrestsrate_per100k ~ white + black, data = crimedatabyrace1, random = ~ white|statename, method = "ML")
summary(m3_lme)
Linear mixed-effects model fit by maximum likelihood
 Data: crimedatabyrace1 
       AIC      BIC    logLik
  231549.7 231602.1 -115767.8

Random effects:
 Formula: ~white | statename
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev       Corr  
(Intercept) 9.105658e+02 (Intr)
white       4.388536e-04 0.007 
Residual    1.492466e+03       

Fixed effects: Arrestsrate_per100k ~ white + black 
               Value Std.Error    DF   t-value p-value
(Intercept) 3243.489 136.99458 13205 23.676037  0.0000
white          0.000   0.00014 13205  0.664813  0.5062
black          0.000   0.00038 13205 -0.589313  0.5557
 Correlation: 
      (Intr) white 
white -0.045       
black  0.015 -0.581

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-3.2500378 -0.6334179 -0.1185085  0.5120136 10.6384636 

Number of Observations: 13253
Number of Groups: 46 
htmlreg(list(cpooling1, m1_lme, m2_lme, m3_lme))
Statistical models
                Model 1      Model 2      Model 3      Model 4
(Intercept)     3163.41***   3245.79***   3244.62***   3243.49***
                (16.34)      (136.54)     (136.91)     (136.99)
white           0.00         -0.00        0.00         0.00
                (0.00)       (0.00)       (0.00)       (0.00)
black                                                  -0.00
                                                       (0.00)
R2              0.00
Adj. R2         -0.00
Num. obs.       13253        13253        13253        13253
RMSE            1744.39
AIC                          231552.35    231548.00    231549.66
BIC                          231582.32    231592.95    231602.10
Log Likelihood               -115772.18   -115768.00   -115767.83
Num. groups                  46           46           46
***p < 0.001; **p < 0.01; *p < 0.05
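Every white and black coefficient prints as 0.00 in the table. This is because these variables appear to be counts of people, so the effect of a single resident is tiny. Rescaling the predictor, for example to units of 10,000 residents, multiplies the coefficient and its standard error by 10,000 without changing the fit. The sketch below demonstrates this on simulated data; white_10k is a hypothetical column name. In the report's data, the equivalent step would be mutate(white_10k = white / 10000) before refitting.

```r
set.seed(3)
# Hypothetical toy data with a count predictor, simulated
n <- 200
toy <- data.frame(white = runif(n, 1000, 500000))
toy$Arrestsrate_per100k <- 3000 + 0.0005 * toy$white + rnorm(n, sd = 1500)

# Same predictor expressed per 10,000 white residents
toy$white_10k <- toy$white / 10000

fit_raw    <- lm(Arrestsrate_per100k ~ white, data = toy)
fit_scaled <- lm(Arrestsrate_per100k ~ white_10k, data = toy)

# The slope is multiplied by 10,000; the fit itself is unchanged
ratio   <- coef(fit_scaled)[["white_10k"]] / coef(fit_raw)[["white"]]
r2_diff <- summary(fit_scaled)$r.squared - summary(fit_raw)$r.squared
ratio     # about 10000
r2_diff   # about 0
```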

Conclusion: A lower AIC indicates a better fit. In this report the lowest AIC (231548.00) belongs to Model 3 in the table, the random-slope model (m2_lme). BIC penalizes extra parameters more heavily and instead favors the random-intercept model (Model 2, BIC 231582.32). Across all models, race within states has very little effect on the average arrest rate per 100k population, and no race coefficient is statistically significant. Adding the Black population in the final model did not change this: its coefficient was small, slightly negative, and not significant. Further analysis could examine average arrest rates by sex and by other racial groups.

