Average arrest rates are often contested, since politicians sometimes misrepresent statistics to the public in order to be elected. This analysis asks whether the percentage of white people in a state affects the average arrest rate in that state. A multilevel approach is used to examine the relationship between race and arrest rates and how that relationship varies across states with larger white or non-white populations. The analysis begins with an ecological regression, followed by a complete-pooling model and a no-pooling model, and concludes with multilevel models to determine which fits best.
To conduct this analysis, the primary data on average arrest rates and the 2014 racial composition of each state were extracted from Social Explorer. The two datasets were merged to create a dataset called crimedatabyrace. The variables used in the analysis are statename, county, Arrestsrate_per100k, white, black, native, asian, pacifisland, and mixed.
library(nlme)
library(dplyr)
library(magrittr)
library(tidyr)
library(haven)
library(lmerTest)
library(ggplot2)
library(texreg)
library(stringr)
Populationbyrace <- read.csv("/Users/robertperez/Documents/Rstudio DataSets /population_race_2014.csv")
Crimedata <- read.csv("/Users/robertperez/Documents/Rstudio DataSets /crimedata.csv")
crimedatabyrace <- merge(Populationbyrace, Crimedata,
by.x = "Geo_NAME",
by.y = "Geo_Name")
#Isolate the state name from the Geo_QName variable
X<-data.frame(str_locate(crimedatabyrace$Geo_QName,"County,"))
X2<-X%>%select(end)
crimedatabyrace$loc <- X2$end
crimedatabyrace1<-crimedatabyrace%>%
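# Take everything after "County," up to the end of each string and trim the leading space
mutate(statename = trimws(substr(Geo_QName, loc + 1, nchar(as.character(Geo_QName)))))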
head(crimedatabyrace1)
crimedatabyrace1 <- rename(crimedatabyrace1, county = Geo_NAME, Arrestsrate_per100k = SE_T011_001,
white = SE_T020_002,
black = SE_T020_003,
native = SE_T020_004,
asian = SE_T020_005,
pacifisland= SE_T020_006,
mixed = SE_T020_007)%>%
select(statename, county, Arrestsrate_per100k, white, black, native, asian, pacifisland, mixed)
head(crimedatabyrace1)
print(crimedatabyrace1)
crimedatabyrace1<-na.omit(crimedatabyrace1)
head(crimedatabyrace1)
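The state-name extraction above can be illustrated with a small base-R sketch. The `Geo_QName` strings and the helper `extract_state` below are hypothetical and only assume the "<name> County, <state>" pattern that the code searches for. The sketch also shows that a row without the text "County," yields `NA`, so such rows would be dropped by `na.omit()`. That may be one contributor to the models reporting fewer than 50 groups.

```r
# Hypothetical Geo_QName values; the strings in the real file may differ.
geo_qname <- c("Autauga County, Alabama",
               "Kings County, New York",
               "Orleans Parish, Louisiana")

# Illustrative helper (not part of the original analysis):
# keep the text after "County," and trim surrounding whitespace.
extract_state <- function(x) {
  x <- as.character(x)
  has_county <- grepl("County,", x, fixed = TRUE)
  state <- trimws(sub("^.*County,", "", x))
  state[!has_county] <- NA_character_  # no "County," means no state name
  state
}

extract_state(geo_qname)
```

Rows that come back as `NA` here correspond to the rows that `na.omit()` removes in the original pipeline.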
Looking at the Means
###Means, color = red
####Looking at averages for each state and race
crimedatabyrace1%>%
group_by(statename)%>%
summarize(Arrestsrate_per100k = mean(Arrestsrate_per100k), white = mean(white), black = mean(black))
statenames <- crimedatabyrace1 %>%
group_by(statename) %>%
summarise(Arrestsrate_per100k = mean(Arrestsrate_per100k, na.rm = TRUE), white = mean(log(white), na.rm = TRUE))
head(statenames)
ecoreg <- lm(Arrestsrate_per100k ~ white, data = crimedatabyrace1)
summary(ecoreg)
Call:
lm(formula = Arrestsrate_per100k ~ white, data = crimedatabyrace1)
Residuals:
Min 1Q Median 3Q Max
-3198.4 -1177.1 -262.8 947.7 16228.0
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.163e+03 1.634e+01 193.656 <2e-16 ***
white 4.538e-05 8.838e-05 0.513 0.608
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1744 on 13251 degrees of freedom
Multiple R-squared: 1.99e-05, Adjusted R-squared: -5.557e-05
F-statistic: 0.2637 on 1 and 13251 DF, p-value: 0.6076
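An ecological regression is usually fit to group-level summaries, one row per state, rather than to the individual county rows. The regression above is fit to `crimedatabyrace1`, so it has the same degrees of freedom as a county-level model. The following is a minimal sketch of the state-means version on simulated data. The data frame `sim` and its values are invented for illustration and are not the Social Explorer data.

```r
set.seed(1)

# Simulated stand-in data: 5 hypothetical states with 20 counties each.
sim <- data.frame(
  statename = rep(c("A", "B", "C", "D", "E"), each = 20),
  white     = runif(100, min = 1000, max = 50000)
)
sim$Arrestsrate_per100k <- 3000 + rnorm(100, sd = 500)

# Collapse to one row per state (state means of outcome and predictor).
state_means <- aggregate(cbind(Arrestsrate_per100k, white) ~ statename,
                         data = sim, FUN = mean)

# Ecological regression: the unit of analysis is the state, not the county.
eco_sketch <- lm(Arrestsrate_per100k ~ white, data = state_means)

nrow(state_means)        # number of states used
df.residual(eco_sketch)  # states minus 2 estimated coefficients
```

With the real data, the analogous call would use the `statenames` summary table as the `data` argument, keeping in mind that `white` in that table is the mean of `log(white)`.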
ggplot(data=crimedatabyrace1, aes(x=statename, y=Arrestsrate_per100k))+
geom_col(color ="red", fill = "black")+coord_flip()
cpooling1<- lm(Arrestsrate_per100k ~ white, data = crimedatabyrace1)
summary(cpooling1)
Call:
lm(formula = Arrestsrate_per100k ~ white, data = crimedatabyrace1)
Residuals:
Min 1Q Median 3Q Max
-3198.4 -1177.1 -262.8 947.7 16228.0
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.163e+03 1.634e+01 193.656 <2e-16 ***
white 4.538e-05 8.838e-05 0.513 0.608
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1744 on 13251 degrees of freedom
Multiple R-squared: 1.99e-05, Adjusted R-squared: -5.557e-05
F-statistic: 0.2637 on 1 and 13251 DF, p-value: 0.6076
ARwhite<- lm(Arrestsrate_per100k ~ white + black, data = crimedatabyrace1)
summary(ARwhite)
Call:
lm(formula = Arrestsrate_per100k ~ white + black, data = crimedatabyrace1)
Residuals:
Min 1Q Median 3Q Max
-3196.6 -1177.2 -263.1 947.8 16228.3
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.163e+03 1.638e+01 193.170 <2e-16 ***
white 5.673e-05 1.242e-04 0.457 0.648
black -5.484e-05 4.216e-04 -0.130 0.897
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1744 on 13250 degrees of freedom
Multiple R-squared: 2.117e-05, Adjusted R-squared: -0.0001298
F-statistic: 0.1403 on 2 and 13250 DF, p-value: 0.8691
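In the no-pooling approach, a separate regression is fit for each state. The histogram of the state-specific intercepts (the expected arrest rate when white is zero) shows that most intercepts fall between 1,000 and 3,500, which indicates that baseline arrest rates differ considerably from state to state. Intercepts alone do not show how arrest rates change with the white population; that is the role of the slopes examined next.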
dcoef <- crimedatabyrace1 %>%
group_by(statename) %>%
do(mod = lm(Arrestsrate_per100k ~ white, data = .))
coef <- dcoef %>% do(data.frame(intc = coef(.$mod)[1]))
ggplot(coef, aes(x = intc)) + geom_histogram()
The state-specific slopes are mostly positive. A positive slope means the average arrest rate rises, rather than falls, as the white population in a state grows, although the slopes are very small.
dcoef <- crimedatabyrace1 %>%
group_by(statename) %>%
do(mod = lm(Arrestsrate_per100k ~ white, data = .))
coef <- dcoef %>% do(data.frame(Arrestsrate_per100k = coef(.$mod)[2]))
ggplot(coef, aes(x = Arrestsrate_per100k)) + geom_histogram()
m1_lme <- lme(Arrestsrate_per100k ~ white, data = crimedatabyrace1, random = ~1|statename, method = "ML")
summary(m1_lme)
Linear mixed-effects model fit by maximum likelihood
Data: crimedatabyrace1
AIC BIC logLik
231552.4 231582.3 -115772.2
Random effects:
Formula: ~1 | statename
(Intercept) Residual
StdDev: 908.1807 1494.13
Fixed effects: Arrestsrate_per100k ~ white
Value Std.Error DF t-value p-value
(Intercept) 3245.785 136.54066 13206 23.771565 0.0000
white 0.000 0.00008 13206 -0.279782 0.7796
Correlation:
(Intr)
white -0.052
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-3.2451978 -0.6331617 -0.1160736 0.5145651 10.6186223
Number of Observations: 13253
Number of Groups: 46
m2_lme <- lme(Arrestsrate_per100k ~ white, data = crimedatabyrace1, random = ~ white|statename, method = "ML")
summary(m2_lme)
Linear mixed-effects model fit by maximum likelihood
Data: crimedatabyrace1
AIC BIC logLik
231548 231592.9 -115768
Random effects:
Formula: ~white | statename
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 9.101732e+02 (Intr)
white 4.278992e-04 0.016
Residual 1.492528e+03
Fixed effects: Arrestsrate_per100k ~ white
Value Std.Error DF t-value p-value
(Intercept) 3244.624 136.91245 13206 23.698533 0.0000
white 0.000 0.00011 13206 0.398053 0.6906
Correlation:
(Intr)
white -0.04
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-3.2501799 -0.6329070 -0.1188938 0.5100618 10.6371287
Number of Observations: 13253
Number of Groups: 46
When we account for the variation between states, there is still no statistically significant relationship between the white population in a state and its average arrest rate (p = 0.78 in the random-intercept model and p = 0.69 in the random-slope model).
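How much of the variation sits between states can be summarized with the intraclass correlation (ICC), computed from the two standard deviations printed in the random-intercept summary (`m1_lme`). This is a small worked calculation using those reported values. It is a supplementary sketch and was not part of the original analysis.

```r
# Standard deviations reported in the m1_lme summary.
sd_state <- 908.1807  # between-state (random intercept) SD
sd_resid <- 1494.13   # within-state (residual) SD

# ICC: share of total variance attributable to differences between states.
icc <- sd_state^2 / (sd_state^2 + sd_resid^2)
round(icc, 2)
```

The result is about 0.27, so roughly a quarter of the variance in arrest rates lies between states. This supports using a multilevel model even though the effect of white is not significant.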
Adding black as a predictor does not change this conclusion. The coefficient for white remains non-significant (p = 0.51), and the coefficient for black is slightly negative and also non-significant (p = 0.56), so the model gives no evidence that average arrest rates rise with the size of the black population in a state.
m3_lme <- lme(Arrestsrate_per100k ~ white + black, data = crimedatabyrace1, random = ~ white|statename, method = "ML")
summary(m3_lme)
Linear mixed-effects model fit by maximum likelihood
Data: crimedatabyrace1
AIC BIC logLik
231549.7 231602.1 -115767.8
Random effects:
Formula: ~white | statename
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 9.105658e+02 (Intr)
white 4.388536e-04 0.007
Residual 1.492466e+03
Fixed effects: Arrestsrate_per100k ~ white + black
Value Std.Error DF t-value p-value
(Intercept) 3243.489 136.99458 13205 23.676037 0.0000
white 0.000 0.00014 13205 0.664813 0.5062
black 0.000 0.00038 13205 -0.589313 0.5557
Correlation:
(Intr) white
white -0.045
black 0.015 -0.581
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-3.2500378 -0.6334179 -0.1185085 0.5120136 10.6384636
Number of Observations: 13253
Number of Groups: 46
# No object named AR is defined above; Model 1 in the table matches cpooling1
htmlreg(list(cpooling1, m1_lme, m2_lme, m3_lme))
| | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| (Intercept) | 3163.41*** | 3245.79*** | 3244.62*** | 3243.49*** |
| | (16.34) | (136.54) | (136.91) | (136.99) |
| white | 0.00 | -0.00 | 0.00 | 0.00 |
| | (0.00) | (0.00) | (0.00) | (0.00) |
| black | | | | -0.00 |
| | | | | (0.00) |
| R2 | 0.00 | | | |
| Adj. R2 | -0.00 | | | |
| Num. obs. | 13253 | 13253 | 13253 | 13253 |
| RMSE | 1744.39 | | | |
| AIC | | 231552.35 | 231548.00 | 231549.66 |
| BIC | | 231582.32 | 231592.95 | 231602.10 |
| Log Likelihood | | -115772.18 | -115768.00 | -115767.83 |
| Num. groups | | 46 | 46 | 46 |

\*\*\*p < 0.001, \*\*p < 0.01, \*p < 0.05
Conclusion: A lower AIC indicates a better fit, and in this report the lowest AIC (231548.00) belongs to Model 3 in the table, the random-slope model m2_lme. The differences between the multilevel models are small, however, and BIC favors the random-intercept model (Model 2). Across all models, race within states has very little effect on the average arrest rate per 100k population: none of the coefficients for white or black is statistically significant. In the final model, which added the black population, the coefficient for black was slightly negative and not significant, so it does not indicate higher arrest rates. Further analysis could examine average arrest rates by sex and by other racial groups.
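The model comparison can be checked by hand from the log-likelihoods printed in the three `lme` summaries. The sketch below recomputes AIC as 2k - 2 logLik and runs a likelihood-ratio test of the random-slope model against the random-intercept model. The parameter counts in `k` are an assumption (fixed effects plus variance and covariance parameters), chosen because they reproduce the reported AIC values. The chi-square p-value is only approximate, because variance parameters are tested on the boundary of their range.

```r
# Log-likelihoods as printed in the lme summaries above.
loglik <- c(m1_lme = -115772.2, m2_lme = -115768.0, m3_lme = -115767.8)

# Assumed parameter counts: fixed effects + random-effect (co)variances + residual SD.
k <- c(m1_lme = 4, m2_lme = 6, m3_lme = 7)

# AIC = 2k - 2 * logLik; the smallest value indicates the preferred model.
aic <- 2 * k - 2 * loglik
aic
names(which.min(aic))

# Likelihood-ratio test: random slope (m2_lme) vs. random intercept (m1_lme).
lr_stat <- unname(2 * (loglik["m2_lme"] - loglik["m1_lme"]))
lr_df   <- unname(k["m2_lme"] - k["m1_lme"])
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
c(lr_stat = lr_stat, df = lr_df, p_value = p_value)
```

With the fitted objects available, `anova(m1_lme, m2_lme)` from nlme reports the same comparison directly.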