In this report, we will investigate whether or not the average amount of white population in a state has an effect on the average arrests rate in that state. In order to do so we will merge two datasets, one from social explorer on crime data and the other on the 2014 white population rate in each state.
library(nlme)
library(dplyr)
Attaching package: ‘dplyr’
The following object is masked from ‘package:nlme’:
collapse
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
library(magrittr)
library(tidyr)
Attaching package: ‘tidyr’
The following object is masked from ‘package:magrittr’:
extract
library(haven)
library(lmerTest)
Loading required package: Matrix
Attaching package: ‘Matrix’
The following object is masked from ‘package:tidyr’:
expand
Loading required package: lme4
Attaching package: ‘lme4’
The following object is masked from ‘package:nlme’:
lmList
Attaching package: ‘lmerTest’
The following object is masked from ‘package:lme4’:
lmer
The following object is masked from ‘package:stats’:
step
library(ggplot2)
library(texreg)
Version: 1.36.23
Date: 2017-03-03
Author: Philip Leifeld (University of Glasgow)
Please cite the JSS article in your publications -- see citation("texreg").
Attaching package: ‘texreg’
The following object is masked from ‘package:tidyr’:
extract
The following object is masked from ‘package:magrittr’:
extract
library(stringr)
Populationbyrace <- read.csv("/Users/robertperez/Documents/Rstudio DataSets /population_race_2014.csv")
Crimedata <- read.csv("/Users/robertperez/Documents/Rstudio DataSets /crimedata.csv")
crimedatabyrace <- merge(Populationbyrace, Crimedata,
by.x = "Geo_NAME",
by.y = "Geo_Name")
#Isolate the State Name from teh Geo_QName variable
X<-data.frame(str_locate(crimedatabyrace$Geo_QName,"County,"))
X2<-X%>%select(end)
crimedatabyrace$loc <- X2$end
crimedatabyrace1<-crimedatabyrace%>%
mutate(statename = substr(Geo_QName, loc+1,length(Geo_QName)))
head(crimedatabyrace1)
crimedatabyrace1 <- rename(crimedatabyrace1, county = Geo_NAME, Arrestsrate_per100k = SE_T011_001,
white = SE_T020_002,
black = SE_T020_003,
native = SE_T020_004,
asian = SE_T020_005,
pacifisland= SE_T020_006,
mixed = SE_T020_007)%>%
select(statename, county, Arrestsrate_per100k, white, black, native, asian, pacifisland, mixed)
head(crimedatabyrace1)
print(crimedatabyrace1)
crimedatabyrace1<-na.omit(crimedatabyrace1)
head(crimedatabyrace1)
Looking at the means
###Means, color = red
####Looking at aaverages for each state and race
crimedatabyrace1%>%
group_by(statename)%>%
summarize(Arrestsrate_per100k = mean(Arrestsrate_per100k), white = mean(white), black = mean(black))
statenames <- crimedatabyrace1 %>%
group_by(statename) %>%
summarise(Arrestsrate_per100k = mean(Arrestsrate_per100k, na.rm = TRUE), white = mean(log(white), na.rm = TRUE))
head(statenames)
ecoreg <- lm(Arrestsrate_per100k ~ white, data = crimedatabyrace1)
summary(ecoreg)
Call:
lm(formula = Arrestsrate_per100k ~ white, data = crimedatabyrace1)
Residuals:
Min 1Q Median 3Q Max
-3198.4 -1177.1 -262.8 947.7 16228.0
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.163e+03 1.634e+01 193.656 <2e-16 ***
white 4.538e-05 8.838e-05 0.513 0.608
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1744 on 13251 degrees of freedom
Multiple R-squared: 1.99e-05, Adjusted R-squared: -5.557e-05
F-statistic: 0.2637 on 1 and 13251 DF, p-value: 0.6076
ggplot(data=crimedatabyrace1, aes(x=statename, y=Arrestsrate_per100k))+
geom_col(color ="red", fill = "black")+coord_flip()
AR<- lm(Arrestsrate_per100k ~ white, data = crimedatabyrace1)
summary(ARwhite)
Call:
lm(formula = Arrestsrate_per100k ~ white + black, data = crimedatabyrace1)
Residuals:
Min 1Q Median 3Q Max
-3196.6 -1177.2 -263.1 947.8 16228.3
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.163e+03 1.638e+01 193.170 <2e-16 ***
white 5.673e-05 1.242e-04 0.457 0.648
black -5.484e-05 4.216e-04 -0.130 0.897
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1744 on 13250 degrees of freedom
Multiple R-squared: 2.117e-05, Adjusted R-squared: -0.0001298
F-statistic: 0.1403 on 2 and 13250 DF, p-value: 0.8691
ARwhite<- lm(Arrestsrate_per100k ~ white + black, data = crimedatabyrace1)
summary(ARwhite)
Call:
lm(formula = Arrestsrate_per100k ~ white + black, data = crimedatabyrace1)
Residuals:
Min 1Q Median 3Q Max
-3196.6 -1177.2 -263.1 947.8 16228.3
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.163e+03 1.638e+01 193.170 <2e-16 ***
white 5.673e-05 1.242e-04 0.457 0.648
black -5.484e-05 4.216e-04 -0.130 0.897
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1744 on 13250 degrees of freedom
Multiple R-squared: 2.117e-05, Adjusted R-squared: -0.0001298
F-statistic: 0.1403 on 2 and 13250 DF, p-value: 0.8691
dcoef <- crimedatabyrace1 %>%
group_by(statename) %>%
do(mod = lm(Arrestsrate_per100k ~ white, data = .))
coef <- dcoef %>% do(data.frame(intc = coef(.$mod)[1]))
ggplot(coef, aes(x = intc)) + geom_histogram()
dcoef <- crimedatabyrace1 %>%
group_by(statename) %>%
do(mod = lm(Arrestsrate_per100k ~ white, data = .))
coef <- dcoef %>% do(data.frame(sexc = coef(.$mod)[2]))
ggplot(coef, aes(x = sexc)) + geom_histogram()
m1_lme <- lme(Arrestsrate_per100k ~ white, data = crimedatabyrace1, random = ~1|statename, method = "ML")
summary(m1_lme)
Linear mixed-effects model fit by maximum likelihood
Data: crimedatabyrace1
AIC BIC logLik
231552.4 231582.3 -115772.2
Random effects:
Formula: ~1 | statename
(Intercept) Residual
StdDev: 908.1807 1494.13
Fixed effects: Arrestsrate_per100k ~ white
Value Std.Error DF t-value p-value
(Intercept) 3245.785 136.54066 13206 23.771565 0.0000
white 0.000 0.00008 13206 -0.279782 0.7796
Correlation:
(Intr)
white -0.052
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-3.2451978 -0.6331617 -0.1160736 0.5145651 10.6186223
Number of Observations: 13253
Number of Groups: 46
m2_lme <- lme(Arrestsrate_per100k ~ white, data = crimedatabyrace1, random = ~ white|statename, method = "ML")
summary(m2_lme)
Linear mixed-effects model fit by maximum likelihood
Data: crimedatabyrace1
AIC BIC logLik
231548 231592.9 -115768
Random effects:
Formula: ~white | statename
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 9.101732e+02 (Intr)
white 4.278992e-04 0.016
Residual 1.492528e+03
Fixed effects: Arrestsrate_per100k ~ white
Value Std.Error DF t-value p-value
(Intercept) 3244.624 136.91245 13206 23.698533 0.0000
white 0.000 0.00011 13206 0.398053 0.6906
Correlation:
(Intr)
white -0.04
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-3.2501799 -0.6329070 -0.1188938 0.5100618 10.6371287
Number of Observations: 13253
Number of Groups: 46
AIC(AR, m1_lme, m2_lme)
Conclusion: As we can see the lowest AIC means best fit and in this report that is the random intercept model 2. In this report we see that race within states has very little effect on the average arrests rates per 100k population. Through further analysis one will be able to measure average arrests rate by sex and other races.