require(dplyr, quietly = TRUE, warn.conflicts = FALSE)
require(ggplot2, quietly = TRUE, warn.conflicts = FALSE)
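## stringsAsFactors = TRUE keeps sex and colour as factors, so summary() below prints counts (R >= 4.0 defaults to FALSE)
arrest_df <- read.csv('https://raw.githubusercontent.com/mehtablocker/CUNY_bridge/master/Arrests.csv', stringsAsFactors = TRUE)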
## drop the first column (a row-number index)
arrest_df <- arrest_df %>% select(-1)
## convert "Yes/No" to TRUE/FALSE
arrest_df$released <- arrest_df$released=="Yes"
arrest_df$employed <- arrest_df$employed=="Yes"
arrest_df$citizen <- arrest_df$citizen=="Yes"
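The three Yes/No conversions above can also be collapsed into a single `mutate()` call. This is a minimal sketch, assuming dplyr 1.0 or later (for `across()`); the helper name `yes_no_to_logical` and the toy data frame are hypothetical, used only to illustrate the pattern.

```r
library(dplyr)

## Hypothetical helper: turn each listed "Yes"/"No" column into TRUE/FALSE
yes_no_to_logical <- function(df, cols) {
  df %>% mutate(across(all_of(cols), ~ .x == "Yes"))
}

## Toy data standing in for arrest_df (not the real dataset)
toy <- data.frame(released = c("Yes", "No"),
                  employed = c("No", "Yes"),
                  citizen  = c("Yes", "Yes"))
toy <- yes_no_to_logical(toy, c("released", "employed", "citizen"))
str(toy)
```

On the real data this would be `arrest_df <- yes_no_to_logical(arrest_df, c("released", "employed", "citizen"))`; the comparison works whether the columns are character or factor.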
Have possession arrests gone down over the years?
## year takes only a handful of integer values, so draw one bar per year instead of 30 histogram bins
arrest_df %>% ggplot(aes(x = year)) + geom_bar() + labs(title = "Arrests by Year", x = "Year", y = "Number of Arrests")
The overall trend is not a steady decline, but we do see a substantial drop between 2001 and 2002. Unfortunately the data stops at 2002, so we can't tell whether that drop continued, though I suspect it would.
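To put a number on the year-to-year changes rather than reading them off the chart, one option is to count arrests per year and difference the counts. A minimal sketch, assuming dplyr; the helper name `arrests_by_year` and the toy counts are hypothetical.

```r
library(dplyr)

## Hypothetical helper: arrests per year plus the change from the previous year
arrests_by_year <- function(df) {
  df %>%
    count(year) %>%                          # one row per year, column n
    arrange(year) %>%
    mutate(change = n - dplyr::lag(n))       # NA for the first year
}

## Toy input with made-up counts (3, 2, 1), not the Toronto data
toy <- data.frame(year = c(2000, 2000, 2000, 2001, 2001, 2002))
arrests_by_year(toy)
```

Running `arrests_by_year(arrest_df)` would give the exact size of the 2001 to 2002 drop.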
How is age related to arrests?
arrest_df$age %>% median; arrest_df$age %>% mean
## [1] 21
## [1] 23.84654
The median age of 21 is lower than I would have guessed. The mean (about 23.8) is higher than the median because the age distribution is skewed to the right, i.e., it has a long tail of older arrestees. A simple boxplot confirms this:
arrest_df$age %>% boxplot(main = "Boxplot of Arrest Age", horizontal = TRUE, xlab = "Age")
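The skew can also be checked numerically: a positive sample skewness indicates a longer right tail, which is consistent with the mean sitting above the median. This sketch uses the simple moment-based estimator (mean cubed deviation over the cubed standard deviation, not bias-corrected); the helper name `sample_skewness` and the toy ages are hypothetical.

```r
## Hypothetical helper: moment-based sample skewness (g1).
## Positive values indicate a longer right tail.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^(3/2)
}

ages <- c(18, 19, 20, 21, 21, 22, 35, 50)  # toy ages, not the real data
sample_skewness(ages)
```

On the real data the call would be `sample_skewness(arrest_df$age)`.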
Were males more likely to be arrested than females?
arrest_df$sex %>% summary()
## Female Male
## 443 4783
That is a resounding “YES!”
Of course, this in and of itself does not show gender bias in arrests, because we don't know whether males are in fact more likely to use marijuana than females. But a ratio this high (over 10 to 1), compared with roughly 1 to 1 for men and women in the overall population, is definitely a bit… interesting.
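The "over 10 to 1" figure follows directly from the counts printed above. A small sketch that computes the ratio of two levels of a categorical vector; the helper name `level_ratio` is hypothetical, and the vector is rebuilt from the printed counts (443 female, 4783 male) rather than read from the file.

```r
## Hypothetical helper: ratio of the counts of two levels of a categorical vector
level_ratio <- function(x, numerator, denominator) {
  counts <- table(x)
  counts[[numerator]] / counts[[denominator]]
}

## Rebuild a vector with the counts printed above
sex <- rep(c("Female", "Male"), times = c(443, 4783))
round(level_ratio(sex, "Male", "Female"), 1)  # 10.8
```

With the loaded data the same call is `level_ratio(arrest_df$sex, "Male", "Female")`.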
What percentage of those arrested were black?
arrest_df$colour %>% summary()
## Black White
## 1288 3938
summary(arrest_df$colour)/length(arrest_df$colour)
## Black White
## 0.24646 0.75354
While once again we don't necessarily know the causation, we can note that the percentage of black folks arrested (~25%) is roughly three times the population percentage (recent figures estimate that 8 to 9 percent of Toronto is black).
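The comparison with the population share is a simple ratio. This sketch uses the counts printed above and treats the 8 to 9 percent population estimate quoted in the text as a low/high range; the variable names are illustrative only.

```r
## Share of arrestees who are Black, from the counts printed above
black_share <- 1288 / (1288 + 3938)

## Population share: the 8-9% range quoted in the text (an estimate)
pop_share <- c(low = 0.08, high = 0.09)

## How many times larger the arrest share is than the population share
round(black_share / pop_share, 2)  # low: 3.08, high: 2.74
```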
Here is the percentage by year:
arrest_df %>%
  group_by(year) %>%
  summarise(pct_black = 100 * mean(colour == "Black")) %>%  ## proportion -> percent, to match the y-axis label
  ggplot(aes(x = year, y = pct_black)) +
  geom_line() + geom_point() +
  labs(title = "Black Arrest Percentage by Year", x = "Year", y = "Percentage of Those Arrested Who Were Black")
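Each yearly percentage is based on a different number of arrests, so some of the wiggle in the line may be sampling noise. One way to gauge this is to attach a 95% confidence interval to each year using base R's `prop.test()`, which inverts the score test (with a continuity correction by default). A minimal sketch, assuming dplyr 1.0 or later; the helper name `pct_black_ci` and the toy data are hypothetical.

```r
library(dplyr)

## Hypothetical helper: per-year percentage of Black arrestees with a 95% CI
pct_black_ci <- function(df) {
  df %>%
    group_by(year) %>%
    summarise(total = n(), black = sum(colour == "Black"), .groups = "drop") %>%
    rowwise() %>%                                   # one prop.test() per year
    mutate(pct = 100 * black / total,
           lo  = 100 * prop.test(black, total)$conf.int[1],
           hi  = 100 * prop.test(black, total)$conf.int[2]) %>%
    ungroup()
}

## Toy data: 20 arrests per year, 5 of them Black (not the real data)
toy <- data.frame(year = rep(c(2001, 2002), each = 20),
                  colour = rep(c("Black", "White", "White", "White"), 10))
pct_black_ci(toy)
```

Plotting `lo` and `hi` with `geom_errorbar()` on top of the line would show which year-to-year changes stand out from the noise.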
This dataset has some interesting information. I wish it were more recent and more specific (e.g., the binary "Black/White" labels for race are far too general and could be biasing the proportions). Perhaps the most interesting statistic is the high ratio of male arrests to female arrests.