Over the past decade, police brutality has been a topic of discussion more than ever. Many Americans have seen countless news articles of Black men and women encountering unwarrented aggressive and violent behavior from officers with some losing their lives as a result. With that being said, the racial injustices within America’s judicial system is a serious and obvious issue that cannot be avoided. This dataset consists of over 65,000 traffic violations from Maryland County from 2005-2011 with variables covering age, race, gender, violation type, if the driver was arrested, if a search was conducted, how long the stop was for, and if there was a drug related stop. With these variables, I would like to explore if there is a clear indicator that Black Americans are more suceptible to getting pulled over and are given harsher treatment. In addition, I would like to explore any differences between the genders and how their stops differ.
library(ggplot2)
library(RColorBrewer)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Load the data
traffic <- read.csv("/Users/michellenguyen/Downloads/traffic_violaions.csv")
head(traffic)
## stop_date stop_time country_name driver_gender driver_age_raw driver_age
## 1 1/2/2005 1:55 NA M 1985 20
## 2 1/18/2005 8:15 NA M 1965 40
## 3 1/23/2005 23:15 NA M 1972 33
## 4 2/20/2005 17:15 NA M 1986 19
## 5 3/14/2005 10:00 NA F 1984 21
## 6 3/23/2005 9:45 NA M 1982 23
## driver_race violation_raw violation search_conducted
## 1 White Speeding Speeding FALSE
## 2 White Speeding Speeding FALSE
## 3 White Speeding Speeding FALSE
## 4 White Call for Service Other FALSE
## 5 White Speeding Speeding FALSE
## 6 Black Equipment/Inspection Violation Equipment FALSE
## search_type stop_outcome is_arrested stop_duration drugs_related_stop
## 1 Citation FALSE 0-15 Min FALSE
## 2 Citation FALSE 0-15 Min FALSE
## 3 Citation FALSE 0-15 Min FALSE
## 4 Arrest Driver TRUE 16-30 Min FALSE
## 5 Citation FALSE 0-15 Min FALSE
## 6 Citation FALSE 0-15 Min FALSE
# Remove na from driver gender and stop duration. Try to find the counts of stop durations by gender
gender <- traffic %>%
filter(!is.na(driver_gender) & !is.na(stop_duration)) %>%
group_by(driver_gender, stop_duration) %>%
count(driver_gender)
gender <- gender[-c(1,2),]
gender
## # A tibble: 6 × 3
## # Groups: driver_gender, stop_duration [6]
## driver_gender stop_duration n
## <chr> <chr> <int>
## 1 F 0-15 Min 10344
## 2 F 16-30 Min 2400
## 3 F 30+ Min 272
## 4 M 0-15 Min 27088
## 5 M 16-30 Min 7445
## 6 M 30+ Min 2031
# Grouped bar chart displaying the amount of times each gender was stopped and for how long
colnames(gender) <- c("Gender", "Stop_Duration", "Count")
ggplot(gender, aes(fill = Gender, y = Count, x = Stop_Duration)) + geom_bar(position= "dodge", stat = "identity") + xlab("Stop Duration") + ylab("Count") + ggtitle("Stop Duration Among Genders")
The graph shows that women get pulled over far less than men. Most traffic stop encounters are short.
options(scipen = 100)
table <- table(traffic$driver_race, traffic$is_arrested)
# Look at correlation between race and is arrested
table = table[-1,]
table
##
## FALSE TRUE
## Asian 1345 30
## Black 6136 458
## Hispanic 3830 269
## Other 147 2
## White 36254 1110
prop.table(table,2) # column percentages
##
## FALSE TRUE
## Asian 0.028189973 0.016051364
## Black 0.128604963 0.245050829
## Hispanic 0.080273307 0.143927234
## Other 0.003080986 0.001070091
## White 0.759850771 0.593900482
White have a 59% chance of getting arrested, Black have 24% chance of getting arrested. It is also important to note that the White is the dominant race in Maryland. Asians do not get pulled over nearly as much as other races.
chisq.test(table)
##
## Pearson's Chi-squared test
##
## data: table
## X-squared = 349.22, df = 4, p-value < 0.00000000000000022
t.test(table)
##
## One Sample t-test
##
## data: table
## t = 1.4029, df = 9, p-value = 0.1942
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -3036.954 12953.154
## sample estimates:
## mean of x
## 4958.1
The Chi-squared test calculated the p-value among variables. I tested the p value for race and likeliness to get arrested and the p-value turned out very small. However, I tried to take into account the population variance and stumbled across Welch’s test which tests the hypothesis that two have equal means. This p-value came up to be .2922 which is very large, indicating that there is some correlation.
# Correlation between race and search conducted
table2 <- table(traffic$driver_race, traffic$search_conducted)
# Look at correlation between race and is arrested
table2 = table2[-1,]
table2
##
## FALSE TRUE
## Asian 1343 32
## Black 6084 510
## Hispanic 3762 337
## Other 148 1
## White 36159 1205
prop.table(table2,2) # column percentages
##
## FALSE TRUE
## Asian 0.0282760654 0.0153477218
## Black 0.1280949975 0.2446043165
## Hispanic 0.0792066700 0.1616306954
## Other 0.0031160519 0.0004796163
## White 0.7613062153 0.5779376499
There is a 57% chance for White race to have searches conducted with Black race having 24%.
chisq.test(table2)
##
## Pearson's Chi-squared test
##
## data: table2
## X-squared = 473.77, df = 4, p-value < 0.00000000000000022
t.test(table2)
##
## One Sample t-test
##
## data: table2
## t = 1.4077, df = 9, p-value = 0.1928
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -3009.439 12925.639
## sample estimates:
## mean of x
## 4958.1
Again, using the Chi-squared the p-value came out to be incredibly small. I wanted to use the Welch’s test and the p-value came out to be .1928 which indicates correlation.
Link to tableau: https://public.tableau.com/app/profile/michelle.n1785/viz/NumberofArrestsandSearchesbyRaceperYear/Dashboard1?publish=yes