Analysis on Traffic Violations Among Demographics by Michelle Nguyen

Over the past decade, police brutality has been a topic of discussion more than ever. Many Americans have seen countless news articles of Black men and women encountering unwarrented aggressive and violent behavior from officers with some losing their lives as a result. With that being said, the racial injustices within America’s judicial system is a serious and obvious issue that cannot be avoided. This dataset consists of over 65,000 traffic violations from Maryland County from 2005-2011 with variables covering age, race, gender, violation type, if the driver was arrested, if a search was conducted, how long the stop was for, and if there was a drug related stop. With these variables, I would like to explore if there is a clear indicator that Black Americans are more suceptible to getting pulled over and are given harsher treatment. In addition, I would like to explore any differences between the genders and how their stops differ.

library(ggplot2)
library(RColorBrewer)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# Load the data
traffic <- read.csv("/Users/michellenguyen/Downloads/traffic_violaions.csv")
head(traffic)

##   stop_date stop_time country_name driver_gender driver_age_raw driver_age
## 1  1/2/2005      1:55           NA             M           1985         20
## 2 1/18/2005      8:15           NA             M           1965         40
## 3 1/23/2005     23:15           NA             M           1972         33
## 4 2/20/2005     17:15           NA             M           1986         19
## 5 3/14/2005     10:00           NA             F           1984         21
## 6 3/23/2005      9:45           NA             M           1982         23
##   driver_race                  violation_raw violation search_conducted
## 1       White                       Speeding  Speeding            FALSE
## 2       White                       Speeding  Speeding            FALSE
## 3       White                       Speeding  Speeding            FALSE
## 4       White               Call for Service     Other            FALSE
## 5       White                       Speeding  Speeding            FALSE
## 6       Black Equipment/Inspection Violation Equipment            FALSE
##   search_type  stop_outcome is_arrested stop_duration drugs_related_stop
## 1                  Citation       FALSE      0-15 Min              FALSE
## 2                  Citation       FALSE      0-15 Min              FALSE
## 3                  Citation       FALSE      0-15 Min              FALSE
## 4             Arrest Driver        TRUE     16-30 Min              FALSE
## 5                  Citation       FALSE      0-15 Min              FALSE
## 6                  Citation       FALSE      0-15 Min              FALSE

# Remove na from driver gender and stop duration. Try to find the counts of stop durations by gender
gender <- traffic %>% 
  filter(!is.na(driver_gender) & !is.na(stop_duration)) %>%
  group_by(driver_gender, stop_duration) %>%
  count(driver_gender)
gender <- gender[-c(1,2),]
gender

## # A tibble: 6 × 3
## # Groups:   driver_gender, stop_duration [6]
##   driver_gender stop_duration     n
##   <chr>         <chr>         <int>
## 1 F             0-15 Min      10344
## 2 F             16-30 Min      2400
## 3 F             30+ Min         272
## 4 M             0-15 Min      27088
## 5 M             16-30 Min      7445
## 6 M             30+ Min        2031

# Grouped bar chart displaying the amount of times each gender was stopped and for how long
colnames(gender) <- c("Gender", "Stop_Duration", "Count")
ggplot(gender, aes(fill = Gender, y = Count, x = Stop_Duration)) + geom_bar(position= "dodge", stat = "identity") + xlab("Stop Duration") + ylab("Count") + ggtitle("Stop Duration Among Genders")

The graph shows that women get pulled over far less than men. Most traffic stop encounters are short.

options(scipen = 100)
table <- table(traffic$driver_race, traffic$is_arrested)
# Look at correlation between race and is arrested
table = table[-1,]
table

##           
##            FALSE  TRUE
##   Asian     1345    30
##   Black     6136   458
##   Hispanic  3830   269
##   Other      147     2
##   White    36254  1110

prop.table(table,2) # column percentages

##           
##                  FALSE        TRUE
##   Asian    0.028189973 0.016051364
##   Black    0.128604963 0.245050829
##   Hispanic 0.080273307 0.143927234
##   Other    0.003080986 0.001070091
##   White    0.759850771 0.593900482

White have a 59% chance of getting arrested, Black have 24% chance of getting arrested. It is also important to note that the White is the dominant race in Maryland. Asians do not get pulled over nearly as much as other races.

chisq.test(table)

## 
##  Pearson's Chi-squared test
## 
## data:  table
## X-squared = 349.22, df = 4, p-value < 0.00000000000000022

t.test(table)

## 
##  One Sample t-test
## 
## data:  table
## t = 1.4029, df = 9, p-value = 0.1942
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -3036.954 12953.154
## sample estimates:
## mean of x 
##    4958.1

The Chi-squared test calculated the p-value among variables. I tested the p value for race and likeliness to get arrested and the p-value turned out very small. However, I tried to take into account the population variance and stumbled across Welch’s test which tests the hypothesis that two have equal means. This p-value came up to be .2922 which is very large, indicating that there is some correlation.

# Correlation between race and search conducted
table2 <- table(traffic$driver_race, traffic$search_conducted)
# Look at correlation between race and is arrested
table2 = table2[-1,]
table2

##           
##            FALSE  TRUE
##   Asian     1343    32
##   Black     6084   510
##   Hispanic  3762   337
##   Other      148     1
##   White    36159  1205

prop.table(table2,2) # column percentages

##           
##                   FALSE         TRUE
##   Asian    0.0282760654 0.0153477218
##   Black    0.1280949975 0.2446043165
##   Hispanic 0.0792066700 0.1616306954
##   Other    0.0031160519 0.0004796163
##   White    0.7613062153 0.5779376499

There is a 57% chance for White race to have searches conducted with Black race having 24%.

chisq.test(table2)

## 
##  Pearson's Chi-squared test
## 
## data:  table2
## X-squared = 473.77, df = 4, p-value < 0.00000000000000022

t.test(table2)

## 
##  One Sample t-test
## 
## data:  table2
## t = 1.4077, df = 9, p-value = 0.1928
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -3009.439 12925.639
## sample estimates:
## mean of x 
##    4958.1

Again, using the Chi-squared the p-value came out to be incredibly small. I wanted to use the Welch’s test and the p-value came out to be .1928 which indicates correlation.

Link to tableau: https://public.tableau.com/app/profile/michelle.n1785/viz/NumberofArrestsandSearchesbyRaceperYear/Dashboard1?publish=yes

Project3

Michelle Nguyen

5/10/2022

Analysis on Traffic Violations Among Demographics by Michelle Nguyen