This file contains data about traffic stops in Rhode Island, United States, from 2005 to 2015. The data covers the date and time of each stop, the stop duration, the driver's gender, ethnicity, and age, the type of violation, whether a search was conducted, the outcome of the stop, whether an arrest was made, and whether the stop was drug related. In this project, I import the dataset, clean the data, and perform exploratory data analysis using six kinds of data visualisations to gain a better understanding of the data and question the outcomes.
ri_traffic <- read.csv("U:/police_data.csv")
library(plyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(scales)
library(RColorBrewer)
library(ggthemes)
colnames(ri_traffic)
## [1] "stop_date" "stop_time" "driver_gender"
## [4] "driver_age" "driver_race" "violation"
## [7] "search_conducted" "stop_outcome" "is_arrested"
## [10] "stop_duration" "drugs_related_stop"
head(ri_traffic)
## stop_date stop_time driver_gender driver_age driver_race violation
## 1 1/2/2005 1:55 M 20 White Speeding
## 2 1/18/2005 8:15 M 40 White Speeding
## 3 1/23/2005 23:15 M 33 White Speeding
## 4 2/20/2005 17:15 M 19 White Other
## 5 3/14/2005 10:00 F 21 White Speeding
## 6 3/23/2005 9:45 M 23 Black Equipment
## search_conducted stop_outcome is_arrested stop_duration drugs_related_stop
## 1 FALSE Citation FALSE 0-15 Min FALSE
## 2 FALSE Citation FALSE 0-15 Min FALSE
## 3 FALSE Citation FALSE 0-15 Min FALSE
## 4 FALSE Arrest Driver TRUE 16-30 Min FALSE
## 5 FALSE Citation FALSE 0-15 Min FALSE
## 6 FALSE Citation FALSE 0-15 Min FALSE
dim(ri_traffic)
## [1] 91741 11
summary(ri_traffic)
## stop_date stop_time driver_gender driver_age
## Length:91741 Length:91741 Length:91741 Min. :15.00
## Class :character Class :character Class :character 1st Qu.:23.00
## Mode :character Mode :character Mode :character Median :31.00
## Mean :34.01
## 3rd Qu.:43.00
## Max. :99.00
## NA's :5621
## driver_race violation search_conducted stop_outcome
## Length:91741 Length:91741 Mode :logical Length:91741
## Class :character Class :character FALSE:88545 Class :character
## Mode :character Mode :character TRUE :3196 Mode :character
##
##
##
##
## is_arrested stop_duration drugs_related_stop
## Mode :logical Length:91741 Mode :logical
## FALSE:83479 Class :character FALSE:90926
## TRUE :2929 Mode :character TRUE :815
## NA's :5333
##
##
##
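The summary shows missing values: driver_age has 5,621 NAs and is_arrested has 5,333. The character columns can also hide missing values as empty strings, so I first convert empty strings to NA and then count the missing values in each column. These rows need to be removed before I can begin my visualizations.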
traffic <- ri_traffic
traffic[traffic == ""] <- NA
colSums(is.na(traffic))
## stop_date stop_time driver_gender driver_age
## 0 0 5335 5621
## driver_race violation search_conducted stop_outcome
## 5333 5333 0 5333
## is_arrested stop_duration drugs_related_stop
## 5333 5333 0
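The counts above only locate the missing values; the removal step itself is not shown. A minimal sketch of one way to do it, using a small made-up data frame (`toy` and `toy_clean` are hypothetical names and values, not part of the original analysis):

```r
# Made-up rows standing in for `traffic` (hypothetical values)
toy <- data.frame(
  stop_date   = c("1/2/2005", "1/18/2005", "3/23/2005", "4/1/2005"),
  driver_age  = c(20, NA, 23, 31),
  driver_race = c("White", "White", "", "Black"),
  stringsAsFactors = FALSE
)

# Same step as above: treat empty strings as missing
toy[toy == ""] <- NA

# Keep only the rows that have no missing value in any column
toy_clean <- toy[complete.cases(toy), ]

# Every column should now report zero missing values
colSums(is.na(toy_clean))
nrow(toy_clean)
```

Note that driver_age is missing in 5,621 rows while most other columns are missing in 5,333, so dropping every incomplete row may discard some stops that lack only an age. Filtering on a single column, for example `toy[!is.na(toy$driver_race), ]`, would be a narrower alternative.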
I am creating a bar chart to look at the number of traffic stops in Rhode Island by the driver’s race.
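The plotting code for this chart is not shown here. A minimal sketch of how such a bar chart could be drawn with ggplot2, assuming a cleaned data frame with a `driver_race` column (the `toy` data, the `p_race` name, and the colors are made up for illustration):

```r
library(ggplot2)

# Made-up rows standing in for the cleaned data (hypothetical values)
toy <- data.frame(
  driver_race = c("White", "White", "White", "Black", "Hispanic", "Asian"),
  stringsAsFactors = FALSE
)

# geom_bar() counts the rows in each driver_race group and draws one bar per group
p_race <- ggplot(toy, aes(x = driver_race)) +
  geom_bar(fill = "steelblue") +
  labs(
    title = "Traffic stops by driver race",
    x = "Driver race",
    y = "Number of stops"
  ) +
  theme_minimal()

print(p_race)
```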
In the bar chart above, you can see that most of the drivers who were
stopped by the police were white. This makes sense because white people
make up most of Rhode Island’s population. Black and Hispanic drivers
were stopped about six times less often than white drivers. The data was
collected from 2005 to 2015, which makes me wonder whether these totals
have changed in the roughly 10 years since. Currently, Rhode Island has
an 80% white population, but I wonder if that share has decreased from
10 years ago, since population diversity is increasing in the United
States.
I am creating a histogram to show the distribution of Rhode Island traffic stops by year, from 2005 to 2015.
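The code behind this histogram is not shown. A minimal sketch, assuming the year is extracted from the month/day/year text in `stop_date` with lubridate (the `toy` data and the `stop_year` and `p_year` names are hypothetical):

```r
library(ggplot2)
library(lubridate)

# Made-up stop dates in the same month/day/year format as the dataset
toy <- data.frame(
  stop_date = c("1/2/2005", "3/23/2005", "6/1/2006",
                "7/4/2006", "8/9/2006", "2/2/2007"),
  stringsAsFactors = FALSE
)

# stop_date is stored as text, so parse it with mdy() before taking the year
toy$stop_year <- year(mdy(toy$stop_date))

# binwidth = 1 gives one bin per calendar year
p_year <- ggplot(toy, aes(x = stop_year)) +
  geom_histogram(binwidth = 1, fill = "steelblue", color = "white") +
  scale_x_continuous(breaks = sort(unique(toy$stop_year))) +
  labs(
    title = "Traffic stops per year",
    x = "Year",
    y = "Number of stops"
  )

print(p_year)
```

Because the year is a discrete value, `geom_bar()` on `factor(stop_year)` would be a reasonable alternative to a histogram here.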
In the histogram above, we can see that 2006 and 2012 had the most
traffic stops in Rhode Island. We can also see that there were very few
traffic stops in 2005. Besides those three years, the remaining years
hover around the same annual count. There is a very big jump in traffic
stops from 2005 to 2006. This could be due to any number of things. I
wonder if new technology was installed in 2006 that helped the police
detect more violations that warranted pulling drivers over.
I am creating a stacked bar chart to show how many stops each race had for each type of traffic violation.
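The code for the stacked bar chart is not shown. A minimal sketch, assuming stops are first counted per race and violation and then stacked with `geom_col()` (the `toy` data and the `race_violation` and `p_stack` names are made up for illustration):

```r
library(dplyr)
library(ggplot2)

# Made-up rows standing in for the cleaned data (hypothetical values)
toy <- data.frame(
  driver_race = c("White", "White", "White", "White",
                  "Black", "Black", "Hispanic"),
  violation   = c("Speeding", "Speeding", "Equipment", "Other",
                  "Speeding", "Equipment", "Speeding"),
  stringsAsFactors = FALSE
)

# One row per race/violation pair with its number of stops.
# dplyr::count is called explicitly because plyr also defines count().
race_violation <- toy %>%
  dplyr::count(driver_race, violation, name = "stops")

# geom_col() stacks the violation segments within each race by default
p_stack <- ggplot(race_violation,
                  aes(x = driver_race, y = stops, fill = violation)) +
  geom_col() +
  labs(
    title = "Violations by driver race",
    x = "Driver race",
    y = "Number of stops",
    fill = "Violation"
  )

print(p_stack)
```

Since one group dominates the totals, `geom_col(position = "fill")` could be used instead to compare the mix of violations within each race as proportions.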
In the stacked bar chart above, we can see that white drivers lead in
every type of violation. Speeding accounts for most of the traffic
stops, both overall and within each race.
I am creating a line plot to show the increases and decreases of each violation type from 2005 to 2015.
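The code for the line plot is not shown. A minimal sketch, assuming stops are counted per year and violation type and drawn as one line per violation (the `toy` data and the `per_year` and `p_line` names are hypothetical):

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Made-up rows in the same shape as the dataset (hypothetical values)
toy <- data.frame(
  stop_date = c("1/2/2005", "3/23/2005", "6/1/2006", "7/4/2006",
                "8/9/2006", "2/2/2007", "5/5/2007"),
  violation = c("Speeding", "Equipment", "Speeding", "Speeding",
                "Equipment", "Speeding", "Equipment"),
  stringsAsFactors = FALSE
)

# Count stops for every year/violation pair that occurs in the data
per_year <- toy %>%
  mutate(stop_year = year(mdy(stop_date))) %>%
  dplyr::count(stop_year, violation, name = "stops")

# One line per violation type, with a point at each observed year
p_line <- ggplot(per_year,
                 aes(x = stop_year, y = stops, color = violation)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = sort(unique(per_year$stop_year))) +
  labs(
    title = "Violations per year",
    x = "Year",
    y = "Number of stops",
    color = "Violation"
  )

print(p_line)
```

One caveat with this approach: a year/violation pair with no stops produces no row at all rather than a zero, so a line that starts late may reflect absent rows rather than a plotted count of zero.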
From the line plot above, we can see that speeding was the most common
violation each year, which aligns with our previous findings. In our
histogram, we also saw that 2006 had the second most traffic stops of
all the years. Looking at the line plot, we can see that this was most
likely due to the number of speeding violations. Apart from speeding and
seatbelt violations, the violation types follow similar trends from year
to year. The line plot also shows no stops for seatbelts in any year
before 2012, which makes me question whether data on seatbelt traffic
stops is missing.
I created pie charts to show the percentage of traffic stops by the Rhode Island police that each race accounted for in each year.
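The code for the pie charts is not shown. ggplot2 has no dedicated pie geom, so a common approach is a stacked bar in polar coordinates, with one panel per year. A minimal sketch under that assumption (the `toy` data and the `race_share` and `p_pie` names are hypothetical):

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Made-up rows in the same shape as the dataset (hypothetical values)
toy <- data.frame(
  stop_date   = c("1/2/2005", "3/23/2005", "4/4/2005", "5/5/2005",
                  "6/1/2006", "7/4/2006", "8/9/2006", "9/9/2006"),
  driver_race = c("White", "White", "Black", "Hispanic",
                  "White", "White", "White", "Black"),
  stringsAsFactors = FALSE
)

# Share of each year's stops accounted for by each race
race_share <- toy %>%
  mutate(stop_year = year(mdy(stop_date))) %>%
  dplyr::count(stop_year, driver_race, name = "stops") %>%
  group_by(stop_year) %>%
  mutate(share = stops / sum(stops)) %>%
  ungroup()

# A single stacked bar per year, bent into a circle by coord_polar()
p_pie <- ggplot(race_share, aes(x = "", y = share, fill = driver_race)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  facet_wrap(~ stop_year) +
  labs(title = "Share of traffic stops by race, per year", fill = "Driver race") +
  theme_void()

print(p_pie)
```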
From the pie charts above, we can see that white drivers account for the
most traffic stops every year. Over the years, however, you can see a
slight increase in the share of traffic stops involving Hispanic and
Black drivers. This could be due to an increase in diversity in Rhode
Island. This makes me think that the years 2016 to 2023 could show
further increases in traffic stops for Black, Hispanic, or Asian
drivers.
I created a heat map to show the number of traffic violations on each day of the week for each year from 2005 to 2015. The colors, going from white to red, show an increasing number of violations on that day of the week.
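The code for the heat map is not shown. A minimal sketch, assuming stops are counted per year and weekday and drawn with `geom_tile()` on a white-to-red scale as described (the `toy` data and the `per_day` and `p_heat` names are hypothetical):

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Made-up stop dates in the same month/day/year format as the dataset
toy <- data.frame(
  stop_date = c("1/2/2005", "1/3/2005", "1/9/2005",
                "1/7/2006", "1/8/2006", "1/14/2006"),
  stringsAsFactors = FALSE
)

# Count stops for each year/weekday pair that occurs in the data
per_day <- toy %>%
  mutate(
    stop_day  = mdy(stop_date),
    stop_year = year(stop_day),
    weekday   = wday(stop_day, label = TRUE)  # ordered factor of day names
  ) %>%
  dplyr::count(stop_year, weekday, name = "stops")

# One tile per year/weekday pair, shaded from white (few) to red (many)
p_heat <- ggplot(per_day,
                 aes(x = factor(stop_year), y = weekday, fill = stops)) +
  geom_tile(color = "grey80") +
  scale_fill_gradient(low = "white", high = "red") +
  labs(
    title = "Traffic violations by weekday and year",
    x = "Year",
    y = "Day of week",
    fill = "Stops"
  )

print(p_heat)
```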
From the heat map above, we can see that the number of violations is
extremely low on every day of the week in 2005. This makes sense,
because our histogram showed that 2005 had a very low overall traffic
stop count compared to other years. 2006 and 2012 had high violation
counts every day, which is also consistent with the histogram. In 2014
and 2015, Saturday clearly has a high count of violations compared to
the other days of the week. I wonder why Saturday has the highest
violation count, considering it is the weekend and a non-work day for
most people.
Overall, most of the traffic stops from 2005 to 2015 were for speeding, and most of the violations were by white drivers in Rhode Island. The sharp increase from 2005 to 2006 could be caused by many things, such as technology, the economy, or legislation. It makes me wonder if there was a change in laws or if a new technology was introduced to help the police make stops. I would love to break this down by county or town in Rhode Island to get a more in-depth look at the data. I could then ask questions like: Which counties have the most stops? Which race has the most violations in each county?