Executive Summary

This file contains data about traffic stops in Rhode Island, United States, from 2005 to 2015. The data covers the date, time, and duration of each stop; the gender, ethnicity, and age of the driver; the type of violation; whether a search was conducted; the outcome of the stop; whether there was an arrest; and whether the stop was drug related. In this project, I import the dataset, clean the data, and perform exploratory data analysis using six kinds of data visualisations in order to gain a better understanding of the data and question the outcomes.

Loading in Data and R Libraries:

ri_traffic <- read.csv("U:/police_data.csv")

library(plyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(scales)
library(RColorBrewer)
library(ggthemes)

Descriptive Statistics

colnames(ri_traffic)
##  [1] "stop_date"          "stop_time"          "driver_gender"     
##  [4] "driver_age"         "driver_race"        "violation"         
##  [7] "search_conducted"   "stop_outcome"       "is_arrested"       
## [10] "stop_duration"      "drugs_related_stop"
head(ri_traffic)
##   stop_date stop_time driver_gender driver_age driver_race violation
## 1  1/2/2005      1:55             M         20       White  Speeding
## 2 1/18/2005      8:15             M         40       White  Speeding
## 3 1/23/2005     23:15             M         33       White  Speeding
## 4 2/20/2005     17:15             M         19       White     Other
## 5 3/14/2005     10:00             F         21       White  Speeding
## 6 3/23/2005      9:45             M         23       Black Equipment
##   search_conducted  stop_outcome is_arrested stop_duration drugs_related_stop
## 1            FALSE      Citation       FALSE      0-15 Min              FALSE
## 2            FALSE      Citation       FALSE      0-15 Min              FALSE
## 3            FALSE      Citation       FALSE      0-15 Min              FALSE
## 4            FALSE Arrest Driver        TRUE     16-30 Min              FALSE
## 5            FALSE      Citation       FALSE      0-15 Min              FALSE
## 6            FALSE      Citation       FALSE      0-15 Min              FALSE
dim(ri_traffic)
## [1] 91741    11
summary(ri_traffic)
##   stop_date          stop_time         driver_gender        driver_age   
##  Length:91741       Length:91741       Length:91741       Min.   :15.00  
##  Class :character   Class :character   Class :character   1st Qu.:23.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :31.00  
##                                                           Mean   :34.01  
##                                                           3rd Qu.:43.00  
##                                                           Max.   :99.00  
##                                                           NA's   :5621   
##  driver_race         violation         search_conducted stop_outcome      
##  Length:91741       Length:91741       Mode :logical    Length:91741      
##  Class :character   Class :character   FALSE:88545      Class :character  
##  Mode  :character   Mode  :character   TRUE :3196       Mode  :character  
##                                                                           
##                                                                           
##                                                                           
##                                                                           
##  is_arrested     stop_duration      drugs_related_stop
##  Mode :logical   Length:91741       Mode :logical     
##  FALSE:83479     Class :character   FALSE:90926       
##  TRUE :2929      Mode  :character   TRUE :815         
##  NA's :5333                                           
##                                                       
##                                                       
## 

Cleaning Data

It is clear from the summary that many columns and rows contain “NA,” and these need to be filtered out before I can begin my visualizations. As a first step, I convert empty strings to NA and count the missing values in each column.

traffic <- ri_traffic
traffic[traffic == ""] <- NA 
colSums(is.na(traffic))
##          stop_date          stop_time      driver_gender         driver_age 
##                  0                  0               5335               5621 
##        driver_race          violation   search_conducted       stop_outcome 
##               5333               5333                  0               5333 
##        is_arrested      stop_duration drugs_related_stop 
##               5333               5333                  0
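The counts above only locate the missing values. As a minimal sketch of how the affected rows could then be dropped, assuming the charts only need driver_race and violation to be present, the example below filters a small made-up data frame (toy and toy_clean are hypothetical names, and the values are invented for illustration):

```r
library(dplyr)

# Toy stand-in for `traffic`; the values are invented for illustration only.
toy <- data.frame(
  stop_date   = c("1/2/2005", "1/18/2005", "1/23/2005", "2/20/2005"),
  driver_race = c("White", NA, "Black", "White"),
  violation   = c("Speeding", NA, "Equipment", "Other"),
  driver_age  = c(20, NA, 33, NA)
)

# Drop rows that are missing the columns the charts rely on.
# driver_age is left alone so that a missing age does not discard a stop.
toy_clean <- toy %>%
  filter(!is.na(driver_race), !is.na(violation))

# Confirm what is still missing after filtering.
colSums(is.na(toy_clean))
```

Filtering on specific columns, rather than dropping every row with any NA, keeps stops that are only missing a field the analysis does not use.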

Bar Chart

I am creating a bar chart to look at the number of traffic stops in Rhode Island by the driver’s race.
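A chart like this could be built roughly as follows. This is a sketch on a small made-up data frame rather than the original plotting code; toy, race_counts, and p_bar are hypothetical names, and the colors and labels are assumptions:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the cleaned data; values are invented for illustration.
toy <- data.frame(
  driver_race = c("White", "White", "White", "Black", "Hispanic", "Asian")
)

# One row per race, with the number of stops in column `n`.
race_counts <- toy %>% count(driver_race)

# Bars ordered from most to fewest stops.
p_bar <- ggplot(race_counts, aes(x = reorder(driver_race, -n), y = n)) +
  geom_col(fill = "steelblue") +
  labs(title = "Traffic Stops by Driver Race",
       x = "Driver race", y = "Number of stops") +
  theme_minimal()

p_bar
```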

In the bar chart above, you can see that most of the drivers who were stopped by the police were white. This makes sense because white people make up most of Rhode Island’s population. Black and Hispanic drivers were each stopped about six times less often than white drivers. This data was collected from 2005 to 2015, which makes me wonder whether these totals have changed in the roughly 10 years since. Currently Rhode Island’s population is about 80% white, but I wonder if that share has decreased over the last 10 years, since population diversity is increasing in the United States.

Histogram

I am creating a histogram to show the distribution of Rhode Island traffic stops by year, from 2005 to 2015.
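Because stop_date is stored as month/day/year text, the year has to be extracted before it can be plotted. The sketch below shows one way to do that with lubridate on a small made-up data frame; toy, stop_year, and p_hist are hypothetical names, and the styling is an assumption:

```r
library(lubridate)
library(ggplot2)

# Toy stand-in for the cleaned data; dates are invented for illustration.
toy <- data.frame(
  stop_date = c("1/2/2005", "3/14/2005", "6/1/2006",
                "7/4/2006", "9/9/2006", "2/2/2007")
)

# Parse the month/day/year text into a Date, then keep only the year.
toy$stop_year <- year(mdy(toy$stop_date))

# One bin per calendar year.
p_hist <- ggplot(toy, aes(x = stop_year)) +
  geom_histogram(binwidth = 1, fill = "steelblue", color = "white") +
  scale_x_continuous(breaks = 2005:2007) +
  labs(title = "Traffic Stops by Year",
       x = "Year", y = "Number of stops") +
  theme_minimal()

p_hist
```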

In the histogram above, we can see that 2006 and 2012 had the most traffic stops in Rhode Island, while 2005 had very few. Aside from those three years, the annual totals stay at roughly the same level. There is a very big jump in traffic stops from 2005 to 2006, which could be due to any number of factors. I wonder if new technology was installed in 2006 that helped the police detect more violations that warranted a stop.

Stacked Bar Chart

I am creating a stacked bar chart to show how many stops of each violation type were recorded for each race.

In the stacked bar chart above, we can see that white drivers lead every violation category. Speeding accounts for most of the traffic stops, both overall and within each race.
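One way to sketch a stacked bar chart of this kind is to count stops per race and violation, then let geom_col stack the segments. The data frame below is made up, and toy, stacked_counts, and p_stacked are hypothetical names:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the cleaned data; values are invented for illustration.
toy <- data.frame(
  driver_race = c("White", "White", "White", "Black", "Black", "Hispanic"),
  violation   = c("Speeding", "Speeding", "Equipment",
                  "Speeding", "Other", "Speeding")
)

# One row per race/violation pair, with the number of stops in `n`.
stacked_counts <- toy %>% count(driver_race, violation)

# geom_col stacks segments that share an x value by default.
p_stacked <- ggplot(stacked_counts,
                    aes(x = driver_race, y = n, fill = violation)) +
  geom_col() +
  labs(title = "Violations by Driver Race",
       x = "Driver race", y = "Number of stops", fill = "Violation") +
  theme_minimal()

p_stacked
```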

Line Plot

I am creating a line plot to show the increases and decreases of each violation type from 2005 to 2015.
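A line plot like this needs one count per year and violation type. The sketch below builds that summary from a small made-up data frame; toy, yearly, and p_line are hypothetical names, and the styling is an assumption:

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Toy stand-in for the cleaned data; values are invented for illustration.
toy <- data.frame(
  stop_date = c("1/2/2005", "5/9/2005", "3/1/2006", "4/2/2006",
                "8/8/2006", "1/5/2007", "2/6/2007"),
  violation = c("Speeding", "Equipment", "Speeding", "Speeding",
                "Equipment", "Speeding", "Equipment")
)

# Number of stops per year and violation type.
yearly <- toy %>%
  mutate(stop_year = year(mdy(stop_date))) %>%
  count(stop_year, violation)

# One line per violation type.
p_line <- ggplot(yearly, aes(x = stop_year, y = n, color = violation)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 2005:2007) +
  labs(title = "Violations by Year",
       x = "Year", y = "Number of stops", color = "Violation") +
  theme_minimal()

p_line
```

Note that a violation type with no stops in a given year has no row in the summary, so its line simply starts or stops at the years where data exists rather than dropping to zero.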

From the line plot above, we can see that speeding was the most common violation each year, which aligns with our previous findings. In our histogram, we also saw that 2006 had the second most traffic stops of all the years. Looking at the line plot, we can see that this was most likely due to the number of speeding violations. Apart from speeding and seatbelt violations, the violation types follow similar trends from year to year. The line plot also shows no seatbelt stops in any year before 2012, which makes me question whether data on seatbelt stops is missing for those years.

Pie Chart

I created a pie chart for each year to show the percentage of stops by the Rhode Island police that involved each race.

From the pie charts above, we can see that white drivers account for the most traffic stops every year. Over the years, however, you can see a slight increase in the share of traffic stops involving Hispanic and Black drivers. This could be due to increasing diversity in Rhode Island. It makes me think that the years 2016 to 2023 could show even larger shares of traffic stops for Black, Hispanic, or Asian drivers.
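ggplot2 has no dedicated pie geom, so a common approach is to draw a single stacked bar per year and wrap it into a circle with coord_polar. The sketch below does this on made-up data, computing each race's share within its year first; toy, race_share, and p_pie are hypothetical names:

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the cleaned data; values are invented for illustration.
toy <- data.frame(
  stop_year   = c(2005, 2005, 2005, 2005, 2006, 2006, 2006, 2006),
  driver_race = c("White", "White", "White", "Black",
                  "White", "White", "Black", "Hispanic")
)

# Share of each race within its own year, so every pie sums to 1.
race_share <- toy %>%
  count(stop_year, driver_race) %>%
  group_by(stop_year) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

# A stacked bar in polar coordinates becomes a pie; one panel per year.
p_pie <- ggplot(race_share, aes(x = "", y = share, fill = driver_race)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  facet_wrap(~ stop_year) +
  labs(title = "Share of Stops by Driver Race, per Year",
       fill = "Driver race") +
  theme_void()

p_pie
```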

Heat Map

I created a heat map to show the number of traffic violations on each day of the week for each year from 2005 to 2015. The colors, going from white to red, show an increasing number of violations on that day of the week.
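A heat map of this kind can be sketched by counting stops per year and weekday and mapping the count to a white-to-red fill. The example below uses a small made-up data frame; toy, heat_counts, and p_heat are hypothetical names:

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Toy stand-in for the cleaned data; dates are invented for illustration.
toy <- data.frame(
  stop_date = c("1/2/2005", "1/3/2005", "1/9/2005",
                "1/7/2006", "1/14/2006", "1/8/2006")
)

# Number of stops for each year/weekday combination.
heat_counts <- toy %>%
  mutate(stop_day  = mdy(stop_date),
         stop_year = year(stop_day),
         weekday   = wday(stop_day, label = TRUE)) %>%
  count(stop_year, weekday)

# One tile per year/weekday; fill runs from white (few) to red (many).
p_heat <- ggplot(heat_counts,
                 aes(x = weekday, y = factor(stop_year), fill = n)) +
  geom_tile(color = "grey80") +
  scale_fill_gradient(low = "white", high = "red") +
  labs(title = "Violations by Day of Week and Year",
       x = "Day of week", y = "Year", fill = "Violations") +
  theme_minimal()

p_heat
```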

From the heat map above, we can see that there is an extremely low number of violations on every day of the week in 2005. This makes sense, because in our histogram we saw that 2005 had a very low overall traffic stop count compared to other years. Likewise, 2006 and 2012 had high violation counts on every day of the week, which is consistent with their high yearly totals. In 2014 and 2015, it is clear that Saturday has a high count of violations compared to the rest of the days of the week. I wonder why Saturday has the highest violation count, considering it is the weekend and a non-work day for most people.

Conclusion

Overall, most of the traffic stops from 2005 to 2015 were for speeding, and most of the drivers stopped in Rhode Island were white. The extreme increase from 2005 to 2006 could be caused by many things, such as technology, the economy, or legislation. It makes me wonder if there was a change in laws or if a new technology was introduced to help the police make stops. I would love to be able to break this down by county or town in Rhode Island so I can get a more in-depth look at the data. I could then ask questions like: Which counties have the most stops? Which race has the most violations in each county?