===================== # STEP 1: Set up my environment # =====================
Note: set up the R environment by loading tidyverse (plus lubridate, janitor, and scales) and reading in the Los Angeles data set.
library(tidyverse)
library(lubridate)
library(janitor)
library(scales)
LA <- read.csv("ca_los_angeles_2020_04_01.csv")
===================== # STEP 2: Explore the data set # =====================
First, let’s take a look at what columns exist in our Los Angeles data frame.
colnames(LA)
## [1] "raw_row_number" "date"
## [3] "time" "district"
## [5] "region" "subject_race"
## [7] "subject_sex" "officer_id_hash"
## [9] "type" "raw_descent_description"
How many stops do we have in our dataset?
nrow(LA)
## [1] 5418402
What date range does our data cover?
## [1] "2010-01-01"
to
## [1] "2018-06-23"
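The call that produced this range is not shown. A minimal sketch of one way to get it, assuming `date` is stored as ISO `YYYY-MM-DD` text (as `read.csv()` would load it). The `toy_stops` data frame and its three dates are made up for illustration and stand in for `LA`:

```r
library(lubridate)

# Hypothetical stand-in for the LA data frame; only the date column matters here.
toy_stops <- data.frame(
  date = c("2010-01-01", "2014-07-15", "2018-06-23"),
  stringsAsFactors = FALSE
)

# Parse the text dates, then take the earliest and latest.
stop_dates <- ymd(toy_stops$date)
date_range <- range(stop_dates)
date_range
```

On the real data the same idea would read `range(ymd(LA$date))`.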
Since we only have six months of data for 2018, let’s filter it out. Note that there are ways to deal with partial years in analysis, but to make things easier for ourselves, let’s focus on 2010-2017.
===================== # STEP 3: Clean the data # =====================
LA <- filter(LA, year(date) < 2018)
How many rows do we have now?
LA %>% nrow()
## [1] 5108322
Our analysis will focus on traffic stops, so let's check which stop types the data frame contains.
unique(LA$type)
## [1] "vehicular" "pedestrian" NA
Filter for traffic stops only. Note that filter() also drops the rows where type is NA.
LA <- LA %>% filter(type == "vehicular")
Double-check that the data frame now contains only traffic stops.
unique(LA$type)
## [1] "vehicular"
View number of rows
nrow(LA)
## [1] 3899546
===================== # STEP 4: Conduct descriptive analysis # =====================
To find stop counts per year, we need to define a notion of year (recall that our data only has date). Use the year() and mutate() functions to add a new column called year to our LA data frame and then use count() to determine the number of stops per year.
## year n
## 1 2010 362513
## 2 2011 616919
## 3 2012 580928
## 4 2013 566108
## 5 2014 535928
## 6 2015 384006
## 7 2016 398011
## 8 2017 455133
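The code behind this table is not shown. A minimal sketch of the mutate() and count() approach described above, run on a tiny made-up data frame (`toy_stops` and its dates are hypothetical stand-ins for `LA`). The dates are parsed explicitly with ymd() here, which is an assumption about how the text dates should be handled:

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for LA: five stops across three years.
toy_stops <- tibble(
  date = c("2010-03-01", "2010-11-20", "2011-05-05", "2011-06-30", "2012-01-09")
)

stops_per_year <- toy_stops %>%
  mutate(year = year(ymd(date))) %>%  # derive the year from each date
  count(year)                         # one row per year with its stop count, n
stops_per_year
```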
Use count() to determine the number of stops by race.
## subject_race n
## 1 asian/pacific islander 179635
## 2 black 817122
## 3 hispanic 1626446
## 4 other 288608
## 5 white 987735
Let’s make another table that gives us the proportion of stops by race.
## subject_race n prop
## 1 asian/pacific islander 179635 0.04606562
## 2 black 817122 0.20954285
## 3 hispanic 1626446 0.41708599
## 4 other 288608 0.07401067
## 5 white 987735 0.25329487
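The proportion column can be reproduced from the counts alone. The sketch below copies the counts from the table above into a small tibble (`race_counts` is a name chosen for illustration) and divides each by the total; on the raw data the counts would presumably come from `LA %>% count(subject_race)`:

```r
library(dplyr)

# Stop counts by race, copied from the count() output above.
race_counts <- tibble(
  subject_race = c("asian/pacific islander", "black", "hispanic", "other", "white"),
  n = c(179635, 817122, 1626446, 288608, 987735)
)

race_props <- race_counts %>%
  mutate(prop = n / sum(n))  # each group's share of all stops
race_props
```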
At first glance, we see there are far more stops of Hispanic drivers than of any other group. This statistic doesn't say much on its own, because it doesn't account for each group's share of the population. We'll return to this more rigorously later on.
How about counting stops by year and race?
## year subject_race n
## 1 2010 asian/pacific islander 20853
## 2 2010 black 57975
## 3 2010 hispanic 134334
## 4 2010 other 26650
## 5 2010 white 122701
## 6 2011 asian/pacific islander 30279
## 7 2011 black 122746
## 8 2011 hispanic 249702
## 9 2011 other 41893
## 10 2011 white 172299
## 11 2012 asian/pacific islander 29290
## 12 2012 black 116744
## 13 2012 hispanic 236928
## 14 2012 other 41472
## 15 2012 white 156494
## 16 2013 asian/pacific islander 26970
## 17 2013 black 115285
## 18 2013 hispanic 234257
## 19 2013 other 42005
## 20 2013 white 147591
## 21 2014 asian/pacific islander 24956
## 22 2014 black 106267
## 23 2014 hispanic 226617
## 24 2014 other 45409
## 25 2014 white 132679
## 26 2015 asian/pacific islander 16019
## 27 2015 black 86184
## 28 2015 hispanic 164909
## 29 2015 other 29488
## 30 2015 white 87406
## 31 2016 asian/pacific islander 14331
## 32 2016 black 97115
## 33 2016 hispanic 178527
## 34 2016 other 28808
## 35 2016 white 79230
## 36 2017 asian/pacific islander 16937
## 37 2017 black 114806
## 38 2017 hispanic 201172
## 39 2017 other 32883
## 40 2017 white 89335
Let's visualize.
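The original chart is not reproduced here. As a hedged sketch of what such a plot could look like, the block below draws one line per race with ggplot2, using only the 2016 and 2017 rows copied from the year-by-race table. The chart type, labels, and the `stops_by_year_race` name are assumptions, not the author's exact plot:

```r
library(ggplot2)

# Rows copied from the year-by-race table above (2016 and 2017 only).
races <- c("asian/pacific islander", "black", "hispanic", "other", "white")
stops_by_year_race <- data.frame(
  year = rep(c(2016, 2017), each = 5),
  subject_race = rep(races, times = 2),
  n = c(14331, 97115, 178527, 28808, 79230,
        16937, 114806, 201172, 32883, 89335)
)

# One line per race: year on the x axis, number of stops on the y axis.
p <- ggplot(stops_by_year_race, aes(x = year, y = n, color = subject_race)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "Number of stops", color = "Race")
# Call print(p) to draw the chart.
```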
Traffic stops rose for every racial group from 2016 to 2017. Let's look further and see what happened in that year.
df2017 <- LA %>% filter(year(date) == 2017)
Look up the racial demographics of Los Angeles for 2017 (the number of Hispanic, Black, Asian, and other residents) and create a data frame to use as a comparison benchmark.
population_2017 <- tibble(
  subject_race = c("asian/pacific islander", "black", "hispanic", "other", "white"),
  num_people = c(467248, 351971, 1922879, 1069295, 2061262)
) %>%
  mutate(subject_race = as.factor(subject_race))
Let's view the population counts along with each group's share of the total.
## # A tibble: 5 x 3
## subject_race num_people proportion
## <fct> <dbl> <dbl>
## 1 asian/pacific islander 467248 0.0796
## 2 black 351971 0.0599
## 3 hispanic 1922879 0.327
## 4 other 1069295 0.182
## 5 white 2061262 0.351
Visualization
If we join the two tables (population_2017 and the stop counts by race from df2017), we can compute stop rates by race, i.e. the number of stops per person. (Note: remember to take into account how many years are in your stop data in order to get a true value of stops per capita; we're using only 2017 for both stops and population, so we're in good shape.) Use left_join() to combine the tables for the benchmark test and mutate() to calculate the stop rate.
## subject_race n num_people stop_rate
## 1 asian/pacific islander 16937 467248 0.03624842
## 2 black 114806 351971 0.32618028
## 3 hispanic 201172 1922879 0.10462021
## 4 other 32883 1069295 0.03075204
## 5 white 89335 2061262 0.04333995
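The join itself is not shown, so here is a minimal sketch of the left_join() and mutate() steps. The 2017 counts are copied from the year-by-race table and the population figures from population_2017; `stops_2017` and `stop_rates` are names chosen for illustration, and race is kept as plain text in both tables to keep the join simple:

```r
library(dplyr)

races <- c("asian/pacific islander", "black", "hispanic", "other", "white")

# 2017 stop counts by race, copied from the year-by-race table.
stops_2017 <- tibble(
  subject_race = races,
  n = c(16937, 114806, 201172, 32883, 89335)
)

# 2017 population by race, the same figures used in population_2017.
population_2017 <- tibble(
  subject_race = races,
  num_people = c(467248, 351971, 1922879, 1069295, 2061262)
)

stop_rates <- stops_2017 %>%
  left_join(population_2017, by = "subject_race") %>%  # attach population to each race
  mutate(stop_rate = n / num_people)                   # stops per resident
stop_rates
```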
We can now divide the Hispanic stop rate by the white stop rate to make a quantitative statement about how much more often Hispanic drivers are stopped than white drivers, relative to each group's share of the city's population. Hispanic drivers are stopped at 2.4 times the rate of white drivers. Black drivers, on the other hand, are stopped at 7.5 times the rate of white drivers. Black residents make up about 6 percent of the population but have the highest stop rate, about 0.33 stops per resident. Let's look more closely at stops of Black drivers, but first: does the picture change when we take sex into account?
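The ratios quoted above follow directly from the stop_rate column. A short sketch, with the rates copied from the table and `ratio_to_white` as an illustrative column name:

```r
library(dplyr)

# Stop rates by race, copied from the stop-rate table.
stop_rates <- tibble(
  subject_race = c("asian/pacific islander", "black", "hispanic", "other", "white"),
  stop_rate = c(0.03624842, 0.32618028, 0.10462021, 0.03075204, 0.04333995)
)

# Use the white stop rate as the baseline for comparison.
white_rate <- stop_rates$stop_rate[stop_rates$subject_race == "white"]

rate_ratios <- stop_rates %>%
  mutate(ratio_to_white = stop_rate / white_rate)
rate_ratios
```

The black and hispanic rows come out at roughly 7.5 and 2.4, matching the figures in the text.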
## subject_sex n prop
## 1 female 139509 0.3065236
## 2 male 315624 0.6934764
Unfortunately, we don't have population figures by sex to serve as a benchmark, so we can't compute stop rates by sex. We can see, however, that males account for nearly 70 percent of vehicular stops in Los Angeles in 2017, regardless of race.
Visualization
Let's find the regions with the highest number of vehicular stops.
## region n
## 1 VALLEY TRAFFIC 62810
## 2 METROPOLITAN DIVISN 46956
## 3 WEST TRAFFIC 45345
## 4 SOUTH TRAFFIC 43069
## 5 CENTRAL TRAFFIC 32593
## 6 CENTRAL 19509
## 7 HOLLYWOOD 15699
## 8 SEVENTY-SEVENTH 15239
## 9 NEWTON 14799
## 10 SOUTH WEST 12939
## 11 DEVONSHIRE 12349
## 12 SOUTH EAST 12263
## 13 OLYMPIC 11361
## 14 HARBOR 10533
## 15 WEST VALLEY 9862
## 16 PACIFIC 9754
## 17 WILSHIRE 9504
## 18 FOOTHILL 8476
## 19 MISSION 7439
## 20 HOLLENBECK 7255
## 21 TOPANGA 7121
## 22 WEST LA 7016
## 23 VAN NUYS 6963
## 24 NORTH HOLLYWOOD 6639
## 25 NORTH EAST 6214
## 26 RAMPART 5867
## 27 UNIFORMED SUPPORT 3765
## 28 SOUTH BUREAU 1232
## 29 VALLEY BUREAU 904
## 30 SECURITY SERVICES 454
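A ranked table like this one can be produced with count() and its sort argument. The sketch below uses a tiny made-up data frame (`toy_2017` is a hypothetical stand-in for the 2017 stops, and the six rows are invented for illustration):

```r
library(dplyr)

# Hypothetical stand-in for the 2017 stops: six stops in three regions.
toy_2017 <- tibble(
  region = c("VALLEY TRAFFIC", "HOLLYWOOD", "VALLEY TRAFFIC",
             "WEST TRAFFIC", "VALLEY TRAFFIC", "WEST TRAFFIC")
)

region_counts <- toy_2017 %>%
  count(region, sort = TRUE)  # sort = TRUE orders regions from most to fewest stops
region_counts
```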
Visualize the racial demographics in the top regions.
Earlier we found that Black drivers had the highest stop rate (about 0.33 stops per resident). Based on the graph, METROPOLITAN DIVISN had the second-highest number of vehicular stops of any region, and it stands out among the top regions for its high number of stops of Black drivers. Let's find the share of stops in this region that involved Black drivers.
## subject_race n proportion
## 1 asian/pacific islander 313 0.006665815
## 2 black 22331 0.475572877
## 3 hispanic 20829 0.443585484
## 4 other 1506 0.032072579
## 5 white 1977 0.042103246
Nearly 48 percent of the vehicular stops in the METROPOLITAN DIVISN region involved Black drivers. What is the stop rate in this particular region?
## subject_race n num_people stop_rate
## 1 asian/pacific islander 313 467248 0.0006698798
## 2 black 22331 351971 0.0634455680
## 3 hispanic 20829 1922879 0.0108321948
## 4 other 1506 1069295 0.0014084046
## 5 white 1977 2061262 0.0009591212
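Both regional tables can be rebuilt from the regional counts and the citywide population figures. The sketch below copies those numbers from the tables above; `metro_counts` and `metro_rates` are illustrative names, and on the raw data the counts would presumably come from filtering the 2017 stops to `region == "METROPOLITAN DIVISN"` before calling count():

```r
library(dplyr)

races <- c("asian/pacific islander", "black", "hispanic", "other", "white")

# Stops in METROPOLITAN DIVISN by race, copied from the table above.
metro_counts <- tibble(subject_race = races, n = c(313, 22331, 20829, 1506, 1977))

# Citywide 2017 population by race (not region-specific).
population_2017 <- tibble(
  subject_race = races,
  num_people = c(467248, 351971, 1922879, 1069295, 2061262)
)

metro_rates <- metro_counts %>%
  mutate(proportion = n / sum(n)) %>%                  # share of the region's stops
  left_join(population_2017, by = "subject_race") %>%
  mutate(stop_rate = n / num_people)                   # regional stops per citywide resident

# Ratio of the black stop rate to the white stop rate in this region.
black_to_white <- with(
  metro_rates,
  stop_rate[subject_race == "black"] / stop_rate[subject_race == "white"]
)
black_to_white
```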
Measured against the citywide Black population, the stop rate for Black drivers in METROPOLITAN DIVISN was about 0.063 stops per resident (6.3 percent), which is 66 times the corresponding rate for white drivers. Note that these rates divide the region's stops by the citywide population figures in num_people, not by the region's own population. There can be many explanations for why the numbers are so high. This is all to say that while benchmark statistics are a good place to start, more investigation is required before we can conclude that bias or discrimination is present in Los Angeles law enforcement.
End of Analysis