===================== # STEP 1: Set up my environment # =====================

Notes: Set up the R environment by loading ‘tidyverse’ and the supporting packages (‘lubridate’, ‘janitor’, ‘scales’), then read in the Los Angeles data set.

library(tidyverse)
library(lubridate)
library(janitor)
library(scales)
LA <- read.csv("ca_los_angeles_2020_04_01.csv")

===================== # STEP 2: Explore the data set # =====================

First, let’s take a look at what columns exist in our Los Angeles data frame.

colnames(LA)
##  [1] "raw_row_number"          "date"                   
##  [3] "time"                    "district"               
##  [5] "region"                  "subject_race"           
##  [7] "subject_sex"             "officer_id_hash"        
##  [9] "type"                    "raw_descent_description"

How many stops do we have in our dataset?

nrow(LA)
## [1] 5418402

What date range does our data cover?

## [1] "2010-01-01"

to

## [1] "2018-06-23"
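The code behind these two values is not shown. A minimal sketch of one way to get them, assuming `date` holds ISO "YYYY-MM-DD" strings as `read.csv()` would return them; `toy` is a hypothetical stand-in for `LA`, not the real data:

```r
# Hypothetical stand-in for the LA data frame; only the `date` column matters here.
toy <- data.frame(date = c("2010-01-01", "2014-07-15", "2018-06-23"))

# Convert to Date first so min/max compare calendar dates rather than strings.
toy_dates <- as.Date(toy$date)

first_stop <- min(toy_dates, na.rm = TRUE)  # earliest stop date
last_stop <- max(toy_dates, na.rm = TRUE)   # latest stop date

first_stop
last_stop
```

On the full data set, the same two calls on `as.Date(LA$date)` should give the range reported above.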

Since we only have six months of data for 2018, let’s filter it out. Note that there are ways to deal with partial years in analysis, but to make things easier for ourselves, let’s focus on 2010-2017.

===================== # STEP 3: Clean the data # =====================

LA <- filter(LA, year(date) < 2018)

How many rows do we have now?

LA %>% nrow()
## [1] 5108322

Our analyses will focus on traffic stops, so let’s check which stop types our data frame contains.

unique(LA$type)
## [1] "vehicular"  "pedestrian" NA

Filter for traffic stops only. Note that this also drops rows where type is NA.

LA <- LA %>% filter(type == "vehicular")

Double-check that the data frame now contains only traffic stops.

unique(LA$type)
## [1] "vehicular"

View the number of rows.

nrow(LA)
## [1] 3899546

===================== # STEP 4: Conduct descriptive analysis # =====================

To find stop counts per year, we need to define a notion of year (recall that our data only has date). Use the year() and mutate() functions to add a new column called year to our LA data frame and then use count() to determine the number of stops per year.

##   year      n
## 1 2010 362513
## 2 2011 616919
## 3 2012 580928
## 4 2013 566108
## 5 2014 535928
## 6 2015 384006
## 7 2016 398011
## 8 2017 455133
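The code for this table is not shown. A minimal sketch of the mutate() and count() steps described above, run on a hypothetical three-row stand-in (`toy`) rather than the real `LA` data frame:

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for LA; dates are ISO strings, as read.csv() returns them.
toy <- data.frame(date = c("2010-03-01", "2010-11-20", "2011-05-05"))

stops_per_year <- toy %>%
  mutate(year = year(as.Date(date))) %>%  # derive the year from each stop date
  count(year)                             # one row per year, with n = number of stops

stops_per_year
```

Swapping `toy` for `LA` should reproduce the eight-row table above.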

Use count() to determine the number of stops by race.

##             subject_race       n
## 1 asian/pacific islander  179635
## 2                  black  817122
## 3               hispanic 1626446
## 4                  other  288608
## 5                  white  987735
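A minimal sketch of the count() call, assuming `subject_race` is a plain character column; `toy` is a hypothetical stand-in for `LA`:

```r
library(dplyr)

# Hypothetical stand-in for LA with only the column being counted.
toy <- data.frame(
  subject_race = c("black", "white", "hispanic", "hispanic", "white", "hispanic")
)

# count() groups by subject_race and returns one row per group, ordered by group value.
stops_by_race <- toy %>% count(subject_race)

stops_by_race
```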

Let’s make another table that gives us the proportion of stops by race.

##             subject_race       n       prop
## 1 asian/pacific islander  179635 0.04606562
## 2                  black  817122 0.20954285
## 3               hispanic 1626446 0.41708599
## 4                  other  288608 0.07401067
## 5                  white  987735 0.25329487
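The proportion column is each group’s count divided by the total count. A minimal sketch that starts from the counts printed above (typed in by hand here, so the block runs without the CSV) and adds `prop` with mutate():

```r
library(dplyr)

# Counts copied from the stops-by-race table above.
stops_by_race <- tibble(
  subject_race = c("asian/pacific islander", "black", "hispanic", "other", "white"),
  n = c(179635, 817122, 1626446, 288608, 987735)
)

# Each group's share of all stops; the shares sum to 1.
race_props <- stops_by_race %>%
  mutate(prop = n / sum(n))

race_props
```

On the real data, the same mutate() step can be chained directly after `LA %>% count(subject_race)`.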

At first glance, we see there are far more stops of Hispanic drivers than of any other group. On its own, this statistic doesn’t say much, because it ignores how large each group is in the city’s population. We’ll return to this more rigorously later on.

How about counting stops by year and race?

##    year           subject_race      n
## 1  2010 asian/pacific islander  20853
## 2  2010                  black  57975
## 3  2010               hispanic 134334
## 4  2010                  other  26650
## 5  2010                  white 122701
## 6  2011 asian/pacific islander  30279
## 7  2011                  black 122746
## 8  2011               hispanic 249702
## 9  2011                  other  41893
## 10 2011                  white 172299
## 11 2012 asian/pacific islander  29290
## 12 2012                  black 116744
## 13 2012               hispanic 236928
## 14 2012                  other  41472
## 15 2012                  white 156494
## 16 2013 asian/pacific islander  26970
## 17 2013                  black 115285
## 18 2013               hispanic 234257
## 19 2013                  other  42005
## 20 2013                  white 147591
## 21 2014 asian/pacific islander  24956
## 22 2014                  black 106267
## 23 2014               hispanic 226617
## 24 2014                  other  45409
## 25 2014                  white 132679
## 26 2015 asian/pacific islander  16019
## 27 2015                  black  86184
## 28 2015               hispanic 164909
## 29 2015                  other  29488
## 30 2015                  white  87406
## 31 2016 asian/pacific islander  14331
## 32 2016                  black  97115
## 33 2016               hispanic 178527
## 34 2016                  other  28808
## 35 2016                  white  79230
## 36 2017 asian/pacific islander  16937
## 37 2017                  black 114806
## 38 2017               hispanic 201172
## 39 2017                  other  32883
## 40 2017                  white  89335

Let’s visualize.
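The original plot is not reproduced here. As a rough sketch of one possible chart (a line per race, which is an assumption about the plot type), using only the 2016 and 2017 counts from the table above so the block stays short:

```r
library(ggplot2)

# 2016 and 2017 counts copied from the year-by-race table above.
stops_by_year_race <- data.frame(
  year = rep(c(2016, 2017), each = 5),
  subject_race = rep(
    c("asian/pacific islander", "black", "hispanic", "other", "white"),
    times = 2
  ),
  n = c(14331, 97115, 178527, 28808, 79230,
        16937, 114806, 201172, 32883, 89335)
)

# One line per race; axis labels are illustrative choices.
stops_plot <- ggplot(stops_by_year_race,
                     aes(x = year, y = n, color = subject_race)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = c(2016, 2017)) +
  scale_y_continuous(labels = scales::comma) +
  labs(x = "Year", y = "Number of stops", color = "Race")

stops_plot
```

With the full year-by-race table as input, the x-axis breaks would cover 2010 through 2017 instead.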

Traffic stops rose for every racial group from 2016 to 2017. Let’s look further and see what happened in that year.

df2017 <- LA %>% filter(year(date) == 2017)

Look up the racial demographics of Los Angeles for 2017 (i.e. the number of Hispanic, black, Asian, etc. residents) and create a data frame to use as a basis for comparison.

population_2017 <- tibble(subject_race = c("asian/pacific islander","black", "hispanic", "other","white"),
  num_people = c(467248, 351971, 1922879, 1069295, 2061262)) %>%
  mutate(subject_race = as.factor(subject_race))

Let’s look at the population counts together with each group’s proportion of the total.

## # A tibble: 5 x 3
##   subject_race           num_people proportion
##   <fct>                       <dbl>      <dbl>
## 1 asian/pacific islander     467248     0.0796
## 2 black                      351971     0.0599
## 3 hispanic                  1922879     0.327 
## 4 other                     1069295     0.182 
## 5 white                     2061262     0.351
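The `proportion` column is each group’s population divided by the total. A minimal sketch that rebuilds `population_2017` from the figures given above and adds that column:

```r
library(dplyr)

# Same population figures as in the population_2017 definition above.
population_2017 <- tibble(
  subject_race = c("asian/pacific islander", "black", "hispanic", "other", "white"),
  num_people = c(467248, 351971, 1922879, 1069295, 2061262)
) %>%
  mutate(subject_race = as.factor(subject_race))

# Share of the total population in each group.
population_props <- population_2017 %>%
  mutate(proportion = num_people / sum(num_people))

population_props
```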

Visualization

Stop rates

———————

If we join the two tables (population_2017 and the 2017 stop counts from df2017) together, we can compute stop rates by race, i.e. the number of stops per person. (Note: Remember to take into account how many years are in your stop data in order to get a true value of stops per capita; we’re using only 2017 for stops and for population, so we’re in good shape.) Use left_join() to combine the tables for the benchmark test and mutate() to calculate the stop rate.

##             subject_race      n num_people  stop_rate
## 1 asian/pacific islander  16937     467248 0.03624842
## 2                  black 114806     351971 0.32618028
## 3               hispanic 201172    1922879 0.10462021
## 4                  other  32883    1069295 0.03075204
## 5                  white  89335    2061262 0.04333995
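A minimal sketch of the join and the rate calculation, with the 2017 stop counts and population figures typed in from the tables above. The `ratio_to_white` column is an addition for illustration: it divides each group’s stop rate by the white stop rate, which is the comparison the analysis turns to next.

```r
library(dplyr)

# 2017 stop counts by race, copied from the table above.
stops_2017 <- tibble(
  subject_race = c("asian/pacific islander", "black", "hispanic", "other", "white"),
  n = c(16937, 114806, 201172, 32883, 89335)
)

# 2017 population figures, as in population_2017.
population_2017 <- tibble(
  subject_race = c("asian/pacific islander", "black", "hispanic", "other", "white"),
  num_people = c(467248, 351971, 1922879, 1069295, 2061262)
)

stop_rates <- stops_2017 %>%
  left_join(population_2017, by = "subject_race") %>%  # attach population to each race
  mutate(
    stop_rate = n / num_people,                        # stops per resident
    ratio_to_white = stop_rate / stop_rate[subject_race == "white"]
  )

stop_rates
```

On the real data, `stops_2017` would come from `df2017 %>% count(subject_race)` rather than hand-typed counts.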

We can now divide the Hispanic stop rate by the white stop rate to make a quantitative statement about how much more often Hispanic drivers are stopped than white drivers, relative to their share of the city’s population. Hispanic drivers are stopped at 2.4 times the rate of white drivers (0.105 / 0.043). Black drivers, on the other hand, are stopped at 7.5 times the rate of white drivers (0.326 / 0.043). Black residents are about 6 percent of the population, but they have the highest stop rate: about 0.33 stops per resident. Let’s look more into the black population, but first: is there a difference in the share of stops when we take sex into account?

##   subject_sex      n      prop
## 1      female 139509 0.3065236
## 2        male 315624 0.6934764
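A minimal sketch of how a count-plus-share table like this can be built, using a hypothetical ten-row stand-in (`toy_2017`) in place of `df2017`:

```r
library(dplyr)

# Hypothetical stand-in for df2017: 3 stops of female drivers, 7 of male drivers.
toy_2017 <- data.frame(
  subject_sex = c(rep("female", 3), rep("male", 7))
)

stops_by_sex <- toy_2017 %>%
  count(subject_sex) %>%        # stops per sex
  mutate(prop = n / sum(n))     # share of all stops

stops_by_sex
```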

Unfortunately, we don’t have population data by sex to compute comparable stop rates, but we can see that males account for nearly 70 percent of vehicular stops in Los Angeles in 2017, across all races.

Visualization


Vehicular stops based on region

———————

Let’s find the regions with the highest number of vehicular stops in 2017.

##                 region     n
## 1       VALLEY TRAFFIC 62810
## 2  METROPOLITAN DIVISN 46956
## 3         WEST TRAFFIC 45345
## 4        SOUTH TRAFFIC 43069
## 5      CENTRAL TRAFFIC 32593
## 6              CENTRAL 19509
## 7            HOLLYWOOD 15699
## 8      SEVENTY-SEVENTH 15239
## 9               NEWTON 14799
## 10          SOUTH WEST 12939
## 11          DEVONSHIRE 12349
## 12          SOUTH EAST 12263
## 13             OLYMPIC 11361
## 14              HARBOR 10533
## 15         WEST VALLEY  9862
## 16             PACIFIC  9754
## 17            WILSHIRE  9504
## 18            FOOTHILL  8476
## 19             MISSION  7439
## 20          HOLLENBECK  7255
## 21             TOPANGA  7121
## 22             WEST LA  7016
## 23            VAN NUYS  6963
## 24     NORTH HOLLYWOOD  6639
## 25          NORTH EAST  6214
## 26             RAMPART  5867
## 27   UNIFORMED SUPPORT  3765
## 28        SOUTH BUREAU  1232
## 29       VALLEY BUREAU   904
## 30   SECURITY SERVICES   454
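A ranked table like this can be produced with count() and its `sort` argument. A minimal sketch on a hypothetical stand-in (`toy_2017`) for `df2017`:

```r
library(dplyr)

# Hypothetical stand-in for df2017 with only the region column.
toy_2017 <- data.frame(
  region = c("HOLLYWOOD", "VALLEY TRAFFIC", "VALLEY TRAFFIC",
             "CENTRAL", "VALLEY TRAFFIC", "HOLLYWOOD")
)

# sort = TRUE orders the result from most stops to fewest.
stops_by_region <- toy_2017 %>% count(region, sort = TRUE)

stops_by_region
```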

Visualize the racial demographics in the top regions.

Earlier we found that black drivers have a citywide stop rate of about 0.33 stops per resident. Based on the graph, METROPOLITAN DIVISN had the second-highest number of vehicular stops of any region. It was also one of the top regions with a notably high number of stops of black drivers. Let’s find the proportion of stops by race in this region.

##             subject_race     n  proportion
## 1 asian/pacific islander   313 0.006665815
## 2                  black 22331 0.475572877
## 3               hispanic 20829 0.443585484
## 4                  other  1506 0.032072579
## 5                  white  1977 0.042103246
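A minimal sketch of the region-level breakdown: filter to one region, count by race, then convert counts to shares. `toy_2017` is a hypothetical stand-in for `df2017`; the region label matches the spelling used in the data.

```r
library(dplyr)

# Hypothetical stand-in for df2017.
toy_2017 <- data.frame(
  region = c("METROPOLITAN DIVISN", "METROPOLITAN DIVISN",
             "METROPOLITAN DIVISN", "METROPOLITAN DIVISN", "HOLLYWOOD"),
  subject_race = c("black", "black", "hispanic", "white", "white")
)

metro_by_race <- toy_2017 %>%
  filter(region == "METROPOLITAN DIVISN") %>%  # keep stops in this region only
  count(subject_race) %>%
  mutate(proportion = n / sum(n))              # share of the region's stops

metro_by_race
```

The proportions here are shares of the region’s stops, not shares of each group’s citywide stops.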

Nearly 50 percent of all vehicular stops in the METROPOLITAN DIVISN region in 2017 were stops of black drivers. What is the stop rate in this particular region?

##             subject_race     n num_people    stop_rate
## 1 asian/pacific islander   313     467248 0.0006698798
## 2                  black 22331     351971 0.0634455680
## 3               hispanic 20829    1922879 0.0108321948
## 4                  other  1506    1069295 0.0014084046
## 5                  white  1977    2061262 0.0009591212
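A minimal sketch of this calculation, using the regional counts and the population figures printed above. Note the assumption visible in the table itself: `num_people` is the citywide population, so these are regional stops per citywide resident. The `ratio_to_white` column is added for illustration.

```r
library(dplyr)

# 2017 METROPOLITAN DIVISN stop counts, copied from the table above.
metro_stops <- tibble(
  subject_race = c("asian/pacific islander", "black", "hispanic", "other", "white"),
  n = c(313, 22331, 20829, 1506, 1977)
)

# Citywide 2017 population figures, as in population_2017.
population_2017 <- tibble(
  subject_race = c("asian/pacific islander", "black", "hispanic", "other", "white"),
  num_people = c(467248, 351971, 1922879, 1069295, 2061262)
)

metro_rates <- metro_stops %>%
  left_join(population_2017, by = "subject_race") %>%
  mutate(
    stop_rate = n / num_people,   # regional stops per citywide resident
    ratio_to_white = stop_rate / stop_rate[subject_race == "white"]
  )

metro_rates
```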

The stop rate for black drivers in the ‘METROPOLITAN DIVISN’ region was about 0.063 stops per resident, which is roughly 66 times the rate for white drivers in this region (0.0634 / 0.00096). Keep in mind that these rates divide the region’s stop counts by citywide population figures, since we don’t have population data for the region itself. There can be many explanations for why the numbers are so high. This is all to say that while benchmark stats are a good place to start, more investigation is required before we can draw any conclusions that bias or discrimination is overtly present in Los Angeles law enforcement.

End of Analysis