Introduction

The data set provided by the Dallas Police Department describes incidents involving officers and subjects. Any injuries that occurred during an incident are recorded as well. Other fields include officer and subject race and gender, the type of force the officer used, and so on. The primary goal is to investigate whether race has an effect on arrests and other crime-related incidents involving both parties. We examine this question using the procedures outlined below.

Load the data

data <- read.csv("37-00049_UOF-P_2016_prepped.csv")

The data was loaded into R and stored in an object named data.

Data Manipulation

dim(data)
## [1] 2384   47
colnames(data)
##  [1] "INCIDENT_DATE"                               
##  [2] "INCIDENT_TIME"                               
##  [3] "UOF_NUMBER"                                  
##  [4] "OFFICER_ID"                                  
##  [5] "OFFICER_GENDER"                              
##  [6] "OFFICER_RACE"                                
##  [7] "OFFICER_HIRE_DATE"                           
##  [8] "OFFICER_YEARS_ON_FORCE"                      
##  [9] "OFFICER_INJURY"                              
## [10] "OFFICER_INJURY_TYPE"                         
## [11] "OFFICER_HOSPITALIZATION"                     
## [12] "SUBJECT_ID"                                  
## [13] "SUBJECT_RACE"                                
## [14] "SUBJECT_GENDER"                              
## [15] "SUBJECT_INJURY"                              
## [16] "SUBJECT_INJURY_TYPE"                         
## [17] "SUBJECT_WAS_ARRESTED"                        
## [18] "SUBJECT_DESCRIPTION"                         
## [19] "SUBJECT_OFFENSE"                             
## [20] "REPORTING_AREA"                              
## [21] "BEAT"                                        
## [22] "SECTOR"                                      
## [23] "DIVISION"                                    
## [24] "LOCATION_DISTRICT"                           
## [25] "STREET_NUMBER"                               
## [26] "STREET_NAME"                                 
## [27] "STREET_DIRECTION"                            
## [28] "STREET_TYPE"                                 
## [29] "LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION"
## [30] "LOCATION_CITY"                               
## [31] "LOCATION_STATE"                              
## [32] "LOCATION_LATITUDE"                           
## [33] "LOCATION_LONGITUDE"                          
## [34] "INCIDENT_REASON"                             
## [35] "REASON_FOR_FORCE"                            
## [36] "TYPE_OF_FORCE_USED1"                         
## [37] "TYPE_OF_FORCE_USED2"                         
## [38] "TYPE_OF_FORCE_USED3"                         
## [39] "TYPE_OF_FORCE_USED4"                         
## [40] "TYPE_OF_FORCE_USED5"                         
## [41] "TYPE_OF_FORCE_USED6"                         
## [42] "TYPE_OF_FORCE_USED7"                         
## [43] "TYPE_OF_FORCE_USED8"                         
## [44] "TYPE_OF_FORCE_USED9"                         
## [45] "TYPE_OF_FORCE_USED10"                        
## [46] "NUMBER_EC_CYCLES"                            
## [47] "FORCE_EFFECTIVE"
data<-data[-1,]
head(data)
##   INCIDENT_DATE INCIDENT_TIME    UOF_NUMBER OFFICER_ID OFFICER_GENDER
## 2      09-03-16    4:14:00 AM         37702      10810           Male
## 3       3/22/16   11:00:00 PM         33413       7706           Male
## 4       5/22/16    1:29:00 PM         34567      11014           Male
## 5      01-10-16    8:55:00 PM         31460       6692           Male
## 6      11-08-16    2:30:00 AM  37879, 37898       9844           Male
## 7      09-11-16    7:20:00 PM         36724       9855           Male
##   OFFICER_RACE OFFICER_HIRE_DATE OFFICER_YEARS_ON_FORCE OFFICER_INJURY
## 2        Black          05-07-14                      2             No
## 3        White          01-08-99                     17            Yes
## 4        Black           5/20/15                      1             No
## 5        Black           7/29/91                     24             No
## 6        White          10-04-09                      7             No
## 7        White          06-10-09                      7             No
##            OFFICER_INJURY_TYPE OFFICER_HOSPITALIZATION SUBJECT_ID SUBJECT_RACE
## 2 No injuries noted or visible                      No      46424        Black
## 3                Sprain/Strain                     Yes      44324     Hispanic
## 4 No injuries noted or visible                      No      45126     Hispanic
## 5 No injuries noted or visible                      No      43150     Hispanic
## 6 No injuries noted or visible                      No      47307        Black
## 7 No injuries noted or visible                      No      46549        White
##   SUBJECT_GENDER SUBJECT_INJURY          SUBJECT_INJURY_TYPE
## 2         Female            Yes      Non-Visible Injury/Pain
## 3           Male             No No injuries noted or visible
## 4           Male             No No injuries noted or visible
## 5           Male            Yes               Laceration/Cut
## 6           Male             No No injuries noted or visible
## 7         Female             No No injuries noted or visible
##   SUBJECT_WAS_ARRESTED SUBJECT_DESCRIPTION          SUBJECT_OFFENSE
## 2                  Yes   Mentally unstable                    APOWW
## 3                  Yes   Mentally unstable                    APOWW
## 4                  Yes             Unknown                    APOWW
## 5                  Yes FD-Unknown if Armed           Evading Arrest
## 6                  Yes             Unknown Other Misdemeanor Arrest
## 7                  Yes             Unknown               Assault/FV
##   REPORTING_AREA BEAT SECTOR      DIVISION LOCATION_DISTRICT STREET_NUMBER
## 2           2062  134    130       CENTRAL               D14           211
## 3           1197  237    230     NORTHEAST                D9          7647
## 4           4153  432    430     SOUTHWEST                D6           716
## 5           4523  641    640 NORTH CENTRAL               D11          5600
## 6           2167  346    340     SOUTHEAST                D7          4600
## 7           1134  235    230     NORTHEAST                D9          1234
##    STREET_NAME STREET_DIRECTION STREET_TYPE
## 2        Ervay                N         St.
## 3     Ferguson             NULL         Rd.
## 4 bimebella dr             NULL         Ln.
## 5          LBJ             NULL       Frwy.
## 6    Malcolm X                S       Blvd.
## 7        Peavy             NULL         Rd.
##   LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION LOCATION_CITY LOCATION_STATE
## 2                               211 N ERVAY ST        Dallas             TX
## 3                             7647 FERGUSON RD        Dallas             TX
## 4                             716 BIMEBELLA LN        Dallas             TX
## 5                               5600 L B J FWY        Dallas             TX
## 6                        4600 S MALCOLM X BLVD        Dallas             TX
## 7                                1234 PEAVY RD        Dallas             TX
##   LOCATION_LATITUDE LOCATION_LONGITUDE INCIDENT_REASON REASON_FOR_FORCE
## 2         32.782205         -96.797461          Arrest           Arrest
## 3         32.798978         -96.717493          Arrest           Arrest
## 4          32.73971          -96.92519          Arrest           Arrest
## 5                                               Arrest           Arrest
## 6                                               Arrest           Arrest
## 7         32.837527         -96.695566          Arrest           Arrest
##      TYPE_OF_FORCE_USED1 TYPE_OF_FORCE_USED2 TYPE_OF_FORCE_USED3
## 2  Hand/Arm/Elbow Strike                                        
## 3            Joint Locks                                        
## 4      Take Down - Group                                        
## 5         K-9 Deployment                                        
## 6         Verbal Command     Take Down - Arm                    
## 7 Hand Controlled Escort                                        
##   TYPE_OF_FORCE_USED4 TYPE_OF_FORCE_USED5 TYPE_OF_FORCE_USED6
## 2                                                            
## 3                                                            
## 4                                                            
## 5                                                            
## 6                                                            
## 7                                                            
##   TYPE_OF_FORCE_USED7 TYPE_OF_FORCE_USED8 TYPE_OF_FORCE_USED9
## 2                                                            
## 3                                                            
## 4                                                            
## 5                                                            
## 6                                                            
## 7                                                            
##   TYPE_OF_FORCE_USED10 NUMBER_EC_CYCLES FORCE_EFFECTIVE
## 2                                  NULL             Yes
## 3                                  NULL             Yes
## 4                                  NULL             Yes
## 5                                  NULL             Yes
## 6                                  NULL         No, Yes
## 7                                  NULL             Yes
library(dplyr)
data<-data %>% mutate_if(is.character, as.factor)

The data set as loaded has 47 columns and 2384 rows; the column names are listed above. The first row of the data frame (the second line of the CSV file) repeats the column headers, so it is removed before proceeding, which leaves 2383 observations. Inspecting the structure with R's str function showed that every column was read as character, so all columns are converted to factors with mutate_if. The data exploration that follows uses several libraries.
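The duplicate header row could also be handled at read time. The sketch below is one possible approach, not the method used in this analysis: it builds a tiny made-up CSV with a second header line (its contents are hypothetical), reads the column names from line 1, then skips both header lines so that numeric columns keep a numeric type. The `na.strings` choice assumes that blanks and the text NULL both mean missing.

```r
# Hypothetical miniature file: a header row, a second header-like row,
# then three made-up data rows in the style of the head() output.
csv_text <- c(
  "INCIDENT_DATE,OFFICER_YEARS_ON_FORCE,STREET_DIRECTION",
  "OCCURRED_D,OFF_YR_EMPLOY,STREET_G",
  "9/3/16,2,N",
  "3/22/16,17,NULL",
  "5/22/16,1,"
)
path <- tempfile(fileext = ".csv")
writeLines(csv_text, path)

# Take the column names from the first line only
col_names <- names(read.csv(path, nrows = 1))

# Skip both header lines; treat blanks and "NULL" as missing (an assumption)
demo <- read.csv(path, skip = 2, header = FALSE, col.names = col_names,
                 na.strings = c("", "NULL", "NA"))

str(demo)
```

Read this way, OFFICER_YEARS_ON_FORCE arrives as a number rather than as text, so no later conversion is needed for that column.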

Data Exploration

library(SmartEDA)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
library(DT)
library(ggplot2)
library(plotly)  # provides ggplotly(), used for the interactive plots below

ExpData(data, type=2)
##    Index                                Variable_Name Variable_Type Sample_n
## 1      1                                INCIDENT_DATE        factor     2383
## 2      2                                INCIDENT_TIME        factor     2383
## 3      3                                   UOF_NUMBER        factor     2383
## 4      4                                   OFFICER_ID        factor     2383
## 5      5                               OFFICER_GENDER        factor     2383
## 6      6                                 OFFICER_RACE        factor     2383
## 7      7                            OFFICER_HIRE_DATE        factor     2383
## 8      8                       OFFICER_YEARS_ON_FORCE        factor     2383
## 9      9                               OFFICER_INJURY        factor     2383
## 10    10                          OFFICER_INJURY_TYPE        factor     2383
## 11    11                      OFFICER_HOSPITALIZATION        factor     2383
## 12    12                                   SUBJECT_ID        factor     2383
## 13    13                                 SUBJECT_RACE        factor     2383
## 14    14                               SUBJECT_GENDER        factor     2383
## 15    15                               SUBJECT_INJURY        factor     2383
## 16    16                          SUBJECT_INJURY_TYPE        factor     2383
## 17    17                         SUBJECT_WAS_ARRESTED        factor     2383
## 18    18                          SUBJECT_DESCRIPTION        factor     2383
## 19    19                              SUBJECT_OFFENSE        factor     2383
## 20    20                               REPORTING_AREA        factor     2383
## 21    21                                         BEAT        factor     2383
## 22    22                                       SECTOR        factor     2383
## 23    23                                     DIVISION        factor     2383
## 24    24                            LOCATION_DISTRICT        factor     2383
## 25    25                                STREET_NUMBER        factor     2383
## 26    26                                  STREET_NAME        factor     2383
## 27    27                             STREET_DIRECTION        factor     2383
## 28    28                                  STREET_TYPE        factor     2383
## 29    29 LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION        factor     2383
## 30    30                                LOCATION_CITY        factor     2383
## 31    31                               LOCATION_STATE        factor     2383
## 32    32                            LOCATION_LATITUDE        factor     2383
## 33    33                           LOCATION_LONGITUDE        factor     2383
## 34    34                              INCIDENT_REASON        factor     2383
## 35    35                             REASON_FOR_FORCE        factor     2383
## 36    36                          TYPE_OF_FORCE_USED1        factor     2383
## 37    37                          TYPE_OF_FORCE_USED2        factor     2383
## 38    38                          TYPE_OF_FORCE_USED3        factor     2383
## 39    39                          TYPE_OF_FORCE_USED4        factor     2383
## 40    40                          TYPE_OF_FORCE_USED5        factor     2383
## 41    41                          TYPE_OF_FORCE_USED6        factor     2383
## 42    42                          TYPE_OF_FORCE_USED7        factor     2383
## 43    43                          TYPE_OF_FORCE_USED8        factor     2383
## 44    44                          TYPE_OF_FORCE_USED9        factor     2383
## 45    45                         TYPE_OF_FORCE_USED10        factor     2383
## 46    46                             NUMBER_EC_CYCLES        factor     2383
## 47    47                              FORCE_EFFECTIVE        factor     2383
##    Missing_Count Per_of_Missing No_of_distinct_values
## 1              0              0                   353
## 2              0              0                   543
## 3              0              0                  2328
## 4              0              0                  1041
## 5              0              0                     2
## 6              0              0                     6
## 7              0              0                   291
## 8              0              0                    36
## 9              0              0                     2
## 10             0              0                    76
## 11             0              0                     2
## 12             0              0                  1433
## 13             0              0                     7
## 14             0              0                     4
## 15             0              0                     2
## 16             0              0                   193
## 17             0              0                     2
## 18             0              0                    15
## 19             0              0                   551
## 20             0              0                   576
## 21             0              0                   227
## 22             0              0                    35
## 23             0              0                     7
## 24             0              0                    14
## 25             0              0                   856
## 26             0              0                  1080
## 27             0              0                     5
## 28             0              0                    22
## 29             0              0                  1322
## 30             0              0                     1
## 31             0              0                     1
## 32             0              0                  1283
## 33             0              0                  1283
## 34             0              0                    14
## 35             0              0                    12
## 36             0              0                    29
## 37             0              0                    27
## 38             0              0                    25
## 39             0              0                    23
## 40             0              0                    22
## 41             0              0                    18
## 42             0              0                    14
## 43             0              0                     6
## 44             0              0                     2
## 45             0              0                     2
## 46             0              0                    12
## 47             0              0                   104
datatable(data)
## Warning in instance$preRenderHook(instance): It seems your data is too big for
## client-side DataTables. You may consider server-side processing:
## https://rstudio.github.io/DT/server.html

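The ExpData function from the SmartEDA library was used to summarise the data. The result confirms 2383 observations and 47 columns, as described in the previous section, and all 47 variables are stored as factors. The type 2 table above reports a Missing_Count of 0 for every variable, because blank cells and the text NULL are counted as ordinary factor levels rather than as missing values. The overview figures quoted for this data set are 76.6% of variables (36) with complete cases and 12.77% of variables (6) with at least 90% missing cases.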
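To see how many values are effectively missing, the placeholder values can be recoded as NA before counting. The following is a minimal base R sketch on a made-up data frame; the helper name `as_missing` and the choice of placeholders (empty string and "NULL") are assumptions for illustration.

```r
# Made-up rows that imitate the blanks and "NULL" strings seen in head(data)
toy <- data.frame(
  STREET_DIRECTION  = factor(c("N", "NULL", "NULL", "S")),
  LOCATION_LATITUDE = factor(c("32.78", "", "", "32.83")),
  OFFICER_GENDER    = factor(c("Male", "Male", "Female", "Male"))
)

# Hypothetical helper: turn placeholder strings into NA
as_missing <- function(x, placeholders = c("", "NULL")) {
  x <- as.character(x)
  x[x %in% placeholders] <- NA
  x
}

cleaned <- as.data.frame(lapply(toy, as_missing), stringsAsFactors = FALSE)

# Missing values per column, as a count and as a percentage of rows
missing_count <- colSums(is.na(cleaned))
missing_pct   <- round(100 * missing_count / nrow(cleaned), 1)
print(data.frame(missing_count, missing_pct))
```

Running ExpData on a data frame cleaned in this way should report non-zero missing counts for columns such as the street direction and coordinates.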

Frequency of Officer and Subject Gender

a<-data %>%
  group_by(OFFICER_GENDER) %>%
  summarise(counts = n()) %>%  
  ggplot( aes(x = OFFICER_GENDER, y = counts)) +
  geom_bar(fill = "#0073C2FF", stat = "identity") 
ggplotly(a)
b<-data %>%
  group_by(SUBJECT_GENDER) %>%
  summarise(counts = n()) %>%  
  ggplot( aes(x = SUBJECT_GENDER, y = counts)) +
  geom_bar(fill = "#0073C2FF", stat = "identity") 
ggplotly(b)

The officer gender plot shows that male officers outnumber female officers. Among subjects, males are also more frequent than the other gender categories.
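The bar heights can also be reported as numbers. A small base R sketch, using made-up gender values rather than the real column, shows counts next to percentage shares:

```r
# Made-up values for illustration only
officer_gender <- c("Male", "Male", "Female", "Male",
                    "Male", "Female", "Male", "Male")

counts <- table(officer_gender)
shares <- round(100 * prop.table(counts), 1)

gender_summary <- data.frame(
  gender  = names(counts),
  n       = as.vector(counts),
  percent = as.vector(shares)
)
print(gender_summary)
```

With the real data, `officer_gender` would be replaced by `data$OFFICER_GENDER` or `data$SUBJECT_GENDER`.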

Comparison between officer gender and race

c<-data %>% group_by(OFFICER_GENDER,OFFICER_RACE) %>%
    summarise(counts = n()) %>%  
  ggplot() +
  aes(
    x = OFFICER_RACE,
    fill = OFFICER_GENDER,
    weight = counts
  ) +
  geom_bar(position = "dodge") +
  scale_fill_hue(direction = 1) +
  theme_minimal() +coord_flip()
## `summarise()` has grouped output by 'OFFICER_GENDER'. You can override using
## the `.groups` argument.
ggplotly(c)

The graph shows that White officers predominate in the police force compared with the Black and Hispanic groups. Within the White group, male officers predominate, consistent with the previous graph.

Comparison between officer years of experience and race

d<-data %>% group_by(OFFICER_YEARS_ON_FORCE,OFFICER_RACE) %>%
    summarise(counts = n()) %>%  
  ggplot() +
  aes(
    x = OFFICER_YEARS_ON_FORCE,
    fill = OFFICER_RACE,
    weight = counts
  ) +
  geom_bar(position = "dodge") +
  scale_fill_hue(direction = 1) +
  theme_minimal() +coord_flip()
## `summarise()` has grouped output by 'OFFICER_YEARS_ON_FORCE'. You can override
## using the `.groups` argument.
ggplotly(d)
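
Because every column was converted to a factor, OFFICER_YEARS_ON_FORCE is plotted as categories whose levels sort as text (for example "10" before "2"). A minimal sketch of converting such a factor back to numbers, on made-up values, might look like this:

```r
# Made-up values; in the data set the column is a factor of digit strings
years_factor <- factor(c("2", "17", "1", "24", "7", "7"))
race <- c("Black", "White", "Black", "Black", "White", "White")

# as.numeric() on a factor returns the internal level codes, not the labels
years_wrong <- as.numeric(years_factor)

# Converting through character recovers the actual numbers
years_num <- as.numeric(as.character(years_factor))

# Median years on the force per race group
median_by_race <- tapply(years_num, race, median)
print(median_by_race)
```

A numeric column also makes it possible to compare groups with summaries such as the median instead of reading them off a dodged bar chart.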

In the experience-by-race graph, White officers appear as the more experienced group in the data set. The other race groups also show a range of job experience.

Comparison between officer injury and officer race

e<-data %>% group_by(OFFICER_INJURY,OFFICER_RACE) %>%
    summarise(counts = n()) %>%  
  ggplot() +
  aes(
    x = OFFICER_INJURY,
    fill = OFFICER_RACE,
    weight = counts
  ) +
  geom_bar(position = "dodge") +
  scale_fill_hue(direction = 1) +
  theme_minimal() +coord_flip()
## `summarise()` has grouped output by 'OFFICER_INJURY'. You can override using
## the `.groups` argument.
ggplotly(e)

The comparison of officer injury by race shows that White officers form the largest group among both injured and uninjured officers.
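Raw counts partly reflect how many officers of each race appear in the data, so within-group shares can be easier to compare. The sketch below uses made-up records (not values from the data set) and base R's `prop.table` with `margin = 1` to compute the share of injured officers within each race group.

```r
# Made-up records for illustration; not taken from the data set
officers <- data.frame(
  OFFICER_RACE   = c("White", "White", "White", "White",
                     "Black", "Black", "Hispanic", "Hispanic"),
  OFFICER_INJURY = c("Yes", "No", "No", "No",
                     "Yes", "No", "No", "No")
)

counts <- table(officers$OFFICER_RACE, officers$OFFICER_INJURY)

# margin = 1 divides each row by its row total (share within each race)
rates <- round(100 * prop.table(counts, margin = 1), 1)

print(counts)
print(rates)
```

In this toy example the group with the most records does not have the highest injury share, which is the kind of difference that counts alone cannot show.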

Comparison between officer hospitalization and officer race

f<-data %>%filter(OFFICER_INJURY=="Yes") %>%  group_by(OFFICER_RACE,OFFICER_HOSPITALIZATION) %>%
    summarise(counts = n()) %>%  
  ggplot() +
  aes(
    x = OFFICER_HOSPITALIZATION,
    fill = OFFICER_RACE,
    weight = counts
  ) +
  geom_bar(position = "dodge") +
  scale_fill_hue(direction = 1) +
  theme_minimal() +coord_flip()
## `summarise()` has grouped output by 'OFFICER_RACE'. You can override using the
## `.groups` argument.
ggplotly(f)

The results show that, among injured officers, White officers were hospitalized most often, followed by Hispanic officers.

Comparison between subject injury and subject gender

g<-data %>%filter(OFFICER_INJURY=="Yes") %>%  group_by(SUBJECT_INJURY,SUBJECT_GENDER) %>%
    summarise(counts = n()) %>%  
  ggplot() +
  aes(
    x = SUBJECT_INJURY,
    fill = SUBJECT_GENDER,
    weight = counts
  ) +
  geom_bar(position = "dodge") +
  scale_fill_hue(direction = 1) +
  theme_minimal() +coord_flip()
## `summarise()` has grouped output by 'SUBJECT_INJURY'. You can override using
## the `.groups` argument.
ggplotly(g)

The result shows that male subjects were injured most often. Note that the code above keeps only incidents in which the officer was injured (OFFICER_INJURY == "Yes").

Comparison between street type and officer injury type

data %>%filter(OFFICER_INJURY=="Yes") %>%  group_by(STREET_TYPE,OFFICER_INJURY_TYPE) %>%
    summarise(counts = n()) %>% arrange(desc(counts))
## `summarise()` has grouped output by 'STREET_TYPE'. You can override using the
## `.groups` argument.
## # A tibble: 128 × 3
## # Groups:   STREET_TYPE [15]
##    STREET_TYPE OFFICER_INJURY_TYPE          counts
##    <fct>       <fct>                         <int>
##  1 St.         Abrasion/Scrape                  16
##  2 Rd.         Abrasion/Scrape                  11
##  3 St.         No injuries noted or visible     10
##  4 Ave.        Abrasion/Scrape                   8
##  5 Ave.        No injuries noted or visible      8
##  6 Blvd.       Abrasion/Scrape                   7
##  7 Blvd.       No injuries noted or visible      7
##  8 Rd.         No injuries noted or visible      6
##  9 Dr.         No injuries noted or visible      5
## 10 Dr.         Abrasion/Scrape                   4
## # ℹ 118 more rows

Among injured officers, Abrasion/Scrape on streets of type St. is the most frequent combination (16), followed by Abrasion/Scrape on Rd. (11).

Comparison of location with officer injury type

data %>%filter(OFFICER_INJURY=="Yes") %>%  group_by(LOCATION_STATE,OFFICER_INJURY_TYPE) %>%
    summarise(counts = n()) %>% arrange(desc(counts))
## `summarise()` has grouped output by 'LOCATION_STATE'. You can override using
## the `.groups` argument.
## # A tibble: 71 × 3
## # Groups:   LOCATION_STATE [1]
##    LOCATION_STATE OFFICER_INJURY_TYPE               counts
##    <fct>          <fct>                              <int>
##  1 TX             Abrasion/Scrape                       59
##  2 TX             No injuries noted or visible          44
##  3 TX             Laceration/Cut                        13
##  4 TX             Sprain/Strain                         13
##  5 TX             Redness/Swelling                      10
##  6 TX             Bruise                                 7
##  7 TX             Fluid Exposure                         6
##  8 TX             Laceration/Cut, Abrasion/Scrape        6
##  9 TX             Abrasion/Scrape, Redness/Swelling      4
## 10 TX             Abrasion/Scrape, Bruise                3
## # ℹ 61 more rows

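Every record is located in TX, so LOCATION_STATE does not separate the incidents. Among officers recorded as injured, Abrasion/Scrape is the most frequent injury type (59), followed by No injuries noted or visible (44).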
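Some OFFICER_INJURY_TYPE values list several injuries in one cell, such as "Laceration/Cut, Abrasion/Scrape", so the same injury is spread over several rows of the table. One way to count each injury type on its own is sketched below with made-up values; it assumes that a comma never occurs inside a single injury label.

```r
# Made-up values in the same comma-separated style as OFFICER_INJURY_TYPE
injury_type <- c(
  "Abrasion/Scrape",
  "Laceration/Cut, Abrasion/Scrape",
  "Sprain/Strain",
  "Abrasion/Scrape, Bruise"
)

# Split on commas, trim surrounding whitespace, then count each type
parts <- trimws(unlist(strsplit(injury_type, ",", fixed = TRUE)))
type_counts <- sort(table(parts), decreasing = TRUE)
print(type_counts)
```

Counted this way, an injury that appears alone and in combinations contributes to a single total.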