The data set provided by the Dallas Police Department describes incidents and the officers and subjects involved in them. Injuries sustained during an incident are also recorded. Other fields include officer and subject race and gender, the type of force the officer used, and so on. The primary goal is to investigate whether race has an effect on arrests and other crime-related incidents involving both parties. We examine this question using the steps outlined below.
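One way the race question could eventually be tested formally is a chi-squared test of independence between subject race and arrest status. The sketch below uses base R's `chisq.test` on a small made-up contingency table. The counts are purely illustrative and are not taken from the Dallas data, and `toy_counts` and `test_result` are hypothetical names.

```r
# Illustrative counts only (NOT from the Dallas data set).
toy_counts <- matrix(
  c(120, 30,
     90, 25,
     60, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    SUBJECT_RACE = c("Black", "Hispanic", "White"),
    SUBJECT_WAS_ARRESTED = c("Yes", "No")
  )
)

# Chi-squared test of independence between the row and column variables.
test_result <- chisq.test(toy_counts)

# A small p-value would suggest that arrest status is associated with race.
test_result$p.value
```

With the real data, the same call could be applied to `table(data$SUBJECT_RACE, data$SUBJECT_WAS_ARRESTED)`, keeping in mind that sparse categories can make the chi-squared approximation unreliable.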
data <- read.csv("37-00049_UOF-P_2016_prepped.csv")
The data were loaded into R successfully and stored in an object named data.
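The printed rows of this file contain blank cells and the literal string NULL. As a minimal sketch, assuming those entries are meant to represent missing values, `read.csv` can be told to convert them to `NA` at load time. The tiny CSV written here is a stand-in so that the example runs on its own; `toy_path` and `toy` are hypothetical names.

```r
# Write a tiny stand-in CSV so the example is self-contained.
toy_path <- tempfile(fileext = ".csv")
writeLines(
  c("OFFICER_ID,STREET_DIRECTION,NUMBER_EC_CYCLES",
    "10810,N,NULL",
    "7706,NULL,NULL",
    "11014,,2"),
  toy_path
)

# Treat empty cells and the literal string "NULL" as missing values.
toy <- read.csv(toy_path, na.strings = c("", "NULL"), stringsAsFactors = FALSE)

# Count missing values per column.
colSums(is.na(toy))
```

Whether this is appropriate for the real file depends on what a blank or NULL entry actually means in each column, so it is worth checking before applying it.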
dim(data)
## [1] 2384 47
colnames(data)
## [1] "INCIDENT_DATE"
## [2] "INCIDENT_TIME"
## [3] "UOF_NUMBER"
## [4] "OFFICER_ID"
## [5] "OFFICER_GENDER"
## [6] "OFFICER_RACE"
## [7] "OFFICER_HIRE_DATE"
## [8] "OFFICER_YEARS_ON_FORCE"
## [9] "OFFICER_INJURY"
## [10] "OFFICER_INJURY_TYPE"
## [11] "OFFICER_HOSPITALIZATION"
## [12] "SUBJECT_ID"
## [13] "SUBJECT_RACE"
## [14] "SUBJECT_GENDER"
## [15] "SUBJECT_INJURY"
## [16] "SUBJECT_INJURY_TYPE"
## [17] "SUBJECT_WAS_ARRESTED"
## [18] "SUBJECT_DESCRIPTION"
## [19] "SUBJECT_OFFENSE"
## [20] "REPORTING_AREA"
## [21] "BEAT"
## [22] "SECTOR"
## [23] "DIVISION"
## [24] "LOCATION_DISTRICT"
## [25] "STREET_NUMBER"
## [26] "STREET_NAME"
## [27] "STREET_DIRECTION"
## [28] "STREET_TYPE"
## [29] "LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION"
## [30] "LOCATION_CITY"
## [31] "LOCATION_STATE"
## [32] "LOCATION_LATITUDE"
## [33] "LOCATION_LONGITUDE"
## [34] "INCIDENT_REASON"
## [35] "REASON_FOR_FORCE"
## [36] "TYPE_OF_FORCE_USED1"
## [37] "TYPE_OF_FORCE_USED2"
## [38] "TYPE_OF_FORCE_USED3"
## [39] "TYPE_OF_FORCE_USED4"
## [40] "TYPE_OF_FORCE_USED5"
## [41] "TYPE_OF_FORCE_USED6"
## [42] "TYPE_OF_FORCE_USED7"
## [43] "TYPE_OF_FORCE_USED8"
## [44] "TYPE_OF_FORCE_USED9"
## [45] "TYPE_OF_FORCE_USED10"
## [46] "NUMBER_EC_CYCLES"
## [47] "FORCE_EFFECTIVE"
data<-data[-1,]
head(data)
## INCIDENT_DATE INCIDENT_TIME UOF_NUMBER OFFICER_ID OFFICER_GENDER
## 2 09-03-16 4:14:00 AM 37702 10810 Male
## 3 3/22/16 11:00:00 PM 33413 7706 Male
## 4 5/22/16 1:29:00 PM 34567 11014 Male
## 5 01-10-16 8:55:00 PM 31460 6692 Male
## 6 11-08-16 2:30:00 AM 37879, 37898 9844 Male
## 7 09-11-16 7:20:00 PM 36724 9855 Male
## OFFICER_RACE OFFICER_HIRE_DATE OFFICER_YEARS_ON_FORCE OFFICER_INJURY
## 2 Black 05-07-14 2 No
## 3 White 01-08-99 17 Yes
## 4 Black 5/20/15 1 No
## 5 Black 7/29/91 24 No
## 6 White 10-04-09 7 No
## 7 White 06-10-09 7 No
## OFFICER_INJURY_TYPE OFFICER_HOSPITALIZATION SUBJECT_ID SUBJECT_RACE
## 2 No injuries noted or visible No 46424 Black
## 3 Sprain/Strain Yes 44324 Hispanic
## 4 No injuries noted or visible No 45126 Hispanic
## 5 No injuries noted or visible No 43150 Hispanic
## 6 No injuries noted or visible No 47307 Black
## 7 No injuries noted or visible No 46549 White
## SUBJECT_GENDER SUBJECT_INJURY SUBJECT_INJURY_TYPE
## 2 Female Yes Non-Visible Injury/Pain
## 3 Male No No injuries noted or visible
## 4 Male No No injuries noted or visible
## 5 Male Yes Laceration/Cut
## 6 Male No No injuries noted or visible
## 7 Female No No injuries noted or visible
## SUBJECT_WAS_ARRESTED SUBJECT_DESCRIPTION SUBJECT_OFFENSE
## 2 Yes Mentally unstable APOWW
## 3 Yes Mentally unstable APOWW
## 4 Yes Unknown APOWW
## 5 Yes FD-Unknown if Armed Evading Arrest
## 6 Yes Unknown Other Misdemeanor Arrest
## 7 Yes Unknown Assault/FV
## REPORTING_AREA BEAT SECTOR DIVISION LOCATION_DISTRICT STREET_NUMBER
## 2 2062 134 130 CENTRAL D14 211
## 3 1197 237 230 NORTHEAST D9 7647
## 4 4153 432 430 SOUTHWEST D6 716
## 5 4523 641 640 NORTH CENTRAL D11 5600
## 6 2167 346 340 SOUTHEAST D7 4600
## 7 1134 235 230 NORTHEAST D9 1234
## STREET_NAME STREET_DIRECTION STREET_TYPE
## 2 Ervay N St.
## 3 Ferguson NULL Rd.
## 4 bimebella dr NULL Ln.
## 5 LBJ NULL Frwy.
## 6 Malcolm X S Blvd.
## 7 Peavy NULL Rd.
## LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION LOCATION_CITY LOCATION_STATE
## 2 211 N ERVAY ST Dallas TX
## 3 7647 FERGUSON RD Dallas TX
## 4 716 BIMEBELLA LN Dallas TX
## 5 5600 L B J FWY Dallas TX
## 6 4600 S MALCOLM X BLVD Dallas TX
## 7 1234 PEAVY RD Dallas TX
## LOCATION_LATITUDE LOCATION_LONGITUDE INCIDENT_REASON REASON_FOR_FORCE
## 2 32.782205 -96.797461 Arrest Arrest
## 3 32.798978 -96.717493 Arrest Arrest
## 4 32.73971 -96.92519 Arrest Arrest
## 5 Arrest Arrest
## 6 Arrest Arrest
## 7 32.837527 -96.695566 Arrest Arrest
## TYPE_OF_FORCE_USED1 TYPE_OF_FORCE_USED2 TYPE_OF_FORCE_USED3
## 2 Hand/Arm/Elbow Strike
## 3 Joint Locks
## 4 Take Down - Group
## 5 K-9 Deployment
## 6 Verbal Command Take Down - Arm
## 7 Hand Controlled Escort
## TYPE_OF_FORCE_USED4 TYPE_OF_FORCE_USED5 TYPE_OF_FORCE_USED6
## 2
## 3
## 4
## 5
## 6
## 7
## TYPE_OF_FORCE_USED7 TYPE_OF_FORCE_USED8 TYPE_OF_FORCE_USED9
## 2
## 3
## 4
## 5
## 6
## 7
## TYPE_OF_FORCE_USED10 NUMBER_EC_CYCLES FORCE_EFFECTIVE
## 2 NULL Yes
## 3 NULL Yes
## 4 NULL Yes
## 5 NULL Yes
## 6 NULL No, Yes
## 7 NULL Yes
library(dplyr)
data<-data %>% mutate_if(is.character, as.factor)
As loaded, the data set consists of 47 columns and 2384 rows. The column names are listed above. Because the second row of the file (the first row of the data frame) repeats the column labels rather than holding an observation, it was removed before proceeding, which leaves 2383 observations. R's str function gives insight into the structure of the data set: every column was read in as character, so the character columns were converted to factors. The data exploration below uses several libraries.
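The row-dropping step described above can be sketched on a toy data frame. The label values `"offID"` and `"offRace"` are invented for illustration and are not the actual contents of the file; `toy` and `toy_clean` are hypothetical names.

```r
# Toy data frame whose first row is a second set of labels (values invented).
toy <- data.frame(
  OFFICER_ID   = c("offID", "10810", "7706"),
  OFFICER_RACE = c("offRace", "Black", "White"),
  stringsAsFactors = FALSE
)

# Drop the first row, which holds labels rather than an observation.
toy_clean <- toy[-1, ]

# Reset row names so that they start at 1 again (optional).
rownames(toy_clean) <- NULL

# With the label row gone, the ID column can be converted to integers.
toy_clean$OFFICER_ID <- as.integer(toy_clean$OFFICER_ID)
str(toy_clean)
```

The extra label row is also the reason every column is first read as character: a single non-numeric entry prevents `read.csv` from inferring a numeric type.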
library(SmartEDA)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library(DT)
library(ggplot2)
ExpData(data, type=2)
## Index Variable_Name Variable_Type Sample_n
## 1 1 INCIDENT_DATE factor 2383
## 2 2 INCIDENT_TIME factor 2383
## 3 3 UOF_NUMBER factor 2383
## 4 4 OFFICER_ID factor 2383
## 5 5 OFFICER_GENDER factor 2383
## 6 6 OFFICER_RACE factor 2383
## 7 7 OFFICER_HIRE_DATE factor 2383
## 8 8 OFFICER_YEARS_ON_FORCE factor 2383
## 9 9 OFFICER_INJURY factor 2383
## 10 10 OFFICER_INJURY_TYPE factor 2383
## 11 11 OFFICER_HOSPITALIZATION factor 2383
## 12 12 SUBJECT_ID factor 2383
## 13 13 SUBJECT_RACE factor 2383
## 14 14 SUBJECT_GENDER factor 2383
## 15 15 SUBJECT_INJURY factor 2383
## 16 16 SUBJECT_INJURY_TYPE factor 2383
## 17 17 SUBJECT_WAS_ARRESTED factor 2383
## 18 18 SUBJECT_DESCRIPTION factor 2383
## 19 19 SUBJECT_OFFENSE factor 2383
## 20 20 REPORTING_AREA factor 2383
## 21 21 BEAT factor 2383
## 22 22 SECTOR factor 2383
## 23 23 DIVISION factor 2383
## 24 24 LOCATION_DISTRICT factor 2383
## 25 25 STREET_NUMBER factor 2383
## 26 26 STREET_NAME factor 2383
## 27 27 STREET_DIRECTION factor 2383
## 28 28 STREET_TYPE factor 2383
## 29 29 LOCATION_FULL_STREET_ADDRESS_OR_INTERSECTION factor 2383
## 30 30 LOCATION_CITY factor 2383
## 31 31 LOCATION_STATE factor 2383
## 32 32 LOCATION_LATITUDE factor 2383
## 33 33 LOCATION_LONGITUDE factor 2383
## 34 34 INCIDENT_REASON factor 2383
## 35 35 REASON_FOR_FORCE factor 2383
## 36 36 TYPE_OF_FORCE_USED1 factor 2383
## 37 37 TYPE_OF_FORCE_USED2 factor 2383
## 38 38 TYPE_OF_FORCE_USED3 factor 2383
## 39 39 TYPE_OF_FORCE_USED4 factor 2383
## 40 40 TYPE_OF_FORCE_USED5 factor 2383
## 41 41 TYPE_OF_FORCE_USED6 factor 2383
## 42 42 TYPE_OF_FORCE_USED7 factor 2383
## 43 43 TYPE_OF_FORCE_USED8 factor 2383
## 44 44 TYPE_OF_FORCE_USED9 factor 2383
## 45 45 TYPE_OF_FORCE_USED10 factor 2383
## 46 46 NUMBER_EC_CYCLES factor 2383
## 47 47 FORCE_EFFECTIVE factor 2383
## Missing_Count Per_of_Missing No_of_distinct_values
## 1 0 0 353
## 2 0 0 543
## 3 0 0 2328
## 4 0 0 1041
## 5 0 0 2
## 6 0 0 6
## 7 0 0 291
## 8 0 0 36
## 9 0 0 2
## 10 0 0 76
## 11 0 0 2
## 12 0 0 1433
## 13 0 0 7
## 14 0 0 4
## 15 0 0 2
## 16 0 0 193
## 17 0 0 2
## 18 0 0 15
## 19 0 0 551
## 20 0 0 576
## 21 0 0 227
## 22 0 0 35
## 23 0 0 7
## 24 0 0 14
## 25 0 0 856
## 26 0 0 1080
## 27 0 0 5
## 28 0 0 22
## 29 0 0 1322
## 30 0 0 1
## 31 0 0 1
## 32 0 0 1283
## 33 0 0 1283
## 34 0 0 14
## 35 0 0 12
## 36 0 0 29
## 37 0 0 27
## 38 0 0 25
## 39 0 0 23
## 40 0 0 22
## 41 0 0 18
## 42 0 0 14
## 43 0 0 6
## 44 0 0 2
## 45 0 0 2
## 46 0 0 12
## 47 0 0 104
datatable(data)
## Warning in instance$preRenderHook(instance): It seems your data is too big for
## client-side DataTables. You may consider server-side processing:
## https://rstudio.github.io/DT/server.html
The ExpData function from the SmartEDA library was used to summarise the data. The result confirms that the data contain 2383 observations and 47 columns, as described in the previous section, and that all 47 variables are factors. The overall summary reports that 76.6% (36) of the variables have complete cases and that 12.77% (6) have at least 90% missing cases.
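A possible way to quantify missingness per variable by hand is sketched below, under the assumption that blanks and the literal string NULL stand in for missing values in this file. The toy data frame is invented, and `missing_share` is a hypothetical name.

```r
# Toy data with blanks and "NULL" strings standing in for missing values.
toy <- data.frame(
  SUBJECT_RACE        = c("Black", "Hispanic", "White", "Black"),
  STREET_DIRECTION    = c("N", "NULL", "NULL", "S"),
  TYPE_OF_FORCE_USED9 = c("", "", "", ""),
  stringsAsFactors = TRUE
)

# Share of entries per column that are NA, empty, or the string "NULL".
missing_share <- sapply(toy, function(col) {
  values <- as.character(col)
  mean(is.na(values) | values == "" | values == "NULL")
})

missing_share

# Names of the variables with at least 90% missing entries.
names(missing_share)[missing_share >= 0.9]
```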
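library(plotly)  # provides ggplotly(), which is used below
# Count incidents per officer gender and draw them as a bar chart
a<-data %>%
  group_by(OFFICER_GENDER) %>%
  summarise(counts = n()) %>%
  ggplot( aes(x = OFFICER_GENDER, y = counts)) +
  geom_bar(fill = "#0073C2FF", stat = "identity")
ggplotly(a)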
b<-data %>%
group_by(SUBJECT_GENDER) %>%
summarise(counts = n()) %>%
ggplot( aes(x = SUBJECT_GENDER, y = counts)) +
geom_bar(fill = "#0073C2FF", stat = "identity")
ggplotly(b)
The officer gender plot shows that male officers outnumber female officers. Among subjects, males are likewise more numerous than any other gender category.
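The comparison in the bar charts can also be expressed as percentages, which are easier to quote than bar heights. This is a minimal base R sketch on invented values; `toy_gender`, `gender_counts` and `gender_share` are hypothetical names.

```r
# Invented gender values for five incidents (not from the real data).
toy_gender <- factor(c("Male", "Male", "Female", "Male", "Male"))

# Absolute counts per category.
gender_counts <- table(toy_gender)

# Percentage of incidents per category, rounded to one decimal place.
gender_share <- round(100 * prop.table(gender_counts), 1)
gender_share
```

On the real data the same two calls could be applied to `data$OFFICER_GENDER` or `data$SUBJECT_GENDER`.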
c<-data %>% group_by(OFFICER_GENDER,OFFICER_RACE) %>%
summarise(counts = n()) %>%
ggplot() +
aes(
x = OFFICER_RACE,
fill = OFFICER_GENDER,
weight = counts
) +
geom_bar(position = "dodge") +
scale_fill_hue(direction = 1) +
theme_minimal() +coord_flip()
## `summarise()` has grouped output by 'OFFICER_GENDER'. You can override using
## the `.groups` argument.
ggplotly(c)
The graph shows that white officers predominate in the police force compared with Black and Hispanic officers. Within the white group, male officers predominate, consistent with the previous graph.
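The same race-by-gender comparison can be tabulated directly. The sketch below, on invented rows, builds a two-way table and then the share of each gender within each race. The names `toy`, `race_by_gender` and `within_race` are hypothetical.

```r
# Invented officer records (not from the real data).
toy <- data.frame(
  OFFICER_RACE   = c("White", "White", "White", "Black", "Black", "Hispanic"),
  OFFICER_GENDER = c("Male", "Male", "Female", "Male", "Female", "Male")
)

# Two-way table of counts, with row and column totals added for reading.
race_by_gender <- table(toy$OFFICER_RACE, toy$OFFICER_GENDER)
addmargins(race_by_gender)

# Share of each gender within each race (each row sums to 1).
within_race <- prop.table(race_by_gender, margin = 1)
round(within_race, 2)
```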
d<-data %>% group_by(OFFICER_YEARS_ON_FORCE,OFFICER_RACE) %>%
summarise(counts = n()) %>%
ggplot() +
aes(
x = OFFICER_YEARS_ON_FORCE,
fill = OFFICER_RACE,
weight = counts
) +
geom_bar(position = "dodge") +
scale_fill_hue(direction = 1) +
theme_minimal() +coord_flip()
## `summarise()` has grouped output by 'OFFICER_YEARS_ON_FORCE'. You can override
## using the `.groups` argument.
ggplotly(d)
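Because every column was converted to a factor, OFFICER_YEARS_ON_FORCE is treated as a set of labels, so the axis in this plot may be ordered as text ("1", "10", "17", "2") rather than numerically. A minimal sketch of converting such a factor back to numbers, on invented values, follows. The names `years_num` and `median_years` are hypothetical.

```r
# Invented records in which years of service are stored as a factor.
toy <- data.frame(
  OFFICER_RACE = c("White", "White", "Black", "Black", "Hispanic"),
  OFFICER_YEARS_ON_FORCE = factor(c("17", "7", "2", "24", "1"))
)

# as.numeric() on a factor returns level codes, so go through character first.
toy$years_num <- as.numeric(as.character(toy$OFFICER_YEARS_ON_FORCE))

# Median years of service per race group.
median_years <- tapply(toy$years_num, toy$OFFICER_RACE, median)
median_years
```

A numeric column also makes summaries such as the median per group possible, which gives a more direct measure of experience than comparing bar heights.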
The graph above shows that white officers are the most experienced group in the data set, with the other race groups showing less job experience.
# Comparison between officer injury and officer race
e<-data %>% group_by(OFFICER_INJURY,OFFICER_RACE) %>%
summarise(counts = n()) %>%
ggplot() +
aes(
x = OFFICER_INJURY,
fill = OFFICER_RACE,
weight = counts
) +
geom_bar(position = "dodge") +
scale_fill_hue(direction = 1) +
theme_minimal() +coord_flip()
## `summarise()` has grouped output by 'OFFICER_INJURY'. You can override using
## the `.groups` argument.
ggplotly(e)
The comparison of officer injury by race shows that white officers form the largest group both among injured officers and among officers who were not injured.
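Since white officers are the largest group overall, raw counts alone cannot show whether any group is injured more often than another. One hedged way to compare groups is the injury rate within each race, sketched here on invented rows. The names `toy`, `injury_counts` and `injury_rate` are hypothetical.

```r
# Invented records: 6 white, 2 Black and 2 Hispanic officers.
toy <- data.frame(
  OFFICER_RACE   = c(rep("White", 6), rep("Black", 2), rep("Hispanic", 2)),
  OFFICER_INJURY = c("Yes", "Yes", "No", "No", "No", "No",
                     "Yes", "No",
                     "No", "No")
)

# Counts of injured and uninjured officers per race.
injury_counts <- table(toy$OFFICER_RACE, toy$OFFICER_INJURY)

# Proportion of officers injured within each race (row-wise shares).
injury_rate <- prop.table(injury_counts, margin = 1)[, "Yes"]
round(injury_rate, 2)
```

In this toy example the white group has the most injuries in absolute terms but not the highest rate, which is the kind of distinction a count-based bar chart cannot reveal.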
f<-data %>%filter(OFFICER_INJURY=="Yes") %>% group_by(OFFICER_RACE,OFFICER_HOSPITALIZATION) %>%
summarise(counts = n()) %>%
ggplot() +
aes(
x = OFFICER_HOSPITALIZATION,
fill = OFFICER_RACE,
weight = counts
) +
geom_bar(position = "dodge") +
scale_fill_hue(direction = 1) +
theme_minimal() +coord_flip()
## `summarise()` has grouped output by 'OFFICER_RACE'. You can override using the
## `.groups` argument.
ggplotly(f)
The results show that white officers were hospitalized most often during incidents, followed by Hispanic officers.
g<-data %>%filter(OFFICER_INJURY=="Yes") %>% group_by(SUBJECT_INJURY,SUBJECT_GENDER) %>%
summarise(counts = n()) %>%
ggplot() +
aes(
x = SUBJECT_INJURY,
fill = SUBJECT_GENDER,
weight = counts
) +
geom_bar(position = "dodge") +
scale_fill_hue(direction = 1) +
theme_minimal() +coord_flip()
## `summarise()` has grouped output by 'SUBJECT_INJURY'. You can override using
## the `.groups` argument.
ggplotly(g)
The results show that, among incidents in which an officer was injured, the injured subjects were mostly male.
data %>%filter(OFFICER_INJURY=="Yes") %>% group_by(STREET_TYPE,OFFICER_INJURY_TYPE) %>%
summarise(counts = n()) %>% arrange(desc(counts))
## `summarise()` has grouped output by 'STREET_TYPE'. You can override using the
## `.groups` argument.
## # A tibble: 128 × 3
## # Groups: STREET_TYPE [15]
## STREET_TYPE OFFICER_INJURY_TYPE counts
## <fct> <fct> <int>
## 1 St. Abrasion/Scrape 16
## 2 Rd. Abrasion/Scrape 11
## 3 St. No injuries noted or visible 10
## 4 Ave. Abrasion/Scrape 8
## 5 Ave. No injuries noted or visible 8
## 6 Blvd. Abrasion/Scrape 7
## 7 Blvd. No injuries noted or visible 7
## 8 Rd. No injuries noted or visible 6
## 9 Dr. No injuries noted or visible 5
## 10 Dr. Abrasion/Scrape 4
## # ℹ 118 more rows
On streets of type St., the most common officer injury type was abrasion/scrape (16 cases), followed by abrasion/scrape on the Rd. street type (11 cases).
# Comparison of location with officer injury type
data %>%filter(OFFICER_INJURY=="Yes") %>% group_by(LOCATION_STATE,OFFICER_INJURY_TYPE) %>%
summarise(counts = n()) %>% arrange(desc(counts))
## `summarise()` has grouped output by 'LOCATION_STATE'. You can override using
## the `.groups` argument.
## # A tibble: 71 × 3
## # Groups: LOCATION_STATE [1]
## LOCATION_STATE OFFICER_INJURY_TYPE counts
## <fct> <fct> <int>
## 1 TX Abrasion/Scrape 59
## 2 TX No injuries noted or visible 44
## 3 TX Laceration/Cut 13
## 4 TX Sprain/Strain 13
## 5 TX Redness/Swelling 10
## 6 TX Bruise 7
## 7 TX Fluid Exposure 6
## 8 TX Laceration/Cut, Abrasion/Scrape 6
## 9 TX Abrasion/Scrape, Redness/Swelling 4
## 10 TX Abrasion/Scrape, Bruise 3
## # ℹ 61 more rows
Every incident is located in Texas (TX), so this grouping gives the overall counts by injury type: abrasion/scrape was the most common officer injury (59 cases).
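Some entries in the table above combine several injury types in one string, for example "Laceration/Cut, Abrasion/Scrape", so each combination is counted as its own category. A minimal base R sketch of splitting such strings and counting each type individually follows, using a few values copied from the output above. The names `toy_types`, `single_types` and `type_counts` are hypothetical.

```r
# A few injury-type strings, some of which combine two types.
toy_types <- c(
  "Abrasion/Scrape",
  "Laceration/Cut, Abrasion/Scrape",
  "Abrasion/Scrape, Bruise",
  "Sprain/Strain"
)

# Split on commas, flatten into one vector, and trim surrounding spaces.
single_types <- trimws(unlist(strsplit(toy_types, ",")))

# Count each individual injury type, most frequent first.
type_counts <- sort(table(single_types), decreasing = TRUE)
type_counts
```

Counted this way, a type such as abrasion/scrape also receives credit for the incidents in which it was recorded together with another injury.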