Load The Data

Data used: “Baltimore_Officer_Involved_Use_Of_Force2019.csv”. The file was saved to a designated folder, then read in and named Balt for a first look.

setwd("~/Desktop/MC Data Science /DATA 110 /DataSets ")
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.3     ✓ dplyr   1.0.2
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ───────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
Balt <- read_csv("Baltimore_Officer_Involved_Use_Of_Force2019.csv")
## Parsed with column specification:
## cols(
##   DATE = col_character(),
##   `CC#` = col_character(),
##   DISTRICT = col_character(),
##   LOCATION = col_character(),
##   TYPE = col_character(),
##   `X (LONG)` = col_double(),
##   `Y (LAT)` = col_double(),
##   COORDINATES = col_character(),
##   `Zip Codes` = col_double()
## )
Balt
## # A tibble: 68 x 9
##    DATE  `CC#` DISTRICT LOCATION TYPE  `X (LONG)` `Y (LAT)` COORDINATES
##    <chr> <chr> <chr>    <chr>    <chr>      <dbl>     <dbl> <chr>      
##  1 6/28… 158F… SWD      3436 Wi… Shoo…      -76.7      39.3 (39.27293,…
##  2 9/28… 158I… SWD      1900 Gr… Shoo…      -76.7      39.3 (39.266968…
##  3 11/1… 153K… ED       700 Mur… Shoo…      -76.6      39.3 (39.304279…
##  4 10/1… 154J… NED      3627 El… Taser      -76.6      39.3 (39.315777…
##  5 11/6… 157K… WD       1007 Br… Inju…      -76.6      39.3 (39.296035…
##  6 11/2… 155K… ND       3300 Fa… Inju…      -76.6      39.3 (39.327054…
##  7 7/27… 158G… SWD      200 S A… Shoo…      -76.7      39.3 (39.283278…
##  8 10/6… 154J… NED      7208 Ha… Taser      -76.5      39.4 (39.368546…
##  9 11/8… 154K… NED      3400 Ha… Hands      -76.6      39.3 (39.329344…
## 10 11/1… 156K… NWD      3700 Oa… Shoo…      -76.7      39.3 (39.343138…
## # … with 58 more rows, and 1 more variable: `Zip Codes` <dbl>

Look At The Data

A first look at the data: its structure, followed by the first six rows and the last six rows.

str(Balt)
## tibble [68 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ DATE       : chr [1:68] "6/28/2015 0:00" "9/28/2015 0:00" "11/15/2015 0:00" "10/11/2015 0:00" ...
##  $ CC#        : chr [1:68] "158F12891" "158I12884" "153K06453" "154J04680" ...
##  $ DISTRICT   : chr [1:68] "SWD" "SWD" "ED" "NED" ...
##  $ LOCATION   : chr [1:68] "3436 Wilkens Ave" "1900 Grinnalds Ave" "700 Mura St" "3627 Elmora Ave" ...
##  $ TYPE       : chr [1:68] "Shooting" "Shooting" "Shooting" "Taser" ...
##  $ X (LONG)   : num [1:68] -76.7 -76.7 -76.6 -76.6 -76.6 ...
##  $ Y (LAT)    : num [1:68] 39.3 39.3 39.3 39.3 39.3 ...
##  $ COORDINATES: chr [1:68] "(39.27293, -76.675684)" "(39.266968, -76.653546)" "(39.304279, -76.610221)" "(39.315777, -76.569463)" ...
##  $ Zip Codes  : num [1:68] 27950 27953 13645 26956 27301 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   DATE = col_character(),
##   ..   `CC#` = col_character(),
##   ..   DISTRICT = col_character(),
##   ..   LOCATION = col_character(),
##   ..   TYPE = col_character(),
##   ..   `X (LONG)` = col_double(),
##   ..   `Y (LAT)` = col_double(),
##   ..   COORDINATES = col_character(),
##   ..   `Zip Codes` = col_double()
##   .. )
head(Balt)
## # A tibble: 6 x 9
##   DATE  `CC#` DISTRICT LOCATION TYPE  `X (LONG)` `Y (LAT)` COORDINATES
##   <chr> <chr> <chr>    <chr>    <chr>      <dbl>     <dbl> <chr>      
## 1 6/28… 158F… SWD      3436 Wi… Shoo…      -76.7      39.3 (39.27293,…
## 2 9/28… 158I… SWD      1900 Gr… Shoo…      -76.7      39.3 (39.266968…
## 3 11/1… 153K… ED       700 Mur… Shoo…      -76.6      39.3 (39.304279…
## 4 10/1… 154J… NED      3627 El… Taser      -76.6      39.3 (39.315777…
## 5 11/6… 157K… WD       1007 Br… Inju…      -76.6      39.3 (39.296035…
## 6 11/2… 155K… ND       3300 Fa… Inju…      -76.6      39.3 (39.327054…
## # … with 1 more variable: `Zip Codes` <dbl>
tail(Balt)
## # A tibble: 6 x 9
##   DATE  `CC#` DISTRICT LOCATION TYPE  `X (LONG)` `Y (LAT)` COORDINATES
##   <chr> <chr> <chr>    <chr>    <chr>      <dbl>     <dbl> <chr>      
## 1 11/2… 143K… ED       2200 Ki… Vehi…      -76.6      39.3 (39.314761…
## 2 5/16… 143E… ED       1200 En… Shoo…      -76.6      39.3 (39.304325…
## 3 3/28… 132C… SED      6706 Da… Shoo…      -76.5      39.3 (39.276956…
## 4 3/26… 144C… NED      4200 An… Vehi…      -76.5      39.3 (39.339846…
## 5 1/22… 156A… NWD      4311 Pi… Shoo…      -76.7      39.3 (39.33955,…
## 6 10/1… 134J… NED      3800 Ma… Shoo…      -76.6      39.3 (39.322979…
## # … with 1 more variable: `Zip Codes` <dbl>

Narrowing Down The Scope

I narrowed the data down to the specific columns I want to focus on.

Balt1 <- Balt %>% select(DATE, DISTRICT, TYPE,`Zip Codes`)
Balt1
## # A tibble: 68 x 4
##    DATE            DISTRICT TYPE           `Zip Codes`
##    <chr>           <chr>    <chr>                <dbl>
##  1 6/28/2015 0:00  SWD      Shooting             27950
##  2 9/28/2015 0:00  SWD      Shooting             27953
##  3 11/15/2015 0:00 ED       Shooting             13645
##  4 10/11/2015 0:00 NED      Taser                26956
##  5 11/6/2015 0:00  WD       Injured Person       27301
##  6 11/20/2015 0:00 ND       Injured Person       14006
##  7 7/27/2015 0:00  SWD      Shooting             27950
##  8 10/6/2015 0:00  NED      Taser                27957
##  9 11/8/2015 0:00  NED      Hands                27307
## 10 11/11/2015 0:00 NWD      Shooting             27295
## # … with 58 more rows
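Column names that contain spaces, such as `Zip Codes`, need backticks every time they are used. As a minimal sketch (not part of the original analysis), select() can rename a column while picking it. The tibble below is a two-row stand-in built from values printed above, and the name zip_id is a hypothetical choice.

```r
library(dplyr)

# Two-row stand-in for Balt, using values shown in the output above.
toy <- tibble(
  DATE = c("6/28/2015 0:00", "9/28/2015 0:00"),
  DISTRICT = c("SWD", "SWD"),
  LOCATION = c("3436 Wilkens Ave", "1900 Grinnalds Ave"),
  TYPE = c("Shooting", "Shooting"),
  `Zip Codes` = c(27950, 27953)
)

# select() accepts new_name = old_name, so the backticks are needed only once.
# zip_id is a made-up name for illustration.
toy_small <- toy %>% select(DATE, DISTRICT, TYPE, zip_id = `Zip Codes`)
names(toy_small)
```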

Remove NAs

Cleaning up the data a little more by dropping any rows that have missing values. Also converted the DATE column to a date type, which removes the “0:00” time stamp, as it provides no useful information.

Balt2 <- na.omit(Balt1)
Balt2$DATE <- as.Date(Balt2$DATE, "%m/%d/%Y")
Balt2
## # A tibble: 65 x 4
##    DATE       DISTRICT TYPE           `Zip Codes`
##    <date>     <chr>    <chr>                <dbl>
##  1 2015-06-28 SWD      Shooting             27950
##  2 2015-09-28 SWD      Shooting             27953
##  3 2015-11-15 ED       Shooting             13645
##  4 2015-10-11 NED      Taser                26956
##  5 2015-11-06 WD       Injured Person       27301
##  6 2015-11-20 ND       Injured Person       14006
##  7 2015-07-27 SWD      Shooting             27950
##  8 2015-10-06 NED      Taser                27957
##  9 2015-11-08 NED      Hands                27307
## 10 2015-11-11 NWD      Shooting             27295
## # … with 55 more rows
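Three rows were dropped here (68 down to 65). Before dropping rows it can help to see which columns hold the missing values. The sketch below assumes a small made-up tibble, where the NA in DISTRICT is invented for illustration, and applies the same na.omit() and as.Date() steps.

```r
library(dplyr)

# Stand-in data; the missing DISTRICT value is invented for this example.
toy <- tibble(
  DATE = c("6/28/2015 0:00", "9/28/2015 0:00", "11/15/2015 0:00"),
  DISTRICT = c("SWD", NA, "ED"),
  TYPE = c("Shooting", "Shooting", "Shooting")
)

# Count the missing values in each column.
na_counts <- colSums(is.na(toy))
na_counts

# Drop incomplete rows, then parse the date. as.Date() reads only the part
# of the string that matches the format, so the trailing " 0:00" is ignored.
toy_clean <- toy %>%
  na.omit() %>%
  mutate(DATE = as.Date(DATE, format = "%m/%d/%Y"))
toy_clean
```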

Incidents By Type of Force

To see which types of force occur most often, I graphed the number of incidents for each type of force in the data.

library(ggplot2)
Balt_bar <- ggplot(Balt2, aes(x= TYPE)) +
  geom_bar()+
  xlab("Type of Force") + ylab("Total")+
  ggtitle("Use of Force By Type")
Balt_bar

Top 3 uses of force: Shooting, Vehicle, Discharge
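A crowded bar chart can make the ranking hard to read. One way to check the top types is to count them directly, as in the sketch below. The six-row tibble is made up for illustration, so its counts do not reflect the Baltimore file. fct_infreq() from forcats is used only to order the bars by frequency.

```r
library(dplyr)
library(ggplot2)
library(forcats)

# Made-up stand-in data, not the real counts.
toy <- tibble(
  TYPE = c("Shooting", "Shooting", "Shooting", "Vehicle", "Vehicle", "Taser")
)

# Count incidents per type, largest first.
type_counts <- toy %>% count(TYPE, sort = TRUE)
type_counts

# Same bar chart as above, with bars ordered from most to least frequent.
toy_bar <- ggplot(toy, aes(x = fct_infreq(TYPE))) +
  geom_bar() +
  xlab("Type of Force") + ylab("Total") +
  ggtitle("Use of Force By Type")
```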

Breakdown By Year

Add a new column to just show years of incidents using mutate. Plot the new column to see which year had the most incidents of use of force.

library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
Balt3 <- Balt2 %>% 
  group_by(TYPE) %>% 
  arrange(desc(DATE)) %>% 
  # DATE is already a Date, so the year can be pulled out directly
  mutate(year = format(DATE, "%Y"))
head(Balt3)
## # A tibble: 6 x 5
## # Groups:   TYPE [3]
##   DATE       DISTRICT TYPE           `Zip Codes` year 
##   <date>     <chr>    <chr>                <dbl> <chr>
## 1 2015-11-20 ND       Injured Person       14006 2015 
## 2 2015-11-15 ED       Shooting             13645 2015 
## 3 2015-11-15 SD       Injured Person       27953 2015 
## 4 2015-11-11 NWD      Shooting             27295 2015 
## 5 2015-11-08 NED      Hands                27307 2015 
## 6 2015-11-06 WD       Injured Person       27301 2015
Balt_bar2 <- Balt3 %>% ggplot(aes(x=year)) + 
  geom_bar() + 
  xlab("Year") + ylab("Total") + 
  ggtitle("Years of Incidents") 
Balt_bar2

2014 had the most incidents.
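The yearly totals can also be read from a table rather than from the plot. The sketch below is an alternative, not the method used above: it extracts the year with lubridate's year(), which returns a number rather than a character string, and then counts rows per year. The dates are a small illustrative set, so the counts are not those of the real data.

```r
library(dplyr)
library(lubridate)

# Illustrative dates only.
toy <- tibble(
  DATE = as.Date(c("2015-06-28", "2014-12-28", "2014-11-23", "2013-03-28"))
)

# year() returns a numeric year; wrap it in factor() or as.character()
# if it will later be mapped to a discrete fill colour.
toy_years <- toy %>% mutate(year = year(DATE))

# Count incidents per year, largest first.
year_counts <- toy_years %>% count(year, sort = TRUE)
year_counts
```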

How the Top 3 Types of Force Vary Over the Years

Update the data to only show the top 3 types of use of force. Plot to show the variations in their use by police in 2013-2015.

library(RColorBrewer)
Balt4 <- subset(Balt3, TYPE== c("Shooting", "Vehicle", "Discharge"))
## Warning in TYPE == c("Shooting", "Vehicle", "Discharge"): longer object length
## is not a multiple of shorter object length
head(Balt4)
## # A tibble: 6 x 5
## # Groups:   TYPE [3]
##   DATE       DISTRICT TYPE      `Zip Codes` year 
##   <date>     <chr>    <chr>           <dbl> <chr>
## 1 2015-11-11 NWD      Shooting        27295 2015 
## 2 2015-07-27 SWD      Shooting        27950 2015 
## 3 2015-04-18 SWD      Shooting        27297 2015 
## 4 2014-12-28 ED       Shooting        13987 2014 
## 5 2014-11-23 ED       Vehicle         27307 2014 
## 6 2014-11-15 SD       Discharge       27632 2014
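The warning above is worth a closer look. With ==, R compares TYPE against the three names position by position, recycling the shorter vector, so a row is kept only when its type happens to line up with the recycled name. Some matching rows are silently dropped. For example, the 2015-11-15 ED shooting shown in head(Balt3) is missing from Balt4. Membership tests are usually written with %in% instead. A minimal sketch on a made-up vector:

```r
library(dplyr)

# Made-up example values.
types <- c("Vehicle", "Shooting", "Discharge", "Shooting", "Taser", "Vehicle")
top3  <- c("Shooting", "Vehicle", "Discharge")

# == recycles top3 and compares element by element:
#   types: Vehicle  Shooting Discharge Shooting Taser   Vehicle
#   top3:  Shooting Vehicle  Discharge Shooting Vehicle Discharge
recycled <- types == top3
sum(recycled)  # 2 rows kept

# %in% asks, for each element, whether it appears anywhere in top3.
member <- types %in% top3
sum(member)    # 5 rows kept

# The same test inside a dplyr filter.
kept <- tibble(TYPE = types) %>% filter(TYPE %in% top3)
```

Applied to this analysis, that would mean filtering Balt3 with TYPE %in% c("Shooting", "Vehicle", "Discharge"). The counts in the plot below would then likely change.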
Balt_p <- ggplot(Balt4, aes(TYPE,fill= year)) +
  geom_bar() +
  xlab("Type of Force") + ylab("Total")+
  ggtitle("Top 3 Types of Force Over the Years") + 
  scale_fill_brewer(palette = "Set3") + 
  coord_flip()
Balt_p

With the most incidents occurring in 2014, it makes sense that all of the top 3 types of force appear in that year. 2013 and 2015 had one or the other.

Essay

For this project, I used one of the policing datasets provided. The one I chose shows the breakdown of officer-involved use of force in Baltimore. I downloaded the csv file to the folder I created for the class, then set the working directory of the markdown file to that same folder so that everything is in one place. I loaded the tidyverse package and read the file into a dataset named “Balt” so that it could be explored further. Before proceeding with cleaning, I used the str, head and tail commands to get a broad understanding of what I was working with. This dataset has 9 columns. DATE, CC#, DISTRICT, LOCATION, TYPE, and COORDINATES are character columns, while X (LONG), Y (LAT), and Zip Codes are numeric.

I wanted to narrow down how many columns I was looking at to help me figure out what I wanted to explore. Balt1 was created by using select to keep only DATE, DISTRICT, TYPE and Zip Codes. From there, Balt2 was created by using na.omit to remove any incomplete rows and by converting the DATE column to remove the constant “0:00”.

After getting a dataset I was more comfortable working with, I wanted to see the distribution of the types of incidents. I loaded the ggplot2 library and created a bar graph titled “Use of Force By Type”. The graph was instantly crowded because a wide variety of force was used, so for further looks I focused on the 3 types of force used most by police: shooting, vehicle and discharge.

The next variable I wanted to look at was time. The full dates were too detailed, so I wanted to look at everything by year. To do that, I created Balt3 by adding a new column that shows the year of each row. Then I created another bar graph to see the distribution of the data by year. From that graph, 2014 was the year with the most incidents.

The final visualization combines the types of force and years. I condensed the data even further using subset so that only the top 3 types of force remained, and called the result Balt4. I then used Balt4 to create one last bar graph showing the types of force across the years in the data. In that graph, shooting is the one type used consistently in all 3 years, while discharge and vehicle were only used in 2014. Something quite prevalent in this dataset is that police use shooting heavily in Baltimore. So even at the microscale of data from one city, it highlights that police tend to use a means of force that increases the risk of serious injury and even death.

I think looking at only 3 types was too narrow. I probably should have expanded it to 5 or 6 types and then used a facet wrap to see the variations in their use across the three years. Another aspect I wanted to look at was the distribution of the data in 2015, the year Freddie Gray was killed by police. It would be interesting to see whether there were any changes in police action before and after his death in that year.
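The facet wrap idea mentioned in the essay could look roughly like the sketch below. It is only an outline under stated assumptions: the tibble is made up, the number of types kept is set to 2 so the toy data stays small (5 or 6 would be used on the real data), and the object names are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for Balt3 with just the two columns needed.
toy <- tibble(
  TYPE = c("Shooting", "Shooting", "Shooting", "Shooting",
           "Vehicle", "Vehicle", "Taser"),
  year = c("2013", "2014", "2014", "2015", "2014", "2014", "2015")
)

# Names of the most frequent types; n = 2 here, 5 or 6 on the real data.
top_types <- toy %>%
  count(TYPE, sort = TRUE) %>%
  slice_head(n = 2) %>%
  pull(TYPE)

# Keep only those types, then draw one panel per type.
toy_top <- toy %>% filter(TYPE %in% top_types)

toy_facets <- ggplot(toy_top, aes(x = year)) +
  geom_bar() +
  facet_wrap(~ TYPE) +
  xlab("Year") + ylab("Total") +
  ggtitle("Types of Force By Year")
```

Each panel shares the same year axis, which makes it easier to compare how one type of force changes over time than a single stacked bar does.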