Data used: “Baltimore_Officer_Involved_Use_Of_Force2019.csv”. I saved the file to a designated folder, then loaded it and named it Balt for a first look.
setwd("~/Desktop/MC Data Science /DATA 110 /DataSets ")
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.3 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ───────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Balt <- read_csv("Baltimore_Officer_Involved_Use_Of_Force2019.csv")
## Parsed with column specification:
## cols(
## DATE = col_character(),
## `CC#` = col_character(),
## DISTRICT = col_character(),
## LOCATION = col_character(),
## TYPE = col_character(),
## `X (LONG)` = col_double(),
## `Y (LAT)` = col_double(),
## COORDINATES = col_character(),
## `Zip Codes` = col_double()
## )
Balt
## # A tibble: 68 x 9
## DATE `CC#` DISTRICT LOCATION TYPE `X (LONG)` `Y (LAT)` COORDINATES
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr>
## 1 6/28… 158F… SWD 3436 Wi… Shoo… -76.7 39.3 (39.27293,…
## 2 9/28… 158I… SWD 1900 Gr… Shoo… -76.7 39.3 (39.266968…
## 3 11/1… 153K… ED 700 Mur… Shoo… -76.6 39.3 (39.304279…
## 4 10/1… 154J… NED 3627 El… Taser -76.6 39.3 (39.315777…
## 5 11/6… 157K… WD 1007 Br… Inju… -76.6 39.3 (39.296035…
## 6 11/2… 155K… ND 3300 Fa… Inju… -76.6 39.3 (39.327054…
## 7 7/27… 158G… SWD 200 S A… Shoo… -76.7 39.3 (39.283278…
## 8 10/6… 154J… NED 7208 Ha… Taser -76.5 39.4 (39.368546…
## 9 11/8… 154K… NED 3400 Ha… Hands -76.6 39.3 (39.329344…
## 10 11/1… 156K… NWD 3700 Oa… Shoo… -76.7 39.3 (39.343138…
## # … with 58 more rows, and 1 more variable: `Zip Codes` <dbl>
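read_csv() guessed DATE as character because of the trailing time. As a minimal sketch, assuming readr's `col_datetime()` with a strptime-style format, the column can instead be parsed while the file is read. The two rows written to a temporary file are hypothetical stand-ins so the example runs without the real CSV.

```r
library(readr)

# Two hypothetical rows shaped like the DATE / TYPE columns of the real file
# (values invented for illustration only)
tmp <- tempfile(fileext = ".csv")
writeLines(c("DATE,TYPE",
             "6/28/2015 0:00,Shooting",
             "10/11/2015 0:00,Taser"), tmp)

# Parse DATE as a date-time at read time; every other column stays character
demo <- read_csv(tmp, col_types = cols(
  DATE = col_datetime(format = "%m/%d/%Y %H:%M"),
  .default = col_character()
))

# Drop the (always midnight) time to keep just the calendar date
as.Date(demo$DATE)
```

For the real file the same `col_types` argument would go into the existing `read_csv("Baltimore_Officer_Involved_Use_Of_Force2019.csv")` call; the other columns would then need their own specifications (or `col_guess()` as the default) to stay numeric.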
To get an overview of the data, I looked at its structure and at its first and last six rows.
str(Balt)
## tibble [68 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ DATE : chr [1:68] "6/28/2015 0:00" "9/28/2015 0:00" "11/15/2015 0:00" "10/11/2015 0:00" ...
## $ CC# : chr [1:68] "158F12891" "158I12884" "153K06453" "154J04680" ...
## $ DISTRICT : chr [1:68] "SWD" "SWD" "ED" "NED" ...
## $ LOCATION : chr [1:68] "3436 Wilkens Ave" "1900 Grinnalds Ave" "700 Mura St" "3627 Elmora Ave" ...
## $ TYPE : chr [1:68] "Shooting" "Shooting" "Shooting" "Taser" ...
## $ X (LONG) : num [1:68] -76.7 -76.7 -76.6 -76.6 -76.6 ...
## $ Y (LAT) : num [1:68] 39.3 39.3 39.3 39.3 39.3 ...
## $ COORDINATES: chr [1:68] "(39.27293, -76.675684)" "(39.266968, -76.653546)" "(39.304279, -76.610221)" "(39.315777, -76.569463)" ...
## $ Zip Codes : num [1:68] 27950 27953 13645 26956 27301 ...
## - attr(*, "spec")=
## .. cols(
## .. DATE = col_character(),
## .. `CC#` = col_character(),
## .. DISTRICT = col_character(),
## .. LOCATION = col_character(),
## .. TYPE = col_character(),
## .. `X (LONG)` = col_double(),
## .. `Y (LAT)` = col_double(),
## .. COORDINATES = col_character(),
## .. `Zip Codes` = col_double()
## .. )
head(Balt)
## # A tibble: 6 x 9
## DATE `CC#` DISTRICT LOCATION TYPE `X (LONG)` `Y (LAT)` COORDINATES
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr>
## 1 6/28… 158F… SWD 3436 Wi… Shoo… -76.7 39.3 (39.27293,…
## 2 9/28… 158I… SWD 1900 Gr… Shoo… -76.7 39.3 (39.266968…
## 3 11/1… 153K… ED 700 Mur… Shoo… -76.6 39.3 (39.304279…
## 4 10/1… 154J… NED 3627 El… Taser -76.6 39.3 (39.315777…
## 5 11/6… 157K… WD 1007 Br… Inju… -76.6 39.3 (39.296035…
## 6 11/2… 155K… ND 3300 Fa… Inju… -76.6 39.3 (39.327054…
## # … with 1 more variable: `Zip Codes` <dbl>
tail(Balt)
## # A tibble: 6 x 9
## DATE `CC#` DISTRICT LOCATION TYPE `X (LONG)` `Y (LAT)` COORDINATES
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr>
## 1 11/2… 143K… ED 2200 Ki… Vehi… -76.6 39.3 (39.314761…
## 2 5/16… 143E… ED 1200 En… Shoo… -76.6 39.3 (39.304325…
## 3 3/28… 132C… SED 6706 Da… Shoo… -76.5 39.3 (39.276956…
## 4 3/26… 144C… NED 4200 An… Vehi… -76.5 39.3 (39.339846…
## 5 1/22… 156A… NWD 4311 Pi… Shoo… -76.7 39.3 (39.33955,…
## 6 10/1… 134J… NED 3800 Ma… Shoo… -76.6 39.3 (39.322979…
## # … with 1 more variable: `Zip Codes` <dbl>
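Before removing incomplete rows, it can help to see where the gaps are. This sketch counts missing values per column and the rows that `na.omit()` would drop; the small tibble is a hypothetical stand-in for Balt, with invented values.

```r
library(dplyr)

# Hypothetical miniature of Balt (values invented) with a few gaps
toy <- tibble(
  DATE        = c("6/28/2015 0:00", NA, "11/15/2015 0:00", "10/11/2015 0:00"),
  TYPE        = c("Shooting", "Taser", NA, "Taser"),
  `Zip Codes` = c(27950, 27953, NA, 26956)
)

# Number of NA values in each column
missing_per_col <- colSums(is.na(toy))
missing_per_col

# Number of rows with at least one NA, i.e. the rows na.omit() removes
rows_with_na <- sum(!complete.cases(toy))
rows_with_na
```

Run on Balt1, the same two lines would report how many of the 68 rows carry a gap; the printed outputs show 65 rows remaining after `na.omit()`.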
I narrowed the data down to the columns I want to focus on.
Balt1 <- Balt %>% select(DATE, DISTRICT, TYPE,`Zip Codes`)
Balt1
## # A tibble: 68 x 4
## DATE DISTRICT TYPE `Zip Codes`
## <chr> <chr> <chr> <dbl>
## 1 6/28/2015 0:00 SWD Shooting 27950
## 2 9/28/2015 0:00 SWD Shooting 27953
## 3 11/15/2015 0:00 ED Shooting 13645
## 4 10/11/2015 0:00 NED Taser 26956
## 5 11/6/2015 0:00 WD Injured Person 27301
## 6 11/20/2015 0:00 ND Injured Person 14006
## 7 7/27/2015 0:00 SWD Shooting 27950
## 8 10/6/2015 0:00 NED Taser 27957
## 9 11/8/2015 0:00 NED Hands 27307
## 10 11/11/2015 0:00 NWD Shooting 27295
## # … with 58 more rows
Next, I cleaned the data further by dropping every row that has a missing value in any of the selected columns (68 rows become 65). I also converted the DATE column to a date, which removes the “0:00” time stamp since it carries no information.
Balt2 <- na.omit(Balt1)
Balt2$DATE <- as.Date(Balt2$DATE, "%m/%d/%Y")
Balt2
## # A tibble: 65 x 4
## DATE DISTRICT TYPE `Zip Codes`
## <date> <chr> <chr> <dbl>
## 1 2015-06-28 SWD Shooting 27950
## 2 2015-09-28 SWD Shooting 27953
## 3 2015-11-15 ED Shooting 13645
## 4 2015-10-11 NED Taser 26956
## 5 2015-11-06 WD Injured Person 27301
## 6 2015-11-20 ND Injured Person 14006
## 7 2015-07-27 SWD Shooting 27950
## 8 2015-10-06 NED Taser 27957
## 9 2015-11-08 NED Hands 27307
## 10 2015-11-11 NWD Shooting 27295
## # … with 55 more rows
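The `as.Date()` call works because base R's date parsing reads only as far as the format requires and ignores the trailing " 0:00". This sketch checks that on two sample strings and compares it with lubridate's `mdy_hm()`, which parses the full date-time explicitly before `as_date()` drops the time.

```r
library(lubridate)

raw <- c("6/28/2015 0:00", "11/15/2015 0:00")

# Base R: reads month/day/year and ignores the trailing " 0:00"
base_dates <- as.Date(raw, format = "%m/%d/%Y")

# lubridate: parse month/day/year hour:minute, then keep only the date
lub_dates <- as_date(mdy_hm(raw))

base_dates
all(base_dates == lub_dates)
```

The lubridate route is stricter: a string whose time part does not match would come back as `NA` with a warning rather than being silently truncated.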
To see which types of force account for the most incidents, I graphed the number of incidents of each type.
library(ggplot2)
Balt_bar <- ggplot(Balt2, aes(x = TYPE)) +
  geom_bar() +
  xlab("Type of Force") + ylab("Total") +
  ggtitle("Use of Force By Type")
Balt_bar
Top 3 uses of force: Shooting, Vehicle, Discharge
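Reading the top three off a crowded bar chart is error-prone, so a table of counts can confirm it. This sketch uses dplyr's `count()` and `slice_head()` on a hypothetical TYPE column with invented counts; on the real data the same pipeline would start from `Balt2`.

```r
library(dplyr)

# Hypothetical stand-in for Balt2$TYPE (counts invented for illustration)
toy <- tibble(TYPE = c(rep("Shooting", 5), rep("Vehicle", 3),
                       rep("Discharge", 2), "Taser", "Hands"))

# Tally each type, largest first, and keep the three most frequent
top3 <- toy %>%
  count(TYPE, sort = TRUE) %>%
  slice_head(n = 3)

top3
top3$TYPE
```

Keeping `top3$TYPE` as a vector also avoids retyping the three names by hand when subsetting later.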
Breakdown By Year
I added a new column with mutate() that holds only the year of each incident, then plotted it to see which year had the most incidents of use of force.
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
Balt3 <- Balt2 %>%
  group_by(TYPE) %>%
  arrange(desc(DATE)) %>%
  # Year as a character label, so it works as a discrete fill later
  mutate(year = format(DATE, "%Y"))
head(Balt3)
## # A tibble: 6 x 5
## # Groups: TYPE [3]
## DATE DISTRICT TYPE `Zip Codes` year
## <date> <chr> <chr> <dbl> <chr>
## 1 2015-11-20 ND Injured Person 14006 2015
## 2 2015-11-15 ED Shooting 13645 2015
## 3 2015-11-15 SD Injured Person 27953 2015
## 4 2015-11-11 NWD Shooting 27295 2015
## 5 2015-11-08 NED Hands 27307 2015
## 6 2015-11-06 WD Injured Person 27301 2015
Balt_bar2 <- Balt3 %>% ggplot(aes(x = year)) +
  geom_bar() +
  xlab("Year") + ylab("Total") +
  ggtitle("Years of Incidents")
Balt_bar2
2014 had the most incidents.
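The same answer can be checked numerically with a per-year count. This sketch swaps in lubridate's `year()`, which returns a number, so it is wrapped in `as.character()` to stay discrete like the `format(DATE, "%Y")` column above. The dates are hypothetical stand-ins for Balt2$DATE.

```r
library(dplyr)
library(lubridate)

# Hypothetical dates standing in for Balt2$DATE
toy <- tibble(DATE = as.Date(c("2013-03-28", "2014-05-16", "2014-11-23",
                               "2014-12-28", "2015-07-27")))

# Year as a character label, then one row per year with its incident count
per_year <- toy %>%
  mutate(year = as.character(year(DATE))) %>%
  count(year)

per_year
per_year$year[which.max(per_year$n)]
```

Keeping the year discrete matters for the later `scale_fill_brewer()` call, which expects a discrete fill variable.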
Next, I subset the data to the top 3 types of use of force and plotted how often police used each one from 2013 to 2015.
library(RColorBrewer)
Balt4 <- subset(Balt3, TYPE== c("Shooting", "Vehicle", "Discharge"))
## Warning in TYPE == c("Shooting", "Vehicle", "Discharge"): longer object length
## is not a multiple of shorter object length
head(Balt4)
## # A tibble: 6 x 5
## # Groups: TYPE [3]
## DATE DISTRICT TYPE `Zip Codes` year
## <date> <chr> <chr> <dbl> <chr>
## 1 2015-11-11 NWD Shooting 27295 2015
## 2 2015-07-27 SWD Shooting 27950 2015
## 3 2015-04-18 SWD Shooting 27297 2015
## 4 2014-12-28 ED Shooting 13987 2014
## 5 2014-11-23 ED Vehicle 27307 2014
## 6 2014-11-15 SD Discharge 27632 2014
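The warning above is worth heeding: `==` compares TYPE element by element against `c("Shooting", "Vehicle", "Discharge")` recycled along the column, so a row is kept only when its type happens to line up with that position in the vector. A membership test with `%in%` keeps every row of the three types. The four-row column below is hypothetical.

```r
library(dplyr)

types <- c("Shooting", "Vehicle", "Discharge")

# Hypothetical TYPE column with four rows
toy <- tibble(TYPE = c("Shooting", "Shooting", "Vehicle", "Taser"))

# `==` recycles `types` to length 4 (Shooting, Vehicle, Discharge, Shooting),
# so row i is compared with only one of the three names.
# The length-mismatch warning is suppressed here to keep the output clean.
recycled <- suppressWarnings(toy$TYPE == types)

# `%in%` asks whether each row's TYPE appears anywhere in `types`
membership <- toy$TYPE %in% types

recycled
membership

# The membership version keeps all three matching rows
kept <- subset(toy, TYPE %in% types)
```

Applied to the real data this would be `subset(Balt3, TYPE %in% c("Shooting", "Vehicle", "Discharge"))`; because the recycled comparison drops rows, the counts in the chart built from Balt4 may change when it is rerun that way.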
Balt_p <- ggplot(Balt4, aes(TYPE, fill = year)) +
  geom_bar() +
  xlab("Type of Force") + ylab("Total") +
  ggtitle("Top 3 Types of Force Over the Years") +
  scale_fill_brewer(palette = "Set3") +
  coord_flip()
Balt_p
Since 2014 had the most incidents, it makes sense that all three of the top types of force appear in that year. In 2013 and 2015, only shootings appear.
For this project, I used one of the policing datasets provided: the record of officer-involved use of force in Baltimore. I downloaded the CSV file to the folder I created for the class and set the working directory of the R Markdown file to that same folder, so that everything is in one place. I loaded the tidyverse package, read the file with read_csv(), and named the resulting dataset “Balt”. Before cleaning, I used str(), head() and tail() to get a broad understanding of what I was working with. The dataset has 68 rows and 9 columns: DATE, CC#, DISTRICT, LOCATION, TYPE and COORDINATES are character columns, while X (LONG), Y (LAT) and Zip Codes are numeric. To narrow down what I wanted to explore, I used select() to create Balt1, which keeps only DATE, DISTRICT, TYPE and Zip Codes. From there, I created Balt2 by removing incomplete rows with na.omit() (leaving 65 rows) and by converting DATE to a date, which drops the constant “0:00” time.

With a dataset I was more comfortable working with, I looked at the distribution of the types of incidents. Using ggplot2, I created a bar graph titled “Use of Force By Type”. The graph was crowded because the data contain a wide variety of force types, so for later plots I focused on the three types police used most: shooting, vehicle and discharge. The next variable I looked at was time. The full dates were too granular, so I wanted to look at everything by year. To do that, I created Balt3 by adding a new column that holds the year of each row, then made another bar graph showing the number of incidents per year. That graph shows 2014 as the year with the most incidents.
The final visualization combines type of force and year. I narrowed the data further with subset() to keep only the top three types of force, and called the result Balt4. I then used Balt4 to create one last bar graph showing each type of force across the years of the data. In that graph, shooting appears in all three years, while discharge and vehicle appear only in 2014. What stood out most from this dataset is how heavily Baltimore police rely on shooting: even at the scale of a single city, the most common recorded type of force is one that carries a high risk of serious injury or death. In hindsight, looking at only three types was too narrow. I should have expanded to five or six types and used facet_wrap() to compare their use across the three years. I also wanted to look at the distribution of incidents in 2015, the year Freddie Gray died from injuries sustained in police custody. It would be interesting to see whether police use of force changed between the months before and after his death.
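The two follow-ups described above could be sketched as below: facet the most frequent types by year with `facet_wrap()`, and split 2015 at a cutoff date. Assumptions to flag: the data are a hypothetical stand-in for Balt3 with invented values, `n = 5` is an arbitrary choice, and the cutoff of 2015-04-19 (the date of Freddie Gray's death) is a parameter one could equally set to the date of his arrest.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for Balt3 (values invented for illustration)
toy <- tibble(
  DATE = as.Date(c("2013-03-28", "2014-05-16", "2014-11-23", "2015-01-22",
                   "2015-06-28", "2015-09-28", "2015-10-11", "2015-11-08")),
  TYPE = c("Shooting", "Shooting", "Vehicle", "Shooting",
           "Shooting", "Shooting", "Taser", "Hands")
) %>%
  mutate(year = format(DATE, "%Y"))

# Names of the n most frequent types (n = 5 is an arbitrary choice)
top_types <- toy %>%
  count(TYPE, sort = TRUE) %>%
  slice_head(n = 5) %>%
  pull(TYPE)

# One small bar chart per type, each counting incidents per year
p <- toy %>%
  filter(TYPE %in% top_types) %>%
  ggplot(aes(x = year)) +
  geom_bar() +
  facet_wrap(~ TYPE) +
  labs(x = "Year", y = "Total", title = "Top Types of Force by Year")

# Assumed cutoff date: label each 2015 incident as before or after it
cutoff <- as.Date("2015-04-19")
split_2015 <- toy %>%
  filter(year == "2015") %>%
  mutate(period = if_else(DATE < cutoff, "before", "after")) %>%
  count(period, TYPE)

split_2015
```

`print(p)` draws the faceted chart. Note that the "before" window in 2015 is much shorter than the "after" window, so raw counts would need to be compared per month rather than in total.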