Overview

For this assignment, I will import data on NYPD arrests for January through July of 2019. I then want to manipulate the data to get a summary of the top 5 offenses in NYC, see which boroughs the arrests are happening in, and break the arrests down by age group.


Import Libraries

This code needs tidyverse and lubridate to manipulate the data, ggplot2 and ggpubr to plot it, and RColorBrewer for the color palettes.

library(tidyverse)
library(ggplot2)
library(lubridate)
library(RColorBrewer)
library(ggpubr)

Import Data

I downloaded a dataset of NYPD arrests from NYC Open Data covering 2019 up until July. Let’s import it using read_csv().

This imports the data from the CSV file and makes it available in a data frame called nypdarrests.

nypdarrests <- read_csv('NYPD_Arrest_Data__Year_to_Date_.csv')

Simplify the Data

I would like to simplify this dataset by looking only at the offense description. Let’s make a new data frame called Arrests that has one row per offense description (OFNS_DESC) and a column n holding the number of arrests for that offense.

Arrests <- nypdarrests %>% count(OFNS_DESC)

Find the Most Common Arrests

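I would like to simplify this data further by keeping only the top 5 offenses, arranged in descending order. We will use dplyr’s arrange function together with desc to sort the counts from largest to smallest, and the filter function to extract a new data frame for the top 5. The cutoff of 7484 in the filter is specific to this download of the data: it is the count that separates the top five offenses from the rest.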

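# Sort offenses from most to least frequent
TopArrests <- Arrests %>% arrange(desc(n))

# Keep offenses with at least 7484 arrests (the top 5 in this dataset)
Visual1 <- filter(TopArrests, n >= 7484)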
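The hardcoded cutoff above only works for this particular download of the data. As a hedged alternative, the sketch below picks the top five by rank rather than by a fixed count, using dplyr’s slice_max. The toy_arrests table and its values are made up for illustration and are not taken from the NYPD data.

```r
library(dplyr)

# Hypothetical offense counts standing in for the Arrests table
toy_arrests <- data.frame(
  OFNS_DESC = c("A", "B", "C", "D", "E", "F", "G"),
  n = c(50, 700, 120, 900, 30, 400, 610)
)

# Keep the five rows with the largest n; the result is sorted largest first
toy_top5 <- toy_arrests %>% slice_max(order_by = n, n = 5)
```

Note that slice_max keeps ties by default, so it can return more than five rows if several offenses share the fifth-largest count; with_ties = FALSE changes that.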

Find the Age Group and Borough with the Most Arrests

I would like to find the age group with the most arrests. We will use dplyr’s count function to get the number of arrests per age group, and the same approach for the boroughs.

Visual2 <- nypdarrests %>% count(AGE_GROUP, name = 'count')
Visual3 <- nypdarrests %>% count(ARREST_BORO, name = 'n')

Summarise Data Frames with Percentages

I would like to capture the percentage breakdown for both age group and borough. We will use the mutate function from the dplyr package to add a new column with the percentages to each data frame.

Visual2 <- Visual2 %>% group_by(AGE_GROUP) %>% summarise(count = sum(count)) %>%
  mutate(Percent= paste0(round((count/sum(count)*100),0), '%'))
Visual3 <- Visual3 %>% group_by(ARREST_BORO) %>% summarise(n = sum(n)) %>%
  mutate(Percent= paste0(round((n/sum(n)*100),0), '%'))
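To make the percentage step concrete, here is a minimal sketch on made-up numbers. The toy_counts table is hypothetical and not drawn from the NYPD data. It also leaves out the group_by and summarise steps, on the assumption that a table produced by count() already has one row per group.

```r
library(dplyr)

# Hypothetical per-group counts, shaped like the output of count()
toy_counts <- data.frame(
  AGE_GROUP = c("18-24", "25-44", "45-64"),
  count = c(25, 50, 25)
)

# Each group's share of the total, rounded and formatted as a label
toy_pct <- toy_counts %>%
  mutate(Percent = paste0(round(count / sum(count) * 100, 0), "%"))
```

Because each percentage is rounded on its own, the labels on real data may not add up to exactly 100%.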

Make a Bar Graph for the Most Common Arrests

Use the ggplot2 package to create a simple bar graph of the top 5 offenses.

Visual1 <- ggplot(Visual1) +
  aes(x = OFNS_DESC, y = n ) + 
  theme_classic() +
  geom_bar(stat = 'identity', fill = "pink") +
  theme(axis.text.x = element_text(size= 7, angle = 45)) +
  labs(
    title ='Most Common NYPD Arrests',
    subtitle = 'January to July of 2019', 
    y= 'count', 
    x= 'Offense Description', 
    caption = 'Source: NYC Open Data'
  )

Visual1
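ggplot2 orders a character x axis alphabetically, so the bars will not follow the descending order set by arrange. One possible way to order the bars by count is to wrap the x variable in reorder, sketched here on made-up values (toy_top and its offense names are hypothetical).

```r
library(ggplot2)

# Hypothetical counts standing in for the top-offense table
toy_top <- data.frame(
  OFNS_DESC = c("OFFENSE A", "OFFENSE B", "OFFENSE C"),
  n = c(120, 300, 210)
)

# reorder() sorts the offense levels by -n, so the tallest bar comes first
ordered_plot <- ggplot(toy_top) +
  aes(x = reorder(OFNS_DESC, -n), y = n) +
  geom_col(fill = "pink") +
  labs(x = "Offense Description", y = "count")
```

geom_col() draws the same bars as geom_bar(stat = 'identity').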

Make a Pie Chart for Arrests by Age Group and a Pie Chart for Arrests by Borough

We will use the ggpie function from the ggpubr package to draw each pie chart and the brewer.pal function from the RColorBrewer package to color the slices.

ggpie(Visual2, 'count', label = 'Percent', 
      fill = 'AGE_GROUP', palette = brewer.pal(n=5, name = 'Set2'), color = 'white', lab.pos = 'in', lab.font = 'white', 
      main= 'NYPD ARRESTS BY AGE GROUP',caption = 'SOURCE: NYC OPEN DATA 2019')

ggpie(Visual3, 'n', label = 'Percent', 
      fill = 'ARREST_BORO', palette = brewer.pal(n=5, name = 'Pastel2'), color = 'black', lab.pos = 'in', lab.font = 'black', 
      main= 'NYPD ARRESTS BY BOROUGH',caption = 'SOURCE: NYC OPEN DATA 2019')