Abstract

This project explores crime in New York City in the current year (2018). It uses the NYPD Complaint Data Current (Year To Date) dataset, which is provided by the New York Police Department and linked under Data Source.

Data Source

NYPD Complaint Data Current (Year To Date) link: https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Current-Year-To-Date-/5uac-w243

Data Last Updated: October 24, 2018

Data Provided by: Police Department (NYPD)

This is a breakdown of every criminal complaint report filed in NYC by the NYPD for the current calendar year. This data is manually extracted every quarter and reviewed by the Office of Management Analysis and Planning. Each record represents a criminal complaint in NYC and includes information about the type of crime, the location and time of enforcement. In addition, information related to victim and suspect demographics is also included.

Required Packages/libraries

if (!require('ggplot2')) install.packages('ggplot2')
if (!require('dplyr')) install.packages('dplyr')
if (!require('leaflet')) install.packages('leaflet')
if (!require('scales')) install.packages('scales')
if (!require('readr')) install.packages('readr')
if (!require('ggmap')) install.packages('ggmap')
if (!require('ggrepel')) install.packages('ggrepel')
library(lubridate)
library(stringr)

Data Preprocessing

Read the data

Load the data using readr and read_csv().

Importing data

# Import data
path <- "C:\\Users\\patel\\Desktop\\SPS\\SPS_DATA_607\\final_project\\NYPD_Complaint_Data_Current__Year_To_Date_.csv"
df <- read_csv(path)

df_sub <- df[1:100,]  # display the first 100 rows
df_sub$CMPLNT_FR_TM <- as.character(df_sub$CMPLNT_FR_TM) 
df_sub

sprintf("Number of Rows in Dataframe: %s", format(nrow(df),big.mark = ","))

## [1] "Number of Rows in Dataframe: 228,905"

Preprocess Data

The All-Caps text is difficult to read. Let’s force the text in the appropriate columns into proper case.

proper_case <- function(x) {
  return (gsub("\\b([A-Z])([A-Z]+)", "\\U\\1\\L\\2" , x, perl=TRUE))
}

library(dplyr)
df <- df %>% mutate(BORO_NM = proper_case(BORO_NM),
                    JURIS_DESC = proper_case(JURIS_DESC),
                    LAW_CAT_CD = proper_case(LAW_CAT_CD),
                    LOC_OF_OCCUR_DESC = proper_case(LOC_OF_OCCUR_DESC),
                    OFNS_DESC = proper_case(OFNS_DESC),
                    PARKS_NM = proper_case(PARKS_NM),
                    PATROL_BORO = proper_case(PATROL_BORO),
                    PD_DESC = proper_case(PD_DESC),
                    PREM_TYP_DESC = proper_case(PREM_TYP_DESC),
                    CMPLNT_FR_TM = as.character(CMPLNT_FR_TM))
df_sub <- df[1:100,]  # display the first 100 rows
df_sub
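
To make the effect of the regular expression concrete, here is a small check of `proper_case()` on a few hand-written strings. The sample values are illustrative and are not rows taken from the dataset. Note that a single capital letter standing alone is left unchanged, because the pattern requires at least two capital letters in a row.

```r
# Same helper as above, repeated so the snippet runs on its own
proper_case <- function(x) {
  gsub("\\b([A-Z])([A-Z]+)", "\\U\\1\\L\\2", x, perl = TRUE)
}

# Illustrative values (made up for this example); NA is passed through as NA
sample_text <- c("PETIT LARCENY", "STATEN ISLAND", "ASSAULT 3 & RELATED OFFENSES", NA)
converted <- proper_case(sample_text)
print(converted)
```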

# add column Day of week. 
data_dayOfWeek<-df
data_dayOfWeek$CMPLNT_FR_DT <- as.Date(data_dayOfWeek$CMPLNT_FR_DT,format = "%m/%d/%Y")
data_dayOfWeek$day_of_week<- wday(data_dayOfWeek$CMPLNT_FR_DT, label=TRUE)
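
As a quick sanity check of the date parsing, the sketch below converts a few date strings in the same `%m/%d/%Y` layout and derives the weekday with base R instead of `wday()`. The sample dates are made up. `as.POSIXlt()$wday` is used here only because it does not depend on the system locale; it is an alternative for illustration, not the method used in this report.

```r
# Made-up dates in the same text layout as CMPLNT_FR_DT
dates_raw <- c("01/01/2018", "10/24/2018", "11/18/2018")
dates <- as.Date(dates_raw, format = "%m/%d/%Y")

# POSIXlt stores the weekday as 0 (Sunday) to 6 (Saturday)
dow_labels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
day_of_week <- factor(dow_labels[as.POSIXlt(dates)$wday + 1], levels = dow_labels)
print(data.frame(date = dates, day_of_week = day_of_week))
```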

Visualize Data

Crime across space

Display crime incident locations on the map using leaflet. Click icons on the map to show incident details.

data <- data_dayOfWeek[1:30000,] # display the first 30,000 rows
data$popup <- paste("<b>Incident #: </b>", data$CMPLNT_NUM, "<br>", "<b>Category: </b>", data$LAW_CAT_CD,
                    "<br>", "<b>Offence Description: </b>", data$OFNS_DESC,
                    "<br>", "<b>Day of week: </b>", data$day_of_week,
                    "<br>", "<b>Date: </b>", data$CMPLNT_FR_DT,
                    "<br>", "<b>Time: </b>", data$CMPLNT_FR_TM,
                    "<br>", "<b>PD Case: </b>", data$PD_CD,
                    "<br>", "<b>PD Description: </b>", data$PD_DESC,
                    "<br>", "<b>Longitude: </b>", data$Longitude,
                    "<br>", "<b>Latitude: </b>", data$Latitude)

leaflet(data, width = "100%") %>%
  addTiles(group = "OSM (default)") %>%
  # Add the provider layers that the layer control below refers to
  addProviderTiles(provider = "Esri.WorldStreetMap", group = "World StreetMap") %>%
  addProviderTiles(provider = "Esri.WorldImagery", group = "World Imagery") %>%
  addMarkers(lng = ~Longitude, lat = ~Latitude, popup = ~popup, clusterOptions = markerClusterOptions()) %>%
  addLayersControl(
    baseGroups = c("OSM (default)", "World StreetMap", "World Imagery"),
    options = layersControlOptions(collapsed = FALSE)
  )

## Warning in validateCoords(lng, lat, funcName): Data contains 1 rows with
## either missing or invalid lat/lon values and will be ignored
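
The warning above comes from records without usable coordinates. One way to make this explicit is to drop those rows before mapping and report how many were removed. The following is a minimal sketch on invented data; the data frame `incidents` and its values are assumptions for illustration only.

```r
# Toy data standing in for the complaint records (values are invented)
incidents <- data.frame(
  CMPLNT_NUM = c(1, 2, 3, 4),
  Latitude   = c(40.71, NA, 40.65, 40.83),
  Longitude  = c(-74.00, -73.95, NA, -73.90)
)

# Keep only rows where both coordinates are present
has_coords <- complete.cases(incidents[, c("Latitude", "Longitude")])
mappable <- incidents[has_coords, ]
cat("Dropped", sum(!has_coords), "of", nrow(incidents), "rows without coordinates\n")
```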

Aggregate Data

Summarize the data by incident category.

df_category <- sort(table(df$LAW_CAT_CD),decreasing = TRUE)
df_category <- data.frame(df_category[df_category > 5000])
colnames(df_category) <- c("Category", "Frequency")
df_category$Percentage <- df_category$Frequency / sum(df_category$Frequency)*100
df_category

Create a bar plot based on the incident category.

bp<-ggplot(df_category, aes(x=Category, y=Frequency, fill=Category)) + geom_bar(stat="identity") + 
  theme(axis.text.x=element_blank()) + geom_text_repel(data=df_category, aes(label=Category))
bp

Result: Misdemeanors are the most common category of complaint in 2018, at around 55 percent. Felonies follow at 29.93 percent, and violations account for around 15 percent.

Aggregate Data

Summarize the data by offense description.

df_OFNS_DESC <- sort(table(df$OFNS_DESC),decreasing = TRUE)
df_OFNS_DESC <- data.frame(df_OFNS_DESC[df_OFNS_DESC > 3000])
colnames(df_OFNS_DESC) <- c("Category", "Frequency")
df_OFNS_DESC$Percentage <- df_OFNS_DESC$Frequency / sum(df_OFNS_DESC$Frequency)*100
df_OFNS_DESC

Create a bar plot based on the offense description.

ofns_cat<-ggplot(df_OFNS_DESC, aes(x=Category, y=Frequency, fill=Category)) + geom_bar(stat="identity") + 
  theme(axis.text.x=element_blank()) + geom_text_repel(data=df_OFNS_DESC, aes(label=Category))
ofns_cat

Result: Petit Larceny is the most frequently reported offense in New York City, followed by Harassment and Assault & Related Offenses, at 19.70%, 16.48%, and 12.67% respectively.
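
One detail worth keeping in mind when reading these percentages: they are computed after the low-frequency categories have been filtered out, so each value is a share of the retained categories rather than of all complaints. A small sketch with invented counts shows the difference. The category counts and the threshold of 10 are assumptions chosen for illustration.

```r
# Invented offence labels; only the proportions matter here
offences <- c(rep("Petit Larceny", 50), rep("Harrassment 2", 30),
              rep("Burglary", 15), rep("Arson", 5))
counts <- sort(table(offences), decreasing = TRUE)

# Keep categories above an illustrative threshold, as in the code above
kept <- counts[counts > 10]

shares <- data.frame(
  Category    = names(kept),
  ShareOfKept = round(100 * as.numeric(kept) / sum(kept), 2),   # denominator: retained categories
  ShareOfAll  = round(100 * as.numeric(kept) / sum(counts), 2)  # denominator: all records
)
print(shares)
```

The two columns diverge more as the dropped categories make up a larger part of the data.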


Crime Time Heatmap

Aggregate counts of crimes by day of week and hour to create a heat map. The day of week has already been derived above; the hour has to be extracted from the complaint time string.

get_hour <- function(x) {
  return (as.numeric(strsplit(x,":")[[1]][1]))
}

df_crime_time <- data_dayOfWeek %>%
  mutate(Hour = sapply(CMPLNT_FR_TM, get_hour)) %>%
  group_by(day_of_week, Hour) %>%
  summarize(count = n())
head(df_crime_time)
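
For illustration, the hour extraction and the 12-hour labels used for the heatmap axis can be checked on a few made-up time strings. The sketch below uses a vectorised `sub()` call in place of `sapply(..., get_hour)`; this is an alternative shown under the assumption that times always have the form `HH:MM:SS`, not a change to the analysis.

```r
# Made-up complaint times in HH:MM:SS form
times <- c("00:15:00", "09:30:00", "15:45:00", "23:59:00")

# Vectorised alternative to sapply(): keep the text before the first colon
hours <- as.integer(sub(":.*$", "", times))

# Hours 0..23 map to "12 AM", "1 AM", ..., "11 AM", "12 PM", ..., "11 PM"
hour_format <- c(paste(c(12, 1:11), "AM"), paste(c(12, 1:11), "PM"))
hour_label <- factor(hours, levels = 0:23, labels = hour_format)
print(data.frame(time = times, hour = hours, label = hour_label))
```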

#Reorder and format Factors.
dow_format <- c("Sun","Mon","Tue","Wed","Thu","Fri","Sat")
hour_format <- c(paste(c(12,1:11),"AM"), paste(c(12,1:11),"PM"))

df_crime_time$day_of_week <- factor(df_crime_time$day_of_week, level = rev(dow_format))
df_crime_time$Hour <- factor(df_crime_time$Hour, level = 0:23, label = hour_format)

head(df_crime_time)

RESULT: The goal here is to see how crime varies by time of day. The 24-hour complaint time is converted to a 12-hour clock, and complaints are counted by hour.

# Create Time Heatmap

plot <- ggplot(df_crime_time, aes(x = Hour, y = day_of_week, fill = count)) +geom_tile() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.6), legend.title = element_blank(), legend.position="top", legend.direction="horizontal", legend.key.width=unit(2, "cm"), legend.key.height=unit(0.25, "cm"), legend.margin=unit(-0.5,"cm"), panel.margin=element_blank()) +
  labs(x = "Hour of Crime (Local Time)", y = "Day of Week", title = "Number of Crimes Reported by Time") +
  scale_fill_gradient(low = "white", high = "#FF0000", labels = comma)
  
plot

Analysis: The heatmap shows the times and days of the week at which the most crime occurred. A large amount of crime occurred between 3 PM and 6 PM, and the results suggest that Friday also has elevated counts.
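
Reading the peak off a heatmap by eye can be imprecise, so it may help to look up the busiest cells directly in the aggregated table. The sketch below does this on invented counts that have the same shape as `df_crime_time`; the numbers are assumptions and do not come from the dataset.

```r
# Invented day-by-hour counts in the same shape as df_crime_time
crime_time <- data.frame(
  day_of_week = c("Fri", "Fri", "Sat", "Sun"),
  Hour        = c("3 PM", "4 PM", "12 AM", "12 PM"),
  count       = c(120, 95, 110, 80)
)

# Single busiest day/hour combination
peak <- crime_time[which.max(crime_time$count), ]

# Three busiest combinations, largest first
top3 <- head(crime_time[order(-crime_time$count), ], 3)
print(peak)
print(top3)
```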

Analysis

Next, I look at when most felonies occurred in the New York City area.

data_Felony<-filter(data_dayOfWeek, LAW_CAT_CD == "Felony")
head(data_Felony)

df_Felony_time <- data_Felony %>%
  mutate(Hour = sapply(CMPLNT_FR_TM, get_hour)) %>%
  group_by(day_of_week, Hour) %>%
  summarize(count = n())

#Reorder and format Factors.
dow_format <- c("Sun","Mon","Tue","Wed","Thu","Fri","Sat")
hour_format <- c(paste(c(12,1:11),"AM"), paste(c(12,1:11),"PM"))

df_Felony_time$day_of_week <- factor(df_Felony_time$day_of_week, level = rev(dow_format))
df_Felony_time$Hour <- factor(df_Felony_time$Hour, level = 0:23, label = hour_format)

felony_plot <- ggplot(df_Felony_time, aes(x = Hour, y = day_of_week, fill = count)) +geom_tile() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.6), legend.title = element_blank(), legend.position="top", legend.direction="horizontal", legend.key.width=unit(2, "cm"), legend.key.height=unit(0.25, "cm"), legend.margin=unit(-0.5,"cm"), panel.margin=element_blank()) +
  labs(x = "Hour of Felony (Local Time)", y = "Day of Week", title = "Number of Felonies Reported by Time") +
  scale_fill_gradient(low = "white", high = "#FF0000", labels = comma)
  
felony_plot

The heatmap shows that a large number of felonies occurred at 12 AM and 12 PM. The most felonies occurred on Saturday at 12 AM. There is also a cluster of felonies on Friday between 4 PM and 6 PM.

data_Misdemeanor<-filter(data_dayOfWeek, LAW_CAT_CD == "Misdemeanor")
head(data_Misdemeanor)

df_Misdemeanor_time <- data_Misdemeanor %>%
  mutate(Hour = sapply(CMPLNT_FR_TM, get_hour)) %>%
  group_by(day_of_week, Hour) %>%
  summarize(count = n())

#Reorder and format Factors.

dow_format <- c("Sun","Mon","Tue","Wed","Thu","Fri","Sat")
hour_format <- c(paste(c(12,1:11),"AM"), paste(c(12,1:11),"PM"))

df_Misdemeanor_time$day_of_week <- factor(df_Misdemeanor_time$day_of_week, level = rev(dow_format))
df_Misdemeanor_time$Hour <- factor(df_Misdemeanor_time$Hour, level = 0:23, label = hour_format)

head(df_Misdemeanor_time)

Misdemeanor_time_plot <- ggplot(df_Misdemeanor_time, aes(x = Hour, y = day_of_week, fill = count)) +geom_tile() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.6), legend.title = element_blank(), legend.position="top", legend.direction="horizontal", legend.key.width=unit(2, "cm"), legend.key.height=unit(0.25, "cm"), legend.margin=unit(-0.5,"cm"), panel.margin=element_blank()) +
  labs(x = "Hour of Misdemeanor (Local Time)", y = "Day of Week", title = "Number of Misdemeanors Reported by Time") +
  scale_fill_gradient(low = "white", high = "#FF0000", labels = comma)
  
Misdemeanor_time_plot

data_Violation<-filter(data_dayOfWeek, LAW_CAT_CD == "Violation")
head(data_Violation)

df_Violation_time <- data_Violation %>%
  mutate(Hour = sapply(CMPLNT_FR_TM, get_hour)) %>%
  group_by(day_of_week, Hour) %>%
  summarize(count = n())

#Reorder and format Factors.

dow_format <- c("Sun","Mon","Tue","Wed","Thu","Fri","Sat")
hour_format <- c(paste(c(12,1:11),"AM"), paste(c(12,1:11),"PM"))

df_Violation_time$day_of_week <- factor(df_Violation_time$day_of_week, level = rev(dow_format))
df_Violation_time$Hour <- factor(df_Violation_time$Hour, level = 0:23, label = hour_format)

head(df_Violation_time)

Violation_time_plot <- ggplot(df_Violation_time, aes(x = Hour, y = day_of_week, fill = count)) +geom_tile() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.6), legend.title = element_blank(), legend.position="top", legend.direction="horizontal", legend.key.width=unit(2, "cm"), legend.key.height=unit(0.25, "cm"), legend.margin=unit(-0.5,"cm"), panel.margin=element_blank()) +
  labs(x = "Hour of Violation (Local Time)", y = "Day of Week", title = "Number of Violations Reported by Time") +
  scale_fill_gradient(low = "white", high = "#FF0000", labels = comma)
  
Violation_time_plot

Analysis: The heatmap shows the times and days of the week at which the most violations occurred. The largest number of violations occurred at 3 PM on Thursday.
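
The felony, misdemeanor, and violation sections repeat the same filter-and-count steps. As a sketch of how that repetition could be reduced, the hypothetical helper below (`count_by_day_hour()` is not part of the original analysis) tabulates complaints per weekday and hour for one category using base R. The toy records are invented.

```r
# Hypothetical helper: counts complaints per weekday and hour for one category.
# Unlike group_by() + summarize(), table() also lists combinations with zero records.
count_by_day_hour <- function(df, category) {
  subset_df <- df[df$LAW_CAT_CD == category, ]
  as.data.frame(table(day_of_week = subset_df$day_of_week, Hour = subset_df$Hour),
                responseName = "count", stringsAsFactors = FALSE)
}

# Invented records for illustration
toy <- data.frame(
  LAW_CAT_CD  = c("Felony", "Felony", "Felony", "Violation", "Misdemeanor"),
  day_of_week = c("Sat", "Sat", "Fri", "Thu", "Fri"),
  Hour        = c(0, 0, 17, 15, 12)
)
felony_counts <- count_by_day_hour(toy, "Felony")
print(felony_counts[felony_counts$count > 0, ])
```

The same call with "Misdemeanor" or "Violation" would produce the input for the other two heatmaps.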

Factor by Month

If crime is tied to activities, the period in which those activities end may have an effect. Next, I look at crime by month.

df_report_time_month <- data_dayOfWeek %>%
  mutate(Month = format(as.Date(CMPLNT_FR_DT, "%m/%d/%Y"), "%B"), Hour = sapply(CMPLNT_FR_TM, get_hour)) %>%
  group_by(Month, day_of_week, Hour) %>% 
  summarize(count = n()) %>%
  group_by(Month) %>%
  mutate(norm = count/sum(count))
head(df_report_time_month)

df_report_time_month$day_of_week <- factor(df_report_time_month$day_of_week, level = rev(dow_format))
df_report_time_month$Hour <- factor(df_report_time_month$Hour, level = 0:23, label = hour_format)
# Set order of month facets by chronological order instead of alphabetical
df_report_time_month$Month <- factor(df_report_time_month$Month, level = c("January","February","March","April","May","June","July","August","September","October","November","December"))

# Fill by the per-month normalized share computed above, to match the plot title
plot <- ggplot(df_report_time_month, aes(x = Hour, y = day_of_week, fill = norm)) +
  geom_tile() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.6, size = 4)) +
  labs(x = "Hour of Complaint (Local Time)", y = "Day of Week", title = "Reported Crime 2018 by Time, Normalized by Month") +
  scale_fill_gradient(low = "white", high = "#FF0000") +
  facet_wrap(~ Month, nrow = 6)
plot

Note: The result was not what I expected. I had overlooked that the dataset covers only the current year to date, so it does not include crimes reported after June 2018. There are some records for June through December, but they are from 2017; they appear in this dataset because those complaints were updated recently.

There is a large spike on Mondays in January at 12 PM.
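
A cross-tabulation of year against month is one way to spot this kind of carry-over before plotting. The sketch below uses invented dates; numeric month codes (`%m`) are used instead of month names so that the result does not depend on the system locale.

```r
# Invented complaint dates: some from 2018, some carried over from 2017
dates <- as.Date(c("2018-01-15", "2018-03-02", "2018-06-20",
                   "2017-08-11", "2017-12-30", "2017-08-25"))

# Rows are years, columns are months
year_month <- table(Year = format(dates, "%Y"), Month = format(dates, "%m"))
print(year_month)

# Months that contain no 2018 records at all
only_other_years <- colnames(year_month)[year_month["2018", ] == 0]
print(only_other_years)
```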

data_dayOfWeek <- data_dayOfWeek %>%
  filter(!is.na(BORO_NM))

borobp <- ggplot(data_dayOfWeek, aes(x = BORO_NM, fill=as.factor(BORO_NM))) + 
                geom_bar(width=0.9, stat="count") + 
                theme(legend.position="none") + 
                coord_flip()

borobp

boro.totals <- data.frame(table(data_dayOfWeek$BORO_NM))

names(boro.totals)[1] <- "Borough"

boro.totals

Brooklyn has the most complaints, around 67,489, compared with around 56,700 in Manhattan.

# NYC.gov has 2017 estimates at: 1471160, 2648771, 1664727, 2358582, and 479458 for BX, BK, MH, QN, and SI respectively.

boropops <- c(1471160, 2648771, 1664727, 2358582, 479458)

boro.totals[,"Freq"] <- ((boro.totals[,"Freq"]/boropops)*100)

scaled.boro.bp <- ggplot(boro.totals, aes(x = Borough, y = Freq, fill = Borough)) +
                geom_bar(width=0.9, stat="identity") + 
                ggtitle("Crime Records per Capita by Borough") + 
                theme(legend.position="none") + 
                coord_flip()
scaled.boro.bp

Result: Crime records per capita suggest that Queens is a comparatively safe borough in New York City.
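
The per-capita scaling above relies on the population vector being in the same order as the borough table. As a hedged sketch of a more defensive variant, the populations can be stored as a named vector and matched by borough name. The populations are the 2017 estimates quoted above; the two complaint counts are the approximate figures quoted in the text for Brooklyn and Manhattan.

```r
# Population estimates quoted above (NYC.gov, 2017)
boro_pop <- c("Bronx" = 1471160, "Brooklyn" = 2648771, "Manhattan" = 1664727,
              "Queens" = 2358582, "Staten Island" = 479458)

# Approximate complaint counts quoted in the text for two boroughs
complaints <- c("Manhattan" = 56700, "Brooklyn" = 67489)

# Match by name so the result does not depend on the order of either vector
rate_per_100 <- 100 * complaints / boro_pop[names(complaints)]
print(round(rate_per_100, 2))
```

With these two figures, Manhattan's rate per 100 residents comes out higher than Brooklyn's, even though Brooklyn has more complaints in absolute terms.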

data_dayOfWeek$OFNS_DESC <- as.factor(data_dayOfWeek$OFNS_DESC)
  
data_dayOfWeek_map <- data_dayOfWeek%>%
  select(CMPLNT_NUM,BORO_NM,CMPLNT_FR_DT,LAW_CAT_CD,OFNS_DESC,VIC_RACE, VIC_SEX, Latitude,Longitude) %>%
  filter(OFNS_DESC %in% c("Grand Larceny", "Petit Larceny", "Harrassment 2",
                          "Criminal Mischief & Related Of", "Theft-Fraud", "Sex Crimes",
                          "Assault 3 & Related Offenses", "Miscellaneous Penal Law", "Frauds"))

data_dayOfWeek_map

QUEENS

DATA

data_queens<-filter(data_dayOfWeek_map, BORO_NM == "Queens")
head(data_queens)

queens_OFNS_DESC <- sort(table(data_queens$OFNS_DESC),decreasing = TRUE)
queens_OFNS_DESC <- data.frame(queens_OFNS_DESC[queens_OFNS_DESC > 2000])
colnames(queens_OFNS_DESC) <- c("Category", "Frequency")
queens_OFNS_DESC$Percentage <- queens_OFNS_DESC$Frequency / sum(queens_OFNS_DESC$Frequency)*100
queens_OFNS_DESC

GRAPH

#queens 
leaflet(data_queens, width = "100%") %>%
  addTiles(group = "OSM (default)") %>%
  addProviderTiles(provider = "Esri.WorldStreetMap", group = "World StreetMap") %>%
  # Build the popup from data_queens itself so each marker shows its own record
  addMarkers(lng = ~Longitude, lat = ~Latitude,
             popup = ~paste0("<b>Incident #: </b>", CMPLNT_NUM, "<br><b>Offence Description: </b>", OFNS_DESC),
             clusterOptions = markerClusterOptions()) %>%
  addLayersControl(
    baseGroups = c("OSM (default)", "World StreetMap"),
    options = layersControlOptions(collapsed = FALSE)
  )

RESULT: As in New York City overall, Petit Larceny is the most frequently reported crime in Queens. Petit Larceny accounts for 26.15 percent of reports and Harassment for 25.09 percent.

BROOKLYN

DATA

data_Brooklyn<-filter(data_dayOfWeek_map, BORO_NM == "Brooklyn")
head(data_Brooklyn)

brooklyn_OFNS_DESC <- sort(table(data_Brooklyn$OFNS_DESC),decreasing = TRUE)
brooklyn_OFNS_DESC <- data.frame(brooklyn_OFNS_DESC[brooklyn_OFNS_DESC > 2000])
colnames(brooklyn_OFNS_DESC) <- c("Category", "Frequency")
brooklyn_OFNS_DESC$Percentage <- brooklyn_OFNS_DESC$Frequency / sum(brooklyn_OFNS_DESC$Frequency)*100
brooklyn_OFNS_DESC

GRAPH

BROOKLYN <- c(left = -74.04, bottom = 40.56, right = -73.85, top = 40.742)
map <- get_stamenmap(BROOKLYN, maptype = "toner-lite")

## Map from URL : http://tile.stamen.com/toner-lite/10/301/384.png

## Map from URL : http://tile.stamen.com/toner-lite/10/301/385.png

BROOKLYN_Map<- ggmap(map)+
     geom_point(data=data_Brooklyn, aes(x=Longitude, y=Latitude, color=factor(data_Brooklyn$OFNS_DESC)), alpha=1.0) +
     guides(colour = guide_legend(override.aes = list(alpha=1, size=5),
                                  title="Type of Crime")) +
     scale_colour_brewer(type="qual",palette="Paired") + 
     ggtitle("Top Crimes in Brooklyn") +
     theme_light(base_size=10) +
     theme(axis.line=element_blank(),
           axis.text.x=element_blank(),
           axis.text.y=element_blank(),
           axis.ticks=element_blank(),
           axis.title.x=element_blank(),
           axis.title.y=element_blank())
BROOKLYN_Map

## Warning: Removed 26 rows containing missing values (geom_point).

RESULT: As in Queens, Petit Larceny is the most frequently reported crime in Brooklyn. Petit Larceny accounts for 26.15 percent of reports and Harassment for 22.70 percent, a difference of about 1,100 cases.

BRONX

DATA

data_Bronx<-filter(data_dayOfWeek_map, BORO_NM == "Bronx")
head(data_Bronx)

bronx_OFNS_DESC <- sort(table(data_Bronx$OFNS_DESC),decreasing = TRUE)
bronx_OFNS_DESC <- data.frame(bronx_OFNS_DESC[bronx_OFNS_DESC > 2000])
colnames(bronx_OFNS_DESC) <- c("Category", "Frequency")
bronx_OFNS_DESC$Percentage <- bronx_OFNS_DESC$Frequency / sum(bronx_OFNS_DESC$Frequency)*100
bronx_OFNS_DESC

GRAPH

BRONX <- c(left = -73.96, bottom = 40.74, right = -73.69, top = 40.95)
map <- get_stamenmap(BRONX, maptype = "toner-lite")

## Map from URL : http://tile.stamen.com/toner-lite/10/302/384.png

Bronx_Map<- ggmap(map)+
     geom_point(data=data_Bronx, aes(x=Longitude, y=Latitude, color=factor(data_Bronx$OFNS_DESC)), alpha=1.0) +
     guides(colour = guide_legend(override.aes = list(alpha=1, size=5),
                                  title="Type of Crime")) +
     scale_colour_brewer(type="qual",palette="Paired") + 
     ggtitle("Top Crimes in the Bronx") +
     theme_light(base_size=10) +
     theme(axis.line=element_blank(),
           axis.text.x=element_blank(),
           axis.text.y=element_blank(),
           axis.ticks=element_blank(),
           axis.title.x=element_blank(),
           axis.title.y=element_blank())
Bronx_Map

## Warning: Removed 14 rows containing missing values (geom_point).

RESULT: Unlike New York City overall, the Bronx shows a different pattern than expected: there are more Harassment incidents than Petit Larceny incidents. Harassment accounts for 25.90 percent of reported cases and Petit Larceny for 23.82 percent.

MANHATTAN

DATA

data_Manhattan<-filter(data_dayOfWeek_map, BORO_NM == "Manhattan")
head(data_Manhattan)

Manhattan_OFNS_DESC <- sort(table(data_Manhattan$OFNS_DESC),decreasing = TRUE)
Manhattan_OFNS_DESC <- data.frame(Manhattan_OFNS_DESC[Manhattan_OFNS_DESC > 2000])
colnames(Manhattan_OFNS_DESC) <- c("Category", "Frequency")
Manhattan_OFNS_DESC$Percentage <- Manhattan_OFNS_DESC$Frequency / sum(Manhattan_OFNS_DESC$Frequency)*100
Manhattan_OFNS_DESC

GRAPH

MANHATTAN <- c(left = -74.09, bottom = 40.69, right = -73.83, top = 40.89)
map <- get_stamenmap(MANHATTAN, maptype = "toner-lite")

MANHATTAN_Map<- ggmap(map)+
     geom_point(data=data_Manhattan, aes(x=Longitude, y=Latitude, color=factor(data_Manhattan$OFNS_DESC)), alpha=1.0) +
     guides(colour = guide_legend(override.aes = list(alpha=1, size=5),
                                  title="Type of Crime")) +
     scale_colour_brewer(type="qual",palette="Paired") + 
     ggtitle("Top Crimes in Manhattan") +
     theme_light(base_size=10) +
     theme(axis.line=element_blank(),
           axis.text.x=element_blank(),
           axis.text.y=element_blank(),
           axis.ticks=element_blank(),
           axis.title.x=element_blank(),
           axis.title.y=element_blank())
MANHATTAN_Map

## Warning: Removed 2 rows containing missing values (geom_point).

RESULT: Petit Larceny is the most frequently reported crime in Manhattan at 34.59 percent, and Harassment is second at 20 percent. The difference between the two is more than 5,000 cases.

STATEN ISLAND

DATA

data_SIsland<-filter(data_dayOfWeek_map, BORO_NM == "Staten Island")
head(data_SIsland)

S_Island_OFNS_DESC <- sort(table(data_SIsland$OFNS_DESC),decreasing = TRUE)
S_Island_OFNS_DESC <- data.frame(S_Island_OFNS_DESC[S_Island_OFNS_DESC > 200])
colnames(S_Island_OFNS_DESC) <- c("Category", "Frequency")
S_Island_OFNS_DESC$Percentage <- S_Island_OFNS_DESC$Frequency / sum(S_Island_OFNS_DESC$Frequency)*100
S_Island_OFNS_DESC

GRAPH

Staten_Island <- c(left = -74.35, bottom = 40.45, right = -73.85, top = 40.70)
map <- get_stamenmap(Staten_Island, maptype = "toner-lite")

## Map from URL : http://tile.stamen.com/toner-lite/10/300/385.png

Staten_Island_Map<- ggmap(map)+
     geom_point(data=data_SIsland, aes(x=Longitude, y=Latitude, color=factor(data_SIsland$OFNS_DESC)), alpha=1.0) +
     guides(colour = guide_legend(override.aes = list(alpha=1, size=5),
                                  title="Type of Crime")) +
     scale_colour_brewer(type="qual",palette="Paired") + 
     ggtitle("Top Crimes in Staten Island") +
     theme_light(base_size=10) +
     theme(axis.line=element_blank(),
           axis.text.x=element_blank(),
           axis.text.y=element_blank(),
           axis.ticks=element_blank(),
           axis.title.x=element_blank(),
           axis.title.y=element_blank())
Staten_Island_Map

RESULT: Unlike New York City overall, Staten Island shows a different pattern than expected: there are more Harassment incidents than Petit Larceny incidents. Harassment accounts for 31.60 percent of reported cases and Petit Larceny for 21.68 percent.
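
The borough sections above each compute offense shares separately. As a sketch of how the same comparison could be made in one pass, `prop.table()` with `margin = 1` gives row-wise percentages for every borough at once. The records below are invented stand-ins for `data_dayOfWeek_map`, and these shares are taken over all listed offenses rather than over a thresholded subset.

```r
# Invented records standing in for data_dayOfWeek_map
toy <- data.frame(
  BORO_NM   = c("Queens", "Queens", "Queens", "Bronx", "Bronx", "Bronx", "Bronx"),
  OFNS_DESC = c("Petit Larceny", "Petit Larceny", "Harrassment 2",
                "Harrassment 2", "Harrassment 2", "Harrassment 2", "Petit Larceny")
)

# Row-wise percentages: each borough's offences sum to 100
shares <- round(100 * prop.table(table(toy$BORO_NM, toy$OFNS_DESC), margin = 1), 2)
print(shares)

# Most frequent offence in each borough
top_offence <- apply(shares, 1, function(row) names(which.max(row)))
print(top_offence)
```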

Reference

The R-Graph Gallery, Likert-type chart: https://www.r-graph-gallery.com/202-barplot-for-likert-type-items/

NYPD Complaint Data Current (Year To Date): https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Current-Year-To-Date-/5uac-w243

Data 607 Final Project

VINAYAK PATEL

November 18, 2018
