library(dplyr)
library(ggplot2)
library(plotly)
data <- read.csv(file="dat.csv")
# select the columns of interest
data.new <- data %>%
  select(CMPLNT_FR_DT, CMPLNT_FR_TM, OFNS_DESC, PD_DESC, LAW_CAT_CD,
         BORO_NM, PREM_TYP_DESC, Latitude, Longitude)
# count complaints by crime type (PD_DESC), most frequent first
crime.compl.count <- data.new %>%
  group_by(PD_DESC) %>%
  summarise(total = n()) %>%
  arrange(desc(total))
# the 20 most common crime complaints
first_20 <- crime.compl.count[1:20, ]
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
I want to study criminal complaints in NYC over the last few years and see whether they are increasing or decreasing, comparing by borough and possibly by zip code area.
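One way to approach this question is to parse the complaint date and count complaints per borough per year. The sketch below is only an illustration: `toy` is a made-up stand-in for `data.new`, and it assumes the dates follow the month/day/two-digit-year pattern (for example "1/1/17") that `CMPLNT_FR_DT` uses in this file.

```r
library(dplyr)

# Toy stand-in for data.new; the values are invented for illustration only
toy <- data.frame(
  CMPLNT_FR_DT = c("1/1/16", "6/1/16", "1/1/17", "3/1/17", "10/6/17"),
  BORO_NM      = c("BRONX", "BRONX", "BRONX", "QUEENS", "QUEENS"),
  stringsAsFactors = FALSE
)

yearly <- toy %>%
  # as.character() guards against the column having been read as a factor
  mutate(date = as.Date(as.character(CMPLNT_FR_DT), format = "%m/%d/%y"),
         year = as.integer(format(date, "%Y"))) %>%
  # one row per borough and year, with the number of complaints
  count(BORO_NM, year, name = "total")

print(yearly)
```

Plotting `total` against `year` for each borough would then show whether complaints are trending up or down.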
What are the cases, and how many are there?
Each case is one criminal complaint. There are 469,900 observations and 36 variables.
Describe the method of data collection.
The data was retrieved from the NYC OpenData website, downloaded as a CSV file, and loaded into RStudio.
What type of study is this (observational/experiment)?
This is an observational study.
If you collected the data, state self-collected. If not, provide a citation/link.
https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Historic/qgea-i56i
What is the response variable? Is it quantitative or qualitative?
The response variable will be the count of crime complaints (by borough or zip code), which is a numerical variable.
You should have two independent variables, one quantitative and one qualitative.
The independent variables are the borough, which is qualitative, and the crime complaint level (violation, misdemeanor, felony), which can be expressed as a count and is then quantitative.
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
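As a rough sketch of how the complaint level could be turned into counts per borough, the example below tabulates borough against `LAW_CAT_CD`. The data frame `toy` is hypothetical and only mirrors the column names of `data.new`.

```r
library(dplyr)

# Toy stand-in for data.new; the values are invented for illustration only
toy <- data.frame(
  BORO_NM    = c("BRONX", "BRONX", "BRONX", "QUEENS", "QUEENS"),
  LAW_CAT_CD = c("FELONY", "MISDEMEANOR", "MISDEMEANOR", "VIOLATION", "FELONY"),
  stringsAsFactors = FALSE
)

# long format: one row per borough and complaint level
by_level <- toy %>% count(BORO_NM, LAW_CAT_CD, name = "total")

# wide format: boroughs in rows, complaint levels in columns
level_table <- xtabs(total ~ BORO_NM + LAW_CAT_CD, data = by_level)
print(level_table)
```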
str(data.new)
## 'data.frame': 469900 obs. of 9 variables:
## $ CMPLNT_FR_DT : Factor w/ 1768 levels "1/1/07","1/1/08",..: 582 582 582 582 582 582 582 582 582 582 ...
## $ CMPLNT_FR_TM : Factor w/ 1440 levels "0:00:00","0:01:00",..: 679 676 676 676 671 668 666 666 665 661 ...
## $ OFNS_DESC : Factor w/ 63 levels "","ABORTION",..: 27 30 27 52 8 20 26 27 30 12 ...
## $ PD_DESC : Factor w/ 353 levels "","A.B.C.,FALSE PROOF OF AGE",..: 123 137 122 194 21 20 180 122 137 80 ...
## $ LAW_CAT_CD : Factor w/ 3 levels "FELONY","MISDEMEANOR",..: 3 2 3 2 2 1 1 3 2 2 ...
## $ BORO_NM : Factor w/ 6 levels "","BRONX","BROOKLYN",..: 2 6 5 3 3 2 5 4 5 4 ...
## $ PREM_TYP_DESC: Factor w/ 73 levels "","ABANDONED BUILDING",..: 55 62 62 20 55 53 47 62 62 20 ...
## $ Latitude : num 40.9 40.5 40.7 40.6 40.6 ...
## $ Longitude : num -73.9 -74.2 -73.9 -73.9 -74 ...
summary(data.new)
## CMPLNT_FR_DT CMPLNT_FR_TM
## 1/1/17 : 1986 12:00:00: 12824
## 6/1/17 : 1681 15:00:00: 10188
## 10/6/17: 1590 18:00:00: 9731
## 7/1/17 : 1583 17:00:00: 9429
## 5/1/17 : 1562 20:00:00: 9064
## 3/1/17 : 1540 16:00:00: 8895
## (Other):459958 (Other) :409769
## OFNS_DESC
## PETIT LARCENY : 83731
## HARRASSMENT 2 : 66601
## ASSAULT 3 & RELATED OFFENSES : 51561
## CRIMINAL MISCHIEF & RELATED OF: 49503
## GRAND LARCENY : 43468
## OFF. AGNST PUB ORD SENSBLTY & : 21942
## (Other) :153094
## PD_DESC LAW_CAT_CD
## HARASSMENT,SUBD 3,4,5 : 47941 FELONY :143381
## ASSAULT 3 : 41661 MISDEMEANOR:259066
## LARCENY,PETIT FROM STORE-SHOPL: 29062 VIOLATION : 67453
## AGGRAVATED HARASSMENT 2 : 20384
## LARCENY,PETIT FROM BUILDING,UN: 18933
## HARASSMENT,SUBD 1,CIVILIAN : 18660
## (Other) :293259
## BORO_NM PREM_TYP_DESC
## : 293 STREET :137499
## BRONX :103701 RESIDENCE - APT. HOUSE :102503
## BROOKLYN :137928 RESIDENCE-HOUSE : 43442
## MANHATTAN :114802 RESIDENCE - PUBLIC HOUSING: 37203
## QUEENS : 91804 CHAIN STORE : 13897
## STATEN ISLAND: 21372 OTHER : 13739
## (Other) :121617
## Latitude Longitude
## Min. :40.50 Min. :-74.25
## 1st Qu.:40.67 1st Qu.:-73.97
## Median :40.73 Median :-73.93
## Mean :40.74 Mean :-73.93
## 3rd Qu.:40.81 3rd Qu.:-73.88
## Max. :40.91 Max. :-73.70
## NA's :39 NA's :39
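The summary shows 293 complaints with a blank borough and 39 with missing coordinates. If those rows are to be left out before comparing boroughs, a filter along the following lines could be used. This is a sketch on a made-up data frame (`toy`), not a step that was applied to the results in this report.

```r
library(dplyr)

# Toy stand-in for data.new; the values are invented for illustration only
toy <- data.frame(
  BORO_NM   = factor(c("BRONX", "", "QUEENS", "BROOKLYN")),
  Latitude  = c(40.85, 40.70, NA, 40.65),
  Longitude = c(-73.90, -73.95, NA, -73.95)
)

clean <- toy %>%
  # drop rows with a blank borough or missing coordinates
  filter(BORO_NM != "", !is.na(Latitude), !is.na(Longitude)) %>%
  # remove factor levels that no longer occur, so they do not show up in plots
  droplevels()

print(clean)
```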
# the 20 most commonly reported crime types, ordered by complaint count
ggplot(first_20, aes(x = total, y = reorder(PD_DESC, total))) +
  geom_point() +
  labs(x = "Number of Crime Complaints", y = "Crime Type")
# count crime complaints by borough
pl <- data.new %>%
  group_by(BORO_NM, .drop = TRUE) %>%
  summarise(total = n()) %>%
  plot_ly(x = ~BORO_NM,
          y = ~total,
          type = "bar",
          marker = list(color = ~total, size = 20, opacity = 0.9)) %>%
  layout(xaxis = list(title = "Borough"),
         yaxis = list(title = "Number of Crime Complaints"))
pl
# count crime complaints by offense type
pl <- data.new %>%
  group_by(OFNS_DESC) %>%
  summarise(total = n()) %>%
  plot_ly(x = ~OFNS_DESC,
          y = ~total,
          type = "bar",
          marker = list(color = ~OFNS_DESC, size = 20, opacity = 0.9)) %>%
  # axis options belong in layout(), not in the trace; show a tick for every crime type
  layout(xaxis = list(title = "Crime Type Name", tickmode = "linear", dtick = 1),
         yaxis = list(title = "Number of Crime Complaints"),
         margin = list(b = 240))
pl
boxplot(first_20$total)