My DATA 698 project is based on NYC crime. I want to know how violent the city is, whether some neighborhoods or groups of people are more affected than others, and whether I can identify patterns in shootings over time and compare them with other crimes. My primary goal is to apply the tools learned in the Data Science MS courses, such as time series analysis (trend, seasonality, and cyclical behavior) and graphical visualization, to present my findings. The data sets contain useful fields for this kind of analysis, such as date of occurrence, type of offense, borough, age, race, and gender. My hypothesis is that specific zones and times are more likely to have crime; I aim to pinpoint them, analyze them in depth, and relate possible solutions to the techniques used by crime analysis experts in the chosen literature. That literature, including “The Crime Numbers Game: Management by Manipulation” and “Understanding New York’s Crime Drop,” may also help identify possible fallacies or errors in the data.
A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear. Sometimes we will refer to a trend as “changing direction,” when it might go from an increasing trend to a decreasing trend. A seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. Seasonality is always of a fixed and known frequency. A cycle occurs when the data exhibit rises and falls that are not of a fixed frequency.
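To make these definitions concrete, here is a minimal sketch that builds a synthetic quarterly series (the series `y` and all of its parameters are invented for illustration, not taken from the NYPD data) and splits it into trend, seasonal, and remainder parts with base R's `stl()`:

```r
# Synthetic quarterly series: linear trend + fixed quarterly pattern + noise.
# All values are illustrative assumptions, not NYPD data.
set.seed(1)
n <- 40                                              # 10 years of quarters
trend    <- seq(100, 180, length.out = n)            # long-term increase
seasonal <- rep(c(-20, 5, 25, -10), times = n / 4)   # fixed, known frequency (4)
noise    <- rnorm(n, sd = 5)
y <- ts(trend + seasonal + noise, start = c(2006, 1), frequency = 4)

# STL splits the series into seasonal, trend and remainder components.
fit <- stl(y, s.window = "periodic")
components <- fit$time.series
head(components)

# The three components add back up to the original series.
recon <- rowSums(components)
```

A cycle, by contrast, would show up as rises and falls in the trend component that do not repeat at a fixed interval.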
Exploratory Data Analysis (EDA) refers to the critical process of performing initial investigations on data to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations.
It is good practice to understand the data first and gather as many insights from it as possible. EDA is about making sense of the data in hand before getting your hands dirty with it.
In addition, after exploring the data as described above, we may be able to identify on which days of the week certain crimes are committed, whether there is a specific time of day when crimes occur, and which boroughs may be the safest.
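As a rough sketch of how such questions could be answered, the snippet below derives a weekday and an hour from date and time strings and tabulates them. The data frame `toy` is hypothetical: it only mimics the OCCUR_DATE, OCCUR_TIME, and BORO columns, and its rows are made up.

```r
# Hypothetical records in the same OCCUR_DATE / OCCUR_TIME / BORO layout
# as the shooting data; the rows themselves are invented for illustration.
toy <- data.frame(
  OCCUR_DATE = c("08/23/2019", "11/27/2019", "02/02/2019", "08/24/2019"),
  OCCUR_TIME = c("22:10:00", "15:54:00", "19:40:00", "22:45:00"),
  BORO       = c("QUEENS", "BRONX", "MANHATTAN", "QUEENS")
)

dates <- as.Date(toy$OCCUR_DATE, format = "%m/%d/%Y")

# ISO weekday number (1 = Monday ... 7 = Sunday); locale-independent.
toy$weekday <- as.integer(format(dates, "%u"))

# Hour of day taken from the first two characters of HH:MM:SS.
toy$hour <- as.integer(substr(toy$OCCUR_TIME, 1, 2))

by_weekday <- table(toy$weekday)    # incidents per weekday
by_hour    <- table(toy$hour)       # incidents per hour of day
by_boro    <- sort(table(toy$BORO)) # fewest incidents first
```

Raw counts per borough do not account for population, so "safest" would need a per-capita rate before drawing conclusions.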
The initial exploration for this project began with a Shiny app (dashboard) that I built in R to visualize the open NYC shooting data. Users can view the data by year (2006-2020), by type of incident (murder or non-murder shooting), and by borough. The app also shows the total number of incidents, the percent change versus the previous year, and a heatmap.
This is viewable here: https://johnmazon90.shinyapps.io/nyc_shooting_app_jmazon/
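The percent change versus the previous year can be computed as in the sketch below. This is an illustration of the calculation only, not the app's actual code, and the yearly totals are placeholder numbers.

```r
# Hypothetical yearly totals; the numbers are placeholders, not app output.
yearly <- data.frame(
  year      = 2017:2020,
  incidents = c(100, 80, 90, 180)
)

# Percent change vs. previous year: (current - previous) / previous * 100.
# The first year has no predecessor, so its value is NA.
prev <- c(NA, head(yearly$incidents, -1))
yearly$pct_change <- (yearly$incidents - prev) / prev * 100
yearly
```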
NYC OPEN DATA NYPD Complaint Data Historic
This dataset includes all valid felony, misdemeanor, and violation crimes reported to the New York City Police Department (NYPD) from 2006 to the end of last year (2019). For additional details, please see the attached data dictionary in the ‘About’ section.
List of every shooting incident that occurred in NYC going back to 2006 through the end of the previous calendar year. This data is manually extracted every quarter and reviewed by the Office of Management Analysis and Planning before being posted on the NYPD website. Each record represents a shooting incident in NYC and includes information about the event, the location and time of occurrence. In addition, information related to suspect and victim demographics is also included. This data can be used by the public to explore the nature of shooting/criminal activity. Please refer to the attached data footnotes for additional information about this dataset.
The NYPD maintains statistical data which is used as a management tool in reducing crime, improving procedures and training, and providing transparency to the public and government oversight agencies. In 1994, the department implemented CompStat, which through management, statistics, and accountability, successfully drove down crime to record levels not seen since the 1950s.
The department provides up-to-date crime-related statistics in the seven major crime categories on the citywide, borough, and precinct levels, as well as historical crime data. The public can also access this data through the department’s CompStat 2.0 portal.
# Packages used below (assumed; the original library calls are not shown).
# fpp3 attaches dplyr, lubridate, ggplot2, tsibble, feasts and fable.
library(fpp3)
df_nycshooting_historic <- read.csv("https://raw.githubusercontent.com/johnm1990/DATA698/main/NYPD_Shooting_Incident_Data__Historic.csv")
df_nycshoot_ytd <- read.csv("https://raw.githubusercontent.com/johnm1990/DATA698/main/NYPD_Shooting_Incident_Data__Year_To_Date_.csv")
head(df_nycshooting_historic)
## INCIDENT_KEY OCCUR_DATE OCCUR_TIME BORO PRECINCT JURISDICTION_CODE
## 1 201575314 08/23/2019 22:10:00 QUEENS 103 0
## 2 205748546 11/27/2019 15:54:00 BRONX 40 0
## 3 193118596 02/02/2019 19:40:00 MANHATTAN 23 0
## 4 204192600 10/24/2019 00:52:00 STATEN ISLAND 121 0
## 5 201483468 08/22/2019 18:03:00 BRONX 46 0
## 6 198255460 06/07/2019 17:50:00 BROOKLYN 73 0
## LOCATION_DESC STATISTICAL_MURDER_FLAG PERP_AGE_GROUP PERP_SEX PERP_RACE
## 1 false
## 2 false <18 M BLACK
## 3 false 18-24 M WHITE HISPANIC
## 4 PVT HOUSE true 25-44 M BLACK
## 5 false 25-44 M BLACK HISPANIC
## 6 false 45-64 M WHITE HISPANIC
## VIC_AGE_GROUP VIC_SEX VIC_RACE X_COORD_CD Y_COORD_CD Latitude Longitude
## 1 25-44 M BLACK 1037451 193561 40.69781 -73.80814
## 2 25-44 F BLACK 1006789 237559 40.81870 -73.91857
## 3 18-24 M BLACK HISPANIC 999347 227795 40.79192 -73.94548
## 4 25-44 F BLACK 938149 171781 40.63806 -74.16611
## 5 18-24 M BLACK 1008224 250621 40.85455 -73.91334
## 6 25-44 M BLACK 1009650 186966 40.67983 -73.90843
## Lon_Lat
## 1 POINT (-73.80814071699996 40.697805308000056)
## 2 POINT (-73.91857061799993 40.81869973000005)
## 3 POINT (-73.94547965999999 40.791916091000076)
## 4 POINT (-74.16610830199996 40.63806398200006)
## 5 POINT (-73.91333944399999 40.85454734900003)
## 6 POINT (-73.90842523899994 40.67982701600005)
head(df_nycshoot_ytd)
## INCIDENT_KEY OCCUR_DATE OCCUR_TIME BORO PRECINCT JURISDICTION_CODE
## 1 229643180 06/16/2021 21:34:00 BROOKLYN 73 0
## 2 233147632 09/03/2021 16:28:00 BRONX 43 0
## 3 231637053 07/31/2021 22:36:00 QUEENS 115 0
## 4 238041594 12/17/2021 12:00:00 BRONX 50 0
## 5 228798560 05/27/2021 22:50:00 QUEENS 103 0
## 6 226542151 04/05/2021 23:15:00 BROOKLYN 73 2
## LOCATION_DESC STATISTICAL_MURDER_FLAG PERP_AGE_GROUP PERP_SEX
## 1 false
## 2 false
## 3 false
## 4 false
## 5 false 18-24 M
## 6 MULTI DWELL - PUBLIC HOUS true 45-64 M
## PERP_RACE VIC_AGE_GROUP VIC_SEX VIC_RACE X_COORD_CD Y_COORD_CD Latitude
## 1 25-44 M BLACK 1006621 185426 40.67561
## 2 18-24 F BLACK 1022465 242774 40.83296
## 3 18-24 M WHITE HISPANIC 1020765 213373 40.75227
## 4 25-44 M BLACK 1010914 260940 40.88286
## 5 BLACK 25-44 M BLACK 1037559 194576 40.70059
## 6 BLACK <18 F BLACK 1010363 182581 40.66779
## Longitude New.Georeferenced.Column
## 1 -73.91935 POINT (-73.91935098699997 40.67560823700006)
## 2 -73.86191 POINT (-73.86190538999993 40.83295950100006)
## 3 -73.86821 POINT (-73.86820844899995 40.75226896500004)
## 4 -73.90357 POINT (-73.90357448999998 40.88286213100001)
## 5 -73.80774 POINT (-73.80774319999993 40.70059059000005)
## 6 -73.90587 POINT (-73.90587160499997 40.667789106000036)
summary(df_nycshoot_ytd)
## INCIDENT_KEY OCCUR_DATE OCCUR_TIME BORO
## Min. :222524732 Length:2011 Length:2011 Length:2011
## 1st Qu.:227647474 Class :character Class :character Class :character
## Median :230741371 Mode :character Mode :character Mode :character
## Mean :230857771
## 3rd Qu.:234086538
## Max. :238490103
## PRECINCT JURISDICTION_CODE LOCATION_DESC STATISTICAL_MURDER_FLAG
## Min. : 5.00 Min. :0.0000 Length:2011 Length:2011
## 1st Qu.: 42.00 1st Qu.:0.0000 Class :character Class :character
## Median : 52.00 Median :0.0000 Mode :character Mode :character
## Mean : 61.82 Mean :0.3148
## 3rd Qu.: 79.00 3rd Qu.:0.0000
## Max. :123.00 Max. :2.0000
## PERP_AGE_GROUP PERP_SEX PERP_RACE VIC_AGE_GROUP
## Length:2011 Length:2011 Length:2011 Length:2011
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## VIC_SEX VIC_RACE X_COORD_CD Y_COORD_CD
## Length:2011 Length:2011 Min. : 926731 Min. :142481
## Class :character Class :character 1st Qu.:1000891 1st Qu.:185415
## Mode :character Mode :character Median :1008795 Median :218912
## Mean :1010338 Mean :214861
## 3rd Qu.:1017136 3rd Qu.:243782
## Max. :1059372 Max. :269635
## Latitude Longitude New.Georeferenced.Column
## Min. :40.56 Min. :-74.21 Length:2011
## 1st Qu.:40.68 1st Qu.:-73.94 Class :character
## Median :40.77 Median :-73.91 Mode :character
## Mean :40.76 Mean :-73.91
## 3rd Qu.:40.84 3rd Qu.:-73.88
## Max. :40.91 Max. :-73.73
summary(df_nycshooting_historic)
## INCIDENT_KEY OCCUR_DATE OCCUR_TIME BORO
## Min. : 9953245 Length:23568 Length:23568 Length:23568
## 1st Qu.: 55317014 Class :character Class :character Class :character
## Median : 83365370 Mode :character Mode :character Mode :character
## Mean :102218616
## 3rd Qu.:150772442
## Max. :222473262
##
## PRECINCT JURISDICTION_CODE LOCATION_DESC STATISTICAL_MURDER_FLAG
## Min. : 1.00 Min. :0.0000 Length:23568 Length:23568
## 1st Qu.: 44.00 1st Qu.:0.0000 Class :character Class :character
## Median : 69.00 Median :0.0000 Mode :character Mode :character
## Mean : 66.21 Mean :0.3323
## 3rd Qu.: 81.00 3rd Qu.:0.0000
## Max. :123.00 Max. :2.0000
## NA's :2
## PERP_AGE_GROUP PERP_SEX PERP_RACE VIC_AGE_GROUP
## Length:23568 Length:23568 Length:23568 Length:23568
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## VIC_SEX VIC_RACE X_COORD_CD Y_COORD_CD
## Length:23568 Length:23568 Length:23568 Length:23568
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Latitude Longitude Lon_Lat
## Min. :40.51 Min. :-74.25 Length:23568
## 1st Qu.:40.67 1st Qu.:-73.94 Class :character
## Median :40.70 Median :-73.92 Mode :character
## Mean :40.74 Mean :-73.91
## 3rd Qu.:40.82 3rd Qu.:-73.88
## Max. :40.91 Max. :-73.70
##
unique(df_nycshoot_ytd$BORO)
## [1] "BROOKLYN" "BRONX" "QUEENS" "MANHATTAN"
## [5] "STATEN ISLAND"
colnames(df_nycshoot_ytd)[!colnames(df_nycshoot_ytd) %in% colnames(df_nycshooting_historic)]
## [1] "New.Georeferenced.Column"
colnames(df_nycshooting_historic)
## [1] "INCIDENT_KEY" "OCCUR_DATE"
## [3] "OCCUR_TIME" "BORO"
## [5] "PRECINCT" "JURISDICTION_CODE"
## [7] "LOCATION_DESC" "STATISTICAL_MURDER_FLAG"
## [9] "PERP_AGE_GROUP" "PERP_SEX"
## [11] "PERP_RACE" "VIC_AGE_GROUP"
## [13] "VIC_SEX" "VIC_RACE"
## [15] "X_COORD_CD" "Y_COORD_CD"
## [17] "Latitude" "Longitude"
## [19] "Lon_Lat"
colnames(df_nycshoot_ytd)
## [1] "INCIDENT_KEY" "OCCUR_DATE"
## [3] "OCCUR_TIME" "BORO"
## [5] "PRECINCT" "JURISDICTION_CODE"
## [7] "LOCATION_DESC" "STATISTICAL_MURDER_FLAG"
## [9] "PERP_AGE_GROUP" "PERP_SEX"
## [11] "PERP_RACE" "VIC_AGE_GROUP"
## [13] "VIC_SEX" "VIC_RACE"
## [15] "X_COORD_CD" "Y_COORD_CD"
## [17] "Latitude" "Longitude"
## [19] "New.Georeferenced.Column"
df_nycshoot_ytd <- rename(df_nycshoot_ytd, Lon_Lat = New.Georeferenced.Column)
df_merged <- rbind(df_nycshooting_historic,df_nycshoot_ytd)
head(df_merged)
## INCIDENT_KEY OCCUR_DATE OCCUR_TIME BORO PRECINCT JURISDICTION_CODE
## 1 201575314 08/23/2019 22:10:00 QUEENS 103 0
## 2 205748546 11/27/2019 15:54:00 BRONX 40 0
## 3 193118596 02/02/2019 19:40:00 MANHATTAN 23 0
## 4 204192600 10/24/2019 00:52:00 STATEN ISLAND 121 0
## 5 201483468 08/22/2019 18:03:00 BRONX 46 0
## 6 198255460 06/07/2019 17:50:00 BROOKLYN 73 0
## LOCATION_DESC STATISTICAL_MURDER_FLAG PERP_AGE_GROUP PERP_SEX PERP_RACE
## 1 false
## 2 false <18 M BLACK
## 3 false 18-24 M WHITE HISPANIC
## 4 PVT HOUSE true 25-44 M BLACK
## 5 false 25-44 M BLACK HISPANIC
## 6 false 45-64 M WHITE HISPANIC
## VIC_AGE_GROUP VIC_SEX VIC_RACE X_COORD_CD Y_COORD_CD Latitude Longitude
## 1 25-44 M BLACK 1037451 193561 40.69781 -73.80814
## 2 25-44 F BLACK 1006789 237559 40.81870 -73.91857
## 3 18-24 M BLACK HISPANIC 999347 227795 40.79192 -73.94548
## 4 25-44 F BLACK 938149 171781 40.63806 -74.16611
## 5 18-24 M BLACK 1008224 250621 40.85455 -73.91334
## 6 25-44 M BLACK 1009650 186966 40.67983 -73.90843
## Lon_Lat
## 1 POINT (-73.80814071699996 40.697805308000056)
## 2 POINT (-73.91857061799993 40.81869973000005)
## 3 POINT (-73.94547965999999 40.791916091000076)
## 4 POINT (-74.16610830199996 40.63806398200006)
## 5 POINT (-73.91333944399999 40.85454734900003)
## 6 POINT (-73.90842523899994 40.67982701600005)
nrow(df_merged)
## [1] 25579
nrow(df_nycshoot_ytd)
## [1] 2011
nrow(df_nycshooting_historic)
## [1] 23568
# categorical columns to convert to factors
factor_columns <- c("BORO","PRECINCT","JURISDICTION_CODE","LOCATION_DESC","STATISTICAL_MURDER_FLAG",
"PERP_AGE_GROUP","PERP_SEX", "PERP_RACE","VIC_AGE_GROUP","VIC_SEX","VIC_RACE")
df_merged <- df_merged %>%
mutate(across(all_of(factor_columns), factor))
#convert to character
df_merged <- df_merged %>%
mutate(INCIDENT_KEY = as.character(INCIDENT_KEY))
str(df_merged)
## 'data.frame': 25579 obs. of 19 variables:
## $ INCIDENT_KEY : chr "201575314" "205748546" "193118596" "204192600" ...
## $ OCCUR_DATE : chr "08/23/2019" "11/27/2019" "02/02/2019" "10/24/2019" ...
## $ OCCUR_TIME : chr "22:10:00" "15:54:00" "19:40:00" "00:52:00" ...
## $ BORO : Factor w/ 5 levels "BRONX","BROOKLYN",..: 4 1 3 5 1 2 2 2 4 2 ...
## $ PRECINCT : Factor w/ 77 levels "1","5","6","7",..: 61 23 14 75 29 46 52 40 72 42 ...
## $ JURISDICTION_CODE : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 3 1 ...
## $ LOCATION_DESC : Factor w/ 40 levels "","ATM","BANK",..: 1 1 1 29 1 1 1 25 26 1 ...
## $ STATISTICAL_MURDER_FLAG: Factor w/ 2 levels "false","true": 1 1 1 2 1 1 1 2 1 1 ...
## $ PERP_AGE_GROUP : Factor w/ 10 levels "","<18","1020",..: 1 2 4 6 6 7 4 1 4 6 ...
## $ PERP_SEX : Factor w/ 4 levels "","F","M","U": 1 3 3 3 3 3 3 1 3 3 ...
## $ PERP_RACE : Factor w/ 8 levels "","AMERICAN INDIAN/ALASKAN NATIVE",..: 1 4 8 4 5 8 4 1 4 4 ...
## $ VIC_AGE_GROUP : Factor w/ 6 levels "<18","18-24",..: 3 3 2 3 2 3 3 3 3 3 ...
## $ VIC_SEX : Factor w/ 3 levels "F","M","U": 2 1 2 1 2 2 2 2 2 2 ...
## $ VIC_RACE : Factor w/ 7 levels "AMERICAN INDIAN/ALASKAN NATIVE",..: 3 3 4 3 3 3 3 3 3 3 ...
## $ X_COORD_CD : chr "1037451" "1006789" "999347" "938149" ...
## $ Y_COORD_CD : chr "193561" "237559" "227795" "171781" ...
## $ Latitude : num 40.7 40.8 40.8 40.6 40.9 ...
## $ Longitude : num -73.8 -73.9 -73.9 -74.2 -73.9 ...
## $ Lon_Lat : chr "POINT (-73.80814071699996 40.697805308000056)" "POINT (-73.91857061799993 40.81869973000005)" "POINT (-73.94547965999999 40.791916091000076)" "POINT (-74.16610830199996 40.63806398200006)" ...
summary(df_merged)
## INCIDENT_KEY OCCUR_DATE OCCUR_TIME BORO
## Length:25579 Length:25579 Length:25579 BRONX : 7401
## Class :character Class :character Class :character BROOKLYN :10353
## Mode :character Mode :character Mode :character MANHATTAN : 3264
## QUEENS : 3823
## STATEN ISLAND: 738
##
##
## PRECINCT JURISDICTION_CODE LOCATION_DESC
## 75 : 1462 0 :21316 :14977
## 73 : 1370 1 : 59 MULTI DWELL - PUBLIC HOUS: 4549
## 67 : 1161 2 : 4202 MULTI DWELL - APT BUILD : 2662
## 79 : 981 NA's: 2 PVT HOUSE : 894
## 44 : 950 GROCERY/BODEGA : 620
## 47 : 900 BAR/NIGHT CLUB : 584
## (Other):18755 (Other) : 1293
## STATISTICAL_MURDER_FLAG PERP_AGE_GROUP PERP_SEX PERP_RACE
## false:20663 :9508 : 9474 BLACK :10498
## true : 4916 18-24 :5784 F: 370 : 9474
## 25-44 :5101 M:14231 WHITE HISPANIC: 2137
## UNKNOWN:3156 U: 1504 UNKNOWN : 1869
## <18 :1449 BLACK HISPANIC: 1188
## 45-64 : 521 WHITE : 272
## (Other): 60 (Other) : 141
## VIC_AGE_GROUP VIC_SEX VIC_RACE
## <18 : 2681 F: 2394 AMERICAN INDIAN/ALASKAN NATIVE: 9
## 18-24 : 9601 M:23165 ASIAN / PACIFIC ISLANDER : 347
## 25-44 :11370 U: 20 BLACK :18258
## 45-64 : 1693 BLACK HISPANIC : 2484
## 65+ : 168 UNKNOWN : 102
## UNKNOWN: 66 WHITE : 655
## WHITE HISPANIC : 3724
## X_COORD_CD Y_COORD_CD Latitude Longitude
## Length:25579 Length:25579 Min. :40.51 Min. :-74.25
## Class :character Class :character 1st Qu.:40.67 1st Qu.:-73.94
## Mode :character Mode :character Median :40.70 Median :-73.92
## Mean :40.74 Mean :-73.91
## 3rd Qu.:40.82 3rd Qu.:-73.88
## Max. :40.91 Max. :-73.70
##
## Lon_Lat
## Length:25579
## Class :character
## Mode :character
##
##
##
##
df_merged$OCCUR_DATE <- mdy(df_merged$OCCUR_DATE)
n_occur <- data.frame(table(df_merged$INCIDENT_KEY))
head(n_occur[n_occur$Freq > 1,])
## Var1 Freq
## 11 10038637 2
## 22 10137408 2
## 24 10137411 3
## 25 10137412 2
## 27 10137414 2
## 32 10137422 2
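# keep only incidents whose INCIDENT_KEY appears exactly once;
# incidents with repeated keys are dropped entirely, not deduplicated
df_merged_single <- df_merged[df_merged$INCIDENT_KEY %in% n_occur$Var1[n_occur$Freq == 1],]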
df_merged_single <- df_merged_single %>%
mutate(Quarter = yearquarter(OCCUR_DATE)) %>%
select(-OCCUR_DATE) %>%
as_tsibble(key = INCIDENT_KEY,
index = Quarter)
names(df_merged_single)
## [1] "INCIDENT_KEY" "OCCUR_TIME"
## [3] "BORO" "PRECINCT"
## [5] "JURISDICTION_CODE" "LOCATION_DESC"
## [7] "STATISTICAL_MURDER_FLAG" "PERP_AGE_GROUP"
## [9] "PERP_SEX" "PERP_RACE"
## [11] "VIC_AGE_GROUP" "VIC_SEX"
## [13] "VIC_RACE" "X_COORD_CD"
## [15] "Y_COORD_CD" "Latitude"
## [17] "Longitude" "Lon_Lat"
## [19] "Quarter"
nycshooting_grouped <- df_merged_single %>%
index_by(Quarter) %>%
summarize(shootings = n())
autoplot(nycshooting_grouped, shootings) +
labs(title = "NYC SHOOTING",
subtitle = "Quarter",
y = "Shootings")
TIME SERIES DECOMPOSITION
nycshooting_grouped %>%
gg_season(shootings, labels = "both") +
labs(y = " ",
title = "Seasonal plot: NYC SHOOTING")
table(df_merged_single$BORO)
##
## BRONX BROOKLYN MANHATTAN QUEENS STATEN ISLAND
## 4593 7127 2083 2545 506
shootings_by_Boro <- df_merged_single %>%
group_by(BORO) %>%
summarise(shootings = n())
shootings_by_Boro %>%
ggplot(aes(x = Quarter, y = shootings)) +
geom_line() +
facet_grid(vars(BORO), scales = "free_y") +
labs(title = "NYC Shooting by Boro",
y= "# of Shootings")
#STL
nycshooting_grouped %>%
model(
STL(shootings ~ trend(window = 7) +
season(window = "periodic"),
robust = TRUE)) %>%
components() %>%
autoplot()
#simple stats
nycshooting_grouped %>%
features(shootings, list(mean = mean)) %>%
arrange(mean)
## # A tibble: 1 x 1
## mean
## <dbl>
## 1 263.
#ACF
nycshooting_grouped %>% features(shootings, feat_acf)
## # A tibble: 1 x 7
## acf1 acf10 diff1_acf1 diff1_acf10 diff2_acf1 diff2_acf10 season_acf1
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.445 1.12 -0.0594 2.21 -0.155 1.90 0.663
#STL
#used shootings by boro instead
shootings_by_Boro%>%
features(shootings, feat_stl)
## # A tibble: 5 x 10
## BORO trend_strength seasonal_streng~ seasonal_peak_y~ seasonal_trough~
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 BRONX 0.795 0.771 3 1
## 2 BROOKLYN 0.854 0.823 3 1
## 3 MANHATTAN 0.826 0.722 3 1
## 4 QUEENS 0.757 0.672 3 1
## 5 STATEN ISLA~ 0.432 0.454 3 1
## # ... with 5 more variables: spikiness <dbl>, linearity <dbl>, curvature <dbl>,
## # stl_e_acf1 <dbl>, stl_e_acf10 <dbl>
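The feat_acf() function computes a selection of autocorrelation features. It returns six or seven features: the first autocorrelation coefficient (acf1) and the sum of squares of the first ten autocorrelation coefficients (acf10) of the original series; the same two quantities for the differenced series (diff1_acf1, diff1_acf10) and for the twice-differenced series (diff2_acf1, diff2_acf10); and, for seasonal data, the autocorrelation coefficient at the first seasonal lag (season_acf1).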
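As a rough cross-check of what acf1 and acf10 in the feat_acf() output measure, the same kind of quantities can be computed by hand with base R's `acf()`. The series `x` below is synthetic and only stands in for a real count series; this sketch is not the internal implementation of feat_acf().

```r
# Synthetic series for illustration only (a random walk).
set.seed(42)
x <- cumsum(rnorm(60))

# acf() returns lag 0 first, so drop it to keep lags 1..10.
r <- acf(x, lag.max = 10, plot = FALSE)$acf[-1]

acf1  <- r[1]       # first autocorrelation coefficient
acf10 <- sum(r^2)   # sum of squares of the first ten coefficients

# Same two quantities on the first-differenced series.
dr <- acf(diff(x), lag.max = 10, plot = FALSE)$acf[-1]
diff1_acf1  <- dr[1]
diff1_acf10 <- sum(dr^2)
```

A large acf1 that shrinks after differencing, as is typical for a series like this one, points to a trend rather than short-term correlation.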
SEASONAL STRENGTH
Compare with other categorical variables
# include the other categorical variables in the Shiny app
# shootings per quarter by borough and victim age group
# (the grouping variable is VIC_AGE_GROUP, not race)
boro_age_shootings <- df_merged_single %>%
group_by(BORO,VIC_AGE_GROUP) %>%
summarise(shootings = n())
boro_age_shootings %>%
features(shootings, feat_stl) %>%
ggplot(aes(x = trend_strength, y = seasonal_strength_year,
col = VIC_AGE_GROUP)) +
geom_point() +
facet_wrap(vars(BORO))
## Warning: 1 error encountered for feature 1
## [1] 'degree' must be less than number of unique points
## Warning: Removed 5 rows containing missing values (geom_point).
#lag plots
lag_boroandrace_shootings <- shootings_by_Boro %>%
filter(BORO == 'BRONX')
lag_boroandrace_shootings <- lag_boroandrace_shootings %>%
filter(year(Quarter) == 2020)
lag_boroandrace_shootings %>%
gg_lag(shootings, geom = "point") +
labs(x = "lag(shootings, k)")
#autocorrelation
lag_boroandrace_shootings %>% ACF(shootings, lag_max = 9)
## # A tsibble: 3 x 3 [1Q]
## # Key: BORO [1]
## BORO lag acf
## <fct> <lag> <dbl>
## 1 BRONX 1Q 0.0519
## 2 BRONX 2Q -0.498
## 3 BRONX 3Q -0.0539
ACF(shootings_by_Boro) %>%
autoplot() + labs(title="NYC SHOOTINGS")
## Response variable not specified, automatically selected `var = shootings`
#decomp
dcmp <- lag_boroandrace_shootings %>%
model(stl = STL(shootings))
components(dcmp)
## # A dable: 4 x 7 [1Q]
## # Key: BORO, .model [1]
## # : shootings = trend + remainder
## BORO .model Quarter shootings trend remainder season_adjust
## <fct> <chr> <qtr> <int> <dbl> <dbl> <dbl>
## 1 BRONX stl 2020 Q1 31 48.5 -17.5 31
## 2 BRONX stl 2020 Q2 78 69.5 8.5 78
## 3 BRONX stl 2020 Q3 126 90.5 35.5 126
## 4 BRONX stl 2020 Q4 85 112. -26.5 85