DATA 698 - Project Proposal

Overview and Motivation

My DATA 698 project will be based on NYC crime. I want to understand how violent the city is, whether some neighborhoods or groups of people are more affected than others, and whether shootings follow identifiable patterns over time, both on their own and compared with other crimes. My primary goal is to apply the tools learned in the Data Science MS courses, such as time series analysis (trend, seasonality, and cyclical behavior) and graphical visualization, to present my findings. The data sets contain fields well suited to this analysis, including date of occurrence, type of offense, borough, age, race, and gender. My hypothesis is that specific zones and times are more likely to see crime; I will try to pinpoint them, analyze them in depth, and relate any proposed solutions to the techniques used by crime analysis experts in the chosen literature. That literature, including “The Crime Numbers Game: Management by Manipulation” and “Understanding New York’s crime drop,” may also help identify possible fallacies or errors in the data.

A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear. Sometimes we will refer to a trend as “changing direction,” when it might go from an increasing trend to a decreasing trend. A seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. Seasonality is always of a fixed and known frequency. A cycle occurs when the data exhibit rises and falls that are not of a fixed frequency.
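As a minimal sketch of these definitions, the base R snippet below builds a synthetic quarterly series (invented numbers, not the NYPD data) with a known upward trend and a fixed four-quarter seasonal pattern, then separates the two with `stl()`. The variable names are illustrative only.

```r
# Synthetic quarterly series: linear trend + fixed seasonal pattern + noise.
# All numbers here are invented for illustration.
set.seed(1)
n_years  <- 10
trend    <- seq(100, 178, length.out = n_years * 4)
seasonal <- rep(c(-20, 5, 30, -15), times = n_years)  # repeats every 4 quarters
noise    <- rnorm(n_years * 4, sd = 3)

y <- ts(trend + seasonal + noise, start = c(2006, 1), frequency = 4)

# STL splits the series into seasonal, trend and remainder components.
fit   <- stl(y, s.window = "periodic")
parts <- fit$time.series  # columns: seasonal, trend, remainder

# The three components add back up to the original series.
recon <- rowSums(parts)
max(abs(recon - as.numeric(y)))
```

Because the seasonal pattern has a fixed, known period (four quarters), it counts as seasonality in the sense above; a cycle would show rises and falls with no fixed period.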

Exploration

Exploratory Data Analysis (EDA) refers to the critical process of performing initial investigations on data in order to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations.

It is good practice to understand the data first and gather as many insights from it as possible. EDA is all about making sense of the data in hand before getting your hands dirty with it.

In addition, after exploring the data as described above, we may be able to identify on which days of the week certain crimes tend to be committed, whether crimes occur at a specific time of day, and which boroughs may be the safest.
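One way to approach the day-of-week and time-of-day questions is sketched below in base R. It uses three rows copied from the shooting data preview later in this document; the derived column names `weekday` and `hour` are my own and not part of the dataset.

```r
# Three rows in the same format as the NYPD shooting data.
toy <- data.frame(
  OCCUR_DATE = c("08/23/2019", "11/27/2019", "02/02/2019"),
  OCCUR_TIME = c("22:10:00", "15:54:00", "19:40:00"),
  BORO       = c("QUEENS", "BRONX", "MANHATTAN"),
  stringsAsFactors = FALSE
)

dates <- as.Date(toy$OCCUR_DATE, format = "%m/%d/%Y")

# as.POSIXlt()$wday runs from 0 (Sunday) to 6 (Saturday), independent of locale.
day_names   <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
toy$weekday <- factor(day_names[as.POSIXlt(dates)$wday + 1], levels = day_names)

# Hour of day is the first two characters of the HH:MM:SS string.
toy$hour <- as.integer(substr(toy$OCCUR_TIME, 1, 2))

table(toy$weekday)         # incidents per day of week
table(toy$BORO, toy$hour)  # incidents per borough and hour
```

On the full data, the same two tables would show which days, hours, and boroughs account for the most incidents.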

The initial exploration for this project began with a Shiny app (dashboard) that I built in R to visualize the open NYC shooting data. Users can filter the data by year (2006-2020), by type of incident (murder or non-murder shootings), and by borough. The app also reports the total number of incidents and the percent change vs. the previous year, and displays a heatmap.
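The percent change vs. previous year figure can be computed along the following lines. This is only a sketch with invented yearly totals; it is not the app's actual code, and the data frame and column names are hypothetical.

```r
# Invented yearly totals, for illustration only (not the real NYPD counts).
yearly <- data.frame(
  year      = 2017:2020,
  incidents = c(100, 90, 99, 198)
)

# Percent change vs. the previous year; the first year has no baseline, so NA.
prev <- c(NA, head(yearly$incidents, -1))
yearly$pct_change <- round(100 * (yearly$incidents - prev) / prev, 1)

yearly
```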

This is viewable here: https://johnmazon90.shinyapps.io/nyc_shooting_app_jmazon/

NYC SHOOTING SHINY APP - BY JMAZON

Data Source

NYC OPEN DATA NYPD Complaint Data Historic

This dataset includes all valid felony, misdemeanor, and violation crimes reported to the New York City Police Department (NYPD) from 2006 to the end of last year (2019). For additional details, please see the attached data dictionary in the ‘About’ section.

This is a breakdown of every shooting incident that occurred in NYC going back to 2006 through the end of the previous calendar year. This data is manually extracted every quarter and reviewed by the Office of Management Analysis and Planning before being posted on the NYPD website. Each record represents a shooting incident in NYC and includes information about the event, the location and time of occurrence. In addition, information related to suspect and victim demographics is also included. This data can be used by the public to explore the nature of shooting/criminal activity. Please refer to the attached data footnotes for additional information about this dataset.

The NYPD maintains statistical data which is used as a management tool in reducing crime, improving procedures and training, and providing transparency to the public and government oversight agencies. In 1994, the department implemented CompStat, which through management, statistics, and accountability, successfully drove down crime to record levels not seen since the 1950s.

The department provides up-to-date crime-related statistics in the seven major crime categories on the citywide, borough, and precinct levels, as well as historical crime data. The public can also access this data through the department’s CompStat 2.0 portal.


DATA PREPARATION

# fpp3 attaches dplyr, lubridate, ggplot2, tsibble, feasts and fable, which the code below relies on
library(fpp3)

df_nycshooting_historic <- read.csv("https://raw.githubusercontent.com/johnm1990/DATA698/main/NYPD_Shooting_Incident_Data__Historic.csv")
df_nycshoot_ytd <- read.csv("https://raw.githubusercontent.com/johnm1990/DATA698/main/NYPD_Shooting_Incident_Data__Year_To_Date_.csv")

head(df_nycshooting_historic)
##   INCIDENT_KEY OCCUR_DATE OCCUR_TIME          BORO PRECINCT JURISDICTION_CODE
## 1    201575314 08/23/2019   22:10:00        QUEENS      103                 0
## 2    205748546 11/27/2019   15:54:00         BRONX       40                 0
## 3    193118596 02/02/2019   19:40:00     MANHATTAN       23                 0
## 4    204192600 10/24/2019   00:52:00 STATEN ISLAND      121                 0
## 5    201483468 08/22/2019   18:03:00         BRONX       46                 0
## 6    198255460 06/07/2019   17:50:00      BROOKLYN       73                 0
##   LOCATION_DESC STATISTICAL_MURDER_FLAG PERP_AGE_GROUP PERP_SEX      PERP_RACE
## 1                                 false                                       
## 2                                 false            <18        M          BLACK
## 3                                 false          18-24        M WHITE HISPANIC
## 4     PVT HOUSE                    true          25-44        M          BLACK
## 5                                 false          25-44        M BLACK HISPANIC
## 6                                 false          45-64        M WHITE HISPANIC
##   VIC_AGE_GROUP VIC_SEX       VIC_RACE X_COORD_CD Y_COORD_CD Latitude Longitude
## 1         25-44       M          BLACK    1037451     193561 40.69781 -73.80814
## 2         25-44       F          BLACK    1006789     237559 40.81870 -73.91857
## 3         18-24       M BLACK HISPANIC     999347     227795 40.79192 -73.94548
## 4         25-44       F          BLACK     938149     171781 40.63806 -74.16611
## 5         18-24       M          BLACK    1008224     250621 40.85455 -73.91334
## 6         25-44       M          BLACK    1009650     186966 40.67983 -73.90843
##                                         Lon_Lat
## 1 POINT (-73.80814071699996 40.697805308000056)
## 2  POINT (-73.91857061799993 40.81869973000005)
## 3 POINT (-73.94547965999999 40.791916091000076)
## 4  POINT (-74.16610830199996 40.63806398200006)
## 5  POINT (-73.91333944399999 40.85454734900003)
## 6  POINT (-73.90842523899994 40.67982701600005)
head(df_nycshoot_ytd)
##   INCIDENT_KEY OCCUR_DATE OCCUR_TIME     BORO PRECINCT JURISDICTION_CODE
## 1    229643180 06/16/2021   21:34:00 BROOKLYN       73                 0
## 2    233147632 09/03/2021   16:28:00    BRONX       43                 0
## 3    231637053 07/31/2021   22:36:00   QUEENS      115                 0
## 4    238041594 12/17/2021   12:00:00    BRONX       50                 0
## 5    228798560 05/27/2021   22:50:00   QUEENS      103                 0
## 6    226542151 04/05/2021   23:15:00 BROOKLYN       73                 2
##               LOCATION_DESC STATISTICAL_MURDER_FLAG PERP_AGE_GROUP PERP_SEX
## 1                                             false                        
## 2                                             false                        
## 3                                             false                        
## 4                                             false                        
## 5                                             false          18-24        M
## 6 MULTI DWELL - PUBLIC HOUS                    true          45-64        M
##   PERP_RACE VIC_AGE_GROUP VIC_SEX       VIC_RACE X_COORD_CD Y_COORD_CD Latitude
## 1                   25-44       M          BLACK    1006621     185426 40.67561
## 2                   18-24       F          BLACK    1022465     242774 40.83296
## 3                   18-24       M WHITE HISPANIC    1020765     213373 40.75227
## 4                   25-44       M          BLACK    1010914     260940 40.88286
## 5     BLACK         25-44       M          BLACK    1037559     194576 40.70059
## 6     BLACK           <18       F          BLACK    1010363     182581 40.66779
##   Longitude                      New.Georeferenced.Column
## 1 -73.91935  POINT (-73.91935098699997 40.67560823700006)
## 2 -73.86191  POINT (-73.86190538999993 40.83295950100006)
## 3 -73.86821  POINT (-73.86820844899995 40.75226896500004)
## 4 -73.90357  POINT (-73.90357448999998 40.88286213100001)
## 5 -73.80774  POINT (-73.80774319999993 40.70059059000005)
## 6 -73.90587 POINT (-73.90587160499997 40.667789106000036)

DATA SUMMARIES

summary(df_nycshoot_ytd)
##   INCIDENT_KEY        OCCUR_DATE         OCCUR_TIME            BORO          
##  Min.   :222524732   Length:2011        Length:2011        Length:2011       
##  1st Qu.:227647474   Class :character   Class :character   Class :character  
##  Median :230741371   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :230857771                                                           
##  3rd Qu.:234086538                                                           
##  Max.   :238490103                                                           
##     PRECINCT      JURISDICTION_CODE LOCATION_DESC      STATISTICAL_MURDER_FLAG
##  Min.   :  5.00   Min.   :0.0000    Length:2011        Length:2011            
##  1st Qu.: 42.00   1st Qu.:0.0000    Class :character   Class :character       
##  Median : 52.00   Median :0.0000    Mode  :character   Mode  :character       
##  Mean   : 61.82   Mean   :0.3148                                              
##  3rd Qu.: 79.00   3rd Qu.:0.0000                                              
##  Max.   :123.00   Max.   :2.0000                                              
##  PERP_AGE_GROUP       PERP_SEX          PERP_RACE         VIC_AGE_GROUP     
##  Length:2011        Length:2011        Length:2011        Length:2011       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    VIC_SEX            VIC_RACE           X_COORD_CD        Y_COORD_CD    
##  Length:2011        Length:2011        Min.   : 926731   Min.   :142481  
##  Class :character   Class :character   1st Qu.:1000891   1st Qu.:185415  
##  Mode  :character   Mode  :character   Median :1008795   Median :218912  
##                                        Mean   :1010338   Mean   :214861  
##                                        3rd Qu.:1017136   3rd Qu.:243782  
##                                        Max.   :1059372   Max.   :269635  
##     Latitude       Longitude      New.Georeferenced.Column
##  Min.   :40.56   Min.   :-74.21   Length:2011             
##  1st Qu.:40.68   1st Qu.:-73.94   Class :character        
##  Median :40.77   Median :-73.91   Mode  :character        
##  Mean   :40.76   Mean   :-73.91                           
##  3rd Qu.:40.84   3rd Qu.:-73.88                           
##  Max.   :40.91   Max.   :-73.73
summary(df_nycshooting_historic)
##   INCIDENT_KEY        OCCUR_DATE         OCCUR_TIME            BORO          
##  Min.   :  9953245   Length:23568       Length:23568       Length:23568      
##  1st Qu.: 55317014   Class :character   Class :character   Class :character  
##  Median : 83365370   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :102218616                                                           
##  3rd Qu.:150772442                                                           
##  Max.   :222473262                                                           
##                                                                              
##     PRECINCT      JURISDICTION_CODE LOCATION_DESC      STATISTICAL_MURDER_FLAG
##  Min.   :  1.00   Min.   :0.0000    Length:23568       Length:23568           
##  1st Qu.: 44.00   1st Qu.:0.0000    Class :character   Class :character       
##  Median : 69.00   Median :0.0000    Mode  :character   Mode  :character       
##  Mean   : 66.21   Mean   :0.3323                                              
##  3rd Qu.: 81.00   3rd Qu.:0.0000                                              
##  Max.   :123.00   Max.   :2.0000                                              
##                   NA's   :2                                                   
##  PERP_AGE_GROUP       PERP_SEX          PERP_RACE         VIC_AGE_GROUP     
##  Length:23568       Length:23568       Length:23568       Length:23568      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    VIC_SEX            VIC_RACE          X_COORD_CD         Y_COORD_CD       
##  Length:23568       Length:23568       Length:23568       Length:23568      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##     Latitude       Longitude        Lon_Lat         
##  Min.   :40.51   Min.   :-74.25   Length:23568      
##  1st Qu.:40.67   1st Qu.:-73.94   Class :character  
##  Median :40.70   Median :-73.92   Mode  :character  
##  Mean   :40.74   Mean   :-73.91                     
##  3rd Qu.:40.82   3rd Qu.:-73.88                     
##  Max.   :40.91   Max.   :-73.70                     
## 
unique(df_nycshoot_ytd$BORO)
## [1] "BROOKLYN"      "BRONX"         "QUEENS"        "MANHATTAN"    
## [5] "STATEN ISLAND"

DATA EXPLORATION

colnames(df_nycshoot_ytd)[!colnames(df_nycshoot_ytd) %in% colnames(df_nycshooting_historic)]
## [1] "New.Georeferenced.Column"
colnames(df_nycshooting_historic)
##  [1] "INCIDENT_KEY"            "OCCUR_DATE"             
##  [3] "OCCUR_TIME"              "BORO"                   
##  [5] "PRECINCT"                "JURISDICTION_CODE"      
##  [7] "LOCATION_DESC"           "STATISTICAL_MURDER_FLAG"
##  [9] "PERP_AGE_GROUP"          "PERP_SEX"               
## [11] "PERP_RACE"               "VIC_AGE_GROUP"          
## [13] "VIC_SEX"                 "VIC_RACE"               
## [15] "X_COORD_CD"              "Y_COORD_CD"             
## [17] "Latitude"                "Longitude"              
## [19] "Lon_Lat"
colnames(df_nycshoot_ytd)
##  [1] "INCIDENT_KEY"             "OCCUR_DATE"              
##  [3] "OCCUR_TIME"               "BORO"                    
##  [5] "PRECINCT"                 "JURISDICTION_CODE"       
##  [7] "LOCATION_DESC"            "STATISTICAL_MURDER_FLAG" 
##  [9] "PERP_AGE_GROUP"           "PERP_SEX"                
## [11] "PERP_RACE"                "VIC_AGE_GROUP"           
## [13] "VIC_SEX"                  "VIC_RACE"                
## [15] "X_COORD_CD"               "Y_COORD_CD"              
## [17] "Latitude"                 "Longitude"               
## [19] "New.Georeferenced.Column"
df_nycshoot_ytd <- rename(df_nycshoot_ytd, Lon_Lat = New.Georeferenced.Column)


df_merged <-  rbind(df_nycshooting_historic,df_nycshoot_ytd)
head(df_merged)
##   INCIDENT_KEY OCCUR_DATE OCCUR_TIME          BORO PRECINCT JURISDICTION_CODE
## 1    201575314 08/23/2019   22:10:00        QUEENS      103                 0
## 2    205748546 11/27/2019   15:54:00         BRONX       40                 0
## 3    193118596 02/02/2019   19:40:00     MANHATTAN       23                 0
## 4    204192600 10/24/2019   00:52:00 STATEN ISLAND      121                 0
## 5    201483468 08/22/2019   18:03:00         BRONX       46                 0
## 6    198255460 06/07/2019   17:50:00      BROOKLYN       73                 0
##   LOCATION_DESC STATISTICAL_MURDER_FLAG PERP_AGE_GROUP PERP_SEX      PERP_RACE
## 1                                 false                                       
## 2                                 false            <18        M          BLACK
## 3                                 false          18-24        M WHITE HISPANIC
## 4     PVT HOUSE                    true          25-44        M          BLACK
## 5                                 false          25-44        M BLACK HISPANIC
## 6                                 false          45-64        M WHITE HISPANIC
##   VIC_AGE_GROUP VIC_SEX       VIC_RACE X_COORD_CD Y_COORD_CD Latitude Longitude
## 1         25-44       M          BLACK    1037451     193561 40.69781 -73.80814
## 2         25-44       F          BLACK    1006789     237559 40.81870 -73.91857
## 3         18-24       M BLACK HISPANIC     999347     227795 40.79192 -73.94548
## 4         25-44       F          BLACK     938149     171781 40.63806 -74.16611
## 5         18-24       M          BLACK    1008224     250621 40.85455 -73.91334
## 6         25-44       M          BLACK    1009650     186966 40.67983 -73.90843
##                                         Lon_Lat
## 1 POINT (-73.80814071699996 40.697805308000056)
## 2  POINT (-73.91857061799993 40.81869973000005)
## 3 POINT (-73.94547965999999 40.791916091000076)
## 4  POINT (-74.16610830199996 40.63806398200006)
## 5  POINT (-73.91333944399999 40.85454734900003)
## 6  POINT (-73.90842523899994 40.67982701600005)
nrow(df_merged)
## [1] 25579
nrow(df_nycshoot_ytd)
## [1] 2011
nrow(df_nycshooting_historic)
## [1] 23568
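The row counts above are consistent: 23568 + 2011 = 25579. The sketch below shows the same rename-then-stack pattern on two tiny stand-in data frames, with an explicit check that the column names agree before `rbind()` is called. The data values are invented; only the two column names mirror the real mismatch.

```r
# Two tiny stand-in data frames; the column names mirror the mismatch above.
historic <- data.frame(INCIDENT_KEY = 1:3, Lon_Lat = c("a", "b", "c"))
ytd      <- data.frame(INCIDENT_KEY = 4:5, New.Georeferenced.Column = c("d", "e"))

# Columns present in the year-to-date frame but not in the historic one.
setdiff(names(ytd), names(historic))

# Base R rename, equivalent in effect to the rename() call above.
names(ytd)[names(ytd) == "New.Georeferenced.Column"] <- "Lon_Lat"

# rbind() on data frames matches columns by name, so the names must agree.
stopifnot(setequal(names(historic), names(ytd)))
merged <- rbind(historic, ytd)

# The merged row count should equal the sum of the inputs.
nrow(merged) == nrow(historic) + nrow(ytd)
```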
# Columns to treat as categorical
factor_columns <- c("BORO", "PRECINCT", "JURISDICTION_CODE", "LOCATION_DESC",
                    "STATISTICAL_MURDER_FLAG", "PERP_AGE_GROUP", "PERP_SEX",
                    "PERP_RACE", "VIC_AGE_GROUP", "VIC_SEX", "VIC_RACE")

# Convert the categorical columns to factors
df_merged <- df_merged %>%
  mutate(across(all_of(factor_columns), factor))

# INCIDENT_KEY is an identifier, not a quantity, so store it as character
df_merged <- df_merged %>%
  mutate(INCIDENT_KEY = as.character(INCIDENT_KEY))




str(df_merged)
## 'data.frame':    25579 obs. of  19 variables:
##  $ INCIDENT_KEY           : chr  "201575314" "205748546" "193118596" "204192600" ...
##  $ OCCUR_DATE             : chr  "08/23/2019" "11/27/2019" "02/02/2019" "10/24/2019" ...
##  $ OCCUR_TIME             : chr  "22:10:00" "15:54:00" "19:40:00" "00:52:00" ...
##  $ BORO                   : Factor w/ 5 levels "BRONX","BROOKLYN",..: 4 1 3 5 1 2 2 2 4 2 ...
##  $ PRECINCT               : Factor w/ 77 levels "1","5","6","7",..: 61 23 14 75 29 46 52 40 72 42 ...
##  $ JURISDICTION_CODE      : Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 3 1 ...
##  $ LOCATION_DESC          : Factor w/ 40 levels "","ATM","BANK",..: 1 1 1 29 1 1 1 25 26 1 ...
##  $ STATISTICAL_MURDER_FLAG: Factor w/ 2 levels "false","true": 1 1 1 2 1 1 1 2 1 1 ...
##  $ PERP_AGE_GROUP         : Factor w/ 10 levels "","<18","1020",..: 1 2 4 6 6 7 4 1 4 6 ...
##  $ PERP_SEX               : Factor w/ 4 levels "","F","M","U": 1 3 3 3 3 3 3 1 3 3 ...
##  $ PERP_RACE              : Factor w/ 8 levels "","AMERICAN INDIAN/ALASKAN NATIVE",..: 1 4 8 4 5 8 4 1 4 4 ...
##  $ VIC_AGE_GROUP          : Factor w/ 6 levels "<18","18-24",..: 3 3 2 3 2 3 3 3 3 3 ...
##  $ VIC_SEX                : Factor w/ 3 levels "F","M","U": 2 1 2 1 2 2 2 2 2 2 ...
##  $ VIC_RACE               : Factor w/ 7 levels "AMERICAN INDIAN/ALASKAN NATIVE",..: 3 3 4 3 3 3 3 3 3 3 ...
##  $ X_COORD_CD             : chr  "1037451" "1006789" "999347" "938149" ...
##  $ Y_COORD_CD             : chr  "193561" "237559" "227795" "171781" ...
##  $ Latitude               : num  40.7 40.8 40.8 40.6 40.9 ...
##  $ Longitude              : num  -73.8 -73.9 -73.9 -74.2 -73.9 ...
##  $ Lon_Lat                : chr  "POINT (-73.80814071699996 40.697805308000056)" "POINT (-73.91857061799993 40.81869973000005)" "POINT (-73.94547965999999 40.791916091000076)" "POINT (-74.16610830199996 40.63806398200006)" ...
summary(df_merged)
##  INCIDENT_KEY        OCCUR_DATE         OCCUR_TIME                   BORO      
##  Length:25579       Length:25579       Length:25579       BRONX        : 7401  
##  Class :character   Class :character   Class :character   BROOKLYN     :10353  
##  Mode  :character   Mode  :character   Mode  :character   MANHATTAN    : 3264  
##                                                           QUEENS       : 3823  
##                                                           STATEN ISLAND:  738  
##                                                                                
##                                                                                
##     PRECINCT     JURISDICTION_CODE                   LOCATION_DESC  
##  75     : 1462   0   :21316                                 :14977  
##  73     : 1370   1   :   59        MULTI DWELL - PUBLIC HOUS: 4549  
##  67     : 1161   2   : 4202        MULTI DWELL - APT BUILD  : 2662  
##  79     :  981   NA's:    2        PVT HOUSE                :  894  
##  44     :  950                     GROCERY/BODEGA           :  620  
##  47     :  900                     BAR/NIGHT CLUB           :  584  
##  (Other):18755                     (Other)                  : 1293  
##  STATISTICAL_MURDER_FLAG PERP_AGE_GROUP PERP_SEX           PERP_RACE    
##  false:20663                    :9508    : 9474   BLACK         :10498  
##  true : 4916             18-24  :5784   F:  370                 : 9474  
##                          25-44  :5101   M:14231   WHITE HISPANIC: 2137  
##                          UNKNOWN:3156   U: 1504   UNKNOWN       : 1869  
##                          <18    :1449             BLACK HISPANIC: 1188  
##                          45-64  : 521             WHITE         :  272  
##                          (Other):  60             (Other)       :  141  
##  VIC_AGE_GROUP   VIC_SEX                             VIC_RACE    
##  <18    : 2681   F: 2394   AMERICAN INDIAN/ALASKAN NATIVE:    9  
##  18-24  : 9601   M:23165   ASIAN / PACIFIC ISLANDER      :  347  
##  25-44  :11370   U:   20   BLACK                         :18258  
##  45-64  : 1693             BLACK HISPANIC                : 2484  
##  65+    :  168             UNKNOWN                       :  102  
##  UNKNOWN:   66             WHITE                         :  655  
##                            WHITE HISPANIC                : 3724  
##   X_COORD_CD         Y_COORD_CD           Latitude       Longitude     
##  Length:25579       Length:25579       Min.   :40.51   Min.   :-74.25  
##  Class :character   Class :character   1st Qu.:40.67   1st Qu.:-73.94  
##  Mode  :character   Mode  :character   Median :40.70   Median :-73.92  
##                                        Mean   :40.74   Mean   :-73.91  
##                                        3rd Qu.:40.82   3rd Qu.:-73.88  
##                                        Max.   :40.91   Max.   :-73.70  
##                                                                        
##    Lon_Lat         
##  Length:25579      
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 
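The summary above shows that blank strings became a factor level of their own (for example, 9508 blank PERP_AGE_GROUP values), and the str() output lists an implausible "1020" age group. One possible cleanup, sketched here on a toy vector rather than the real column, is to recode blanks to NA before converting to a factor so they are counted as missing.

```r
# Toy values imitating the perpetrator age-group column, including blanks.
perp_age <- c("", "<18", "18-24", "", "25-44", "UNKNOWN")

# Recode empty strings to NA so they are counted as missing, not as a level.
perp_age[perp_age == ""] <- NA
perp_age_f <- factor(perp_age)

levels(perp_age_f)                  # "" is no longer a level
table(perp_age_f, useNA = "ifany")  # missing values are tallied separately
```

Whether "UNKNOWN" should also be folded into NA is a separate judgment call that this sketch leaves open.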

TIME SERIES ANALYSIS : QUARTERLY

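# Parse the MM/DD/YYYY strings into Date objects
df_merged$OCCUR_DATE <- mdy(df_merged$OCCUR_DATE)

# Count how many rows share each INCIDENT_KEY and preview the repeated keys
n_occur <- data.frame(table(df_merged$INCIDENT_KEY))
head(n_occur[n_occur$Freq > 1,])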
##        Var1 Freq
## 11 10038637    2
## 22 10137408    2
## 24 10137411    3
## 25 10137412    2
## 27 10137414    2
## 32 10137422    2
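# Keep only incidents whose key appears exactly once; incidents with repeated keys are dropped entirely
df_merged_single <- df_merged[df_merged$INCIDENT_KEY %in% n_occur$Var1[n_occur$Freq == 1],]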
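The filter above removes every incident whose key is repeated, not just the extra rows. If repeated keys represent a single incident with several victims or suspects (an assumption worth checking against the data footnotes), an alternative is to keep one row per key so that each incident is still counted once. The sketch below contrasts the two approaches on toy rows; the first two keys appear in the table above, while "99999999" and the VIC_SEX values are invented.

```r
# Toy incident rows: two repeated keys and one hypothetical unique key.
toy <- data.frame(
  INCIDENT_KEY = c("10038637", "10038637", "10137411",
                   "10137411", "10137411", "99999999"),
  VIC_SEX      = c("M", "F", "M", "M", "F", "M")
)

# Approach used above: keep only keys that occur exactly once.
counts      <- table(toy$INCIDENT_KEY)
single_only <- toy[toy$INCIDENT_KEY %in% names(counts)[counts == 1], ]

# Alternative: keep the first row of every key, so each incident counts once.
one_per_key <- toy[!duplicated(toy$INCIDENT_KEY), ]

nrow(single_only)  # 1
nrow(one_per_key)  # 3
```

The choice affects the totals: the first approach undercounts incidents, while the second keeps every incident but discards the extra victim-level rows.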


df_merged_single <- df_merged_single %>%
  mutate(Quarter = yearquarter(OCCUR_DATE)) %>%
  select(-OCCUR_DATE) %>%
  as_tsibble(key = INCIDENT_KEY,
             index = Quarter)


names(df_merged_single)
##  [1] "INCIDENT_KEY"            "OCCUR_TIME"             
##  [3] "BORO"                    "PRECINCT"               
##  [5] "JURISDICTION_CODE"       "LOCATION_DESC"          
##  [7] "STATISTICAL_MURDER_FLAG" "PERP_AGE_GROUP"         
##  [9] "PERP_SEX"                "PERP_RACE"              
## [11] "VIC_AGE_GROUP"           "VIC_SEX"                
## [13] "VIC_RACE"                "X_COORD_CD"             
## [15] "Y_COORD_CD"              "Latitude"               
## [17] "Longitude"               "Lon_Lat"                
## [19] "Quarter"
nycshooting_grouped <- df_merged_single %>%
  index_by(Quarter) %>%
  summarize(shootings = n())
autoplot(nycshooting_grouped, shootings) +
  labs(title = "NYC SHOOTING",
       subtitle = "Quarter",
       y = "Shootings")
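For readers without the tsibble tooling, the quarterly aggregation above can be approximated in base R. The sketch below uses the six dates from the first head() output in this document and counts incidents per quarter; the "YYYY Qn" label format is my own choice, not the yearquarter() class.

```r
# Dates in the same MM/DD/YYYY format as OCCUR_DATE.
occur <- as.Date(c("08/23/2019", "11/27/2019", "02/02/2019",
                   "10/24/2019", "08/22/2019", "06/07/2019"),
                 format = "%m/%d/%Y")

# Month index is 0-11, so integer division by 3 gives the quarter minus one.
qtr   <- (as.POSIXlt(occur)$mon %/% 3) + 1
label <- paste0(format(occur, "%Y"), " Q", qtr)

# Count incidents per quarter.
shootings <- table(label)
shootings
```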

SEASONALITY ANALYSIS BY BORO

TIME SERIES DECOMPOSITION

nycshooting_grouped %>%
  gg_season(shootings, labels = "both") +
  labs(y = " ",
       title = "Seasonal plot: NYC SHOOTING")

table(df_merged_single$BORO)
## 
##         BRONX      BROOKLYN     MANHATTAN        QUEENS STATEN ISLAND 
##          4593          7127          2083          2545           506

STL DECOMPOSITION

shootings_by_Boro <- df_merged_single %>%
  group_by(BORO) %>%
  summarise(shootings = n())
shootings_by_Boro %>%
  ggplot(aes(x = Quarter, y = shootings)) +
  geom_line() +
  facet_grid(vars(BORO), scales = "free_y") +
  labs(title = "NYC Shooting by Boro",
       y= "# of Shootings")

#STL
nycshooting_grouped %>%
  model(
    STL(shootings ~ trend(window = 7) +
                   season(window = "periodic"),
    robust = TRUE)) %>%
  components() %>%
  autoplot()

#simple stats
nycshooting_grouped %>%
  features(shootings, list(mean = mean)) %>%
  arrange(mean)
## # A tibble: 1 x 1
##    mean
##   <dbl>
## 1  263.
#ACF
nycshooting_grouped %>% features(shootings, feat_acf)
## # A tibble: 1 x 7
##    acf1 acf10 diff1_acf1 diff1_acf10 diff2_acf1 diff2_acf10 season_acf1
##   <dbl> <dbl>      <dbl>       <dbl>      <dbl>       <dbl>       <dbl>
## 1 0.445  1.12    -0.0594        2.21     -0.155        1.90       0.663
#STL
#used shootings by boro instead
shootings_by_Boro%>%
  features(shootings, feat_stl)
## # A tibble: 5 x 10
##   BORO         trend_strength seasonal_streng~ seasonal_peak_y~ seasonal_trough~
##   <fct>                 <dbl>            <dbl>            <dbl>            <dbl>
## 1 BRONX                 0.795            0.771                3                1
## 2 BROOKLYN              0.854            0.823                3                1
## 3 MANHATTAN             0.826            0.722                3                1
## 4 QUEENS                0.757            0.672                3                1
## 5 STATEN ISLA~          0.432            0.454                3                1
## # ... with 5 more variables: spikiness <dbl>, linearity <dbl>, curvature <dbl>,
## #   stl_e_acf1 <dbl>, stl_e_acf10 <dbl>
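The trend and seasonal strength columns above lie between 0 and 1 and compare the variance of the STL remainder with the variance of the trend-plus-remainder or seasonal-plus-remainder. The sketch below reproduces that idea with base R `stl()` on a synthetic quarterly series (invented numbers). Since feat_stl() uses its own STL settings, this is an approximation of the calculation, not a replica of it.

```r
set.seed(42)
# Synthetic quarterly series (invented numbers) with a clear Q3 peak.
y <- ts(200 + rep(c(-40, 0, 50, -10), times = 8) + rnorm(32, sd = 10),
        start = c(2006, 1), frequency = 4)

parts <- stl(y, s.window = "periodic")$time.series
S  <- parts[, "seasonal"]
Tr <- parts[, "trend"]
R  <- parts[, "remainder"]

# Strength measures as defined in Hyndman & Athanasopoulos (FPP3):
#   F_T = max(0, 1 - Var(R) / Var(T + R))
#   F_S = max(0, 1 - Var(R) / Var(S + R))
trend_strength    <- max(0, 1 - var(R) / var(Tr + R))
seasonal_strength <- max(0, 1 - var(R) / var(S + R))

c(trend = trend_strength, seasonal = seasonal_strength)
which.max(S[1:4])  # quarter in which the seasonal component peaks
```

A value near 1 means the component dominates the remainder; a value near 0 means it is hard to distinguish from noise, as with the weaker Staten Island figures above.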

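The feat_acf() function computes a selection of autocorrelation features. For the quarterly shootings series above it returns seven: the first autocorrelation coefficient of the original data (acf1); the sum of squares of the first ten autocorrelation coefficients (acf10); the same two quantities for the differenced data (diff1_acf1, diff1_acf10) and for the twice-differenced data (diff2_acf1, diff2_acf10); and, because the data are seasonal, the autocorrelation coefficient at the first seasonal lag (season_acf1).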
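To make these features concrete, the sketch below computes several of them by hand with base R `acf()` on a synthetic quarterly series (invented numbers). The variable names copy the feat_acf() column names for readability, but this is an illustration of the definitions rather than the package's implementation.

```r
set.seed(7)
# Synthetic quarterly series, invented for illustration.
y <- ts(250 + rep(c(-30, 10, 40, -20), times = 10) + rnorm(40, sd = 15),
        frequency = 4)

# Autocorrelations at lags 1..10 (drop lag 0, which is always 1).
r <- acf(y, lag.max = 10, plot = FALSE)$acf[-1]

acf1        <- r[1]      # first autocorrelation coefficient
acf10       <- sum(r^2)  # sum of squares of the first ten coefficients
season_acf1 <- r[4]      # autocorrelation at the first seasonal lag (4 quarters)

# The same two summaries on the first-differenced series.
rd <- acf(diff(y), lag.max = 10, plot = FALSE)$acf[-1]
diff1_acf1  <- rd[1]
diff1_acf10 <- sum(rd^2)

c(acf1 = acf1, acf10 = acf10, diff1_acf1 = diff1_acf1,
  diff1_acf10 = diff1_acf10, season_acf1 = season_acf1)
```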

TIME SERIES FEATURES

SEASONAL STRENGTH

Compare with other categorical variables

# TODO: include the other categorical variables in the Shiny app

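# Shootings per quarter by borough and victim age group
# (note: despite the object's name, the grouping is by age group, not race)
boroandrace_shootings <- df_merged_single %>%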
  group_by(BORO,VIC_AGE_GROUP) %>%
  summarise(shootings = n())




boroandrace_shootings %>%
  features(shootings, feat_stl) %>%
  ggplot(aes(x = trend_strength, y = seasonal_strength_year,
             col = VIC_AGE_GROUP)) +
  geom_point() +
  facet_wrap(vars(BORO))
## Warning: 1 error encountered for feature 1
## [1] 'degree' must be less than number of unique points
## Warning: Removed 5 rows containing missing values (geom_point).

LAG PLOTS AND AUTOCORRELATION

#lag plots
lag_boroandrace_shootings <- shootings_by_Boro %>%
  filter(BORO == 'BRONX')

# Restrict to 2020, which leaves only four quarterly observations
lag_boroandrace_shootings <- lag_boroandrace_shootings %>%
  filter(year(Quarter) == 2020)

lag_boroandrace_shootings %>%
  gg_lag(shootings, geom = "point") +
  labs(x = "lag(shootings, k)")

#autocorrelation
lag_boroandrace_shootings %>% ACF(shootings, lag_max = 9)
## # A tsibble: 3 x 3 [1Q]
## # Key:       BORO [1]
##   BORO    lag     acf
##   <fct> <lag>   <dbl>
## 1 BRONX    1Q  0.0519
## 2 BRONX    2Q -0.498 
## 3 BRONX    3Q -0.0539
ACF(shootings_by_Boro) %>%
autoplot() + labs(title="NYC SHOOTINGS")
## Response variable not specified, automatically selected `var = shootings`

#decomp
dcmp <- lag_boroandrace_shootings %>%
  model(stl = STL(shootings))
components(dcmp)
## # A dable: 4 x 7 [1Q]
## # Key:     BORO, .model [1]
## # :        shootings = trend + remainder
##   BORO  .model Quarter shootings trend remainder season_adjust
##   <fct> <chr>    <qtr>     <int> <dbl>     <dbl>         <dbl>
## 1 BRONX stl    2020 Q1        31  48.5     -17.5            31
## 2 BRONX stl    2020 Q2        78  69.5       8.5            78
## 3 BRONX stl    2020 Q3       126  90.5      35.5           126
## 4 BRONX stl    2020 Q4        85 112.      -26.5            85