rm(list = ls())
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 524288 28.0 1167772 62.4 660385 35.3
## Vcells 951845 7.3 8388608 64.0 1770057 13.6
# Set working directory
directory <- "C:/Users/mikem/iCloudDrive/Spring 2025/DATA 712/Data sets"
setwd(directory)
# Set seed for reproducibility
set.seed(123)
# Load Data set
shootings <- read.csv("NYPD_Shooting_Data.csv", stringsAsFactors = FALSE)
For this assignment, you will use the NYPD_Shooting_Data.csv data set. This data set contains information on shootings in New York City from 2006 to 2023. The reason I chose this data set opposed to the NYC Disadvantaged Neighborhood Data Set that I had submitted for the previous assignment is that this data contains more binary variables, thus making it ideal to work with for the this assignment. This is the other major data set that I used for my ESS presentation, which will eventually be turned into my graduate thesis.
This data set was provided by the NYPD and is available on the NYC Open Data website.The link to the data set, along with other information pertaining to the data set, can be found here.
Of the 21 variables within this data set, 3 are binary variables. The variables are:
I had to install the below package due to my GitHub Co Pilot not working correctly. I am currently troubleshooting it and should have it working by the time the next assignment is due.
install.packages("remotes", repos = "https://cran.rstudio.com/") # If not installed
## Installing package into 'C:/Users/mikem/AppData/Local/R/win-library/4.3'
## (as 'lib' is unspecified)
## package 'remotes' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\mikem\AppData\Local\Temp\Rtmp2fAbU3\downloaded_packages
remotes::install_github("IQSS/Zelig")
## Skipping install of 'Zelig' from a github remote, the SHA1 (4cfde99c) has not changed since last install.
## Use `force = TRUE` to force installation
install.packages("radiant.data", dependencies = TRUE, repos = "https://cran.rstudio.com/")
## Installing package into 'C:/Users/mikem/AppData/Local/R/win-library/4.3'
## (as 'lib' is unspecified)
## package 'radiant.data' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\mikem\AppData\Local\Temp\Rtmp2fAbU3\downloaded_packages
library(radiant.data)
## Warning: package 'radiant.data' was built under R version 4.3.3
## Loading required package: magrittr
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.3.3
## Loading required package: lubridate
## Warning: package 'lubridate' was built under R version 4.3.2
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
## Loading required package: tidyr
## Warning: package 'tidyr' was built under R version 4.3.3
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:magrittr':
##
## extract
## Loading required package: dplyr
## Warning: package 'dplyr' was built under R version 4.3.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
## Attaching package: 'radiant.data'
## The following objects are masked from 'package:lubridate':
##
## month, wday
## The following object is masked from 'package:ggplot2':
##
## diamonds
## The following object is masked from 'package:magrittr':
##
## set_attr
## The following object is masked from 'package:base':
##
## date
head(shootings)
## INCIDENT_KEY OCCUR_DATE OCCUR_TIME BORO LOC_OF_OCCUR_DESC PRECINCT
## 1 231974218 08/09/2021 01:06:00 BRONX 40
## 2 177934247 04/07/2018 19:48:00 BROOKLYN 79
## 3 255028563 12/02/2022 22:57:00 BRONX OUTSIDE 47
## 4 25384540 11/19/2006 01:50:00 BROOKLYN 66
## 5 72616285 05/09/2010 01:58:00 BRONX 46
## 6 85875439 07/22/2012 21:35:00 BRONX 42
## JURISDICTION_CODE LOC_CLASSFCTN_DESC LOCATION_DESC
## 1 0
## 2 0
## 3 0 STREET GROCERY/BODEGA
## 4 0 PVT HOUSE
## 5 0 MULTI DWELL - APT BUILD
## 6 2 MULTI DWELL - PUBLIC HOUS
## STATISTICAL_MURDER_FLAG PERP_AGE_GROUP PERP_SEX PERP_RACE VIC_AGE_GROUP
## 1 false 18-24
## 2 true 25-44 M WHITE HISPANIC 25-44
## 3 false (null) (null) (null) 25-44
## 4 true UNKNOWN U UNKNOWN 18-24
## 5 true 25-44 M BLACK <18
## 6 false 18-24 M BLACK 18-24
## VIC_SEX VIC_RACE X_COORD_CD Y_COORD_CD Latitude Longitude
## 1 M BLACK 1006343.0 234270.0 40.80967 -73.92019
## 2 M BLACK 1000082.9 189064.7 40.68561 -73.94291
## 3 M BLACK 1020691.0 257125.0 40.87235 -73.86823
## 4 M BLACK 985107.3 173349.8 40.64249 -73.99691
## 5 F BLACK 1009853.5 247502.6 40.84598 -73.90746
## 6 M BLACK 1011046.7 239814.2 40.82488 -73.90318
## Lon_Lat
## 1 POINT (-73.92019278899994 40.80967347200004)
## 2 POINT (-73.94291302299996 40.685609672000055)
## 3 POINT (-73.868233 40.872349)
## 4 POINT (-73.99691224999998 40.642489932000046)
## 5 POINT (-73.90746098599993 40.84598358900007)
## 6 POINT (-73.90317908399999 40.82487781900005)
library(texreg)
## Warning: package 'texreg' was built under R version 4.3.3
## Version: 1.39.4
## Date: 2024-07-23
## Author: Philip Leifeld (University of Manchester)
##
## Consider submitting praise using the praise or praise_interactive functions.
## Please cite the JSS article in your publications -- see citation("texreg").
##
## Attaching package: 'texreg'
## The following object is masked from 'package:tidyr':
##
## extract
## The following object is masked from 'package:magrittr':
##
## extract
library(lmtest)
## Warning: package 'lmtest' was built under R version 4.3.3
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 4.3.3
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(dplyr)
# Convert necessary variables to factors
shootings <- shootings %>%
mutate(
STATISTICAL_MURDER_FLAG = as.factor(STATISTICAL_MURDER_FLAG),
VIC_SEX = as.factor(VIC_SEX),
VIC_AGE_GROUP = as.factor(VIC_AGE_GROUP),
VIC_RACE = as.factor(VIC_RACE)
)
# Model 1: Null model (Intercept only)
m0 <- glm(STATISTICAL_MURDER_FLAG ~ 1, data = shootings, family = binomial)
# Model 2: Adding VIC_SEX
m1 <- glm(STATISTICAL_MURDER_FLAG ~ VIC_SEX, data = shootings, family = binomial)
# Model 3: Adding VIC_AGE_GROUP
m2 <- glm(STATISTICAL_MURDER_FLAG ~ VIC_SEX + VIC_AGE_GROUP, data = shootings, family = binomial)
# Model 4: Adding VIC_RACE
m3 <- glm(STATISTICAL_MURDER_FLAG ~ VIC_SEX + VIC_AGE_GROUP + VIC_RACE, data = shootings, family = binomial)
# Conduct Likelihood Ratio Tests (Comparing nested models)
lrt_m1 <- lrtest(m1, m0) # VIC_SEX vs Null
lrt_m2 <- lrtest(m2, m1) # Adding VIC_AGE_GROUP
lrt_m3 <- lrtest(m3, m2) # Adding VIC_RACE
# Print LRT results
print(lrt_m1)
## Likelihood ratio test
##
## Model 1: STATISTICAL_MURDER_FLAG ~ VIC_SEX
## Model 2: STATISTICAL_MURDER_FLAG ~ 1
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 3 -14029
## 2 1 -14030 -2 2.4651 0.2915
print(lrt_m2)
## Likelihood ratio test
##
## Model 1: STATISTICAL_MURDER_FLAG ~ VIC_SEX + VIC_AGE_GROUP
## Model 2: STATISTICAL_MURDER_FLAG ~ VIC_SEX
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 9 -13909
## 2 3 -14029 -6 240.13 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(lrt_m3)
## Likelihood ratio test
##
## Model 1: STATISTICAL_MURDER_FLAG ~ VIC_SEX + VIC_AGE_GROUP + VIC_RACE
## Model 2: STATISTICAL_MURDER_FLAG ~ VIC_SEX + VIC_AGE_GROUP
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 15 -13886
## 2 9 -13909 -6 46.115 2.808e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Compute AIC and BIC for all models (fixing the print issue)
aic_values <- c(AIC(m0), AIC(m1), AIC(m2), AIC(m3))
bic_values <- c(BIC(m0), BIC(m1), BIC(m2), BIC(m3))
# Print AIC and BIC values properly
cat("AIC Values:\n", paste(aic_values, collapse = ", "), "\n")
## AIC Values:
## 28062.5113949385, 28064.0462947588, 27835.9139313256, 27801.7984486193
cat("BIC Values:\n", paste(bic_values, collapse = ", "), "\n")
## BIC Values:
## 28070.7712273806, 28088.8257920848, 27910.2524233037, 27925.6959352495
# Generate a well-organized HTML table for the models
htmlreg(list(m0, m1, m2, m3),
doctype = FALSE, # Removes unnecessary HTML tags
caption = "Progressive Logistic Regression Models",
custom.model.names = c("Null Model", "Model 1",
"Model 2",
"Model 3"),
file = "regression_comparison.html") # Saves the table to an HTML file
## The table was written to the file 'regression_comparison.html'.
# Open the results in a web browser
browseURL("regression_comparison.html")
The table presents the results of a logistic regression analysis examining factors associated with whether a shooting resulted in a murder. Model 1 includes victim sex, but the coefficients for male and Female (labeled as VIC_SEXU) sex are not statistically significant, suggesting that victim sex alone does not strongly predict the outcome. Model 2 adds victim age group, showing that older victims, particularly those aged 65 and above, have a significantly higher likelihood of the shooting resulting in a murder, while younger victims (18-24, 25-44, and 45-64) also show increased risk compared to the reference group. Model 3 introduces victim race, but the coefficients for racial groups do not appear statistically significant, potentially due to high standard errors. Across models, the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values decrease, indicating improved model fit with additional predictors. The likelihood ratio tests between models suggest that age group contributes significantly to predicting the outcome, whereas victim sex and race do not appear to have strong independent effects.
# Perform likelihood ratio test (LRT) using anova()
anova(m0, m1, test = "Chisq") # Compare Model 1 (VIC_SEX) to Null Model
## Analysis of Deviance Table
##
## Model 1: STATISTICAL_MURDER_FLAG ~ 1
## Model 2: STATISTICAL_MURDER_FLAG ~ VIC_SEX
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 28561 28061
## 2 28559 28058 2 2.4651 0.2915
anova(m1, m2, test = "Chisq") # Compare Model 2 (+VIC_AGE_GROUP) to Model 1
## Analysis of Deviance Table
##
## Model 1: STATISTICAL_MURDER_FLAG ~ VIC_SEX
## Model 2: STATISTICAL_MURDER_FLAG ~ VIC_SEX + VIC_AGE_GROUP
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 28559 28058
## 2 28553 27818 6 240.13 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m2, m3, test = "Chisq") # Compare Model 3 (+VIC_RACE) to Model 2
## Analysis of Deviance Table
##
## Model 1: STATISTICAL_MURDER_FLAG ~ VIC_SEX + VIC_AGE_GROUP
## Model 2: STATISTICAL_MURDER_FLAG ~ VIC_SEX + VIC_AGE_GROUP + VIC_RACE
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 28553 27818
## 2 28547 27772 6 46.115 2.808e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Looking at the results of the likelihood ratio tests, we can see that the p-values for all the tests are less than 0.05, which indicates that the models are statistically significant. This means that the models are better than the null model. The best choice of model is the one with the lowest AIC and BIC values. In this case, Model 3 has the lowest AIC and BIC values, which means that it is the best model among the four models. This model includes the variables VIC_SEX, VIC_AGE and VIC_RACE when predicting the STATISTICAL_MURDER_FLAG.