INTRO: For my final project I decided to use the “Index, Violent, Property, and Firearm Rates By County” dataset, which came directly from the free New York data government website. The question I set out to answer was a two parter: Was there a correlation between firearm rate and overall crime, as well as which County has the most crime? The parts that were necessary to answer this question were Population, Year, Firearm Rate: how many firearm related instances were reported every year per 100,000 people, Violent Rate: how many murders, rapes, robberies and aggravated assaults were reported every year per 100,000, and Property Rate: how many burglaries, larcenies and motor vehicle thefts were reported every year per 100,000 people. (data:https://data.ny.gov/Public-Safety/Index-Violent-Property-and-Firearm-Rates-By-County/34dd-6g2j/about_data )
From lines 12-18 I load in the dataset and the appropriate packages I’ll need to perform all my visualizations.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)library(highcharter)
Registered S3 method overwritten by 'quantmod':
method from
as.zoo.data.frame zoo
Rows: 2108 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): County
dbl (10): Year, Population, Index Count, Index Rate, Violent Count, Violent ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
From lines 22-24 I use the filter and !is.na commands to remove any NAs from the dataset.
From lines 28-32 I summon a table to see how many values are in each County. I decided to use the five Counties Bronx, Kings, New York, Queens, and Richmond, as they correspond with the five boroughs of NYC. For whatever reason these counties were missing data from the 2006-2014, so I just filtered to only include 2015-2022.
table(IDX$County)
Albany Allegany Bronx Broome Cattaraugus Cayuga
34 34 21 34 34 34
Chautauqua Chemung Chenango Clinton Columbia Cortland
34 34 34 34 34 34
Delaware Dutchess Erie Essex Franklin Fulton
34 34 34 34 34 34
Genesee Greene Hamilton Herkimer Jefferson Kings
34 34 34 34 34 21
Lewis Livingston Madison Monroe Montgomery Nassau
34 34 34 34 34 34
New York Niagara Oneida Onondaga Ontario Orange
21 34 34 34 34 34
Orleans Oswego Otsego Putnam Queens Rensselaer
34 34 34 34 21 34
Richmond Rockland Saratoga Schenectady Schoharie Schuyler
21 34 34 34 34 34
Seneca St Lawrence Steuben Suffolk Sullivan Tioga
34 34 34 34 34 34
Tompkins Ulster Warren Washington Wayne Westchester
34 34 34 34 34 34
Wyoming Yates
34 34
Regression: From lines 35-38 I performed a linear regression comparing Firearm rate to Violent Rate and Property Rate ith P-values of 2.23e-14 and 0.0525, respectfully. Based on this, I hypothesize that there is a strong correlation between Firearm reports and Violent/Property Crime.
lm_model <-lm(`Firearm Rate`~`Violent Rate`+`Property Rate`, data = IDX2)summary(lm_model)
Call:
lm(formula = `Firearm Rate` ~ `Violent Rate` + `Property Rate`,
data = IDX2)
Residuals:
Min 1Q Median 3Q Max
-41.814 -7.078 0.549 9.311 35.654
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -10.440434 8.286313 -1.260 0.2156
`Violent Rate` 0.165634 0.013740 12.055 2.23e-14 ***
`Property Rate` -0.010629 0.005305 -2.004 0.0525 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 17.52 on 37 degrees of freedom
Multiple R-squared: 0.8235, Adjusted R-squared: 0.814
F-statistic: 86.31 on 2 and 37 DF, p-value: 1.161e-14
From lines 42-50 I created a scatterplot showing the correlation between Violent Crime and Firearm Rates in the five counties. As we can see there is a substantial correlation between the two variables.
ggplot(data = IDX2, aes(x =`Firearm Rate`, y =`Violent Rate`)) +geom_point(color ="black", alpha =0.6) +geom_smooth(method ="lm", color ="red", se =FALSE) +labs( title ="Firearm Rate vs Violent Crime Rate in New York City Counties",x ="Insidences involving Firearms",y ="Violent Crime Rate") +theme_minimal()
`geom_smooth()` using formula = 'y ~ x'
From lines 54-62 I created a second scatterplot showing the correlation between Property Crime and Firearm Rates in the five counties. As we can see there is a relatively strong correlation between the two variables, but not as strong as with the Violent Crimes Scatterplot.
ggplot(data = IDX2, aes(x =`Firearm Rate`, y =`Property Rate`)) +geom_point(color ="black", alpha =0.6) +geom_smooth(method ="lm", color ="blue", se =FALSE) +labs(title ="Firearm Rate vs Violent Crime Rate in New York City Counties",x ="Insidences involving Firearms",y ="Property Crime Rate") +theme_minimal()
`geom_smooth()` using formula = 'y ~ x'
Regression: From lines 66-70 I performed a linear regression comparing Firearm rate between the counties. To do this I had to convert County to a factor using the as.factor command, hence why the county Bronx is missing, as it’s the reference level for the regression. Because Bronx (Intercept) has the smallest p-value, 5.34e-15, I hypothesize it will have the most overall crime.
IDX2$County <-as.factor(IDX2$County)lm_model2 <-lm(`Firearm Rate`~ County, data = IDX2)summary(lm_model2)
Call:
lm(formula = `Firearm Rate` ~ County, data = IDX2)
Residuals:
Min 1Q Median 3Q Max
-44.962 -14.838 -3.125 9.981 89.937
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 124.262 9.528 13.041 5.34e-15 ***
CountyKings -44.250 13.475 -3.284 0.00233 **
CountyNew York -66.787 13.475 -4.956 1.83e-05 ***
CountyQueens -75.225 13.475 -5.583 2.74e-06 ***
CountyRichmond -89.325 13.475 -6.629 1.16e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 26.95 on 35 degrees of freedom
Multiple R-squared: 0.605, Adjusted R-squared: 0.5598
F-statistic: 13.4 on 4 and 35 DF, p-value: 1.011e-06
From lines 74-78 I created a line chart that shows the differences in Firearm Rate by each County by year. All are in an upward trend since 2019, but the Bronx is the clear standout with an additional 100+ reports per 100k compared to the next County.
hc1 <- IDX2 %>%hchart('line', hcaes(x = Year, y =`Firearm Rate`, group = County)) %>%hc_colors(c("#00AFBB", "green", "#FC4E07", "orange", "hotpink"))hc1
From lines 83-86 I altered the dataset to include the ten most populous counties to use for the following graph.
From lines 91 to 110 I created a lollipop graph to demonstrate the differences in overall crime between the counties. First I used the mean() function to combine Property Rate and Violent Rate. Second I used the group_by() function to group the counties together. Third I used the summarise() function to Property and Violent Rate and then I used that summary to mutate the column County. Fourth I added a y-intercept to show the mean. Lastly I used geom_point in conjunction geom_segement to show the lines and points on the graph. I added extra counties for the visualization because it works better with 10 points. As we can see New York County has the most overall crime.
overall_mean_crime <-mean(IDX3$`Property Rate`+ IDX3$`Violent Rate`, na.rm =TRUE)Ovrl_Cr <- IDX3 %>%group_by(County) %>%summarise(mean_crime =mean(`Property Rate`+`Violent Rate`, na.rm =TRUE)) %>%mutate(County =fct_reorder(County, mean_crime)) %>%ggplot(aes(x = County, y = mean_crime)) +labs(title ="Overall Crime Rate Per County",x ="Counties",y ="Rate") +theme(axis.text.x =element_text(angle =45, vjust =1, hjust =1), axis.text.y =element_text(face ="bold"),plot.title =element_text(hjust =0.5, size =20, face ="bold")) +geom_hline(yintercept = overall_mean_crime, color ="lightblue", size =1) +geom_segment(aes(xend = County, y = overall_mean_crime, yend = mean_crime), color ="violet") +geom_point(aes(color = mean_crime), size =4) +scale_color_gradient(low ="navy", high ="lightgreen")
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
Ovrl_Cr
From lines 115-126 I made two codes. First I used the filter function to solely get 2022 data. For the second code I used the function data.frame() to manually add the coordinates of each county I got the Longitude and Latitude from a website (check references).
County Latitude Longitude
1 Bronx 40.83705 -73.86543
2 Kings 40.65000 -73.96000
3 Queens 40.73000 -73.79000
4 Richmond 40.58333 -74.15000
5 New York 40.78000 -73.97000
6 Nassau 40.74722 -73.63806
7 Suffolk 40.93333 -72.68333
8 Westchester 41.15000 -73.76667
9 Erie 42.70000 -78.75000
10 Monroe 43.30000 -77.68333
From lines 130-133 I joined the two previously created datasets into one called “merged”. By doing this I’ve matched the coordinates to the proper County.
merged <- IDX4 %>%left_join(IDX5, by ="County")
From lines 138-144 I created a popup for the final visualization.
From lines 148-158 I created a map to show the location of each county. The radius is also based off of the Firearm rate, meaning larger the circle the higher the firearm rate. This shows that Bronx County has the most cases of Violent and Firearm Rate, showing that it has the most violent County.
Assuming "Longitude" and "Latitude" are longitude and latitude, respectively
CONCLUSION: Based on the analysis of the “Index, Violent, Property, and Firearm Rates By County” dataset, several significant insights were drawn regarding crime patterns in the five counties of New York City. The analysis confirms that firearm rates are a significant predictor of violent crimes in New York City’s boroughs, with the Bronx being disproportionately affected. While New York County leads in overall crime rates due to property crimes, the Bronx faces the highest levels of violence and firearm-related incidents. These findings highlight the need for targeted interventions in the Bronx to address violent crime and firearm usage while continuing to monitor overall crime trends across the city. Future research could include investigating the relationship between socioeconomic variables such as median income, unemployment rates and education levels to see how they correspond to crime rate. Additionally, comparing New York to other large cities and their subsequent counties could further show how New York compares to the rest of the country on these issues.
References: New York State Index Data: https://data.ny.gov/Public-Safety/Index-Violent-Property-and-Firearm-Rates-By-County/34dd-6g2j/about_data U.S County coordinates data: https://public.opendatasoft.com/explore/dataset/us-county-boundaries/table/?flg=en-us&disjunctive.statefp&disjunctive.countyfp&disjunctive.name&disjunctive.namelsad&disjunctive.stusab&disjunctive.state_name