Gun Violence in the US: Exploratory Data Analysis

Load libraries:

library(ggplot2)
library(readr)
library(RColorBrewer)
library(reshape2)
library(dplyr)
library(plotly)
library(scales)

Read the dataset:

df <- read_csv("data/gun-violence-data_01-2013_03-2018.csv")

There are a few columns we don’t care about - drop those

gun <- select(df, -c(incident_url, source_url, incident_url_fields_missing, congressional_district, sources, state_house_district, state_senate_district))

How many gun casualties (defined as number killed and injured) were there for each year?

gun %>% group_by(Year = format(date, "%Y")) %>% 
      summarize(n = sum(n_injured + n_killed)) %>% 
      ggplot(aes(x = Year, y = n)) + geom_col()

Clearly there are missing data from 2013, and 2018 isn’t complete yet. Let’s drop these two years. Let’s also remove the suicides, incidents involving replica weapons, and officer involved shootings.

gun <- filter(gun, format(date, "%Y") != 2018 & format(date, "%Y") != 2013)

# Remove suicides
gun <- filter(gun, !(grepl("\\|\\|Suicide\\^", incident_characteristics)
                      & n_killed + n_injured < 2))

# Remove shootings involving replica weapons and officer involved shootings
gun <- filter(gun, !grepl("Officer Involved Shooting", incident_characteristics, fixed = TRUE))

gun <- filter(gun, !grepl("Replica", incident_characteristics, fixed = TRUE))

How many gun casualties were there on average from 2014-2017?

totalCasualties <- gun %>% 
      group_by(year = format(date, "%Y")) %>% 
      summarize(casualties = sum(n_killed + n_injured), 
                fatalities = sum(n_killed), 
                injuries = sum(n_injured))

colMeans(totalCasualties[,2:4])

## casualties fatalities   injuries 
##   38528.00   12287.25   26240.75

Over 38,000 people per year were either killed (12,000+) or injured (26,000+) by a firearm from 2014-2017. That’s a little over 100 people each day, and does not include victims of suicide.

Most Dangerous States

How many gun casualties were there in each state for 2014-2017?

cols <- brewer.pal(3, "Dark2")
gun %>% group_by(state) %>% 
      summarize(n_casualty = sum(n_killed + n_injured)) %>% 
      ggplot(aes(x = reorder(state, -n_casualty), y = n_casualty)) + 
      geom_col(fill = cols[1]) + 
      theme(axis.text.x = element_text(angle = 90, vjust = 0.5, 
                                       hjust = 1, size = 14)) + 
      theme(axis.title.y = element_text(size = 16)) + 
      theme(plot.title = element_text(size = 16)) + 
      labs(x = NULL, y = "Casualties (Injuries + Fatalities)", 
           title = "Number of Gun Casualties in the US by State for 2014-2017")

The top 5 states with the highest casualty rates are Illinois, California, Texas, Florida, and Ohio. However, these are also highly populated states. What we really want to know is the per capita rate, since population is an important factor. I got census data for each state from 2010-2017 from the U.S. Census Bureau, and then calculated the per capita rates (per hundred thousand people) for each year:

# READ IN US POPULATION DATA BY STATE
colNames <- c("id1", "id2", "state", "census2010", "cenbase2010", "2010", "2011", "2012", "2013", "2014", "2015", "2016", "2017")

pop <- read_csv("data/US\ Population\ by\ State/PEP_2017_PEPANNRES_with_ann.csv", skip = 2, col_names = colNames)

# Only keep the columns and rows we want
pop <- select(pop, id1, state, "2014", "2015", "2016", "2017") %>% 
      slice(6:n())

# Casualties by state and year; I removed Washington DC from the state rankings, since it's technically a city, not a state
casByState <- group_by(gun, state, year = format(date, "%Y")) %>% 
      filter(state != "District of Columbia") %>% 
      summarize(n.casualty = sum(n_killed + n_injured), 
                n.injured = sum(n_injured), 
                n.killed = sum(n_killed))

# Add the population and the per capita rate (per 100K people)

# First, generate the population vector:
popList <- numeric()
for (i in 1:nrow(casByState)) {
      popList <- c(popList, as.numeric(pop[pop$state == casByState$state[i], casByState$year[i]]))
}

casByState <- ungroup(casByState) %>% 
      mutate(population = popList, 
             cas.per.capita100K = n.casualty/population*100000,
             k.per.capita100K = n.killed/population*100000, 
             i.per.capita100K = n.injured/population*100000)

Now that we have the per-state data for each year, let’s take a look at it:

## # A tibble: 200 x 9
##    state  year  n.casualty n.injured n.killed population cas.per.capita10…
##    <chr>  <chr>      <int>     <int>    <int>      <dbl>             <dbl>
##  1 Alaba… 2014         857       553      304   4840037.             17.7 
##  2 Alaba… 2015         899       541      358   4850858.             18.5 
##  3 Alaba… 2016        1162       724      438   4860545.             23.9 
##  4 Alaba… 2017        1322       824      498   4874747.             27.1 
##  5 Alaska 2014          62        46       16    736759.              8.42
##  6 Alaska 2015         129        78       51    737979.             17.5 
##  7 Alaska 2016         145        89       56    741522.             19.6 
##  8 Alaska 2017         108        64       44    739795.             14.6 
##  9 Arizo… 2014         355       182      173   6706435.              5.29
## 10 Arizo… 2015         326       162      164   6802262.              4.79
## # ... with 190 more rows, and 2 more variables: k.per.capita100K <dbl>,
## #   i.per.capita100K <dbl>

To make plotting a little simpler, let’s take the mean of each per capita rate for each state across each year. So, for each state there will be four entries: 2014, 2015, 2016, and 2017.

meanCBS <- group_by(casByState, state) %>% 
      summarize(meanCasRate = mean(cas.per.capita100K), 
                meanKRate = mean(k.per.capita100K), 
                meanIRate = mean(i.per.capita100K))

Now let’s plot it and see:

ggplot(data = meanCBS, 
      aes(x = reorder(state, -meanCasRate), y = meanCasRate)) + 
      geom_col(fill = cols[2]) + 
      theme(axis.text.x  = element_text(angle = 90, vjust = 0.5, 
                                        hjust = 1, size = 14)) + 
      theme(axis.title.y = element_text(size = 16)) + 
      theme(plot.title = element_text(size = 16)) + 
      labs(x = NULL, y = "Mean Casualty Rate per 100,000 People", 
           title = "Mean Casualty Rate per 100,000 People by State (2014-2017)") +
      geom_hline(yintercept = median(meanCBS$meanCasRate), lwd = 1) +
      annotate("text", label = "Median Casualty Rate", x = 40, y = 12.5, size = 6)

The black line represents the median value for all 50 states. The rankings have changed significantly. The top 5 states with the highest per capita casualty rate are Louisiana, Illinois, Delaware, Mississippi, and Alabama. The only state that remained in the top 5 after adjusting for population was Illinois. Clearly, it is critical to consider population when talking about gun violence rates across the country (which does not always happen).

Most Dangerous Cities

Let’s now look at the cities with the highest rates of per capita gun violence. I again used data from the U.S. Census Bureau to find the population of cities in the United States. The dataset I used contained data from 2010-2016 for all cities greater than 50,000 residents. I had to do some manipulation of the city names in order to make them match the cities in the original dataset. Also, since there are no population data for 2017, I just used the population values from 2016 and assumed there wouldn’t be any significant changes.

# READ IN US POPULATION DATA BY CITY
colNames <- c("id1", "id2", "country", "tid1", "tid2", "rank", "geography1", "city", "census2010", "cenbase2010", "2010", "2011", "2012", "2013", "2014", "2015", "2016")

citypop <- read_csv("data/US\ Population\ by\ City/PEP_2016_PEPANNRSIP.US12A_with_ann.csv", skip = 2, col_names = colNames)

# Only keep the columns and rows we want (drop all the "government" and "county" entries)
citypop <- select(citypop, tid2, city, "2014", "2015", "2016") %>%
      filter(!grepl("[Cc]ounty|government", city))

# Reformatting city names
cityState <- sub("^(.+) (city( \\(balance\\))?|municipality|village|town), (.+)", "\\1, \\4", citypop$city) %>% 
      strsplit(", ")

# Split up the city column into city and state columns
cityState <- as.data.frame(matrix(unlist(cityState), nrow = length(cityState), 
                                  byrow = TRUE), stringsAsFactors = FALSE)
names(cityState) <- c("city", "state")

# Ventura, CA has a nonstandard entry; fix it
cityState$city[grep("\\(", cityState$city)] <- "Ventura"

# In the original gun violence data, the cities listed for New York are split into the five boroughs. So, for example, Brooklyn and The Bronx are separate entries, when they need to be considered "New York" for our purposes. Let's fix that manually.
ny <- filter(gun, state == "New York")
nycInd <- grep("Manhattan|Queens|Bronx|Staten Island|Brooklyn|New York City", ny$city_or_county)

nyc <- ny[nycInd,] %>% 
      filter(city_or_county != "Queensbury")

# Now go through the gun dataframe and change all relevant city_or_state entries to "New York"
for(i in 1:nrow(nyc)) {
      incident <- nyc$incident_id[i]
      index <- which(gun$incident_id == incident)
      gun[index,"city_or_county"] <- "New York"
}

# Integrate the new columns back to the data frame and rearrange
# Also, since we are missing data for 2017, just reuse the 2016 column
citypop <- mutate(citypop, 
                  city = cityState$city, 
                  state = cityState$state, "2017" = citypop$`2016`) %>% 
      select(tid2, city, state, "2014", "2015", "2016", "2017")

Get the data by city for each year. Note: some years will be missing because those cities did not have any reported gun crimes in those years. Although this will affect the mean values for cities without any reported gun crimes in some years, we are only going to look at the top 50 cities. For these cities to be in the top 50, it stands to reason that there will be crimes from all four years.

casByCity <- gun %>% 
      group_by(city_or_county, state, year = format(date, "%Y")) %>%
      summarize(n.casualty = sum(n_killed + n_injured), 
                n.killed = sum(n_killed), 
                n.injured = sum(n_injured))
casByCity

## # A tibble: 28,663 x 6
## # Groups:   city_or_county, state [?]
##    city_or_county  state          year  n.casualty n.killed n.injured
##    <chr>           <chr>          <chr>      <int>    <int>     <int>
##  1 Abbeville       Louisiana      2014           4        1         3
##  2 Abbeville       Louisiana      2015           5        0         5
##  3 Abbeville       Louisiana      2016           4        1         3
##  4 Abbeville       Louisiana      2017           3        1         2
##  5 Abbeville       South Carolina 2014           1        0         1
##  6 Abbeville       South Carolina 2016           1        0         1
##  7 Abbeville       South Carolina 2017           3        1         2
##  8 Abbotsford      Wisconsin      2015           0        0         0
##  9 Abbott Township Pennsylvania   2014           1        0         1
## 10 Abbottstown     Pennsylvania   2017           0        0         0
## # ... with 28,653 more rows

Now, we need a per capita estimate for each city. First we need to generate a vector of populations: figure out if the city in our data frame is actually in the population dataset, and if it is, grab the population for that particular year. Again, the 2017 population values are the same as the 2016 ones.

# Get a per capita (100K) estimate for each city
# First, generate the population vector:
popList <- numeric()
for (i in 1:nrow(casByCity)) {
      theCity <- casByCity$city_or_county[i]
      theState <- casByCity$state[i]
      theYear <- casByCity$year[i]
      ind <- which(citypop$city == theCity & citypop$state == theState)
      
      if (length(ind) > 0 && 
          theCity == citypop$city[ind] && 
          theState == citypop$state[ind]) {
            popList <- c(popList, 
                         as.numeric(citypop[citypop$state == theState & 
                                                  citypop$city == theCity, theYear]))
      } else {
            popList <- c(popList, NA)
      }
}

We’re also going to need the two letter abbreviations for each state for labels later, so let’s grab those and put them in the data frame as well. Don’t forget Washington, D.C.!:

# Get the short name for each state and attach it to the tibble
data(state)
abbs <- data.frame("abb" = state.abb, "state" = state.name, 
                   stringsAsFactors = FALSE)
abbs <- rbind(abbs, c("DC", "District of Columbia"), stringsAsFactors = FALSE)
abbList <- unlist(lapply(casByCity$state, 
                         function(x) abbs[which(abbs$state == x),]$abb))

Add the population and abbrevation columns, and calculate the mean values for each each state:

meanCBC <- ungroup(casByCity) %>% 
      mutate(abb = abbList, 
             population = popList, 
             cas.per.capita100K = n.casualty/population*100000, 
             k.per.capita100K = n.killed/population*100000, 
             i.per.capita100K = n.injured/population*100000)

# Remove NA values and then compute the mean on the remaining values; sort by mean
meanCBC <- filter(meanCBC, !is.na(population)) %>%
      group_by(city_or_county, state, abb) %>% 
      summarize(meanCasRate = mean(cas.per.capita100K), 
                meanKRate = mean(k.per.capita100K), 
                meanIRate = mean(i.per.capita100K)) %>% 
      arrange(desc(meanCasRate))

meanCBC

## # A tibble: 736 x 6
## # Groups:   city_or_county, state [736]
##    city_or_county state       abb   meanCasRate meanKRate meanIRate
##    <chr>          <chr>       <chr>       <dbl>     <dbl>     <dbl>
##  1 Wilmington     Delaware    DE           219.      37.4     182. 
##  2 Trenton        New Jersey  NJ           157.      22.8     134. 
##  3 New Orleans    Louisiana   LA           154.      40.6     113. 
##  4 Gary           Indiana     IN           146.      49.0      97.1
##  5 Baltimore      Maryland    MD           130.      39.5      90.9
##  6 Flint          Michigan    MI           129.      39.3      89.2
##  7 Savannah       Georgia     GA           109.      24.6      84.0
##  8 Chicago        Illinois    IL           108.      18.0      90.5
##  9 Birmingham     Alabama     AL           106.      38.2      68.1
## 10 Jackson        Mississippi MS           104.      31.9      72.3
## # ... with 726 more rows

Let’s plot the top 50:

top50 <- meanCBC[1:50,]

top50 <- arrange(top50, meanCasRate)
top50 <- mutate(top50, meanIRate = -meanIRate, 
                city = paste(city_or_county, sep = ", ", abb))
top50melt2 <- melt(top50[,c("city", "meanKRate", "meanIRate", "meanCasRate")],
                   id.vars = c("city", "meanCasRate"))

ggplot(data = top50melt2) + 
      geom_col(aes(x = reorder(city, meanCasRate), y = value, 
                   fill = variable)) +
      scale_y_continuous(limits = c(-200,150), 
                         breaks = c(200, 150, 100, 50, 0, -50, -100, -150), 
                         labels = c("200", "150", "100", "50", "0", "50", "100", "150")) + 
      theme(axis.text.x = element_text(hjust = .95, size = 12)) + 
      theme(axis.text.y = element_text(vjust = 0.5, size = 10)) +
      theme(axis.title.y = element_text(size = 14)) + 
      theme(plot.title = element_text(size = 16)) + 
      theme(legend.title = element_blank()) + 
      theme(legend.position = "bottom") + 
      theme(legend.text = element_text(size = 14)) + 
      scale_fill_discrete(labels = c("Fatalities", "Injuries"), 
                          guide = guide_legend(reverse = TRUE)) +
      labs(x = NULL, y = NULL) +
      labs(title = "Mean Injury/Fatality Rate per 100,000 People by City (2014-2017 where available)") + 
      coord_flip()

According to the data, the top 5 cities with the most gun violence are:

Wilmington, DE
Trenton, NJ
New Orleans, LA
Gary, IN
Baltimore, MD

Chicago comes in at 8th, even though Illinois was the second highest per capita state.

Effect of Gun Legislation on Gun Casualty Rates

One topic that is furiously being debated across the nation is whether or not to pass more/stricter gun laws on top of the ones that already exist. What is the effect of gun laws on casualty rate? Let’s analyze this both at the state level and city level.

By State

The data about state laws were downloaded from the State Firearm Law Database. I had to reformat the files by re-saving into Excel because they weren’t actually saved as a CSV, so it wouldn’t have been easy to read it into R. You can download the reformatted CSV files from Github.

lawsdf <- read_csv("data/Laws/48_states_1991_data.csv")
lawsdf <- rbind(lawsdf, read_csv("data/Laws/Hawaii_1991_data.csv"))
lawsdf <- rbind(lawsdf, read_csv("data/Laws/Alaska_1991_data.csv"))

# Rename one of the columns to something easy to reference
lawsdf <- rename(lawsdf, law = `Law? 0/1`)
laws.by.year <- group_by(lawsdf, State, Year) %>% summarize(laws = sum(law))

# Grab the years we are interested in, 2014-2017, and take the average number of laws for the period
mean.laws.by.year <- filter(laws.by.year, Year >= 2014) %>% summarize(meanlaw = mean(laws))

For a first pass, all we are looking at is the absolute number of laws on the books in each state. For simplicity, we assume that each law is as effective as the next (although that is probably not completely true, and each law’s effectiveness most likely varies across each state/municipality and has a number of confounding variables). Let’s look at the distribution of the number of laws on the books for all 50 states:

# Histogram of number of laws
ggplot(data = mean.laws.by.year, aes(x = meanlaw)) + 
      geom_histogram(binwidth = 10, col = "black") + 
      labs(x = "Mean Number of Laws, 2014-2017", y = "Frequency")

Half the states have around 15 or fewer laws on the books, with the average number of laws being around 25.

Is there a correlation between the number of laws in each state and their mean per capita casualty rate?

meanCBS <- arrange(meanCBS, state)
mean.laws.by.year <- arrange(mean.laws.by.year, State)
meanCBS <- mutate(meanCBS, numlaws = mean.laws.by.year$meanlaw)
qplot(x = numlaws, y = meanCasRate, data = meanCBS, xlab = "Number of Laws", 
      ylab = "Mean Casualty Rate Per 100,000 People")

It doesn’t appear so. Some states have few laws on the books and have a lower casualty rate, whereas others achieve the low casualty rate with a higher than average number of laws. There are also a lot of states with a high casualty rate and few laws. It could be argued that the states with more laws would actually be more violent if the laws didn’t exist, but it is difficult to make that conclusion.

Let’s look at how each state ranks in terms of casualty rate and the number of gun laws:

g <- ggplot(data = meanCBS, aes(x = reorder(state, -meanCasRate), 
                                y = meanCasRate, fill = numlaws))
p2 <- g + geom_col(color = "gray") + 
      theme(axis.text.x  = element_text(angle = 90, vjust = 0.5, 
                                        hjust = 1, size = 14)) + 
      theme(axis.title.y = element_text(size = 16)) + 
      theme(plot.title = element_text(size = 16)) + 
      labs(x = NULL, y = "Mean Casualty Rate per 100,000 People", 
           title = "Mean Casualty Rate per 100,000 People by State (2014-2017)")

pcaswithlaw <- p2 + 
      scale_fill_gradient2(low = muted("red"), 
                           mid = "white", 
                           high = muted("blue"), 
                           midpoint = max(meanCBS$numlaws)/2, 
                           name = "Avg. Number of Gun Laws", 
                           guide = guide_colorbar(direction = "horizontal", 
                                                  title.position = "top", 
                                                  barwidth = 10, 
                                                  title.hjust = .5)) + 
      theme(legend.background = element_rect(fill="gray90", 
                                             size=.3, 
                                             color = "black", 
                                             linetype = 1)) +
      theme(legend.justification=c(1,1), legend.position=c(1,1)) +
      geom_hline(yintercept = median(meanCBS$meanCasRate), lwd = 1) +
      annotate("text", label = "Median Casualty Rate", x = 40, y = 12.5, size = 6)

pcaswithlaw

The top 7 states with the highest average number of laws from 2014-2017 are California (103), Massachusetts (100), Connecticut (86.5), New York (75), New Jersey (68.8), and Illinois (64). Of these seven, two states are above the median casualty rate for the US (Illinois and Maryland) - the other five rank a little below the median rate. It appears there may be a weak association between number of laws and state casualty rate, but there are probably other factors at play as well, such as population density. There are a number of states below and above the median casualty rate with fewer gun laws, so it is difficult to conclude that more gun laws translate to less gun violence. Instead, high rates of gun violence probably have more to do with other factors, such as gang activity, poverty, wealth inequality, etc., instead of a lack of gun laws.

By City

Let’s look at the data by city and try to figure out if there is a correlation between the number of laws in a particular state and the per capita casualty rates for cities in that state:

# First, remove DC since we don't have data on the number of gun laws there
meanCBClaws <- filter(meanCBC, state != "District of Columbia")
lawlist <- sapply(meanCBClaws$state, function(x) mean.laws.by.year[mean.laws.by.year$State == x,2])

meanCBClaws <- ungroup(meanCBClaws) %>% 
      mutate("numlaws" = round(unlist(lawlist), digits = 1), 
             meanCasRate = round(meanCasRate, digits = 1), 
             meanKRate = round(meanKRate, digits = 1), 
             meanIRate = round(meanIRate, digits = 1))

oldnames <- names(meanCBClaws) # Save the current column names for later

# Do some renaming to make things easier to understand
names(meanCBClaws) <- c("City", "state.long", "State", "Casualty Rate", "Fatality Rate", "Injury Rate", "Number of Laws")

p.allcities <- ggplot(data = meanCBClaws, aes(x = `Number of Laws`, 
                                              y = `Casualty Rate`, 
                                              color = State)) +
      geom_point(aes(text = paste(City, ", ", State, "<br>Number of Laws: ", 
                                  `Number of Laws`, "<br>Casualty Rate: ", 
                                  `Casualty Rate`, "<br>Fatality Rate: ", 
                                  `Fatality Rate`, "<br>Injury Rate: ", 
                                  `Injury Rate`, sep = ""))) + 
      labs(x = "Number of Laws", 
           y = "Casualty Rate (Per 100,000 People)") + 
      theme(legend.position = "none")

ggplotly(p.allcities, tooltip = c("text"))

There are 739 cities displayed here (there are actually a lot more cities throughout the US where gun violence occurred, but the population data were only available for cities over 50,000 people). It is clear that in a given state, there is a wide distribution in the casualty rates. You can hover over each point to see the data it represents.

Unsurprisingly, it is clear that individual states experience different levels of gun violence in different cities. Let’s look at a box plot in order to see the distribution in city casualty rates in each state:

ggplot(data = meanCBClaws, aes(x = state.long, y = `Casualty Rate`, 
                               fill = `Number of Laws`)) +
      geom_boxplot() + 
      labs(x = NULL) + 
      labs (y = "Casualty Rate (Per 100,000 People)") +
      scale_fill_gradient2(midpoint = max(meanCBClaws$`Number of Laws`)/2, name = "Avg. Number of State Gun Laws", guide = guide_colorbar(direction = "horizontal", title.position = "top", barwidth = 10, title.hjust = .5)) + 
      theme(legend.background = element_rect(fill="gray90", size=.3, 
                                             color = "black", linetype = 1)) +
      theme(legend.justification=c(1,1), legend.position=c(1,1)) +
      theme(axis.text.x  = element_text(angle = 90, vjust = 0.5, 
                                        hjust = 1, size = 14)) + 
      theme(axis.title.y = element_text(size = 16)) + 
      theme(plot.title = element_text(size = 16))

It doesn’t appear that there is any trend between the distribution of casualty rates and the number of gun laws in a particular state. Some states with many gun laws (like Maryland) have an extremely small distribution, but one significant outlier (Baltimore). Other states like Mississippi and Louisiana have a large distribution in gun casualty rates across their cities.

Now let’s take another look at the injury/fatality rates for the Top 49 cities with the highest rates of gun casualty, this time colored by number of state laws. These are the same cities as those in the Top 50 rank we did earlier, except we’ve excluded Washington, DC.

names(meanCBClaws) <- oldnames 

# Take the top 49 cities and change the injury rate to negative for easy plotting
top49 <- meanCBClaws[1:49,] %>% 
      arrange(meanCasRate) %>%
      mutate(meanIRate = -meanIRate, 
             city = paste(city_or_county, sep = ", ", abb))

top49melt <- melt(top49[,c("city", "meanCasRate", "meanKRate", "meanIRate", "numlaws")], id.vars = c("city", "meanCasRate", "numlaws"))

kiplot3 <- ggplot(data = top49melt) + 
      geom_col(aes(x = reorder(city, meanCasRate), y = value, 
                   fill = numlaws)) +
      scale_y_continuous(breaks = c(150, 100, 50, 0, -50, -100, -150, -200), labels = c("150", "100", "50", "0", "50", "100", "150", "200")) + 
      theme(axis.text.x  = element_text(hjust = .95, size = 12)) + 
      theme(axis.text.y = element_text(vjust = 0.5, size = 10)) +
      theme(axis.title.y = element_text(size = 14)) + 
      theme(plot.title = element_text(size = 16)) + 
      labs(x = NULL, y = NULL) +
      labs(title = "Mean Injury/Fatality Rate per 100,000 People by City (2014-2017 
           where available)") + 
      geom_hline(yintercept = 0, lwd = 2, color = "black") +
      coord_flip()

citycaswithlaw <- kiplot3 + 
      scale_fill_gradient2(midpoint = max(top49$numlaws)/2, 
                           name = "Avg. Number of State Gun Laws", 
                           guide = guide_colorbar(direction = "horizontal",
                                                  title.position = "top", 
                                                  barwidth = 10, 
                                                  title.hjust = .5)) + 
      theme(legend.background = element_rect(fill="gray90", 
                                             size=.3, color = "black", 
                                             linetype = 1)) +
      theme(legend.justification=c(.125,.125), legend.position=c(.125,.125)) +
      annotate("label", x = 25, y = -100, label = "Injuries", size = 6) + 
      annotate("label", x = 25, y = 45, label = "Fatalities", size = 6)

citycaswithlaw

Three of the top ten most dangerous cities (Trenton, Baltimore, and Chicago) are among the states with the most gun laws (New Jersey, Maryland, and Illinois, respectively). How can this be?

It is important to note that individual city laws are not considered here. For example, New York City has an average casualty rate of 8.5 per 100,000 people for 2014-2017. It is one of the MOST POPULOUS cities in the US, yet has a below-average rate of gun casualties. It also has some of the strictest laws. For comparison, Chicago is another densely populated US City and has an average casualty rate of 110 per 100,000 people. That is almost 13 times as dangerous as NYC! Perhaps the cause for this discrepancy is the difference in gun laws for each city. Or, maybe it’s a difference in penalties for each violation, availability of black market firearms, effectiveness of the judicial system in punishing offenders, or a mixture of all of these factors and more.

It is plausible that the laws that exist in New York City are actually effective in curbing gun violence (in order to test this, an analysis should be done determining the relationship between gun violence and laws passed over time). However, that does not necessarily mean that these measures would be equally effective in another city, although it would certainly be worth further investigation.