According to Pew Research Center (2025), in 2023, nearly 47,000 people in the United States lost their lives to gun-related injuries. The majority of these deaths (58%) were suicides, while 38% were homicides, and the remaining 3% fell into other categories. Firearms were involved in about 8 out of every 10 murders that year, and in nearly half of all suicides. Although gun-related homicides have declined in recent years, the number of gun suicides has increased significantly. The rate of gun fatalities can also widely vary from state to state - as shown on the image below.
The aim of the project is to analyze the association between state firearm laws and firearm ownership and mortality. Fixed effects panel regression model is used to assess the impact of firearm legislation on firearm deaths.
Following R packages were used to perform the analysis:
library(haven)
library(readr)
library(dplyr)
library(tidyr)
library(plm)
library(lmtest)
library(ggplot2)
Due to limitations in data availability, it is necessary to integrate multiple datasets to build the model. These include:
The State Firearm Database records the presence or absence of 134 firearm safety laws, grouped into 14 categories. It offers a structured way to compare firearm legislation across states and will be used to define the key independent variables in the model.
The CDC’s WISQARS Fatal Injury Reports provide detailed data on firearm-related deaths in the U.S. from 2001 to 2020. This dataset will serve as the model’s dependent variable.
In addition, several control variables will be incorporated:
Unemployment rates (U.S. Bureau of Labor Statistics)
Poverty estimates (U.S. Census Bureau)
Non-firearm violence mortality rates (CDC WISQARS)
Per capita alcohol consumption by beverage type (SHADAC)
firearm_laws <- read_dta("firearm_laws.dta")
mortality_intent <- read_csv("mortality_intent.csv")
## Rows: 4803 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): State, Intent, Deaths, Crude Rate, Age-Adjusted Rate, Years of Pote...
## dbl (1): Year
## num (1): Population
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
mortality <- read_csv("mortality.csv")
## Rows: 1043 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): State
## dbl (3): Year, Crude Rate, Age-Adjusted Rate
## num (3): Deaths, Population, Years of Potential Life Lost
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
unemployment <- read_csv("unemployment.csv")
## Rows: 29892 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): FIPS Code, State/Area, Month
## dbl (4): Year, Percent (%) of State/Area's Population, Percent (%) of Labor ...
## num (4): Total Civilian Non-Institutional Population in State/Area, Total Ci...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
poverty <- read_csv("poverty.csv")
## New names:
## Rows: 1040 Columns: 8
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (4): ID, Name, 90% Confidence Interval...6, 90% Confidence Interval...8 dbl
## (2): Year, Percent in Poverty num (2): Poverty Universe, Number in Poverty
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `90% Confidence Interval` -> `90% Confidence Interval...6`
## • `90% Confidence Interval` -> `90% Confidence Interval...8`
violence <- read_csv("violence.csv")
## Rows: 1043 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): State
## dbl (3): Year, Crude Rate, Age-Adjusted Rate
## num (3): Deaths, Population, Years of Potential Life Lost
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
alcohol <- read_csv("alcohol.csv", skip = 4)
## Rows: 4160 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Fips, Location, Beverage Type, Data Type
## dbl (2): TimeFrame, Data
## lgl (1): MOE
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
In order to be able to combine the data, several transformations needed to be performed. Let’s start with the firearm laws data.
First, the dataset is filtered to include only data from year 2000 onwards.
head(firearm_laws)
## # A tibble: 6 Ă— 137
## STATE YEAR FELONY INVCOMMITMENT INVOUTPATIENT DANGER DRUGMISDEMEANOR
## <chr> <dbl> <dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl+l> <dbl+lbl>
## 1 Alabama 1991 0 [Law p… 0 [Law provi… 0 [Law provi… 0 [Law… 0 [Law provisi…
## 2 Alaska 1991 1 [Law p… 0 [Law provi… 0 [Law provi… 0 [Law… 0 [Law provisi…
## 3 Arizona 1991 0 [Law p… 1 [Law provi… 1 [Law provi… 1 [Law… 0 [Law provisi…
## 4 Arkansas 1991 1 [Law p… 1 [Law provi… 0 [Law provi… 1 [Law… 0 [Law provisi…
## 5 California 1991 1 [Law p… 1 [Law provi… 0 [Law provi… 1 [Law… 0 [Law provisi…
## 6 Colorado 1991 0 [Law p… 0 [Law provi… 0 [Law provi… 0 [Law… 0 [Law provisi…
## # ℹ 130 more variables: ALCTREATMENT <dbl+lbl>, ALCOHOLISM <dbl+lbl>,
## # RELINQUISHMENT <dbl+lbl>, VIOLENT <dbl+lbl>, VIOLENTH <dbl+lbl>,
## # VIOLENTPARTIAL <dbl+lbl>, DEALER <dbl+lbl>, DEALERH <dbl>,
## # RECORDSALL <dbl+lbl>, RECORDSALLH <dbl+lbl>, RECORDSDEALER <dbl+lbl>,
## # RECORDSDEALERH <dbl+lbl>, REPORTALL <dbl+lbl>, REPORTALLH <dbl+lbl>,
## # REPORTDEALER <dbl+lbl>, REPORTDEALERH <dbl+lbl>, PURGE <dbl+lbl>,
## # RESIDENTIAL <dbl+lbl>, THEFT <dbl+lbl>, SECURITY <dbl+lbl>, …
firearm_laws <- firearm_laws[firearm_laws$YEAR >= 2000, ]
Next, the mortality data is analyzed and cleaned.
mortality %>%
group_by(Year) %>%
summarise(count = n()) %>%
arrange(Year) %>%
ungroup() %>%
print(n = Inf)
## # A tibble: 21 Ă— 2
## Year count
## <dbl> <int>
## 1 2001 51
## 2 2002 51
## 3 2003 51
## 4 2004 51
## 5 2005 51
## 6 2006 51
## 7 2007 51
## 8 2008 51
## 9 2009 51
## 10 2010 51
## 11 2011 51
## 12 2012 51
## 13 2013 51
## 14 2014 51
## 15 2015 51
## 16 2016 51
## 17 2017 51
## 18 2018 51
## 19 2019 51
## 20 2020 51
## 21 NA 23
As the number of states is over 51, data from the District of Columbia needs to be removed. Otherwise, it would not be possible to match it across the datasets. The last rows are also removed, as they contain multiple errors and are not needed.
unique(mortality$State)
## [1] "Alabama"
## [2] "Alaska"
## [3] "Arizona"
## [4] "Arkansas"
## [5] "California"
## [6] "Colorado"
## [7] "Connecticut"
## [8] "Delaware"
## [9] "District of Columbia"
## [10] "Florida"
## [11] "Georgia"
## [12] "Hawaii"
## [13] "Idaho"
## [14] "Illinois"
## [15] "Indiana"
## [16] "Iowa"
## [17] "Kansas"
## [18] "Kentucky"
## [19] "Louisiana"
## [20] "Maine"
## [21] "Maryland"
## [22] "Massachusetts"
## [23] "Michigan"
## [24] "Minnesota"
## [25] "Mississippi"
## [26] "Missouri"
## [27] "Montana"
## [28] "Nebraska"
## [29] "Nevada"
## [30] "New Hampshire"
## [31] "New Jersey"
## [32] "New Mexico"
## [33] "New York"
## [34] "North Carolina"
## [35] "North Dakota"
## [36] "Ohio"
## [37] "Oklahoma"
## [38] "Oregon"
## [39] "Pennsylvania"
## [40] "Rhode Island"
## [41] "South Carolina"
## [42] "South Dakota"
## [43] "Tennessee"
## [44] "Texas"
## [45] "Utah"
## [46] "Vermont"
## [47] "Virginia"
## [48] "Washington"
## [49] "West Virginia"
## [50] "Wisconsin"
## [51] "Wyoming"
## [52] "Total"
## [53] NA
## [54] "Injury Outcome: Fatal"
## [55] "Injury Type: All Injury"
## [56] "Data Years: 2001 to 2020"
## [57] "Geography: United States"
## [58] "Intent: All Intents"
## [59] "Mechanism: Firearm"
## [60] "Age: All Ages"
## [61] "Sex: All Sexes"
## [62] "Race: All Races"
## [63] "Ethnicity: All Ethnicities"
## [64] "Metro / Non-Metro Indicator: None Selected"
## [65] "YPLL Age: 65"
## [66] "Year and Race Options: 2001 - 2020 by Bridged Race"
## [67] "Notation: indicates suppressed value (based on <20 unweighted count, <1,200 weighted count, or coefficient of variation of the estimate >30%);"
## [68] "--* indicates secondary suppression."
## [69] "Rows showing totals and sub-totals will not be available when only one row in the results table is suppressed."
## [70] "Reports for All Ages include those of unknown age."
## [71] "Data sources:"
## [72] "National Center for Health Statistics-CDC annual mortality data files for WISQARS Fatal data."
## [73] "National Electronic Injury Surveillance System-All Injury Program for WISQARS Nonfatal data , an estimated number of hospital visits for injury care that start in an emergency department based on a U.S. nationally representative probability sample of hospitals."
## [74] "Produced by: National Center for Injury Prevention and Control, CDC."
mortality <- mortality[mortality$State != "District of Columbia", ]
mortality <- mortality[1:1000,]
Next, let’s move on to unemployment data. Data are filtered for the years 2001 through 2020. Aggregated unemployment and labor force counts are calculated by state and year.
unemployment <- unemployment %>%
filter(Year >= 2001 & Year <= 2020) %>%
group_by(Year, `State/Area`) %>%
summarise(
unemployed = sum(`Total Unemployment in State/Area`, na.rm = TRUE),
labour_force = sum(`Total Civilian Labor Force in State/Area`, na.rm = TRUE)
)
## `summarise()` has grouped output by 'Year'. You can override using the
## `.groups` argument.
The number of entries per year is reviewed, and unique state names are listed. Non-state regions are removed to maintain consistency with other datasets.
unemployment %>%
group_by(Year) %>%
summarise(count = n()) %>%
arrange(Year) %>%
ungroup() %>%
print(n = Inf)
## # A tibble: 20 Ă— 2
## Year count
## <dbl> <int>
## 1 2001 53
## 2 2002 53
## 3 2003 53
## 4 2004 53
## 5 2005 53
## 6 2006 53
## 7 2007 53
## 8 2008 53
## 9 2009 53
## 10 2010 53
## 11 2011 53
## 12 2012 53
## 13 2013 53
## 14 2014 53
## 15 2015 53
## 16 2016 53
## 17 2017 53
## 18 2018 53
## 19 2019 53
## 20 2020 53
unique(unemployment$`State/Area`)
## [1] "Alabama" "Alaska" "Arizona"
## [4] "Arkansas" "California" "Colorado"
## [7] "Connecticut" "Delaware" "District of Columbia"
## [10] "Florida" "Georgia" "Hawaii"
## [13] "Idaho" "Illinois" "Indiana"
## [16] "Iowa" "Kansas" "Kentucky"
## [19] "Los Angeles County" "Louisiana" "Maine"
## [22] "Maryland" "Massachusetts" "Michigan"
## [25] "Minnesota" "Mississippi" "Missouri"
## [28] "Montana" "Nebraska" "Nevada"
## [31] "New Hampshire" "New Jersey" "New Mexico"
## [34] "New York" "New York city" "North Carolina"
## [37] "North Dakota" "Ohio" "Oklahoma"
## [40] "Oregon" "Pennsylvania" "Rhode Island"
## [43] "South Carolina" "South Dakota" "Tennessee"
## [46] "Texas" "Utah" "Vermont"
## [49] "Virginia" "Washington" "West Virginia"
## [52] "Wisconsin" "Wyoming"
unemployment <- unemployment[!unemployment$`State/Area` %in% c("District of Columbia", "Los Angeles County", "New York city"), ]
Next, let’s review the poverty dataset. Entries are summarized by year, and the unique region names are listed. The District of Columbia and national-level data are excluded from analysis.
poverty %>%
group_by(Year) %>%
summarise(count = n()) %>%
arrange(count) %>%
ungroup() %>%
print(n = Inf)
## # A tibble: 20 Ă— 2
## Year count
## <dbl> <int>
## 1 2001 52
## 2 2002 52
## 3 2003 52
## 4 2004 52
## 5 2005 52
## 6 2006 52
## 7 2007 52
## 8 2008 52
## 9 2009 52
## 10 2010 52
## 11 2011 52
## 12 2012 52
## 13 2013 52
## 14 2014 52
## 15 2015 52
## 16 2016 52
## 17 2017 52
## 18 2018 52
## 19 2019 52
## 20 2020 52
unique(poverty$Name)
## [1] "United States" "Alabama" "Alaska"
## [4] "Arizona" "Arkansas" "California"
## [7] "Colorado" "Connecticut" "Delaware"
## [10] "District of Columbia" "Florida" "Georgia"
## [13] "Hawaii" "Idaho" "Illinois"
## [16] "Indiana" "Iowa" "Kansas"
## [19] "Kentucky" "Louisiana" "Maine"
## [22] "Maryland" "Massachusetts" "Michigan"
## [25] "Minnesota" "Mississippi" "Missouri"
## [28] "Montana" "Nebraska" "Nevada"
## [31] "New Hampshire" "New Jersey" "New Mexico"
## [34] "New York" "North Carolina" "North Dakota"
## [37] "Ohio" "Oklahoma" "Oregon"
## [40] "Pennsylvania" "Rhode Island" "South Carolina"
## [43] "South Dakota" "Tennessee" "Texas"
## [46] "Utah" "Vermont" "Virginia"
## [49] "Washington" "West Virginia" "Wisconsin"
## [52] "Wyoming"
poverty <- poverty[!poverty$Name %in% c("District of Columbia", "United States"),]
Lastly, similar transformations are performed on the violence and alcohol datasets.
head(violence)
## # A tibble: 6 Ă— 7
## State Year Deaths Population `Crude Rate` `Age-Adjusted Rate`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Alabama 2020 1456 4921532 29.6 30.2
## 2 Alabama 2019 1401 4907965 28.6 29.2
## 3 Alabama 2018 1398 4891628 28.6 28.7
## 4 Alabama 2017 1449 4877989 29.7 29.8
## 5 Alabama 2016 1345 4866824 27.6 27.7
## 6 Alabama 2015 1228 4854803 25.3 25.2
## # ℹ 1 more variable: `Years of Potential Life Lost` <dbl>
violence <- violence[violence$State != "District of Columbia", ]
violence <- violence[1:1000,]
head(alcohol)
## # A tibble: 6 Ă— 7
## Fips Location `Beverage Type` TimeFrame `Data Type` Data MOE
## <chr> <chr> <chr> <dbl> <chr> <dbl> <lgl>
## 1 01 Alabama Spirits 2001 Drinks per capita 110. NA
## 2 01 Alabama Wine 2001 Drinks per capita 34.0 NA
## 3 01 Alabama Beer 2001 Drinks per capita 275. NA
## 4 01 Alabama All beverages 2001 Drinks per capita 419. NA
## 5 02 Alaska Spirits 2001 Drinks per capita 200. NA
## 6 02 Alaska Wine 2001 Drinks per capita 75.8 NA
alcohol <- alcohol %>%
filter(`Beverage Type` == "All beverages") %>%
group_by(TimeFrame, Location) %>%
summarise(
per_capita_consumption = sum(Data, na.rm = TRUE),
)
## `summarise()` has grouped output by 'TimeFrame'. You can override using the
## `.groups` argument.
alcohol <- alcohol[!alcohol$Location %in% c("Dist. of Columbia", "United States"),]
A lagged year variable (Year2) is created by adding one to the original year. This allows the effects of firearm laws to be evaluated in the year following their implementation, accounting for delayed policy impact.
firearm_laws$Year2 <- firearm_laws$YEAR+1
In the next step, the data is combined into a single dataset.
combined <- firearm_laws %>%
full_join(mortality, by = c("STATE" = "State", "Year2" = "Year")) %>%
full_join(unemployment, by = c("STATE" = "State/Area", "Year2" = "Year")) %>%
full_join(poverty, by = c("STATE" = "Name", "Year2" = "Year")) %>%
full_join(violence, by = c("STATE" = "State", "Year2" = "Year")) %>%
full_join(alcohol, by = c("STATE" = "Location", "Year2" = "TimeFrame"))
The unemployment rate is calculated as a percentage of the labor force. The non-firearm crude death rate is derived by subtracting firearm-related deaths from total deaths and normalizing per 100,000 population.
combined$unemployment_rate <- combined$unemployed / combined$labour_force * 100
combined$nonfirearm_crude_rate <- (combined$Deaths.y-combined$Deaths.x)/combined$Population.y * 100000
As a last step, only the necessary variables are selected for the final dataset.
final <- combined[, c("STATE", "Year2", "Crude Rate.x", "Percent in Poverty", "per_capita_consumption", "unemployment_rate", "nonfirearm_crude_rate", "PERMIT", "MAYISSUE", "DANGER", "STALKING")]
new_names <- c("state", "year", "firearm_mortality", "poverty", "alcohol_per_capita", "unemployment", "nonfirearm_violence", "permit", "mayissue", "danger", "stalking")
colnames(final) <- new_names
Next, let’s look at the laws that will be analyzed within the model.
Each plot visualizes a different law type:
Permit: States requiring permits to purchase or carry firearms.
May Issue: States where authorities have discretion in issuing concealed carry permits.
Danger: Laws allowing firearm removal from individuals deemed dangerous.
Stalking: Laws restricting firearm access for individuals convicted of stalking.
Permit laws stayed mostly unchanged from 1991 to 2012, with only 5 states requiring them. Around 2013–2014, the number rose slightly to 7 states and stayed at that level through 2019.
Many states gradually moved away from may-issue policies, likely adopting more permissive systems. The number dropped from 21 states in 1991 to 9 by 2014, then stayed the same.
Danger-based firearm restrictions became more common over the past two decades. The number of states rose from 19 in the early 2000s to 28 by 2015, then stayed steady.
State recognition of stalking laws grew again after a brief drop in the early 2010s.
Two models are estimated using the plm() function: a fixed effects model and a random effects model.
fixed <- plm(firearm_mortality~poverty+alcohol_per_capita+unemployment+nonfirearm_violence+permit+mayissue+danger+stalking, data=final, index=c("state", "year"), model="within")
random <- plm(firearm_mortality~poverty+alcohol_per_capita+unemployment+nonfirearm_violence+permit+mayissue+danger+stalking, data=final, index=c("state", "year"), model="random")
phtest(fixed, random)
##
## Hausman Test
##
## data: firearm_mortality ~ poverty + alcohol_per_capita + unemployment + ...
## chisq = 145.83, df = 8, p-value < 2.2e-16
## alternative hypothesis: one model is inconsistent
A Hausman test is performed using phtest() to compare the fixed and random effects models. It was concluded that the fixed effects model is better compared to random effects.
pols <- plm(firearm_mortality~poverty+alcohol_per_capita+unemployment+nonfirearm_violence+permit+mayissue+danger+stalking, data=final, index=c("state", "year"), model="pooling")
pFtest(fixed, pols)
##
## F test for individual effects
##
## data: firearm_mortality ~ poverty + alcohol_per_capita + unemployment + ...
## F = 81.331, df1 = 49, df2 = 942, p-value < 2.2e-16
## alternative hypothesis: significant effects
Next, an F-test is conducted to assess the significance of individual effects. It is determined that individual (state) effects are significant, supporting the use of fixed effects over pooled OLS.
pbgtest(fixed)
##
## Breusch-Godfrey/Wooldridge test for serial correlation in panel models
##
## data: firearm_mortality ~ poverty + alcohol_per_capita + unemployment + ...
## chisq = 320.36, df = 20, p-value < 2.2e-16
## alternative hypothesis: serial correlation in idiosyncratic errors
bptest(fixed)
##
## studentized Breusch-Pagan test
##
## data: fixed
## BP = 41.043, df = 8, p-value = 2.047e-06
Next, diagnostic tests for the model are performed. Both autocorrelation and heteroskedasticity are detected. To address these issues, robust standard errors are used in the final fixed effects model.
final_model <- coeftest(fixed, vcov.=vcovHC(fixed, method="white1", type="HC0", cluster="group"))
Let’s interpret the results of the final model.
final_model
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## poverty -0.3148337 0.0458261 -6.8702 1.164e-11 ***
## alcohol_per_capita 0.0147738 0.0023911 6.1787 9.609e-10 ***
## unemployment 0.0155412 0.0328746 0.4727 0.6365
## nonfirearm_violence 0.5200669 0.0615244 8.4530 < 2.2e-16 ***
## permit -1.7720965 0.2896643 -6.1178 1.390e-09 ***
## mayissue -1.2566681 0.1883113 -6.6734 4.261e-11 ***
## danger 1.6534894 0.2090313 7.9102 7.175e-15 ***
## stalking -0.4402426 0.3063251 -1.4372 0.1510
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1