DACSS 697 Blog 6

Different Attempts at Text Analysis

Megan Georges
2022-04-26

Research Question

Is there a relationship between the restrictiveness of a state’s firearm regulations/laws for individuals with domestic violence related records and the rate of homicides perpetrated using a firearm?

Background

Since 1996, Federal law has banned the possession of firearms for anyone convicted of a felony or a domestic violence (DV) misdemeanor, or who has a DV-related protective order against them (U.S. Department of Justice, 2020). However, states vary significantly in the number and restrictiveness of their gun laws. Some states do not even explicitly ban the purchase or possession of firearms for DV perpetrators. While federal law takes priority, the explicit statement of firearm restrictions in state laws is often followed by additional provisions specifying what constitutes an intimate partner, what types of offenses are included, how firearms should be relinquished or seized, and what the consequences are for violating provisions (Giffords Law Center, 2021).

Estimates of the exact impact of gun violence vary across studies, but all concur that it is a prevalent issue. Beyond that, years of data indicate the substantial presence of domestic violence related matters in firearm homicides, particularly for female victims. Current research finds that nearly 50% of female homicides are perpetrated by a current or former intimate partner, and over half of those are committed using a firearm (Disarm Domestic Violence, 2022). Giffords Law Center (2021) additionally finds that 4.5 million U.S. women have reported being threatened by an intimate partner using a firearm. It is also important to note that, due to stigma and other factors, many incidents of intimate partner violence go unreported. Additionally, the National Violent Death Reporting System, which compiles state data on violent deaths - focusing on who, where, when, and how - has only in the past few years procured funding to reach all 50 states. Therefore, available data on DV-specific homicides are limited.

Due to an increased understanding of the prevalence of intimate partner violence (IPV) and its connection to firearms, many states have implemented legal provisions to prevent perpetrators from possessing firearms. For example, many states have closed the “boyfriend loophole” that exists in the federal legislation: they have expanded the definition of an intimate partner to include dating partners, even when the partners do not live together, share a child, or marry (the categories specified in federal law) (Everytown Research & Policy, 2022). Many states have also added legislation that specifies who may petition for a DV protective order and which protective order scenarios result in firearm prohibitions for the abuser or alleged abuser. Additional provisions specify firearm relinquishment and removal procedures for those with a DV-related criminal or civil imposition (Disarm Domestic Violence, 2022).

Description of Data

Centers for Disease Control and Prevention (CDC) - CDC WONDER

The Centers for Disease Control and Prevention (CDC) provides CDC WONDER, an online database for querying and analyzing public health data. I am using the Underlying Cause of Death database, which allows the selection of Homicide as the Injury Intent and Firearm as the Injury Mechanism; the underlying information comes from death certificates.

I decided to use an average across three years to account for any single-year abnormalities. The state firearm laws are from 2018, and I do not have information on how long each state has had each firearm provision enacted, so I am using homicide rate averages for 2017 through 2019. I’ve also included overall rates as well as rates by gender. Since previous research indicates that a high number of female firearm homicides are perpetrated by an intimate partner, I will evaluate female homicide rates in relation to state firearm legislation. Due to the lack of data on confirmed intimate partner homicides (IPH) in many states, overall firearm homicide rates are the closest available proxy for that information.

Source: https://wonder.cdc.gov/ucd-icd10.html

# Overall Homicide Rates
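# Packages assumed loaded in the blog's setup chunk (not shown): tidyverse,
# rvest, quanteda (plus quanteda.textstats, quanteda.textplots,
# quanteda.dictionaries), tidytext, knitr/kableExtra, tm, topicmodels,
# ldatuning, stm, and reshape2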
HomRates <- suppressMessages(read.delim("../../DACSS 697/DV Research/Homicide_Firearm_2017through2019.txt"))
HomRates <- select(HomRates, State, Deaths, Population, Crude.Rate) %>% mutate(across(everything(), ~ifelse(.=="", NA, as.character(.)))) %>% na.omit()
HomRates <- filter(HomRates, !str_detect(State, "District of Columbia"))
HomRates$Deaths <- as.numeric(HomRates$Deaths)
HomRates$Population <- as.numeric(HomRates$Population)
HomRates$Crude.Rate <- as.numeric(HomRates$Crude.Rate)
# Convert three-year totals to annual averages (vectorized, no helper needed)
HomRates$Deaths <- HomRates$Deaths / 3
HomRates$Population <- HomRates$Population / 3
HomRates <- HomRates %>%
  mutate(across(c(Population, Deaths), ~ round(., 0)))
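# Sanity check (sketch): a crude rate is deaths per 100,000 population, so
# recomputing it from the averaged counts should roughly match Crude.Rate
with(HomRates, round(Deaths / Population * 1e5, 1))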
# Homicide Rates by Gender
HomRatesG <- read.delim("../../DACSS 697/DV Research/CDC_Gun_Hom_Gender.txt")
HomRatesG <- select(HomRatesG, State, Gender, Deaths, Crude.Rate) %>%
  rename(States = State)
HomRatesG <- filter(HomRatesG, !str_detect(States, "District of Columbia")) 
HomRatesG$Deaths <- as.numeric(HomRatesG$Deaths)
HomRatesG$Crude.Rate <- as.numeric(HomRatesG$Crude.Rate)
# Convert three-year totals to annual averages
HomRatesG$Deaths <- HomRatesG$Deaths / 3
HomRatesG <- HomRatesG %>%
  mutate(across(Deaths, ~ round(., 0)))
HomRatesG <- HomRatesG %>% 
  mutate(across(everything(), ~ifelse(.=="", NA, as.character(.)))) %>%
  filter(!str_detect(States, "NA"))
HomRatesG <- HomRatesG %>%
  pivot_wider(names_from = Gender, values_from = c(Deaths, Crude.Rate))  

## Notes: 
## When Deaths_Female is NA, value is 0-9 and suppressed by CDC
## When Crude.Rate values are NA, rate is suppressed due to suppressed Death count
# Combine overall rates and gender rates to one dataframe
CDCHomRates <- cbind.data.frame(HomRates, HomRatesG)
CDCHomRates <- select(CDCHomRates, -c(States))
CDCHomRates <- CDCHomRates[, c("State", "Crude.Rate_Female", "Crude.Rate_Male", "Crude.Rate", "Deaths_Female", "Deaths_Male", "Deaths", "Population")]
CDCHomRates <- rename(CDCHomRates, Crude_Total = Crude.Rate, Deaths_Total = Deaths)
# Fixing suppressed and unreliable rates
CDCHomRates$Crude.Rate_Female[CDCHomRates$State == "Maine"] <- "0.7"
CDCHomRates$Crude.Rate_Male[CDCHomRates$State == "Maine"] <- "0.9"
CDCHomRates$Crude.Rate_Female[CDCHomRates$State == "Vermont"] <- "1.1"
CDCHomRates$Crude.Rate_Male[CDCHomRates$State == "Vermont"] <- "1.3"
CDCHomRates$Crude.Rate_Female[CDCHomRates$State == "Wyoming"] <- "1.5"
CDCHomRates$Crude.Rate_Female[CDCHomRates$State == "New Hampshire"] <- "0.8"

CDCHomRates$Deaths_Female[CDCHomRates$State == "Hawaii"] <- "3"
CDCHomRates$Deaths_Female[CDCHomRates$State == "North Dakota"] <- "2"
CDCHomRates$Deaths_Female[CDCHomRates$State == "South Dakota"] <- "1"
CDCHomRates$Deaths_Female[CDCHomRates$State == "Rhode Island"] <- "3"
CDCHomRates$Crude.Rate_Female[CDCHomRates$State == "Hawaii"] <- "0.1"
CDCHomRates$Crude.Rate_Female[CDCHomRates$State == "North Dakota"] <- "0.2"
CDCHomRates$Crude.Rate_Female[CDCHomRates$State == "South Dakota"] <- "0.1"
CDCHomRates$Crude.Rate_Female[CDCHomRates$State == "Rhode Island"] <- "0.2"
# Present data with table
kable(CDCHomRates, col.names = c("State", "Female", "Male", "Total", "Female", "Male", "Total", "Population"), 
      align = c('c', 'c', 'c', 'c', 'c', 'c', 'c', 'c')) %>%
  add_header_above(c("", "Crude Rate (per 100,000)"=3, "Number of Deaths"=3, ""))%>%
  add_header_above(c("Homicide by Firearm (Averages 2017-2019)"=8)) %>%
    kable_styling(fixed_thead = TRUE)%>%
  scroll_box(width = "100%", height = "600px") %>%
  footnote(general = "The CDC marks rates as unreliable when death counts are fewer than 20, which applies here to Female Crude Rates for Hawaii, North Dakota, Rhode Island, and South Dakota")
Homicide by Firearm (Averages 2017-2019)
Columns: State | Crude Rate per 100,000 (Female, Male, Total) | Number of Deaths (Female, Male, Total) | Population
Alabama 3.3 17.1 10.0 82 404 487 4888601
Alaska 3.3 8.6 6.1 12 33 45 736259
Arizona 1.4 6.8 4.1 52 242 294 7155544
Arkansas 2.5 10.8 6.6 38 160 198 3011969
California 1 5.7 3.3 195 1124 1319 39535307
Colorado 1 5 3.0 28 142 171 5687151
Connecticut 0.5 3.2 1.8 9 55 64 3575379
Delaware 1.7 8.8 5.1 8 41 49 967625
Florida 1.6 7.8 4.6 172 807 979 21253821
Georgia 1.9 11 6.3 105 562 667 10522092
Hawaii 0.1 1.4 0.9 3 10 13 1421300
Idaho 1 1.9 1.4 9 17 25 1752739
Illinois 1.4 11.9 6.6 90 746 836 12738308
Indiana 1.8 9.1 5.4 62 302 363 6696972
Iowa 0.5 2.7 1.6 8 42 50 3152309
Kansas 1.5 6.7 4.1 22 98 120 2912647
Kentucky 1.8 7.8 4.7 40 172 212 4463421
Louisiana 3 19.9 11.2 72 453 524 4664368
Maine 0.7 0.9 0.8 5 6 11 1339508
Maryland 1.4 13.9 7.4 43 406 449 6046858
Massachusetts 0.3 2.7 1.5 11 90 101 6884824
Michigan 1.5 7.8 4.6 74 384 458 9981694
Minnesota 0.4 2.5 1.5 13 69 82 5609139
Mississippi 3.6 18.8 11.0 55 272 327 2982260
Missouri 3.1 15.1 9.0 97 455 552 6125804
Montana 1.5 2.6 2.0 8 14 22 1060525
Nebraska 0.9 2.4 1.6 9 23 31 1927917
Nevada 2.1 7.3 4.7 32 111 143 3037529
New Hampshire 0.8 1.2 1.0 6 8 14 1352988
New Jersey 0.6 4.5 2.5 26 198 224 8932118
New Mexico 2.5 10.2 6.3 26 105 132 2093442
New York 0.4 3 1.7 42 289 331 19615056
North Carolina 1.5 8.7 5.0 78 439 517 10381708
North Dakota 0.2 2.1 1.4 2 8 10 759177
Ohio 1.6 8.4 4.9 97 481 578 11679050
Oklahoma 2 9.2 5.6 41 179 220 3943638
Oregon 0.6 2.8 1.7 14 59 73 4183742
Pennsylvania 1.3 7.9 4.5 83 494 577 12804862
Rhode Island 0.2 1.8 1.1 3 9 12 1058772
South Carolina 2.4 13.2 7.6 63 325 388 5085737
South Dakota 0.1 2.7 1.4 1 12 13 878853
Tennessee 2.3 12.2 7.1 80 402 482 6771723
Texas 1.5 7.1 4.3 216 1017 1233 28667441
Utah 0.8 2.1 1.5 12 34 46 3156299
Vermont 1.1 1.3 1.2 3 4 7 624648
Virginia 1.4 6.7 4.0 62 279 340 8507741
Washington 1 3.5 2.2 36 132 168 7518742
West Virginia 2 5.8 3.9 19 52 70 1804612
Wisconsin 1.1 4.2 2.7 33 122 155 5810495
Wyoming 1.5 2.6 2.1 4 8 12 578604
Note: The CDC marks rates as unreliable when death counts are fewer than 20, which applies here to Female Crude Rates for Hawaii, North Dakota, Rhode Island, and South Dakota

National Violent Death Reporting System (NVDRS) - Web-Based Injury Statistics Query and Reporting System (WISQARS)

The National Violent Death Reporting System (NVDRS) is a CDC product whose data are accessed through the Web-based Injury Statistics Query and Reporting System (WISQARS). Unlike the CDC WONDER dataset of overall firearm homicide rates above, this database provides data specifically on confirmed homicides perpetrated by an intimate partner using a firearm. Confirmed intimate partner homicide (IPH) rates are not available for all states or years, but that will likely change, as the CDC began providing NVDRS funding to all states in 2018. The system allows states to combine law enforcement reports, medical examiner/coroner reports, and death certificates when reporting and surveilling public health matters like homicide.

For the states included in the NVDRS between 2017 and 2019, I will analyze crude rates in relation to state firearm laws.

Source: https://www.cdc.gov/injury/wisqars/nvdrs/

# Confirmed intimate partner homicides (not all states are included)
NVDRSdata <- suppressMessages(read_csv("../../DACSS 697/DV Research/NVDRSdf.csv"))

NVDRSdata <- NVDRSdata %>%
  select(State, Average_Deaths, Crude_Rate)%>%
  filter(!str_detect(`State`, "TOTAL"))

# Present data with table
kable(NVDRSdata, col.names = c("State", "Number of Deaths", "Crude Rate")) %>%
  kable_styling() %>%
  add_header_above(c("Confirmed Intimate Partner Homicides Using a Firearm (2017-2019 Averages)"=3)) %>%
  scroll_box(width = "100%", height = "600px") %>%
  footnote(general = "if NA, value has been suppressed by the CDC as a privacy constraint:",
  number = c("number of deaths count is between 0 and 9", 
      "crude rate unable to be calculated due to suppressed death count"))
Confirmed Intimate Partner Homicides Using a Firearm (2017-2019 Averages)
State Number of Deaths Crude Rate
Alaska 14 0.63
Arizona 84 0.39
California 117 0.21
Colorado 47 0.28
Connecticut 15 0.14
Delaware 10 0.34
Georgia 149 0.47
Illinois 68 0.21
Indiana 57 0.28
Iowa 12 0.13
Kansas 35 0.4
Kentucky 70 0.52
Maine 13 0.32
Maryland 33 0.18
Massachusetts 19 0.09
Michigan 109 0.36
Minnesota 22 0.13
Nevada 50 0.55
New Hampshire 10 0.25
New Jersey 27 0.1
New Mexico 39 0.62
North Carolina 135 0.43
Ohio 133 0.38
Oklahoma 88 0.74
Oregon 35 0.28
Pennsylvania 92 0.29
Rhode Island Suppressed Suppressed
South Carolina 88 0.58
Utah 12 0.13
Vermont Suppressed Suppressed
Virginia 98 0.38
Washington 59 0.27
West Virginia 37 0.68
Wisconsin 52 0.3
Note: If NA, the value has been suppressed by the CDC as a privacy constraint: (1) the death count is between 0 and 9; (2) the crude rate cannot be calculated due to the suppressed death count.

Disarm Domestic Violence

Disarm Domestic Violence is a website that compiles information on each state’s domestic violence related legislation. Focal points of the source are prohibition, specification of the victim-perpetrator relationship, protective orders, judicial authority, and the firearm removal process.

Source: https://www.disarmdv.org/#

# Read in the data using web-scraping

URL <- "https://www.disarmdv.org/state/"

State <- c('alabama', 'alaska', 'arizona', 'arkansas', 'california', 'colorado', 'connecticut', 'delaware', 'florida', 'georgia', 'hawaii', 'idaho', 'illinois', 'indiana', 'iowa', 'kansas', 'kentucky', 'louisiana', 'maine', 'maryland', 'massachusetts', 'michigan', 'minnesota', 'mississippi', 'missouri', 'montana', 'nebraska', 'nevada', 'new-hampshire', 'new-jersey', 'new-mexico', 'new-york', 'north-carolina', 'north-dakota', 'ohio', 'oklahoma', 'oregon', 'pennsylvania', 'rhode-island', 'south-carolina', 'south-dakota', 'tennessee', 'texas', 'utah', 'vermont', 'virginia', 'washington', 'west-virginia', 'wisconsin', 'wyoming')

# Build each state-specific law-page URL directly (no loop needed)
StateURL <- paste0(URL, State, "/?sec=law")
disarmdv <- c()
css_selector <- ".auto-navigation-content"

for (i in 1:length(StateURL)){
  
  laws <- StateURL[i] %>% 
  read_html() %>%
  html_nodes(css = css_selector) %>%
  html_text()
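  # optional (not in the original run): a Sys.sleep(1) here would pause
  # between requests as a courtesy to the server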

  
  disarmdv <- c(disarmdv, laws)
}
# Create dataframe with state and legislation text (reusing the State vector from above)

disarmDF <- data.frame(State = State, Text = disarmdv)

head(disarmDF, 1)
    State
1 alabama
      Text
1         Alabama Law\n        ALABAMA DOMESTIC VIOLENCE FIREARM PROHIBITIONS\nAlabama Domestic Violence Firearm Purchase and Possession Prohibitions\nAlabama prohibits the following individuals from owning a firearm, possessing a firearm, or having a firearm in their control: \nPersons convicted of a misdemeanor offense of domestic violence; and\nPersons subject to a valid protection order for domestic abuse.1\n“Valid protection order” is defined as “an order issued after a hearing of which the person received actual notice, and at which the person had an opportunity to participate, that does any of the following:\nRestrains the person from harassing, stalking, or threatening a  qualified individual* or child of the qualified individual or person or engaging in other conduct that would place a qualified individual in reasonable fear of bodily injury to the individual or child and that includes a finding that the person represents a credible threat to the physical safety of the qualified individual or child.\nBy its terms, explicitly prohibits the use, attempted use, or threatened use of physical force against the qualified individual or child that would reasonably be expected to cause bodily injury.”2\nALABAMA CIVIL PROTECTION ORDER FIREARM REMOVAL\nDomestic Violence Civil Protection Orders That Require Firearm Removal\nAlabama law does not require the removal of firearms from persons subject to domestic violence protection orders. \nAlabama law does allow a judge issuing an ex parte protection order, an ex parte modification of a protection order, a final protection order, or a modification of a protection order issued after notice and hearing to “[o]rder other relief as it deems necessary to provide for the safety and welfare of the plaintiff or any children and any person designated by the court.”3\nIndividuals Who May Petition for a Protection Order\nThe following persons may petition for a protection order: \nA spouse (including a common law spouse);\nA former spouse (including a common law former spouse); \nA person with whom the defendant has a child in common, regardless of whether the victim or defendant have ever been married and regardless of whether they are currently residing or have in the past resided together in the same household; \nA person who has or had a dating relationship with the defendant; \nA person who is or was cohabiting with the defendant and who is in, or was engaged in, a romantic or sexual relationship with the defendant;\nA relative of a person defined in (e) who also lived with the defendant; or \nAn individual who is a parent, stepparent, child, or stepchild and who is in or has maintained a living arrangement with the defendant.4\nPenalties for Violation\nA violation of a protective order is a Class A misdemeanor.5\n    
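The scraped text is highly patterned: each page states explicitly whether a provision exists (e.g., “Alabama law does not require the removal of firearms…”). As a rough preview of the context problem, we can count how many states’ texts contain one such negative phrasing; a sketch, assuming the wording in the Alabama excerpt recurs verbatim in other states:

# Rough presence check (sketch): states whose text says removal is NOT required
sum(str_detect(tolower(disarmdv), "does not require the removal"))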

Trying Different Text Analysis Approaches

To Begin - Representing Texts

# Create corpus and set docnames 

laws_corpus <- corpus(disarmdv)
docnames(laws_corpus) <- State
summary(laws_corpus)
Corpus consisting of 50 documents, showing 50 documents:

           Text Types Tokens Sentences
        alabama   207    498         5
         alaska   186    481         8
        arizona   191    503         7
       arkansas   138    232         5
     california   495   1735        22
       colorado   263   1016         5
    connecticut   320   1281         3
       delaware   435   1533        14
        florida   276    797         8
        georgia   126    225         4
         hawaii   241    625         5
          idaho   155    332         5
       illinois   380   1378         8
        indiana   184    472         6
           iowa   235    670         6
         kansas   188    408         3
       kentucky   251    620         7
      louisiana   482   2087        22
          maine   390   1423         6
       maryland   259    991        11
  massachusetts   359   1329         7
       michigan   266    708         7
      minnesota   440   1706         4
    mississippi   185    397         3
       missouri   160    329         5
        montana   196    528         5
       nebraska   171    359         3
         nevada   254    972         5
  new-hampshire   377   1348        27
     new-jersey   270   1170        12
     new-mexico   380   1325        14
       new-york   444   1746        10
 north-carolina   416   1637        24
   north-dakota   265    643         4
           ohio   298    853         9
       oklahoma   265    782         8
         oregon   246    684         5
   pennsylvania   629   3547        65
   rhode-island   392   1509         8
 south-carolina   176    470         2
   south-dakota   219    610         6
      tennessee   261    761         3
          texas   233    660         4
           utah   254    834         4
        vermont   376   1223         9
       virginia   278    880         7
     washington   506   2380        16
  west-virginia   340   1324        11
      wisconsin   452   2125        15
        wyoming   179    396         6
# Create tokens (keeping all characters)
laws_tokens <- tokens(laws_corpus)

# Create document-feature matrix 
laws_dfm <- dfm(laws_tokens)
head(laws_dfm)
Document-feature matrix of: 6 documents, 2,253 features (90.47% sparse) and 0 docvars.
            features
docs         alabama law domestic violence firearm prohibitions
  alabama          7   5        6        5       7            2
  alaska           0   4       11       11       8            2
  arizona          0   3        7        7      10            2
  arkansas         0   2        4        4       4            2
  california       0  19       10       12      42            2
  colorado         0   9        6        6      18            2
            features
docs         purchase and possession prohibits
  alabama           1  10          1         2
  alaska            2   8          4         0
  arizona           1   4          1         2
  arkansas          1   1          1         0
  california        2  18          8         1
  colorado          1   7          9         1
[ reached max_nfeat ... 2,243 more features ]
# New DFM - preprocessing for wordcloud
laws_dfm2 <- tokens(laws_corpus, 
                      remove_punct= TRUE,
                      remove_numbers = TRUE) %>%
  tokens_tolower() %>%
  tokens_select(pattern=stopwords("en"),
                selection="remove") %>%
  dfm() 

textplot_wordcloud(laws_dfm2, max_words = 50)

# Display the 20 top features from the dfm
textstat_frequency(laws_dfm2, n=20)
      feature frequency rank docfreq group
1       order      1007    1      50   all
2     firearm       596    2      50   all
3    firearms       564    3      50   all
4    domestic       503    4      50   all
5    violence       452    5      50   all
6  protection       433    6      36   all
7       court       403    7      50   all
8      person       364    8      45   all
9         may       346    9      50   all
10 respondent       326   10      31   all
11 protective       322   11      23   all
12      shall       311   12      35   all
13        law       309   13      50   all
14    persons       302   14      50   all
15 ammunition       288   15      39   all
16      abuse       263   16      28   all
17 possession       255   17      50   all
18      party       206   18      19   all
19  defendant       194   19      17   all
20    subject       181   20      48   all
# I will create a DFM with some preprocessing, but I need to keep stopwords
# because they provide essential context for the firearm provisions
laws_dfm3 <- tokens(laws_corpus, 
                      remove_punct= TRUE,
                      remove_numbers = TRUE) %>%
  tokens_tolower() %>%
  dfm()
head(laws_dfm3)
Document-feature matrix of: 6 documents, 2,141 features (91.04% sparse) and 0 docvars.
            features
docs         alabama law domestic violence firearm prohibitions
  alabama          7   5        6        5       7            2
  alaska           0   4       11       11       8            2
  arizona          0   3        7        7      10            2
  arkansas         0   2        4        4       4            2
  california       0  19       10       12      42            2
  colorado         0   9        6        6      18            2
            features
docs         purchase and possession prohibits
  alabama           1  10          1         2
  alaska            2   8          4         0
  arizona           1   4          1         2
  arkansas          1   1          1         0
  california        2  18          8         1
  colorado          1   7          9         1
[ reached max_nfeat ... 2,131 more features ]

Feature Co-Occurrence Matrix

# create fcm
laws_fcm <- fcm(laws_dfm3)
dim(laws_fcm)
[1] 2141 2141
# Pull top features
fcm_features <- names(topfeatures(laws_fcm, 30))

# Retain top features in fcm
laws_fcm2 <- fcm_select(laws_fcm, pattern = fcm_features, selection = "keep")

dim(laws_fcm2)
[1] 30 30
# compute size weight for vertices in network and create plot
size <- log(colSums(laws_fcm2))
textplot_network(laws_fcm2, vertex_size = size / max(size) * 3)

Word co-occurrence is going to be key to properly analyzing this dataset because context matters here. From the topfeatures table for the dfm, we know that all 50 documents (states) include the words “domestic,” “violence,” and “order,” but that does not provide insight into the kinds of provisions being applied. The data source uses iterations of the word “prohibit” to discuss whether or not each state prohibits firearm purchase and possession for DV offenders. Likewise, the source discusses removal provisions and whether each state specifies them. That is why we cannot simply identify the presence of these words; we need a way to evaluate their context and thereby identify the presence and strength of each provision.
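One lightweight way to inspect that context is quanteda’s kwic() (keywords-in-context), applied to the tokens created earlier; a minimal sketch (the six-word window is an arbitrary choice):

# Peek at the words surrounding the "prohibit" stem across state documents
head(kwic(laws_tokens, pattern = "prohibit*", window = 6))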

Dictionary Analysis

# First, I'll use the Lexicoder Sentiment Dictionary to evaluate the text
LSD <- dfm_lookup(laws_dfm3, dictionary = data_dictionary_LSD2015)
head(LSD)
Document-feature matrix of: 6 documents, 4 features (50.00% sparse) and 0 docvars.
            features
docs         negative positive neg_positive neg_negative
  alabama          24       26            0            0
  alaska           25       20            0            0
  arizona          30       14            0            0
  arkansas         15       14            0            0
  california       82       55            0            0
  colorado         24       40            0            0
# Find the states with the highest and lowest female homicide rates
FemaleHomRates <- CDCHomRates %>% arrange(desc(Crude.Rate_Female))
head(FemaleHomRates, 10)
            State Crude.Rate_Female Crude.Rate_Male Crude_Total
1     Mississippi               3.6            18.8        11.0
2         Alabama               3.3            17.1        10.0
3          Alaska               3.3             8.6         6.1
4        Missouri               3.1            15.1         9.0
5       Louisiana                 3            19.9        11.2
6        Arkansas               2.5            10.8         6.6
7      New Mexico               2.5            10.2         6.3
8  South Carolina               2.4            13.2         7.6
9       Tennessee               2.3            12.2         7.1
10         Nevada               2.1             7.3         4.7
   Deaths_Female Deaths_Male Deaths_Total Population
1             55         272          327    2982260
2             82         404          487    4888601
3             12          33           45     736259
4             97         455          552    6125804
5             72         453          524    4664368
6             38         160          198    3011969
7             26         105          132    2093442
8             63         325          388    5085737
9             80         402          482    6771723
10            32         111          143    3037529
tail(FemaleHomRates, 10)
           State Crude.Rate_Female Crude.Rate_Male Crude_Total
41        Oregon               0.6             2.8         1.7
42   Connecticut               0.5             3.2         1.8
43          Iowa               0.5             2.7         1.6
44     Minnesota               0.4             2.5         1.5
45      New York               0.4               3         1.7
46 Massachusetts               0.3             2.7         1.5
47  North Dakota               0.2             2.1         1.4
48  Rhode Island               0.2             1.8         1.1
49        Hawaii               0.1             1.4         0.9
50  South Dakota               0.1             2.7         1.4
   Deaths_Female Deaths_Male Deaths_Total Population
41            14          59           73    4183742
42             9          55           64    3575379
43             8          42           50    3152309
44            13          69           82    5609139
45            42         289          331   19615056
46            11          90          101    6884824
47             2           8           10     759177
48             3           9           12    1058772
49             3          10           13    1421300
50             1          12           13     878853
# Plot LSD2015 sentiments for the states with the highest and lowest female homicide rates
LSDdataframe <- convert(LSD, to = "data.frame") %>% select(doc_id, negative, positive)
LSDdataframe <- pivot_longer(data = LSDdataframe, cols = c("negative", "positive"), names_to = "Sentiment", values_to = "Score")
LSD_HighHom <- filter(LSDdataframe, doc_id %in% c("mississippi", "alabama", "alaska", "missouri", "louisiana", "arkansas", "new-mexico", "south-carolina", "tennessee", "nevada")) 
LSD_HighHom$Hom_Rates <- "High"  # single value is recycled to all rows
LSD_LowHom <- filter(LSDdataframe, doc_id %in% c("oregon", "connecticut", "iowa", "minnesota", "new-york", "massachusetts", "north-dakota", "rhode-island", "hawaii", "south-dakota"))
LSD_LowHom$Hom_Rates <- "Low"
LSD_LowHigh <- rbind(LSD_HighHom, LSD_LowHom)
ggplot(data = LSD_LowHigh, mapping = aes(x = doc_id, y = Score, fill = Sentiment)) + 
  geom_col(position = position_dodge()) +
  labs(title = "Sentiments for States with 10 Highest and Lowest Homicide Rates", 
       subtitle = "LSD2015 Negative vs Positive", x = "State", y = "Sentiment Score") + facet_wrap(~ Hom_Rates, scales = "free_x") +
  theme(axis.text.x = element_text(angle = 90))

Well, this is a start. There doesn’t appear to be a difference in sentiment (on the LSD2015 scale) between high and low homicide rate states. This doesn’t really tell me much, since the method does nothing to identify the context in which the negative/positive words are used. I’ll move on.
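As a quick check on what the dictionary is actually matching, we can keep only the dfm features that appear in LSD2015 (a sketch; dfm_select() accepts a dictionary as its pattern):

# Which LSD2015 words drive the sentiment scores?
matched <- dfm_select(laws_dfm3, pattern = data_dictionary_LSD2015)
topfeatures(matched, 15)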

To try one more dictionary approach, I’ll use the Loughran-McDonald lexicon, which has multiple sentiment categories: negative, positive, uncertainty, litigious, constraining, and superfluous. Many of the key words I’m interested in are included in this lexicon, specifically in the litigious and constraining categories, like variations of the word “prohibit.” However, a key topic - removal - is missing, so I’ll add variations of that word to the lexicon.

# load loughran lexicon
lou <- get_sentiments("loughran")
# add remov words to lexicon
custom_lou <- lou %>%
  bind_rows(tribble(~word, ~sentiment,
                    "remove", "constraining",
                    "removal", "constraining",
                    "removing", "constraining",
                    "removed", "constraining"))
custom_lou <- as.dictionary(custom_lou)
# use liwcalike() to estimate sentiment using customized loughran dictionary
laws_sentiment_lou <- liwcalike(as.character(laws_corpus), custom_lou)
laws_lou <- laws_sentiment_lou %>% select(docname, negative, positive, constraining, litigious,superfluous, uncertainty)
laws_lou
          docname negative positive constraining litigious
1         alabama     5.82     0.20         2.01      4.02
2          alaska     3.95     0.00         2.29      2.91
3         arizona     6.10     0.00         2.10      7.43
4        arkansas     5.13     0.00         3.42      4.70
5      california     3.85     0.17         1.32      4.54
6        colorado     4.13     0.20         1.57      4.53
7     connecticut     2.88     0.00         2.42      2.18
8        delaware     4.75     0.06         1.16      3.53
9         florida     8.53     0.00         2.89      5.27
10        georgia     9.33     0.00         4.00      4.89
11         hawaii     6.08     0.00         3.52      3.52
12          idaho     5.42     0.00         3.31      4.52
13       illinois     4.61     0.14         1.01      3.96
14        indiana     3.15     0.00         1.68      3.57
15           iowa     3.43     0.00         2.24      3.13
16         kansas     7.35     0.25         2.70      4.17
17       kentucky     8.23     0.16         1.77      4.35
18      louisiana     7.97     0.24         0.90      5.60
19          maine     7.88     0.14         1.11      4.15
20       maryland     5.75     0.00         1.41      4.54
21  massachusetts     4.77     0.22         0.67      3.50
22       michigan     7.75     0.14         1.97      5.92
23      minnesota     6.12     0.17         1.40      5.24
24    mississippi    10.83     0.00         2.27      3.78
25       missouri     6.95     0.00         2.72      3.02
26        montana     9.47     0.00         2.27      4.92
27       nebraska     8.59     0.00         2.77      4.71
28         nevada     7.30     0.00         1.44      3.29
29  new-hampshire     6.89     0.00         1.26      5.93
30     new-jersey     5.04     0.26         3.25      4.27
31     new-mexico     5.07     0.15         3.06      4.33
32       new-york     6.19     0.11         0.63      5.56
33 north-carolina     4.39     0.12         1.53      4.21
34   north-dakota     7.05     0.00         1.95      2.85
35           ohio     5.86     0.00         1.52      3.40
36       oklahoma     7.87     0.00         1.78      4.44
37         oregon     5.12     0.15         2.49      3.36
38   pennsylvania     6.66     0.08         1.04      4.61
39   rhode-island     7.29     0.07         1.19      3.84
40 south-carolina     5.67     0.00         3.57      2.52
41   south-dakota     8.20     0.00         1.97      4.26
42      tennessee     6.57     0.00         1.71      3.02
43          texas     6.80     0.00         1.96      2.42
44           utah     5.52     0.12         1.56      2.88
45        vermont     6.79     0.08         1.39      4.66
46       virginia     5.73     0.00         1.65      2.97
47     washington     5.95     0.08         2.43      3.31
48  west-virginia     2.76     0.07         1.20      6.02
49      wisconsin     4.05     0.14         1.32      5.18
50        wyoming     7.83     0.00         2.78      3.54
   superfluous uncertainty
1         0.00        0.40
2         0.00        1.66
3         0.00        0.95
4         0.00        1.28
5         0.12        1.21
6         0.00        0.79
7         0.00        0.62
8         0.00        1.09
9         0.00        0.88
10        0.00        1.78
11        0.00        0.80
12        0.00        1.20
13        0.22        0.94
14        0.00        1.26
15        0.00        0.30
16        0.00        0.74
17        0.00        1.29
18        0.00        0.52
19        0.00        0.76
20        0.00        0.91
21        0.00        0.89
22        0.00        1.83
23        0.06        0.82
24        0.00        1.26
25        0.00        1.21
26        0.00        0.95
27        0.00        1.11
28        0.00        1.23
29        0.07        1.56
30        0.00        1.11
31        0.07        0.89
32        0.00        1.03
33        0.00        0.67
34        0.00        0.90
35        0.12        0.82
36        0.00        1.14
37        0.00        0.58
38        0.00        0.39
39        0.00        0.53
40        0.00        0.42
41        0.00        0.98
42        0.00        0.79
43        0.00        1.21
44        0.12        1.08
45        0.00        0.65
46        0.00        1.76
47        0.04        0.67
48        0.00        0.35
49        0.00        1.04
50        0.00        0.76
# Plot the "negative", "litigious", and "constraining" sentiments for the ten highest and ten lowest homicide-rate states
LOUdataframe <- as.data.frame(laws_lou) %>% select(docname, negative, constraining, litigious)
LOUdataframe <- pivot_longer(data = LOUdataframe, cols = c("negative", "constraining", "litigious"), names_to = "Sentiment", values_to = "Score")

LOU_HighHom <- filter(LOUdataframe, docname %in% c("mississippi", "alabama", "alaska", "missouri", "louisiana", "arkansas", "new-mexico", "south-carolina", "tennessee", "nevada")) 
LOU_HighHom$Hom_Rates <- "High"
LOU_LowHom <- filter(LOUdataframe, docname %in% c("oregon", "connecticut", "iowa", "minnesota", "new-york", "massachusetts", "north-dakota", "rhode-island", "hawaii", "south-dakota"))
LOU_LowHom$Hom_Rates <- "Low"
LOU_LowHigh <- rbind(LOU_HighHom, LOU_LowHom)
ggplot(data = LOU_LowHigh, mapping = aes(x = docname, y = Score, fill = Sentiment)) + 
  geom_col(position = position_dodge()) +
  labs(title = "Sentiments for States with 10 Highest and Lowest Homicide Rates", 
       subtitle = "Loughran: Negative, Litigious, Constraining", x = "State", y = "Sentiment Score") + facet_wrap(~ Hom_Rates, scales = "free_x") +
  theme(axis.text.x = element_text(angle = 90))

Okay, same as with the LSD2015 lexicon - it doesn’t appear that there is much difference across states in the language used in the firearm provisions. Reviewing the data source, the verbiage is consistent across the text documents, but the context is key. I will move on to more complex text analysis.

Latent Dirichlet Allocation (LDA)

Preprocessing:

As I stated in the representing-texts section, I’m hesitant to remove stopwords because of the key context they provide… but many of the stopwords are just noise. Therefore, I will customize the “en” stopwords list to keep the context-essential ones, employing n-grams to identify which stopwords the context depends on.

The key word stems whose surrounding words I’ll examine are “prohib,” “remov,” and “requir”:

d <- tibble(disarmDF)
ngram <- d %>%
 unnest_tokens(ngram, Text, token = "ngrams", n = 7)
head(ngram)
# A tibble: 6 x 2
  State   ngram                                                       
  <chr>   <chr>                                                       
1 alabama alabama law alabama domestic violence firearm prohibitions  
2 alabama law alabama domestic violence firearm prohibitions alabama  
3 alabama alabama domestic violence firearm prohibitions alabama dome~
4 alabama domestic violence firearm prohibitions alabama domestic vio~
5 alabama violence firearm prohibitions alabama domestic violence fir~
6 alabama firearm prohibitions alabama domestic violence firearm purc~
# n-grams surrounding "prohib" wordstem
ngrams_separated <- ngram %>%
  separate(ngram, c("word1", "word2", "word3", "word4", "word5", "word6", "word7"), sep = " ")

ngrams_separated %>%
  filter(str_detect(word4, "^prohib")) %>%
  count(State, word1, word2, word3, word4, word5, word6, word7, sort = TRUE)
# A tibble: 275 x 9
   State          word1      word2 word3 word4 word5 word6 word7     n
   <chr>          <chr>      <chr> <chr> <chr> <chr> <chr> <chr> <int>
 1 washington     other      dang~ weap~ proh~ the   party from      4
 2 washington     under      wash~ law   proh~ the   party from      4
 3 wisconsin      person     is    not   proh~ from  poss~ a         4
 4 louisiana      is         no    long~ proh~ from  poss~ a         3
 5 south-carolina the        pers~ is    proh~ from  ship~ tran~     3
 6 wisconsin      the        pers~ is    proh~ from  poss~ a         3
 7 illinois       order      of    prot~ proh~ the   resp~ from      2
 8 kentucky       kentucky   does  not   proh~ purc~ or    poss~     2
 9 mississippi    mississip~ does  not   proh~ purc~ and   poss~     2
10 missouri       missouri   does  not   proh~ purc~ and   poss~     2
# ... with 265 more rows
# n-grams surrounding "remov" wordstem
ngrams_separated <- ngram %>%
  separate(ngram, c("word1", "word2", "word3", "word4", "word5", "word6", "word7"), sep = " ")

ngrams_separated %>%
  filter(str_detect(word4, "^remov")) %>%
  count(State, word1, word2, word3, word4, word5, word6, word7, sort = TRUE)
# A tibble: 180 x 9
   State         word1      word2  word3 word4 word5 word6 word7     n
   <chr>         <chr>      <chr>  <chr> <chr> <chr> <chr> <chr> <int>
 1 massachusetts firearms   and    lice~ remo~ purs~ to    a         2
 2 new-hampshire weapons    relin~ or    remo~ purs~ to    a         2
 3 alabama       not        requi~ the   remo~ of    fire~ from      1
 4 alabama       protection order  fire~ remo~ dome~ viol~ civil     1
 5 alabama       that       requi~ fire~ remo~ alab~ law   does      1
 6 alaska        not        requi~ the   remo~ of    fire~ from      1
 7 alaska        protective order  fire~ remo~ dome~ viol~ civil     1
 8 alaska        that       requi~ fire~ remo~ alas~ law   does      1
 9 alaska        this       parag~ 3     remo~ proc~ a     court     1
10 arizona       allow      for    the   remo~ of    fire~ from      1
# ... with 170 more rows
# n-grams surrounding "requir" wordstem
ngrams_separated <- ngram %>%
  separate(ngram, c("word1", "word2", "word3", "word4", "word5", "word6", "word7"), sep = " ")

ngrams_separated %>%
  filter(str_detect(word4, "^requir")) %>%
  count(State, word1, word2, word3, word4, word5, word6, word7, sort = TRUE)
# A tibble: 163 x 9
   State         word1      word2  word3 word4 word5 word6 word7     n
   <chr>         <chr>      <chr>  <chr> <chr> <chr> <chr> <chr> <int>
 1 washington    other      dange~ weap~ requ~ that  the   party     4
 2 nevada        domestic   viole~ may   requ~ the   adve~ party     2
 3 pennsylvania  or         ammun~ as    requ~ by    a     final     2
 4 washington    under      washi~ law   requ~ that  the   party     2
 5 west-virginia final      heari~ shall requ~ the   resp~ to        2
 6 alabama       law        does   not   requ~ the   remo~ of        1
 7 alabama       protection orders that  requ~ fire~ remo~ alab~     1
 8 alaska        law        does   not   requ~ the   remo~ of        1
 9 alaska        protective orders that  requ~ fire~ remo~ alas~     1
10 arizona       of         prote~ that  requ~ fire~ remo~ if        1
# ... with 153 more rows

After reviewing those n-grams, I’ve identified the following stopwords to be essential to the context of the text: “do; does; doesn’t; don’t; or; nor; and; not; own; is; no; are”

# Customize English stopwords: drop the context-essential words identified above
ENstopwords <- stopwords(kind = "en")
CustomStopwords <- setdiff(ENstopwords, c("do", "does", "doesn't", "don't", "or", "nor", "and", "not", "own", "is", "no", "are"))
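A quick check (sketch) that exactly the intended words were excluded:

# These should be the twelve context-essential words listed above
setdiff(ENstopwords, CustomStopwords)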

Following the example from https://slcladal.github.io/topicmodels.html:

library(tm)
tmCorpus <- Corpus(VectorSource(disarmdv))  # avoid shadowing tm's Corpus() function
processedCorpus <- tm_map(tmCorpus, content_transformer(tolower))
processedCorpus <- tm_map(processedCorpus, removeWords, CustomStopwords)
processedCorpus <- tm_map(processedCorpus, removePunctuation, preserve_intra_word_dashes = TRUE)
processedCorpus <- tm_map(processedCorpus, removeNumbers)
head(processedCorpus, 1)
<<SimpleCorpus>>
Metadata:  corpus specific: 1, document level (indexed): 0
Content:  documents: 1
minFrequency <- 3
DTM <- DocumentTermMatrix(processedCorpus, control = list(bounds = list(global = c(minFrequency, Inf))))
dim(DTM)
[1]  50 779
# remove empty rows from DTM
sel_idx <- slam::row_sums(DTM) > 0
DTM <- DTM[sel_idx, ]
# Evaluating different number of topics to determine proper number
library(ldatuning)
result <- FindTopicsNumber(
  DTM,
  topics = seq(from = 2, to = 30, by = 1),
  metrics = c("CaoJuan2009",  "Deveaud2014", "Griffiths2004", "Arun2010"),
  method = "Gibbs",
  control = list(seed = 77),
  verbose = TRUE
)
fit models... done.
calculate metrics:
  CaoJuan2009... done.
  Deveaud2014... done.
  Griffiths2004... done.
  Arun2010... done.
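ldatuning provides a plotting helper for these metrics; a quick sketch using the result object above:

# Visualize all four metrics across the candidate topic counts
FindTopicsNumber_plot(result)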

It looks like 19 topics maximizes/minimizes the metrics best. I initially thought to go with 4 topics, because a basic overview of the source suggests four major topics to me: prohibitions, orders, removal, and penalties.

t <- 19
set.seed(91617)
# compute LDA model
topicModel <- LDA(DTM, t, method="Gibbs")
tmResult <- posterior(topicModel)
beta <- tmResult$terms
theta <- tmResult$topics
Terms <- terms(topicModel, 5)
Terms
     Topic 1   Topic 2      Topic 3      Topic 4      Topic 5   
[1,] "court"   "ammunition" "firearm"    "and"        "party"   
[2,] "officer" "and"        "possession" "firearms"   "and"     
[3,] "deadly"  "persons"    "court"      "ammunition" "firearms"
[4,] "and"     "firearm"    "control"    "shall"      "transfer"
[5,] "may"     "subject"    "subject"    "purchase"   "firearm" 
     Topic 6       Topic 7     Topic 8      Topic 9  Topic 10    
[1,] "law"         "person"    "protection" "not"    "ammunition"
[2,] "order"       "child"     "order"      "may"    "abuse"     
[3,] "enforcement" "partner"   "civil"      "order"  "defendant" 
[4,] "firearms"    "bodily"    "issuing"    "orders" "weapons"   
[5,] "shall"       "prohibits" "violation"  "final"  "and"       
     Topic 11      Topic 12     Topic 13     Topic 14     Topic 15  
[1,] "order"       "protective" "respondent" "not"        "violence"
[2,] "person"      "order"      "firearm"    "person"     "firearm" 
[3,] "use"         "violence"   "court"      "shall"      "domestic"
[4,] "persons"     "and"        "and"        "injunction" "weapon"  
[5,] "restraining" "may"        "may"        "firearms"   "order"   
     Topic 16    Topic 17     Topic 18   Topic 19   
[1,] "and"       "domestic"   "violence" "family"   
[2,] "dangerous" "abuse"      "domestic" "household"
[3,] "firearms"  "physical"   "persons"  "member"   
[4,] "weapons"   "possession" "petition" "orders"   
[5,] "license"   "not"        "does"     "injury"   
# Create and apply names to each topic
top5termsPerTopic <- terms(topicModel, 5)
topicNames <- apply(top5termsPerTopic, 2, paste, collapse=" ")
topicNames
                                   Topic 1 
            "court officer deadly and may" 
                                   Topic 2 
  "ammunition and persons firearm subject" 
                                   Topic 3 
"firearm possession court control subject" 
                                   Topic 4 
  "and firearms ammunition shall purchase" 
                                   Topic 5 
     "party and firearms transfer firearm" 
                                   Topic 6 
    "law order enforcement firearms shall" 
                                   Topic 7 
   "person child partner bodily prohibits" 
                                   Topic 8 
"protection order civil issuing violation" 
                                   Topic 9 
              "not may order orders final" 
                                  Topic 10 
  "ammunition abuse defendant weapons and" 
                                  Topic 11 
    "order person use persons restraining" 
                                  Topic 12 
       "protective order violence and may" 
                                  Topic 13 
        "respondent firearm court and may" 
                                  Topic 14 
    "not person shall injunction firearms" 
                                  Topic 15 
  "violence firearm domestic weapon order" 
                                  Topic 16 
  "and dangerous firearms weapons license" 
                                  Topic 17 
  "domestic abuse physical possession not" 
                                  Topic 18 
 "violence domestic persons petition does" 
                                  Topic 19 
   "family household member orders injury" 
# Evaluate topic contents for the ten states with the highest female homicide rates
# (corpus document indices for: alabama, alaska, mississippi, missouri, louisiana,
# arkansas, new-mexico, south-carolina, tennessee, nevada)
HighHomStates <- c(1, 2, 24, 25, 18, 4, 31, 40, 42, 28)
Test1 <- lapply(processedCorpus[HighHomStates], as.character)
library(reshape2)
N <- length(Test1)
topicProportionExamples <- theta[HighHomStates,]
colnames(topicProportionExamples) <- topicNames
vizDataFrame <- melt(cbind(data.frame(topicProportionExamples), document = factor(1:N)), variable.name = "topic", id.vars = "document")  
# note: a ylab argument inside ggplot() is silently ignored, so set the label via labs()
ggplot(data = vizDataFrame, aes(topic, value, fill = document)) + 
  geom_bar(stat = "identity") +
  labs(y = "proportion") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +  
  coord_flip() +
  facet_wrap(~ document, ncol = N)

It doesn’t appear that one topic is dominating the states with the highest female homicide rates.
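To quantify that impression, a quick sketch using the theta (document-topic proportion) matrix from above:

# Largest single-topic proportion for each of the ten high-rate states
round(apply(topicProportionExamples, 1, max), 2)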

I don’t think LDA is the strongest analysis for this research, so I’ll move on…

Structural Topic Modeling (STM)

# I'm going to run STM with stopwords removed...
# Based on initial model runs, the topics often included state names, so I remove those too
# Preprocessing corpus: tokenizing and creating document feature matrix
DFM_laws <- tokens(laws_corpus, 
                      remove_punct= TRUE,
                   remove_separators = TRUE,
                      remove_numbers = TRUE, 
                   remove_symbols = TRUE) %>%
  tokens_tolower() %>%
  tokens_select(pattern=stopwords("en"),
                selection="remove") %>%
  tokens_remove(pattern = c(State, 'new', 'north', 'south', 'dakota', 'york', 'carolina', 'jersey', 'hampshire', 'mexico', 'rhode', 'island', 'west')) %>%
  dfm()  
head(DFM_laws)
Document-feature matrix of: 6 documents, 1,990 features (92.35% sparse) and 0 docvars.
            features
docs         law domestic violence firearm prohibitions purchase
  alabama      5        6        5       7            2        1
  alaska       4       11       11       8            2        2
  arizona      3        7        7      10            2        1
  arkansas     2        4        4       4            2        1
  california  19       10       12      42            2        2
  colorado     9        6        6      18            2        1
            features
docs         possession prohibits following individuals
  alabama             1         2         3           2
  alaska              4         0         0           1
  arizona             1         2         1           1
  arkansas            1         0         2           1
  california          8         1         4           1
  colorado            9         1         3           1
[ reached max_nfeat ... 1,980 more features ]
# Convert dfm to stm
dfm_stm <- convert(DFM_laws, to = "stm")

Choosing K - number of topics

# Choosing K - Statistical fit criterion
K <- c(4, 5, 6, 7, 8, 9, 10)
fit <- searchK(dfm_stm$documents, dfm_stm$vocab, K = K, verbose = FALSE)
# Create table with topic number info
Kplot <- data.frame("K" = K, 
                   "Coherence" = unlist(fit$results$semcoh),
                   "Exclusivity" = unlist(fit$results$exclus))
# Reshape table to long format
Kplot <- melt(Kplot, id=c("K"))
Kplot
    K    variable      value
1   4   Coherence -12.755275
2   5   Coherence  -9.205106
3   6   Coherence -14.924754
4   7   Coherence -17.113427
5   8   Coherence -14.701060
6   9   Coherence -15.432468
7  10   Coherence -18.802207
8   4 Exclusivity   7.757162
9   5 Exclusivity   7.967502
10  6 Exclusivity   8.252566
11  7 Exclusivity   8.417143
12  8 Exclusivity   8.524789
13  9 Exclusivity   8.646599
14 10 Exclusivity   8.752513
#Plot results of topic number testing
ggplot(Kplot, aes(K, value, color = variable)) +
  geom_line(size = 1.5, show.legend = FALSE) +
  facet_wrap(~variable,scales = "free_y") +
  labs(x = "Number of topics K",
       title = "Statistical fit of models with different K")

It looks like 4 to 6 topics (with 5 the most coherent) would provide more coherent topics but lower exclusivity, while 9 would have moderate coherence and the highest exclusivity of the K values I’ll consider…

Now I’ll test these K values using the criteria of interpretability and relevance.

# based on the statistical chart, I'll look at K = 4, 5, 6, and 9
model_4K <- stm(documents = dfm_stm$documents,
         vocab = dfm_stm$vocab, 
         K = 4,
         verbose = FALSE)
model_5K <- stm(documents = dfm_stm$documents,
         vocab = dfm_stm$vocab, 
         K = 5,
         verbose = FALSE)
model_6K <- stm(documents = dfm_stm$documents,
         vocab = dfm_stm$vocab, 
         K = 6,
         verbose = FALSE)
model_9K <- stm(documents = dfm_stm$documents,
         vocab = dfm_stm$vocab, 
         K = 9,
         verbose = FALSE)

I’ll take a look at the top features for each topic in each model:

# for K = 4
topics_4 <- labelTopics(model_4K, n=10)
topics_4 <- data.frame("features" = t(topics_4$frex))
colnames(topics_4) <- paste("Topics", c(1:4))
topics_4
      Topics 1    Topics 2       Topics 3       Topics 4
1      persons     adverse       revolver         rifles
2       relief     abusing     protective       shotguns
3      spouses  restrained        permits            fid
4       victim   dangerous muzzle-loading          third
5         foid       party        divorce   relinquished
6      however   concealed         deadly    safekeeping
7  misdemeanor      pistol      specified        sheriff
8     violence     party's      aggrieved       approved
9   protection enforcement          carry          large
10          ex      agency      defendant acknowledgment
# for K = 5
topics_5 <- labelTopics(model_5K, n=10)
topics_5 <- data.frame("features" = t(topics_5$frex))
colnames(topics_5) <- paste("Topics", c(1:5))
topics_5
      Topics 1    Topics 2       Topics 3    Topics 4   Topics 5
1   protection     adverse     protective  injunction   revolver
2      persons  restrained        parents  respondent    machine
3     violence     abusing         family    approved       guns
4       spouse   dangerous muzzle-loading        form    permits
5           ex       party    transferred  prevention     permit
6        parte   concealed        partner        obey        fid
7  misdemeanor     party's         threat   possessed      carry
8     domestic enforcement        divorce     hearing    weapons
9      removal      agency    preliminary     sheriff  defendant
10    personal     control         deadly surrendered ammunition
# for K = 6
topics_6 <- labelTopics(model_6K, n=10)
topics_6 <- data.frame("features" = t(topics_6$frex))
colnames(topics_6) <- paste("Topics", c(1:6))
topics_6
     Topics 1   Topics 2        Topics 3    Topics 4    Topics 5
1     partner    adverse      protective  respondent    revolver
2      victim    abusing         divorce  injunction      pistol
3     persons restrained post-separation     hearing     permits
4          ex      party     transferred   possessed restraining
5       parte   transfer          parent  prevention   concealed
6  protection    party's         battery        obey   dangerous
7    violence     agency          family surrendered      deadly
8      dating    control     preliminary    approved   obtaining
9       child  federally      proceeding   surrender immediately
10     spouse     dealer    transporting         3rd     weapons
         Topics 6
1  muzzle-loading
2             fid
3     safekeeping
4         weapons
5           large
6       defendant
7    relinquished
8          rifles
9        shotguns
10 acknowledgment
# for K = 9
topics_9 <- labelTopics(model_9K, n=10)
topics_9 <- data.frame("features" = t(topics_9$frex))
colnames(topics_9) <- paste("Topics", c(1:9))
topics_9
     Topics 1   Topics 2        Topics 3    Topics 4     Topics 5
1          ex    adverse      protective  injunction     revolver
2       parte    abusing     transferred        form       pistol
3       final restrained     preliminary       proof    concealed
4    personal      party         divorce     hearing    dangerous
5  protection    party's post-separation        stay  restraining
6     persons   transfer           cases        step      license
7     chapter     dealer      proceeding  respondent    obtaining
8    violence    control      employment information        valid
9     issuing     agency          parent       third commissioner
10     orders  federally         battery     sheriff      weapons
         Topics 6       Topics 7   Topics 8  Topics 9
1             fid        permits       foid    victim
2  muzzle-loading         deadly prevention    living
3     safekeeping      defendant       card  intimate
4    relinquished      specified       obey    dating
5           large      purchaser aggravated     child
6         weapons      aggrieved  possessed qualified
7      relinquish identification        3rd   partner
8          rifles         weapon   existing  shipping
9        shotguns    restraining     revoke household
10 acknowledgment          judge     degree      acts

As expected, K = 9 yields the most distinct (exclusive) topics, and they still look reasonably coherent, so I'll proceed with K = 9.

# Fit model with 9 topics
stm_model9 <- stm(documents = dfm_stm$documents,
         vocab = dfm_stm$vocab, 
         K = 9,
         verbose = TRUE)
Beginning Spectral Initialization 
	 Calculating the gram matrix...
	 Finding anchor words...
	 Recovering initialization...
Initialization complete.
Completing Iteration 1 (approx. per word bound = -5.793) 
Completing Iteration 2 (approx. per word bound = -5.623, relative change = 2.933e-02) 
[ ... E-step/M-step progress for iterations 3-24 omitted ... ]
Completing Iteration 25 (approx. per word bound = -5.518, relative change = 1.132e-05) 
Topic 1: order, protection, violence, domestic, persons 
 Topic 2: order, firearm, party, protection, firearms 
 Topic 3: protective, order, firearm, domestic, violence 
 Topic 4: respondent, firearms, firearm, order, court 
 Topic 5: order, firearm, pistol, person, firearms 
 Topic 6: firearms, order, abuse, ammunition, defendant 
 Topic 7: order, firearms, violence, domestic, protective 
 Topic 8: order, protection, respondent, firearms, court 
 Topic 9: domestic, violence, person, order, protection 
Model Converged 
# Inspect model with plot
plot(stm_model9)

Several of the topics look very similar to one another, so I'll also fit a model with K = 6.
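
Before refitting, I can put a number on that impression by comparing the topics' word distributions pairwise with text2vec's sim2() (a quick check of my own; it assumes stm stores the log word-topic probabilities in model$beta$logbeta):

# Cosine similarity between the 9 topics' word distributions;
# values near 1 indicate topics using nearly the same vocabulary
beta9 <- exp(stm_model9$beta$logbeta[[1]])
round(sim2(beta9, method = "cosine"), 2)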

# Fit model with 6 topics
stm_model6 <- stm(documents = dfm_stm$documents,
         vocab = dfm_stm$vocab, 
         K = 6,
         verbose = TRUE)
Beginning Spectral Initialization 
	 Calculating the gram matrix...
	 Finding anchor words...
	 Recovering initialization...
Initialization complete.
Completing Iteration 1 (approx. per word bound = -5.861) 
Completing Iteration 2 (approx. per word bound = -5.715, relative change = 2.486e-02) 
[ ... E-step/M-step progress for iterations 3-27 omitted ... ]
Completing Iteration 28 (approx. per word bound = -5.635, relative change = 1.463e-05) 
Topic 1: order, protection, domestic, violence, persons 
 Topic 2: order, party, firearm, protection, firearms 
 Topic 3: protective, order, firearm, domestic, violence 
 Topic 4: respondent, order, firearms, firearm, court 
 Topic 5: order, firearms, firearm, violence, domestic 
 Topic 6: firearms, order, abuse, ammunition, defendant 
Model Converged 
# Inspect model with plot
plot(stm_model6)

The topics' highest-probability words look nearly identical across topics, so I'll re-label them using FREX weighting, which favors words that are both frequent within and exclusive to a topic.
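
For reference, stm computes FREX (as I read the package documentation) as the weighted harmonic mean of a word's within-topic frequency rank and its exclusivity rank:

$$\mathrm{FREX}_{k,v} = \left(\frac{\omega}{\mathrm{ECDF}(\beta_{k,v})} + \frac{1-\omega}{\mathrm{ECDF}(\mathrm{Exclusivity}_{k,v})}\right)^{-1}$$

where the ECDFs run over the words in topic $k$ and $\omega$ is the frequency weight (the frexweight argument to labelTopics, 0.5 by default).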

# FREX-weighted top terms for K = 6
Topic_Names6 <- labelTopics(stm_model6, topics = c(1:6), n = 5)$frex
topwords6 <- data.frame("features" = t(Topic_Names6))
colnames(topwords6) <- paste("Topics", c(1:6))
topwords6
  Topics 1   Topics 2        Topics 3   Topics 4    Topics 5
1  partner    adverse      protective respondent    revolver
2   victim    abusing         divorce injunction      pistol
3  persons restrained post-separation    hearing     permits
4       ex      party     transferred  possessed restraining
5    parte   transfer          parent prevention   concealed
        Topics 6
1 muzzle-loading
2            fid
3    safekeeping
4        weapons
5          large
# FREX-weighted top terms for K = 9
Topic_Names9 <- labelTopics(stm_model9, topics = c(1:9), n=5)$frex
topwords9 <- data.frame("features" = t(Topic_Names9))
colnames(topwords9) <- paste("Topics", c(1:9))
topwords9
    Topics 1   Topics 2        Topics 3   Topics 4    Topics 5
1         ex    adverse      protective injunction    revolver
2      parte    abusing     transferred       form      pistol
3      final restrained     preliminary      proof   concealed
4   personal      party         divorce    hearing   dangerous
5 protection    party's post-separation       stay restraining
        Topics 6  Topics 7   Topics 8 Topics 9
1            fid   permits       foid   victim
2 muzzle-loading    deadly prevention   living
3    safekeeping defendant       card intimate
4   relinquished specified       obey   dating
5          large purchaser aggravated    child
plot(stm_model9, type = "summary")

The topic labels make sense for K = 9 and cover a good range of the themes in the source material, so I'll proceed with K = 9.
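
Before moving on, stm's findThoughts() is a handy way to pull the most representative document for a topic of interest, for example topic 9 (the victim/intimate/dating topic). This assumes disarmDF$Text holds one document per state, in the same order as the dfm:

# Most representative document (state) for topic 9
findThoughts(stm_model9, texts = disarmDF$Text, topics = 9, n = 1)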

# make a document-topic matrix: topic association by state (gamma)
gamma <- tidy(stm_model9, matrix = "gamma") %>%
  pivot_wider(names_from = topic, values_from = gamma)
# the 50 state names, lower-cased and hyphenated to match the document names
gamma$state <- gsub(" ", "-", tolower(state.name))
gamma <- gamma[, c('document', 'state', '1', '2', '3', '4', '5', '6', '7', '8', '9')]
gamma
# A tibble: 50 x 11
   document state          `1`     `2`     `3`     `4`     `5`     `6`
      <int> <chr>        <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1        1 alabama    0.284   2.29e-4 5.67e-3 4.49e-4 6.51e-4 2.36e-3
 2        2 alaska     0.986   1.18e-3 6.22e-3 5.70e-4 1.14e-3 2.51e-4
 3        3 arizona    0.988   1.86e-3 2.59e-3 4.64e-4 1.23e-3 4.94e-4
 4        4 arkansas   0.986   1.27e-3 3.11e-3 4.36e-4 1.23e-3 4.42e-4
 5        5 california 0.0336  1.28e-1 4.96e-1 1.67e-1 2.15e-3 3.03e-4
 6        6 colorado   0.0466  9.50e-1 1.04e-3 4.99e-4 5.76e-4 4.60e-4
 7        7 connectic~ 0.00279 4.40e-4 1.46e-4 1.27e-4 9.95e-1 1.77e-4
 8        8 delaware   0.205   1.97e-3 1.41e-3 7.88e-1 6.66e-4 3.40e-4
 9        9 florida    0.00426 1.02e-5 1.14e-3 1.12e-4 4.02e-5 6.57e-4
10       10 georgia    0.754   2.26e-3 2.16e-1 1.68e-3 9.90e-4 4.19e-4
# ... with 40 more rows, and 3 more variables: `7` <dbl>, `8` <dbl>,
#   `9` <dbl>
# Plot topic association (gamma) for each state
gamma1 <- gamma %>%
  pivot_longer(c('1', '2', '3', '4', '5', '6', '7', '8', '9'), names_to = "topic", values_to = "gamma") 
gamma1 %>% 
  ggplot(aes(factor(topic), gamma)) +
  geom_boxplot() +
  facet_wrap(~ state) +
  labs(x = "topic", y = expression(gamma))

# word topic probabilities (beta)
beta <- tidy(stm_model9, matrix = "beta")
beta
# A tibble: 17,910 x 3
   topic term      beta
   <int> <chr>    <dbl>
 1     1 1-3   8.95e-17
 2     2 1-3   6.30e-80
 3     3 1-3   2.82e- 4
 4     4 1-3   5.21e-76
 5     5 1-3   8.21e-81
 6     6 1-3   1.01e-79
 7     7 1-3   2.44e-75
 8     8 1-3   3.59e-86
 9     9 1-3   6.11e-38
10     1 1-5   3.31e-24
# ... with 17,900 more rows
beta1 <- beta %>%
  group_by(topic) %>%
  slice_max(beta, n = 10) %>% 
  ungroup() %>%
  arrange(topic, -beta)

beta1 %>%
  mutate(term = reorder_within(term, beta, topic)) %>%
  ggplot(aes(beta, term, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  scale_y_reordered()

# states with the highest firearm homicide rates
high_states <- c('mississippi', 'alabama', 'alaska', 'missouri', 'louisiana',
                 'arkansas', 'new-mexico', 'south-carolina', 'tennessee', 'nevada')
gammaHigh <- gamma1 %>%
  filter(state %in% high_states) %>%
  group_by(state) %>%
  slice_max(gamma, n = 1) %>%
  ungroup()
gammaHigh$Hom_Rates <- "High"

# states with the lowest firearm homicide rates
low_states <- c('oregon', 'connecticut', 'iowa', 'minnesota', 'new-york',
                'massachusetts', 'north-dakota', 'rhode-island', 'hawaii', 'south-dakota')
gammaLow <- gamma1 %>%
  filter(state %in% low_states) %>%
  group_by(state) %>%
  slice_max(gamma, n = 1) %>%
  ungroup()
gammaLow$Hom_Rates <- "Low"
gamma_LowHigh <- rbind(gammaHigh, gammaLow)
ggplot(data = gamma_LowHigh, mapping = aes(x = state, y = topic)) + 
  geom_point(position = position_dodge()) +
  labs(title = "Most Prevalent Topic for States with Highest and Lowest Homicide Rates", 
       subtitle = "by gamma", x = "State", y = "Topic #") + facet_wrap(~ Hom_Rates, scales = "free_x") +
  theme(axis.text.x = element_text(angle = 90))
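
A quick numeric companion to the plot: averaging each topic's gamma within the two groups shows which topics dominate high- versus low-rate states (this just summarizes the gamma_LowHigh table built above):

# Mean topic prevalence within each homicide-rate group
gamma_LowHigh %>%
  group_by(Hom_Rates, topic) %>%
  summarise(mean_gamma = mean(gamma), .groups = "drop") %>%
  arrange(Hom_Rates, desc(mean_gamma))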

Word Embeddings

WEtokens <- word_tokenizer(disarmDF$Text)

WE_dfm <- dfm(tokens(WEtokens))
topfeatures(WE_dfm)
     the       or        a       of       to    order      and 
    3071     2087     1997     1905     1257     1007      807 
      in  firearm firearms 
     611      596      564 
# Create iterator object 
WEit <- itoken(WEtokens, progressbar = FALSE)
# Build vocabulary
WEvocab <- create_vocabulary(WEit)

# Vocabulary
WEvocab
Number of docs: 50 
0 stopwords:  ... 
ngram_min = 1; ngram_max = 1 
Vocabulary: 
           term term_count doc_count
   1:  1,000.00          1         1
   2:  1,000.14          1         1
   3:    1000.4          1         1
   4:  13Within          1         1
   5: 15,000.18          1         1
  ---                               
2556:        to       1250        50
2557:         a       1723        50
2558:        of       1894        50
2559:        or       2087        50
2560:       the       2894        50
# Prune vocabulary
WEvocab <- prune_vocabulary(WEvocab, term_count_min = 3)

# Vectorize
WEvectorizer <- vocab_vectorizer(WEvocab)
# use a symmetric window of 8 for context words
WEtcm <- create_tcm(WEit, WEvectorizer, skip_grams_window = 8L)
# Creating new GloVe model
glove <- GlobalVectors$new(rank = 50, x_max = 10)

# Checking GloVe methods
glove
<GloVe>
  Public:
    bias_i: NULL
    bias_j: NULL
    clone: function (deep = FALSE) 
    components: NULL
    fit_transform: function (x, n_iter = 10L, convergence_tol = -1, n_threads = getOption("rsparse_omp_threads", 
    get_history: function () 
    initialize: function (rank, x_max, learning_rate = 0.15, alpha = 0.75, lambda = 0, 
    shuffle: FALSE
  Private:
    alpha: 0.75
    b_i: NULL
    b_j: NULL
    cost_history: 
    fitted: FALSE
    glove_fitter: NULL
    initial: NULL
    lambda: 0
    learning_rate: 0.15
    rank: 50
    w_i: NULL
    w_j: NULL
    x_max: 10
# Fitting model
wv_main <- glove$fit_transform(WEtcm, n_iter = 10, convergence_tol = 0.01, n_threads = 8)
INFO  [15:06:09.259] epoch 1, loss 0.2709 
INFO  [15:06:09.321] epoch 2, loss 0.1399 
INFO  [15:06:09.358] epoch 3, loss 0.1084 
INFO  [15:06:09.395] epoch 4, loss 0.0897 
INFO  [15:06:09.432] epoch 5, loss 0.0771 
INFO  [15:06:09.464] epoch 6, loss 0.0679 
INFO  [15:06:09.496] epoch 7, loss 0.0610 
INFO  [15:06:09.530] epoch 8, loss 0.0556 
INFO  [15:06:09.556] epoch 9, loss 0.0512 
INFO  [15:06:09.583] epoch 10, loss 0.0476 
# Context word vectors
wv_context <- glove$components

# The final word vectors are the sum of the main and (transposed) context
# vectors, as the GloVe authors recommend
word_vectors <- wv_main + t(wv_context)
head(word_vectors)
             [,1]        [,2]        [,3]       [,4]        [,5]
1000  -0.22377119 -0.37275691  0.49958320 -0.6530729  0.39787339
2,000  0.04565777 -0.16684887 -0.71996473 -0.1168168 -0.45784019
300   -0.19688222  0.06961206  0.46452678 -0.7765630 -0.01053767
33     0.12091936  0.04067427  0.13637843  0.1158650  0.01426115
34     0.66291260  0.40118000 -0.06885287 -0.7869727 -0.09392023
35    -0.25241339  0.35311719  0.41773917 -0.5406417  0.24061691
[ ... columns 6-50 of the 50-dimensional vectors omitted ... ]

Cosine Similarity

# Word vector for removal
removal <- word_vectors["removal", , drop = FALSE]

# Cosine similarity
removal_cos_sim <- sim2(x = word_vectors, y = removal, method = "cosine", norm = "l2")

# Top ten words most similar to removal
head(sort(removal_cos_sim[,1], decreasing = TRUE), 10)
   removal    require       does        not   prohibit       from 
 1.0000000  0.8387050  0.5425201  0.4881574  0.4660969  0.4199901 
   persons prohibited  prohibits   firearms 
 0.3978946  0.3973669  0.3920589  0.3865959 
# Word vector for firearm
firearm <- word_vectors["firearm", , drop = FALSE]

# Cosine similarity
firearm_cos_sim <- sim2(x = word_vectors, y = firearm, method = "cosine", norm = "l2")

# Top ten words most similar to firearm
head(sort(firearm_cos_sim[,1], decreasing = TRUE), 10)
   firearm        any respondent   firearms  defendant ammunition 
 1.0000000  0.7805967  0.7090351  0.6838225  0.6715562  0.6199826 
    weapon      their        the possessing 
 0.6172396  0.6127874  0.5992732  0.5903008 
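
Since I'll likely repeat this lookup for other terms, here is a small convenience wrapper of my own around sim2() (the name nearest_terms is mine) that returns the top n nearest neighbors for any word in the vocabulary:

# Return the n terms with the highest cosine similarity to `term`,
# excluding the term itself
nearest_terms <- function(term, vectors, n = 10) {
  stopifnot(term %in% rownames(vectors))
  target <- vectors[term, , drop = FALSE]
  sims <- sim2(x = vectors, y = target, method = "cosine", norm = "l2")
  head(sort(sims[, 1], decreasing = TRUE), n + 1)[-1]
}
nearest_terms("ammunition", word_vectors)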

n-grams

d <- tibble(txt = disarmDF$Text)
bigrams <- d %>%
  unnest_tokens(ngram, txt, token = "ngrams", n = 2)
head(bigrams)
# A tibble: 6 x 1
  ngram               
  <chr>               
1 alabama law         
2 law alabama         
3 alabama domestic    
4 domestic violence   
5 violence firearm    
6 firearm prohibitions
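
Many of these bigrams will be dominated by function words, so a standard tidytext follow-up (sketched here, not run on the original data) is to split the bigrams and drop stop-word pairs before counting:

library(dplyr)
library(tidyr)
library(tidytext)

# Split bigrams into their two words, drop pairs containing stop words,
# and count the remaining content-word pairs
bigram_counts <- bigrams %>%
  separate(ngram, into = c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word) %>%
  count(word1, word2, sort = TRUE)
head(bigram_counts)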
sentences <- d %>%
  unnest_tokens(sentence, txt, token = "sentences")
head(sentences)
# A tibble: 6 x 1
  sentence                                                            
  <chr>                                                               
1 alabama law         alabama domestic violence firearm prohibitions ~
2 by its terms, explicitly prohibits the use, attempted use, or threa~
3 2 alabama civil protection order firearm removal domestic violence ~
4 alabama law does allow a judge issuing an ex parte protection order~
5 3 individuals who may petition for a protection order the following~
6 alaska law         domestic violence firearm prohibitions alaska do~
ngram <- d %>%
 unnest_tokens(ngram, txt, token = "ngrams", n = 8)
head(ngram)
# A tibble: 6 x 1
  ngram                                                               
  <chr>                                                               
1 alabama law alabama domestic violence firearm prohibitions alabama  
2 law alabama domestic violence firearm prohibitions alabama domestic 
3 alabama domestic violence firearm prohibitions alabama domestic vio~
4 domestic violence firearm prohibitions alabama domestic violence fi~
5 violence firearm prohibitions alabama domestic violence firearm pur~
6 firearm prohibitions alabama domestic violence firearm purchase and 
CountNGram <- ngram %>%
  count(ngram, sort = TRUE)
head(CountNGram, 20)
# A tibble: 20 x 2
   ngram                                                            n
   <chr>                                                        <int>
 1 persons convicted of misdemeanor crimes of domestic violence    18
 2 a credible threat to the physical safety of                     12
 3 order the following persons may petition for a                  12
 4 ammunition by persons convicted of misdemeanor crimes of        11
 5 by persons convicted of misdemeanor crimes of domestic          11
 6 does not require removal of firearms or ammunition              10
 7 firearms or ammunition by persons convicted of misdemeanor      10
 8 individuals who may petition for a protective order             10
 9 not require removal of firearms or ammunition from              10
10 of firearms or ammunition by persons convicted of               10
11 or ammunition by persons convicted of misdemeanor crimes        10
12 possession of firearms or ammunition by persons convicted       10
13 and possession of firearms or ammunition by persons              9
14 does not prohibit purchase and possession of firearms            9
15 does not prohibit purchase or possession of firearms             9
16 individuals who may petition for a domestic violence             9
17 purchase and possession of firearms or ammunition by             9
18 represents a credible threat to the physical safety              9
19 attempted use or threatened use of physical force                8
20 may petition for a protection from abuse order                   8
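
As a follow-up, it's easy to check how widespread the most common 8-gram is across the states (a quick stringr check of my own, using the same disarmDF$Text column):

library(stringr)

# Number of state documents that contain the most frequent 8-gram
phrase <- "persons convicted of misdemeanor crimes of domestic violence"
sum(str_detect(tolower(disarmDF$Text), fixed(phrase)))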

References

Centers for Disease Control and Prevention, National Center for Health Statistics. Underlying Cause of Death 1999-2020 on CDC WONDER Online Database, released in 2021. Data are from the Multiple Cause of Death Files, 1999-2020, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program. Accessed at http://wonder.cdc.gov/ucd-icd10.html

Disarm Domestic Violence (2022) Protective Orders and Firearm Prohibitions. The National Coalition Against Domestic Violence; The Educational Fund to Stop Gun Violence; The Alliance for Gun Responsibility; Prosecutors Against Gun Violence. https://www.disarmdv.org/

Everytown Research & Policy (2022) Which States Prohibit Convicted Domestic Abusers from Having Guns? Everytown for Gun Safety Support Fund. https://everytownresearch.org/rankings/law/prohibition-for-convicted-domestic-abusers/

Giffords Law Center (2021) Domestic Violence & Firearms. Giffords Law Center to Prevent Gun Violence. https://giffords.org/lawcenter/gun-laws/

U.S. Department of Justice (2020) Criminal Resource Manual: 1117. Restrictions on the Possession of Firearms by Individuals Convicted of a Misdemeanor Crime of Domestic Violence. The United States Department of Justice Archives. https://www.justice.gov/archives/jm/criminal-resource-manual-1117-restrictions-possession-firearms-individuals-convicted

Additional resources:
https://github.com/dondealban/learning-stm
https://content-analysis-with-r.com/6-topic_models.html