Different Attempts at Text Analysis
Is there a relationship between the restrictiveness of a state’s firearm regulations/laws for individuals with domestic violence related records and homicide rates perpetrated using a firearm?
Since 1996, federal law has banned the possession of firearms by anyone convicted of a felony or a domestic violence (DV) misdemeanor, or who is subject to a DV-related protective order (U.S. Department of Justice, 2020). However, states vary significantly in the number and restrictiveness of their gun laws. Some states do not even explicitly ban the purchase or possession of firearms by DV perpetrators. While federal law takes priority, states that do explicitly restrict firearms often add provisions specifying what constitutes an intimate partner, what types of offenses are included, how firearms should be relinquished or seized, and what the consequences are for violating these provisions (Giffords Law Center, 2021).
Estimates of the exact impact of gun violence vary across studies, but all concur that it is a prevalent issue. Beyond that, years of data indicate the substantial presence of domestic violence in firearm homicides, particularly for female victims. Current research finds that nearly 50% of female homicides are perpetrated by a current or former intimate partner, and over half of those are committed using a firearm (Disarm Domestic Violence, 2022). Giffords Law Center (2021) additionally finds that 4.5 million U.S. women have reported being threatened by an intimate partner using a firearm. It is also important to note that, due to stigma and other factors, many incidents of intimate partner violence go unreported. Additionally, the National Violent Death Reporting System, which compiles state data on violent deaths - focusing on who, where, when, and how - has only in the past few years procured funding to reach all 50 states. Therefore, available data on DV-specific homicides is limited.
Due to an increased understanding of the prevalence of intimate partner violence (IPV) and its connection to firearms, many states have implemented legal provisions to prevent perpetrators from possessing firearms. For example, many states have closed the “boyfriend loophole” that exists in federal legislation: they have expanded the definition of an intimate partner to include dating partners, even if the partners have never lived together, shared a child, or been married (the criteria specified in federal law) (Everytown Research & Policy, 2022). Many states have also added legislation that specifies who may petition for a DV protective order and which protective order scenarios result in firearm prohibitions for the abuser or alleged abuser. Additional provisions specify firearm relinquishment and removal for those with a DV-related criminal or civil imposition (Disarm Domestic Violence, 2022).
The Centers for Disease Control and Prevention (CDC) provides CDC WONDER, an online database for querying and analyzing public health data. I am using the Underlying Cause of Death database, selecting Homicide as the Injury Intent and Firearm as the Injury Mechanism; the underlying information comes from death certificates.
I decided to use an average across 3 years to account for any single-year abnormalities. The state firearm laws are from 2018, and since I do not have information on how long each state has had each firearm provision enacted, I am using average homicide rates for 2017 through 2019. I've included overall rates as well as rates by gender. Since previous research indicates that a high proportion of female firearm homicides have been perpetrated by an intimate partner, I will evaluate female homicide rates in relation to state firearm legislation. Due to the lack of data on confirmed intimate partner homicide (IPH) in many states, overall firearm homicide rates are the closest available proxy for that information.
Source: https://wonder.cdc.gov/ucd-icd10.html
# Packages used throughout this analysis
library(tidyverse)
library(knitr)
library(kableExtra)
library(rvest)
library(tidytext)
library(quanteda)
library(quanteda.textstats)
library(quanteda.textplots)
library(quanteda.dictionaries)
# Overall Homicide Rates
HomRates <- suppressMessages(read.delim("../../DACSS 697/DV Research/Homicide_Firearm_2017through2019.txt"))
HomRates <- select(HomRates, State, Deaths, Population, Crude.Rate) %>% mutate(across(everything(), ~ifelse(.=="", NA, as.character(.)))) %>% na.omit()
HomRates <- filter(HomRates, !str_detect(State, "District of Columbia"))
HomRates$Deaths <- as.numeric(HomRates$Deaths)
HomRates$Population <- as.numeric(HomRates$Population)
HomRates$Crude.Rate <- as.numeric(HomRates$Crude.Rate)
# Convert 3-year totals to annual averages (division is vectorized, so no helper function is needed)
HomRates <- HomRates %>%
mutate(across(c(Population, Deaths), ~ round(. / 3, 0)))
# Homicide Rates by Gender
HomRatesG <- read.delim("../../DACSS 697/DV Research/CDC_Gun_Hom_Gender.txt")
HomRatesG <- select(HomRatesG, State, Gender, Deaths, Crude.Rate) %>%
rename(States = State)
HomRatesG <- filter(HomRatesG, !str_detect(States, "District of Columbia"))
HomRatesG$Deaths <- as.numeric(HomRatesG$Deaths)
HomRatesG$Crude.Rate <- as.numeric(HomRatesG$Crude.Rate)
# Convert 3-year death totals to annual averages
HomRatesG <- HomRatesG %>%
mutate(across(Deaths, ~ round(. / 3, 0)))
HomRatesG <- HomRatesG %>%
mutate(across(everything(), ~ifelse(.=="", NA, as.character(.)))) %>%
filter(!str_detect(States, "NA"))
HomRatesG <- HomRatesG %>%
pivot_wider(names_from = Gender, values_from = c(Deaths, Crude.Rate))
## Notes:
## When Deaths_Female is NA, value is 0-9 and suppressed by CDC
## When Crude.Rate values are NA, rate is suppressed due to suppressed Death count
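## Quick check (my addition, not part of the original pipeline): list the states
## whose female death counts or crude rates are suppressed before hand-filling them below
filter(HomRatesG, is.na(Deaths_Female) | is.na(Crude.Rate_Female))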
# Combine overall rates and gender rates to one dataframe
CDCHomRates <- cbind.data.frame(HomRates, HomRatesG)
CDCHomRates <- select(CDCHomRates, -c(States))
CDCHomRates <- CDCHomRates[, c("State", "Crude.Rate_Female", "Crude.Rate_Male", "Crude.Rate", "Deaths_Female", "Deaths_Male", "Deaths", "Population")]
CDCHomRates <- rename(CDCHomRates, Crude_Total = Crude.Rate, Deaths_Total = Deaths)
# Fixing suppressed and unreliable rates
CDCHomRates$Crude.Rate_Female[CDCHomRates$State == "Maine"] <- "0.7"
CDCHomRates$Crude.Rate_Male[CDCHomRates$State == "Maine"] <- "0.9"
CDCHomRates$Crude.Rate_Female[CDCHomRates$State == "Vermont"] <- "1.1"
CDCHomRates$Crude.Rate_Male[CDCHomRates$State == "Vermont"] <- "1.3"
CDCHomRates$Crude.Rate_Female[CDCHomRates$State == "Wyoming"] <- "1.5"
CDCHomRates$Crude.Rate_Female[CDCHomRates$State == "New Hampshire"] <- "0.8"
CDCHomRates$Deaths_Female[CDCHomRates$State == "Hawaii"] <- "3"
CDCHomRates$Deaths_Female[CDCHomRates$State == "North Dakota"] <- "2"
CDCHomRates$Deaths_Female[CDCHomRates$State == "South Dakota"] <- "1"
CDCHomRates$Deaths_Female[CDCHomRates$State == "Rhode Island"] <- "3"
CDCHomRates$Crude.Rate_Female[CDCHomRates$State == "Hawaii"] <- "0.1"
CDCHomRates$Crude.Rate_Female[CDCHomRates$State == "North Dakota"] <- "0.2"
CDCHomRates$Crude.Rate_Female[CDCHomRates$State == "South Dakota"] <- "0.1"
CDCHomRates$Crude.Rate_Female[CDCHomRates$State == "Rhode Island"] <- "0.2"
# Present data with table
kable(CDCHomRates, col.names = c("State", "Female", "Male", "Total", "Female", "Male", "Total", "Population"),
align = c('c', 'c', 'c', 'c', 'c', 'c', 'c', 'c')) %>%
add_header_above(c("", "Crude Rate (per 100,000)"=3, "Number of Deaths"=3, ""))%>%
add_header_above(c("Homicide by Firearm (Averages 2017-2019)"=8)) %>%
kable_styling(fixed_thead = TRUE)%>%
scroll_box(width = "100%", height = "600px") %>%
footnote(general = "The CDC marks rates as unreliable when death counts are fewer than 20, which applies here to Female Crude Rates for Hawaii, North Dakota, Rhode Island, and South Dakota")
| State | Female | Male | Total | Female | Male | Total | Population |
|---|---|---|---|---|---|---|---|
| Alabama | 3.3 | 17.1 | 10.0 | 82 | 404 | 487 | 4888601 |
| Alaska | 3.3 | 8.6 | 6.1 | 12 | 33 | 45 | 736259 |
| Arizona | 1.4 | 6.8 | 4.1 | 52 | 242 | 294 | 7155544 |
| Arkansas | 2.5 | 10.8 | 6.6 | 38 | 160 | 198 | 3011969 |
| California | 1 | 5.7 | 3.3 | 195 | 1124 | 1319 | 39535307 |
| Colorado | 1 | 5 | 3.0 | 28 | 142 | 171 | 5687151 |
| Connecticut | 0.5 | 3.2 | 1.8 | 9 | 55 | 64 | 3575379 |
| Delaware | 1.7 | 8.8 | 5.1 | 8 | 41 | 49 | 967625 |
| Florida | 1.6 | 7.8 | 4.6 | 172 | 807 | 979 | 21253821 |
| Georgia | 1.9 | 11 | 6.3 | 105 | 562 | 667 | 10522092 |
| Hawaii | 0.1 | 1.4 | 0.9 | 3 | 10 | 13 | 1421300 |
| Idaho | 1 | 1.9 | 1.4 | 9 | 17 | 25 | 1752739 |
| Illinois | 1.4 | 11.9 | 6.6 | 90 | 746 | 836 | 12738308 |
| Indiana | 1.8 | 9.1 | 5.4 | 62 | 302 | 363 | 6696972 |
| Iowa | 0.5 | 2.7 | 1.6 | 8 | 42 | 50 | 3152309 |
| Kansas | 1.5 | 6.7 | 4.1 | 22 | 98 | 120 | 2912647 |
| Kentucky | 1.8 | 7.8 | 4.7 | 40 | 172 | 212 | 4463421 |
| Louisiana | 3 | 19.9 | 11.2 | 72 | 453 | 524 | 4664368 |
| Maine | 0.7 | 0.9 | 0.8 | 5 | 6 | 11 | 1339508 |
| Maryland | 1.4 | 13.9 | 7.4 | 43 | 406 | 449 | 6046858 |
| Massachusetts | 0.3 | 2.7 | 1.5 | 11 | 90 | 101 | 6884824 |
| Michigan | 1.5 | 7.8 | 4.6 | 74 | 384 | 458 | 9981694 |
| Minnesota | 0.4 | 2.5 | 1.5 | 13 | 69 | 82 | 5609139 |
| Mississippi | 3.6 | 18.8 | 11.0 | 55 | 272 | 327 | 2982260 |
| Missouri | 3.1 | 15.1 | 9.0 | 97 | 455 | 552 | 6125804 |
| Montana | 1.5 | 2.6 | 2.0 | 8 | 14 | 22 | 1060525 |
| Nebraska | 0.9 | 2.4 | 1.6 | 9 | 23 | 31 | 1927917 |
| Nevada | 2.1 | 7.3 | 4.7 | 32 | 111 | 143 | 3037529 |
| New Hampshire | 0.8 | 1.2 | 1.0 | 6 | 8 | 14 | 1352988 |
| New Jersey | 0.6 | 4.5 | 2.5 | 26 | 198 | 224 | 8932118 |
| New Mexico | 2.5 | 10.2 | 6.3 | 26 | 105 | 132 | 2093442 |
| New York | 0.4 | 3 | 1.7 | 42 | 289 | 331 | 19615056 |
| North Carolina | 1.5 | 8.7 | 5.0 | 78 | 439 | 517 | 10381708 |
| North Dakota | 0.2 | 2.1 | 1.4 | 2 | 8 | 10 | 759177 |
| Ohio | 1.6 | 8.4 | 4.9 | 97 | 481 | 578 | 11679050 |
| Oklahoma | 2 | 9.2 | 5.6 | 41 | 179 | 220 | 3943638 |
| Oregon | 0.6 | 2.8 | 1.7 | 14 | 59 | 73 | 4183742 |
| Pennsylvania | 1.3 | 7.9 | 4.5 | 83 | 494 | 577 | 12804862 |
| Rhode Island | 0.2 | 1.8 | 1.1 | 3 | 9 | 12 | 1058772 |
| South Carolina | 2.4 | 13.2 | 7.6 | 63 | 325 | 388 | 5085737 |
| South Dakota | 0.1 | 2.7 | 1.4 | 1 | 12 | 13 | 878853 |
| Tennessee | 2.3 | 12.2 | 7.1 | 80 | 402 | 482 | 6771723 |
| Texas | 1.5 | 7.1 | 4.3 | 216 | 1017 | 1233 | 28667441 |
| Utah | 0.8 | 2.1 | 1.5 | 12 | 34 | 46 | 3156299 |
| Vermont | 1.1 | 1.3 | 1.2 | 3 | 4 | 7 | 624648 |
| Virginia | 1.4 | 6.7 | 4.0 | 62 | 279 | 340 | 8507741 |
| Washington | 1 | 3.5 | 2.2 | 36 | 132 | 168 | 7518742 |
| West Virginia | 2 | 5.8 | 3.9 | 19 | 52 | 70 | 1804612 |
| Wisconsin | 1.1 | 4.2 | 2.7 | 33 | 122 | 155 | 5810495 |
| Wyoming | 1.5 | 2.6 | 2.1 | 4 | 8 | 12 | 578604 |
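As a quick sanity check on the table: the crude rate is simply deaths per 100,000 population, so any row can be verified by hand. Using Alabama's row, for example (small differences arise because the CDC computes rates from unrounded 3-year totals):
# Crude rate = deaths per 100,000 population; check against Alabama's row above
487 / 4888601 * 100000
## about 9.96, reported as 10.0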
The National Violent Death Reporting System (NVDRS) is a product of the CDC, housed in the Web-based Injury Statistics Query and Reporting System (WISQARS). Unlike the CDC WONDER dataset above, which covers all firearm homicides, this database provides data specifically on confirmed homicides perpetrated by an intimate partner using a firearm. Confirmed intimate partner homicide (IPH) rates are not available for all states or years, but that will likely change since the CDC began providing NVDRS funding to all states in 2018. The system allows states to combine law enforcement reports, medical examiner/coroner reports, and death certificates when reporting and surveilling public health matters like homicide.
For the states included in the NVDRS between 2017 and 2019, I will analyze crude rates in relation to state firearm laws.
Source: https://www.cdc.gov/injury/wisqars/nvdrs/
# Confirmed intimate partner homicides (not all states are included)
NVDRSdata <- suppressMessages(read_csv("../../DACSS 697/DV Research/NVDRSdf.csv"))
NVDRSdata <- NVDRSdata %>%
select(State, Average_Deaths, Crude_Rate)%>%
filter(!str_detect(`State`, "TOTAL"))
# Present data with table
kable(NVDRSdata, col.names = c("State", "Number of Deaths", "Crude Rate")) %>%
kable_styling() %>%
add_header_above(c("Confirmed Intimate Partner Homicides Using a Firearm (2017-2019 Averages)"=3)) %>%
scroll_box(width = "100%", height = "600px") %>%
footnote(general = "if NA, value has been suppressed by the CDC as a privacy constraint:",
number = c("number of deaths count is between 0 and 9",
"crude rate unable to be calculated due to suppressed death count"))
| State | Number of Deaths | Crude Rate |
|---|---|---|
| Alaska | 14 | 0.63 |
| Arizona | 84 | 0.39 |
| California | 117 | 0.21 |
| Colorado | 47 | 0.28 |
| Connecticut | 15 | 0.14 |
| Delaware | 10 | 0.34 |
| Georgia | 149 | 0.47 |
| Illinois | 68 | 0.21 |
| Indiana | 57 | 0.28 |
| Iowa | 12 | 0.13 |
| Kansas | 35 | 0.4 |
| Kentucky | 70 | 0.52 |
| Maine | 13 | 0.32 |
| Maryland | 33 | 0.18 |
| Massachusetts | 19 | 0.09 |
| Michigan | 109 | 0.36 |
| Minnesota | 22 | 0.13 |
| Nevada | 50 | 0.55 |
| New Hampshire | 10 | 0.25 |
| New Jersey | 27 | 0.1 |
| New Mexico | 39 | 0.62 |
| North Carolina | 135 | 0.43 |
| Ohio | 133 | 0.38 |
| Oklahoma | 88 | 0.74 |
| Oregon | 35 | 0.28 |
| Pennsylvania | 92 | 0.29 |
| Rhode Island | Suppressed | Suppressed |
| South Carolina | 88 | 0.58 |
| Utah | 12 | 0.13 |
| Vermont | Suppressed | Suppressed |
| Virginia | 98 | 0.38 |
| Washington | 59 | 0.27 |
| West Virginia | 37 | 0.68 |
| Wisconsin | 52 | 0.3 |
Disarm Domestic Violence is a website that compiles information on each state’s domestic violence related legislation. Focal points of the source are prohibition, specification of the victim-perpetrator relationship, protective orders, judicial authority, and the firearm removal process.
Source: https://www.disarmdv.org/#
# Read in the data using web-scraping
URL <- "https://www.disarmdv.org/state/"
State <- c('alabama', 'alaska', 'arizona', 'arkansas', 'california', 'colorado', 'connecticut', 'delaware', 'florida', 'georgia', 'hawaii', 'idaho', 'illinois', 'indiana', 'iowa', 'kansas', 'kentucky', 'louisiana', 'maine', 'maryland', 'massachusetts', 'michigan', 'minnesota', 'mississippi', 'missouri', 'montana', 'nebraska', 'nevada', 'new-hampshire', 'new-jersey', 'new-mexico', 'new-york', 'north-carolina', 'north-dakota', 'ohio', 'oklahoma', 'oregon', 'pennsylvania', 'rhode-island', 'south-carolina', 'south-dakota', 'tennessee', 'texas', 'utah', 'vermont', 'virginia', 'washington', 'west-virginia', 'wisconsin', 'wyoming')
# Build one law-page URL per state (paste0 is vectorized, so no loop is needed)
StateURL <- paste0(URL, State, "/?sec=law")
disarmdv <- c()
css_selector <- ".auto-navigation-content"
for (i in 1:length(StateURL)){
laws <- StateURL[i] %>%
read_html() %>%
html_nodes(css = css_selector) %>%
html_text()
disarmdv <- c(disarmdv, laws)
}
# Create dataframe with state and legislation text (reusing the State vector from above)
disarmDF <- data.frame(State = State, Text = disarmdv)
head(disarmDF, 1)
State
1 alabama
Text
1 Alabama Law\n ALABAMA DOMESTIC VIOLENCE FIREARM PROHIBITIONS\nAlabama Domestic Violence Firearm Purchase and Possession Prohibitions\nAlabama prohibits the following individuals from owning a firearm, possessing a firearm, or having a firearm in their control: \nPersons convicted of a misdemeanor offense of domestic violence; and\nPersons subject to a valid protection order for domestic abuse.1\n“Valid protection order” is defined as “an order issued after a hearing of which the person received actual notice, and at which the person had an opportunity to participate, that does any of the following:\nRestrains the person from harassing, stalking, or threatening a qualified individual* or child of the qualified individual or person or engaging in other conduct that would place a qualified individual in reasonable fear of bodily injury to the individual or child and that includes a finding that the person represents a credible threat to the physical safety of the qualified individual or child.\nBy its terms, explicitly prohibits the use, attempted use, or threatened use of physical force against the qualified individual or child that would reasonably be expected to cause bodily injury.”2\nALABAMA CIVIL PROTECTION ORDER FIREARM REMOVAL\nDomestic Violence Civil Protection Orders That Require Firearm Removal\nAlabama law does not require the removal of firearms from persons subject to domestic violence protection orders. \nAlabama law does allow a judge issuing an ex parte protection order, an ex parte modification of a protection order, a final protection order, or a modification of a protection order issued after notice and hearing to “[o]rder other relief as it deems necessary to provide for the safety and welfare of the plaintiff or any children and any person designated by the court.”3\nIndividuals Who May Petition for a Protection Order\nThe following persons may petition for a protection order: \nA spouse (including a common law spouse);\nA former spouse (including a common law former spouse); \nA person with whom the defendant has a child in common, regardless of whether the victim or defendant have ever been married and regardless of whether they are currently residing or have in the past resided together in the same household; \nA person who has or had a dating relationship with the defendant; \nA person who is or was cohabiting with the defendant and who is in, or was engaged in, a romantic or sexual relationship with the defendant;\nA relative of a person defined in (e) who also lived with the defendant; or \nAn individual who is a parent, stepparent, child, or stepchild and who is in or has maintained a living arrangement with the defendant.4\nPenalties for Violation\nA violation of a protective order is a Class A misdemeanor.5\n
# Create corpus and set docnames
laws_corpus <- corpus(disarmdv)
docnames(laws_corpus) <- State
summary(laws_corpus)
Corpus consisting of 50 documents, showing 50 documents:
Text Types Tokens Sentences
alabama 207 498 5
alaska 186 481 8
arizona 191 503 7
arkansas 138 232 5
california 495 1735 22
colorado 263 1016 5
connecticut 320 1281 3
delaware 435 1533 14
florida 276 797 8
georgia 126 225 4
hawaii 241 625 5
idaho 155 332 5
illinois 380 1378 8
indiana 184 472 6
iowa 235 670 6
kansas 188 408 3
kentucky 251 620 7
louisiana 482 2087 22
maine 390 1423 6
maryland 259 991 11
massachusetts 359 1329 7
michigan 266 708 7
minnesota 440 1706 4
mississippi 185 397 3
missouri 160 329 5
montana 196 528 5
nebraska 171 359 3
nevada 254 972 5
new-hampshire 377 1348 27
new-jersey 270 1170 12
new-mexico 380 1325 14
new-york 444 1746 10
north-carolina 416 1637 24
north-dakota 265 643 4
ohio 298 853 9
oklahoma 265 782 8
oregon 246 684 5
pennsylvania 629 3547 65
rhode-island 392 1509 8
south-carolina 176 470 2
south-dakota 219 610 6
tennessee 261 761 3
texas 233 660 4
utah 254 834 4
vermont 376 1223 9
virginia 278 880 7
washington 506 2380 16
west-virginia 340 1324 11
wisconsin 452 2125 15
wyoming 179 396 6
# Create tokens (keeping all characters)
laws_tokens <- tokens(laws_corpus)
# Create document-feature matrix
laws_dfm <- dfm(laws_tokens)
head(laws_dfm)
Document-feature matrix of: 6 documents, 2,253 features (90.47% sparse) and 0 docvars.
features
docs alabama law domestic violence firearm prohibitions
alabama 7 5 6 5 7 2
alaska 0 4 11 11 8 2
arizona 0 3 7 7 10 2
arkansas 0 2 4 4 4 2
california 0 19 10 12 42 2
colorado 0 9 6 6 18 2
features
docs purchase and possession prohibits
alabama 1 10 1 2
alaska 2 8 4 0
arizona 1 4 1 2
arkansas 1 1 1 0
california 2 18 8 1
colorado 1 7 9 1
[ reached max_nfeat ... 2,243 more features ]
# New DFM - preprocessing for wordcloud
laws_dfm2 <- tokens(laws_corpus,
remove_punct= TRUE,
remove_numbers = TRUE) %>%
tokens_tolower() %>%
tokens_select(pattern=stopwords("en"),
selection="remove") %>%
dfm()
textplot_wordcloud(laws_dfm2, max_words = 50)
# Display the 20 top features from the dfm
textstat_frequency(laws_dfm2, n=20)
feature frequency rank docfreq group
1 order 1007 1 50 all
2 firearm 596 2 50 all
3 firearms 564 3 50 all
4 domestic 503 4 50 all
5 violence 452 5 50 all
6 protection 433 6 36 all
7 court 403 7 50 all
8 person 364 8 45 all
9 may 346 9 50 all
10 respondent 326 10 31 all
11 protective 322 11 23 all
12 shall 311 12 35 all
13 law 309 13 50 all
14 persons 302 14 50 all
15 ammunition 288 15 39 all
16 abuse 263 16 28 all
17 possession 255 17 50 all
18 party 206 18 19 all
19 defendant 194 19 17 all
20 subject 181 20 48 all
# I will create a DFM with some preprocessing but I need to keep stopwords because they provide essential context for the firearm provisions
laws_dfm3 <- tokens(laws_corpus,
remove_punct= TRUE,
remove_numbers = TRUE) %>%
tokens_tolower() %>%
dfm()
head(laws_dfm3)
Document-feature matrix of: 6 documents, 2,141 features (91.04% sparse) and 0 docvars.
features
docs alabama law domestic violence firearm prohibitions
alabama 7 5 6 5 7 2
alaska 0 4 11 11 8 2
arizona 0 3 7 7 10 2
arkansas 0 2 4 4 4 2
california 0 19 10 12 42 2
colorado 0 9 6 6 18 2
features
docs purchase and possession prohibits
alabama 1 10 1 2
alaska 2 8 4 0
arizona 1 4 1 2
arkansas 1 1 1 0
california 2 18 8 1
colorado 1 7 9 1
[ reached max_nfeat ... 2,131 more features ]
# Build a feature co-occurrence matrix and pull its top features. The fcm creation
# step was missing above; I'm assuming it used the stopword-removed laws_dfm2
laws_fcm <- fcm(laws_dfm2)
fcm_features <- names(topfeatures(laws_fcm, 30))
# Retain top features in fcm
laws_fcm2 <- fcm_select(laws_fcm, pattern = fcm_features, selection = "keep")
dim(laws_fcm2)
[1] 30 30
# compute size weight for vertices in network and create plot
size <- log(colSums(laws_fcm2))
textplot_network(laws_fcm2, vertex_size = size / max(size) * 3)
Word co-occurrence will be key to properly analyzing this dataset because context matters. From the textstat_frequency table for the dfm, we know that all 50 documents (states) include the words “domestic,” “violence,” and “order,” but that alone provides no insight into the kinds of provisions being applied. The data source uses iterations of the word “prohibit” to discuss whether or not each state prohibits firearm purchase and possession for DV offenders. Likewise, the source discusses removal provisions and whether each state specifies them. We therefore cannot simply identify the presence of these words; we need a way to evaluate their context and thereby identify the presence and strength of each provision.
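One lightweight way to inspect that context before moving to heavier methods is keywords-in-context. A minimal sketch using quanteda's kwic() on the laws_tokens object created above (the pattern and window size are my choices):
# Peek at the words surrounding the "prohibit" stem (keywords-in-context)
head(kwic(laws_tokens, pattern = "prohibit*", window = 5))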
# First, I'll use the Lexicoder Sentiment Dictionary to evaluate the text
LSD <- dfm_lookup(laws_dfm3, dictionary = data_dictionary_LSD2015)
head(LSD)
Document-feature matrix of: 6 documents, 4 features (50.00% sparse) and 0 docvars.
features
docs negative positive neg_positive neg_negative
alabama 24 26 0 0
alaska 25 20 0 0
arizona 30 14 0 0
arkansas 15 14 0 0
california 82 55 0 0
colorado 24 40 0 0
# Find the states with the highest and lowest female homicide rates
FemaleHomRates <- CDCHomRates %>% arrange(desc(Crude.Rate_Female))
head(FemaleHomRates, 10)
State Crude.Rate_Female Crude.Rate_Male Crude_Total
1 Mississippi 3.6 18.8 11.0
2 Alabama 3.3 17.1 10.0
3 Alaska 3.3 8.6 6.1
4 Missouri 3.1 15.1 9.0
5 Louisiana 3 19.9 11.2
6 Arkansas 2.5 10.8 6.6
7 New Mexico 2.5 10.2 6.3
8 South Carolina 2.4 13.2 7.6
9 Tennessee 2.3 12.2 7.1
10 Nevada 2.1 7.3 4.7
Deaths_Female Deaths_Male Deaths_Total Population
1 55 272 327 2982260
2 82 404 487 4888601
3 12 33 45 736259
4 97 455 552 6125804
5 72 453 524 4664368
6 38 160 198 3011969
7 26 105 132 2093442
8 63 325 388 5085737
9 80 402 482 6771723
10 32 111 143 3037529
tail(FemaleHomRates, 10)
State Crude.Rate_Female Crude.Rate_Male Crude_Total
41 Oregon 0.6 2.8 1.7
42 Connecticut 0.5 3.2 1.8
43 Iowa 0.5 2.7 1.6
44 Minnesota 0.4 2.5 1.5
45 New York 0.4 3 1.7
46 Massachusetts 0.3 2.7 1.5
47 North Dakota 0.2 2.1 1.4
48 Rhode Island 0.2 1.8 1.1
49 Hawaii 0.1 1.4 0.9
50 South Dakota 0.1 2.7 1.4
Deaths_Female Deaths_Male Deaths_Total Population
41 14 59 73 4183742
42 9 55 64 3575379
43 8 42 50 3152309
44 13 69 82 5609139
45 42 289 331 19615056
46 11 90 101 6884824
47 2 8 10 759177
48 3 9 12 1058772
49 3 10 13 1421300
50 1 12 13 878853
# Plot LSD2015 sentiments for the states with the most and least firearm homicides
LSDdataframe <- convert(LSD, to = "data.frame") %>% select(doc_id, negative, positive)
LSDdataframe <- pivot_longer(data = LSDdataframe, cols = c("negative", "positive"), names_to = "Sentiment", values_to = "Score")
LSD_HighHom <- filter(LSDdataframe, doc_id %in% c("mississippi", "alabama", "alaska", "missouri", "louisiana", "arkansas", "new-mexico", "south-carolina", "tennessee", "nevada"))
LSD_HighHom$Hom_Rates <- rep("High", nrow(LSD_HighHom))
LSD_LowHom <- filter(LSDdataframe, doc_id %in% c("oregon", "connecticut", "iowa", "minnesota", "new-york", "massachusetts", "north-dakota", "rhode-island", "hawaii", "south-dakota"))
LSD_LowHom$Hom_Rates <- rep("Low", nrow(LSD_LowHom))
LSD_LowHigh <- rbind(LSD_HighHom, LSD_LowHom)
ggplot(data = LSD_LowHigh, mapping = aes(x = doc_id, y = Score, fill = Sentiment)) +
geom_col(position = position_dodge()) +
labs(title = "Sentiments for States with 10 Highest and Lowest Homicide Rates",
subtitle = "LSD2015 Negative vs Positive", x = "State", y = "Sentiment Score") + facet_wrap(~ Hom_Rates, scales = "free_x") +
theme(axis.text.x = element_text(angle = 90))
Well, this is a start. There doesn’t appear to be a difference in sentiment (by LSD2015 scale) between high and low homicide rate states. This doesn’t really tell me much, since the method does nothing to identify the context of the negative/positive words being used. I’ll move on.
To try one more dictionary approach, I'll use the Loughran lexicon, which has multiple sentiment categories: negative, positive, uncertainty, litigious, constraining, and superfluous. Many of the key words I'm interested in are included in this lexicon, specifically in the litigious and constraining categories, like variations of the word prohibit. However, a key topic - removal - is missing, so I'll add variations of that word to the lexicon.
# load loughran lexicon
lou <- get_sentiments("loughran")
# add remov words to lexicon
custom_lou <- lou %>%
bind_rows(tribble(~word, ~sentiment,
"remove", "constraining",
"removal", "constraining",
"removing", "constraining",
"removed", "constraining"))
custom_lou <- as.dictionary(custom_lou)
# use liwcalike() to estimate sentiment using customized loughran dictionary
laws_sentiment_lou <- liwcalike(as.character(laws_corpus), custom_lou)
laws_lou <- laws_sentiment_lou %>% select(docname, negative, positive, constraining, litigious,superfluous, uncertainty)
laws_lou
docname negative positive constraining litigious
1 alabama 5.82 0.20 2.01 4.02
2 alaska 3.95 0.00 2.29 2.91
3 arizona 6.10 0.00 2.10 7.43
4 arkansas 5.13 0.00 3.42 4.70
5 california 3.85 0.17 1.32 4.54
6 colorado 4.13 0.20 1.57 4.53
7 connecticut 2.88 0.00 2.42 2.18
8 delaware 4.75 0.06 1.16 3.53
9 florida 8.53 0.00 2.89 5.27
10 georgia 9.33 0.00 4.00 4.89
11 hawaii 6.08 0.00 3.52 3.52
12 idaho 5.42 0.00 3.31 4.52
13 illinois 4.61 0.14 1.01 3.96
14 indiana 3.15 0.00 1.68 3.57
15 iowa 3.43 0.00 2.24 3.13
16 kansas 7.35 0.25 2.70 4.17
17 kentucky 8.23 0.16 1.77 4.35
18 louisiana 7.97 0.24 0.90 5.60
19 maine 7.88 0.14 1.11 4.15
20 maryland 5.75 0.00 1.41 4.54
21 massachusetts 4.77 0.22 0.67 3.50
22 michigan 7.75 0.14 1.97 5.92
23 minnesota 6.12 0.17 1.40 5.24
24 mississippi 10.83 0.00 2.27 3.78
25 missouri 6.95 0.00 2.72 3.02
26 montana 9.47 0.00 2.27 4.92
27 nebraska 8.59 0.00 2.77 4.71
28 nevada 7.30 0.00 1.44 3.29
29 new-hampshire 6.89 0.00 1.26 5.93
30 new-jersey 5.04 0.26 3.25 4.27
31 new-mexico 5.07 0.15 3.06 4.33
32 new-york 6.19 0.11 0.63 5.56
33 north-carolina 4.39 0.12 1.53 4.21
34 north-dakota 7.05 0.00 1.95 2.85
35 ohio 5.86 0.00 1.52 3.40
36 oklahoma 7.87 0.00 1.78 4.44
37 oregon 5.12 0.15 2.49 3.36
38 pennsylvania 6.66 0.08 1.04 4.61
39 rhode-island 7.29 0.07 1.19 3.84
40 south-carolina 5.67 0.00 3.57 2.52
41 south-dakota 8.20 0.00 1.97 4.26
42 tennessee 6.57 0.00 1.71 3.02
43 texas 6.80 0.00 1.96 2.42
44 utah 5.52 0.12 1.56 2.88
45 vermont 6.79 0.08 1.39 4.66
46 virginia 5.73 0.00 1.65 2.97
47 washington 5.95 0.08 2.43 3.31
48 west-virginia 2.76 0.07 1.20 6.02
49 wisconsin 4.05 0.14 1.32 5.18
50 wyoming 7.83 0.00 2.78 3.54
superfluous uncertainty
1 0.00 0.40
2 0.00 1.66
3 0.00 0.95
4 0.00 1.28
5 0.12 1.21
6 0.00 0.79
7 0.00 0.62
8 0.00 1.09
9 0.00 0.88
10 0.00 1.78
11 0.00 0.80
12 0.00 1.20
13 0.22 0.94
14 0.00 1.26
15 0.00 0.30
16 0.00 0.74
17 0.00 1.29
18 0.00 0.52
19 0.00 0.76
20 0.00 0.91
21 0.00 0.89
22 0.00 1.83
23 0.06 0.82
24 0.00 1.26
25 0.00 1.21
26 0.00 0.95
27 0.00 1.11
28 0.00 1.23
29 0.07 1.56
30 0.00 1.11
31 0.07 0.89
32 0.00 1.03
33 0.00 0.67
34 0.00 0.90
35 0.12 0.82
36 0.00 1.14
37 0.00 0.58
38 0.00 0.39
39 0.00 0.53
40 0.00 0.42
41 0.00 0.98
42 0.00 0.79
43 0.00 1.21
44 0.12 1.08
45 0.00 0.65
46 0.00 1.76
47 0.04 0.67
48 0.00 0.35
49 0.00 1.04
50 0.00 0.76
# Plot sentiments "negative", "litigious", and "constraining" for the ten highest and ten lowest homicide-rate states
LOUdataframe <- laws_lou %>% select(docname, negative, constraining, litigious)
LOUdataframe <- pivot_longer(data = LOUdataframe, cols = c("negative", "constraining", "litigious"), names_to = "Sentiment", values_to = "Score")
LOU_HighHom <- filter(LOUdataframe, docname %in% c("mississippi", "alabama", "alaska", "missouri", "louisiana", "arkansas", "new-mexico", "south-carolina", "tennessee", "nevada"))
LOU_HighHom$Hom_Rates <- rep("High", nrow(LOU_HighHom))
LOU_LowHom <- filter(LOUdataframe, docname %in% c("oregon", "connecticut", "iowa", "minnesota", "new-york", "massachusetts", "north-dakota", "rhode-island", "hawaii", "south-dakota"))
LOU_LowHom$Hom_Rates <- rep("Low", nrow(LOU_LowHom))
LOU_LowHigh <- rbind(LOU_HighHom, LOU_LowHom)
ggplot(data = LOU_LowHigh, mapping = aes(x = docname, y = Score, fill = Sentiment)) +
geom_col(position = position_dodge()) +
labs(title = "Sentiments for States with 10 Highest and Lowest Homicide Rates",
subtitle = "Loughran: Negative, Litigious, Constraining", x = "State", y = "Sentiment Score") + facet_wrap(~ Hom_Rates, scales = "free_x") +
theme(axis.text.x = element_text(angle = 90))
Okay, same as with the LSD2015 lexicon - it doesn't appear that there is much difference across states in the language used in the firearm provisions. Reviewing the data source, the verbiage is consistent across the text documents, but the context is key. I will move on to more complex text analysis.
Preprocessing:
As I stated in the representing texts section, I'm hesitant to remove stopwords because of the key context they provide, but many of the stopwords are just noise. Therefore, I will customize the “en” stopword list to keep the ones that are essential to context. I'll use n-grams to identify which stopwords are key to context.
Key words that I’ve identified to look at surrounding words are:
“prohibit; prohibits; prohibited; prohibition; prohibitions”
“removal; removed; removing”
“require; requires; required; requirement; requirements”
d <- as_tibble(disarmDF)
ngram <- d %>%
unnest_tokens(ngram, Text, token = "ngrams", n = 7)
head(ngram)
# A tibble: 6 x 2
State ngram
<chr> <chr>
1 alabama alabama law alabama domestic violence firearm prohibitions
2 alabama law alabama domestic violence firearm prohibitions alabama
3 alabama alabama domestic violence firearm prohibitions alabama dome~
4 alabama domestic violence firearm prohibitions alabama domestic vio~
5 alabama violence firearm prohibitions alabama domestic violence fir~
6 alabama firearm prohibitions alabama domestic violence firearm purc~
# n-grams surrounding "prohib" wordstem
ngrams_separated <- ngram %>%
separate(ngram, c("word1", "word2", "word3", "word4", "word5", "word6", "word7"), sep = " ")
ngrams_separated %>%
filter(str_detect(word4, "^prohib")) %>%
count(State, word1, word2, word3, word4, word5, word6, word7, sort = TRUE)
# A tibble: 275 x 9
State word1 word2 word3 word4 word5 word6 word7 n
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int>
1 washington other dang~ weap~ proh~ the party from 4
2 washington under wash~ law proh~ the party from 4
3 wisconsin person is not proh~ from poss~ a 4
4 louisiana is no long~ proh~ from poss~ a 3
5 south-carolina the pers~ is proh~ from ship~ tran~ 3
6 wisconsin the pers~ is proh~ from poss~ a 3
7 illinois order of prot~ proh~ the resp~ from 2
8 kentucky kentucky does not proh~ purc~ or poss~ 2
9 mississippi mississip~ does not proh~ purc~ and poss~ 2
10 missouri missouri does not proh~ purc~ and poss~ 2
# ... with 265 more rows
# n-grams surrounding "remov" wordstem
ngrams_separated <- ngram %>%
separate(ngram, c("word1", "word2", "word3", "word4", "word5", "word6", "word7"), sep = " ")
ngrams_separated %>%
filter(str_detect(word4, "^remov")) %>%
count(State, word1, word2, word3, word4, word5, word6, word7, sort = TRUE)
# A tibble: 180 x 9
State word1 word2 word3 word4 word5 word6 word7 n
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int>
1 massachusetts firearms and lice~ remo~ purs~ to a 2
2 new-hampshire weapons relin~ or remo~ purs~ to a 2
3 alabama not requi~ the remo~ of fire~ from 1
4 alabama protection order fire~ remo~ dome~ viol~ civil 1
5 alabama that requi~ fire~ remo~ alab~ law does 1
6 alaska not requi~ the remo~ of fire~ from 1
7 alaska protective order fire~ remo~ dome~ viol~ civil 1
8 alaska that requi~ fire~ remo~ alas~ law does 1
9 alaska this parag~ 3 remo~ proc~ a court 1
10 arizona allow for the remo~ of fire~ from 1
# ... with 170 more rows
# n-grams surrounding "requir" wordstem
ngrams_separated <- ngram %>%
separate(ngram, c("word1", "word2", "word3", "word4", "word5", "word6", "word7"), sep = " ")
ngrams_separated %>%
filter(str_detect(word4, "^requir")) %>%
count(State, word1, word2, word3, word4, word5, word6, word7, sort = TRUE)
# A tibble: 163 x 9
State word1 word2 word3 word4 word5 word6 word7 n
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int>
1 washington other dange~ weap~ requ~ that the party 4
2 nevada domestic viole~ may requ~ the adve~ party 2
3 pennsylvania or ammun~ as requ~ by a final 2
4 washington under washi~ law requ~ that the party 2
5 west-virginia final heari~ shall requ~ the resp~ to 2
6 alabama law does not requ~ the remo~ of 1
7 alabama protection orders that requ~ fire~ remo~ alab~ 1
8 alaska law does not requ~ the remo~ of 1
9 alaska protective orders that requ~ fire~ remo~ alas~ 1
10 arizona of prote~ that requ~ fire~ remo~ if 1
# ... with 153 more rows
After reviewing those n-grams, I’ve identified the following stopwords to be essential to the context of the text: “do; does; doesn’t; don’t; or; nor; and; not; own; is; no; are”
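The tm chunk below references a CustomStopwords object, so it needs to be built first. A minimal sketch, assuming it is simply the standard English stopword list minus the context-essential words identified above:
# Custom stopword list: standard English stopwords minus the context-essential
# words from the n-gram review (this construction is my assumption)
KeepWords <- c("do", "does", "doesn't", "don't", "or", "nor", "and", "not", "own", "is", "no", "are")
CustomStopwords <- setdiff(stopwords("en"), KeepWords)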
__Following example from https://slcladal.github.io/topicmodels.html__
library(tm)
lawsTM <- Corpus(VectorSource(disarmdv)) # avoid shadowing tm's Corpus() function
processedCorpus <- tm_map(lawsTM, content_transformer(tolower))
processedCorpus <- tm_map(processedCorpus, removeWords, CustomStopwords)
processedCorpus <- tm_map(processedCorpus, removePunctuation, preserve_intra_word_dashes = TRUE)
processedCorpus <- tm_map(processedCorpus, removeNumbers)
head(processedCorpus, 1)
<<SimpleCorpus>>
Metadata: corpus specific: 1, document level (indexed): 0
Content: documents: 1
minFrequency <- 3
DTM <- DocumentTermMatrix(processedCorpus, control = list(bounds = list(global = c(minFrequency, Inf))))
dim(DTM)
[1] 50 779
# remove empty rows from DTM
sel_idx <- slam::row_sums(DTM) > 0
DTM <- DTM[sel_idx, ]
# Evaluating different number of topics to determine proper number
library(ldatuning)
result <- FindTopicsNumber(
DTM,
topics = seq(from = 2, to = 30, by = 1),
metrics = c("CaoJuan2009", "Deveaud2014", "Griffiths2004", "Arun2010"),
method = "Gibbs",
control = list(seed = 77),
verbose = TRUE
)
fit models... done.
calculate metrics:
CaoJuan2009... done.
Deveaud2014... done.
Griffiths2004... done.
Arun2010... done.
FindTopicsNumber_plot(result)
It looks like 19 topics best maximizes/minimizes the metrics. I initially thought to go with 4 topics because a basic overview of the source suggests 4 major topics: prohibitions, orders, removal, and penalties.
# Fit the LDA model (the fitting step was missing here; a sketch assuming
# K = 19 per the metrics above, Gibbs sampling, and the same seed as before)
library(topicmodels)
topicModel <- LDA(DTM, k = 19, method = "Gibbs", control = list(seed = 77))
tmResult <- posterior(topicModel)
beta <- tmResult$terms
theta <- tmResult$topics
Terms <- terms(topicModel, 5)
Terms
Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
[1,] "court" "ammunition" "firearm" "and" "party"
[2,] "officer" "and" "possession" "firearms" "and"
[3,] "deadly" "persons" "court" "ammunition" "firearms"
[4,] "and" "firearm" "control" "shall" "transfer"
[5,] "may" "subject" "subject" "purchase" "firearm"
Topic 6 Topic 7 Topic 8 Topic 9 Topic 10
[1,] "law" "person" "protection" "not" "ammunition"
[2,] "order" "child" "order" "may" "abuse"
[3,] "enforcement" "partner" "civil" "order" "defendant"
[4,] "firearms" "bodily" "issuing" "orders" "weapons"
[5,] "shall" "prohibits" "violation" "final" "and"
Topic 11 Topic 12 Topic 13 Topic 14 Topic 15
[1,] "order" "protective" "respondent" "not" "violence"
[2,] "person" "order" "firearm" "person" "firearm"
[3,] "use" "violence" "court" "shall" "domestic"
[4,] "persons" "and" "and" "injunction" "weapon"
[5,] "restraining" "may" "may" "firearms" "order"
Topic 16 Topic 17 Topic 18 Topic 19
[1,] "and" "domestic" "violence" "family"
[2,] "dangerous" "abuse" "domestic" "household"
[3,] "firearms" "physical" "persons" "member"
[4,] "weapons" "possession" "petition" "orders"
[5,] "license" "not" "does" "injury"
# Create and apply names to each topic
top5termsPerTopic <- terms(topicModel, 5)
topicNames <- apply(top5termsPerTopic, 2, paste, collapse=" ")
topicNames
Topic 1
"court officer deadly and may"
Topic 2
"ammunition and persons firearm subject"
Topic 3
"firearm possession court control subject"
Topic 4
"and firearms ammunition shall purchase"
Topic 5
"party and firearms transfer firearm"
Topic 6
"law order enforcement firearms shall"
Topic 7
"person child partner bodily prohibits"
Topic 8
"protection order civil issuing violation"
Topic 9
"not may order orders final"
Topic 10
"ammunition abuse defendant weapons and"
Topic 11
"order person use persons restraining"
Topic 12
"protective order violence and may"
Topic 13
"respondent firearm court and may"
Topic 14
"not person shall injunction firearms"
Topic 15
"violence firearm domestic weapon order"
Topic 16
"and dangerous firearms weapons license"
Topic 17
"domestic abuse physical possession not"
Topic 18
"violence domestic persons petition does"
Topic 19
"family household member orders injury"
library(reshape2)
# Test1 and HighHomStates were never defined above; assuming they index the ten
# states with the highest female homicide rates identified earlier
HighHomStates <- match(c("mississippi", "alabama", "alaska", "missouri", "louisiana", "arkansas", "new-mexico", "south-carolina", "tennessee", "nevada"), State)
N <- length(HighHomStates)
topicProportionExamples <- theta[HighHomStates,]
colnames(topicProportionExamples) <- topicNames
vizDataFrame <- melt(cbind(data.frame(topicProportionExamples), document = factor(1:N)), variable.name = "topic", id.vars = "document")
ggplot(data = vizDataFrame, aes(topic, value, fill = document), ylab = "proportion") +
geom_bar(stat="identity") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
coord_flip() +
facet_wrap(~ document, ncol = N)
It doesn’t appear that one topic is dominating the states with the highest female homicide rates.
I don’t think LDA is the strongest analysis for this research, so I’ll move on…
# I'm going to run STM with stopwords removed.
# Based on initial model runs, the topics often included state names, so I remove those as well.
# Preprocessing corpus: tokenizing and creating document feature matrix
DFM_laws <- tokens(laws_corpus,
remove_punct= TRUE,
remove_separators = TRUE,
remove_numbers = TRUE,
remove_symbols = TRUE) %>%
tokens_tolower() %>%
tokens_select(pattern=stopwords("en"),
selection="remove") %>%
tokens_remove(pattern = c('alabama', 'alaska', 'arizona', 'arkansas', 'california', 'colorado', 'connecticut', 'delaware', 'florida', 'georgia', 'hawaii', 'idaho', 'illinois', 'indiana', 'iowa', 'kansas', 'kentucky', 'louisiana', 'maine', 'maryland', 'massachusetts', 'michigan', 'minnesota', 'mississippi', 'missouri', 'montana', 'nebraska', 'nevada', 'new-hampshire', 'new-jersey', 'new-mexico', 'new-york', 'north-carolina', 'north-dakota', 'ohio', 'oklahoma', 'oregon', 'pennsylvania', 'rhode-island', 'south-carolina', 'south-dakota', 'tennessee', 'texas', 'utah', 'vermont', 'virginia', 'washington', 'west-virginia', 'wisconsin', 'wyoming', 'new', 'north', 'south', 'dakota', 'york', 'carolina', 'jersey', 'hampshire', 'mexico', 'rhode', 'island', 'west')) %>%
dfm()
head(DFM_laws)
Document-feature matrix of: 6 documents, 1,990 features (92.35% sparse) and 0 docvars.
features
docs law domestic violence firearm prohibitions purchase
alabama 5 6 5 7 2 1
alaska 4 11 11 8 2 2
arizona 3 7 7 10 2 1
arkansas 2 4 4 4 2 1
california 19 10 12 42 2 2
colorado 9 6 6 18 2 1
features
docs possession prohibits following individuals
alabama 1 2 3 2
alaska 4 0 0 1
arizona 1 2 1 1
arkansas 1 0 2 1
california 8 1 4 1
colorado 9 1 3 1
[ reached max_nfeat ... 1,980 more features ]
# Convert dfm to stm
dfm_stm <- convert(DFM_laws, to = "stm")
Choosing K - number of topics
# Evaluate candidate topic numbers with stm::searchK (this step was not shown;
# a sketch with the K range inferred from the results below)
library(stm)
K <- 4:10
fit <- searchK(dfm_stm$documents, dfm_stm$vocab, K = K, verbose = FALSE)
# Create table with topic number info
Kplot <- data.frame("K" = K,
"Coherence" = unlist(fit$results$semcoh),
"Exclusivity" = unlist(fit$results$exclus))
# Reshape table to long format
Kplot <- melt(Kplot, id=c("K"))
Kplot
K variable value
1 4 Coherence -12.755275
2 5 Coherence -9.205106
3 6 Coherence -14.924754
4 7 Coherence -17.113427
5 8 Coherence -14.701060
6 9 Coherence -15.432468
7 10 Coherence -18.802207
8 4 Exclusivity 7.757162
9 5 Exclusivity 7.967502
10 6 Exclusivity 8.252566
11 7 Exclusivity 8.417143
12 8 Exclusivity 8.524789
13 9 Exclusivity 8.646599
14 10 Exclusivity 8.752513
#Plot results of topic number testing
ggplot(Kplot, aes(K, value, color = variable)) +
geom_line(size = 1.5, show.legend = FALSE) +
facet_wrap(~variable,scales = "free_y") +
labs(x = "Number of topics K",
title = "Statistical fit of models with different K")
It looks like 4 and 6 topics would provide more coherent but less exclusive topics, while 9 would have moderate coherence and the most exclusivity of these three K's…
Now I’ll test K values using the criterion of interpretability and relevance
# based on the statistical chart, I'll look at K = 4, 5, 6, and 9
model_4K <- stm(documents = dfm_stm$documents,
vocab = dfm_stm$vocab,
K = 4,
verbose = FALSE)
model_5K <- stm(documents = dfm_stm$documents,
vocab = dfm_stm$vocab,
K = 5,
verbose = FALSE)
model_6K <- stm(documents = dfm_stm$documents,
vocab = dfm_stm$vocab,
K = 6,
verbose = FALSE)
model_9K <- stm(documents = dfm_stm$documents,
vocab = dfm_stm$vocab,
K = 9,
verbose = FALSE)
I’ll take a look at the top features for each topic for each model
# for K = 4
topics_4 <- labelTopics(model_4K, n=10)
topics_4 <- data.frame("features" = t(topics_4$frex))
colnames(topics_4) <- paste("Topics", c(1:4))
topics_4
Topics 1 Topics 2 Topics 3 Topics 4
1 persons adverse revolver rifles
2 relief abusing protective shotguns
3 spouses restrained permits fid
4 victim dangerous muzzle-loading third
5 foid party divorce relinquished
6 however concealed deadly safekeeping
7 misdemeanor pistol specified sheriff
8 violence party's aggrieved approved
9 protection enforcement carry large
10 ex agency defendant acknowledgment
# for K = 5
topics_5 <- labelTopics(model_5K, n=10)
topics_5 <- data.frame("features" = t(topics_5$frex))
colnames(topics_5) <- paste("Topics", c(1:5))
topics_5
Topics 1 Topics 2 Topics 3 Topics 4 Topics 5
1 protection adverse protective injunction revolver
2 persons restrained parents respondent machine
3 violence abusing family approved guns
4 spouse dangerous muzzle-loading form permits
5 ex party transferred prevention permit
6 parte concealed partner obey fid
7 misdemeanor party's threat possessed carry
8 domestic enforcement divorce hearing weapons
9 removal agency preliminary sheriff defendant
10 personal control deadly surrendered ammunition
# for K = 6
topics_6 <- labelTopics(model_6K, n=10)
topics_6 <- data.frame("features" = t(topics_6$frex))
colnames(topics_6) <- paste("Topics", c(1:6))
topics_6
Topics 1 Topics 2 Topics 3 Topics 4 Topics 5
1 partner adverse protective respondent revolver
2 victim abusing divorce injunction pistol
3 persons restrained post-separation hearing permits
4 ex party transferred possessed restraining
5 parte transfer parent prevention concealed
6 protection party's battery obey dangerous
7 violence agency family surrendered deadly
8 dating control preliminary approved obtaining
9 child federally proceeding surrender immediately
10 spouse dealer transporting 3rd weapons
Topics 6
1 muzzle-loading
2 fid
3 safekeeping
4 weapons
5 large
6 defendant
7 relinquished
8 rifles
9 shotguns
10 acknowledgment
# for K = 9
topics_9 <- labelTopics(model_9K, n=10)
topics_9 <- data.frame("features" = t(topics_9$frex))
colnames(topics_9) <- paste("Topics", c(1:9))
topics_9
Topics 1 Topics 2 Topics 3 Topics 4 Topics 5
1 ex adverse protective injunction revolver
2 parte abusing transferred form pistol
3 final restrained preliminary proof concealed
4 personal party divorce hearing dangerous
5 protection party's post-separation stay restraining
6 persons transfer cases step license
7 chapter dealer proceeding respondent obtaining
8 violence control employment information valid
9 issuing agency parent third commissioner
10 orders federally battery sheriff weapons
Topics 6 Topics 7 Topics 8 Topics 9
1 fid permits foid victim
2 muzzle-loading deadly prevention living
3 safekeeping defendant card intimate
4 relinquished specified obey dating
5 large purchaser aggravated child
6 weapons aggrieved possessed qualified
7 relinquish identification 3rd partner
8 rifles weapon existing shipping
9 shotguns restraining revoke household
10 acknowledgment judge degree acts
As expected, K = 9 has the most unique (exclusive) topics, yet they still seem fairly coherent. I'll proceed with K = 9.
# Fit model with 9 topics
stm_model9 <- stm(documents = dfm_stm$documents,
vocab = dfm_stm$vocab,
K = 9,
verbose = TRUE)
Beginning Spectral Initialization
Calculating the gram matrix...
Finding anchor words...
.........
Recovering initialization...
...................
Initialization complete.
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 1 (approx. per word bound = -5.793)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 2 (approx. per word bound = -5.623, relative change = 2.933e-02)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 3 (approx. per word bound = -5.572, relative change = 9.056e-03)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 4 (approx. per word bound = -5.548, relative change = 4.324e-03)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 5 (approx. per word bound = -5.536, relative change = 2.154e-03)
Topic 1: order, protection, violence, domestic, persons
Topic 2: party, order, firearm, protection, firearms
Topic 3: protective, order, violence, domestic, firearm
Topic 4: respondent, firearms, firearm, order, court
Topic 5: order, pistol, firearm, person, firearms
Topic 6: firearms, abuse, order, defendant, ammunition
Topic 7: order, firearms, violence, domestic, defendant
Topic 8: order, protection, respondent, firearms, court
Topic 9: violence, domestic, persons, person, protection
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 6 (approx. per word bound = -5.530, relative change = 1.113e-03)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 7 (approx. per word bound = -5.527, relative change = 5.705e-04)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 8 (approx. per word bound = -5.525, relative change = 3.651e-04)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 9 (approx. per word bound = -5.524, relative change = 2.707e-04)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 10 (approx. per word bound = -5.523, relative change = 1.382e-04)
Topic 1: order, protection, violence, domestic, persons
Topic 2: order, firearm, party, protection, firearms
Topic 3: protective, order, domestic, firearm, violence
Topic 4: respondent, firearms, firearm, order, court
Topic 5: order, firearm, pistol, person, firearms
Topic 6: firearms, order, abuse, defendant, ammunition
Topic 7: order, firearms, violence, domestic, protective
Topic 8: order, protection, respondent, firearms, court
Topic 9: domestic, violence, person, persons, protection
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 11 (approx. per word bound = -5.522, relative change = 9.098e-05)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 12 (approx. per word bound = -5.522, relative change = 8.902e-05)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 13 (approx. per word bound = -5.521, relative change = 7.416e-05)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 14 (approx. per word bound = -5.521, relative change = 7.012e-05)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 15 (approx. per word bound = -5.521, relative change = 7.445e-05)
Topic 1: order, protection, violence, domestic, persons
Topic 2: order, firearm, party, protection, firearms
Topic 3: protective, order, firearm, domestic, violence
Topic 4: respondent, firearms, firearm, order, court
Topic 5: order, firearm, pistol, person, firearms
Topic 6: firearms, order, abuse, ammunition, defendant
Topic 7: order, firearms, violence, domestic, protective
Topic 8: order, protection, respondent, firearms, court
Topic 9: domestic, violence, person, protection, persons
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 16 (approx. per word bound = -5.520, relative change = 6.269e-05)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 17 (approx. per word bound = -5.520, relative change = 5.278e-05)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 18 (approx. per word bound = -5.520, relative change = 4.244e-05)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 19 (approx. per word bound = -5.520, relative change = 3.997e-05)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 20 (approx. per word bound = -5.519, relative change = 6.764e-05)
Topic 1: order, protection, violence, domestic, persons
Topic 2: order, firearm, party, protection, firearms
Topic 3: protective, order, firearm, domestic, violence
Topic 4: respondent, firearms, firearm, order, court
Topic 5: order, firearm, pistol, person, firearms
Topic 6: firearms, order, abuse, ammunition, defendant
Topic 7: order, firearms, violence, domestic, protective
Topic 8: order, protection, respondent, firearms, court
Topic 9: domestic, violence, person, order, protection
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 21 (approx. per word bound = -5.519, relative change = 5.884e-05)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 22 (approx. per word bound = -5.519, relative change = 3.112e-05)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 23 (approx. per word bound = -5.519, relative change = 2.015e-05)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 24 (approx. per word bound = -5.518, relative change = 1.492e-05)
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Completing Iteration 25 (approx. per word bound = -5.518, relative change = 1.132e-05)
Topic 1: order, protection, violence, domestic, persons
Topic 2: order, firearm, party, protection, firearms
Topic 3: protective, order, firearm, domestic, violence
Topic 4: respondent, firearms, firearm, order, court
Topic 5: order, firearm, pistol, person, firearms
Topic 6: firearms, order, abuse, ammunition, defendant
Topic 7: order, firearms, violence, domestic, protective
Topic 8: order, protection, respondent, firearms, court
Topic 9: domestic, violence, person, order, protection
..................................................
Completed E-Step (0 seconds).
Completed M-Step.
Model Converged
# Inspect model with plot
plot(stm_model9)
The topics look very similar, so I'll fit another model with K = 6.
# Fit model with 6 topics
stm_model6 <- stm(documents = dfm_stm$documents,
vocab = dfm_stm$vocab,
K = 6,
verbose = TRUE)
Beginning Spectral Initialization
Calculating the gram matrix...
Finding anchor words...
Recovering initialization...
Initialization complete.
[verbose output truncated: 28 EM iterations, approx. per word bound improving from -5.861 to -5.635]
Topic 1: order, protection, domestic, violence, persons
Topic 2: order, party, firearm, protection, firearms
Topic 3: protective, order, firearm, domestic, violence
Topic 4: respondent, order, firearms, firearm, court
Topic 5: order, firearms, firearm, violence, domestic
Topic 6: firearms, order, abuse, ammunition, defendant
Model Converged
# Inspect model with plot
plot(stm_model6)
The default highest-probability labels look nearly identical across topics, so I’ll apply FREX weighting, which ranks terms by balancing their frequency within a topic against their exclusivity to it.
# FREX-weighted top terms for K = 6
Topic_Names6 <- labelTopics(stm_model6, topics = c(1:6), n = 5)$frex
topwords6 <- data.frame("features" = t(Topic_Names6))
colnames(topwords6) <- paste("Topics", c(1:6))
topwords6
Topics 1 Topics 2 Topics 3 Topics 4 Topics 5
1 partner adverse protective respondent revolver
2 victim abusing divorce injunction pistol
3 persons restrained post-separation hearing permits
4 ex party transferred possessed restraining
5 parte transfer parent prevention concealed
Topics 6
1 muzzle-loading
2 fid
3 safekeeping
4 weapons
5 large
# FREX-weighted top terms for K = 9
Topic_Names9 <- labelTopics(stm_model9, topics = c(1:9), n=5)$frex
topwords9 <- data.frame("features" = t(Topic_Names9))
colnames(topwords9) <- paste("Topics", c(1:9))
topwords9
Topics 1 Topics 2 Topics 3 Topics 4 Topics 5
1 ex adverse protective injunction revolver
2 parte abusing transferred form pistol
3 final restrained preliminary proof concealed
4 personal party divorce hearing dangerous
5 protection party's post-separation stay restraining
Topics 6 Topics 7 Topics 8 Topics 9
1 fid permits foid victim
2 muzzle-loading deadly prevention living
3 safekeeping defendant card intimate
4 relinquished specified obey dating
5 large purchaser aggravated child
plot(stm_model9, type = "summary")
The FREX terms and implied categories make sense for K = 9 and cover a good range of the themes in the source documents, so I’ll proceed with K = 9.
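To double-check the K = 9 labels against the underlying documents, stm’s findThoughts() pulls the texts most associated with a given topic. A quick sketch, assuming the raw statute text is in disarmDF$Text (the object used for tokenizing below):
# Sketch: the two documents most associated with Topic 1
findThoughts(stm_model9, texts = disarmDF$Text, topics = 1, n = 2)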
# Make a document-topic matrix: topic association by state (gamma)
gamma <- tidy(stm_model9, matrix = "gamma") %>%
pivot_wider(names_from = topic, values_from = gamma)
gamma$state <- c('alabama', 'alaska', 'arizona', 'arkansas', 'california', 'colorado', 'connecticut', 'delaware', 'florida', 'georgia', 'hawaii', 'idaho', 'illinois', 'indiana', 'iowa', 'kansas', 'kentucky', 'louisiana', 'maine', 'maryland', 'massachusetts', 'michigan', 'minnesota', 'mississippi', 'missouri', 'montana', 'nebraska', 'nevada', 'new-hampshire', 'new-jersey', 'new-mexico', 'new-york', 'north-carolina', 'north-dakota', 'ohio', 'oklahoma', 'oregon', 'pennsylvania', 'rhode-island', 'south-carolina', 'south-dakota', 'tennessee', 'texas', 'utah', 'vermont', 'virginia', 'washington', 'west-virginia', 'wisconsin', 'wyoming')
gamma <- gamma[, c('document', 'state', '1', '2', '3', '4', '5', '6', '7', '8', '9')]
gamma
# A tibble: 50 x 11
document state `1` `2` `3` `4` `5` `6`
<int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 alabama 0.284 2.29e-4 5.67e-3 4.49e-4 6.51e-4 2.36e-3
2 2 alaska 0.986 1.18e-3 6.22e-3 5.70e-4 1.14e-3 2.51e-4
3 3 arizona 0.988 1.86e-3 2.59e-3 4.64e-4 1.23e-3 4.94e-4
4 4 arkansas 0.986 1.27e-3 3.11e-3 4.36e-4 1.23e-3 4.42e-4
5 5 california 0.0336 1.28e-1 4.96e-1 1.67e-1 2.15e-3 3.03e-4
6 6 colorado 0.0466 9.50e-1 1.04e-3 4.99e-4 5.76e-4 4.60e-4
7 7 connectic~ 0.00279 4.40e-4 1.46e-4 1.27e-4 9.95e-1 1.77e-4
8 8 delaware 0.205 1.97e-3 1.41e-3 7.88e-1 6.66e-4 3.40e-4
9 9 florida 0.00426 1.02e-5 1.14e-3 1.12e-4 4.02e-5 6.57e-4
10 10 georgia 0.754 2.26e-3 2.16e-1 1.68e-3 9.90e-4 4.19e-4
# ... with 40 more rows, and 3 more variables: `7` <dbl>, `8` <dbl>,
# `9` <dbl>
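As a quick sanity check on the document-topic matrix, each state’s topic proportions should sum to (approximately) 1:
# Topic proportions for each document should sum to 1
summary(rowSums(gamma[, as.character(1:9)]))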
# Plot topic association (gamma) for each state
gamma1 <- gamma %>%
pivot_longer(c('1', '2', '3', '4', '5', '6', '7', '8', '9'), names_to = "topic", values_to = "gamma")
gamma1 %>%
ggplot(aes(factor(topic), gamma)) +
geom_boxplot() +
facet_wrap(~ state) +
labs(x = "topic", y = expression(gamma))
# word topic probabilities (beta)
beta <- tidy(stm_model9, matrix = "beta")
beta
# A tibble: 17,910 x 3
topic term beta
<int> <chr> <dbl>
1 1 1-3 8.95e-17
2 2 1-3 6.30e-80
3 3 1-3 2.82e- 4
4 4 1-3 5.21e-76
5 5 1-3 8.21e-81
6 6 1-3 1.01e-79
7 7 1-3 2.44e-75
8 8 1-3 3.59e-86
9 9 1-3 6.11e-38
10 1 1-5 3.31e-24
# ... with 17,900 more rows
beta1 <- beta %>%
group_by(topic) %>%
slice_max(beta, n = 10) %>%
ungroup() %>%
arrange(topic, -beta)
beta1 %>%
mutate(term = reorder_within(term, beta, topic)) %>%
ggplot(aes(beta, term, fill = factor(topic))) +
geom_col(show.legend = FALSE) +
facet_wrap(~ topic, scales = "free") +
scale_y_reordered()
# States with the highest firearm homicide rates: keep each state's most prevalent topic
gammaHigh <- gamma1 %>%
filter(state %in% c('mississippi', 'alabama', 'alaska', 'missouri', 'louisiana',
'arkansas', 'new-mexico', 'south-carolina', 'tennessee', 'nevada')) %>%
group_by(state) %>%
filter(gamma == max(gamma)) %>%
arrange(desc(gamma)) %>%
ungroup()
gammaHigh$Hom_Rates <- rep("High", nrow(gammaHigh))
# States with the lowest firearm homicide rates: keep each state's most prevalent topic
gammaLow <- gamma1 %>%
filter(state %in% c('oregon', 'connecticut', 'iowa', 'minnesota', 'new-york',
'massachusetts', 'north-dakota', 'rhode-island', 'hawaii', 'south-dakota')) %>%
group_by(state) %>%
filter(gamma == max(gamma)) %>%
arrange(desc(gamma)) %>%
ungroup()
gammaLow$Hom_Rates <- rep("Low", nrow(gammaLow))
gamma_LowHigh <- rbind(gammaHigh, gammaLow)
ggplot(data = gamma_LowHigh, mapping = aes(x = state, y = topic)) +
geom_point(position = position_dodge()) +
labs(title = "Most Prevalent Topic for States with Highest and Lowest Homicide Rates",
subtitle = "by gamma", x = "State", y = "Topic #") + facet_wrap(~ Hom_Rates, scales = "free_x") +
theme(axis.text.x = element_text(angle = 90))
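A heatmap gives a complementary view of the same gamma matrix across all fifty states; a sketch using the long-format gamma1 from above:
# Sketch: heatmap of topic proportions (gamma) by state
ggplot(gamma1, aes(x = factor(topic, levels = 1:9), y = state, fill = gamma)) +
geom_tile() +
labs(x = "Topic", y = "State", fill = expression(gamma))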
Next, I’ll train GloVe word embeddings on the statute text to explore how terms relate to one another.
# Tokenize the statute text for the embeddings
WEtokens <- word_tokenizer(disarmDF$Text)
# Quick frequency check (convert the token list for quanteda)
WE_dfm <- dfm(as.tokens(WEtokens))
topfeatures(WE_dfm)
the or a of to order and
3071 2087 1997 1905 1257 1007 807
in firearm firearms
611 596 564
# Create iterator object
WEit <- itoken(WEtokens, progressbar = FALSE)
# Build vocabulary
WEvocab <- create_vocabulary(WEit)
# Vocabulary
WEvocab
Number of docs: 50
0 stopwords: ...
ngram_min = 1; ngram_max = 1
Vocabulary:
term term_count doc_count
1: 1,000.00 1 1
2: 1,000.14 1 1
3: 1000.4 1 1
4: 13Within 1 1
5: 15,000.18 1 1
---
2556: to 1250 50
2557: a 1723 50
2558: of 1894 50
2559: or 2087 50
2560: the 2894 50
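The report confirms that no stopwords were removed, which is why function words like “the” and “of” top the counts. If a cleaner embedding vocabulary were wanted, create_vocabulary() accepts a stopwords argument; a sketch assuming the stopwords package is available:
# Optional sketch: rebuild the vocabulary without common English stopwords
WEvocab_nostop <- create_vocabulary(WEit, stopwords = stopwords::stopwords("en"))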
# Prune vocabulary
WEvocab <- prune_vocabulary(WEvocab, term_count_min = 3)
# Vectorize
WEvectorizer <- vocab_vectorizer(WEvocab)
# Use a window of 8 context words
WEtcm <- create_tcm(WEit, WEvectorizer, skip_grams_window = 8L)
# Creating new GloVe model
glove <- GlobalVectors$new(rank = 50, x_max = 10)
# Checking GloVe methods
glove
<GloVe>
Public:
bias_i: NULL
bias_j: NULL
clone: function (deep = FALSE)
components: NULL
fit_transform: function (x, n_iter = 10L, convergence_tol = -1, n_threads = getOption("rsparse_omp_threads",
get_history: function ()
initialize: function (rank, x_max, learning_rate = 0.15, alpha = 0.75, lambda = 0,
shuffle: FALSE
Private:
alpha: 0.75
b_i: NULL
b_j: NULL
cost_history:
fitted: FALSE
glove_fitter: NULL
initial: NULL
lambda: 0
learning_rate: 0.15
rank: 50
w_i: NULL
w_j: NULL
x_max: 10
# Fitting model
wv_main <- glove$fit_transform(WEtcm, n_iter = 10, convergence_tol = 0.01, n_threads = 8)
INFO [15:06:09.259] epoch 1, loss 0.2709
INFO [15:06:09.321] epoch 2, loss 0.1399
INFO [15:06:09.358] epoch 3, loss 0.1084
INFO [15:06:09.395] epoch 4, loss 0.0897
INFO [15:06:09.432] epoch 5, loss 0.0771
INFO [15:06:09.464] epoch 6, loss 0.0679
INFO [15:06:09.496] epoch 7, loss 0.0610
INFO [15:06:09.530] epoch 8, loss 0.0556
INFO [15:06:09.556] epoch 9, loss 0.0512
INFO [15:06:09.583] epoch 10, loss 0.0476
[output truncated: the fitted embedding matrix, one 50-dimensional row per vocabulary term; first rows shown for the terms "1000", "2,000", "300", "33", "34", "35"]
# Combine main and context embeddings into the final word vectors
# (standard text2vec step; word_vectors is used below but was not defined above)
word_vectors <- wv_main + t(glove$components)
# Word vector for "removal"
removal <- word_vectors["removal", , drop = FALSE]
# Cosine similarity
removal_cos_sim <- sim2(x = word_vectors, y = removal, method = "cosine", norm = "l2")
# Top ten words most similar to "removal"
head(sort(removal_cos_sim[,1], decreasing = TRUE), 10)
removal require does not prohibit from
1.0000000 0.8387050 0.5425201 0.4881574 0.4660969 0.4199901
persons prohibited prohibits firearms
0.3978946 0.3973669 0.3920589 0.3865959
# Word vector for "firearm"
firearm <- word_vectors["firearm", , drop = FALSE]
# Cosine similarity
firearm_cos_sim <- sim2(x = word_vectors, y = firearm, method = "cosine", norm = "l2")
# Top ten words most similar to "firearm"
head(sort(firearm_cos_sim[,1], decreasing = TRUE), 10)
firearm any respondent firearms defendant ammunition
1.0000000 0.7805967 0.7090351 0.6838225 0.6715562 0.6199826
weapon their the possessing
0.6172396 0.6127874 0.5992732 0.5903008
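The embeddings also support simple vector arithmetic. As a hypothetical probe (not part of the original analysis), we can look for terms near the combined sense of "firearm" and "removal":
# Sketch: terms closest to the sum of the "firearm" and "removal" vectors
probe <- word_vectors["firearm", , drop = FALSE] + word_vectors["removal", , drop = FALSE]
probe_cos_sim <- sim2(x = word_vectors, y = probe, method = "cosine", norm = "l2")
head(sort(probe_cos_sim[, 1], decreasing = TRUE), 10)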
# Tokenize the statute text into bigrams with tidytext
d <- tibble(txt = disarmDF$Text)
bigrams <- d %>%
unnest_tokens(ngram, txt, token = "ngrams", n = 2)
head(bigrams)
# A tibble: 6 x 1
ngram
<chr>
1 alabama law
2 law alabama
3 alabama domestic
4 domestic violence
5 violence firearm
6 firearm prohibitions
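head() only previews the bigrams in document order; counting them surfaces the most common two-word phrases. A sketch with dplyr's count():
# Sketch: ten most frequent bigrams across all documents
bigrams %>%
count(ngram, sort = TRUE) %>%
slice_head(n = 10)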
# Split the text into sentences
sentences <- d %>%
unnest_tokens(sentence, txt, token = "sentences")
head(sentences)
# A tibble: 6 x 1
sentence
<chr>
1 alabama law alabama domestic violence firearm prohibitions ~
2 by its terms, explicitly prohibits the use, attempted use, or threa~
3 2 alabama civil protection order firearm removal domestic violence ~
4 alabama law does allow a judge issuing an ex parte protection order~
5 3 individuals who may petition for a protection order the following~
6 alaska law domestic violence firearm prohibitions alaska do~
# Tokenize into 8-grams to surface longer recurring phrases
ngram <- d %>%
unnest_tokens(ngram, txt, token = "ngrams", n = 8)
head(ngram)
# A tibble: 6 x 1
ngram
<chr>
1 alabama law alabama domestic violence firearm prohibitions alabama
2 law alabama domestic violence firearm prohibitions alabama domestic
3 alabama domestic violence firearm prohibitions alabama domestic vio~
4 domestic violence firearm prohibitions alabama domestic violence fi~
5 violence firearm prohibitions alabama domestic violence firearm pur~
6 firearm prohibitions alabama domestic violence firearm purchase and
# Count the 8-grams and keep the twenty most frequent phrases
ngram %>%
count(ngram, sort = TRUE) %>%
slice_head(n = 20)
# A tibble: 20 x 2
ngram n
<chr> <int>
1 persons convicted of misdemeanor crimes of domestic violence 18
2 a credible threat to the physical safety of 12
3 order the following persons may petition for a 12
4 ammunition by persons convicted of misdemeanor crimes of 11
5 by persons convicted of misdemeanor crimes of domestic 11
6 does not require removal of firearms or ammunition 10
7 firearms or ammunition by persons convicted of misdemeanor 10
8 individuals who may petition for a protective order 10
9 not require removal of firearms or ammunition from 10
10 of firearms or ammunition by persons convicted of 10
11 or ammunition by persons convicted of misdemeanor crimes 10
12 possession of firearms or ammunition by persons convicted 10
13 and possession of firearms or ammunition by persons 9
14 does not prohibit purchase and possession of firearms 9
15 does not prohibit purchase or possession of firearms 9
16 individuals who may petition for a domestic violence 9
17 purchase and possession of firearms or ammunition by 9
18 represents a credible threat to the physical safety 9
19 attempted use or threatened use of physical force 8
20 may petition for a protection from abuse order 8
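Several of the most frequent 8-grams are variants of "does not require removal of firearms or ammunition," which suggests a quick way to flag states whose summaries lack a relinquishment requirement. A sketch assuming stringr and that the rows of disarmDF follow the same alphabetical state order assigned to gamma$state above:
# Sketch: flag documents containing the "does not require removal" phrase
library(stringr)
no_removal <- str_detect(disarmDF$Text, "does not require removal")
gamma$state[no_removal]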
References
Centers for Disease Control and Prevention, National Center for Health Statistics (2021) Underlying Cause of Death 1999-2020 on CDC WONDER Online Database. Data are from the Multiple Cause of Death Files, 1999-2020, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program. http://wonder.cdc.gov/ucd-icd10.html
Disarm Domestic Violence (2022) Protective Orders and Firearm Prohibitions. The National Coalition Against Domestic Violence; The Educational Fund to Stop Gun Violence; The Alliance for Gun Responsibility; Prosecutors Against Gun Violence. https://www.disarmdv.org/
Everytown Research & Policy (2022) Which States Prohibit Convicted Domestic Abusers from Having Guns? Everytown for Gun Safety Support Fund. https://everytownresearch.org/rankings/law/prohibition-for-convicted-domestic-abusers/
Giffords Law Center (2021) Domestic Violence & Firearms. Giffords Law Center to Prevent Gun Violence. https://giffords.org/lawcenter/gun-laws/
U.S. Department of Justice (2020) Criminal Resource Manual: 1117. Restrictions on the Possession of Firearms by Individuals Convicted of a Misdemeanor Crime of Domestic Violence. The United States Department of Justice Archives. https://www.justice.gov/archives/jm/criminal-resource-manual-1117-restrictions-possession-firearms-individuals-convicted
Additional resources:
https://github.com/dondealban/learning-stm
https://content-analysis-with-r.com/6-topic_models.html