Structural Topic Model

I decided to use a structural topic model (STM) for this analysis because significant metadata is available for these articles, for example the date they were published and the organization that published them. While I don’t incorporate the metadata in this analysis, it leaves room to uncover how different organizations might talk about the same underlying topic using different word choices.

# read in the metadata
metadata <- read_csv("reports_metadata.csv")
## Rows: 14190 Columns: 26
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (6): file_name, new_filename, country_iso3c, country_name, report_name,...
## dbl (20): year.0, word_count, hathaway, state, fariss.mean, fariss.std_devia...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# filter metadata to match the corpus
metadata <- metadata %>%
  filter(year.0 >= 2000) %>%
  select(year.0, new_filename, organization)
# join metadata to the main document file
metadata <- metadata %>%
  right_join(articles, by = c("new_filename" = "doc_id")) %>%
  drop_na()
#convert dfm into a stm structure that is compatible with analysis in library(stm)
dfm_stm <- convert(dfm, to = "stm")
## Warning in dfm2stm(x, docvars, omit_empty = TRUE): Dropped empty document(s):
## text631, text1374, text1411, text4060, text5252
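
Because the conversion drops empty documents, any metadata used later would need to be restricted to the documents that survive. The sketch below shows one way to do that alignment in base R. It uses toy stand-ins: `meta_toy` and `kept_ids` are hypothetical names and values, and it assumes the converted documents carry document names that can be matched against an ID column in the metadata, which this analysis does not confirm.

```r
# Toy metadata; names and values are illustrative, not taken from the corpus.
meta_toy <- data.frame(
  doc_id       = c("text1", "text2", "text3", "text4"),
  year         = c(2001, 2005, 2010, 2015),
  organization = c("org_a", "org_b", "org_a", "org_b"),
  stringsAsFactors = FALSE
)

# Pretend "text3" was dropped as an empty document during conversion.
# In the real analysis this vector might come from names(dfm_stm$documents)
# (assumption: the converted list keeps the document names).
kept_ids <- c("text1", "text2", "text4")

# Keep only the metadata rows for surviving documents, in document order.
meta_aligned <- meta_toy[match(kept_ids, meta_toy$doc_id), ]

# Fail loudly if any surviving document has no metadata row.
stopifnot(!anyNA(meta_aligned$doc_id))
stopifnot(identical(meta_aligned$doc_id, kept_ids))

print(meta_aligned)
```

quanteda's `convert()` also accepts a `docvars` argument for the stm format, which is intended to handle this subsetting during conversion; the manual version above mainly makes the row-matching step explicit.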

Search K model

To begin the analysis, I search over K = 4, 5, 6, 7, 8. This means that I run the searchK function from the stm package with the number of topics ranging from 4 to 8.
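
Once the search has finished, the candidate values of K are usually compared on diagnostics such as semantic coherence, exclusivity, and held-out likelihood. The helper below is a minimal sketch of how that table could be flattened for inspection. `summarise_search` is a hypothetical name, the column names assume the `results` element returned by `searchK()` in recent stm releases (where columns may be stored as lists), and the numbers in `toy` are placeholders rather than results from this corpus.

```r
# Hypothetical helper: flatten a searchK()-style diagnostics table.
# Assumes `results` has columns K, semcoh, exclus, and heldout.
summarise_search <- function(results) {
  out <- data.frame(
    K       = unlist(results$K),
    semcoh  = unlist(results$semcoh),
    exclus  = unlist(results$exclus),
    heldout = unlist(results$heldout)
  )
  # Sort by number of topics so the rows are easy to compare.
  out[order(out$K), ]
}

# Placeholder numbers for illustration only; not output from this analysis.
toy <- data.frame(
  K       = c(5, 4),
  semcoh  = c(-52, -50),
  exclus  = c(8.4, 8.1),
  heldout = c(-6.9, -7.0)
)

search_summary <- summarise_search(toy)
print(search_summary)
```

With the real search object this would be called as `summarise_search(model_test$results)`, and `plot(model_test)` draws the same diagnostics. No single column picks K on its own; the table is a starting point for judging the trade-off between coherence and exclusivity.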

#run search K function to figure out what the best value of K is
K <- c(4, 5, 6, 7, 8)
#run test model to search best value of K
model_test <- searchK(dfm_stm$documents, dfm_stm$vocab, K = K, verbose = TRUE)
## Beginning Spectral Initialization 
##   Calculating the gram matrix...
##   Finding anchor words...
##      ....
##   Recovering initialization...
##      ................
## Initialization complete.
## ....................................................................................................
## Completed E-Step (6 seconds). 
## Completed M-Step. 
## Completing Iteration 1 (approx. per word bound = -6.794) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 2 (approx. per word bound = -6.781, relative change = 1.860e-03) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 3 (approx. per word bound = -6.774, relative change = 1.083e-03) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 4 (approx. per word bound = -6.769, relative change = 6.686e-04) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 5 (approx. per word bound = -6.767, relative change = 3.388e-04) 
## Topic 1: countri, new, abus, unit, polit 
##  Topic 2: prison, death, tortur, peopl, kill 
##  Topic 3: person, provid, case, labor, women 
##  Topic 4: arrest, prison, howev, polit, member 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 6 (approx. per word bound = -6.766, relative change = 1.931e-04) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 7 (approx. per word bound = -6.765, relative change = 1.216e-04) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 8 (approx. per word bound = -6.764, relative change = 8.128e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 9 (approx. per word bound = -6.764, relative change = 5.687e-05) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 10 (approx. per word bound = -6.763, relative change = 4.147e-05) 
## Topic 1: countri, unit, abus, polit, new 
##  Topic 2: prison, kill, death, tortur, peopl 
##  Topic 3: provid, person, case, labor, women 
##  Topic 4: howev, arrest, prison, polit, member 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 11 (approx. per word bound = -6.763, relative change = 3.130e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 12 (approx. per word bound = -6.763, relative change = 2.428e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 13 (approx. per word bound = -6.763, relative change = 1.921e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 14 (approx. per word bound = -6.763, relative change = 1.547e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 15 (approx. per word bound = -6.763, relative change = 1.263e-05) 
## Topic 1: countri, unit, abus, polit, new 
##  Topic 2: prison, kill, death, tortur, peopl 
##  Topic 3: provid, person, case, labor, women 
##  Topic 4: howev, arrest, prison, polit, member 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 16 (approx. per word bound = -6.763, relative change = 1.045e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Model Converged 
## Beginning Spectral Initialization 
##   Calculating the gram matrix...
##   Finding anchor words...
##      .....
##   Recovering initialization...
##      ................
## Initialization complete.
## ....................................................................................................
## Completed E-Step (7 seconds). 
## Completed M-Step. 
## Completing Iteration 1 (approx. per word bound = -6.791) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 2 (approx. per word bound = -6.777, relative change = 2.070e-03) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 3 (approx. per word bound = -6.770, relative change = 1.149e-03) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 4 (approx. per word bound = -6.766, relative change = 5.421e-04) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 5 (approx. per word bound = -6.763, relative change = 5.008e-04) 
## Topic 1: women, howev, countri, foreign, new 
##  Topic 2: prison, peopl, death, tortur, kill 
##  Topic 3: provid, person, case, labor, women 
##  Topic 4: howev, prison, kill, arrest, member 
##  Topic 5: polit, countri, offici, group, public 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 6 (approx. per word bound = -6.760, relative change = 3.859e-04) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 7 (approx. per word bound = -6.758, relative change = 2.515e-04) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 8 (approx. per word bound = -6.757, relative change = 1.720e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 9 (approx. per word bound = -6.756, relative change = 1.235e-04) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 10 (approx. per word bound = -6.756, relative change = 9.126e-05) 
## Topic 1: women, foreign, countri, howev, polit 
##  Topic 2: prison, death, tortur, peopl, kill 
##  Topic 3: provid, person, case, labor, prison 
##  Topic 4: howev, kill, person, arrest, prison 
##  Topic 5: countri, polit, group, abus, unit 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 11 (approx. per word bound = -6.755, relative change = 6.906e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 12 (approx. per word bound = -6.755, relative change = 5.344e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 13 (approx. per word bound = -6.755, relative change = 4.225e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 14 (approx. per word bound = -6.754, relative change = 3.415e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 15 (approx. per word bound = -6.754, relative change = 2.819e-05) 
## Topic 1: polit, foreign, women, countri, religi 
##  Topic 2: prison, death, tortur, peopl, kill 
##  Topic 3: provid, person, case, labor, prison 
##  Topic 4: howev, kill, person, children, arrest 
##  Topic 5: countri, polit, abus, unit, group 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 16 (approx. per word bound = -6.754, relative change = 2.356e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 17 (approx. per word bound = -6.754, relative change = 1.997e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 18 (approx. per word bound = -6.754, relative change = 1.718e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 19 (approx. per word bound = -6.754, relative change = 1.490e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 20 (approx. per word bound = -6.753, relative change = 1.305e-05) 
## Topic 1: polit, foreign, countri, religi, public 
##  Topic 2: prison, death, tortur, peopl, kill 
##  Topic 3: provid, person, case, labor, women 
##  Topic 4: howev, kill, children, person, section 
##  Topic 5: countri, unit, abus, polit, group 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 21 (approx. per word bound = -6.753, relative change = 1.153e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 22 (approx. per word bound = -6.753, relative change = 1.024e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Model Converged 
## Beginning Spectral Initialization 
##   Calculating the gram matrix...
##   Finding anchor words...
##      ......
##   Recovering initialization...
##      ................
## Initialization complete.
## ....................................................................................................
## Completed E-Step (8 seconds). 
## Completed M-Step. 
## Completing Iteration 1 (approx. per word bound = -6.785) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 2 (approx. per word bound = -6.772, relative change = 1.923e-03) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 3 (approx. per word bound = -6.765, relative change = 1.033e-03) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 4 (approx. per word bound = -6.761, relative change = 5.453e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 5 (approx. per word bound = -6.759, relative change = 3.688e-04) 
## Topic 1: new, inherit, time, women, howev 
##  Topic 2: prison, peopl, death, tortur, sentenc 
##  Topic 3: person, provid, labor, case, women 
##  Topic 4: prison, howev, arrest, polit, section 
##  Topic 5: polit, offici, countri, public, religi 
##  Topic 6: kill, group, militari, civilian, member 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 6 (approx. per word bound = -6.757, relative change = 3.474e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 7 (approx. per word bound = -6.755, relative change = 2.981e-04) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 8 (approx. per word bound = -6.753, relative change = 2.177e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 9 (approx. per word bound = -6.752, relative change = 1.579e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 10 (approx. per word bound = -6.751, relative change = 1.183e-04) 
## Topic 1: women, howev, countri, worker, foreign 
##  Topic 2: prison, peopl, tortur, death, arrest 
##  Topic 3: provid, person, case, labor, children 
##  Topic 4: prison, howev, arrest, polit, section 
##  Topic 5: polit, countri, offici, public, organ 
##  Topic 6: kill, militari, group, civilian, attack 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 11 (approx. per word bound = -6.751, relative change = 9.131e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 12 (approx. per word bound = -6.750, relative change = 7.239e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 13 (approx. per word bound = -6.750, relative change = 5.862e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 14 (approx. per word bound = -6.749, relative change = 4.826e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 15 (approx. per word bound = -6.749, relative change = 4.035e-05) 
## Topic 1: women, howev, worker, countri, foreign 
##  Topic 2: prison, tortur, death, peopl, arrest 
##  Topic 3: provid, person, case, labor, prison 
##  Topic 4: prison, howev, arrest, offici, polit 
##  Topic 5: countri, polit, public, offici, organ 
##  Topic 6: kill, militari, group, civilian, children 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 16 (approx. per word bound = -6.749, relative change = 3.417e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 17 (approx. per word bound = -6.749, relative change = 2.920e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 18 (approx. per word bound = -6.749, relative change = 2.519e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 19 (approx. per word bound = -6.748, relative change = 2.189e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 20 (approx. per word bound = -6.748, relative change = 1.917e-05) 
## Topic 1: women, howev, worker, countri, foreign 
##  Topic 2: prison, tortur, death, peopl, arrest 
##  Topic 3: provid, person, case, labor, prison 
##  Topic 4: prison, howev, offici, arrest, polit 
##  Topic 5: countri, polit, public, offici, new 
##  Topic 6: kill, militari, group, civilian, children 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 21 (approx. per word bound = -6.748, relative change = 1.690e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 22 (approx. per word bound = -6.748, relative change = 1.499e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 23 (approx. per word bound = -6.748, relative change = 1.336e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 24 (approx. per word bound = -6.748, relative change = 1.198e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 25 (approx. per word bound = -6.748, relative change = 1.081e-05) 
## Topic 1: women, howev, worker, countri, labor 
##  Topic 2: prison, tortur, death, peopl, sentenc 
##  Topic 3: provid, person, case, labor, prison 
##  Topic 4: prison, howev, offici, arrest, polit 
##  Topic 5: countri, polit, public, new, offici 
##  Topic 6: kill, militari, group, civilian, children 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Model Converged 
## Beginning Spectral Initialization 
##   Calculating the gram matrix...
##   Finding anchor words...
##      .......
##   Recovering initialization...
##      ................
## Initialization complete.
## ....................................................................................................
## Completed E-Step (8 seconds). 
## Completed M-Step. 
## Completing Iteration 1 (approx. per word bound = -6.782) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 2 (approx. per word bound = -6.770, relative change = 1.814e-03) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 3 (approx. per word bound = -6.763, relative change = 9.543e-04) 
## ....................................................................................................
## Completed E-Step (6 seconds). 
## Completed M-Step. 
## Completing Iteration 4 (approx. per word bound = -6.759, relative change = 5.702e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 5 (approx. per word bound = -6.757, relative change = 3.974e-04) 
## Topic 1: new, inherit, time, women, howev 
##  Topic 2: prison, peopl, death, tortur, sentenc 
##  Topic 3: person, provid, labor, case, women 
##  Topic 4: prison, arrest, howev, polit, member 
##  Topic 5: offici, public, religi, case, countri 
##  Topic 6: kill, case, militari, group, investig 
##  Topic 7: countri, polit, group, arrest, prison 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 6 (approx. per word bound = -6.754, relative change = 3.616e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 7 (approx. per word bound = -6.752, relative change = 3.254e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 8 (approx. per word bound = -6.750, relative change = 2.583e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 9 (approx. per word bound = -6.749, relative change = 2.029e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 10 (approx. per word bound = -6.748, relative change = 1.638e-04) 
## Topic 1: women, worker, howev, countri, foreign 
##  Topic 2: prison, peopl, tortur, death, sentenc 
##  Topic 3: provid, person, labor, children, case 
##  Topic 4: prison, howev, arrest, section, see 
##  Topic 5: offici, public, case, organ, religi 
##  Topic 6: kill, case, investig, offic, militari 
##  Topic 7: countri, group, polit, area, civilian 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 11 (approx. per word bound = -6.747, relative change = 1.350e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 12 (approx. per word bound = -6.746, relative change = 1.127e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 13 (approx. per word bound = -6.745, relative change = 9.486e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 14 (approx. per word bound = -6.745, relative change = 8.030e-05) 
## ....................................................................................................
## Completed E-Step (6 seconds). 
## Completed M-Step. 
## Completing Iteration 15 (approx. per word bound = -6.744, relative change = 6.840e-05) 
## Topic 1: women, worker, foreign, countri, howev 
##  Topic 2: prison, peopl, tortur, death, sentenc 
##  Topic 3: provid, person, labor, children, prison 
##  Topic 4: howev, prison, arrest, section, see 
##  Topic 5: offici, public, case, organ, countri 
##  Topic 6: kill, case, investig, offic, militari 
##  Topic 7: countri, group, civilian, polit, abus 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 16 (approx. per word bound = -6.744, relative change = 5.854e-05) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 17 (approx. per word bound = -6.744, relative change = 5.038e-05) 
## ....................................................................................................
## Completed E-Step (6 seconds). 
## Completed M-Step. 
## Completing Iteration 18 (approx. per word bound = -6.743, relative change = 4.350e-05) 
## ....................................................................................................
## Completed E-Step (6 seconds). 
## Completed M-Step. 
## Completing Iteration 19 (approx. per word bound = -6.743, relative change = 3.778e-05) 
## ....................................................................................................
## Completed E-Step (6 seconds). 
## Completed M-Step. 
## Completing Iteration 20 (approx. per word bound = -6.743, relative change = 3.289e-05) 
## Topic 1: women, worker, foreign, countri, howev 
##  Topic 2: prison, tortur, peopl, death, arrest 
##  Topic 3: provid, person, labor, prison, children 
##  Topic 4: howev, prison, arrest, section, see 
##  Topic 5: offici, case, public, organ, countri 
##  Topic 6: kill, case, investig, offic, militari 
##  Topic 7: countri, civilian, group, abus, polit 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 21 (approx. per word bound = -6.743, relative change = 2.878e-05) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 22 (approx. per word bound = -6.743, relative change = 2.527e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 23 (approx. per word bound = -6.742, relative change = 2.220e-05) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 24 (approx. per word bound = -6.742, relative change = 1.965e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 25 (approx. per word bound = -6.742, relative change = 1.743e-05) 
## Topic 1: women, worker, foreign, countri, howev 
##  Topic 2: prison, tortur, death, peopl, arrest 
##  Topic 3: provid, person, labor, prison, children 
##  Topic 4: howev, prison, arrest, section, see 
##  Topic 5: offici, case, public, organ, countri 
##  Topic 6: kill, case, investig, offic, militari 
##  Topic 7: countri, abus, civilian, group, unit 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 26 (approx. per word bound = -6.742, relative change = 1.543e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 27 (approx. per word bound = -6.742, relative change = 1.372e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 28 (approx. per word bound = -6.742, relative change = 1.225e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 29 (approx. per word bound = -6.742, relative change = 1.097e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Model Converged 
## Beginning Spectral Initialization 
##   Calculating the gram matrix...
##   Finding anchor words...
##      ........
##   Recovering initialization...
##      ................
## Initialization complete.
## ....................................................................................................
## Completed E-Step (9 seconds). 
## Completed M-Step. 
## Completing Iteration 1 (approx. per word bound = -6.778) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 2 (approx. per word bound = -6.765, relative change = 1.937e-03) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 3 (approx. per word bound = -6.758, relative change = 1.012e-03) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 4 (approx. per word bound = -6.754, relative change = 5.993e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 5 (approx. per word bound = -6.751, relative change = 3.949e-04) 
## Topic 1: inherit, new, time, women, clear 
##  Topic 2: peopl, prison, death, tortur, sentenc 
##  Topic 3: person, provid, labor, women, howev 
##  Topic 4: prison, arrest, howev, member, parti 
##  Topic 5: offici, polit, religi, public, foreign 
##  Topic 6: kill, militari, group, case, investig 
##  Topic 7: countri, polit, group, arrest, prison 
##  Topic 8: case, prison, investig, offic, person 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 6 (approx. per word bound = -6.749, relative change = 2.931e-04) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 7 (approx. per word bound = -6.748, relative change = 2.428e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 8 (approx. per word bound = -6.746, relative change = 2.153e-04) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 9 (approx. per word bound = -6.745, relative change = 1.870e-04) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 10 (approx. per word bound = -6.744, relative change = 1.570e-04) 
## Topic 1: women, new, howev, time, countri 
##  Topic 2: prison, peopl, tortur, death, sentenc 
##  Topic 3: labor, provid, person, children, women 
##  Topic 4: howev, prison, arrest, section, see 
##  Topic 5: religi, offici, polit, foreign, public 
##  Topic 6: kill, case, militari, investig, offic 
##  Topic 7: countri, group, polit, north, civilian 
##  Topic 8: case, person, investig, offic, provid 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 11 (approx. per word bound = -6.743, relative change = 1.311e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 12 (approx. per word bound = -6.742, relative change = 1.096e-04) 
## ....................................................................................................
## Completed E-Step (6 seconds). 
## Completed M-Step. 
## Completing Iteration 13 (approx. per word bound = -6.742, relative change = 9.186e-05) 
## ....................................................................................................
## Completed E-Step (6 seconds). 
## Completed M-Step. 
## Completing Iteration 14 (approx. per word bound = -6.741, relative change = 7.765e-05) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 15 (approx. per word bound = -6.741, relative change = 6.591e-05) 
## Topic 1: women, howev, countri, section, see 
##  Topic 2: prison, peopl, tortur, death, arrest 
##  Topic 3: labor, provid, person, children, general 
##  Topic 4: howev, prison, arrest, section, see 
##  Topic 5: offici, polit, religi, foreign, public 
##  Topic 6: kill, case, investig, militari, offic 
##  Topic 7: countri, group, polit, civilian, abus 
##  Topic 8: case, person, investig, provid, offic 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 16 (approx. per word bound = -6.740, relative change = 5.648e-05) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 17 (approx. per word bound = -6.740, relative change = 4.879e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 18 (approx. per word bound = -6.740, relative change = 4.253e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 19 (approx. per word bound = -6.739, relative change = 3.726e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 20 (approx. per word bound = -6.739, relative change = 3.296e-05) 
## Topic 1: women, howev, countri, section, see 
##  Topic 2: prison, peopl, tortur, death, arrest 
##  Topic 3: labor, provid, person, general, children 
##  Topic 4: howev, arrest, section, prison, see 
##  Topic 5: offici, polit, public, religi, foreign 
##  Topic 6: kill, case, investig, offic, militari 
##  Topic 7: countri, abus, unit, group, polit 
##  Topic 8: case, person, investig, provid, offic 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 21 (approx. per word bound = -6.739, relative change = 2.931e-05) 
## ....................................................................................................
## Completed E-Step (9 seconds). 
## Completed M-Step. 
## Completing Iteration 22 (approx. per word bound = -6.739, relative change = 2.623e-05) 
## ....................................................................................................
## Completed E-Step (6 seconds). 
## Completed M-Step. 
## Completing Iteration 23 (approx. per word bound = -6.739, relative change = 2.363e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 24 (approx. per word bound = -6.738, relative change = 2.140e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 25 (approx. per word bound = -6.738, relative change = 1.947e-05) 
## Topic 1: women, howev, countri, section, see 
##  Topic 2: prison, death, tortur, peopl, arrest 
##  Topic 3: provid, labor, person, prison, general 
##  Topic 4: howev, arrest, section, prison, see 
##  Topic 5: offici, polit, public, prison, religi 
##  Topic 6: kill, case, investig, offic, militari 
##  Topic 7: countri, abus, unit, polit, group 
##  Topic 8: case, person, investig, provid, offic 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 26 (approx. per word bound = -6.738, relative change = 1.787e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 27 (approx. per word bound = -6.738, relative change = 1.647e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 28 (approx. per word bound = -6.738, relative change = 1.529e-05) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 29 (approx. per word bound = -6.738, relative change = 1.426e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 30 (approx. per word bound = -6.738, relative change = 1.336e-05) 
## Topic 1: women, howev, countri, section, see 
##  Topic 2: prison, death, tortur, peopl, arrest 
##  Topic 3: provid, labor, person, prison, women 
##  Topic 4: howev, arrest, section, see, prison 
##  Topic 5: offici, polit, prison, public, foreign 
##  Topic 6: kill, case, investig, offic, militari 
##  Topic 7: countri, abus, unit, polit, north 
##  Topic 8: case, person, investig, offic, provid 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 31 (approx. per word bound = -6.738, relative change = 1.260e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 32 (approx. per word bound = -6.738, relative change = 1.193e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 33 (approx. per word bound = -6.738, relative change = 1.134e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 34 (approx. per word bound = -6.738, relative change = 1.087e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 35 (approx. per word bound = -6.737, relative change = 1.040e-05) 
## Topic 1: women, howev, countri, section, see 
##  Topic 2: prison, death, tortur, peopl, arrest 
##  Topic 3: provid, labor, person, prison, women 
##  Topic 4: howev, arrest, section, see, prison 
##  Topic 5: offici, polit, prison, public, foreign 
##  Topic 6: kill, case, investig, offic, militari 
##  Topic 7: countri, unit, abus, north, war 
##  Topic 8: case, person, investig, offic, provid 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 36 (approx. per word bound = -6.737, relative change = 1.002e-05) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Model Converged
#create plot object with value of coherence, exclusivity, residual and lower bound. 
plot <- data.frame("K" = K, 
                   "Coherence" =unlist(model_test$results$semcoh),
                   "Exclusivity" =unlist(model_test$results$exclus),
                   "Residual" =  unlist(model_test$results$residual),
                   "Lower Bound" = unlist(model_test$results$lbound))

# Reshape to long format
library("reshape2")
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
plot <- melt(plot, id=c("K"))
plot
#plot values to get a better sense of statistical fit of each K
library("ggplot2")
library("ggthemes") #provides theme_fivethirtyeight()
ggplot(plot, aes(K, value, color = variable)) +
  geom_line(linewidth = 1.5, show.legend = FALSE) + #linewidth replaces size for lines in ggplot2 >= 3.4.0
  facet_wrap(~variable, scales = "free_y") +
  labs(x = "Number of topics K",
       title = "Statistical fit of models with different K") +
  theme_fivethirtyeight()

After briefly reviewing the four fit metrics for each candidate K, I run my final model with 6 topics and plot the results. I then extract the top words for each topic using FREX weighting, which favors words that are both frequent in a topic and exclusive to it.
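One way to make the comparison of candidate K values more explicit is to rescale semantic coherence and exclusivity to a common range and average them. The sketch below is only an illustration of that idea: the numbers in `fit_toy` are made-up placeholders, not results from this corpus, and the names `fit_toy`, `rescale01` and `score` are hypothetical. With real output, the columns would come from `model_test$results`.

```r
# Placeholder diagnostics for illustration only; NOT values from this analysis
fit_toy <- data.frame(
  K           = c(4, 5, 6, 7, 8),
  Coherence   = c(-40, -42, -43, -47, -50),  # higher (closer to 0) is better
  Exclusivity = c(8.9, 9.1, 9.4, 9.5, 9.6)   # higher is better
)

# Rescale a metric to the range [0, 1] so the two metrics are comparable
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Equal-weight average of the two rescaled metrics (the weighting is an assumption)
fit_toy$score <- (rescale01(fit_toy$Coherence) + rescale01(fit_toy$Exclusivity)) / 2

# Candidate K with the best combined score under this heuristic
best_k <- fit_toy$K[which.max(fit_toy$score)]
```

A score like this is a rough heuristic, not a substitute for reading the topics: coherence usually falls and exclusivity usually rises as K grows, so the final choice still depends on interpretability.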

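#fit the final model; no prevalence or content formula is supplied,
#so the organization metadata does not enter the estimated model
model <- stm(documents = dfm_stm$documents,
             vocab = dfm_stm$vocab,
             data = metadata$organization,
             K = 6,
             verbose = TRUE)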
## Beginning Spectral Initialization 
##   Calculating the gram matrix...
##   Finding anchor words...
##      ......
##   Recovering initialization...
##      ................
## Initialization complete.
## ....................................................................................................
## Completed E-Step (8 seconds). 
## Completed M-Step. 
## Completing Iteration 1 (approx. per word bound = -6.785) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 2 (approx. per word bound = -6.772, relative change = 1.930e-03) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 3 (approx. per word bound = -6.765, relative change = 1.041e-03) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 4 (approx. per word bound = -6.761, relative change = 5.467e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 5 (approx. per word bound = -6.759, relative change = 3.760e-04) 
## Topic 1: new, inherit, time, women, howev 
##  Topic 2: prison, peopl, death, tortur, sentenc 
##  Topic 3: person, provid, labor, case, women 
##  Topic 4: prison, howev, arrest, polit, section 
##  Topic 5: polit, offici, countri, public, religi 
##  Topic 6: kill, group, militari, civilian, member 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 6 (approx. per word bound = -6.756, relative change = 3.595e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 7 (approx. per word bound = -6.754, relative change = 2.960e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 8 (approx. per word bound = -6.753, relative change = 2.103e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 9 (approx. per word bound = -6.752, relative change = 1.511e-04) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 10 (approx. per word bound = -6.751, relative change = 1.133e-04) 
## Topic 1: women, howev, countri, worker, foreign 
##  Topic 2: prison, peopl, tortur, death, arrest 
##  Topic 3: provid, person, case, labor, children 
##  Topic 4: prison, howev, arrest, polit, section 
##  Topic 5: polit, countri, offici, public, organ 
##  Topic 6: kill, militari, group, civilian, attack 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 11 (approx. per word bound = -6.751, relative change = 8.781e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 12 (approx. per word bound = -6.750, relative change = 6.988e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 13 (approx. per word bound = -6.750, relative change = 5.671e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 14 (approx. per word bound = -6.749, relative change = 4.676e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 15 (approx. per word bound = -6.749, relative change = 3.916e-05) 
## Topic 1: women, howev, worker, countri, foreign 
##  Topic 2: prison, tortur, death, peopl, arrest 
##  Topic 3: provid, person, case, labor, prison 
##  Topic 4: prison, howev, arrest, polit, offici 
##  Topic 5: countri, polit, public, offici, organ 
##  Topic 6: kill, militari, group, civilian, attack 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 16 (approx. per word bound = -6.749, relative change = 3.315e-05) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 17 (approx. per word bound = -6.749, relative change = 2.837e-05) 
## ....................................................................................................
## Completed E-Step (5 seconds). 
## Completed M-Step. 
## Completing Iteration 18 (approx. per word bound = -6.749, relative change = 2.449e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 19 (approx. per word bound = -6.748, relative change = 2.126e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 20 (approx. per word bound = -6.748, relative change = 1.864e-05) 
## Topic 1: women, worker, howev, countri, foreign 
##  Topic 2: prison, tortur, death, peopl, arrest 
##  Topic 3: provid, person, case, labor, prison 
##  Topic 4: prison, howev, offici, arrest, polit 
##  Topic 5: countri, polit, public, offici, organ 
##  Topic 6: kill, militari, group, civilian, children 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 21 (approx. per word bound = -6.748, relative change = 1.644e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 22 (approx. per word bound = -6.748, relative change = 1.455e-05) 
## ....................................................................................................
## Completed E-Step (3 seconds). 
## Completed M-Step. 
## Completing Iteration 23 (approx. per word bound = -6.748, relative change = 1.300e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 24 (approx. per word bound = -6.748, relative change = 1.164e-05) 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Completing Iteration 25 (approx. per word bound = -6.748, relative change = 1.047e-05) 
## Topic 1: women, worker, howev, countri, foreign 
##  Topic 2: prison, tortur, death, peopl, arrest 
##  Topic 3: provid, person, case, labor, prison 
##  Topic 4: prison, howev, offici, arrest, polit 
##  Topic 5: countri, polit, public, offici, new 
##  Topic 6: kill, militari, group, civilian, children 
## ....................................................................................................
## Completed E-Step (4 seconds). 
## Completed M-Step. 
## Model Converged
plot.STM(model, "summary", n=5)

labelTopics(model)$frex
##      [,1]       [,2]       [,3]        [,4]        [,5]       [,6]       
## [1,] "foreign"  "islam"    "muslim"    "citizen"   "permit"   "religion" 
## [2,] "peopl"    "al"       "death"     "ill-treat" "tortur"   "un"       
## [3,] "disabl"   "sexual"   "complaint" "traffick"  "provid"   "discrimin"
## [4,] "opposit"  "ngo"      "local"     "section"   "parti"    "beat"     
## [5,] "european" "unit"     "watch"     "reform"    "write"    "world"    
## [6,] "kill"     "civilian" "soldier"   "displac"   "conflict" "arm"      
##      [,7]       
## [1,] "christian"
## [2,] "amnesti"  
## [3,] "prohibit" 
## [4,] "see"      
## [5,] "polici"   
## [6,] "armi"
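To make the FREX idea concrete, here is a simplified base R sketch that scores words by the weighted harmonic mean of their within-topic frequency rank and exclusivity rank. It is not the exact routine used inside `labelTopics()`; the function name `frex_sketch`, the toy matrix `beta_toy` and its values are assumptions made purely for illustration.

```r
# Toy topic-word probabilities (rows = topics, columns = words); illustrative only
beta_toy <- matrix(
  c(0.50, 0.30, 0.15, 0.05,
    0.50, 0.05, 0.15, 0.30),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("topic1", "topic2"),
                  c("prison", "tortur", "labor", "women"))
)

frex_sketch <- function(beta, w = 0.5) {
  # Exclusivity: share of each word's total probability that belongs to each topic
  excl <- sweep(beta, 2, colSums(beta), "/")
  # Within each topic, convert frequency and exclusivity to rank scores in (0, 1]
  freq_score <- t(apply(beta, 1, rank)) / ncol(beta)
  excl_score <- t(apply(excl, 1, rank)) / ncol(beta)
  # Weighted harmonic mean of the two rank scores
  1 / (w / freq_score + (1 - w) / excl_score)
}

frex_toy <- frex_sketch(beta_toy)

# Highest-scoring word per topic under FREX versus raw probability
top_frex <- apply(frex_toy, 1, function(x) names(which.max(x)))
top_prob <- apply(beta_toy, 1, function(x) names(which.max(x)))
```

In this toy example the most probable word is the same in both topics, while the top FREX word differs, which is the reason FREX labels tend to separate topics better than raw probabilities.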
#Save top 30 features across topics and forms of weighting
labels <- labelTopics(model, n=30)
#only keep FREX weighting
topwords <- data.frame("features" = t(labels$frex))
#assign topic number as column name (the model has 6 topics)
colnames(topwords) <- paste("Topics", c(1:6))
#Return the result
topwords[1:6]
#document-topic proportions, one row per document
theta <- make.dt(model)
#first document: docnum plus the first four topic proportions
theta[1,1:5]
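The table returned by `make.dt()` holds one row per document with a `docnum` column and one proportion column per topic. As a minimal sketch of how such a table can be summarised, the example below uses a small made-up table (`theta_toy`, with hypothetical values) to find each document's dominant topic and the average prevalence of each topic.

```r
# Toy document-topic proportions (each row sums to 1); illustrative values only
theta_toy <- data.frame(
  docnum = 1:3,
  Topic1 = c(0.70, 0.10, 0.20),
  Topic2 = c(0.20, 0.60, 0.30),
  Topic3 = c(0.10, 0.30, 0.50)
)

# Columns holding topic proportions
topic_cols <- grep("^Topic", names(theta_toy), value = TRUE)

# Dominant topic per document: the column with the largest proportion in each row
theta_toy$dominant <- topic_cols[
  max.col(as.matrix(theta_toy[topic_cols]), ties.method = "first")
]

# Mean prevalence of each topic across documents, largest first
prevalence <- sort(colMeans(theta_toy[topic_cols]), decreasing = TRUE)
```

The mean prevalence computed here is the same quantity that the tidy workflow in the next section obtains by averaging gamma within each topic.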

Tidymodelling and Visualization

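#tidy the fitted model: beta = per-topic word probabilities,
#gamma = per-document topic proportions
model_beta <- tidytext::tidy(model)
model_gamma <- tidytext::tidy(model, matrix = "gamma")

#collapse the 7 most probable terms of each topic into one label string
top_terms <- model_beta %>%
  arrange(beta) %>%
  group_by(topic) %>%
  top_n(7, beta) %>%
  arrange(-beta) %>%
  select(topic, term) %>%
  summarise(terms = list(term)) %>%
  mutate(terms = map(terms, paste, collapse = ", ")) %>% 
  unnest(cols = c(terms))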

gamma_terms <- model_gamma %>%
  group_by(topic) %>%
  summarise(gamma = mean(gamma)) %>%
  arrange(desc(gamma)) %>%
  left_join(top_terms, by = "topic") %>%
  mutate(topic = paste0("Topic ", topic),
         topic = reorder(topic, gamma))

gamma_terms %>%
  top_n(10, gamma) %>%
  ggplot(aes(topic, gamma, label = terms, fill = topic)) +
  geom_col(show.legend = FALSE) +
  geom_text(hjust = 1, nudge_y = 0.0009, size = 3) +
  coord_flip() +
  theme_hc() +
  theme(plot.title = element_text(size = 12)) +
  labs(x = NULL, y = expression(gamma),
       title = "Six Topics in Human Rights Texts",
       subtitle = "Six topics by prevalence with the top words that contribute to each topic")+ 
  theme_fivethirtyeight()

td_beta <- tidytext::tidy(model)
td_beta %>%
  #drop the stemmed filler term "howev" before ranking, so each topic keeps 10 terms
  filter(term != "howev") %>%
  group_by(topic) %>%
  top_n(10, beta) %>%
  ungroup() %>%
  mutate(topic = paste0("Topic ", topic),
         term = reorder_within(term, beta, topic)) %>%
  ggplot(aes(term, beta, fill = as.factor(topic))) +
  geom_col(alpha = 0.8, show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free_y") +
  coord_flip() +
  scale_x_reordered() +
  #Darjeeling2 has five colours, so interpolate to six (assumes the wesanderson package is installed)
  scale_fill_manual(values = wesanderson::wes_palette("Darjeeling2", 6, type = "continuous")) +
  labs(x = NULL, y = expression(beta),
       title = "Highest word probabilities for each topic",
       subtitle = "Different words are associated with different topics") +
  theme_fivethirtyeight()