Let’s see if I can get something working using the mallet package.

Getting some text to model

First, I will get some text to use in topic modeling. I’ll just start with the same text we currently have in the chapter on topic modeling using the topicmodels package.

library(dplyr)
library(gutenbergr)
library(tidytext)
library(stringr)
library(tidyr)

titles <- c("Twenty Thousand Leagues under the Sea", "The War of the Worlds",
            "Pride and Prejudice", "Great Expectations")
books <- gutenberg_works(title %in% titles) %>%
    gutenberg_download(meta_fields = "title")

by_chapter <- books %>%
    group_by(title) %>%
    mutate(chapter = cumsum(str_detect(text, regex("^chapter ", ignore_case = TRUE)))) %>%
    ungroup() %>%
    filter(chapter > 0)

by_chapter
## # A tibble: 51,602 × 4
##    gutenberg_id
##           <int>
## 1            36
## 2            36
## 3            36
## 4            36
## 5            36
## 6            36
## 7            36
## 8            36
## 9            36
## 10           36
## # ... with 51,592 more rows, and 3 more variables: text <chr>, title <chr>,
## #   chapter <int>
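
To see what the `cumsum(str_detect(...))` trick is doing, here is a minimal sketch on a made-up seven-line "book" (the `toy_book` data and its text are invented for illustration). Each line that starts with "chapter" bumps the running count by one, and the front matter before the first chapter stays at 0 and gets filtered out.

```r
library(dplyr)
library(stringr)

# A made-up miniature book: two lines of front matter, then two chapters
toy_book <- data.frame(
    text = c("A Novel", "by Somebody",
             "CHAPTER I", "It was a dark night.", "",
             "Chapter II", "The next morning."),
    stringsAsFactors = FALSE
)

toy_chapters <- toy_book %>%
    # TRUE on each chapter heading; cumsum() turns that into a running chapter number
    mutate(chapter = cumsum(str_detect(text, regex("^chapter ", ignore_case = TRUE)))) %>%
    # chapter 0 is everything before the first heading
    filter(chapter > 0)

toy_chapters
```

The heading lines themselves stay in the data, which is why "chapter" turns up as a word in the text being modeled.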

Converting to the format mallet wants

The input to mallet topic modeling is a character vector of documents (one string per document), plus a matching vector of document IDs. Let’s try to make that.

by_chapter_text <- by_chapter %>%
    unite(title_chapter, title, chapter) %>%
    group_by(title_chapter) %>%
    summarize(text = str_c(text, collapse = " "))

by_chapter_text    
## # A tibble: 193 × 2
##            title_chapter
##                    <chr>
## 1   Great Expectations_1
## 2  Great Expectations_10
## 3  Great Expectations_11
## 4  Great Expectations_12
## 5  Great Expectations_13
## 6  Great Expectations_14
## 7  Great Expectations_15
## 8  Great Expectations_16
## 9  Great Expectations_17
## 10 Great Expectations_18
## # ... with 183 more rows, and 1 more variables: text <chr>

I think that should do it.

Setting up for mallet topic modeling

This is the part I can only run from sudo R.

This first step imports the data into a mallet object. We need the document IDs, the documents themselves as an array of text strings, and a file (!!!) of stop words (one per line).

# make a file of stop words (one per line, with no header row)
library(readr)
write_csv(stop_words %>% select(word), "./stoplist.csv", col_names = FALSE)

# set up the mallet object
library(mallet)
mallet_object <- mallet.import(by_chapter_text$title_chapter,
                               by_chapter_text$text,
                               "./stoplist.csv")

mallet_object
## [1] "Java-Object{[cc.mallet.types.Instance@378bf509, cc.mallet.types.Instance@5fd0d5ae, cc.mallet.types.Instance@2d98a335, cc.mallet.types.Instance@16b98e56, cc.mallet.types.Instance@7ef20235, cc.mallet.types.Instance@27d6c5e0, cc.mallet.types.Instance@4f3f5b24, cc.mallet.types.Instance@15aeb7ab, cc.mallet.types.Instance@7b23ec81, cc.mallet.types.Instance@6acbcfc0, cc.mallet.types.Instance@5f184fc6, cc.mallet.types.Instance@3feba861, cc.mallet.types.Instance@5b480cf9, cc.mallet.types.Instance@6f496d9f, cc.mallet.types.Instance@723279cf, cc.mallet.types.Instance@10f87f48, cc.mallet.types.Instance@b4c966a, cc.mallet.types.Instance@2f4d3709, cc.mallet.types.Instance@4e50df2e, cc.mallet.types.Instance@1d81eb93, cc.mallet.types.Instance@7291c18f, cc.mallet.types.Instance@34a245ab, cc.mallet.types.Instance@7cc355be, cc.mallet.types.Instance@6e8cf4c6, cc.mallet.types.Instance@12edcd21, cc.mallet.types.Instance@34c45dca, cc.mallet.types.Instance@52cc8049, cc.mallet.types.Instance@5b6f7412, cc.mallet.types.Instance@27973e9b, cc.mallet.types.Instance@312b1dae, cc.mallet.types.Instance@7530d0a, cc.mallet.types.Instance@27bc2616, cc.mallet.types.Instance@3941a79c, cc.mallet.types.Instance@506e1b77, cc.mallet.types.Instance@4fca772d, cc.mallet.types.Instance@9807454, cc.mallet.types.Instance@3d494fbf, cc.mallet.types.Instance@1ddc4ec2, cc.mallet.types.Instance@133314b, cc.mallet.types.Instance@b1bc7ed, cc.mallet.types.Instance@7cd84586, cc.mallet.types.Instance@30dae81, cc.mallet.types.Instance@1b2c6ec2, cc.mallet.types.Instance@4edde6e5, cc.mallet.types.Instance@70177ecd, cc.mallet.types.Instance@1e80bfe8, cc.mallet.types.Instance@66a29884, cc.mallet.types.Instance@4769b07b, cc.mallet.types.Instance@cc34f4d, cc.mallet.types.Instance@17a7cec2, cc.mallet.types.Instance@65b3120a, cc.mallet.types.Instance@6f539caf, cc.mallet.types.Instance@79fc0f2f, cc.mallet.types.Instance@50040f0c, cc.mallet.types.Instance@2dda6444, cc.mallet.types.Instance@5e9f23b4, 
cc.mallet.types.Instance@4783da3f, cc.mallet.types.Instance@378fd1ac, cc.mallet.types.Instance@49097b5d, cc.mallet.types.Instance@6e2c634b, cc.mallet.types.Instance@37a71e93, cc.mallet.types.Instance@7e6cbb7a, cc.mallet.types.Instance@7c3df479, cc.mallet.types.Instance@7106e68e, cc.mallet.types.Instance@7eda2dbb, cc.mallet.types.Instance@6576fe71, cc.mallet.types.Instance@76fb509a, cc.mallet.types.Instance@300ffa5d, cc.mallet.types.Instance@1f17ae12, cc.mallet.types.Instance@4d405ef7, cc.mallet.types.Instance@6193b845, cc.mallet.types.Instance@2e817b38, cc.mallet.types.Instance@c4437c4, cc.mallet.types.Instance@433c675d, cc.mallet.types.Instance@3f91beef, cc.mallet.types.Instance@1a6c5a9e, cc.mallet.types.Instance@37bba400, cc.mallet.types.Instance@179d3b25, cc.mallet.types.Instance@254989ff, cc.mallet.types.Instance@5d099f62, cc.mallet.types.Instance@37f8bb67, cc.mallet.types.Instance@49c2faae, cc.mallet.types.Instance@20ad9418, cc.mallet.types.Instance@31cefde0, cc.mallet.types.Instance@439f5b3d, cc.mallet.types.Instance@1d56ce6a, cc.mallet.types.Instance@5197848c, cc.mallet.types.Instance@17f052a3, cc.mallet.types.Instance@2e0fa5d3, cc.mallet.types.Instance@5010be6, cc.mallet.types.Instance@685f4c2e, cc.mallet.types.Instance@7daf6ecc, cc.mallet.types.Instance@2e5d6d97, cc.mallet.types.Instance@238e0d81, cc.mallet.types.Instance@31221be2, cc.mallet.types.Instance@377dca04, cc.mallet.types.Instance@728938a9, cc.mallet.types.Instance@21b8d17c, cc.mallet.types.Instance@6433a2, cc.mallet.types.Instance@5910e440, cc.mallet.types.Instance@6267c3bb, cc.mallet.types.Instance@533ddba, cc.mallet.types.Instance@246b179d, cc.mallet.types.Instance@7a07c5b4, cc.mallet.types.Instance@26a1ab54, cc.mallet.types.Instance@3d646c37, cc.mallet.types.Instance@41cf53f9, cc.mallet.types.Instance@5a10411, cc.mallet.types.Instance@2ef1e4fa, cc.mallet.types.Instance@306a30c7, cc.mallet.types.Instance@b81eda8, cc.mallet.types.Instance@68de145, cc.mallet.types.Instance@27fa135a, 
cc.mallet.types.Instance@46f7f36a, cc.mallet.types.Instance@421faab1, cc.mallet.types.Instance@2b71fc7e, cc.mallet.types.Instance@5ce65a89, cc.mallet.types.Instance@25f38edc, cc.mallet.types.Instance@1a86f2f1, cc.mallet.types.Instance@3eb07fd3, cc.mallet.types.Instance@506c589e, cc.mallet.types.Instance@69d0a921, cc.mallet.types.Instance@446cdf90, cc.mallet.types.Instance@799f7e29, cc.mallet.types.Instance@4b85612c, cc.mallet.types.Instance@277050dc, cc.mallet.types.Instance@5c29bfd, cc.mallet.types.Instance@7aec35a, cc.mallet.types.Instance@67424e82, cc.mallet.types.Instance@42110406, cc.mallet.types.Instance@531d72ca, cc.mallet.types.Instance@22d8cfe0, cc.mallet.types.Instance@579bb367, cc.mallet.types.Instance@1de0aca6, cc.mallet.types.Instance@255316f2, cc.mallet.types.Instance@41906a77, cc.mallet.types.Instance@4b9af9a9, cc.mallet.types.Instance@5387f9e0, cc.mallet.types.Instance@6e5e91e4, cc.mallet.types.Instance@2cdf8d8a, cc.mallet.types.Instance@30946e09, cc.mallet.types.Instance@5cb0d902, cc.mallet.types.Instance@46fbb2c1, cc.mallet.types.Instance@1698c449, cc.mallet.types.Instance@5ef04b5, cc.mallet.types.Instance@5f4da5c3, cc.mallet.types.Instance@443b7951, cc.mallet.types.Instance@14514713, cc.mallet.types.Instance@69663380, cc.mallet.types.Instance@5b37e0d2, cc.mallet.types.Instance@4459eb14, cc.mallet.types.Instance@5a2e4553, cc.mallet.types.Instance@28c97a5, cc.mallet.types.Instance@6659c656, cc.mallet.types.Instance@6d5380c2, cc.mallet.types.Instance@45ff54e6, cc.mallet.types.Instance@2328c243, cc.mallet.types.Instance@bebdb06, cc.mallet.types.Instance@7a4f0f29, cc.mallet.types.Instance@45283ce2, cc.mallet.types.Instance@2077d4de, cc.mallet.types.Instance@7591083d, cc.mallet.types.Instance@77a567e1, cc.mallet.types.Instance@736e9adb, cc.mallet.types.Instance@6d21714c, cc.mallet.types.Instance@108c4c35, cc.mallet.types.Instance@4ccabbaa, cc.mallet.types.Instance@4bf558aa, cc.mallet.types.Instance@2d38eb89, cc.mallet.types.Instance@5fa7e7ff, 
cc.mallet.types.Instance@4629104a, cc.mallet.types.Instance@27f8302d, cc.mallet.types.Instance@4d76f3f8, cc.mallet.types.Instance@2d8e6db6, cc.mallet.types.Instance@23ab930d, cc.mallet.types.Instance@4534b60d, cc.mallet.types.Instance@3fa77460, cc.mallet.types.Instance@619a5dff, cc.mallet.types.Instance@1ed6993a, cc.mallet.types.Instance@7e32c033, cc.mallet.types.Instance@7ab2bfe1, cc.mallet.types.Instance@497470ed, cc.mallet.types.Instance@63c12fb0, cc.mallet.types.Instance@b1a58a3, cc.mallet.types.Instance@6438a396, cc.mallet.types.Instance@e2144e4, cc.mallet.types.Instance@6477463f, cc.mallet.types.Instance@3d71d552, cc.mallet.types.Instance@1cf4f579, cc.mallet.types.Instance@18769467, cc.mallet.types.Instance@46ee7fe8, cc.mallet.types.Instance@7506e922, cc.mallet.types.Instance@4ee285c6]}"

There are some other possible inputs to mallet.import(): preserve.case controls whether the text is converted to lowercase, and token.regexp controls how the text is tokenized.
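
To get a feel for what the tokenization setting changes, here is a small sketch that imitates it with stringr rather than calling mallet. It assumes, as I understand the package documentation, that the default token pattern is `"[\\p{L}]+"` (runs of letters), and compares it with an alternative pattern that keeps punctuation inside words. The sentence and variable names are made up, and Java's regex engine may differ in small ways from the one stringr uses.

```r
library(stringr)

# A made-up sentence, lowercased the way mallet does by default
sentence <- tolower("I don't know")

# Assumed default pattern: a token is any run of letters
default_tokens <- str_extract_all(sentence, "[\\p{L}]+")[[1]]

# Alternative pattern: letters with internal punctuation allowed,
# which also means a token needs at least three characters
alt_tokens <- str_extract_all(sentence, "\\p{L}[\\p{L}\\p{P}]+\\p{L}")[[1]]

default_tokens
alt_tokens
```

With the letters-only pattern, a contraction is split at the apostrophe, which is likely why a fragment like "don" can show up as a word in the model.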

Now, we set up a mallet topic model trainer object. Let’s use 4 topics, as we usually do with these four books.

topic_model <- MalletLDA(num.topics = 4)

topic_model
## [1] "Java-Object{cc.mallet.topics.RTopicModel@28ba21f3}"

The next step is to load the documents into the topic model trainer object.

topic_model$loadDocuments(mallet_object)

One of the things stored in the topic model object is a “vocabulary”. What is there?

vocabulary <- topic_model$getVocabulary()
vocabulary[1:50]
##  [1] "chapter"      "father"       "family"       "pirrip"       "christian"   
##  [6] "philip"       "infant"       "tongue"       "names"        "explicit"    
## [11] "pip"          "called"       "authority"    "tombstone"    "sister"      
## [16] "joe"          "gargery"      "married"      "blacksmith"   "mother"      
## [21] "likeness"     "days"         "photographs"  "fancies"      "unreasonably"
## [26] "derived"      "tombstones"   "shape"        "letters"      "odd"         
## [31] "idea"         "square"       "stout"        "dark"         "curly"       
## [36] "black"        "hair"         "character"    "inscription"  "georgiana"   
## [41] "wife"         "drew"         "childish"     "conclusion"   "freckled"    
## [46] "sickly"       "stone"        "lozenges"     "foot"         "half"

Actually topic modeling

Now we can train a model. The first line below tells the trainer how to optimize hyperparameters: every 20 iterations, after 50 burn-in iterations. The second line runs 200 training iterations.

topic_model$setAlphaOptimization(20, 50)
topic_model$train(200)

The model is now trained! Now we can get the probability of each topic in each document.

doc_topics <- mallet.doc.topics(topic_model, 
                                smoothed = TRUE, normalized = TRUE)
doc_topics 
##                [,1]         [,2]         [,3]         [,4]
##   [1,] 0.6965624945 0.0351636740 0.0329052005 0.2353686309
##   [2,] 0.8329565163 0.0743004036 0.0014738436 0.0912692365
##   [3,] 0.8158110890 0.0700628412 0.0025698602 0.1115562096
##   [4,] 0.7618584079 0.1347358185 0.0401997490 0.0632060246
##   [5,] 0.8413657317 0.1401643010 0.0002238481 0.0182461192
##   [6,] 0.7492433317 0.1691728209 0.0342387200 0.0473451274
##   [7,] 0.7854819833 0.1057350645 0.0001519186 0.1086310337
##   [8,] 0.7139473551 0.1813051742 0.0003412000 0.1044062707
##   [9,] 0.8379845156 0.1547609654 0.0002223583 0.0070321607
##  [10,] 0.8553434832 0.1382986652 0.0001335933 0.0062242583
##  [11,] 0.7841478906 0.1620211409 0.0001178811 0.0537130874
##  [12,] 0.8309342111 0.0740733127 0.0010908212 0.0939016550
##  [13,] 0.8723684130 0.0386920889 0.0002013623 0.0887381358
##  [14,] 0.8190567556 0.0779187979 0.0075208512 0.0955035953
##  [15,] 0.7189983232 0.2500958388 0.0021719985 0.0287338396
##  [16,] 0.7005794073 0.2735407349 0.0050711026 0.0208087552
##  [17,] 0.7994857399 0.1651289462 0.0031729365 0.0322123774
##  [18,] 0.7623267653 0.1113664047 0.0758655596 0.0504412704
##  [19,] 0.8584630584 0.0964261208 0.0002287004 0.0448821203
##  [20,] 0.8831978138 0.0957680495 0.0032774507 0.0177566859
##  [21,] 0.7653328619 0.1252713788 0.0174954440 0.0919003153
##  [22,] 0.8336964998 0.1155690486 0.0263540800 0.0243803717
##  [23,] 0.8055299410 0.0042937088 0.0003417818 0.1898345684
##  [24,] 0.6961240631 0.2133686729 0.0069670053 0.0835402587
##  [25,] 0.7644676954 0.1417072125 0.0290215151 0.0648035770
##  [26,] 0.7735851612 0.1117684946 0.0913874715 0.0232588727
##  [27,] 0.6538970268 0.2392458570 0.0002504132 0.1066067030
##  [28,] 0.7914504739 0.1607491064 0.0016145856 0.0461858341
##  [29,] 0.7757521345 0.0999640701 0.0002213760 0.1240624194
##  [30,] 0.8728096707 0.1224033524 0.0002710691 0.0045159078
##  [31,] 0.7024289749 0.2451204444 0.0192697700 0.0331808107
##  [32,] 0.7391514210 0.2114515079 0.0042222831 0.0451747879
##  [33,] 0.8343951796 0.0427950890 0.0189219483 0.1038877832
##  [34,] 0.7797829364 0.1413625025 0.0072218387 0.0716327224
##  [35,] 0.8322955318 0.0977649375 0.0146791116 0.0552604190
##  [36,] 0.7793783335 0.1552451337 0.0003400424 0.0650364903
##  [37,] 0.9233153036 0.0017107399 0.0002258659 0.0747480906
##  [38,] 0.7483386003 0.1293795204 0.0003576358 0.1219242435
##  [39,] 0.7111106642 0.2319698006 0.0066458820 0.0502736533
##  [40,] 0.8500765525 0.0907239913 0.0002258659 0.0589735903
##  [41,] 0.7200448046 0.1145214703 0.0582188033 0.1072149218
##  [42,] 0.6793351558 0.1589764927 0.0673372213 0.0943511302
##  [43,] 0.8323315801 0.0974767054 0.0002488588 0.0699428557
##  [44,] 0.6878517691 0.1640912522 0.0012046642 0.1468523145
##  [45,] 0.8500140108 0.0128997960 0.0009374854 0.1361487078
##  [46,] 0.7673170051 0.1938354059 0.0004439738 0.0384036153
##  [47,] 0.7998156067 0.1665884780 0.0002359693 0.0333599461
##  [48,] 0.7552863527 0.1387030899 0.0520241614 0.0539863959
##  [49,] 0.7335096398 0.0003295982 0.0446423447 0.2215184173
##  [50,] 0.6542192592 0.0315712446 0.1673011476 0.1469083486
##  [51,] 0.6918737367 0.2812130865 0.0074034583 0.0195097185
##  [52,] 0.5773382241 0.2341973310 0.0186122094 0.1698522355
##  [53,] 0.7998876710 0.1416400999 0.0015779453 0.0568942837
##  [54,] 0.7579767906 0.1716805120 0.0023982975 0.0679443999
##  [55,] 0.7270254872 0.0987027430 0.0053603248 0.1689114450
##  [56,] 0.8348167716 0.0804850611 0.0008266020 0.0838715653
##  [57,] 0.8184876607 0.1395748839 0.0001541381 0.0417833173
##  [58,] 0.8339407096 0.0441666754 0.0333452355 0.0885473796
##  [59,] 0.9099489418 0.0534107405 0.0002404989 0.0363998188
##  [60,] 0.0160086718 0.9814855820 0.0009301629 0.0015755833
##  [61,] 0.0049179770 0.9900357404 0.0002857526 0.0047605299
##  [62,] 0.0108240553 0.9722149353 0.0143005741 0.0026604353
##  [63,] 0.0238503441 0.9606578377 0.0008772453 0.0146145729
##  [64,] 0.0327357807 0.9641995751 0.0023889313 0.0006757128
##  [65,] 0.0445712171 0.9489410300 0.0005120143 0.0059757386
##  [66,] 0.0147596142 0.9441230625 0.0003501396 0.0407671837
##  [67,] 0.0335277112 0.9523215427 0.0021415475 0.0120091986
##  [68,] 0.0088346437 0.9769786949 0.0030740904 0.0111125710
##  [69,] 0.0120844370 0.9832308968 0.0001245460 0.0045601202
##  [70,] 0.0060156090 0.9895554941 0.0003495291 0.0040793678
##  [71,] 0.0533778944 0.9441046780 0.0009344992 0.0015829284
##  [72,] 0.0008937325 0.9900895163 0.0063563159 0.0026604353
##  [73,] 0.0183638319 0.9775624306 0.0003214999 0.0037522376
##  [74,] 0.0007915482 0.9894636353 0.0091475494 0.0005972671
##  [75,] 0.0008403264 0.9944164995 0.0041091012 0.0006340729
##  [76,] 0.0007971570 0.9823031641 0.0056694610 0.0112302179
##  [77,] 0.0143171650 0.9822644538 0.0004449593 0.0029734220
##  [78,] 0.0006949745 0.9969266636 0.0003095834 0.0020687786
##  [79,] 0.0185550376 0.9427408836 0.0204241300 0.0182799488
##  [80,] 0.0398150891 0.8853857547 0.0024573949 0.0723417612
##  [81,] 0.0281149257 0.9712088019 0.0002510404 0.0004252320
##  [82,] 0.0325251287 0.9590127299 0.0003736342 0.0080885072
##  [83,] 0.0453865307 0.9409851890 0.0127929928 0.0008352875
##  [84,] 0.0093584101 0.9895059438 0.0004215654 0.0007140807
##  [85,] 0.0055241395 0.9932582754 0.0004519822 0.0007656030
##  [86,] 0.0046533012 0.9943210566 0.0003807306 0.0006449116
##  [87,] 0.0007014745 0.9392213468 0.0034301315 0.0566470471
##  [88,] 0.0459163011 0.9491729706 0.0002168264 0.0046939019
##  [89,] 0.0006896491 0.9892874760 0.0003072112 0.0097156637
##  [90,] 0.0010262103 0.9977423213 0.0004571358 0.0007743326
##  [91,] 0.0075036429 0.9847170477 0.0067393625 0.0010399469
##  [92,] 0.1203120468 0.8576150196 0.0004197997 0.0216531339
##  [93,] 0.0077899429 0.9904930623 0.0006373690 0.0010796258
##  [94,] 0.0010526152 0.9625971926 0.0308776640 0.0054725283
##  [95,] 0.0101844390 0.9549013858 0.0235136877 0.0114004875
##  [96,] 0.0074475981 0.9615235293 0.0120508079 0.0189780647
##  [97,] 0.0156937234 0.8790709498 0.0350927412 0.0701425856
##  [98,] 0.0272215194 0.9692235756 0.0030796813 0.0004752237
##  [99,] 0.0317824939 0.9672872711 0.0003453143 0.0005849207
## [100,] 0.0231264794 0.9698223610 0.0002161251 0.0068350345
## [101,] 0.0012929033 0.9812841236 0.0019620522 0.0154609210
## [102,] 0.0006711352 0.9895750570 0.0017903698 0.0079634380
## [103,] 0.0268664797 0.9630643509 0.0003086301 0.0097605392
## [104,] 0.0163899285 0.9816211484 0.0007383120 0.0012506111
## [105,] 0.0051918113 0.9729273348 0.0108357680 0.0110450859
## [106,] 0.0489244976 0.9037158881 0.0020643864 0.0452952279
## [107,] 0.0055313310 0.9888254592 0.0002491681 0.0053940416
## [108,] 0.0118791993 0.9874419779 0.0002519871 0.0004268356
## [109,] 0.0077093580 0.9732071769 0.0026825376 0.0164009275
## [110,] 0.0090525285 0.9873825880 0.0002813411 0.0032835425
## [111,] 0.0005385964 0.9988150795 0.0002399232 0.0004064009
## [112,] 0.0009026966 0.9839721610 0.0004021153 0.0147230271
## [113,] 0.0049675457 0.9913751775 0.0017285003 0.0019287765
## [114,] 0.0006535894 0.9985620930 0.0002911480 0.0004931695
## [115,] 0.0063434769 0.9700787422 0.0002857526 0.0232920282
## [116,] 0.0077788887 0.9797297802 0.0004519822 0.0120393490
## [117,] 0.0036477269 0.9949937879 0.0005042859 0.0008541992
## [118,] 0.0165739319 0.9540212071 0.0050759963 0.0243288647
## [119,] 0.0431259128 0.9560256943 0.0003149336 0.0005334593
## [120,] 0.0027750788 0.9808806659 0.0137805615 0.0025636938
## [121,] 0.0005297202 0.0347475071 0.1438484555 0.8208743172
## [122,] 0.0089381254 0.0006218245 0.0134377279 0.9770023222
## [123,] 0.0301049146 0.0049379016 0.0002813411 0.9646758427
## [124,] 0.0773492921 0.0016727214 0.0273664766 0.8936115100
## [125,] 0.0406605134 0.0007967552 0.0310533434 0.9274893879
## [126,] 0.0219847645 0.0509483225 0.0007889920 0.9262779211
## [127,] 0.0003924976 0.0004520647 0.0455299430 0.9536254948
## [128,] 0.0766711229 0.0321377881 0.0006949071 0.8904961819
## [129,] 0.0121896261 0.0018660840 0.1327894441 0.8531548457
## [130,] 0.0524629842 0.0308803891 0.0002170612 0.9164395654
## [131,] 0.0562022616 0.0323239149 0.1413636087 0.7701102147
## [132,] 0.0198317671 0.0199750891 0.0801674542 0.8800256897
## [133,] 0.0401547893 0.0600110055 0.0002829294 0.8995512758
## [134,] 0.2126967787 0.0244206779 0.0121116186 0.7507709248
## [135,] 0.0253685665 0.0228673985 0.0488591999 0.9029048351
## [136,] 0.0051308044 0.0010854175 0.0967532024 0.8970305756
## [137,] 0.2041916719 0.0349028975 0.0017313733 0.7591740573
## [138,] 0.0077426771 0.0053406403 0.0460467639 0.9408699186
## [139,] 0.0825886111 0.0563538080 0.0178497580 0.8432078229
## [140,] 0.0128356122 0.1582427491 0.0401992516 0.7887223872
## [141,] 0.0833086449 0.0670799964 0.0005484358 0.8490629229
## [142,] 0.0222087362 0.0129687148 0.0239724059 0.9408501431
## [143,] 0.0077592629 0.0183402677 0.0003495291 0.9735509402
## [144,] 0.0881310265 0.0015000968 0.0353116194 0.8750572573
## [145,] 0.1663656308 0.0927765927 0.0451398500 0.6957179265
## [146,] 0.0190777036 0.0397387088 0.0209994453 0.9201841422
## [147,] 0.0937038995 0.0282204085 0.0002628927 0.8778127994
## [148,] 0.0096918426 0.0834223936 0.8312450081 0.0756407558
## [149,] 0.0587936822 0.1333237886 0.7834430683 0.0244394610
## [150,] 0.0404134402 0.0135340836 0.9454412318 0.0006112443
## [151,] 0.0006823292 0.0462743161 0.8812632864 0.0717800683
## [152,] 0.0065622483 0.0102567257 0.9309966578 0.0521843682
## [153,] 0.0140059760 0.1009387858 0.8687379100 0.0163173281
## [154,] 0.0321009378 0.0116607790 0.8722699686 0.0839683146
## [155,] 0.0286496808 0.0006332201 0.8114767466 0.1592403525
## [156,] 0.0411473887 0.0218562883 0.8750091986 0.0619871243
## [157,] 0.0160878906 0.0298465017 0.9322178133 0.0218477944
## [158,] 0.0396095127 0.0208717182 0.9242673573 0.0152514117
## [159,] 0.0007493681 0.1723844528 0.7280506413 0.0988155378
## [160,] 0.0362708963 0.0634554661 0.8466764404 0.0535971972
## [161,] 0.0600163824 0.0037952662 0.8519054021 0.0842829493
## [162,] 0.0085977269 0.0525307092 0.8842390221 0.0546325418
## [163,] 0.0519260800 0.0263218070 0.7916744729 0.1300776401
## [164,] 0.0024390060 0.0164441606 0.9668416391 0.0142751943
## [165,] 0.0523451093 0.0653500516 0.8706407095 0.0116641295
## [166,] 0.0232896863 0.0004098794 0.8668985818 0.1094018525
## [167,] 0.0164217760 0.0378442203 0.9142793619 0.0314546418
## [168,] 0.0004991654 0.0016841729 0.9297756344 0.0680410273
## [169,] 0.0077724958 0.0412289550 0.9475144362 0.0034841130
## [170,] 0.0842016126 0.1914771227 0.7127029119 0.0116183528
## [171,] 0.0059827889 0.0061495600 0.9162218486 0.0716458026
## [172,] 0.0526334600 0.0860964289 0.8335704784 0.0276996327
## [173,] 0.0229795132 0.0013709115 0.8344862537 0.1411633216
## [174,] 0.0113986133 0.0015114817 0.8722680061 0.1148218990
## [175,] 0.0022359031 0.0007992124 0.9902732951 0.0066915894
## [176,] 0.0163112209 0.0213367562 0.9283021781 0.0340498448
## [177,] 0.0004019637 0.0004629674 0.9595287961 0.0396062727
## [178,] 0.0010865438 0.0251165884 0.8881177073 0.0856791604
## [179,] 0.0088053633 0.0116289410 0.9135738826 0.0659918131
## [180,] 0.0265063477 0.0172930065 0.8766904133 0.0795102325
## [181,] 0.0006526415 0.1167763972 0.8820785071 0.0004924543
## [182,] 0.0004675302 0.0109280053 0.9300703709 0.0585340936
## [183,] 0.0005047645 0.0140417041 0.9222577643 0.0631957671
## [184,] 0.0113278132 0.0688539013 0.8240581166 0.0957601689
## [185,] 0.0104560209 0.0189728257 0.9041304525 0.0664407009
## [186,] 0.0090908381 0.0004985972 0.8352030014 0.1552075633
## [187,] 0.0194327706 0.0006122750 0.8838666459 0.0960883085
## [188,] 0.0667696300 0.1311003634 0.7041189066 0.0980110999
## [189,] 0.0294489339 0.0614809021 0.8941628575 0.0149073064
## [190,] 0.0131397283 0.0016024091 0.8614177332 0.1238401294
## [191,] 0.0741837412 0.0187621652 0.7455853774 0.1614687162
## [192,] 0.1774260253 0.0944641811 0.6687138687 0.0593959249
## [193,] 0.1472068067 0.0043440910 0.7959766691 0.0524724332
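
That matrix is hard to read without document labels. Here is a minimal sketch of one way to tidy it, assuming the rows come back in the same order as the documents were loaded (that is, the order of `by_chapter_text$title_chapter`). To keep it self-contained it uses a made-up 4 × 4 matrix and made-up IDs in place of `doc_topics` and the real document IDs; `doc_ids`, `toy_doc_topics`, and the `gamma` column name are my own choices for illustration.

```r
library(dplyr)
library(tidyr)

# Made-up stand-ins for by_chapter_text$title_chapter and doc_topics
doc_ids <- c("Great Expectations_1", "Great Expectations_2",
             "Pride and Prejudice_1", "The War of the Worlds_1")
toy_doc_topics <- matrix(c(0.70, 0.04, 0.03, 0.23,
                           0.83, 0.07, 0.01, 0.09,
                           0.02, 0.96, 0.01, 0.01,
                           0.01, 0.03, 0.14, 0.82),
                         nrow = 4, byrow = TRUE)

doc_topics_tidy <- as.data.frame(toy_doc_topics) %>%
    setNames(paste0("topic_", 1:4)) %>%
    mutate(document = doc_ids) %>%
    # undo the unite() from earlier: split the ID back into title and chapter
    separate(document, c("title", "chapter"), sep = "_", convert = TRUE) %>%
    # one row per document-topic pair
    gather(topic, gamma, starts_with("topic_"))

# the most probable topic for each chapter
top_topic <- doc_topics_tidy %>%
    group_by(title, chapter) %>%
    filter(gamma == max(gamma)) %>%
    ungroup()

# how many chapters of each book land in each topic
chapter_counts <- top_topic %>% count(title, topic)
chapter_counts
```

If the model has separated the books cleanly, each title should appear in this count table with a single topic.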

We can also get the probability of each word in each topic. (I’ll refrain from printing all those out.)

topic_words <- mallet.topic.words(topic_model, 
                                  smoothed = TRUE, normalized = TRUE)

There are some convenience functions in this package to get out, for instance, the top words in each topic.

mallet.top.words(topic_model, topic_words[4,])
##       words     weights
## 1    people 0.006316649
## 2     black 0.005652338
## 3  martians 0.005453045
## 4      time 0.005021243
## 5     night 0.004589441
## 6      road 0.004489794
## 7   brother 0.003592975
## 8     smoke 0.003559759
## 9       red 0.003294035
## 10    water 0.003260819
mallet.top.words(topic_model, topic_words[3,])
##       words     weights
## 1   captain 0.016564669
## 2  nautilus 0.013314042
## 3      nemo 0.009730674
## 4       sea 0.009653887
## 5       ned 0.008246136
## 6   conseil 0.007222316
## 7      land 0.006838384
## 8     water 0.006556834
## 9       sir 0.004841936
## 10  surface 0.004099667
mallet.top.words(topic_model, topic_words[2,])
##        words     weights
## 1  elizabeth 0.013334585
## 2      darcy 0.008778957
## 3       miss 0.007330394
## 4     bennet 0.006784558
## 5    bingley 0.006427665
## 6       jane 0.006385678
## 7     sister 0.005566925
## 8       time 0.005168045
## 9       lady 0.004748171
## 10    family 0.004118361
mallet.top.words(topic_model, topic_words[1,])
##       words     weights
## 1       joe 0.014964616
## 2       pip 0.006842261
## 3    looked 0.006521378
## 4      miss 0.006521378
## 5  havisham 0.006380992
## 6      time 0.006340881
## 7   herbert 0.006280716
## 8       don 0.006080164
## 9      hand 0.005799391
## 10  wemmick 0.005699115
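
Calling the function once per topic gets repetitive, so here is a rough sketch of looping over the rows of the topic-word matrix instead. To keep it runnable without Java, it uses a made-up 4 × 5 matrix and a small base R helper, `top_words()`, that I wrote for illustration (it is not part of mallet); it sorts one row of weights and keeps the largest ones, which is my understanding of what `mallet.top.words()` does.

```r
# Made-up stand-ins for topic_model$getVocabulary() and topic_words
toy_vocabulary <- c("joe", "elizabeth", "nautilus", "martians", "time")
toy_topic_words <- matrix(c(0.60, 0.05, 0.05, 0.05, 0.25,
                            0.05, 0.60, 0.05, 0.05, 0.25,
                            0.05, 0.05, 0.70, 0.05, 0.15,
                            0.05, 0.05, 0.05, 0.65, 0.20),
                          nrow = 4, byrow = TRUE,
                          dimnames = list(NULL, toy_vocabulary))

# Hypothetical helper: the n highest-weighted words in one topic
top_words <- function(weights, n = 2) {
    top <- order(weights, decreasing = TRUE)[seq_len(n)]
    data.frame(words = names(weights)[top],
               weights = unname(weights[top]),
               stringsAsFactors = FALSE)
}

# one data frame of top words per topic (one topic per matrix row)
all_top_words <- lapply(seq_len(nrow(toy_topic_words)),
                        function(k) top_words(toy_topic_words[k, ]))
all_top_words
```

With the real objects, the same `lapply()` over `seq_len(nrow(topic_words))` could call `mallet.top.words(topic_model, topic_words[k, ])` in place of the helper.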

Wow, that looks perfect: each topic lines up with one of the four books.