Open Context has an excellent API for grabbing the archaeological data it holds. For instance, it holds 2618 site diaries - here’s one of them. Append .json to the end of a record’s URL, and poof, lots of data. Here’s the json version of that same diary. So, I wanted all of those diaries - this URL (click & then note where the .json lives; delete the .json to see the regular html) has ’em all.
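Since we’ll be working in R anyway, you can grab a single record programmatically too (a sketch using jsonlite; the URL is a placeholder - substitute the address of any Open Context record):
# fetch a single diary as json, straight from the API
library(jsonlite)
diary <- fromJSON("https://opencontext.org/[path-to-a-diary].json")
str(diary, max.level = 1)  # peek at the top-level structure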
I copied and pasted that list of urls into a .txt file, and fed it to wget at the command line:
wget -i urlstograb.txt -O output.txt
which gave me a file some 27 mb in size (wget’s -O flag concatenates everything it downloads into that one file). Now, it’s possible to feed json into R, but the problem was that this single file was not formatted as a single json file, which is what R expects. Rather, it was 2600-ish individual json files, all concatenated together. This makes things awkward. My solution - and there are no doubt more elegant ones - was to use regex to mark every line that contained a diary and delete everything else; then I used regex again to strip out the html. Nearly every entry has ‘summary journal’ near its beginning, so I used that as a marker, inserting a pipe character | as a delimiter and then opening the file in Excel. Excel asks, hey, what character are you using to delimit this? I say, the pipe! and Excel turns it into a comma separated file with two columns, id and text. Some 1200 diaries came from (an)other excavation(s) where there was clearly a style guide for what went into them; I removed these from the present analysis.
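(For what it’s worth, the same cleanup could be scripted in R - a rough sketch, assuming each diary’s json landed on its own line of output.txt, and with the output filename just illustrative:)
# keep only the lines containing diaries, strip the html tags,
# and write out a two-column csv of id and text
lines <- readLines("output.txt")
diaries <- lines[grepl("summary journal", lines, ignore.case = TRUE)]
diaries <- gsub("<[^>]+>", " ", diaries)  # crude html removal
df <- data.frame(id = seq_along(diaries), text = diaries, stringsAsFactors = FALSE)
write.csv(df, "diaries.csv", row.names = FALSE)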
The remaining 1400 diaries, which I topic model in R, I am also exploring with Textplot. Textplot eventually produces a network graph; modularity analysis of that graph finds 13 very strong groups. Accordingly, I topic model for 13 topics.
I want to see how the topics arrived at via topic modeling compare with the topics arrived at via Textplot.
setwd("/Users/shawngraham/Desktop/data mining and tools/open-context/")
Then we import our data:
#Rio makes this easy. I'm assuming you know how to install packages.
library("rio")
##importing opencontext site diaries
documents <- import("diaries-in-date-order-R.csv")
Then we set up Mallet in R. We need as much memory as we’ve got; note that the Java heap size has to be set before rJava loads, or it won’t take effect:
options(java.parameters = "-Xmx5120m")
library(rJava)
## from http://cran.r-project.org/web/packages/mallet/mallet.pdf
library(mallet)
Now we pass the diaries to Mallet. There’s a bit of iteration here: I first ran the entire topic modeling code to get the list of words and their frequencies (which you’ll see further below); the most frequent words are frequent by orders of magnitude, so I inserted them into my stopword list so that they don’t overwhelm the analysis.
mallet.instances <- mallet.import(as.character(documents$id),
                                  as.character(documents$text),
                                  "/Users/shawngraham/Desktop/data mining and tools/TextAnalysisWithR/data/stoplist2.csv",
                                  FALSE, # preserve.case
                                  token.regexp="\\p{L}[\\p{L}\\p{P}]+\\p{L}") # tokens of 3+ characters, internal punctuation allowed (hence words like 'turkey/kenan' below)
#set the number of desired topics
num.topics <- 13
topic.model <- MalletLDA(num.topics)
## Load our documents. We could also pass in the filename of a
## saved instance list file that we build from the command-line tools.
topic.model$loadDocuments(mallet.instances)
## Get the vocabulary, and some statistics about word frequencies.
## These may be useful in further curating the stopword list.
vocabulary <- topic.model$getVocabulary()
word.freqs <- mallet.word.freqs(topic.model)
head(word.freqs)
## words term.freq doc.freq
## 1 july 100 94
## 2 working 188 156
## 3 measured 55 44
## 4 side 970 405
## 5 hill 60 43
## 6 prickly 2 2
# write.csv(word.freqs, "oc-word-freqs.csv" ) <- study this file, use it to modify your stoplist file
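A quick way to surface stoplist candidates from those frequencies (a sketch; the cut-off of twenty is arbitrary):
# the twenty most frequent words - prime stoplist candidates
head(word.freqs[order(-word.freqs$term.freq), ], 20)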
So let’s see what we get:
## Optimize hyperparameters every 20 iterations,
## after 50 burn-in iterations.
topic.model$setAlphaOptimization(20, 50)
## Now train a model. Note that hyperparameter optimization is on, by default.
## We can specify the number of iterations. Here we'll use a large-ish round number.
topic.model$train(1000)
## NEW: run through a few iterations where we pick the best topic for each token,
## rather than sampling from the posterior distribution.
topic.model$maximize(10)
## Get the probability of topics in documents and the probability of words in topics.
## By default, these functions return raw word counts. Here we want probabilities,
## so we normalize, and add "smoothing" so that nothing has exactly 0 probability.
doc.topics <- mallet.doc.topics(topic.model, smoothed=T, normalized=T)
topic.words <- mallet.topic.words(topic.model, smoothed=T, normalized=T)
# save(doc.topics, file = "ocdoctopics.RData") <- we need this for doing self-organized maps. We'll worry about these later.
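If you want a quick sanity check on the normalization:
# with normalized=T, each document's topic proportions should sum to ~1
summary(rowSums(doc.topics))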
What are the top words in each topic?
mallet.top.words(topic.model, topic.words[1,])
## words weights
## 1 half 0.01709561
## 2 digging 0.01668474
## 3 pot 0.01331559
## 4 dug 0.01282254
## 5 pottery 0.01249384
## 6 day 0.01241167
## 7 dirt 0.01200080
## 8 section 0.01191862
## 9 began 0.01175427
## 10 find 0.01117905
mallet.top.words(topic.model, topic.words[2,])
## words weights
## 1 oven 0.11048688
## 2 mudbrick 0.03683250
## 3 feature 0.01494587
## 4 slag 0.01193155
## 5 large 0.01153838
## 6 underneath 0.01140732
## 7 july 0.01127626
## 8 work 0.01114520
## 9 digging 0.01088309
## 10 time 0.01009674
mallet.top.words(topic.model, topic.words[3,])
## words weights
## 1 excavated 0.01400361
## 2 section 0.01327059
## 3 excavation 0.01217106
## 4 sample 0.01217106
## 5 northern 0.01202446
## 6 structure 0.01143805
## 7 corner 0.01099824
## 8 season 0.01099824
## 9 material 0.01092493
## 10 ubaid 0.01085163
mallet.top.words(topic.model, topic.words[4,])
## words weights
## 1 brick 0.03488463
## 2 layer 0.03209051
## 3 mud 0.02977662
## 4 ash 0.02685153
## 5 north 0.02615300
## 6 south 0.02554178
## 7 bricks 0.02331522
## 8 west 0.02270400
## 9 east 0.02139426
## 10 line 0.01615528
mallet.top.words(topic.model, topic.words[5,])
## words weights
## 1 walls 0.02582379
## 2 room 0.01776622
## 3 began 0.01628625
## 4 collapsed 0.01529961
## 5 collapse 0.01464185
## 6 d/trench 0.01398409
## 7 house 0.01381965
## 8 furnace 0.01381965
## 9 discovered 0.01217524
## 10 june 0.01184636
mallet.top.words(topic.model, topic.words[6,])
## words weights
## 1 meters 0.027028264
## 2 work 0.016873764
## 3 meter 0.016529543
## 4 f/trench 0.013087340
## 5 east 0.011194128
## 6 trenches 0.010333577
## 7 tepe 0.010161466
## 8 began 0.009989356
## 9 located 0.009817246
## 10 time 0.009817246
mallet.top.words(topic.model, topic.words[7,])
## words weights
## 1 burial 0.08580118
## 2 bones 0.03547480
## 3 burials 0.02316090
## 4 human 0.01914550
## 5 bone 0.01727165
## 6 a/trench 0.01727165
## 7 skull 0.01365778
## 8 skeleton 0.01272086
## 9 turkey/kenan 0.01057931
## 10 tepe/area 0.01031162
mallet.top.words(topic.model, topic.words[8,])
## words weights
## 1 sounding 0.03687594
## 2 turkey/kenan 0.02269521
## 3 tepe/area 0.02269521
## 4 july 0.01881417
## 5 step 0.01836636
## 6 bottom 0.01791854
## 7 topsoil 0.01762000
## 8 mudbrick 0.01702292
## 9 excavated 0.01478386
## 10 started 0.01448531
mallet.top.words(topic.model, topic.words[9,])
## words weights
## 1 turkey/kenan 0.05480939
## 2 tepe/area 0.05480939
## 3 today 0.04440979
## 4 day 0.02681425
## 5 daily 0.02464561
## 6 july 0.02306842
## 7 tomorrow 0.01616821
## 8 floor 0.01385171
## 9 taking 0.01271810
## 10 work 0.01261953
mallet.top.words(topic.model, topic.words[10,])
## words weights
## 1 rocks 0.031244287
## 2 side 0.021873372
## 3 west 0.021092462
## 4 east 0.013283365
## 5 flat 0.012502456
## 6 compacted 0.012307228
## 7 balk 0.010940636
## 8 pottery 0.009964499
## 9 line 0.009574045
## 10 large 0.008988362
mallet.top.words(topic.model, topic.words[11,])
## words weights
## 1 rocks 0.03842486
## 2 part 0.03314275
## 3 side 0.02930779
## 4 south 0.02851186
## 5 rock 0.02778828
## 6 north 0.02706470
## 7 west 0.02597934
## 8 east 0.02402568
## 9 corner 0.02373625
## 10 pottery 0.01932243
mallet.top.words(topic.model, topic.words[12,])
## words weights
## 1 loci 0.02812991
## 2 small 0.02165249
## 3 removed 0.01972000
## 4 stones 0.01778751
## 5 large 0.01768015
## 6 season 0.01700020
## 7 excavated 0.01657076
## 8 stone 0.01549715
## 9 surfaces 0.01524664
## 10 part 0.01370781
mallet.top.words(topic.model, topic.words[13,])
## words weights
## 1 mudbrick 0.05053302
## 2 section 0.02497572
## 3 south 0.01910048
## 4 east 0.01821919
## 5 plaster 0.01689726
## 6 north 0.01528157
## 7 cut 0.01410652
## 8 side 0.01322524
## 9 west 0.01307836
## 10 top 0.01175643
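(Incidentally, rather than calling mallet.top.words thirteen times by hand as above, a short loop does the same job:)
# print the top ten words for each topic in turn
for (i in 1:num.topics) {
  print(mallet.top.words(topic.model, topic.words[i,]))
}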
Let’s see what the topic labels look like.
## from my other script; above was Mimno's example script
topic.docs <- t(doc.topics)                    # transpose: rows are now topics, columns are documents
topic.docs <- topic.docs / rowSums(topic.docs) # normalize each topic's distribution over the documents
write.csv(topic.docs, "oc-topics-docs.csv")
## Get a vector containing short names for the topics
topics.labels <- rep("", num.topics)
for (topic in 1:num.topics) {
  topics.labels[topic] <- paste(mallet.top.words(topic.model, topic.words[topic,],
                                                 num.top.words=5)$words, collapse=" ")
}
# have a look at keywords for each topic
topics.labels
## [1] "half digging pot dug pottery"
## [2] "oven mudbrick feature slag large"
## [3] "excavated section excavation sample northern"
## [4] "brick layer mud ash north"
## [5] "walls room began collapsed collapse"
## [6] "meters work meter f/trench east"
## [7] "burial bones burials human bone"
## [8] "sounding turkey/kenan tepe/area july step"
## [9] "turkey/kenan tepe/area today day daily"
## [10] "rocks side west east flat"
## [11] "rocks part side south rock"
## [12] "loci small removed stones large"
## [13] "mudbrick section south east plaster"
And if we turn them into word clouds, we get a sense of the relative importance of words that appear in different topics.
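(The chunk that drew the clouds isn’t reproduced here; judging from the warnings it threw, it was something along these lines - a sketch, with the number of words and the scale being guesses:)
# draw one cloud per topic, word size proportional to weight in that topic
library(wordcloud)
for (i in 1:num.topics) {
  topic.top.words <- mallet.top.words(topic.model, topic.words[i,], 25)
  wordcloud(topic.top.words$words, topic.top.words$weights,
            c(4, 0.8), rot.per = 0, random.order = FALSE)
}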
## (thirteen word clouds are plotted here, one per topic; wordcloud warned
## that 'southern' and 'sounding' could not be fit on the page and were
## not plotted)
##generate clusters of diaries that are similar in their distribution of topics
topic_docs <- data.frame(topic.docs)
names(topic_docs) <- documents$id
library(cluster)
topic_df_dist <- as.matrix(daisy(t(topic_docs), metric = "euclidean", stand = TRUE))
# Change a row's values to zero if they're greater than the row minimum plus
# the row standard deviation; this keeps only closely related documents and
# avoids a dense spaghetti diagram that's difficult to interpret
# (hat-tip: http://stackoverflow.com/a/16047196/1036500)
topic_df_dist[ sweep(topic_df_dist, 1, (apply(topic_df_dist,1,min) + apply(topic_df_dist,1,sd) )) > 0 ] <- 0
# Use kmeans to identify groups of similar diaries
km <- kmeans(topic_df_dist, num.topics)
# get names for each cluster
allnames <- vector("list", length = num.topics)
for(i in 1:num.topics){
  allnames[[i]] <- names(km$cluster[km$cluster == i])
}
allnames
## [[1]]
## [1] "142" "146" "172" "197" "211" "222" "262" "326" "332" "343"
## [11] "345" "354" "376" "379" "410" "413" "494" "512" "515" "521"
## [21] "528" "535" "565" "577" "606" "614" "622" "624" "645" "653"
## [31] "656" "664" "697" "736" "752" "753" "765" "775" "778" "815"
## [41] "831" "850" "866" "878" "898" "909" "911" "964" "971" "988"
## [51] "990" "1000" "1002" "1003" "1004" "1094" "1096" "1098" "1101" "1102"
## [61] "1109" "1110" "1136" "1147" "1159" "1165" "1171" "1174" "1175" "1188"
## [71] "1190" "1211" "1218" "1224" "1226" "1235" "1246" "1257" "1271" "1272"
## [81] "1280" "1285" "1286" "1288" "1290" "1294" "1307" "1324" "1334" "1336"
## [91] "1344" "1347" "1352" "1355" "1360" "1361" "1364" "1373" "1374" "1377"
## [101] "1393" "1395" "1416"
##
## [[2]]
## [1] "37" "86" "104" "132" "151" "153" "164" "169" "174" "175"
## [11] "195" "199" "215" "248" "253" "263" "282" "296" "299" "301"
## [21] "312" "324" "335" "342" "353" "362" "365" "370" "380" "396"
## [31] "406" "414" "417" "423" "427" "441" "443" "444" "453" "457"
## [41] "465" "472" "478" "493" "498" "522" "530" "541" "568" "574"
## [51] "578" "588" "589" "593" "594" "599" "607" "612" "626" "642"
## [61] "655" "661" "679" "688" "694" "712" "735" "741" "744" "747"
## [71] "755" "768" "770" "788" "793" "803" "806" "824" "830" "835"
## [81] "839" "857" "876" "900" "919" "935" "979" "986" "999" "1006"
## [91] "1015" "1019" "1021" "1031" "1033" "1037" "1039" "1040" "1043" "1049"
## [101] "1057" "1062" "1065" "1067" "1068" "1073" "1077" "1078" "1083" "1090"
## [111] "1100" "1104" "1108" "1126" "1127" "1134" "1141" "1146" "1157" "1158"
## [121] "1161" "1164" "1169" "1184" "1189" "1198" "1201" "1205" "1206" "1208"
## [131] "1217" "1223" "1227" "1228" "1229" "1239" "1241" "1243" "1245" "1252"
## [141] "1256" "1261" "1264" "1265" "1266" "1292" "1293" "1302" "1303" "1304"
## [151] "1326" "1343" "1351" "1365" "1367" "1370" "1380" "1424" "1446"
##
## [[3]]
## [1] "307" "310" "356" "430" "431" "450" "482" "518" "544" "613"
## [11] "623" "628" "675" "743" "822" "840" "903" "913" "914" "1095"
## [21] "1129" "1153" "1166" "1179" "1183" "1193" "1199" "1204"
##
## [[4]]
## [1] "189" "191" "196" "209" "212" "213" "217" "219" "220" "227"
## [11] "228" "245" "252" "255" "267" "268" "280" "284" "288" "294"
## [21] "297" "298" "309" "313" "314" "322" "327" "330" "341" "347"
## [31] "361" "364" "372" "384" "412" "436" "440" "452" "454" "455"
## [41] "459" "460" "461" "468" "480" "495" "500" "517" "526" "539"
## [51] "546" "555" "558" "666" "678" "693" "708" "715" "729" "758"
## [61] "780" "786" "814" "841" "853" "867" "870" "894" "902" "907"
## [71] "929" "931" "939" "947" "1142" "1150" "1269"
##
## [[5]]
## [1] "4" "139" "225" "232" "233" "240" "241" "247" "257" "258"
## [11] "264" "265" "266" "285" "289" "300" "302" "311" "323" "328"
## [21] "334" "340" "357" "360" "374" "382" "385" "393" "397" "408"
## [31] "409" "418" "420" "421" "437" "439" "442" "456" "458" "470"
## [41] "471" "481" "485" "490" "527" "548" "560" "561" "562" "566"
## [51] "597" "598" "639" "652" "670" "671" "676" "685" "710" "711"
## [61] "724" "727" "738" "742" "761" "762" "764" "769" "790" "794"
## [71] "802" "804" "817" "819" "820" "833" "836" "846" "847" "877"
## [81] "882" "886" "895" "908" "918" "920" "928" "933" "936" "943"
## [91] "948" "951" "953" "956" "1059" "1151" "1222" "1240" "1244" "1306"
##
## [[6]]
## [1] "31" "149" "162" "243" "256" "279" "398" "492" "529" "531"
## [11] "683" "725" "756" "797" "809" "856" "869" "885" "1119" "1220"
## [21] "1238" "1267" "1282" "1295" "1350"
##
## [[7]]
## [1] "3" "5" "7" "13" "16" "20" "26" "27" "29" "30"
## [11] "32" "38" "44" "45" "46" "48" "51" "56" "58" "59"
## [21] "63" "64" "66" "72" "73" "75" "79" "80" "82" "84"
## [31] "87" "93" "96" "100" "101" "105" "108" "109" "112" "114"
## [41] "116" "117" "118" "119" "120" "122" "123" "126" "127" "129"
## [51] "154" "168" "170" "178" "204" "238" "246" "251" "259" "273"
## [61] "290" "317" "320" "339" "348" "349" "368" "371" "373" "401"
## [71] "415" "416" "434" "447" "449" "469" "483" "487" "497" "508"
## [81] "536" "554" "701" "732" "798" "881" "916" "925" "926" "955"
## [91] "1058" "1388"
##
## [[8]]
## [1] "629" "633" "641" "681" "689" "740" "746" "771" "808" "823"
## [11] "858" "890" "944" "967" "974" "1116" "1120" "1123" "1155" "1191"
## [21] "1196" "1202" "1207" "1210" "1225" "1259" "1268" "1277" "1287" "1296"
## [31] "1305" "1311" "1314" "1319" "1325" "1329" "1339" "1341" "1342" "1349"
## [41] "1354" "1358" "1359" "1362" "1371" "1389" "1390" "1396" "1398" "1401"
## [51] "1404" "1408" "1411" "1412" "1413" "1414" "1417" "1419" "1420" "1422"
## [61] "1423" "1425" "1426" "1428" "1433" "1434" "1438" "1441" "1442" "1443"
## [71] "1452" "1453" "1455" "1458" "1460"
##
## [[9]]
## [1] "8" "33" "68" "78" "107" "130" "136" "141" "143" "144"
## [11] "145" "150" "152" "157" "159" "163" "166" "171" "173" "179"
## [21] "181" "182" "188" "190" "202" "226" "260" "270" "276" "306"
## [31] "319" "321" "344" "346" "358" "381" "404" "422" "451" "467"
## [41] "477" "501" "511" "524" "540" "571" "592" "602" "608" "615"
## [51] "616" "617" "621" "631" "638" "649" "650" "657" "677" "682"
## [61] "690" "692" "699" "722" "739" "757" "759" "760" "791" "792"
## [71] "799" "805" "828" "852" "865" "884" "891" "906" "921" "952"
## [81] "959" "994" "1036" "1097" "1125" "1152" "1173" "1186" "1253" "1323"
## [91] "1429"
##
## [[10]]
## [1] "183" "559" "567" "573" "580" "585" "603" "604" "620" "635"
## [11] "654" "658" "668" "674" "704" "719" "733" "750" "776" "777"
## [21] "787" "796" "812" "825" "837" "887" "912" "915" "977" "978"
## [31] "987" "992" "993" "995" "996" "998" "1001" "1005" "1007" "1008"
## [41] "1009" "1010" "1011" "1014" "1023" "1026" "1029" "1034" "1035" "1038"
## [51] "1041" "1046" "1047" "1050" "1054" "1066" "1069" "1072" "1075" "1080"
## [61] "1081" "1084" "1086" "1087" "1091" "1106" "1113" "1122" "1130" "1131"
## [71] "1138" "1139" "1140" "1144" "1154" "1160" "1172" "1176" "1185" "1195"
## [81] "1213" "1214" "1249" "1260" "1270" "1276" "1291" "1299" "1317" "1353"
## [91] "1366" "1369" "1375" "1376" "1378" "1379" "1381" "1384" "1385" "1386"
##
## [[11]]
## [1] "121" "140" "147" "158" "160" "176" "192" "201" "221" "283"
## [11] "287" "331" "363" "366" "367" "389" "432" "433" "466" "479"
## [21] "484" "486" "496" "510" "513" "525" "542" "550" "556" "576"
## [31] "582" "583" "619" "627" "630" "637" "646" "651" "663" "709"
## [41] "723" "728" "737" "754" "785" "800" "810" "813" "818" "826"
## [51] "860" "872" "875" "982" "983" "984" "1022" "1051" "1085" "1099"
## [61] "1103" "1111" "1112" "1115" "1118" "1121" "1124" "1128" "1132" "1135"
## [71] "1143" "1145" "1149" "1162" "1167" "1177" "1181" "1203" "1209" "1212"
## [81] "1215" "1231" "1236" "1237" "1247" "1250" "1251" "1262" "1263" "1274"
## [91] "1281" "1300" "1301" "1313" "1322" "1331" "1338" "1340" "1348" "1356"
## [101] "1372" "1391" "1427" "1431"
##
## [[12]]
## [1] "2" "6" "9" "10" "11" "12" "14" "15" "17" "18"
## [11] "19" "21" "22" "23" "24" "25" "28" "34" "35" "36"
## [21] "39" "40" "41" "42" "43" "47" "49" "50" "52" "53"
## [31] "54" "55" "57" "60" "61" "62" "65" "67" "69" "70"
## [41] "71" "74" "76" "77" "81" "83" "85" "88" "89" "90"
## [51] "91" "92" "94" "95" "97" "98" "99" "102" "103" "106"
## [61] "110" "111" "113" "115" "124" "125" "128" "131" "133" "134"
## [71] "135" "137" "138" "148" "155" "156" "161" "165" "167" "177"
## [81] "180" "184" "185" "186" "187" "193" "194" "198" "200" "203"
## [91] "205" "206" "207" "208" "210" "214" "216" "218" "223" "224"
## [101] "229" "230" "231" "234" "235" "236" "237" "239" "242" "244"
## [111] "249" "250" "254" "261" "269" "271" "272" "274" "275" "277"
## [121] "278" "281" "286" "291" "292" "293" "295" "303" "304" "305"
## [131] "308" "315" "316" "318" "325" "329" "333" "336" "337" "338"
## [141] "350" "351" "352" "355" "359" "369" "375" "378" "383" "386"
## [151] "387" "388" "390" "391" "392" "394" "395" "399" "400" "402"
## [161] "403" "405" "407" "411" "419" "424" "425" "426" "428" "429"
## [171] "435" "438" "445" "446" "448" "462" "464" "473" "475" "476"
## [181] "488" "489" "491" "499" "502" "503" "504" "505" "506" "509"
## [191] "514" "516" "519" "520" "523" "532" "533" "534" "537" "543"
## [201] "545" "547" "549" "551" "552" "553" "557" "563" "564" "569"
## [211] "570" "572" "575" "579" "581" "584" "586" "587" "590" "591"
## [221] "595" "596" "600" "601" "605" "609" "610" "611" "618" "625"
## [231] "632" "634" "636" "643" "644" "648" "659" "660" "665" "667"
## [241] "669" "672" "673" "684" "686" "687" "691" "695" "696" "698"
## [251] "700" "703" "705" "706" "713" "714" "716" "717" "718" "720"
## [261] "721" "726" "730" "734" "745" "748" "749" "751" "763" "766"
## [271] "767" "772" "773" "774" "779" "781" "782" "783" "789" "795"
## [281] "801" "807" "811" "816" "821" "827" "829" "832" "834" "838"
## [291] "843" "844" "845" "848" "849" "851" "854" "855" "859" "861"
## [301] "862" "864" "868" "871" "874" "879" "880" "883" "888" "892"
## [311] "893" "896" "897" "899" "901" "904" "905" "910" "917" "922"
## [321] "923" "924" "927" "930" "932" "934" "937" "938" "940" "941"
## [331] "942" "945" "946" "949" "950" "954" "957" "958" "960" "961"
## [341] "962" "963" "965" "966" "968" "969" "970" "972" "973" "975"
## [351] "976" "980" "981" "985" "989" "991" "997" "1012" "1013" "1016"
## [361] "1017" "1018" "1020" "1024" "1025" "1027" "1028" "1030" "1032" "1042"
## [371] "1044" "1045" "1048" "1052" "1053" "1055" "1056" "1060" "1061" "1063"
## [381] "1064" "1070" "1071" "1074" "1076" "1079" "1082" "1088" "1089" "1092"
## [391] "1093" "1105" "1107" "1117" "1137" "1148" "1156" "1163" "1168" "1170"
## [401] "1178" "1180" "1182" "1187" "1194" "1197" "1216" "1221" "1230" "1232"
## [411] "1233" "1242" "1254" "1255" "1258" "1273" "1275" "1278" "1279" "1283"
## [421] "1284" "1297" "1298" "1308" "1309" "1312" "1315" "1316" "1318" "1320"
## [431] "1321" "1327" "1328" "1330" "1332" "1333" "1335" "1337" "1346" "1357"
## [441] "1363" "1382" "1383" "1387" "1392" "1394" "1397" "1399" "1400" "1402"
## [451] "1403" "1405" "1406" "1407" "1409" "1410" "1415" "1418" "1421" "1430"
## [461] "1432" "1435" "1436" "1437" "1439" "1440" "1444" "1445" "1447" "1448"
## [471] "1449" "1450" "1451" "1454" "1456" "1457" "1459"
##
## [[13]]
## [1] "377" "463" "474" "507" "538" "640" "647" "662" "680" "702"
## [11] "707" "731" "784" "842" "863" "873" "889" "1114" "1133" "1192"
## [21] "1200" "1219" "1234" "1248" "1289" "1310" "1345" "1368"
library(igraph)
g <- as.undirected(graph.adjacency(topic_df_dist))
layout1 <- layout.fruchterman.reingold(g, niter=500)
plot(g, layout=layout1, edge.curved = TRUE, vertex.size = 1, vertex.color= "grey", edge.arrow.size = 0, vertex.label.dist=0.5, vertex.label = NA)
write.graph(g, file="oc2.graphml", format="graphml")
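A quick check that the export captured everything (vcount and ecount simply count the graph's vertices and edges):
# how many diaries (vertices) and connections (edges) are in the graph?
vcount(g)
ecount(g)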
For the Textplot output, please see my repo here. There will also be an interactive version of the network above in that repository (along with a key for retrieving the original diaries).