2/7/2022

What this App does

Under the hood

The function that runs under the hood takes the dialogue selected by the user, removes stopwords (i.e., the most frequent ‘function’ words such as “and”, “or”, etc.), converts the remaining words into a TermDocumentMatrix, and builds a word cloud from the term frequencies. Here is the code for it.

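library(tm)          # Corpus, VectorSource, TermDocumentMatrix
library(wordcloud2)  # wordcloud2

# my_corpus (tokenized dialogues) and sw (stopword vector) are defined elsewhere in the app
make.wordcloud <- function(x) {
  my_text <- my_corpus[[x]]
  # drop every token that appears in the stopword list
  text.clean <- my_text[!(my_text %in% sw)]
  # build a term-document matrix from the remaining tokens
  doc <- Corpus(VectorSource(text.clean))
  dtm <- TermDocumentMatrix(doc)
  m <- as.matrix(dtm)
  # total frequency of each term, most frequent first
  words <- sort(rowSums(m), decreasing = TRUE)
  df <- data.frame(word = names(words), freq = words)
  wordcloud2(data = df, size = 1, color = "random-dark", shape = "circle")
}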
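
The filtering and counting steps can be tried in isolation. The following is a minimal sketch, not the app's own code: the token vector and stopword list are invented for illustration, and base R's table() stands in for the TermDocumentMatrix step, so any processing that tm applies by default is not reproduced here.

```r
# Toy data: hypothetical tokens and stopword list, not taken from the real corpus
tokens <- c("and", "justice", "or", "soul", "justice", "and", "city")
stopwords <- c("and", "or")

# Same filtering idiom as in make.wordcloud: keep tokens not found in the stopword list
clean <- tokens[!(tokens %in% stopwords)]

# Count occurrences of each remaining token and sort by decreasing frequency
counts <- sort(table(clean), decreasing = TRUE)

# Same two-column layout (word, freq) that is passed to wordcloud2
freq <- data.frame(word = names(counts), freq = as.vector(counts))
print(freq)
```

A data frame of this shape, with a word column and a freq column, is what wordcloud2 expects as its data argument.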

And here is an example of the output:

Why frequencies?

Word frequencies can be a powerful tool not only for authorship attribution, but also for topic modelling, and word clouds visualize them in a very informative way. We could also study the frequencies themselves: for example, the frequency of the interrogative “who/what” is strongly correlated with the average number of replies per page (see next slide).

Frequency of “who/what” ~ replies

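library(ggplot2)
# scatterplot: ratio on the x-axis, frequency of τίς ("who/what") on the y-axis, coloured by type
ggplot(df, aes(x = ratio, y = τίς, color = type)) +
  geom_point()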
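
To put a number on the relationship shown in the scatterplot, one option is a Pearson correlation. This is only a sketch under stated assumptions: the data frame toy and its column who_what are hypothetical, and the values are invented for illustration, not results from the corpus.

```r
# Hypothetical toy data: invented numbers for illustration only
toy <- data.frame(
  ratio    = c(2, 4, 6, 8, 10),           # replies per page (made up)
  who_what = c(1.1, 1.9, 3.2, 3.8, 5.1)   # frequency of "who/what" (made up)
)

# Pearson correlation coefficient between the two columns
r <- cor(toy$ratio, toy$who_what)

# cor.test() additionally reports a confidence interval and a p-value
test <- cor.test(toy$ratio, toy$who_what)

print(r)
print(test$p.value)
```

With the real data, the same two calls would take the ratio and τίς columns of the data frame used in the plot.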