Latent Dirichlet allocation (LDA) is a particularly popular method for fitting a topic model. It treats each document as a mixture of topics, and each topic as a mixture of words. This allows documents to “overlap” each other in terms of content, rather than being separated into discrete groups, in a way that mirrors typical use of natural language.
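The LDA model used here has 20 topics. The code below plots, for each topic, the 20 terms with the highest per-topic-per-term probability (beta):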
library(tidyverse)
library(tidytext)
# Load the fitted LDA model saved to disk
lda_tweets <- read_rds(here::here("data", "lda_tweets.rds"))
# Per-topic-per-term probabilities (beta is the default matrix)
topic_tweets <- tidy(lda_tweets)
# Keep the 20 highest-beta terms in each topic
tw_top_terms <- topic_tweets %>%
  group_by(topic) %>%
  top_n(20, beta) %>%
  ungroup() %>%
  arrange(topic, -beta)
# One facet per topic, with terms ordered by beta within each facet
tw_top_terms %>%
  mutate(
    topic = paste0("Topic ", topic),
    term = reorder_within(term, beta, topic)
  ) %>%
  ggplot(aes(term, beta, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~topic, scales = "free", ncol = 3) +
  coord_flip() +
  scale_x_reordered()
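The code above loads an already fitted model from disk, so the fitting step itself is not shown. As a minimal sketch of how a model of this kind could be produced, the example below fits a two-topic LDA on a tiny made-up corpus. It assumes the `topicmodels` package (and `tm`, which `cast_dtm()` relies on) is installed; the names `toy_counts`, `toy_dtm`, and `toy_lda` and all of the data are hypothetical, and nothing here is taken from the original tweet pipeline.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Hypothetical toy corpus: one row per (document, term) pair with a count
toy_counts <- tibble::tribble(
  ~document, ~term,     ~n,
  "user_a",  "health",  4,
  "user_a",  "support", 3,
  "user_a",  "care",    2,
  "user_b",  "health",  3,
  "user_b",  "care",    3,
  "user_c",  "game",    4,
  "user_c",  "video",   3,
  "user_d",  "game",    2,
  "user_d",  "video",   4,
  "user_d",  "support", 1
)

# cast_dtm() builds the DocumentTermMatrix that topicmodels::LDA() expects
toy_dtm <- cast_dtm(toy_counts, document, term, n)

# k = 2 is an arbitrary choice for the toy data; the seed makes the fit reproducible
toy_lda <- LDA(toy_dtm, k = 2, control = list(seed = 1234))

# beta: one probability per (topic, term); gamma: one per (document, topic)
toy_beta <- tidy(toy_lda, matrix = "beta")
toy_gamma <- tidy(toy_lda, matrix = "gamma")

# Each topic is a distribution over terms, each document a distribution over topics
beta_totals <- toy_beta %>% group_by(topic) %>% summarise(total = sum(beta))
gamma_totals <- toy_gamma %>% group_by(document) %>% summarise(total = sum(gamma))
```

Both `beta_totals$total` and `gamma_totals$total` should be 1 up to floating-point error, which is the "mixture" property described in the text: every topic spreads its probability over all terms, and every document spreads its probability over all topics.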
Besides estimating each topic as a mixture of words, LDA also models each document as a mixture of topics. These per-document-per-topic probabilities are called gamma. The following chart assigns each user to the topic with the highest gamma and counts the users per topic, split by the gender label (“u”, “f”, or “m”) assigned by the gender extractor. For example, according to the table at the end of this section, Topic 10 is the dominant topic for 193 users classified as “f”.
# Per-document-per-topic probabilities (gamma); this replaces the beta table in topic_tweets
topic_tweets <- tidy(lda_tweets, matrix = "gamma")
# Gender labels produced by the gender extractor
gender_output <- read_tsv(
  here::here("data", "gender_extractor", "neda_liwc_gender_output.tsv"),
  col_names = c("id", "name", "processed_name", "gender")
) %>%
  select(name, gender)
# For each document, keep its highest-gamma topic, then count by gender and topic
f <- topic_tweets %>%
  left_join(gender_output, by = c("document" = "name")) %>%
  group_by(document) %>%
  top_n(1, wt = gamma) %>%
  ungroup() %>%
  count(gender, topic, sort = TRUE)
# Dodged bars: number of users per dominant topic, split by gender label
f %>%
  mutate(topic = fct_reorder(as_factor(topic), n)) %>%
  arrange(desc(n)) %>%
  ggplot(aes(x = topic, y = n, fill = gender)) +
  geom_col(show.legend = TRUE, position = "dodge2") +
  coord_flip()
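To make the "dominant topic" step concrete, here is a small sketch on hand-written data. The tables `gamma_example` and `gender_example` and their values are hypothetical. The sketch uses `slice_max()` (available from dplyr 1.0.0) with `with_ties = FALSE`, which is a swapped-in alternative to the `top_n()` call above: it keeps exactly one topic per document even when two topics share the same gamma, whereas `top_n()` keeps all tied rows.

```r
library(dplyr)

# Hypothetical gamma table: each document's probabilities sum to 1 across topics
gamma_example <- tibble::tribble(
  ~document, ~topic, ~gamma,
  "user_a",  1,      0.7,
  "user_a",  2,      0.3,
  "user_b",  1,      0.2,
  "user_b",  2,      0.8,
  "user_c",  1,      0.1,
  "user_c",  2,      0.9
)

# Hypothetical gender labels keyed by user name
gender_example <- tibble::tribble(
  ~name,    ~gender,
  "user_a", "f",
  "user_b", "f",
  "user_c", "m"
)

# Keep the single highest-gamma topic per document, then attach the gender label
dominant <- gamma_example %>%
  group_by(document) %>%
  slice_max(gamma, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  left_join(gender_example, by = c("document" = "name"))

# Number of users per (gender, dominant topic) pair
dominant_counts <- count(dominant, gender, topic)
```

Each row of `dominant_counts` corresponds to one bar in a chart like the one above: the number of users with a given gender label whose highest-gamma topic is the given topic.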
The following table shows the same counts as the previous chart, alongside the most common words in each topic:
# Collapse each topic's top terms into one space-separated string
tw_top_terms_b <- tw_top_terms %>%
  group_by(topic) %>%
  summarise(text = paste(term, collapse = " ")) %>%
  rename("Most common words in topics" = text)
# One row per topic, one column per gender label, with the topic's top terms attached
f %>%
  pivot_wider(id_cols = topic, names_from = gender, values_from = n) %>%
  left_join(tw_top_terms_b, by = "topic") %>%
  select(topic, `Most common words in topics`, u, f, m) %>%
  mutate(topic = paste0("Topic ", topic)) %>%
  group_by(topic) %>%
  gt::gt()
| Topic | Most common words in topics | u | f | m |
|---|---|---|---|---|
| Topic 10 | people love day time life fucking yall fuck feel shit stop women girl ur literally friends happy gonna person lol | 91 | 193 | 37 |
| Topic 6 | health mental people eating day week time support love disorders #mentalhealth care learn women awareness join disorder check body march | 185 | 136 | 43 |
| Topic 1 | love day time people happy feel life week book amazing fuck words read @thesarahfader word shit twitter march omg gonna | 31 | 65 | 13 |
| Topic 19 | puyo day post time love life people game speaker bluetooth black happy photography video @emtfr portable week free @nhlflyers follow | 48 | 60 | 25 |
| Topic 5 | school people students love daily day time @tazbat99 week support join youth happy kids share check night #saludtues life health | 51 | 29 | 19 |
| Topic 20 | people day news disability disabled love google check time join women students online week happy @thespybrief daily support world march | 45 | 37 | 16 |
| Topic 9 | @financialbuzz news watch de @alfamart cse breaking buy love free @youtube time day di @marcuslemonis @smoclerk1 check online otc reviews | 31 | 15 | 37 |
| Topic 2 | love day people time god @gospelflava morning #icymi @morematters life happy @floss84 week feel #elivetweets hope due @spann night @mollysdailykiss | 19 | 33 | 11 |
| Topic 17 | trump people president women white black time @realdonaldtrump day trumps @aoc house national children news woman emergency love life american | 26 | 33 | 17 |
| Topic 7 | people love day time @samsungmobileus life women happy disabled feel world white black trans @specialolympipa stop week support lot hope | 17 | 31 | 8 |
| Topic 4 | digital detox love people day time happy video @gpbgeorge women @tonyakay week #oscars black world life book birthday @tonyakayfan10 @youtube | 15 | 28 | 8 |
| Topic 8 | @jamesmaslow love day time people follow happy @brimoniq story week ig louis life world wait night amazing @officialwith1d2 morning @teenvogue | 26 | 27 | 10 |
| Topic 14 | love day time people live @gpbgeorge happy watch @fcbarcelona @fitetv @proudxtianmaga hope week world game birthday night life women tonight | 9 | 25 | 17 |
| Topic 3 | fala coutinho momento por bastidores em foco #fitness #training da daily os diego tati martins dos famosos famosidade política schueng | 23 | 9 | 11 |
| Topic 11 | god @davepperlmutter love day time life people happy @donnasiggers1 @braedenlemaster book jesus amen read week @bradwallactor beautiful world friends #wrongplacewrongtime | 17 | 21 | 13 |
| Topic 18 | de la el en los es con por psychology se lo tweet las post para info click audience #retweet #fiverr | 9 | 18 | 11 |
| Topic 16 | @swampmusicinfo music love swamp players @laurarjacobs scared #carista pinned customization #car @davelackie #soundcloud laura ur song jacobs bro da day | 14 | 16 | 11 |
| Topic 15 | de #nyc #newyork @nycdailypics anos da day em eu pahealthdept não love se la há happy #ligadoamusica para top city | 13 | 14 | 7 |
| Topic 13 | de la le di il pour les en je des che une sur pas vous ce cest du qui si | 11 | 9 | 6 |
| Topic 12 | nie na się jak jest ale pst że @blakeshelton march #weather #lax february #alert issued nws tak ja #la ze | 10 | 6 | 1 |