Text Mining on Election News

Text mining on news data \(\textit{was}\) a growth industry, right up until the emergence of generative AI tools. Yet the rich semantic and technical foundations behind text mining still earn it a unique place in the development of natural language processing (NLP). Among the many kinds of textual data analyzed during text mining’s heyday, election news has consistently been one of the most popular topics.

For today’s lecture, we will demonstrate the use of the structural topic model (STM), in combination with other off-the-shelf text processing techniques, to analyze a collection of 245 election news articles (scraped from 6 major Taiwanese news media using a Python scraper I wrote myself) published in December 2023, one month prior to Taiwan’s 2024 presidential election.

But before we turn to that fancy NLP stuff, we will work through a quick example of Chinese text processing, just to give you a sense of how parsing works when characters run together without spaces, unlike text written in the Latin alphabet. We compare \(\textsf{jiebaR}\) and the super-handy \(\textsf{quanteda}\) for Chinese text processing.

# install.packages(c("jiebaRD", "jieba", "stm", "tmcn", "tm", "dplyr", "tidyverse", "tidytext", "stringr"))

library(jiebaRD) # For parsing Chinese text
library(jiebaR)
library(stm) 
library(tmcn)
library(tm)
library(dplyr)
library(tidyverse)
library(tidytext)
library(stringr)


# Load Taiwan 2024 Presidential election-related news data (scraped from 6 major news websites)
df <- read_csv(url("https://www.dropbox.com/scl/fi/6sn10ldskosxigj80q8mo/taiwan_election.csv?rlkey=q0gj98hasg1wbx4dy2xwkv26g&st=wordnk1f&dl=1"), locale = locale(encoding = "UTF-8"))
## Rows: 245 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): source, text, url
## dbl  (1): docid
## date (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dim(df)
## [1] 245   5
# 245 news, 5 columns

# A quick example. Examine the first document (first piece of news)
text <- df[1, c("text")]

# Remove brackets and digits using gsub() with a regular expression
cleaned_text1 <- gsub("[()\\[\\]<>]|\\d+", "", text)
cleaned_text2 <- gsub("[\uFF00-\uFFFF   ]", "", cleaned_text1) # Remove full-width symbols commonly used in Chinese text

# Remove empty strings
library(stringi)
cleaned_text3 <- stri_remove_empty(cleaned_text2)
# Collapse into a single character string
cleaned_text4 <- paste(cleaned_text3, collapse = "")

## We first go with jiebaR
# Initialize a jieba parser (seg) using worker()
seg <- jiebaR::worker()
class(seg)
## [1] "jiebar"  "segment" "jieba"
# Include user-defined tokens so the segmenter keeps relevant terms intact
new_user_word(seg, c("民眾黨","被提名人", "黃瀞瑩", "柯文哲"))
## [1] TRUE
tokens_jb <- jiebaR::segment(cleaned_text4, jiebar = seg)  # Tokenize the text into words using segment() with the jiebar object "seg" created above

# Remove Chinese stopwords
library(stopwords)
stopwords_getlanguages("marimo")  # Locate the stopwords source for zh_tw
## [1] "en"    "de"    "ru"    "ar"    "he"    "zh_tw" "zh_cn" "ko"    "ja"
head(stopwords::stopwords("zh_tw", source = "marimo"), 20) # Out of a total of 394 stopwords
##  [1] "我"       "把我"     "對我"     "我自己"   "我們"     "把我們"  
##  [7] "對我們"   "我們自己" "你"       "把你"     "對你"     "你自己"  
## [13] "你們"     "把你們"   "對你們"   "你們自己" "您"       "把您"    
## [19] "對您"     "您自己"
tw_stopwords <- stopwords::stopwords("zh_tw", source = "marimo")
jb_filtered <- filter_segment(tokens_jb, tw_stopwords)

# Wrap the filtered tokens in a list object
segmented_docs <- list(jb_filtered) 
corpus_jb <- Corpus(VectorSource(segmented_docs)) # Turn the tokens into a tm corpus
# Create the DTM 
dtm_jb <- DocumentTermMatrix(corpus_jb)
# Inspect the resulting DTM
inspect(dtm_jb) # For the first (and in fact only) document, list the term-frequency distribution
## <<DocumentTermMatrix (documents: 1, terms: 90)>>
## Non-/sparse entries: 90/0
## Sparsity           : 0%
## Maximal term length: 7
## Weighting          : term frequency (tf)
## Sample             :
##     Terms
## Docs "一周", "上周", "中選會", "各", "相關", "要", "候選人", "強調", "選舉",
##    1       1       1         1     2       2     2         4       2       3
##     Terms
## Docs "總統",
##    1       3
# In the inspected output, the terms appear as column names and the numbers below them are their frequencies in the document. (The stray quotation marks and commas attached to the terms are an artifact of deparsing the token list through VectorSource().)

## Now switch to quanteda 
# use quanteda's built-in tokenizer and stopwords functions instead
library(quanteda)
# Chinese stopwords
ch_stop <- quanteda::stopwords("zh", source = "misc")
# build a corpus from the first document
corp <- corpus(text)
# tokenize
ch_toks <- corp %>% 
  tokens(remove_punct = TRUE) %>%
  tokens_remove(pattern = ch_stop)

# construct a DFM (document-feature matrix)
dfm_qt <- dfm(ch_toks)
topfeatures(dfm_qt)
## 候選人   總統   選舉   強調   相關     選   圖片   故事     離   明年 
##      4      3      3      2      2      2      1      1      1      1
# Compare the jiebaR- and quanteda-segmented document-term matrices

# Get term frequencies (total count across all docs)
term_freq_jb <- colSums(as.matrix(dtm_jb))
term_freq_qt <- colSums(as.matrix(dfm_qt))

# Sort the two methods' term frequency distributions in descending order and list terms side by side for ease of comparison
jb_freq_sorted <- sort(term_freq_jb, decreasing = TRUE)
qt_freq_sorted <- sort(term_freq_qt, decreasing = TRUE)

data.frame(Rank = 1:10, jiebaR = head(jb_freq_sorted, 10), quanteda = head(qt_freq_sorted, 10))
##           Rank jiebaR quanteda
## "候選人",    1      4        4
## "總統",      2      3        3
## "選舉",      3      3        3
## "各",        4      2        2
## "強調",      5      2        2
## "相關",      6      2        2
## "要",        7      2        1
## "一周",      8      1        1
## "上周",      9      1        1
## "中選會",   10      1        1
# Term frequency distributions sorted by these two methods largely concur with each other, at least for the first document

Alright, enough warm-up. Now we play for real: let’s take on all 245 news articles and see what gems we can mine from this data set.

STM Estimates of Topic Prevalence in Taiwan’s 2024 Presidential Election Cycle: December 2023

Source: CNA

options(encoding = "UTF-8")
Sys.setlocale("LC_ALL", "zh_TW.UTF-8") # macOS/Linux Chinese encoding
## [1] "zh_TW.UTF-8/zh_TW.UTF-8/zh_TW.UTF-8/C/zh_TW.UTF-8/zh_TW.UTF-8"
# Sys.setlocale("LC_ALL", "Chinese") # For Windows users
# install.packages(c("dplyr", "quanteda", "stm", "lubridate", "ggplot2", "readr", "geomtextpath","stringr", "tidyr"))
library(dplyr)
library(quanteda)
library(stm)
library(lubridate)
library(ggplot2)
library(readr)
library(geomtextpath)
library(stringr)
library(tidyr)
library(stopwords)

# What's inside this data set

df <- df %>%
  mutate(date = ymd(date), # Convert "date" to ymd (year-month-day) format
         week = floor_date(date, "week")) %>%  # Add another column "week" into the data set
  filter(!is.na(text), text != "") # Ensure no empty text
cat(nrow(df), "news articles\n")
## 245 news articles
cat(ncol(df), "variables\n")
## 6 variables
# Inspect the data
dim(df)
## [1] 245   6
names(df)
## [1] "docid"  "source" "date"   "text"   "url"    "week"
str(df)
## tibble [245 × 6] (S3: tbl_df/tbl/data.frame)
##  $ docid : num [1:245] 1 2 3 4 5 6 7 8 9 10 ...
##  $ source: chr [1:245] "udn.com" "chinatimes.com" "cna.com.tw" "chinatimes.com" ...
##  $ date  : Date[1:245], format: "2023-12-18" "2023-12-22" ...
##  $ text  : chr [1:245] "圖片故事:離明年的總統、立委選舉只剩廿多天,上周起,國民黨的候選人強調要先攻南台灣, 民進黨總統候選人也嚴陣以待"| __truncated__ "2024總統大選最後倒數,今日(12月22日)有3家民調機構發布總統大選民調,分別為TVBS、美麗島電子報、ETtoday民調雲。 "| __truncated__ "2023/12/20 — 民眾黨柯文哲、民進黨賴清德、國民黨侯友宜參與首場總統候選人電視政見發表會,三輪政見發表正面交鋒; "| __truncated__ "中選會11日舉行總統候選人號次抽籤,柯文哲抽到1號,蕭美琴抽到2號,李利貞抽到3號。 文章報導抽籤過程、候選人受訪內"| __truncated__ ...
##  $ url   : chr [1:245] "https://udncollege.udn.com/24063/" "https://www.chinatimes.com/realtimenews/20231222004104-260407" "https://www.cna.com.tw/news/aipl/202312200023.aspx" "https://www.chinatimes.com/realtimenews/20231211000940-260407" ...
##  $ week  : Date[1:245], format: "2023-12-17" "2023-12-17" ...
# Display the first 6 documents (news)
head(df)
## # A tibble: 6 × 6
##   docid source         date       text                          url   week      
##   <dbl> <chr>          <date>     <chr>                         <chr> <date>    
## 1     1 udn.com        2023-12-18 圖片故事:離明年的總統、立委選舉只剩廿多天,上周起,國民… http… 2023-12-17
## 2     2 chinatimes.com 2023-12-22 2024總統大選最後倒數,今日(12月22日)有3家民調… http… 2023-12-17
## 3     3 cna.com.tw     2023-12-20 2023/12/20 — 民眾黨柯文哲、民進黨賴清德、國… http… 2023-12-17
## 4     4 chinatimes.com 2023-12-11 中選會11日舉行總統候選人號次抽籤,柯文哲抽到1號,蕭美… http… 2023-12-10
## 5     5 tvbs.com.tw    2023-12-12 美麗島電子報今日公布最新總統大選民調,民進黨賴蕭配35.… http… 2023-12-10
## 6     6 tvbs.com.tw    2023-12-29 總統辯論12/30登場,媒體與學者分析指出,辯論若要改變… http… 2023-12-24

This is a typical data set for text mining. The first column, “docid”, is the document ID. The second (“source”), the third (“date”), the fifth (“url”), and the sixth (“week”, which we just generated) are document-level \(\textbf{metadata}\): an umbrella term for data that describe and give information about other data, such as the “text” data stored in the fourth column that we need to process. One nice thing about \(\textsf{stm}\) is that it can leverage the metadata of text documents to draw additional inferences from the text, beyond simply classifying documents into topics or producing word clouds (if that’s what you signed up for). For example, we can use “source” to model the house effects of partisan media, and use the “date” and “week” information to trace changes in topic proportions over time.
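
As a preview (a sketch only, with placeholder object names; the real \(\textsf{docs}\), \(\textsf{vocab}\), and \(\textsf{meta_stm}\) objects are built further below), document-level covariates enter \(\textsf{stm}\) through its prevalence (and, optionally, content) formula:

# Illustration only (not run): let topic prevalence vary by outlet and smoothly over time
# stm(documents = docs, vocab = vocab, K = 15,
#     prevalence = ~ source + s(as.numeric(week)),
#     data = meta_stm)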

We now use the more efficient tricks offered by \(\textsf{quanteda}\) to expedite the processing of Chinese text, though this comes with its own hazards, as we will see shortly.

# Customize positive and negative terms so we can tag them in the text (and use them to compute a net sentiment rating for each piece of news)
pos <- c("領先","穩定","團結","勝選","支持","信任","清廉","和平","繁榮","守護",
         "民主","正確","感動","信心","贏","勝","優勢","回流","肯定","沉穩",
         "熱烈","歡呼","高票","過半","當選","成功","掌聲","感動","信心","優秀")
neg <- c("落後","貪腐","違建","爭議","賴皮寮","賣台","親中","戰爭","危機","失言",
         "下滑","崩盤","棄保","流失","負面","批評","痛批","質疑","攻擊","失敗",
         "違法","黑金","貪污","造假","抹黑","恐懼","擔憂","失誤","防守","插話",
         "貪汙","造假","抹黑","恐嚇","恐懼","恐慌")

# Build dictionary
sent_dict <- dictionary(list(positive = pos, negative = neg))

# Text preprocessing: use quanteda to tokenize data and remove stopwords
corp <- corpus(df, text_field = "text") # Convert the text column into corpus
toks <- tokens(corp, remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  tokens_remove(c(stopwords::stopwords("zh_tw", source = "marimo"),"記者","表示","報導","指出","稱","今天")) %>%
  tokens_compound(pattern = phrase(c("賴皮寮","萬里老家","黃金交叉","棄保效應",
                                     "政見發表會","電視辯論","選前之夜","下架民進黨",
                                     "抗中保台","潛艦國造","長照3.0","0-6歲國家養",
                                     "侯友宜","賴清德","柯文哲","蕭美琴","趙少康","吳欣盈")),
                  concatenator = "_")  # Note: tokens_compound() has not behaved as expected since February 2025, so this step may not work as intended (it used to work smoothly)

# Convert tokenized input into document-term-frequency-matrix (dfm)
dfm_clean <- dfm(toks) %>% dfm_trim(min_termfreq = 4, min_docfreq = 3)
cat("Vocab:", ncol(dfm_clean), "terms\n")
## Vocab: 327 terms
# Compute net sentiment score for each document
sent_dfm <- dfm_lookup(dfm_clean, sent_dict) # Generate two columns: positive / negative
sent_mat <- as.matrix(sent_dfm) # Convert to matrix
positive <- sent_mat[, "positive"] # Retrieve positive terms
negative <- sent_mat[, "negative"] # Retrieve negative terms
positive[is.na(positive)] <- 0 # Fill NA with 0
negative[is.na(negative)] <- 0
df$positive <- positive
df$negative <- negative
df$net_sentiment <- positive - negative
df$net_sentiment <- ifelse(df$net_sentiment > 0, 1, 0) # For simplicity's sake, I use a dichotomized measure; you could keep a continuous one instead
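# A sketch of the continuous alternative (assumption on my part, not used below): a
# length-normalized net score, where ntoken() counts the tokens per document in the trimmed dfm
# df$net_sentiment_cont <- (positive - negative) / quanteda::ntoken(dfm_clean)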

# Examine the first few documents and their sentiment scores
print(head(df[, c("docid","source","positive","negative","net_sentiment")]))
## # A tibble: 6 × 5
##   docid source         positive negative net_sentiment
##   <dbl> <chr>             <dbl>    <dbl>         <dbl>
## 1     1 udn.com               0        0             0
## 2     2 chinatimes.com        1        0             1
## 3     3 cna.com.tw            0        0             0
## 4     4 chinatimes.com        0        0             0
## 5     5 tvbs.com.tw           1        0             1
## 6     6 tvbs.com.tw           0        0             0
# Inspect the sentiment distribution; it seems reasonably balanced
table(df$net_sentiment)
## 
##   0   1 
## 145 100

Now we need to convert the dfm object into \(\textsf{stm}\)-readable objects. A total of three are needed: \(\textsf{documents}\), \(\textsf{vocab}\), and \(\textsf{meta}\) (document-level metadata).

# Convert the dfm to stm format and carry over the document-level metadata
out <- convert(dfm_clean, to = "stm",
               docvars = df[,c("source","date","week","net_sentiment")])
docs <- out$documents
vocab <- out$vocab
meta_stm <- out$meta

# Recode media source
meta_stm$source <- dplyr::recode(meta_stm$source,
                                 "chinatimes.com" = "中時", "cna.com.tw" = "中央社",
                                 "ltn.com.tw"     = "自由", "setn.com"   = "三立",
                                 "tvbs.com.tw"    = "TVBS", "udn.com"    = "聯合")

# Number the days, with the earliest date (December 1) as day 1
meta_stm$date_num <- as.numeric(meta_stm$date - min(meta_stm$date)) + 1

Now the thorny question: exactly how many topics (i.e., clusters of documents with similar semantic structure) should we use to classify this collection of texts? We can’t just make up a number (right?), so we should resort to a somewhat more scientific criterion. One classic choice is perplexity, which measures how well a model with a given number of topics predicts unseen documents; a lower perplexity indicates a better fit, meaning the model is less “surprised” by the new documents it encounters. Graphically, the relationship between held-out log-likelihood and the number of latent topics typically looks like an “elbow”: the optimal number of topics sits where the rate of improvement sharply declines, indicating that adding more topics yields diminishing returns. In practice we can use the \(\textsf{stm}\) package’s handy \(\textsf{searchK()}\) function, or write our own for loop, as we do below, comparing candidate values of \(\textit{K}\) on semantic coherence and exclusivity.
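
For reference, perplexity is just the exponentiated negative average held-out log-likelihood per token, \( \exp\big(-\sum_d \log p(w_d) / \sum_d N_d\big) \), where \( N_d \) is the length of held-out document \( d \). The sketch below shows how \(\textsf{searchK()}\) could be run on the objects we just created; it is left unevaluated here (it can take a while), and we use the quicker coherence/exclusivity loop instead.

# A sketch (not run): searchK() reports held-out likelihood, residuals,
# semantic coherence, and exclusivity for each candidate K
# kresult <- searchK(docs, vocab, K = c(10, 15, 20, 25),
#                    prevalence = ~ source + s(date_num) + net_sentiment,
#                    data = meta_stm, init.type = "Spectral")
# plot(kresult)  # look for the "elbow" across the diagnostics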

K_candidates <- c(15, 20, 25) # Specify the range of K
results <- data.frame()
for(K in K_candidates){
  set.seed(2025L)
  mod_temp <- stm(docs, vocab, K = K,
                  prevalence = ~ source + s(as.numeric(date)) + net_sentiment,
                  data = meta_stm,
                  max.em.its = 200,
                  init.type = "Spectral",
                  seed = 2025L,
                  verbose = FALSE)
  
  semcoh <- mean(semanticCoherence(mod_temp, docs))  # Semantic coherence 
  exclus <- mean(exclusivity(mod_temp))   # Exclusivity
  
  results <- rbind(results, data.frame(K = K, semcoh = semcoh, exclus = exclus))
  cat("K =", K, "semcoh =", round(semcoh, 4),
      ",exclus =", round(exclus, 4), "\n")
}
## K = 15 semcoh = -74.716 ,exclus = 9.5239 
## K = 20 semcoh = -75.217 ,exclus = 9.5732 
## K = 25 semcoh = -77.2986 ,exclus = 9.6261
print(results)
##    K    semcoh   exclus
## 1 15 -74.71602 9.523924
## 2 20 -75.21704 9.573190
## 3 25 -77.29860 9.626123
K_best <- results$K[which.max(results$semcoh)]  # Select the K with the highest average semantic coherence (exclusivity serves as a sanity check)
cat("Optimal K =", K_best, "\n")
## Optimal K = 15

With this optimal \(\textit{K}\) (number of topics), we set out to estimate a 15-topic structural topic model.

set.seed(2025L) # Set random seed
stm_model <- stm(documents = docs, vocab = vocab, K = K_best,
                 prevalence = ~ source + s(date_num) + net_sentiment,
                 data = meta_stm, 
                 max.em.its = 300, 
                 seed = 2025, verbose = TRUE)
## Beginning Spectral Initialization 
##   Calculating the gram matrix...
##   Finding anchor words...
##      ...............
##   Recovering initialization...
##      ...
## Initialization complete.
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 1 (approx. per word bound = -4.672) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 2 (approx. per word bound = -4.262, relative change = 8.784e-02) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 3 (approx. per word bound = -4.191, relative change = 1.647e-02) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 4 (approx. per word bound = -4.163, relative change = 6.890e-03) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 5 (approx. per word bound = -4.144, relative change = 4.406e-03) 
## Topic 1: 選舉, 人, 第, 萬, 中選會 
##  Topic 2: 柯, 文, 哲, 公布, 新聞 
##  Topic 3: 配, 侯, 民調, 領先, 康 
##  Topic 4: 選, 總統, 情, 選舉, 學者 
##  Topic 5: 德, 清, 賴, 青年, 侯 
##  Topic 6: 康, 侯, 趙, 當選, 配 
##  Topic 7: 萬人, 造勢, 場, 湧進, 友 
##  Topic 8: 自由, 時報, 賴, 清, 德 
##  Topic 9: 民進黨, 架, 選戰, 賴, 票 
##  Topic 10: 友, 宜, 侯, 清, 德 
##  Topic 11: 號, 候選人, 次, 抽籤, 抽 
##  Topic 12: 場, 候選人, 政見發表會, 中央社, 總統 
##  Topic 13: 民調, 顯示, 度, 賴, 支持 
##  Topic 14: 哲, 文, 柯, 立, 新聞 
##  Topic 15: 長, 照, 賴, 自由, 時報 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 6 (approx. per word bound = -4.131, relative change = 3.129e-03) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 7 (approx. per word bound = -4.123, relative change = 2.082e-03) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 8 (approx. per word bound = -4.112, relative change = 2.629e-03) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 9 (approx. per word bound = -4.104, relative change = 1.837e-03) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 10 (approx. per word bound = -4.100, relative change = 9.349e-04) 
## Topic 1: 選舉, 人, 第, 萬, 中選會 
##  Topic 2: 柯, 文, 哲, 公布, 新聞 
##  Topic 3: 配, 侯, 民調, 領先, 康 
##  Topic 4: 選, 總統, 投票, 情, 大 
##  Topic 5: 柯, 哲, 文, 清, 德 
##  Topic 6: 康, 侯, 趙, 當選, 反 
##  Topic 7: 萬人, 造勢, 場, 湧進, 美 
##  Topic 8: 自由, 時報, 賴, 清, 德 
##  Topic 9: 民進黨, 侯, 康, 選戰, 架 
##  Topic 10: 友, 宜, 侯, 清, 德 
##  Topic 11: 號, 候選人, 次, 抽籤, 抽 
##  Topic 12: 場, 政見發表會, 候選人, 中央社, 總統 
##  Topic 13: 民調, 顯示, 支持, 度, 賴 
##  Topic 14: 立, 哲, 文, 柯, 新聞 
##  Topic 15: 長, 照, 賴, 自由, 時報 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 11 (approx. per word bound = -4.097, relative change = 8.301e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 12 (approx. per word bound = -4.095, relative change = 5.930e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 13 (approx. per word bound = -4.093, relative change = 5.129e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 14 (approx. per word bound = -4.091, relative change = 3.232e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 15 (approx. per word bound = -4.090, relative change = 2.710e-04) 
## Topic 1: 選舉, 人, 第, 萬, 中選會 
##  Topic 2: 柯, 文, 哲, 公布, 新聞 
##  Topic 3: 配, 侯, 民調, 領先, 康 
##  Topic 4: 選, 總統, 投票, 情, 大 
##  Topic 5: 柯, 哲, 文, 清, 德 
##  Topic 6: 康, 侯, 當選, 趙, 反 
##  Topic 7: 萬人, 造勢, 場, 湧進, 美 
##  Topic 8: 賴, 自由, 時報, 清, 德 
##  Topic 9: 民進黨, 康, 侯, 選戰, 架 
##  Topic 10: 友, 宜, 侯, 清, 德 
##  Topic 11: 號, 候選人, 次, 抽籤, 中選會 
##  Topic 12: 場, 政見發表會, 候選人, 中央社, 總統 
##  Topic 13: 民調, 顯示, 支持, 度, 賴 
##  Topic 14: 立, 哲, 文, 柯, 新聞 
##  Topic 15: 長, 照, 賴, 版, 頭 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 16 (approx. per word bound = -4.089, relative change = 2.452e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 17 (approx. per word bound = -4.088, relative change = 2.590e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 18 (approx. per word bound = -4.087, relative change = 2.337e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 19 (approx. per word bound = -4.086, relative change = 1.756e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 20 (approx. per word bound = -4.086, relative change = 1.774e-04) 
## Topic 1: 選舉, 人, 第, 萬, 中選會 
##  Topic 2: 柯, 文, 哲, 公布, 新聞 
##  Topic 3: 配, 侯, 民調, 領先, 康 
##  Topic 4: 選, 總統, 投票, 大, 情 
##  Topic 5: 柯, 哲, 文, 賴, 清 
##  Topic 6: 康, 侯, 當選, 趙, 反 
##  Topic 7: 萬人, 造勢, 場, 湧進, 美 
##  Topic 8: 賴, 清, 德, 自由, 時報 
##  Topic 9: 民進黨, 康, 侯, 選戰, 架 
##  Topic 10: 友, 宜, 侯, 清, 德 
##  Topic 11: 號, 候選人, 次, 抽籤, 中選會 
##  Topic 12: 場, 政見發表會, 候選人, 中央社, 總統 
##  Topic 13: 民調, 支持, 顯示, 度, 最新 
##  Topic 14: 立, 哲, 文, 柯, 新聞 
##  Topic 15: 長, 照, 賴, 版, 頭 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 21 (approx. per word bound = -4.085, relative change = 1.741e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 22 (approx. per word bound = -4.084, relative change = 1.990e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 23 (approx. per word bound = -4.083, relative change = 2.169e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 24 (approx. per word bound = -4.082, relative change = 1.925e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 25 (approx. per word bound = -4.082, relative change = 2.026e-04) 
## Topic 1: 選舉, 人, 第, 萬, 中選會 
##  Topic 2: 柯, 文, 哲, 公布, 新聞 
##  Topic 3: 配, 侯, 民調, 領先, 康 
##  Topic 4: 選, 總統, 投票, 大, 情 
##  Topic 5: 柯, 哲, 文, 賴, 清 
##  Topic 6: 康, 當選, 侯, 趙, 反 
##  Topic 7: 萬人, 造勢, 場, 湧進, 美 
##  Topic 8: 賴, 清, 德, 自由, 時報 
##  Topic 9: 民進黨, 康, 侯, 選戰, 賴 
##  Topic 10: 友, 宜, 侯, 清, 德 
##  Topic 11: 號, 候選人, 次, 抽籤, 中選會 
##  Topic 12: 場, 政見發表會, 候選人, 中央社, 總統 
##  Topic 13: 民調, 支持, 度, 顯示, 最新 
##  Topic 14: 立, 哲, 文, 柯, 新聞 
##  Topic 15: 長, 照, 版, 頭, 賴 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 26 (approx. per word bound = -4.081, relative change = 1.032e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 27 (approx. per word bound = -4.081, relative change = 1.400e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 28 (approx. per word bound = -4.080, relative change = 1.479e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 29 (approx. per word bound = -4.079, relative change = 1.397e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 30 (approx. per word bound = -4.079, relative change = 1.278e-04) 
## Topic 1: 選舉, 人, 第, 萬, 中選會 
##  Topic 2: 柯, 文, 哲, 公布, 新聞 
##  Topic 3: 配, 侯, 民調, 領先, 康 
##  Topic 4: 選, 總統, 投票, 大, 情 
##  Topic 5: 柯, 哲, 文, 賴, 清 
##  Topic 6: 康, 當選, 侯, 趙, 反 
##  Topic 7: 萬人, 造勢, 場, 湧進, 美 
##  Topic 8: 賴, 清, 德, 自由, 時報 
##  Topic 9: 民進黨, 康, 侯, 選戰, 賴 
##  Topic 10: 友, 宜, 侯, 清, 德 
##  Topic 11: 號, 候選人, 次, 抽籤, 中選會 
##  Topic 12: 場, 政見發表會, 候選人, 中央社, 總統 
##  Topic 13: 民調, 支持, 度, 顯示, 賴 
##  Topic 14: 立, 哲, 文, 柯, 新聞 
##  Topic 15: 長, 照, 版, 頭, 賴 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 31 (approx. per word bound = -4.078, relative change = 1.174e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 32 (approx. per word bound = -4.078, relative change = 1.086e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 33 (approx. per word bound = -4.078, relative change = 9.935e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 34 (approx. per word bound = -4.077, relative change = 8.644e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 35 (approx. per word bound = -4.077, relative change = 7.958e-05) 
## Topic 1: 選舉, 人, 第, 萬, 中選會 
##  Topic 2: 柯, 文, 哲, 公布, 新聞 
##  Topic 3: 配, 侯, 民調, 領先, 康 
##  Topic 4: 選, 總統, 投票, 大, 情 
##  Topic 5: 柯, 哲, 文, 賴, 清 
##  Topic 6: 康, 當選, 侯, 趙, 反 
##  Topic 7: 萬人, 造勢, 場, 湧進, 美 
##  Topic 8: 賴, 清, 德, 自由, 時報 
##  Topic 9: 民進黨, 侯, 康, 賴, 選戰 
##  Topic 10: 友, 宜, 侯, 清, 德 
##  Topic 11: 號, 候選人, 次, 抽籤, 中選會 
##  Topic 12: 場, 政見發表會, 候選人, 中央社, 總統 
##  Topic 13: 民調, 支持, 度, 顯示, 賴 
##  Topic 14: 立, 哲, 文, 柯, 新聞 
##  Topic 15: 長, 照, 版, 頭, 賴 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 36 (approx. per word bound = -4.077, relative change = 7.946e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 37 (approx. per word bound = -4.076, relative change = 8.143e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 38 (approx. per word bound = -4.076, relative change = 8.308e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 39 (approx. per word bound = -4.076, relative change = 8.769e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 40 (approx. per word bound = -4.075, relative change = 1.038e-04) 
## Topic 1: 選舉, 人, 第, 萬, 中選會 
##  Topic 2: 柯, 文, 哲, 公布, 新聞 
##  Topic 3: 配, 侯, 民調, 領先, 康 
##  Topic 4: 選, 總統, 投票, 大, 情 
##  Topic 5: 柯, 哲, 文, 賴, 清 
##  Topic 6: 康, 當選, 侯, 趙, 反 
##  Topic 7: 萬人, 造勢, 場, 湧進, 美 
##  Topic 8: 賴, 清, 德, 自由, 時報 
##  Topic 9: 民進黨, 侯, 康, 賴, 架 
##  Topic 10: 友, 宜, 侯, 清, 德 
##  Topic 11: 號, 候選人, 次, 抽籤, 中選會 
##  Topic 12: 場, 政見發表會, 候選人, 中央社, 總統 
##  Topic 13: 民調, 支持, 度, 顯示, 賴 
##  Topic 14: 立, 哲, 文, 柯, 新聞 
##  Topic 15: 長, 照, 版, 頭, 賴 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 41 (approx. per word bound = -4.074, relative change = 1.667e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 42 (approx. per word bound = -4.074, relative change = 1.501e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 43 (approx. per word bound = -4.073, relative change = 1.166e-04) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 44 (approx. per word bound = -4.073, relative change = 9.314e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 45 (approx. per word bound = -4.073, relative change = 8.448e-05) 
## Topic 1: 選舉, 人, 第, 萬, 中選會 
##  Topic 2: 柯, 文, 哲, 公布, 新聞 
##  Topic 3: 配, 侯, 民調, 領先, 康 
##  Topic 4: 選, 總統, 大, 情, 投票 
##  Topic 5: 柯, 哲, 文, 賴, 清 
##  Topic 6: 康, 當選, 侯, 趙, 反 
##  Topic 7: 萬人, 造勢, 場, 湧進, 美 
##  Topic 8: 賴, 清, 德, 自由, 時報 
##  Topic 9: 民進黨, 侯, 康, 賴, 架 
##  Topic 10: 友, 宜, 侯, 清, 德 
##  Topic 11: 號, 候選人, 次, 抽籤, 中選會 
##  Topic 12: 場, 政見發表會, 候選人, 中央社, 總統 
##  Topic 13: 民調, 支持, 度, 顯示, 賴 
##  Topic 14: 立, 哲, 文, 柯, 新聞 
##  Topic 15: 長, 照, 版, 頭, 賴 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 46 (approx. per word bound = -4.072, relative change = 7.736e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 47 (approx. per word bound = -4.072, relative change = 7.474e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 48 (approx. per word bound = -4.072, relative change = 7.150e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 49 (approx. per word bound = -4.071, relative change = 7.048e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 50 (approx. per word bound = -4.071, relative change = 6.974e-05) 
## Topic 1: 選舉, 人, 第, 萬, 中選會 
##  Topic 2: 柯, 文, 哲, 公布, 新聞 
##  Topic 3: 配, 侯, 民調, 領先, 康 
##  Topic 4: 選, 總統, 大, 情, 投票 
##  Topic 5: 柯, 哲, 文, 賴, 清 
##  Topic 6: 康, 當選, 侯, 趙, 反 
##  Topic 7: 造勢, 萬人, 場, 湧進, 美 
##  Topic 8: 賴, 清, 德, 自由, 時報 
##  Topic 9: 民進黨, 侯, 康, 賴, 架 
##  Topic 10: 友, 宜, 侯, 清, 德 
##  Topic 11: 號, 候選人, 次, 抽籤, 中選會 
##  Topic 12: 場, 政見發表會, 候選人, 中央社, 總統 
##  Topic 13: 民調, 支持, 度, 顯示, 賴 
##  Topic 14: 立, 哲, 文, 柯, 新聞 
##  Topic 15: 長, 照, 版, 頭, 賴 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 51 (approx. per word bound = -4.071, relative change = 7.655e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 52 (approx. per word bound = -4.071, relative change = 8.002e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 53 (approx. per word bound = -4.070, relative change = 6.995e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 54 (approx. per word bound = -4.070, relative change = 6.068e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 55 (approx. per word bound = -4.070, relative change = 6.566e-05) 
## Topic 1: 選舉, 人, 第, 萬, 中選會 
##  Topic 2: 柯, 文, 哲, 公布, 新聞 
##  Topic 3: 配, 侯, 民調, 領先, 康 
##  Topic 4: 選, 總統, 大, 情, 投票 
##  Topic 5: 柯, 哲, 文, 賴, 清 
##  Topic 6: 康, 當選, 侯, 趙, 反 
##  Topic 7: 造勢, 萬人, 場, 湧進, 美 
##  Topic 8: 賴, 清, 德, 自由, 時報 
##  Topic 9: 民進黨, 侯, 康, 賴, 架 
##  Topic 10: 友, 宜, 侯, 清, 德 
##  Topic 11: 號, 候選人, 次, 抽籤, 中選會 
##  Topic 12: 場, 政見發表會, 候選人, 中央社, 總統 
##  Topic 13: 民調, 支持, 度, 顯示, 賴 
##  Topic 14: 立, 哲, 文, 柯, 新聞 
##  Topic 15: 長, 照, 版, 頭, 賴 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 56 (approx. per word bound = -4.069, relative change = 6.612e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 57 (approx. per word bound = -4.069, relative change = 4.051e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 58 (approx. per word bound = -4.069, relative change = 4.284e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 59 (approx. per word bound = -4.069, relative change = 4.548e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 60 (approx. per word bound = -4.069, relative change = 4.656e-05) 
## Topic 1: 選舉, 人, 第, 萬, 中選會 
##  Topic 2: 柯, 文, 哲, 公布, 新聞 
##  Topic 3: 配, 侯, 民調, 領先, 康 
##  Topic 4: 選, 總統, 大, 情, 投票 
##  Topic 5: 柯, 哲, 文, 賴, 清 
##  Topic 6: 康, 當選, 侯, 趙, 反 
##  Topic 7: 造勢, 萬人, 場, 湧進, 美 
##  Topic 8: 賴, 清, 德, 自由, 時報 
##  Topic 9: 民進黨, 侯, 康, 賴, 架 
##  Topic 10: 友, 宜, 侯, 清, 德 
##  Topic 11: 號, 候選人, 次, 抽籤, 中選會 
##  Topic 12: 場, 政見發表會, 候選人, 中央社, 總統 
##  Topic 13: 民調, 支持, 度, 顯示, 賴 
##  Topic 14: 立, 哲, 文, 柯, 新聞 
##  Topic 15: 長, 照, 版, 頭, 賴 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 61 (approx. per word bound = -4.069, relative change = 4.442e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 62 (approx. per word bound = -4.068, relative change = 3.794e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 63 (approx. per word bound = -4.068, relative change = 3.576e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 64 (approx. per word bound = -4.068, relative change = 3.422e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 65 (approx. per word bound = -4.068, relative change = 3.087e-05) 
## Topic 1: 選舉, 人, 第, 萬, 中選會 
##  Topic 2: 柯, 文, 哲, 公布, 新聞 
##  Topic 3: 配, 侯, 民調, 領先, 康 
##  Topic 4: 選, 總統, 大, 情, 投票 
##  Topic 5: 柯, 哲, 文, 賴, 清 
##  Topic 6: 康, 當選, 侯, 趙, 反 
##  Topic 7: 造勢, 萬人, 場, 湧進, 美 
##  Topic 8: 賴, 清, 德, 自由, 時報 
##  Topic 9: 民進黨, 侯, 康, 賴, 架 
##  Topic 10: 友, 宜, 侯, 清, 德 
##  Topic 11: 號, 候選人, 次, 抽籤, 中選會 
##  Topic 12: 場, 政見發表會, 候選人, 中央社, 總統 
##  Topic 13: 民調, 支持, 度, 顯示, 賴 
##  Topic 14: 立, 哲, 文, 柯, 新聞 
##  Topic 15: 長, 照, 版, 頭, 賴 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 66 (approx. per word bound = -4.068, relative change = 2.646e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 67 (approx. per word bound = -4.068, relative change = 2.386e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 68 (approx. per word bound = -4.068, relative change = 2.265e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 69 (approx. per word bound = -4.068, relative change = 1.882e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 70 (approx. per word bound = -4.068, relative change = 1.757e-05) 
## Topic 1: 選舉, 人, 第, 萬, 中選會 
##  Topic 2: 柯, 文, 哲, 公布, 新聞 
##  Topic 3: 配, 侯, 民調, 領先, 康 
##  Topic 4: 選, 總統, 大, 情, 投票 
##  Topic 5: 柯, 文, 哲, 賴, 清 
##  Topic 6: 康, 當選, 侯, 趙, 反 
##  Topic 7: 造勢, 萬人, 場, 湧進, 美 
##  Topic 8: 賴, 清, 德, 自由, 時報 
##  Topic 9: 民進黨, 侯, 康, 賴, 架 
##  Topic 10: 友, 宜, 侯, 清, 德 
##  Topic 11: 號, 候選人, 次, 抽籤, 中選會 
##  Topic 12: 場, 政見發表會, 候選人, 中央社, 總統 
##  Topic 13: 民調, 支持, 度, 顯示, 賴 
##  Topic 14: 立, 哲, 文, 柯, 新聞 
##  Topic 15: 長, 照, 版, 頭, 賴 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 71 (approx. per word bound = -4.068, relative change = 1.575e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 72 (approx. per word bound = -4.067, relative change = 1.599e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 73 (approx. per word bound = -4.067, relative change = 1.718e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 74 (approx. per word bound = -4.067, relative change = 2.000e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 75 (approx. per word bound = -4.067, relative change = 2.952e-05) 
## Topic 1: 選舉, 人, 第, 萬, 中選會 
##  Topic 2: 柯, 文, 哲, 公布, 新聞 
##  Topic 3: 配, 侯, 民調, 領先, 康 
##  Topic 4: 選, 總統, 大, 情, 投票 
##  Topic 5: 柯, 文, 哲, 賴, 清 
##  Topic 6: 康, 侯, 當選, 趙, 反 
##  Topic 7: 造勢, 萬人, 場, 湧進, 美 
##  Topic 8: 賴, 清, 德, 自由, 時報 
##  Topic 9: 民進黨, 侯, 康, 賴, 架 
##  Topic 10: 友, 宜, 侯, 清, 德 
##  Topic 11: 號, 候選人, 次, 抽籤, 中選會 
##  Topic 12: 場, 政見發表會, 候選人, 中央社, 總統 
##  Topic 13: 民調, 支持, 度, 顯示, 賴 
##  Topic 14: 立, 哲, 文, 柯, 新聞 
##  Topic 15: 長, 照, 版, 頭, 賴 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 76 (approx. per word bound = -4.067, relative change = 6.327e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 77 (approx. per word bound = -4.067, relative change = 5.662e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 78 (approx. per word bound = -4.067, relative change = 3.214e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 79 (approx. per word bound = -4.066, relative change = 1.627e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 80 (approx. per word bound = -4.066, relative change = 1.409e-05) 
## Topic 1: 選舉, 人, 第, 萬, 中選會 
##  Topic 2: 柯, 文, 哲, 公布, 新聞 
##  Topic 3: 配, 侯, 民調, 領先, 康 
##  Topic 4: 選, 總統, 大, 情, 投票 
##  Topic 5: 柯, 文, 哲, 賴, 清 
##  Topic 6: 康, 侯, 當選, 趙, 反 
##  Topic 7: 造勢, 萬人, 場, 湧進, 美 
##  Topic 8: 賴, 清, 德, 自由, 時報 
##  Topic 9: 民進黨, 侯, 康, 賴, 架 
##  Topic 10: 友, 宜, 侯, 清, 德 
##  Topic 11: 號, 候選人, 次, 抽籤, 中選會 
##  Topic 12: 場, 政見發表會, 候選人, 中央社, 總統 
##  Topic 13: 民調, 支持, 度, 顯示, 賴 
##  Topic 14: 立, 哲, 文, 柯, 新聞 
##  Topic 15: 長, 照, 版, 頭, 賴 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 81 (approx. per word bound = -4.066, relative change = 1.694e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 82 (approx. per word bound = -4.066, relative change = 1.217e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 83 (approx. per word bound = -4.066, relative change = 1.307e-05) 
## ..........................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Model Converged
# The output reveals the problem with tokens_compound(): most of the n-gram terms we customized earlier, particularly named entities such as "柯文哲" and "侯友宜", were not kept intact in the corpus. I can only hope the developers fix this in the future.
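
One possible workaround, sketched below and not evaluated here, is to pre-segment each article with \(\textsf{jiebaR}\), whose user dictionary reliably keeps multi-character names such as 柯文哲 intact, and then hand the segmented word lists to \(\textsf{quanteda}\) via as.tokens(). The worker seg2 and the docvars() call are illustrative and simply mirror the preprocessing above.

# Sketch of a jiebaR-based fallback for keeping named entities intact (not run)
seg2 <- jiebaR::worker()
new_user_word(seg2, c("侯友宜", "賴清德", "柯文哲", "蕭美琴", "趙少康", "吳欣盈"))
toks_jb2 <- quanteda::as.tokens(lapply(df$text, function(x) jiebaR::segment(x, jiebar = seg2)))
docvars(toks_jb2) <- df[, c("source", "date", "week", "net_sentiment")]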

# Set topic format
colnames(stm_model$theta) <- paste0("Topic", 1:K_best) # Will display Topic1, Topic2, Topic3...
# Coerce the metadata to a plain data frame (rather than a tibble)
meta_stm <- as.data.frame(meta_stm)

Now the fun stuff: plotting! Let’s see what these topics are about by plotting their term distributions using the FREX criterion, which balances high frequency and exclusivity (Bischof and Airoldi 2012; Roberts et al. 2014).
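
For reference, my paraphrase of the FREX definition in the \(\textsf{stm}\) documentation: for word \( v \) in topic \( k \), FREX is the weighted harmonic mean of the word’s rank (empirical CDF) on exclusivity and on within-topic frequency,

\[
\mathrm{FREX}_{k,v} = \left( \frac{\omega}{\mathrm{ECDF}\!\left( \beta_{k,v} \Big/ \sum_{j=1}^{K} \beta_{j,v} \right)} + \frac{1-\omega}{\mathrm{ECDF}\!\left( \beta_{k,v} \right)} \right)^{-1},
\]

where \( \beta_{k,v} \) is the probability of word \( v \) under topic \( k \) and \( \omega \) is the weight placed on exclusivity.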

plot.STM(stm_model, type = "labels") # Display high frequency terms of each topic

plot(stm_model, labeltype = "frex", n = 5) # labeltype sets the metric used to choose the words displayed on the plot; here we label topics by "frex" (high frequency & exclusivity) and show the top 5 terms per topic

One commonly asked question: how do I know which topic is about what? How do researchers name those topics? Honestly, the model won’t tell you, and neither can I.

The machine will not automatically conjure up topic names for you; you have to name the topics yourself. A recommended workflow is to first plot the results, get a sense of how words are distributed across topics, infer what each topic is about, and then assign your labels via the \(\textsf{topic.names}\) or \(\textsf{custom.labels}\) arguments.

# Let's reverse-engineer to set topic names manually
topic_names <- c('總統參選人','免費醫改: ','民調差距: ', '學者分析選情: ', '青年志工助選: ', '趙少康: ',
                 '凱道造勢: ', '國防自主: ', '動員催票: ', '侯友宜支持度上升: ', '號次抽籤: ', '政見發表會: ', '美麗島民調: ', '改變選戰策略: ', '自由時報+長照: ')
plot(stm_model, type = "summary", labeltype = "frex", 
     n = 3, # Display only 3 terms as topic names may eat up lots of space
     topic.names = topic_names,
     main = "台灣2024年總統大選熱門議題")  

Ta-da!

# You may also leverage the commonly-used wordcloud to show important (high frequency) terms
# install.packages("wordcloud") # You will need wordcloud package
library(wordcloud)
# Let's take a look at what topic 8 (國防自主) is all about
cloud(stm_model, topic = 8)

Borrowing the graph-plotting machinery from the \(\textsf{igraph}\) package (often used for rendering node-edge structures), we can also get a glimpse of the relationships between topics. Let’s do that.

# install.packages("igraph")
library(igraph) # You will need the igraph package to plot the node-edge network structure
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:lubridate':
## 
##     %--%, union
## The following objects are masked from 'package:purrr':
## 
##     compose, simplify
## The following object is masked from 'package:tidyr':
## 
##     crossing
## The following object is masked from 'package:tibble':
## 
##     as_data_frame
## The following objects are masked from 'package:dplyr':
## 
##     as_data_frame, groups, union
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
set.seed(2025)
mod.out.corr <- topicCorr(stm_model)
plot(mod.out.corr)

Based on the clustering pattern, it would seem that topics 3 (民調差距), 10 (侯友宜支持度上升), and 13 (美麗島民調) are highly correlated, which makes sense. However, the correlation between topic 2 (免費醫改) and topic 14 (改變選戰策略) is less obvious, and the correlation between topic 7 (凱道造勢) and topic 8 (國防自主) is murkier still.

A logical follow-up question is whether topic proportions change over time. For instance, some topics may have received greater coverage as voting day approached, while others may have been relegated to the back seat by major media. To answer this question, we first estimate the effect of \(\textsf{date_num}\) on the topic distribution:

# Changes in topic proportion as a function of time (date_num)
eff_time <- estimateEffect(1:15 ~ s(date_num), stm_model, metadata = meta_stm)

We then select four topics, “凱道造勢,” “侯友宜支持度上升,” “青年志工助選,” and “民調差距,” for graphical illustration, expecting that “凱道造勢” should have received more coverage as voting day approached, that news covering “侯友宜支持度上升” should be inversely related to news highlighting “民調差距,” and that coverage of “青年志工助選” should remain low throughout. This is conveniently done via \(\textsf{stm}\)’s built-in \(\textsf{plot()}\) function, which plots the estimated topic proportions in \(\textsf{eff_time}\) as a function of \(\textsf{date_num}\).

plot(eff_time, covariate = "date_num", method = "continuous",
     topics = c(7, 10, 5, 3), model = stm_model,   # Topics '凱道造勢', '侯友宜支持度上升', '青年志工助選', '民調差距'
     ci.level = FALSE, # Suppress confidence intervals
     xlab = "December 2023", ylab = "Topic proportion",
     main = "選戰倒數一個月四大熱門議題消長")

Here we see that 民調差距 waned as the volume of news mentioning 侯友宜支持度上升 declined throughout December 2023. In addition, news related to 青年志工助選 was never a salient topic, while the volume of news pertaining to 凱道造勢 increased as voting day emerged on the horizon.

Finally, we investigate the issue of “sentiment,” with an empirical focus on

  1. term distribution across sentiments within a topic
  2. the relationship between topics and sentiment
  3. the reporting patterns of partisan media

We first estimate topic content as a function of sentiment:

news.content <- stm(docs, vocab, K = 15,
                    prevalence =~ source + s(date_num) + net_sentiment,
                    content =~ net_sentiment,
                    max.em.its = 300, data = meta_stm,
                    init.type = "Spectral", verbose = FALSE)

\(\textsf{stm}\)’s perspective plot is a useful tool for mapping term distributions by issue position (usually a variable measured on a binary scale pointing to two extremes, such as (‘bad’, ‘good’) or (‘oppose’, ‘support’)). Let’s see how news agencies, in general, reported ‘免費醫療改革’ (topic 2).

plot(news.content, type = "perspectives", n = 50, topics = 2, plabels = c('Negative', 'Positive'), main = "Topic 2: 免費醫改")   # plabels argument sets the labels displayed on the 'left' and the 'right' end of the horizontal axis

Strangely, this topic is primarily owned by 柯文哲 (click here for an introduction to “issue ownership” theory), though I don’t recall Mr. Ko having said much about this issue.

What about ‘國防自主’ (topic 8)?

plot(news.content, type = "perspectives", n = 50, topics = 8, plabels = c('Negative', 'Positive'), main = "Topic 8: 國防自主")   # plabels argument sets the labels displayed on the 'left' and the 'right' end of the horizontal axis

Wow, this topic is mainly covered by 自由時報 (which took a slightly positive position), but 賴清德 is uniquely associated with negative sentiment toward this issue, at least in the data analyzed here.

So how did the six major media outlets (from which we drew our news), in general, portray these 15 topics? We can obtain this information by first estimating the effects of news media on the topic distribution along the sentiment scale:

prep <- estimateEffect(1:15 ~ source + s(date_num) + net_sentiment, stm_model, metadata = meta_stm, uncertainty = "Global")

And then feed the estimated object \(\textsf{prep}\) to the plotting function for visualization.

plot(prep, covariate = "net_sentiment", topics = 1:15,  model = stm_model, 
     method = "difference", cov.value1 = 1, cov.value2 = 0, 
     main = "Topic distribution along sentiment scale", xlab = "Negative ←                               → Positive",
     xlim = c(-.3, .2), labeltype = "custom", custom.labels = topic_names)

It seems that all six major outlets generally reported electoral affairs in a positive light, but portrayed candidates and policy platforms negatively during this period!

By the way, you can also obtain the regular \(\textsf{R}\) regression output from the object generated by \(\textsf{estimateEffect()}\).

summary(prep)
## 
## Call:
## estimateEffect(formula = 1:15 ~ source + s(date_num) + net_sentiment, 
##     stmobj = stm_model, metadata = meta_stm, uncertainty = "Global")
## 
## 
## Topic 1:
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    0.18985    0.07305   2.599  0.00996 ** 
## source中央社   0.16423    0.03536   4.645 5.75e-06 ***
## source中時     0.02285    0.03188   0.717  0.47418    
## source自由    -0.01065    0.03314  -0.321  0.74823    
## source聯合    -0.03616    0.05956  -0.607  0.54438    
## sourceTVBS     0.01165    0.03203   0.364  0.71648    
## s(date_num)1  -0.08106    0.16674  -0.486  0.62732    
## s(date_num)2   0.04505    0.13263   0.340  0.73439    
## s(date_num)3  -0.30217    0.09704  -3.114  0.00208 ** 
## s(date_num)4  -0.12579    0.09658  -1.302  0.19409    
## s(date_num)5  -0.23440    0.09237  -2.538  0.01183 *  
## s(date_num)6  -0.15209    0.09735  -1.562  0.11960    
## s(date_num)7  -0.23877    0.09537  -2.504  0.01300 *  
## s(date_num)8  -0.21089    0.10736  -1.964  0.05071 .  
## s(date_num)9  -0.20187    0.10687  -1.889  0.06016 .  
## s(date_num)10 -0.19961    0.08861  -2.253  0.02523 *  
## net_sentiment -0.02599    0.02182  -1.191  0.23503    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Topic 2:
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    0.28343    0.09288   3.052  0.00255 ** 
## source中央社  -0.29010    0.05099  -5.690 3.89e-08 ***
## source中時    -0.30673    0.04737  -6.475 5.75e-10 ***
## source自由    -0.29324    0.05526  -5.306 2.65e-07 ***
## source聯合    -0.29875    0.09035  -3.306  0.00110 ** 
## sourceTVBS    -0.28776    0.05055  -5.692 3.83e-08 ***
## s(date_num)1   0.04221    0.19172   0.220  0.82594    
## s(date_num)2   0.06494    0.15664   0.415  0.67882    
## s(date_num)3   0.08336    0.12507   0.667  0.50573    
## s(date_num)4   0.03864    0.11458   0.337  0.73626    
## s(date_num)5   0.15276    0.13856   1.102  0.27144    
## s(date_num)6   0.03458    0.13067   0.265  0.79154    
## s(date_num)7   0.04128    0.12983   0.318  0.75081    
## s(date_num)8   0.10739    0.15225   0.705  0.48132    
## s(date_num)9  -0.11067    0.13952  -0.793  0.42848    
## s(date_num)10 -0.02316    0.11218  -0.206  0.83660    
## net_sentiment -0.01147    0.03352  -0.342  0.73261    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Topic 3:
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -0.02096    0.09868  -0.212 0.831966    
## source中央社   0.04433    0.04320   1.026 0.305921    
## source中時     0.15948    0.04671   3.415 0.000756 ***
## source自由     0.02265    0.04512   0.502 0.616158    
## source聯合     0.02808    0.07987   0.351 0.725540    
## sourceTVBS     0.14831    0.04720   3.142 0.001898 ** 
## s(date_num)1   0.03586    0.21295   0.168 0.866423    
## s(date_num)2  -0.06089    0.14816  -0.411 0.681462    
## s(date_num)3  -0.13258    0.13116  -1.011 0.313169    
## s(date_num)4  -0.02088    0.11846  -0.176 0.860263    
## s(date_num)5  -0.01764    0.13206  -0.134 0.893829    
## s(date_num)6  -0.09455    0.13248  -0.714 0.476139    
## s(date_num)7   0.01512    0.14183   0.107 0.915184    
## s(date_num)8   0.17136    0.17021   1.007 0.315121    
## s(date_num)9  -0.12291    0.14784  -0.831 0.406644    
## s(date_num)10  0.34962    0.13339   2.621 0.009354 ** 
## net_sentiment  0.11857    0.03088   3.840 0.000159 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Topic 4:
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -0.013450   0.085874  -0.157    0.876    
## source中央社   0.057274   0.039262   1.459    0.146    
## source中時     0.058006   0.039040   1.486    0.139    
## source自由    -0.014187   0.038781  -0.366    0.715    
## source聯合     0.661529   0.091459   7.233 7.11e-12 ***
## sourceTVBS     0.059170   0.041272   1.434    0.153    
## s(date_num)1  -0.048615   0.175854  -0.276    0.782    
## s(date_num)2  -0.033433   0.135699  -0.246    0.806    
## s(date_num)3   0.173515   0.123106   1.409    0.160    
## s(date_num)4   0.124234   0.109224   1.137    0.257    
## s(date_num)5   0.009947   0.115835   0.086    0.932    
## s(date_num)6   0.111752   0.111047   1.006    0.315    
## s(date_num)7  -0.096234   0.113286  -0.849    0.397    
## s(date_num)8   0.085697   0.124380   0.689    0.492    
## s(date_num)9  -0.019061   0.132819  -0.144    0.886    
## s(date_num)10  0.058720   0.112414   0.522    0.602    
## net_sentiment -0.037398   0.027022  -1.384    0.168    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Topic 5:
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    0.1537665  0.0813107   1.891 0.059879 .  
## source中央社   0.0023085  0.0260421   0.089 0.929443    
## source中時    -0.0008475  0.0247425  -0.034 0.972706    
## source自由     0.1067518  0.0315992   3.378 0.000858 ***
## source聯合    -0.0279881  0.0508100  -0.551 0.582284    
## sourceTVBS     0.0152849  0.0275178   0.555 0.579128    
## s(date_num)1  -0.1584368  0.1463487  -1.083 0.280131    
## s(date_num)2   0.0548475  0.1076818   0.509 0.611001    
## s(date_num)3  -0.1813000  0.1013475  -1.789 0.074959 .  
## s(date_num)4  -0.1263669  0.0873112  -1.447 0.149182    
## s(date_num)5  -0.1784443  0.0909672  -1.962 0.051022 .  
## s(date_num)6  -0.1048603  0.1005805  -1.043 0.298260    
## s(date_num)7  -0.1706630  0.1000366  -1.706 0.089369 .  
## s(date_num)8  -0.1654288  0.1026842  -1.611 0.108554    
## s(date_num)9  -0.1496191  0.1069422  -1.399 0.163152    
## s(date_num)10 -0.1404743  0.0898554  -1.563 0.119360    
## net_sentiment -0.0357063  0.0177996  -2.006 0.046036 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Topic 6:
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   -0.026336   0.056926  -0.463  0.64406   
## source中央社   0.045590   0.029022   1.571  0.11760   
## source中時     0.036414   0.028305   1.286  0.19959   
## source自由    -0.002356   0.028860  -0.082  0.93501   
## source聯合     0.054955   0.052186   1.053  0.29343   
## sourceTVBS     0.075933   0.031089   2.442  0.01535 * 
## s(date_num)1  -0.092860   0.125134  -0.742  0.45880   
## s(date_num)2   0.100512   0.107517   0.935  0.35085   
## s(date_num)3  -0.042437   0.080371  -0.528  0.59801   
## s(date_num)4  -0.035259   0.070865  -0.498  0.61927   
## s(date_num)5  -0.022605   0.078708  -0.287  0.77422   
## s(date_num)6   0.095728   0.083212   1.150  0.25118   
## s(date_num)7  -0.143445   0.086786  -1.653  0.09973 . 
## s(date_num)8   0.315747   0.099779   3.164  0.00177 **
## s(date_num)9  -0.141673   0.090258  -1.570  0.11788   
## s(date_num)10 -0.050522   0.072250  -0.699  0.48510   
## net_sentiment  0.055728   0.020090   2.774  0.00600 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Topic 7:
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    0.066636   0.087833   0.759  0.44884   
## source中央社  -0.030809   0.044244  -0.696  0.48693   
## source中時     0.005032   0.043800   0.115  0.90864   
## source自由     0.060299   0.053459   1.128  0.26053   
## source聯合    -0.128078   0.077077  -1.662  0.09795 . 
## sourceTVBS     0.014742   0.048901   0.301  0.76333   
## s(date_num)1  -0.015692   0.179762  -0.087  0.93052   
## s(date_num)2   0.115676   0.159013   0.727  0.46769   
## s(date_num)3  -0.150602   0.115076  -1.309  0.19195   
## s(date_num)4   0.176718   0.108941   1.622  0.10615   
## s(date_num)5   0.042209   0.118227   0.357  0.72141   
## s(date_num)6  -0.118636   0.113218  -1.048  0.29582   
## s(date_num)7   0.374471   0.136989   2.734  0.00676 **
## s(date_num)8  -0.274565   0.144778  -1.896  0.05916 . 
## s(date_num)9   0.087834   0.131553   0.668  0.50502   
## s(date_num)10  0.104009   0.123463   0.842  0.40043   
## net_sentiment -0.071527   0.028968  -2.469  0.01428 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Topic 8:
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    0.089117   0.093438   0.954    0.341    
## source中央社  -0.050696   0.038149  -1.329    0.185    
## source中時    -0.022428   0.036736  -0.611    0.542    
## source自由     0.237578   0.051001   4.658 5.42e-06 ***
## source聯合    -0.081806   0.069290  -1.181    0.239    
## sourceTVBS     0.002439   0.039471   0.062    0.951    
## s(date_num)1  -0.170587   0.188224  -0.906    0.366    
## s(date_num)2   0.113468   0.133250   0.852    0.395    
## s(date_num)3  -0.124102   0.118975  -1.043    0.298    
## s(date_num)4   0.094117   0.110648   0.851    0.396    
## s(date_num)5  -0.011029   0.119037  -0.093    0.926    
## s(date_num)6  -0.015708   0.119731  -0.131    0.896    
## s(date_num)7   0.009858   0.131303   0.075    0.940    
## s(date_num)8  -0.058093   0.131480  -0.442    0.659    
## s(date_num)9  -0.120959   0.129265  -0.936    0.350    
## s(date_num)10 -0.014707   0.113987  -0.129    0.897    
## net_sentiment -0.055443   0.025545  -2.170    0.031 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Topic 9:
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    0.030071   0.070837   0.425 0.671595    
## source中央社   0.049003   0.032899   1.490 0.137732    
## source中時     0.069858   0.033067   2.113 0.035722 *  
## source自由     0.003659   0.034082   0.107 0.914608    
## source聯合     0.183164   0.082393   2.223 0.027194 *  
## sourceTVBS    -0.002310   0.032458  -0.071 0.943317    
## s(date_num)1   0.126354   0.157103   0.804 0.422075    
## s(date_num)2  -0.162998   0.131942  -1.235 0.217963    
## s(date_num)3   0.123475   0.103531   1.193 0.234252    
## s(date_num)4  -0.145250   0.083750  -1.734 0.084210 .  
## s(date_num)5  -0.058165   0.097114  -0.599 0.549808    
## s(date_num)6  -0.130119   0.091153  -1.427 0.154812    
## s(date_num)7  -0.066535   0.094899  -0.701 0.483949    
## s(date_num)8  -0.111693   0.107590  -1.038 0.300306    
## s(date_num)9  -0.016811   0.114063  -0.147 0.882957    
## s(date_num)10 -0.089803   0.087068  -1.031 0.303437    
## net_sentiment  0.075869   0.022675   3.346 0.000959 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Topic 10:
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)  
## (Intercept)    0.051123   0.127741   0.400   0.6894  
## source中央社  -0.046889   0.067205  -0.698   0.4861  
## source中時     0.131674   0.069413   1.897   0.0591 .
## source自由    -0.007541   0.068735  -0.110   0.9127  
## source聯合    -0.107917   0.114817  -0.940   0.3483  
## sourceTVBS     0.079421   0.065859   1.206   0.2291  
## s(date_num)1   0.206841   0.266905   0.775   0.4392  
## s(date_num)2  -0.118623   0.223821  -0.530   0.5966  
## s(date_num)3   0.171236   0.170998   1.001   0.3177  
## s(date_num)4  -0.022242   0.161291  -0.138   0.8904  
## s(date_num)5   0.248017   0.172181   1.440   0.1511  
## s(date_num)6   0.038584   0.176394   0.219   0.8271  
## s(date_num)7   0.143873   0.176428   0.815   0.4157  
## s(date_num)8   0.027895   0.203091   0.137   0.8909  
## s(date_num)9   0.198602   0.203360   0.977   0.3298  
## s(date_num)10 -0.019285   0.159302  -0.121   0.9037  
## net_sentiment  0.020645   0.044247   0.467   0.6412  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Topic 11:
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -0.003676   0.066602  -0.055 0.956028    
## source中央社   0.035823   0.034033   1.053 0.293631    
## source中時     0.070560   0.031353   2.251 0.025371 *  
## source自由     0.002532   0.033697   0.075 0.940172    
## source聯合    -0.019357   0.058460  -0.331 0.740858    
## sourceTVBS     0.017969   0.032395   0.555 0.579655    
## s(date_num)1   0.094026   0.159808   0.588 0.556870    
## s(date_num)2  -0.134323   0.113706  -1.181 0.238706    
## s(date_num)3   0.361270   0.107563   3.359 0.000918 ***
## s(date_num)4  -0.183899   0.083716  -2.197 0.029048 *  
## s(date_num)5   0.175040   0.093225   1.878 0.061713 .  
## s(date_num)6  -0.019111   0.089198  -0.214 0.830541    
## s(date_num)7   0.020479   0.088404   0.232 0.817013    
## s(date_num)8  -0.016401   0.101616  -0.161 0.871923    
## s(date_num)9  -0.003985   0.098978  -0.040 0.967916    
## s(date_num)10  0.029978   0.081652   0.367 0.713858    
## net_sentiment -0.051431   0.022652  -2.270 0.024112 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Topic 12:
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -0.050515   0.093584  -0.540    0.590    
## source中央社   0.228021   0.050918   4.478 1.19e-05 ***
## source中時     0.024995   0.043773   0.571    0.569    
## source自由     0.055256   0.047921   1.153    0.250    
## source聯合     0.015128   0.091482   0.165    0.869    
## sourceTVBS     0.063153   0.047166   1.339    0.182    
## s(date_num)1   0.025370   0.199979   0.127    0.899    
## s(date_num)2  -0.051491   0.156208  -0.330    0.742    
## s(date_num)3   0.132781   0.136960   0.969    0.333    
## s(date_num)4   0.007532   0.121655   0.062    0.951    
## s(date_num)5   0.198943   0.145194   1.370    0.172    
## s(date_num)6   0.058938   0.130987   0.450    0.653    
## s(date_num)7   0.144085   0.133512   1.079    0.282    
## s(date_num)8   0.087714   0.149813   0.585    0.559    
## s(date_num)9   0.224196   0.145474   1.541    0.125    
## s(date_num)10  0.059250   0.125887   0.471    0.638    
## net_sentiment -0.046052   0.032113  -1.434    0.153    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Topic 13:
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    0.047562   0.070235   0.677  0.49898    
## source中央社  -0.038796   0.031333  -1.238  0.21692    
## source中時    -0.050407   0.031945  -1.578  0.11597    
## source自由    -0.016385   0.034059  -0.481  0.63092    
## source聯合     0.002482   0.052522   0.047  0.96236    
## sourceTVBS    -0.012715   0.033630  -0.378  0.70572    
## s(date_num)1   0.130991   0.145579   0.900  0.36918    
## s(date_num)2  -0.005062   0.102314  -0.049  0.96058    
## s(date_num)3  -0.046931   0.089410  -0.525  0.60017    
## s(date_num)4   0.079078   0.081631   0.969  0.33371    
## s(date_num)5  -0.237528   0.087874  -2.703  0.00739 ** 
## s(date_num)6   0.219505   0.090220   2.433  0.01574 *  
## s(date_num)7  -0.099149   0.095314  -1.040  0.29933    
## s(date_num)8  -0.058698   0.093165  -0.630  0.52930    
## s(date_num)9  -0.043678   0.097903  -0.446  0.65592    
## s(date_num)10 -0.086440   0.079604  -1.086  0.27868    
## net_sentiment  0.105460   0.020535   5.136 6.03e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Topic 14:
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    0.21632    0.08138   2.658 0.008411 ** 
## source中央社  -0.14542    0.04959  -2.932 0.003709 ** 
## source中時    -0.20634    0.04633  -4.454 1.32e-05 ***
## source自由    -0.21571    0.04829  -4.467 1.25e-05 ***
## source聯合    -0.22175    0.07501  -2.956 0.003440 ** 
## sourceTVBS    -0.17608    0.04827  -3.648 0.000328 ***
## s(date_num)1  -0.13372    0.16310  -0.820 0.413148    
## s(date_num)2   0.07375    0.13048   0.565 0.572486    
## s(date_num)3  -0.12820    0.10742  -1.193 0.233941    
## s(date_num)4   0.13850    0.10192   1.359 0.175523    
## s(date_num)5  -0.14241    0.10598  -1.344 0.180384    
## s(date_num)6   0.05180    0.10874   0.476 0.634287    
## s(date_num)7   0.10566    0.11306   0.935 0.350991    
## s(date_num)8  -0.11104    0.13839  -0.802 0.423164    
## s(date_num)9   0.30801    0.13729   2.244 0.025819 *  
## s(date_num)10 -0.01039    0.10508  -0.099 0.921335    
## net_sentiment -0.01772    0.02926  -0.606 0.545332    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Topic 15:
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   -0.017501   0.056638  -0.309   0.7576  
## source中央社  -0.024117   0.027581  -0.874   0.3828  
## source中時     0.007230   0.028830   0.251   0.8022  
## source自由     0.071584   0.034866   2.053   0.0412 *
## source聯合    -0.018918   0.052129  -0.363   0.7170  
## sourceTVBS    -0.009799   0.028752  -0.341   0.7335  
## s(date_num)1   0.044001   0.115007   0.383   0.7024  
## s(date_num)2   0.001519   0.090916   0.017   0.9867  
## s(date_num)3   0.068262   0.076745   0.889   0.3747  
## s(date_num)4   0.003175   0.069686   0.046   0.9637  
## s(date_num)5   0.079563   0.079962   0.995   0.3208  
## s(date_num)6   0.030826   0.077001   0.400   0.6893  
## s(date_num)7  -0.038343   0.076557  -0.501   0.6170  
## s(date_num)8   0.217652   0.100828   2.159   0.0319 *
## s(date_num)9   0.114549   0.100380   1.141   0.2550  
## s(date_num)10  0.036052   0.070322   0.513   0.6087  
## net_sentiment -0.022449   0.019767  -1.136   0.2573  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The output, however, should be interpreted in multiplicative terms: for example, being sourced from 中時 reduces the probability that a news article is sorted into Topic 14 (改變選戰策略, changing campaign strategy) by 1 - exp(-0.20634) \(\approx\) 0.19 (about 19%).
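
A quick numeric check of that arithmetic, with the coefficient copied from the Topic 14 table above:

b_zhongshi <- -0.20634   # Topic 14 coefficient for source中時, taken from summary(prep) above
1 - exp(b_zhongshi)      # ~0.186, i.e., roughly a 19% reduction under this multiplicative reading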

Finally, we ask whether partisan media tended to portray certain topics positively or negatively during this past Presidential election. For example, did 自由時報 and 三立 cover news related to 侯友宜 negatively, while 聯合報 and 中國時報 reported on 侯友宜 favorably but portrayed news pertaining to 賴清德 in a negative light? To probe this alleged partisan effect, we disaggregate the estimates (\(\textsf{prep}\)) by news outlet and map each outlet's topic positions on the sentiment scale.

# Generate a list of major media
major_media <- unique(meta_stm$source)

# Plotting

par(mfrow = c(2, 3), mar = c(4, 1, 1, 1), oma = c(3, 0, 0, 0)) # Set up a 2-by-3 plotting layout. The margin values inside c() are given in the order c(bottom, left, top, right), i.e., starting at the bottom and going clockwise

for(i in 1:6){
  plot(prep,
       covariate = "net_sentiment",
       method = "difference",
       cov.value1 = 1, 
       cov.value2 = 0,
       moderator = "source",
       moderator.value = major_media[i],
       model = stm_model,
       topics = 1:K_best,  
       main = paste(major_media[i]),
       xlab = "Negative ←    → Positive",
       # xlim = c(-0.8, 0.7),
       labeltype = "custom", # Use custom labels (see below)
       custom.labels = topic_names,
       printlegend = FALSE,
       verbose.labels = FALSE)
  abline(v=0, col="red", lty=2, lwd=2) # x = 0 reference line
}
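
Since the loop above changes the global graphics settings, it is worth restoring the default single-panel layout afterwards (housekeeping only, not part of the analysis):

par(mfrow = c(1, 1), mar = c(5, 4, 4, 2) + 0.1, oma = c(0, 0, 0, 0))  # Restore R's default plotting parameters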

Contrary to common perception, UDN (聯合報) turned out to be the most rhetorically neutral outlet. The DPP-leaning partisan outlet 三立 portrayed 侯友宜支持度上升 (rising support for 侯友宜) positively but covered 民調差距 (the poll gap) in a negative, though statistically insignificant, way. 中時, by contrast, is the only major outlet that conveyed the topic of 民調差距 positively, lending indirect evidence to our claim, although the actual content and writing style remain to be ascertained. Interestingly, if there is one thing these partisan media could agree on, it would be their treatment of 趙少康, the KMT's vice presidential candidate: he received generally favorable coverage from all six outlets. Overall, the differences in reporting patterns attributable to the major media's partisan orientations are not as pronounced as we initially thought they would be.