Topic

PTT Gossiping board (八卦版): an analysis of the discussion sparked when the Nantou County Government demanded that forums delete specific posts, which netizens saw as government restriction of free speech.

Motivation and Analysis Goals

In 2020, a sexual assault case that shocked society occurred at Xuguang High School (旭光高中) in Nantou County. The perpetrator's father attempted to use his position as a police sergeant and deputy substation chief to shield his son, provoking public outrage. We therefore use text mining and sentiment analysis to explore how the broader online community viewed this incident.

Data Overview

  • Data source: all articles on the PTT Gossiping board from 2020-09-01 to 2021-03-25, collected via the text-collection platform
  • Datasets: tsj_a.csv (article bodies), tsj_r.csv (comments)
  • Keywords: TSJ, 田勝傑, 口交狂魔, 田裕璋, 惡徒
  • Time span: 2020-09-01 ~ 2021-03-25

Taking the recent TSJ incident as our case, we analyze the related discussion among PTT users, focusing on three questions:

1. When did discussion of the TSJ incident first appear, and when did it peak?
2. What did the positive and negative discussion consist of, and did the two differ over time?
3. Roughly how large were the positive and negative sentiment scores?

系統參數設定

Sys.setlocale(category = "LC_ALL", locale = "zh_TW.UTF-8") # avoid garbled Chinese output
## [1] ""

Install the required packages

# echo = T, results = 'hide'
packages = c("dplyr", "tidytext", "stringr", "wordcloud2", "ggplot2", "readr",
             "data.table", "reshape2", "wordcloud", "tidyr", "scales")
existing = as.character(installed.packages()[,1])
for(pkg in packages[!(packages %in% existing)]) install.packages(pkg)

Load the libraries

rm(list=ls(all=T))
library(dplyr)
library(stringr)
library(tidytext)
library(wordcloud2)
library(data.table)
library(ggplot2)
library(reshape2)
library(wordcloud)
library(tidyr)
library(readr)
library(scales)
library(jiebaR)
library(ngram)
library(widyr)
library(NLP)
library(ggraph)
library(igraph)
# Read in the articles and the comments
MetaData = fread('csv/tsj_a.csv', encoding = 'UTF-8')
Reviews  = fread('csv/tsj_r.csv', encoding = 'UTF-8')

# Filter the articles again on the keywords: 488 articles down to 201
keywords = c('TSJ','口交','惡徒','狂魔','吹狂魔','田勝傑','田聖傑','田裕璋','tsj','旭光')
toMatch = paste(keywords, collapse = "|")
MetaData = with(MetaData, MetaData[grepl(toMatch, sentence) | grepl(toMatch, artTitle),])

# Keep only the comments belonging to the remaining articles: 40058 comments down to 18372
Reviews = left_join(MetaData, Reviews[,c("artUrl", "cmtContent")], by = "artUrl")

1. Data Preprocessing

(1). Tokenization

Set up the tokenization engine

# Tokenizer with a custom user dictionary and stop-word list
jieba_tokenizer <- worker(user="dict/user_dict.txt", stop_word = "dict/stop_words.txt")

# Tokenizer function for unnest_tokens()
customized_tokenizer <- function(t) {
  lapply(t, function(x) {
    tokens <- segment(x, jieba_tokenizer)
    return(tokens)
  })
}
# Tokenize the articles and the comments
MToken <- MetaData %>% unnest_tokens(word, sentence, token=customized_tokenizer)
RToken <- Reviews %>% unnest_tokens(word, cmtContent, token=customized_tokenizer)

# Stack the two token tables
data <- rbind(MToken[,c("artDate","artUrl", "word")], RToken[,c("artDate","artUrl", "word")])
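As a quick sanity check, the tokenizer can be called directly on a raw string. This is a minimal sketch; the exact segmentation depends on the contents of user_dict.txt and stop_words.txt.

# Hypothetical example: output varies with the custom dictionaries
customized_tokenizer("南投縣政府要求論壇刪除特定文章")
# e.g. a list holding c("南投縣政府", "要求", "論壇", "刪除", "特定", "文章")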

(2). Basic data cleaning

  • Normalize the date format
  • Remove special characters and very low-frequency words
# Normalize the date column
data$artDate = data$artDate %>% as.Date("%Y/%m/%d")

# Keep the "tsj" tokens separately: the English/digit filter below would drop them
data_tsj = data %>% filter(word == "tsj")

# Filter out special characters
data_select = data %>% 
  filter(!grepl('[[:punct:]]', word)) %>% # drop punctuation
  filter(!grepl("[0-9a-z]", word)) %>%    # drop English letters and digits
  filter(nchar(word) > 1) 
data_select = bind_rows(data_tsj, data_select)
  
# Daily term frequencies
# word_count: artDate, word, count
word_count <- data_select %>%
  select(artDate, word) %>%
  group_by(artDate, word) %>%
  summarise(count = n()) %>% # per-date, per-word counts
  filter(count > 3) %>%      # drop words that appear too rarely
  arrange(desc(count))
## `summarise()` has grouped output by 'artDate'. You can override using the `.groups` argument.

(3). Daily post counts

# Count distinct posts per day
date = data %>% select(artDate, artUrl) %>% distinct()
date = date %>% group_by(artDate) %>% summarize(count_day = n()) %>% ungroup() 
date = date %>% arrange(desc(count_day))
date_plot <- date %>% 
  ggplot(aes(x = artDate, y = count_day)) +
  geom_line(color = "purple", size = 1.5) +
  geom_vline(xintercept = c(as.numeric(as.Date("2021-03-23")),
                            as.numeric(as.Date("2020-10-23")),
                            as.numeric(as.Date("2020-09-10"))
                            ), col = 'red', size = 1) + 
  scale_x_date(labels = date_format("%Y/%m/%d")) +
  ggtitle("Number of articles discussing TSJ") + 
  xlab("Date") + 
  ylab("Articles")
date_plot

The plot shows three key dates:

  • 2020/09/10: the TSJ incident first broke.
  • 2020/10/23: the Nantou County Government's Social Affairs and Labor Departments sent official letters demanding that internet companies remove the relevant posts and comments.
  • 2021/03/23: PTT staff deleted 122 articles containing information about the perpetrator, while the Bahamut forum was fined for refusing to delete posts. Some Taiwanese netizens criticized the county government's approach and kept circulating the information via homophones, acrostics, illustrative memes, and other workarounds.

2. Preparing the LIWC Dictionary

Read in the dictionary files; entries within each file are separated by commas.

P <- read_file("dict/liwc/positive.txt") # positive-word dictionary (txt)
N <- read_file("dict/liwc/negative.txt") # negative-word dictionary (txt)

# each dictionary file reads in as one long string
typeof(P)
## [1] "character"

Split the strings into words and combine the two sentiment dictionaries.

# Split each string on commas
# strsplit returns a list; take its first element
P = strsplit(P, ",")[[1]]
N = strsplit(N, ",")[[1]]

# Build data frames with two columns, word and sentiment
P = data.frame(word = P, sentiment = "positive") # 661 words
N = data.frame(word = N, sentiment = "negative") # 1049 words

# Combine the two dictionaries
LIWC = rbind(P, N)
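As a quick structural check of the combined dictionary (a sketch; the exact entries depend on the LIWC files, while the expected counts come from the comments above):

table(LIWC$sentiment)  # expected: negative 1049, positive 661
head(LIWC)             # columns: word, sentiment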

3. Analysis of the TSJ Incident

Most common words on the three days with the most posts

g_tokens_by_date <- data_select %>% count(artDate, word, sort = TRUE)
  
plot_merge <- g_tokens_by_date %>% 
  filter(artDate == as.Date("2021-03-23") | 
         artDate == as.Date("2020-10-23") | 
         artDate == as.Date("2020-09-10")) %>% 
  group_by(artDate) %>% 
  top_n(7, n) %>% 
  ungroup() %>% 
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x=word, y=n, fill = artDate)) +
  geom_col(show.legend = FALSE) +
  labs(x = NULL, y = NULL) +
  facet_wrap(~artDate, scales="free", ncol = 2) + 
  coord_flip()
plot_merge

  • 2020/09/10: That day's news headline read「警察兒性侵女同學,警駁吃案遭蓋樓罵爆!緊急刪文分局長道歉」(roughly: "Police officer's son sexually assaults female classmate; police denial of burying the case draws a flood of angry comments; posts hastily deleted and the precinct chief apologizes"). Because the perpetrator TSJ's father was a deputy substation chief under Nantou's Caotun Precinct, once the alleged cover-up surfaced, netizens voiced their anger with the classic line「好大的官威」("such impressive official airs").

  • 2020/10/23: With outside state power now intervening, many「自殺文」("suicide posts") appeared that day, i.e. posts that deliberately mentioned the banned keywords to defy the authorities. PTT users replied「勇者」("brave one") to praise the posters' defiance.「有聲音」("it has sound") trended because one user made SpongeBob memes about the case and commenters replied「有畫面又有聲音」("I can see it and hear it") in appreciation.

  • 2021/03/23: Because both parties were minors, the boy's real name had not been made public. The county government repeatedly demanded that PTT, Dcard, Bahamut, and other platforms delete the posts, and fined Bahamut NT$60,000 under the Children and Youth Welfare Act (《兒少法》). In response, the internet celebrity「小商人」published the boy's name on Facebook on March 23.

Words that frequently appear near「田勝傑」or「tsj」

  • Use 11-grams, dropping digits and other English tokens; with the keyword fixed at position 6, each 11-gram gives a ±5-word context window (a toy illustration follows the code below).
ngram_11 <- function(t) {
  lapply(t, function(x) {
    tokens <- segment(x, jieba_tokenizer)
    ngram <- ngrams(tokens, 11)
    ngram <- lapply(ngram, paste, collapse = " ")
    unlist(ngram)
  })
}

data_tsj <- MetaData %>%
  select(artUrl, sentence) %>%
  unnest_tokens(word, sentence, token = customized_tokenizer)

# keep "tsj" itself; drop every other token containing digits or English letters
data_tsj = data_tsj %>% filter(word == "tsj" | !str_detect(word, regex("[0-9a-zA-Z]")))

# re-assemble each article's tokens into one string, then cut it into 11-grams
w = data_tsj %>% group_by(artUrl) %>% summarize(sentence = paste(word, collapse = ""))

g_ngram_11 <- w %>%
  select(artUrl, sentence) %>%
  unnest_tokens(ngram, sentence, token = ngram_11)

g_ngrams_11_separated <- g_ngram_11 %>%
  separate(ngram, paste0("word", 1:11), sep = " ")
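A toy illustration of the centre-word windowing, using hypothetical tokens rather than the real data:

# With 12 toy tokens, ngrams() (from NLP, loaded above) yields two 11-grams.
toks <- c("w1","w2","w3","w4","w5","tsj","w7","w8","w9","w10","w11","w12")
sapply(ngrams(toks, 11L), paste, collapse = " ")
# The first 11-gram carries "tsj" at position 6, so the word6 filter below
# keeps it; the second has "tsj" at position 5 and is dropped. Filtering on
# word6 therefore retains exactly the ±5-token windows around the keyword.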

1. Words commonly appearing near「田勝傑」

g_check_words <- g_ngrams_11_separated %>%
  filter(word6 == "田勝傑")
g_check_words_count <- g_check_words %>%
  melt(id.vars = "artUrl", measure.vars = paste0("word", 1:11)) %>%
  rename(word = value) %>%
  filter(variable != "word6") %>%
  filter(!(word %in% stop_words$word), nchar(word) > 1) %>% # stop_words is tidytext's stop-word table
  count(word, sort = TRUE)
g_check_words_count %>%
  arrange(desc(abs(n))) %>%
  head(15) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = n > 0)) +
  geom_col(show.legend = FALSE) +
  xlab("Words appearing near「田勝傑」") +
  ylab("Count") +
  coord_flip()

2. Words commonly appearing near「tsj」

g_check_words <- g_ngrams_11_separated %>%
  filter(word6 == "tsj")
g_check_words_count <- g_check_words %>%
  melt(id.vars = "artUrl", measure.vars = paste0("word", 1:11)) %>%
  rename(word = value) %>%
  filter(variable != "word6") %>%
  filter(!(word %in% stop_words$word), nchar(word) > 1) %>% # stop_words is tidytext's stop-word table
  count(word, sort = TRUE)
g_check_words_count %>%
  arrange(desc(abs(n))) %>%
  head(15) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = n > 0)) +
  geom_col(show.legend = FALSE) +
  xlab("Words appearing near「tsj」") +
  ylab("Count") +
  coord_flip()

Word correlation network

g_words_by_art <- data_select %>%
  count(artUrl, word, sort = TRUE)
g_word_pairs <- g_words_by_art %>%
  pairwise_count(word, artUrl, sort = TRUE)
## Warning: `distinct_()` was deprecated in dplyr 0.7.0.
## Please use `distinct()` instead.
## See vignette('programming') for more help
## Warning: `tbl_df()` was deprecated in dplyr 1.0.0.
## Please use `tibble::as_tibble()` instead.
g_word_cors <- g_words_by_art %>%
  group_by(word) %>%
  filter(n() >= 20) %>%
  pairwise_cor(word, artUrl, sort = TRUE)
threshold <- 0.65 # manually tuned
# words appearing in any pair with correlation above the threshold
# (near-duplicate boilerplate terms) are dropped before plotting
remove_words <- g_word_cors %>%
                filter(correlation > threshold) %>%
                .$item1 %>%
                unique()
set.seed(2017)
g_word_cors_new <- g_word_cors %>%
                filter(!(item1 %in% remove_words | item2 %in% remove_words))
g_word_cors_new %>%
  filter(correlation > .505) %>%
  graph_from_data_frame() %>%
  ggraph(layout = "fr") +
  geom_edge_link(aes(edge_alpha = correlation), show.legend = FALSE) + 
  geom_node_point(color = "lightblue", size = 3) +
  geom_node_text(aes(label = name), repel = TRUE) +
  theme_void()
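Here pairwise_cor() scores each pair of words by the phi coefficient of their co-occurrence across articles. A toy example with hypothetical data shows what it measures:

# x and y appear together in every article that contains either, so their
# correlation is 1; z never co-occurs with them, so its pairs score -1.
toy <- data.frame(artUrl = c("a1", "a1", "a2", "a2", "a3"),
                  word   = c("x",  "y",  "x",  "y",  "z"))
pairwise_cor(toy, word, artUrl, sort = TRUE)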

Daily positive and negative sentiment counts

Compute the daily totals of positive and negative words (sentiment_count).

# sentiment_count: artDate, sentiment, count
sentiment_count = data_select %>%
  select(artDate, word) %>%
  inner_join(LIWC) %>% 
  group_by(artDate, sentiment) %>%
  summarise(count = n())
## Joining, by = "word"
## `summarise()` has grouped output by 'artDate'. You can override using the `.groups` argument.
# Check the date range of the matched data
range(sentiment_count$artDate)
## [1] "2020-09-09" "2021-03-24"
sentiment_count %>%
  ggplot() +
  geom_line(aes(x = artDate, y = count, colour = sentiment)) +
  scale_x_date(labels = date_format("%Y/%m/%d"),
               limits = as.Date(c('2020-09-08', '2021-03-25'))) +
  # mark the key dates
  geom_vline(xintercept = as.numeric(as.Date(c('2020-09-10', '2020-10-23', '2021-03-21'))),
             colour = "green", linetype = 4)

  • On 2020/10/23, the wave of "suicide posts" drew comments full of positive words such as「勇者」,「厲害」, and「了不起」, so positive word counts exceeded negative ones that day.

Daily positive and negative sentiment ratios

sentiment_count %>% 
  # normalise within each day
  group_by(artDate) %>%
  mutate(ratio = count/sum(count)) %>%
  # plot
  ggplot() +
  geom_line(aes(x = artDate, y = ratio, colour = sentiment)) +
  scale_x_date(labels = date_format("%m/%d"),
               limits = as.Date(c('2020-09-08', '2021-03-25')))
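The normalisation is per day: each date's ratio is count divided by the sum of that date's positive and negative counts, so the two ratios always sum to 1. A toy check with hypothetical counts:

toy <- data.frame(artDate   = as.Date(c("2020-09-10", "2020-09-10")),
                  sentiment = c("positive", "negative"),
                  count     = c(30, 70))
toy %>% group_by(artDate) %>% mutate(ratio = count / sum(count))
# ratio: 0.3 for positive, 0.7 for negative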

1. Sentiment in the five days around 2020/09/10

sentiment_count %>% filter(artDate <= as.Date("2020-09-17")) %>%
  # normalise within each day
  group_by(artDate) %>%
  mutate(ratio = count/sum(count)) %>%
  # plot
  ggplot() +
  geom_line(aes(x = artDate, y = ratio, colour = sentiment)) +
  scale_x_date(labels = date_format("%m/%d"),
               limits = as.Date(c('2020-09-08', '2020-09-17'))) +
  # mark the key date
  geom_vline(xintercept = as.numeric(as.Date('2020-09-10')), colour = "black", linetype = 4)

  • News media had already reported the story on 2020/09/09; netizens' emotions show signs of gradually easing after the 14th.

2. Sentiment in the five days around 2020/10/23

sentiment_count %>% filter(artDate <= as.Date("2020-10-28") & artDate >= as.Date("2020-10-18")) %>%
  # normalise within each day
  group_by(artDate) %>%
  mutate(ratio = count/sum(count)) %>%
  # plot
  ggplot() +
  geom_line(aes(x = artDate, y = ratio, colour = sentiment)) +
  scale_x_date(labels = date_format("%m/%d"),
               limits = as.Date(c('2020-10-18', '2020-10-28'))) +
  geom_vline(xintercept = as.numeric(as.Date("2020-10-23")), col = "black", linetype = 4)

  • Once state power intervened in a civilian forum, emotions that had been easing were driven back to a peak.

3. Sentiment in the five days around 2021/03/23

sentiment_count %>% filter(artDate <= as.Date("2021-03-25") & artDate >= as.Date("2021-03-18")) %>%
  group_by(artDate) %>%
  mutate(ratio = count/sum(count)) %>%
  ggplot() +
  geom_line(aes(x = artDate, y = ratio, colour = sentiment)) +
  scale_x_date(labels = date_format("%m/%d"),
               limits = as.Date(c('2021-03-16', '2021-03-25'))) +
  geom_vline(xintercept = as.numeric(as.Date("2021-03-23")), col = "black", linetype = 4)

  • On 2021/03/21, the fine imposed on Bahamut set off a burst of negative sentiment; two days later,「小商人」's disclosure filled the comments with words such as「厲害」and「支持」, which the dictionary classifies as positive, somewhat balancing that day's sentiment scores.

4. Word Clouds and Representative Sentiment Words

2020-09-10

1. Word cloud, 2020-09-10

# Draw the word cloud
word_count %>%
  filter(artDate == as.Date('2020-09-10')) %>% 
  select(word, count) %>% 
  group_by(word) %>% 
  summarise(count = sum(count)) %>%
  arrange(desc(count)) %>%
  filter(count > 20) %>% # drop words that appear too rarely
  wordcloud2()
## Adding missing grouping variables: `artDate`

2. Representative positive and negative words, 2020-09-10

# sentiment_sum: word, sentiment, sum
sentiment_sum <- 
  word_count %>%
   filter(artDate == as.Date('2020-09-10')) %>% 
    inner_join(LIWC) %>%
    group_by(word,sentiment) %>%
  summarise(
    sum = sum(count)
  ) %>% 
  arrange(desc(sum)) %>%
  data.frame() 
## Joining, by = "word"
## `summarise()` has grouped output by 'word'. You can override using the `.groups` argument.
sentiment_sum %>%
  top_n(20,wt = sum) %>%
  mutate(word = reorder(word, sum)) %>%
  ggplot(aes(word, sum, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(y = "Contribution to sentiment",
       x = NULL) +
  theme(text=element_text(size=14))+
  coord_flip()

3. Positive vs. negative comparison cloud, 2020-09-10

sentiment_sum %>%
  acast(word ~ sentiment, value.var = "sum", fill = 0) %>%
  comparison.cloud(
    colors = c("salmon", "#72bcd4"), # alphabetical column order: negative = salmon, positive = #72bcd4
                   max.words = 50)

2020-10-23

1. Word cloud, 2020-10-23

# Draw the word cloud
word_count %>%
  filter(artDate == as.Date('2020-10-23')) %>% 
  select(word, count) %>% 
  group_by(word) %>% 
  summarise(count = sum(count)) %>%
  arrange(desc(count)) %>%
  filter(count > 30) %>% # drop words that appear too rarely
  wordcloud2()
## Adding missing grouping variables: `artDate`

2. Representative positive and negative words, 2020-10-23

# sentiment_sum: word, sentiment, sum
sentiment_sum <- 
  word_count %>%
   filter(artDate == as.Date('2020-10-23')) %>% 
    inner_join(LIWC) %>%
    group_by(word,sentiment) %>%
  summarise(
    sum = sum(count)
  ) %>% 
  arrange(desc(sum)) %>%
  data.frame() 
## Joining, by = "word"
## `summarise()` has grouped output by 'word'. You can override using the `.groups` argument.
sentiment_sum %>%
  top_n(20,wt = sum) %>%
  mutate(word = reorder(word, sum)) %>%
  ggplot(aes(word, sum, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(y = "Contribution to sentiment",
       x = NULL) +
  theme(text=element_text(size=14))+
  coord_flip()

3. Positive vs. negative comparison cloud, 2020-10-23

sentiment_sum %>%
  acast(word ~ sentiment, value.var = "sum", fill = 0) %>%
  comparison.cloud(
    colors = c("salmon", "#72bcd4"), # alphabetical column order: negative = salmon, positive = #72bcd4
                   max.words = 50)

2021-03-23

1. Word cloud, 2021-03-23

# Draw the word cloud
word_count %>%
  filter(artDate == as.Date('2021-03-23')) %>% 
  select(word, count) %>% 
  group_by(word) %>% 
  summarise(count = sum(count)) %>%
  arrange(desc(count)) %>%
  filter(count > 15) %>% # drop words that appear too rarely
  wordcloud2()
## Adding missing grouping variables: `artDate`

2. Representative positive and negative words, 2021-03-23

# sentiment_sum: word, sentiment, sum
sentiment_sum <- 
  word_count %>%
   filter(artDate == as.Date('2021-03-23')) %>% 
    inner_join(LIWC) %>%
    group_by(word,sentiment) %>%
  summarise(
    sum = sum(count)
  ) %>% 
  arrange(desc(sum)) %>%
  data.frame() 
## Joining, by = "word"
## `summarise()` has grouped output by 'word'. You can override using the `.groups` argument.
sentiment_sum %>%
  top_n(30,wt = sum) %>%
  mutate(word = reorder(word, sum)) %>%
  ggplot(aes(word, sum, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(y = "Contribution to sentiment",
       x = NULL) +
  theme(text=element_text(size=14))+
  coord_flip()

3. Positive vs. negative comparison cloud, 2021-03-23

sentiment_sum %>%
  acast(word ~ sentiment, value.var = "sum", fill = 0) %>%
  comparison.cloud(
    colors = c("salmon", "#72bcd4"), # alphabetical column order: negative = salmon, positive = #72bcd4
                   max.words = 50)

5. Classifying Articles as Positive or Negative

So far the sentiment analysis has mostly summed over all words. Next we separate positive from negative articles to see whether anything new emerges: we score each article's positive and negative word counts and roughly classify it as positive or negative (a toy illustration of the rule follows the counts below).

# Classify each article by the balance of its positive and negative words
article_type = 
  data_select %>%
  inner_join(LIWC) %>% 
  group_by(artUrl, sentiment) %>%
  summarise(count = n()) %>%
  spread(sentiment, count, fill = 0) %>% # one column per sentiment, missing counts filled with 0
  mutate(type = case_when(positive > negative ~ "positive", 
                          TRUE ~ "negative")) %>% # ties count as negative
  data.frame() 
## Joining, by = "word"
## `summarise()` has grouped output by 'artUrl'. You can override using the `.groups` argument.
# How many articles fall into each type
article_type %>%
  group_by(type) %>%
  summarise(count = n())
## # A tibble: 2 x 2
##   type     count
## * <chr>    <int>
## 1 negative   148
## 2 positive    49
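A toy sketch of the classification rule, with hypothetical counts:

toy <- data.frame(artUrl = c("a1", "a2"), positive = c(5, 2), negative = c(3, 7))
toy %>% mutate(type = case_when(positive > negative ~ "positive",
                                TRUE ~ "negative"))
# a1 -> positive, a2 -> negative; ties fall into "negative"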

Daily counts of positive and negative articles

article_type_date = left_join(article_type[,c("artUrl", "type")], MetaData[,c("artUrl", "artDate")], by = "artUrl")
article_type_date$artDate = as.Date(article_type_date$artDate, format="%Y/%m/%d")

article_type_date %>%
  group_by(artDate,type) %>%
  summarise(count = n()) %>%
  ggplot(aes(x = artDate, y = count, fill = type)) + 
  geom_bar(stat = "identity", position = "dodge")+
  scale_x_date(labels = date_format("%m/%d"),
               limits = as.Date(c('2020-09-01','2021-03-25'))
               )
## `summarise()` has grouped output by 'artDate'. You can override using the `.groups` argument.

Pick out the positive and negative articles and join each back to its tokenized words.

# negative_article: artUrl, word
negative_article <-
  article_type %>%
  filter(type == "negative") %>%
  select(artUrl) %>%
  left_join(data_select[,c("artUrl", "word")], by = "artUrl")

# positive_article: artUrl, word
positive_article <-
  article_type %>%
  filter(type == "positive") %>%
  select(artUrl) %>%
  left_join(data_select[,c("artUrl", "word")], by = "artUrl")

Plot the keywords that contribute most to sentiment in each group of articles.

Sentiment keywords: negative articles

# Keyword contribution plot for the negative articles
negative_article %>%
  inner_join(LIWC) %>%
  group_by(word, sentiment) %>%
  summarise(sum = n()) %>% 
  arrange(desc(sum)) %>%
  data.frame() %>%
  top_n(30, wt = sum) %>%
  ungroup() %>% 
  mutate(word = reorder(word, sum)) %>%
  ggplot(aes(word, sum, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(y = "Contribution to negative sentiment",
       x = NULL) +
  theme(text = element_text(size = 14)) +
  coord_flip()
## Joining, by = "word"
## `summarise()` has grouped output by 'word'. You can override using the `.groups` argument.

Sentiment keywords: positive articles

# Keyword contribution plot for the positive articles
positive_article %>%
  inner_join(LIWC) %>%
  group_by(word, sentiment) %>%
  summarise(sum = n()) %>% 
  arrange(desc(sum)) %>%
  data.frame() %>%
  top_n(30, wt = sum) %>%
  ungroup() %>% 
  mutate(word = reorder(word, sum)) %>%
  ggplot(aes(word, sum, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = "free_y") +
  labs(y = "Contribution to positive sentiment",
       x = NULL) +
  theme(text = element_text(size = 14)) +
  coord_flip()
## Joining, by = "word"
## `summarise()` has grouped output by 'word'. You can override using the `.groups` argument.

Conclusion

Looking at the event as a whole, netizens' attitude was overwhelmingly critical and sarcastic. The article-count analysis shows that the assault case itself drew comparatively little discussion and exposure; it was the Nantou County Government's official demands that platforms delete posts that triggered the stronger reaction, and discussion peaked after the government issued further deletion orders and a platform that refused to comply was fined. In other words, netizens resisted the government's coercive measures: the harder the suppression, the more the discussion grew. The later appearance of the "suicide posts" and the flood of「勇者」replies likewise testify to this defiance of perceived abuses of power.