JMSC 6116 Lecture 1: A Deep Dive into Xi and Tsai’s Taiwan Speech

This short article analyzes and compares Xi Jinping’s speech “Message to Compatriots in Taiwan” (January 2019) with Tsai Ing-wen’s statement of response and her acceptance speech of January 11, 2020.

First of all, we install the required libraries (if they are missing) and load them.

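# require() loads a package and returns FALSE if it is missing; in that case install it, then load it
if (!require("tm")) { install.packages("tm", repos="https://cran.cnr.berkeley.edu/"); library(tm) }
if (!require("wordcloud")) { install.packages("wordcloud", repos="https://cran.cnr.berkeley.edu/"); library(wordcloud) }
if (!require("wordcloud2")) { install.packages("wordcloud2", repos="https://cran.cnr.berkeley.edu/"); library(wordcloud2) }
if (!require("plotly")) { install.packages("plotly", repos="https://cran.cnr.berkeley.edu/"); library(plotly) }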

Then, let’s obtain a copy of Xi Jinping’s speech. The file has already been uploaded to my GitHub in plain text format. The first five lines of the speech are displayed as a check.

con <- url("https://raw.githubusercontent.com/jmsc-bc4j/JMSC6116_public/master/xi_taiwan.txt") # Establish a connection via url
xi <- readLines(con)  # Read line by line from the connection to a string array xi
close(con) # Remember to close the connection after use
xi[1:5] # List the first five lines of Xi's speech

## [1] "Resolving Taiwan question, complete reunification a historic task"                                                                                                                                                                                    
## [2] "Since 1949, the Communist Party of China (CPC), the Chinese government and the Chinese people have always unwaveringly taken resolving the Taiwan question to realize China's complete reunification as a historic task, President Xi Jinping said."  
## [3] "Breakthroughs in cross-Straits relations over 70 years"                                                                                                                                                                                               
## [4] "Over the 70 years, estrangement between the mainland and Taiwan was ended in line with the common will of compatriots across the Straits, and Taiwan compatriots have made great contributions to the reform and opening-up in the mainland, said Xi."
## [5] "During the seven decades, the mainland and Taiwan reached the 1992 Consensus based on the one-China principle, and the political exchanges across the Straits have reached new heights, Xi said."

Next, we get copies of Tsai’s 2019 statement of response and her 2020 victory speech.

con <- url("https://raw.githubusercontent.com/jmsc-bc4j/JMSC6116_public/master/tsai_taiwan.txt")
tsai <- readLines(con)
close(con)

con <- url("https://raw.githubusercontent.com/jmsc-bc4j/JMSC6116_public/master/tsai_taiwan2020.txt")
tsai2020 <- readLines(con)
close(con)

tsai2020[1:5] # Show the first lines of her victory speech

## [1] "Friends from the domestic and international media, thank you for your patience."                                                                                                                                                                                                                                                                            
## [2] "To begin, I would like to thank everyone who voted today. Regardless of how you voted, by taking part in this election you have put democratic values into practice. With each presidential election, Taiwan is showing the world how much we cherish our free, democratic way of life, and how much we cherish our nation: the Republic of China (Taiwan)."
## [3] "I would also like to offer my respect to Mayor Han and Chairman Soong for completing this democratic journey with me. I will take your constructive criticism with me into my next term. I am confident that although our parties may have different views, we will have many opportunities to cooperate in the future."                                    
## [4] "Today, the Taiwanese people voted to keep the Democratic Progressive Party in office and maintain our majority in the legislature. This result signifies that our administration and legislators have been moving in the right direction over the past four years."                                                                                         
## [5] "I want to thank each and every person who voted for the Tsai-Lai ticket, as well as everyone who supported our DPP candidates. Thank you for choosing democratic and progressive values, and for choosing the path of reform and unity."

Then, we define an R function, Preprocessing, that converts all characters to lower case (for consistent counting), removes punctuation and “stopwords”, and returns the “cleaned” text in a format suitable for the next step. The first portion of the “cleaned” version of Xi’s speech is displayed.

Preprocessing <- function(doc){
  doc <- paste(doc,collapse=" ") # Collapse into one single line
  #create corpus
  doc.corpus <- Corpus(VectorSource(doc))
  #clean up
  doc.corpus <- tm_map(doc.corpus, function(x)chartr("ABCDEFGHIJKLMNOPQRSTUVWXYZ","abcdefghijklmnopqrstuvwxyz",x))### Convert to lower case
  doc.corpus <- tm_map(doc.corpus, removePunctuation)   ### remove punctuation
  doc.corpus <- tm_map(doc.corpus, function(x)removeWords(x,stopwords("english")))  #### remove stopwords
  return(doc.corpus)
}

xi.p <- Preprocessing(xi) # Send the array xi to the function for data cleaning
tsai.p <- Preprocessing(tsai) # Clean Tsai's 2019 statement in the same way
tsai2020.p <- Preprocessing(tsai2020) # Clean Tsai's 2020 victory speech; all three results are of class "Corpus"

substr(xi.p$content,1,200) # Inspect the first 200 characters of Xi's "cleaned" speech

## [1] "resolving taiwan question complete reunification  historic task since 1949  communist party  china cpc  chinese government   chinese people  always unwaveringly taken resolving  taiwan question  reali"

substr(tsai2020.p$content,1,200) # Inspect the first 200 characters of Tsai's "cleaned" 2020 speech

## [1] "friends   domestic  international media thank    patience  begin   like  thank everyone  voted today regardless    voted  taking part   election   put democratic values  practice   presidential electi"
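To make the cleaning steps concrete, here is a minimal base R sketch of the same idea on a toy sentence. The helper clean_text and its tiny stopword list are made up for illustration; they are not part of tm, and the real stopwords("english") list is much longer. Unlike removeWords, which deletes words in place and so leaves the runs of extra spaces seen in the output above, this sketch splits the text into tokens first.

```r
# Hypothetical helper: a base R approximation of the cleaning steps.
# The default stop_words vector is a small illustrative subset, not tm's list.
clean_text <- function(lines, stop_words = c("the", "of", "and", "a", "to", "in", "is")) {
  text <- tolower(paste(lines, collapse = " "))  # collapse to one lower-case string
  text <- gsub("[[:punct:]]", "", text)          # remove punctuation
  tokens <- strsplit(text, "[[:space:]]+")[[1]]  # split on whitespace
  tokens <- tokens[nzchar(tokens)]               # drop empty tokens
  tokens[!tokens %in% stop_words]                # drop stopwords
}

toy <- c("The people of Taiwan,", "and the future of democracy.")
cleaned <- clean_text(toy)
print(cleaned)
```

Only the four content words (people, taiwan, future, democracy) survive, which is the kind of text the later counting steps work on.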

Ok. So far so good. We now compare the highest-frequency terms used in the speeches. To do this, we create a term-document matrix, which stores the terms (by rows) against the three speeches (by columns).

tdm <- TermDocumentMatrix(Corpus(VectorSource(c(xi.p$content,tsai.p$content,tsai2020.p$content)))) # Create a term document matrix
tdm <- as.matrix(tdm) # convert it into a standard matrix
colnames(tdm) <- c("Xi's speech","Tsai's Statement","Tsai's 2020 Speech") # Assign the names to the columns
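If the shape of a term-document matrix is unfamiliar, the following base R sketch builds one by hand for two made-up token lists. The names docs, vocab and toy_tdm are illustrative only; TermDocumentMatrix does the equivalent bookkeeping (plus its own tokenization and filtering) for the real speeches.

```r
# Two toy "documents", already tokenized (made-up data)
docs <- list(
  doc_a = c("taiwan", "reunification", "taiwan", "people"),
  doc_b = c("taiwan", "democracy", "people", "democracy")
)

vocab <- sort(unique(unlist(docs)))  # every distinct term, one per row

# Count how often each vocabulary term occurs in each document
toy_tdm <- vapply(
  docs,
  function(tokens) tabulate(match(tokens, vocab), nbins = length(vocab)),
  integer(length(vocab))
)
rownames(toy_tdm) <- vocab
print(toy_tdm)
```

Each row is a term, each column a document, and each cell is a raw count, which is the layout tdm has after as.matrix.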

Here you go. The plot of the top five high-frequency terms is presented. The variable “Num_of_terms_shown” controls the number of terms displayed. You can change this number in the first line, “Num_of_terms_shown <- 5” (which assigns 5 to the variable “Num_of_terms_shown”), and rerun the program to see the changes.

Num_of_terms_shown <- 5
Create_barplot <- function(cname,t,title2,display_color){
  freqterm <- tdm[,cname]
  barplot <- data.frame(name=names(freqterm),y=freqterm)
  barplot <- barplot[order(barplot$y,decreasing=TRUE),]
  barplot$name <- factor(barplot$name, levels = barplot$name)
  barplot <- barplot[1:t,]
  p_output <- plot_ly(barplot, x = ~name, y = ~y, type = 'bar', 
             text = ~y, textposition = 'auto', name = cname,
             marker = list(color = display_color, line = list(color = display_color, width = 1.5)))
  p_output <- layout(p_output, title = title2, xaxis = list(title = ""), yaxis = list(title = ""))
}

p1 <- Create_barplot("Xi's speech",Num_of_terms_shown,"","red")
p2 <- Create_barplot("Tsai's Statement",Num_of_terms_shown,"","green")
p3 <- Create_barplot("Tsai's 2020 Speech",Num_of_terms_shown,paste0("Top ",Num_of_terms_shown," Terms Used in Xi Jinping's and Tsai Ing-wen's Speeches"),"lightgreen")

p <- subplot(p1,p2,p3,shareY=T)
layout(p, showlegend = T)
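The ranking behind each bar chart is just a sort of one column of counts. A minimal sketch on made-up counts (top_terms and toy_freq are hypothetical names, not part of the code above):

```r
# Hypothetical helper: return the n most frequent terms from a named count vector
top_terms <- function(counts, n = 5) {
  head(sort(counts, decreasing = TRUE), n)
}

# Made-up counts for illustration
toy_freq <- c(taiwan = 12, people = 7, said = 9, china = 4, peace = 2, will = 5)
print(top_terms(toy_freq, 3))
```

Applied to the real data, the same call could take one column of the matrix, for example top_terms(tdm[, "Xi's speech"], Num_of_terms_shown), to see the numbers behind a panel without drawing it.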

Question: Is raw term frequency a good basis for comparison? If not, what would be a better option, and how could we make a better plot? Hints:

print(paste0("Total number of terms of Xi's speech (cleaned version):",sum(tdm[,"Xi's speech"])))

## [1] "Total number of terms of Xi's speech (cleaned version):652"

print(paste0("Total number of terms of Tsai's response statement (cleaned version):",sum(tdm[,"Tsai's Statement"])))

## [1] "Total number of terms of Tsai's response statement (cleaned version):424"

print(paste0("Total number of terms of Tsai's 2020 victory speech (cleaned version):",sum(tdm[,"Tsai's 2020 Speech"])))

## [1] "Total number of terms of Tsai's 2020 victory speech (cleaned version):541"
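Because the three cleaned texts differ in length (652, 424 and 541 terms), raw counts favor the longer text. One possible answer, sketched here on made-up counts rather than the real matrix, is to divide each column by its total so that every speech is measured as a share of its own words. The names toy_counts and rel_freq are illustrative.

```r
# Made-up counts: rows are terms, columns are speeches of different lengths
toy_counts <- matrix(
  c(30, 10, 0,    # Speech A (40 terms in total)
    12,  3, 5),   # Speech B (20 terms in total)
  nrow = 3,
  dimnames = list(c("taiwan", "reunification", "democracy"),
                  c("Speech A", "Speech B"))
)

# Share of each term within its own speech (each column sums to 1)
rel_freq <- prop.table(toy_counts, margin = 2)
print(round(rel_freq, 3))

# The same information expressed as occurrences per 1,000 terms
print(round(rel_freq * 1000))
```

In this toy case “taiwan” appears 30 times in Speech A and only 12 times in Speech B, yet its share is 0.75 versus 0.60, a much smaller gap than the raw counts suggest. Since tdm is an ordinary matrix after as.matrix, prop.table(tdm, margin = 2) should give the analogous table for the real speeches.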

Next, we visualize the term frequencies using the wordcloud function and place the three word clouds side by side.

min.freq <- 3 # Minimum word frequency for a term to be shown
par(mfrow=c(1,3)) # 1x3 panel plot
par(mar=c(0.75, 0.75, 0.75, 0.75)) # Set the plot margin
par(bg="black") # background color is black
par(col.main="white") # Title color is white
wordcloud(xi.p, scale=c(5,.5),min.freq=min.freq, max.words=Inf, random.order=F, colors=brewer.pal(8, "Accent"),family="serif")   
title("Xi's Speech")
wordcloud(tsai.p, scale=c(5,.5),min.freq=min.freq, max.words=Inf, random.order=F, colors=brewer.pal(8, "Accent"),family="mono")   
title("Tsai's Statement")
wordcloud(tsai2020.p, scale=c(5,.5),min.freq=min.freq, max.words=Inf, random.order=F, colors=brewer.pal(8, "Accent"), family="Georgia")   
title("Tsai's 2020 Speech")

Finally, we generate a comparison word cloud, which compares the relative frequency with which each term was used across the three speeches. For example, Xi used the words “reunification”, “compatriots”, and “said” more frequently than Tsai did, so the word cloud prints these terms on Xi’s side. By the same token, you will find that “taiwanese”, “crossstrait”, “democracy”, and “consultation” are closer to Tsai’s side. The plot shows the difference in language use between the two leaders and sheds light on the discourse of their speeches.

comparison.cloud(tdm,max.words=100,random.order=FALSE, colors=c("red","green","lightgreen"),family="Georgia")
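As the wordcloud documentation describes it, comparison.cloud sizes each term by how far its rate in one document exceeds its average rate across documents, and places the term with the document where that excess is largest. The base R sketch below is our own approximation of that calculation on made-up counts, not the package's code; toy_matrix, rates, deviation, max_dev and owner are illustrative names.

```r
# Made-up counts: rows are terms, columns are speeches
toy_matrix <- matrix(
  c(30, 10, 0,
    12,  3, 5),
  nrow = 3,
  dimnames = list(c("taiwan", "reunification", "democracy"),
                  c("Speech A", "Speech B"))
)

rates <- prop.table(toy_matrix, margin = 2)  # rate of each term within each speech
deviation <- rates - rowMeans(rates)         # rate minus the term's average rate

max_dev <- apply(deviation, 1, max)                          # drives the word size
owner <- colnames(deviation)[apply(deviation, 1, which.max)] # side the word lands on

print(data.frame(term = rownames(toy_matrix), owner = owner, max_dev = round(max_dev, 3)))
```

Here “democracy” lands on Speech B's side with the largest deviation (0.125), while “taiwan” and “reunification” land on Speech A's side. Because the calculation uses rates rather than raw counts, a shorter speech is not penalized for its length.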

King-wa Fu

January 17, 2020