This short article analyzes and compares Xi Jinping’s speech “Message to Compatriots in Taiwan” (January 2019) with Tsai Ing-wen’s statement in response and her election victory speech on January 11, 2020.
First of all, we install the required libraries and load them into the session.
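# require() returns FALSE when a package is missing, so load it explicitly after installing
if (!require("tm")) { install.packages("tm", repos="https://cran.cnr.berkeley.edu/"); library("tm") }
if (!require("wordcloud")) { install.packages("wordcloud", repos="https://cran.cnr.berkeley.edu/"); library("wordcloud") }
if (!require("wordcloud2")) { install.packages("wordcloud2", repos="https://cran.cnr.berkeley.edu/"); library("wordcloud2") }
if (!require("plotly")) { install.packages("plotly", repos="https://cran.cnr.berkeley.edu/"); library("plotly") }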
Then, let’s obtain a copy of Xi Jinping’s speech. The file has already been uploaded to my GitHub repository in plain-text format. The first five lines of the speech are displayed as a check.
con <- url("https://raw.githubusercontent.com/jmsc-bc4j/JMSC6116_public/master/xi_taiwan.txt") # Establish a connection via url
xi <- readLines(con) # Read line by line from the connection to a string array xi
close(con) # Remember to close the connection after use
xi[1:5] # List the first five lines of Xi's speech
## [1] "Resolving Taiwan question, complete reunification a historic task"
## [2] "Since 1949, the Communist Party of China (CPC), the Chinese government and the Chinese people have always unwaveringly taken resolving the Taiwan question to realize China's complete reunification as a historic task, President Xi Jinping said."
## [3] "Breakthroughs in cross-Straits relations over 70 years"
## [4] "Over the 70 years, estrangement between the mainland and Taiwan was ended in line with the common will of compatriots across the Straits, and Taiwan compatriots have made great contributions to the reform and opening-up in the mainland, said Xi."
## [5] "During the seven decades, the mainland and Taiwan reached the 1992 Consensus based on the one-China principle, and the political exchanges across the Straits have reached new heights, Xi said."
Next, we get copies of Tsai’s response statement from 2019 and her victory speech from 2020.
con <- url("https://raw.githubusercontent.com/jmsc-bc4j/JMSC6116_public/master/tsai_taiwan.txt")
tsai <- readLines(con)
close(con)
con <- url("https://raw.githubusercontent.com/jmsc-bc4j/JMSC6116_public/master/tsai_taiwan2020.txt")
tsai2020 <- readLines(con)
close(con)
tsai2020[1:5] # Show the first lines of her victory speech
## [1] "Friends from the domestic and international media, thank you for your patience."
## [2] "To begin, I would like to thank everyone who voted today. Regardless of how you voted, by taking part in this election you have put democratic values into practice. With each presidential election, Taiwan is showing the world how much we cherish our free, democratic way of life, and how much we cherish our nation: the Republic of China (Taiwan)."
## [3] "I would also like to offer my respect to Mayor Han and Chairman Soong for completing this democratic journey with me. I will take your constructive criticism with me into my next term. I am confident that although our parties may have different views, we will have many opportunities to cooperate in the future."
## [4] "Today, the Taiwanese people voted to keep the Democratic Progressive Party in office and maintain our majority in the legislature. This result signifies that our administration and legislators have been moving in the right direction over the past four years."
## [5] "I want to thank each and every person who voted for the Tsai-Lai ticket, as well as everyone who supported our DPP candidates. Thank you for choosing democratic and progressive values, and for choosing the path of reform and unity."
Then, we define an R function, Preprocessing, that converts all characters to lower case (so that counting is consistent), removes punctuation and “stopwords”, and “cleans” the text into a format suitable for the next step. The first 200 characters of the “cleaned” versions of Xi’s speech and Tsai’s 2020 speech are displayed.
Preprocessing <- function(doc){
  doc <- paste(doc,collapse=" ") # Collapse into one single string
  # Create the corpus
  doc.corpus <- Corpus(VectorSource(doc))
  # Clean up
  doc.corpus <- tm_map(doc.corpus, function(x)chartr("ABCDEFGHIJKLMNOPQRSTUVWXYZ","abcdefghijklmnopqrstuvwxyz",x)) # Convert to lower case
  doc.corpus <- tm_map(doc.corpus, removePunctuation) # Remove punctuation
  doc.corpus <- tm_map(doc.corpus, function(x)removeWords(x,stopwords("english"))) # Remove stopwords
  return(doc.corpus)
}
xi.p <- Preprocessing(xi) # Send the array xi to the function for data cleaning
tsai.p <- Preprocessing(tsai) # Same for Tsai's 2019 response statement
tsai2020.p <- Preprocessing(tsai2020) # Same for Tsai's 2020 victory speech (all three results are of class "Corpus")
substr(xi.p$content,1,200) # Inspect the first 200 characters of Xi's "cleaned" speech
## [1] "resolving taiwan question complete reunification historic task since 1949 communist party china cpc chinese government chinese people always unwaveringly taken resolving taiwan question reali"
substr(tsai2020.p$content,1,200) # Inspect the first 200 characters of Tsai's "cleaned" 2020 speech
## [1] "friends domestic international media thank patience begin like thank everyone voted today regardless voted taking part election put democratic values practice presidential electi"
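To make the three cleaning steps concrete, here is a minimal base-R sketch that applies them by hand to a single toy sentence. The sentence and the tiny stopword list `toy_stopwords` are illustrative assumptions, not taken from the speeches or from tm's English stopword list, and the sketch only mimics what Preprocessing does rather than reproducing tm's internals.

```r
# Toy input; the sentence and the short stopword list are illustrative assumptions
toy <- "The people of Taiwan cherish, and will defend, their democracy!"
toy_stopwords <- c("the", "of", "and", "will", "their")

lowered  <- tolower(toy)                        # Step 1: convert to lower case
no_punct <- gsub("[[:punct:]]", "", lowered)    # Step 2: remove punctuation
tokens   <- strsplit(no_punct, "\\s+")[[1]]     # Split the string on whitespace
kept     <- tokens[!tokens %in% toy_stopwords]  # Step 3: drop the stopwords
cleaned  <- paste(kept, collapse = " ")         # Glue the remaining words back together
print(cleaned)
```

Only the content-bearing words survive, which is what makes the later counts meaningful.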
Ok. So far so good. We now compare the highest-frequency terms used in the speeches. To do this, we create a term-document matrix, which stores the terms used (in rows) and the three speeches (in columns).
tdm <- TermDocumentMatrix(Corpus(VectorSource(c(xi.p$content,tsai.p$content,tsai2020.p$content)))) # Create a term document matrix
tdm <- as.matrix(tdm) # convert it into a standard matrix
colnames(tdm) <- c("Xi's speech","Tsai's Statement","Tsai's 2020 Speech") # Assign the names to the columns
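If the shape of a term-document matrix is unfamiliar, the following base-R sketch builds one by hand for two toy documents. The names `docs`, `doc_a`, `doc_b` and `toy_tdm` are hypothetical, and the counting is done with `table()` rather than with tm, so treat it as an illustration of the data structure only.

```r
# Two toy "cleaned" documents (illustrative assumptions)
docs <- c(doc_a = "peace taiwan peace", doc_b = "taiwan democracy")

tokens <- strsplit(docs, " ", fixed = TRUE)  # One vector of words per document
terms  <- sort(unique(unlist(tokens)))       # The vocabulary across all documents

# Count every vocabulary term in every document; zero counts are kept
toy_tdm <- sapply(tokens, function(tok) as.integer(table(factor(tok, levels = terms))))
dimnames(toy_tdm) <- list(terms, names(docs))  # Terms in rows, documents in columns
print(toy_tdm)
```

Each cell holds the number of times the row's term appears in the column's document, which is the same layout as `tdm` above.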
Here you go. The plot of the top five high-frequency terms is presented. The variable “Num_of_terms_shown” controls the number of terms displayed. The first line, “Num_of_terms_shown <- 5”, assigns 5 to this variable; change the number and rerun the program to see the effect.
Num_of_terms_shown <- 5
Create_barplot <- function(cname,t,title2,display_color){
  freqterm <- tdm[,cname] # Term counts for the chosen column
  bar_data <- data.frame(name=names(freqterm),y=freqterm) # Named bar_data to avoid shadowing base barplot()
  bar_data <- bar_data[order(bar_data$y,decreasing=TRUE),] # Sort by descending frequency
  bar_data$name <- factor(bar_data$name, levels = bar_data$name) # Keep the sorted order on the x-axis
  bar_data <- bar_data[1:t,] # Keep the top t terms
  p_output <- plot_ly(bar_data, x = ~name, y = ~y, type = 'bar',
    text = ~y, textposition = 'auto', name = cname,
    marker = list(color = display_color, line = list(color = display_color, width = 1.5)))
  p_output <- layout(p_output, title = title2, xaxis = list(title = ""), yaxis = list(title = ""))
  return(p_output) # Return the plot explicitly
}
p1 <- Create_barplot("Xi's speech",Num_of_terms_shown,"","red")
p2 <- Create_barplot("Tsai's Statement",Num_of_terms_shown,"","green")
p3 <- Create_barplot("Tsai's 2020 Speech",Num_of_terms_shown,paste0("Top ",Num_of_terms_shown," Terms Used in Xi Jinping's and Tsai Ing-wen's Speeches"),"lightgreen")
p <- subplot(p1,p2,p3,shareY=TRUE)
layout(p, showlegend = TRUE)
Question: Is raw term frequency a good basis for comparison? If not, what is a better option? How can we make a better plot? Hints:
print(paste0("Total number of terms of Xi's speech (cleaned version):",sum(tdm[,"Xi's speech"])))
## [1] "Total number of terms of Xi's speech (cleaned version):652"
print(paste0("Total number of terms of Tsai's response statement (cleaned version):",sum(tdm[,"Tsai's Statement"])))
## [1] "Total number of terms of Tsai's response statement (cleaned version):424"
print(paste0("Total number of terms of Tsai's 2020 victory speech (cleaned version):",sum(tdm[,"Tsai's 2020 Speech"])))
## [1] "Total number of terms of Tsai's 2020 victory speech (cleaned version):541"
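The three cleaned speeches differ in length (652, 424 and 541 terms), so raw counts tend to favor the longest one. One possible answer to the question, offered here only as a sketch, is to divide each count by its column total so that every speech is measured in shares of its own length. The matrix `toy_counts` and its column names are made-up numbers for illustration.

```r
# Toy counts for two speeches of different lengths (illustrative assumptions)
toy_counts <- matrix(c(12, 3, 6, 3), nrow = 2,
  dimnames = list(c("taiwan", "peace"), c("long_speech", "short_speech")))

# Divide every count by its column total: each column now sums to 1
rel_freq <- prop.table(toy_counts, margin = 2)
print(round(100 * rel_freq, 1))  # Shares in percent
```

In raw counts both speeches use “peace” three times, but as a share of the text it weighs more in the shorter speech. The same call, prop.table(tdm, margin = 2), should work on the matrix built earlier, and the resulting shares could be passed to the bar plot in place of the counts.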
Next, we visualize the term frequencies with the wordcloud function and put the three clouds side by side.
min.freq <- 3 # Minimum word frequency for a term to be plotted
par(mfrow=c(1,3)) # 1x3 panel plot
par(mar=c(0.75, 0.75, 0.75, 0.75)) # Set the plot margin
par(bg="black") # background color is black
par(col.main="white") # Title color is white
wordcloud(xi.p, scale=c(5,.5),min.freq=min.freq, max.words=Inf, random.order=FALSE, colors=brewer.pal(8, "Accent"),family="serif")
title("Xi's Speech")
wordcloud(tsai.p, scale=c(5,.5),min.freq=min.freq, max.words=Inf, random.order=FALSE, colors=brewer.pal(8, "Accent"),family="mono")
title("Tsai's Statement")
wordcloud(tsai2020.p, scale=c(5,.5),min.freq=min.freq, max.words=Inf, random.order=FALSE, colors=brewer.pal(8, "Accent"), family="Georgia")
title("Tsai's 2020 Speech")
Finally, we generate a comparison word cloud, which compares the relative frequency with which each term was used in the three speeches. For example, Xi used the words “reunification”, “compatriots” and “said” more frequently than Tsai did, so the word cloud prints these terms on Xi’s side. By the same token, “taiwanese”, “crossstrait”, “democracy” and “consultation” appear closer to Tsai’s side. (Terms appear in their cleaned form, which is why “cross-strait” shows up as “crossstrait”.) The plot shows the differences in the leaders’ language and sheds light on the discourse of their speeches.
comparison.cloud(tdm,max.words=100,random.order=FALSE, colors=c("red","green","lightgreen"),family="Georgia")
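To see why a term lands on one leader’s side, it helps to compute the underlying quantity by hand. As I understand the wordcloud documentation, comparison.cloud converts counts to per-document rates, subtracts each term’s average rate across documents, and places the term with the document where this deviation is largest. The sketch below follows that description on a toy matrix; `toy_counts`, `speaker_a` and `speaker_b` are hypothetical, and this is an approximation of the idea rather than the package’s actual code.

```r
# Toy counts: terms in rows, speakers in columns (illustrative assumptions)
toy_counts <- matrix(c(8, 2,
                       2, 8,
                       5, 5), nrow = 3, byrow = TRUE,
  dimnames = list(c("reunification", "democracy", "taiwan"),
                  c("speaker_a", "speaker_b")))

rates <- prop.table(toy_counts, margin = 2)  # Rate of each term within each document
dev   <- rates - rowMeans(rates)             # Deviation from the term's average rate

size <- apply(dev, 1, max)                        # Largest deviation drives the word size
side <- colnames(dev)[apply(dev, 1, which.max)]   # Document where that deviation occurs
names(side) <- rownames(dev)
print(data.frame(side = side, size = round(size, 3)))
```

A term used equally by both speakers, such as “taiwan” here, has a deviation of zero and therefore carries no weight in the comparison, however frequent it is.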