Introduction

President Donald Trump has maintained a strong presence on social media, specifically Twitter, for years and has been outspoken on political issues. While running for office he used Twitter to build momentum and win voters, and once elected President of the United States he continued to tweet actively. He has also said that social media had more influence on his campaign than any amount of money other politicians spent on theirs.

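However, many of the President’s tweets are controversial and are seen as an inappropriate way for someone holding the highest office in the country to communicate. These tweets still allow some of Donald Trump’s behavior to be analyzed, and a few research questions are raised: which words he uses most, whether he tweets more about ‘fake news’ during prime-time news hours, and whether he uses more negative rhetoric late at night.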

To answer these questions, Donald Trump’s tweets are analyzed using an archive of his tweets from the start of his presidency to the current day.

library(ggplot2)
library(readr)
library(dplyr)
library(ROAuth)
library(tm)
library(stm)
library(stringr)
library(tidyverse)
library(lubridate)
library(wordcloud)
library(ggthemes)
library(tidytext)
library(ggsci)
library(Zelig)

Reading in the Archive

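# Keep tweet text as character strings rather than factors (the default before R 4.0)
trump_tweets <- read.csv("C:/Users/Vorbej1/Desktop/trump_tweet.csv", stringsAsFactors = FALSE)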
head(trump_tweets)

This is a sample view of the data. The archive contains the tweets and the date and time of creation.

Data Preparation

library(lubridate)
library(stringr)
library(tidyverse)
library(stm)
trump_tweets <- trump_tweets %>%
  mutate(created_at = mdy_hms(created_at)) 
  
trump_tweets$hour <- hour(trump_tweets$created_at)
trump_tweets <- na.omit(trump_tweets)
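As a minimal sketch of what the preparation step does, the toy example below parses a few timestamps with `mdy_hms()` and extracts the hour with `hour()`. The timestamps are hypothetical and only assume the month-day-year layout implied by `mdy_hms()`; the archive's exact format is not shown here.

```r
library(lubridate)

# Hypothetical timestamps in month-day-year hour:minute:second form
raw_times <- c("01-20-2017 12:31:53", "11-07-2018 23:05:10", "not a date")

# mdy_hms() returns NA (with a warning) for strings it cannot parse
parsed <- suppressWarnings(mdy_hms(raw_times))

# hour() extracts the hour of day (0-23) from each parsed timestamp
hours <- hour(parsed)
print(hours)

# Rows with an unparsed timestamp are the ones na.omit() would drop
print(sum(is.na(parsed)))
```

Because `na.omit()` drops a row when any column is missing, it is worth checking how many rows are removed before relying on the hourly counts.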

Frequency of the President’s Tweets by Hour

trump_tweets_plot <- trump_tweets %>%
  group_by(hour) %>%
  summarise(count = n())

center_title <- theme(plot.title = element_text(hjust = .5))

ggplot(trump_tweets_plot, aes(x = hour, y = count)) +
  geom_line(size = 1.2, color = 'red') +
  geom_point(color = 'black') +
  theme_calc() +
  labs(title = 'realDonaldTrump - Tweets by Hour', y = 'Number of Tweets', x = 'Hour') +
  center_title +
  theme(text = element_text(size = 12, face = 'bold'))

The frequency of Donald Trump’s tweets by hour shows some interesting patterns. The President is active on Twitter from midnight until around 2:00-3:00 A.M., and at those times he is just as active as he is from 3:00 P.M. to 11:00 P.M. Tweet frequency dips sharply in the early morning and stays low until about 12:00 P.M., when activity spikes.

What Words does the President Say the Most?

library(tm)
library(wordcloud)
cloud <- Corpus(VectorSource(trump_tweets$text))
cloud <- cloud %>%
  tm_map(removePunctuation) %>%
  tm_map(stripWhitespace) %>%
  tm_map(removeNumbers)%>%
  tm_map(content_transformer(tolower)) %>%
  tm_map(removeWords, stopwords('english')) %>%
  tm_map(removeWords, c('amp','realdonaldtrump'))
strwrap(as.character(cloud))

wordcloud(cloud, max.words = 35, scale = c(4, 1),colors = topo.colors(n = 10), random.color = TRUE)

Next, a word cloud is generated to visualize the words the President uses most in his tweets. According to his tweets, Donald Trump is still referencing Hillary Clinton at a high rate. The word ‘crooked’, a term he uses for Hillary Clinton, also appears. It’s interesting to see that two years after winning the presidency, he still mentions and comments on Clinton. ‘Fake’ and ‘news’ also appear in the cloud, which likely reflects Donald Trump’s routine claims that news outlets produce ‘fake news’ and that the media is untrustworthy. It was also interesting to see that he uses adjectives such as ‘great’, ‘many’, ‘good’, and ‘big’ at a high frequency.
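A word cloud only hints at relative frequency, so a plain count table can help confirm what the cloud suggests. The sketch below is one possible way to do that with `tidytext`; `toy_tweets` is a hypothetical stand-in for the archive, and only the `text` column name is taken from the code above.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for trump_tweets; the real archive has a `text` column
toy_tweets <- tibble(text = c("Great jobs numbers, great economy",
                              "Fake news again",
                              "Great rally tonight"))

word_counts <- toy_tweets %>%
  unnest_tokens(word, text) %>%                        # one lowercase word per row
  filter(!word %in% c("amp", "realdonaldtrump")) %>%   # same custom removals as the cloud
  count(word, sort = TRUE)                             # most frequent words first

print(head(word_counts, 5))
```

On the real data, a stop-word filter like the one applied to the corpus above would also be needed before the counts are comparable to the cloud.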

Frequency of Sentiment in Tweets

trump_tweets$text <-  sapply(trump_tweets$text, function(row) iconv(row, 'latin1','ASCII',sub=""))
trump_sentiment <- trump_tweets %>%
  unnest_tokens(word, text)

trump_sentiment_freq <- trump_sentiment %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%
  count(sentiment, sort = TRUE) %>%
  mutate(sentiment = reorder(sentiment, n)) %>%
  # geom_col() already plots the values as given, so no stat argument is needed
  ggplot(aes(sentiment, n, fill = sentiment)) +
  geom_col(color = 'white') +
  theme_calc() +
  labs(x = 'Sentiment', y = 'Count') +
  scale_fill_futurama() +
  theme(text = element_text(size = 15))


trump_sentiment_freq

To perform this analysis, each tweet is split into individual words, one word per row, and the result is joined with the NRC sentiment lexicon.
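The tokenize-then-join step can be sketched on a tiny example. Here `toy_tweets` and `toy_lexicon` are hypothetical; the lexicon only mimics the two-column shape (`word`, `sentiment`) that `get_sentiments("nrc")` returns, so the example runs without downloading the real lexicon.

```r
library(dplyr)
library(tidytext)

toy_tweets <- tibble(id = 1:2,
                     text = c("A great and honest vote",
                              "Phony witch hunt"))

# Hypothetical lexicon: a word may be listed under more than one sentiment
toy_lexicon <- tibble(
  word      = c("great", "honest", "honest", "phony", "witch"),
  sentiment = c("positive", "positive", "trust", "disgust", "disgust")
)

toy_sentiment <- toy_tweets %>%
  unnest_tokens(word, text) %>%           # one lowercase word per row
  inner_join(toy_lexicon, by = "word")    # keep only words found in the lexicon

sentiment_counts <- count(toy_sentiment, sentiment, sort = TRUE)
print(sentiment_counts)
```

Two things follow from the join: words missing from the lexicon drop out entirely, and a word listed under several sentiments is counted once for each of them.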

After plotting the sentiments, I was surprised to see a high volume of positive words. However, as the word cloud showed, the President’s frequent use of adjectives could contribute to this, as could his speaking highly of the country and other patriotic rhetoric.

Negative sentiment has the third highest frequency, which I was expecting to be high. Most of the other sentiments decrease only slightly from one to the next, until we reach disgust, which is the least frequent sentiment.

Frequency of Words by Sentiment

trump_sentiment_freq2 <- trump_sentiment %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%
  count(word, sentiment, sort = TRUE) %>%
  group_by(sentiment) %>%
  top_n(10, n) %>%   # ten most frequent words within each sentiment (ties kept)
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~sentiment, scales = 'free_y', nrow = 3) +
  labs(y = NULL, x = NULL) +
  coord_flip() +
  theme_calc() +
  scale_fill_futurama() +
  theme(text = element_text(size = 20))


trump_sentiment_freq2

When looking at the most commonly used words by sentiment, a few observations stood out. “Phony” and “witch” are commonly used terms in the disgust sentiment. The word “vote” appears under many sentiments because the lexicon does not assign it to a single one, which also tells me that it is a term he uses frequently in his tweets. Aside from the positive, negative, and trust sentiments, the President tweets words from the different sentiments fairly evenly.

Does the President Tweet More About ‘Fake News’ during Prime-time News Hours?

trump_tweets <- trump_tweets %>%
  mutate(fake_count = str_count(text, "fake news|Fake news|Fake News|FAKE NEWS|FAKE news")) %>%
  mutate(fake_count_int = as.integer(str_detect(text, "fake news|Fake news|Fake News|FAKE NEWS"))) %>%
  mutate(prime_news_hours = as.integer(ifelse(hour %in% c(20,21,22,23),1,0)))

fake_news_model <- zelig(fake_count_int ~ prime_news_hours, model = 'logit', data = trump_tweets, cite = FALSE)
x.yes <- setx(fake_news_model, prime_news_hours = 1)
x.no <- setx(fake_news_model, prime_news_hours = 0)
fnm_out <- sim(fake_news_model, x=x.yes,x1=x.no)

summary(fnm_out)
## 
##  sim x :
##  -----
## ev
##           mean          sd        50%       2.5%      97.5%
## [1,] 0.0157156 0.002903854 0.01543189 0.01078092 0.02217665
## pv
##          0     1
## [1,] 0.986 0.014
## 
##  sim x1 :
##  -----
## ev
##           mean          sd        50%       2.5%     97.5%
## [1,] 0.0248147 0.001908464 0.02474562 0.02139599 0.0286891
## pv
##          0     1
## [1,] 0.978 0.022
## fd
##             mean          sd         50%        2.5%      97.5%
## [1,] 0.009099097 0.003474197 0.009206473 0.001540125 0.01577591

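To see whether the President tweets more about ‘fake news’ during prime-time news hours, a logistic regression is performed. Different capitalizations of ‘fake news’ are searched for in his tweets, and each tweet is flagged by whether it was posted between 8:00 P.M. and midnight (hours 20 through 23).

Looking at the simulated expected values, the probability that a tweet mentions ‘fake news’ is about 1.6% during prime-time hours and about 2.5% at other times. The first difference of roughly .009 is therefore a gap of about 0.9 percentage points (not .009%), and it runs against my expectation: he is slightly less likely to tweet about ‘fake news’ during those specific hours. I was expecting a much higher rate during prime time.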
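To make the first difference concrete, the sketch below fits the same kind of logistic regression with base R's `glm()` and computes the difference by hand. The counts in `toy` are hypothetical and chosen only for illustration; they are not the archive's data, and `glm()` is swapped in here for Zelig so the example is self-contained.

```r
# Hypothetical counts, chosen only for illustration (not the archive's data)
toy <- data.frame(
  prime_news_hours = rep(c(1, 0), times = c(1000, 4000)),
  fake_count_int   = c(rep(c(1, 0), times = c(16, 984)),    # prime-time tweets
                       rep(c(1, 0), times = c(100, 3900)))  # all other tweets
)

fit <- glm(fake_count_int ~ prime_news_hours, family = binomial, data = toy)

# Predicted probability of a 'fake news' mention in each scenario
p_prime <- predict(fit, data.frame(prime_news_hours = 1), type = "response")
p_other <- predict(fit, data.frame(prime_news_hours = 0), type = "response")

# Difference taken as the second scenario minus the first, which is the
# direction the expected values in the sim() output above imply
fd <- unname(p_other - p_prime)
print(c(prime = unname(p_prime), other = unname(p_other), fd = fd))
```

With a single binary predictor the fitted probabilities equal the observed proportions in each group, so the first difference is simply the gap between the two rates, expressed on the probability scale rather than as a percentage.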

Does the President Use More Negative Rhetoric Late at Night?

trump_tweets <- trump_tweets %>%
  mutate(neg_wording = str_count(text, "phony|witch|fake|terrible|bad|collusion|failing|dishonest|disaster|crooked"))  %>%
  mutate(neg_wording_int = as.integer(str_detect(text,"phony|witch|fake|terrible|bad|collusion|failing|dishonest|disaster|crooked"))) %>%
  mutate(late_hours = as.integer(ifelse(hour %in% c(0,1,2,3,23),1,0)))

neg_word_model <- zelig(neg_wording_int ~ late_hours, model = 'logit', data = trump_tweets, cite = FALSE)
x.yes2 <- setx(neg_word_model, late_hours = 1)
x.no2 <- setx(neg_word_model, late_hours = 0)
lnm_out <- sim(neg_word_model, x=x.yes2,x1=x.no2)

summary(lnm_out)
## 
##  sim x :
##  -----
## ev
##            mean          sd        50%      2.5%      97.5%
## [1,] 0.04852798 0.004874823 0.04848171 0.0398416 0.05850006
## pv
##          0     1
## [1,] 0.956 0.044
## 
##  sim x1 :
##  -----
## ev
##            mean          sd        50%       2.5%      97.5%
## [1,] 0.07044923 0.003250367 0.07036118 0.06412234 0.07682094
## pv
##          0     1
## [1,] 0.928 0.072
## fd
##            mean          sd        50%       2.5%      97.5%
## [1,] 0.02192125 0.005822963 0.02190086 0.01043644 0.03312568

Another logistic regression is performed to see whether the President uses less friendly rhetoric when tweeting late at night. The late hours during which he tweets frequently (hours 23 and 0 through 3) are compared with the rest of the day to see how often a tweet contains any of a set of negative words.

The first difference from this model is about .02, or roughly 2 percentage points (not .02%). Because the simulation compares the other hours (about 7.0%) against the late hours (about 4.9%), it tells me that Donald Trump is actually about 2 percentage points less likely to tweet these words during late hours. I was expecting the opposite, as I believed his vocabulary late at night would include less positive wording.
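One caveat about the word matching in this model: `str_detect()` with a plain pattern is case-sensitive and matches substrings, so it can miss capitalized words and catch longer words that merely contain a listed term. The sketch below illustrates the difference on hypothetical text; the whole-word, case-insensitive pattern is an alternative to consider, not what the model above uses.

```r
library(stringr)

toy_text <- c("A bad deal", "He played badly", "FAKE story", "Nothing here")

# Plain pattern, as in the model above: case-sensitive substring match
plain <- str_detect(toy_text, "bad|fake")

# Alternative: whole words only, ignoring case
strict <- str_detect(toy_text, regex("\\b(bad|fake)\\b", ignore_case = TRUE))

print(data.frame(toy_text, plain, strict))
```

Switching patterns would change which tweets are flagged, so the regression would need to be rerun before comparing results.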

Summary

Analyzing the President’s tweets revealed some interesting information. I was surprised to see that the highest sentiment frequency was for positive words. However, he still uses a large number of negative words, which ranked third. The word cloud revealed that the President is still tweeting about the former Democratic presidential nominee, Hillary Clinton, at a high rate, two years after winning the presidency. Although I was aware that he frequently tweets about ‘Crooked Hillary’, I didn’t believe it was to this degree. I was also interested to see that ‘phony’ and ‘witch’, which fall under the disgust sentiment, are words he commonly uses.

The regression results do not support either hypothesis. I believed Donald Trump would tweet about ‘Fake News’ during prime-time news hours to dispel stories and coverage about him while the largest audience is viewing, but the probability was about 0.9 percentage points lower during those hours. For the negative vocabulary, the model showed a probability about 2 percentage points lower late at night than at other times.

With the ability to analyze text data from tweets, many research questions can be explored and answered.