Let’s load the libraries. Make sure these libraries are installed on your computer.

library(ggplot2)
library(lubridate)
library(scales)
library(tm)
library(stringr)
library(wordcloud)
library(syuzhet)
library(reshape2)
library(dplyr)
library(twitteR)

Set up Twitter API for data mining. Use your own API key and token.
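
The authentication call itself is not shown here. A minimal sketch, assuming you use setup_twitter_oauth() from twitteR with all four credentials (which is what produces the "Using direct authentication" message), might look like the following. The environment variable names are hypothetical; substitute however you store your own keys.

```r
library(twitteR)

# Hypothetical environment variable names; store your own credentials
# however you prefer, but avoid hard-coding them in a shared script.
creds <- Sys.getenv(c("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
                      "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_SECRET"))

# Only attempt authentication when all four values are present.
if (all(nzchar(creds))) {
  setup_twitter_oauth(consumer_key    = creds[["TWITTER_CONSUMER_KEY"]],
                      consumer_secret = creds[["TWITTER_CONSUMER_SECRET"]],
                      access_token    = creds[["TWITTER_ACCESS_TOKEN"]],
                      access_secret   = creds[["TWITTER_ACCESS_SECRET"]])
} else {
  message("Twitter credentials not found; set them before continuing.")
}
```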

## [1] "Using direct authentication"

With the API set up, run the following code to download Trump’s tweets. Let’s go with his latest 2,000 tweets. We will exclude his retweets, but include his replies to other users.

alltweets <- twListToDF(userTimeline("realdonaldtrump", n=2000, includeRts=FALSE, excludeReplies=FALSE))

Run the following code to clean up the timestamps in Trump’s tweets. Specifically, we will convert the timestamps of his tweets to the same time zone.

#parse the timestamps (year, month, day, hour, minute and second) into date-time objects
alltweets$created <- ymd_hms(alltweets$created) 
#convert to Eastern Time Zone
alltweets$created <- with_tz(alltweets$created, "America/New_York") 
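
To see what the conversion does, here is a small sketch on a single made-up timestamp (not taken from the tweet data). It assumes only the lubridate functions already used above. Note that with_tz() changes how the time is displayed, not the moment it refers to.

```r
library(lubridate)

# A made-up timestamp, parsed as UTC (the default for ymd_hms)
utc_time <- ymd_hms("2017-10-01 03:30:00")

# Same instant, displayed on the Eastern clock
eastern_time <- with_tz(utc_time, "America/New_York")

format(utc_time, "%Y-%m-%d %H:%M")      # "2017-10-01 03:30"
format(eastern_time, "%Y-%m-%d %H:%M")  # "2017-09-30 23:30"

# The two objects still refer to the same moment
utc_time == eastern_time                # TRUE
```

A late-night tweet can therefore fall on a different calendar day depending on the time zone, which matters once tweets are grouped by day.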

Conduct a quick sentiment analysis of the presidential tweets. We will use the R library called syuzhet to extract sentiments from tweets.

#remove @mentions from the column called text and put the cleaned text in the new column called clean_text
alltweets$clean_text <- str_replace_all(alltweets$text, "@\\w+", "")
#extract sentiments from tweets. The sentiment information is stored in the dataframe called Sentiment
Sentiment <- get_nrc_sentiment(alltweets$clean_text)
#Combine two dataframes (alltweets and Sentiment) into a new one called alltweets_senti
alltweets_senti <- cbind(alltweets, Sentiment)
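
If you want to see what these two steps produce before running them on the full dataset, the sketch below applies them to two made-up sentences (these are illustrative texts, not real tweets). get_nrc_sentiment() returns one row per text and ten columns: eight emotions plus negative and positive, each holding a count of matching lexicon words.

```r
library(stringr)
library(syuzhet)

# Two made-up example texts (not real tweets)
example_text <- c("@someone Thank you for the wonderful rally tonight!",
                  "A terrible and dishonest story.")

# Strip @mentions, as in the step above
clean_example <- str_replace_all(example_text, "@\\w+", "")

# One row per text; each column counts lexicon matches
example_sentiment <- get_nrc_sentiment(clean_example)
names(example_sentiment)
dim(example_sentiment)
```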

The previous step creates a dataframe called alltweets_senti. Check out the new dataframe and compare it to the old dataframe alltweets. See what new columns are created. Then run the following code to calculate daily average sentiment score.

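#create a new column in alltweets_senti for date information. Pass the time zone explicitly so each tweet is assigned to its Eastern Time date (depending on your R version, as.Date may otherwise fall back to UTC).
alltweets_senti$created_date <- as.Date(alltweets_senti$created, tz = "America/New_York")

#convert the dates to a factor so that melt will treat created_date as the id variable
alltweets_senti$created_date <- as.factor(alltweets_senti$created_date)

#calculate the daily summary statistics. The daily summary statistics will be in the new dataframe called dailysentiment.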
dailysentiment <- alltweets_senti %>% group_by(created_date) %>% 
  summarise(anger = mean(anger), 
            anticipation = mean(anticipation), 
            disgust = mean(disgust), 
            fear = mean(fear), 
            joy = mean(joy), 
            sadness = mean(sadness), 
            surprise = mean(surprise), 
            trust = mean(trust)) %>% melt
## Using created_date as id variables
names(dailysentiment) <- c("day", "sentiment", "meanvalue")
dailysentiment$day <- as.Date(dailysentiment$day)
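
The summarise-then-melt pattern is easier to follow on a tiny example. The sketch below uses a made-up data frame with only two emotions; the column names mirror the ones above, but the values are invented for illustration.

```r
library(dplyr)
library(reshape2)

# Toy scores for three made-up tweets on two days
toy <- data.frame(
  created_date = factor(c("2017-10-01", "2017-10-01", "2017-10-02")),
  anger = c(1, 3, 0),
  joy   = c(0, 2, 4)
)

# Average each emotion within a day (wide format: one column per emotion)
toy_daily <- toy %>%
  group_by(created_date) %>%
  summarise(anger = mean(anger), joy = mean(joy)) %>%
  as.data.frame()

# Reshape to long format: one row per day-emotion pair
toy_long <- melt(toy_daily, id.vars = "created_date")
toy_long
```

The long format, with one row per day and emotion, is what ggplot2 needs in order to draw one line per sentiment.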

Let’s use the library ggplot2 to visualize the daily sentiments in Trump’s tweets.

ggplot(data = dailysentiment, aes(x = day, y = meanvalue, group = sentiment)) +
  geom_line(size = 1.5, alpha = 0.7, aes(color = sentiment)) +
  geom_point(size = 0.5) +
  ylim(0, NA) +
  theme(legend.title=element_blank(), axis.title.x = element_blank()) +
  ylab("Average sentiment score") + 
  ggtitle("Sentiments Over Time")

Want a better visual? Use the R library rCharts for interactive visualization.

To use rCharts, you must first install the library devtools. After devtools is installed, run:

require(devtools)
## Loading required package: devtools
install_github('ramnathv/rCharts')
## Skipping install of 'rCharts' from a github remote, the SHA1 (479a4f98) has not changed since last install.
##   Use `force = TRUE` to force installation
library(rCharts)
dailysentiment$day_show <- as.character(dailysentiment$day)

h1 <- hPlot(x = "day_show", y = "meanvalue", data = dailysentiment, type = "line", group = "sentiment")
h1$print("chart5",include_assets = TRUE)

This tutorial is developed for COMM497DB Fall 2017, taught at UMass-Amherst.

If you find this tutorial helpful and would like to use it in your projects, please acknowledge the source:

Xu, Weiai W. (2017). How to Detect Sentiments from Donald Trump’s Tweets?. Amherst, MA: http://curiositybits.com