I wanted to take a look at the reactions of two of my preferred political commentators, Bill Maher and Charles Blow. My newly developed skills in Twitter sentiment analysis come in handy here: I can get a sense of what these pundits have to say without thoroughly engaging with their material. It’s too soon for that.
@BillMaher America needs you more than ever, with me and all the rest of #TheResistance, until we can figure out how to really #MAGA! #WereStillHere
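# load packages (assumes Twitter OAuth is already configured via setup_twitter_oauth())
library(twitteR); library(dplyr); library(stringr); library(tidytext)
library(knitr); library(wordcloud); library(RColorBrewer); library(ggplot2)
# load tweets and source
number_of_tweets <- 2000
RT <- userTimeline('@BillMaher', n = number_of_tweets)
RT_df <- twListToDF(RT)
RT_tweets <- RT_df %>%
  select(id, statusSource, text)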
Bill Maher mainly tweets from these sources. I’ve selected the top 10 to make sure I include all the platforms that he uses.
The frequency breakdown of the origin of his tweets:
# trim tweet to cleanly reveal status source and percentage of tweets from that source
RT_df$statusSource = substr(RT_df$statusSource,
regexpr('>', RT_df$statusSource) + 1,
regexpr('</a>', RT_df$statusSource) - 1)
RT_platform <- RT_df %>% group_by(statusSource) %>% summarise(n = n()) %>% mutate(percent = n/sum(n)) %>% arrange(desc(n))
kable(RT_platform %>% select(Origin_of_Tweet = statusSource, Number_of_tweets = n, Percent = percent) %>% top_n(10), digits = 2)
Origin_of_Tweet | Number_of_tweets | Percent |
---|---|---|
Twitter Web Client | 130 | 0.57 |
Twitter for iPhone | 43 | 0.19 |
Media Studio | 24 | 0.11 |
WhoSay | 17 | 0.07 |
| | 7 | 0.03 |
iOS | 4 | 0.02 |
SnapStream TV Search | 3 | 0.01 |
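The source-trimming step can be tried in isolation. The sketch below uses base R only; the `status_source` strings are hypothetical examples of the HTML-anchor form in which the source field arrives, and `extract_source` is an illustrative helper name, not part of any package.

```r
# Hypothetical statusSource values in HTML-anchor form (illustrative, not real data)
status_source <- c(
  '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
  '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>',
  '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>'
)

# Keep only the text between the first '>' and the closing '</a>'
extract_source <- function(x) {
  substr(x,
         regexpr(">", x, fixed = TRUE) + 1,
         regexpr("</a>", x, fixed = TRUE) - 1)
}

platform <- extract_source(status_source)

# Share of tweets per platform, largest first
platform_share <- sort(prop.table(table(platform)), decreasing = TRUE)
print(round(platform_share, 2))
```

The same idea drives the table above: strip the markup, count tweets per platform, and divide by the total.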
In order to do a sentiment analysis of the tweets, the words in the sentence or phrase need to be isolated.
Some of Bill Maher’s common words that will be matched to sentiments include:
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
RT_words <- RT_df %>%
filter(!str_detect(text, '^"')) %>%
mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
unnest_tokens(word, text, token = "regex", pattern = reg) %>%
filter(!word %in% stop_words$word,
str_detect(word, "[a-z]"))
RT_words %>% count(word) %>% arrange(n) %>% with(wordcloud(word, n, max.words = 100, scale=c(5,.5),min.freq=5, random.order=FALSE, rot.per=.15, colors=brewer.pal(9,"Dark2")))
#list of most common words used in tweets
kable(head(RT_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% select(Word = word, Number_of_tweets = n) %>% top_n(4)))
Word | Number_of_tweets |
---|---|
trump | 41 |
tonight | 20 |
hillary | 18 |
live | 15 |
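To make the word-isolation step concrete, here is a minimal base R sketch of what the tokenizer does to a single tweet. It reuses the `reg` pattern from the code above, but swaps `unnest_tokens` for `strsplit`; `tokenize_tweet` and `toy_stop_words` are hypothetical names, and the stop-word list is a tiny stand-in for tidytext's `stop_words`.

```r
# Same splitting pattern as `reg`: split on anything that is not a letter,
# digit, '#', '@', or an apostrophe inside a word
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

# Tiny illustrative stop-word list (an assumption, not tidytext's stop_words)
toy_stop_words <- c("i'd", "a", "to", "that", "for", "the")

tokenize_tweet <- function(text) {
  text  <- gsub("https://t.co/[A-Za-z0-9]+", "", text)    # drop shortened links
  words <- strsplit(tolower(text), reg, perl = TRUE)[[1]]  # split into candidate words
  words <- words[nzchar(words)]                            # drop empty pieces between separators
  # remove stop words and anything without at least one letter
  words[!words %in% toy_stop_words & grepl("[a-z]", words)]
}

tokens <- tokenize_tweet("I'd give a week's pay to hear that sermon! https://t.co/LEIb4gYGlv")
print(tokens)
```

Keeping `#` and `@` out of the split pattern is what lets hashtags and mentions survive as single tokens.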
In order to do a sentiment analysis, we need to load the sentiment categories.
# NRC word-emotion lexicon; on newer tidytext releases, use get_sentiments("nrc") instead
nrc <- sentiments %>% filter(lexicon == "nrc") %>% select(word, sentiment)
With the sentiments loaded, we can categorize the words with a table join.
Surprisingly, the most frequent sentiment expressed with Bill Maher’s words is:
RT_words_sentiments <- RT_words %>% inner_join(nrc, by = "word")
kable(RT_words_sentiments %>% group_by(sentiment) %>% summarise(n = n()) %>% arrange(desc(n)) %>% select(Sentiment = sentiment, Number_of_tweets = n) %>% top_n(1))
Sentiment | Number_of_tweets |
---|---|
positive | 172 |
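The join itself is easy to see on a toy example. This sketch uses base R's `merge` as a stand-in for `inner_join`; the `words` and `toy_lexicon` data frames are hypothetical, and the real NRC lexicon is far larger.

```r
# Hypothetical tokenized words, one row per (tweet id, word)
words <- data.frame(
  id   = c("1", "1", "2", "3"),
  word = c("pay", "sermon", "lunatic", "tonight"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the lexicon (an assumption); a word may carry several sentiments
toy_lexicon <- data.frame(
  word      = c("pay", "sermon", "lunatic", "lunatic"),
  sentiment = c("positive", "positive", "negative", "disgust"),
  stringsAsFactors = FALSE
)

# Inner join: words missing from the lexicon ("tonight") drop out,
# and words with several sentiments ("lunatic") produce several rows
words_sentiments <- merge(words, toy_lexicon, by = "word")

# Count matched words per sentiment, most frequent first
sentiment_counts <- sort(table(words_sentiments$sentiment), decreasing = TRUE)
print(sentiment_counts)
```

Two consequences are worth keeping in mind when reading the counts: unmatched words are silently ignored, and a single word can add to more than one sentiment.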
Below is a slice of recent tweets, each shown with the word that places it in the “positive” sentiment.
# identify tweets that align with the 'positive' sentiment
pos_tw_ids <- RT_words_sentiments %>% filter(sentiment == "positive") %>% distinct(id, word)
kable(RT_df %>% inner_join(pos_tw_ids, by = "id") %>% select(Date_Time = created,Tweet = text, Word = word) %>% slice(1:4))
Date_Time | Tweet | Word |
---|---|---|
2016-11-20 00:59:35 | I’d give a week’s pay to hear that sermon! https://t.co/LEIb4gYGlv | pay |
2016-11-20 00:59:35 | I’d give a week’s pay to hear that sermon! https://t.co/LEIb4gYGlv | sermon |
2016-11-18 21:40:58 | Getting SO tired of hearing “the ppl voted for change”. Actually, she won. What has to change is “we win election, they get to be president” | president |
2016-11-17 19:54:18 | Doesn’t @Mike_Pence look like the guy the airlines hire to play the Captain in the pre-flight video? https://t.co/oGQiMJiXrh | hire |
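The same tweet appears twice in the table above because the join returns one row per matched word, not one row per tweet. A small base R sketch of that behavior, with hypothetical `tweets` and `positive_words` data frames and `unique`/`merge` standing in for `distinct`/`inner_join`:

```r
# Hypothetical tweets, one row each
tweets <- data.frame(
  id   = c("a", "b"),
  text = c("I'd give a week's pay to hear that sermon!", "Going live tonight"),
  stringsAsFactors = FALSE
)

# Hypothetical matched words, including a repeated (id, word) pair
positive_words <- data.frame(
  id   = c("a", "a", "a"),
  word = c("pay", "sermon", "pay"),
  stringsAsFactors = FALSE
)

# Like distinct(id, word): keep each tweet/word pair once
pos_pairs <- unique(positive_words)

# Joining back to the tweets yields one row per matched word,
# so tweet "a" appears twice and tweet "b" (no match) not at all
matched <- merge(tweets, pos_pairs, by = "id")
print(matched[, c("text", "word")])
```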
# identify tweets that align with the 'disgust' sentiment (used here as the negative example)
neg_id_words <- RT_words_sentiments %>% filter(sentiment == "disgust") %>% distinct(id, word)
kable(RT_df %>% inner_join(neg_id_words, by = "id") %>% select(Date_Time = created,Tweet = text, Word = word) %>% slice(1:4))
Date_Time | Tweet | Word |
---|---|---|
2016-11-17 20:21:44 | Since Trump got elected-slash-normalized, I’ve had weird dreams - anybody? A big orange skyscraper is chasing me - what does it mean???!! | weird |
2016-11-08 21:17:44 | Shit just got real. #UseYourVote #Millennials https://t.co/nbNQ09pwPI | shit |
2016-11-08 14:41:11 | Pls vote for Hillary today. Even if you don’t like her, its necessary to block a dangerous lunatic ultimate power. #ThisTimeIsDifferent | lunatic |
2016-11-05 05:00:12 | Thank Trump for the one good thing he did. He exposed Evangelicals, who are his supporters as the shameless hippocr https://t.co/v0Mq26BXQ8 | shameless |
@CharlesMBlow “I’m always surprised when a column resonates with ppl bc I struggle so much to write them. Always worry that they’ll be bad.#TheResistance”
Charles Blow writes a regular opinion column for the New York Times each Monday and Thursday. His most recent is “America Elects a Bigot”.
CB <- userTimeline('@CharlesMBlow', n = number_of_tweets)
CB_df <- twListToDF(CB)
CB_tweets <- CB_df %>%
select(created, id, statusSource, text)
Below are his most recent tweets.
kable(head(CB_tweets %>% select(Date_Time = created, Tweet = text)))
Date_Time | Tweet |
---|---|
2016-11-20 15:22:08 | Interesting https://t.co/Eao3BQvG2B |
2016-11-20 15:16:07 | Ugh https://t.co/o9FXORh4Ap |
2016-11-20 15:08:55 | Scheduled to be on @cnnreliable at 11 a.m. ET. Tune in if you can https://t.co/1WbDNGwM5H |
2016-11-20 05:01:59 | I. Can’t. Even https://t.co/gHg12YFCQ9 |
2016-11-20 01:32:57 | . @joehick58 I’m not scared Joe https://t.co/TWE0TXOZXj |
2016-11-20 01:22:25 | President-elect Trump, I’m just going to let the amazing Fannie Lou Hamer speak for me #NotFoolingAnybody https://t.co/fzt8ESoUyd |
He mainly tweets from one source, his iPhone. But he occasionally uses other platforms.
CB_df$statusSource = substr(CB_df$statusSource,
regexpr('>', CB_df$statusSource) + 1,
regexpr('</a>', CB_df$statusSource) - 1)
CB_platform <- CB_df %>% group_by(statusSource) %>% summarise(n = n()) %>% mutate(percent = n/sum(n)) %>% arrange(desc(n))
kable(CB_platform %>% select(Origin_of_Tweet = statusSource, Number_of_tweets = n, Percent = percent) %>% top_n(10), digits = 2)
Origin_of_Tweet | Number_of_tweets | Percent |
---|---|---|
Twitter for iPhone | 111 | 0.89 |
Twitter Web Client | 11 | 0.09 |
| | 3 | 0.02 |
CB_words <- CB_df %>%
filter(!str_detect(text, '^"')) %>%
mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
unnest_tokens(word, text, token = "regex", pattern = reg) %>%
filter(!word %in% stop_words$word,
str_detect(word, "[a-z]"))
CB_words %>% count(word) %>% arrange(desc(n)) %>% with(wordcloud(word, n, max.words = 100, scale=c(5,.5),min.freq=5, random.order=FALSE, rot.per=.15, colors=brewer.pal(9,"Dark2")))
kable(CB_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% select(Word = word, Number_of_tweets = n) %>% top_n(4))
Word | Number_of_tweets |
---|---|
trump | 12 |
#electionnight | 8 |
#theresistance | 8 |
column | 8 |
Charles Blow’s most commonly occurring sentiment, once his words are matched to the lexicon, is “negative”. I always see him as an optimist, but given the presidential election, his sentiments are probably dark.
CB_words_sentiments <- CB_words %>% inner_join(nrc, by = "word")
kable(CB_words_sentiments %>% group_by(sentiment) %>% summarise(n = n()) %>% arrange(desc(n)) %>% select(Sentiment = sentiment, Number_of_tweets = n)%>% top_n(1))
Sentiment | Number_of_tweets |
---|---|
negative | 59 |
Below are examples of the tweet words that correlate with the “positive” and “disgust” sentiments.
pos_tw_ids <- CB_words_sentiments %>% filter(sentiment == "positive") %>% distinct(id, word)
kable(CB_df %>% inner_join(pos_tw_ids, by = "id") %>% select(Date_Time = created,Tweet = text, Word = word) %>% slice(1:4))
Date_Time | Tweet | Word |
---|---|---|
2016-11-20 01:22:25 | President-elect Trump, I’m just going to let the amazing Fannie Lou Hamer speak for me #NotFoolingAnybody https://t.co/fzt8ESoUyd | president |
2016-11-20 01:22:25 | President-elect Trump, I’m just going to let the amazing Fannie Lou Hamer speak for me #NotFoolingAnybody https://t.co/fzt8ESoUyd | elect |
2016-11-19 15:40:11 | Why is this man still on Twitter whining? I mean seriously. Aren’t you the president? Don’t you have some more raci https://t.co/2XQwz0cLHd | president |
2016-11-18 02:18:23 | Good lord, Armageddon is really near. Help us all… https://t.co/niV8mvGWyq | lord |
neg_id_words <- CB_words_sentiments %>% filter(sentiment == "disgust") %>% distinct(id, word)
kable(CB_df %>% inner_join(neg_id_words, by = "id") %>% select(Date_Time = created,Tweet = text, Word = word) %>% slice(1:4))
Date_Time | Tweet | Word |
---|---|---|
2016-11-19 04:32:27 | This whole thing is just a disaster. Everything Trump accused Hillary of he will soon be guilty of https://t.co/ZvggxHAHIq | disaster |
2016-11-18 02:18:23 | Good lord, Armageddon is really near. Help us all… https://t.co/niV8mvGWyq | lord |
2016-11-17 18:42:59 | Why aren’t more ppl aghast that Megan Kelly sat on all these accusations abt Team Trump until after voters couldn’t consider them?! #BadBiz | aghast |
2016-11-17 15:46:13 | And this is the man more Americans judged as “honest and trustworthy”?! Is this real life or am I in a dream sequen https://t.co/HUBpMvWY5j | honest |
RT_platform$Commentator <- "Bill Maher"
CB_platform$Commentator <- "Charles Blow"
RT_words_sentiments$Commentator <- "Bill Maher"
CB_words_sentiments$Commentator <- "Charles Blow"
platform2 <- rbind(RT_platform, CB_platform)
words_sentiments2 <- rbind(RT_words_sentiments, CB_words_sentiments)
joint_df <- words_sentiments2 %>% group_by(Commentator, sentiment) %>% summarise(n = n()) %>% mutate(frequency = n/sum(n))
ggplot(joint_df, aes(x = sentiment, y = frequency, fill = Commentator)) +
  geom_bar(stat = "identity", position = "dodge") +
  xlab("Sentiment") +
  ylab("Percent of tweets") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  scale_fill_manual(values = c("deeppink3", "darkturquoise")) +
  ggtitle("Frequency of Sentiment Expressed")
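The chart plots frequencies rather than raw counts because the two commentators have different numbers of matched words. A minimal base R sketch of that normalization, using `prop.table` in place of the grouped `mutate` and a hypothetical `tagged` data frame:

```r
# Hypothetical sentiment-tagged words for two commentators (illustrative data)
tagged <- data.frame(
  Commentator = c(rep("A", 4), rep("B", 2)),
  sentiment   = c("positive", "positive", "negative", "trust",
                  "negative", "positive"),
  stringsAsFactors = FALSE
)

# Counts per commentator (rows) and sentiment (columns)
counts <- table(tagged$Commentator, tagged$sentiment)

# Divide each row by its total so every commentator's frequencies sum to 1
frequency <- prop.table(counts, margin = 1)
print(round(frequency, 2))
```

With raw counts, the commentator with more matched words would dominate every bar; dividing by each person's own total makes the bars comparable.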
In the final visualization, I compare the source of each person’s tweets. It looks like Bill Maher writes from his computer, while Charles Blow composes his on his phone. For some reason this surprised me, since Blow writes a column for a living. If I could explore this further, I would like to know whether Charles Blow writes more negative tweets because he is so comfortable sending them from a mobile platform, where he can compose them in the “heat of the moment” without taking the opportunity to defuse his “anger”.
pf <- c("Twitter Web Client", "Twitter for iPhone", "Media Studio", "Instagram")
pf_df <- platform2 %>% filter(statusSource %in% pf)
ggplot(pf_df, aes(x = statusSource, y = percent, fill = Commentator)) +
geom_bar(stat = "identity", position = "dodge") +
xlab("Platform") +
ylab("Percent of tweets") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
scale_fill_manual(values=c("deeppink3" , "darkturquoise"))+ ggtitle("Source of Tweets")