I wanted to take a look at the reactions of two of my preferred political commentators, Bill Maher and Charles Blow. My newly developed skills in Twitter sentiment analysis come in handy; I can get a sense of what these pundits have to say without thoroughly engaging with their material…it’s too soon for that.
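#packages used throughout this post
library(twitteR); library(dplyr); library(stringr); library(tidytext)
library(knitr); library(wordcloud); library(RColorBrewer); library(ggplot2)
#load tweets and source (assumes twitteR has already been authenticated)
number_of_tweets <- 2000
RT <- userTimeline('@BillMaher', n = number_of_tweets)
RT_df <- twListToDF(RT)
#keep only the columns needed later
RT_tweets <- RT_df %>%
select(id, statusSource, text)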
Bill Maher mainly tweets from these sources:
#most frequent tweeting sources
#note: head() runs before summarise(), so these counts cover only the first six tweets
kable(head(RT_df %>% group_by(statusSource)) %>%
summarise(n = n()) %>%
top_n(10))
statusSource | n |
---|---|
Twitter Web Client | 1 |
Twitter for iPhone | 1 |
Media Studio | 4 |
The breakdown of the origin of his tweets:
# trim tweet to cleanly reveal status source and percentage of tweets from that source
RT_df$statusSource = substr(RT_df$statusSource,
regexpr('>', RT_df$statusSource) + 1,
regexpr('</a>', RT_df$statusSource) - 1)
RT_platform <- RT_df %>% group_by(statusSource) %>% summarise(n = n()) %>% mutate(percent = n/sum(n)) %>% arrange(desc(n))
kable(RT_platform %>% top_n(10), digits = 2)
statusSource | n | percent |
---|---|---|
Twitter Web Client | 156 | 0.70 |
Twitter for iPhone | 26 | 0.12 |
Media Studio | 16 | 0.07 |
WhoSay | 13 | 0.06 |
(blank) | 5 | 0.02 |
SnapStream TV Search | 4 | 0.02 |
iOS | 2 | 0.01 |
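As a minimal sketch of what the trimming step does, here is the same `substr`/`regexpr` logic applied to a single made-up `statusSource` value. The sample string is an assumption about what the raw field looks like (an HTML anchor tag); it is not a value copied from the data.

```r
# hypothetical raw statusSource value (assumed format: an HTML anchor tag)
raw_source <- '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>'

# position just after the first '>' (the end of the opening tag)
start <- regexpr('>', raw_source) + 1
# position just before the closing '</a>'
end <- regexpr('</a>', raw_source) - 1

# keep only the visible label between the tags
clean_source <- substr(raw_source, start, end)
print(clean_source)
```

If a value contains no anchor tag at all, `regexpr` returns -1 for both searches and `substr` yields an empty string.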
In order to do a sentiment analysis of the tweets, each tweet needs to be split into its individual words.
Some of Bill Maher’s common words that will be matched to sentiments include:
#trim
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
RT_words <- RT_df %>%
filter(!str_detect(text, '^"')) %>%
mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
unnest_tokens(word, text, token = "regex", pattern = reg) %>%
filter(!word %in% stop_words$word,
str_detect(word, "[a-z]"))
#word cloud
RT_words %>% count(word) %>% arrange(n) %>% with(wordcloud(word, n, max.words = 100, scale=c(5,.5),min.freq=5, random.order=FALSE, rot.per=.15, colors=brewer.pal(9,"Dark2")))
#list of most common words used in tweets
kable(head(RT_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(4)))
word | n |
---|---|
trump | 43 |
hillary | 17 |
live | 13 |
tonight | 13 |
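To see what the tokenizing regex keeps and drops, it can be tried on one made-up tweet with base R. This is only a rough sketch of the splitting step: the tweet and the three-word stop list are hypothetical, and `strsplit` stands in for `unnest_tokens`, which may differ in small details.

```r
# same pattern as above: split on anything that is not a letter, digit, #, @ or
# apostrophe, and on apostrophes that are not followed by one of those characters
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

# hypothetical tweet text
tweet <- "Don't miss it: #UseYourVote @BillMaher tonight"

# lowercase, split on the pattern, and drop the empty pieces between separators
tokens <- strsplit(tolower(tweet), reg, perl = TRUE)[[1]]
tokens <- tokens[tokens != ""]

# hypothetical stop list standing in for stop_words$word
toy_stop_words <- c("it", "the", "a")
tokens <- tokens[!tokens %in% toy_stop_words]
print(tokens)
```

The contraction, the hashtag and the mention each survive as one token, which is why hashtags show up in the word counts.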
In order to do a sentiment analysis, we need to load the sentiment categories.
#on newer tidytext versions the NRC lexicon is loaded with get_sentiments("nrc") instead
nrc <- sentiments %>% filter(lexicon == "nrc") %>% select(word, sentiment)
With the sentiments loaded, we can categorize the words with a table join.
Surprisingly, the most frequent sentiment expressed with Bill Maher’s words is:
#most commonly occurring sentiment
RT_words_sentiments <- RT_words %>% inner_join(nrc, by = "word")
kable(RT_words_sentiments %>% group_by(sentiment) %>% summarise(n = n()) %>% arrange(desc(n)) %>% top_n(1))
sentiment | n |
---|---|
positive | 184 |
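The join itself is easy to picture on a toy example. The sketch below uses a tiny hand-written lexicon rather than the real NRC data; the word and sentiment pairs are the ones that appear in the tables in this post, and the tweet ids are made up.

```r
library(dplyr)

# made-up tokens in the same shape as RT_words: one row per tweet id and word
toy_words <- tibble(
  id   = c("1", "1", "2", "2", "3"),
  word = c("vote", "lunatic", "shit", "real", "tonight")
)

# tiny stand-in lexicon (not the full NRC lexicon)
toy_lexicon <- tibble(
  word      = c("vote", "real", "lunatic", "shit"),
  sentiment = c("positive", "positive", "disgust", "disgust")
)

# inner_join keeps only words found in the lexicon, so "tonight" drops out
toy_matched <- toy_words %>% inner_join(toy_lexicon, by = "word")

# count how many matched words fall under each sentiment
toy_counts <- toy_matched %>% count(sentiment, sort = TRUE)
print(toy_counts)
```

Words that are missing from the lexicon contribute nothing to the totals, so the counts describe only the matched vocabulary, not every word in a tweet.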
Below is a slice of recent tweets, each shown with the word that places it in the “positive” sentiment.
#identify tweets that align with the 'positive' sentiment
pos_tw_ids <- RT_words_sentiments %>% filter(sentiment == "positive") %>% distinct(id, word)
kable(RT_df %>% inner_join(pos_tw_ids, by = "id") %>% select(created,text, word) %>% slice(1:4))
created | text | word |
---|---|---|
2016-11-13 23:31:04 | “This is a moral 9/11. Only 9/11 was done to us from the outside and we did this to ourselves.” (@tomfriedman) https://t.co/E9pQXotETY | moral |
2016-11-12 06:19:32 | America needs you more than ever, with me and all the rest of #TheResistance, until we can figure out how to really https://t.co/KGBWkQz6XJ | rest |
2016-11-08 21:17:44 | Shit just got real. #UseYourVote #Millennials https://t.co/nbNQ09pwPI | real |
2016-11-08 14:41:11 | Pls vote for Hillary today. Even if you don’t like her, its necessary to block a dangerous lunatic ultimate power. #ThisTimeIsDifferent | vote |
And below is a sample of tweets with the word that corresponds to a sentiment of “disgust.”
#identify tweets that align with the 'disgust' sentiment
neg_id_words <- RT_words_sentiments %>% filter(sentiment == "disgust") %>% distinct(id, word)
kable(RT_df %>% inner_join(neg_id_words, by = "id") %>% select(created,text, word) %>% slice(1:4))
created | text | word |
---|---|---|
2016-11-08 21:17:44 | Shit just got real. #UseYourVote #Millennials https://t.co/nbNQ09pwPI | shit |
2016-11-08 14:41:11 | Pls vote for Hillary today. Even if you don’t like her, its necessary to block a dangerous lunatic ultimate power. #ThisTimeIsDifferent | lunatic |
2016-11-05 05:00:12 | Thank Trump for the one good thing he did. He exposed Evangelicals, who are his supporters as the shameless hippocr https://t.co/v0Mq26BXQ8 | shameless |
2016-10-20 02:40:35 | Final thought: Hillary won the debate, but Alec Baldwin did a great job intensifying Trump’s insanity. That was Alec Baldwin, right? | insanity |
Charles Blow writes a regular opinion column for the New York Times each Monday and Thursday.
“America Elects a Bigot” is his most recent.
CB <- userTimeline('@CharlesMBlow', n = number_of_tweets)
CB_df <- twListToDF(CB)
CB_tweets <- CB_df %>%
select(id, statusSource, text)
Below are his most recent tweets.
kable(head(CB_tweets %>% select(text)))
text |
---|
“NYT says subscriptions are up in response to Trump” https://t.co/SYLQMFOigt |
Thanks Lisa! #TeamFire https://t.co/Xp498zks6o |
File my Monday columns on Friday. Which I had time to write a diff column abt this #Bannon announcement. #TheResistance |
Over the course of this nearly 2-year campaign I haven’t heard Trump make a literary ref. Don’t believe he reads books. Scary thing # 53,478 |
So, this H. L. Mencken quote is again making the rounds. Of corse it was written pre-mass media, but still interest https://t.co/tdHBhwLoVx |
What the ? https://t.co/oVvft9edx3 |
He mainly tweets from one source, his iPhone. But he occasionally uses other platforms.
CB_df$statusSource = substr(CB_df$statusSource,
regexpr('>', CB_df$statusSource) + 1,
regexpr('</a>', CB_df$statusSource) - 1)
CB_platform <- CB_df %>% group_by(statusSource) %>% summarise(n = n()) %>% mutate(percent = n/sum(n)) %>% arrange(desc(n))
kable(CB_platform %>% top_n(10), digits = 2)
statusSource | n | percent |
---|---|---|
Twitter for iPhone | 112 | 0.85 |
Twitter Web Client | 18 | 0.14 |
(blank) | 2 | 0.02 |
CB_words <- CB_df %>%
filter(!str_detect(text, '^"')) %>%
mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
unnest_tokens(word, text, token = "regex", pattern = reg) %>%
filter(!word %in% stop_words$word,
str_detect(word, "[a-z]"))
CB_words %>% count(word) %>% arrange(desc(n)) %>% with(wordcloud(word, n, max.words = 100, scale=c(5,.5),min.freq=5, random.order=FALSE, rot.per=.15, colors=brewer.pal(9,"Dark2")))
kable(CB_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(4))
word | n |
---|---|
column | 11 |
#electionnight | 10 |
ppl | 9 |
#theresistance | 8 |
Charles Blow’s most commonly occurring sentiment, once his words are matched to the lexicon, is “negative.” I have always seen him as an optimist, but given the presidential election, his sentiments are probably dark.
CB_words_sentiments <- CB_words %>% inner_join(nrc, by = "word")
kable(CB_words_sentiments %>% group_by(sentiment) %>% summarise(n = n()) %>% arrange(desc(n)) %>% top_n(1))
sentiment | n |
---|---|
negative | 82 |
Below are examples of the tweet words that match the “positive” and “disgust” sentiments.
pos_tw_ids <- CB_words_sentiments %>% filter(sentiment == "positive") %>% distinct(id, word)
kable(CB_df %>% inner_join(pos_tw_ids, by = "id") %>% select(created,text, word) %>% slice(1:4))
created | text | word |
---|---|---|
2016-11-13 19:25:21 | So, this H. L. Mencken quote is again making the rounds. Of corse it was written pre-mass media, but still interest https://t.co/tdHBhwLoVx | quote |
2016-11-13 18:54:11 | Proper punctuation dictates a question mark if that’s a question. Idiot. Don’t you have other things to worry about https://t.co/OVT5MeSgCp | proper |
2016-11-13 18:54:11 | Proper punctuation dictates a question mark if that’s a question. Idiot. Don’t you have other things to worry about https://t.co/OVT5MeSgCp | question |
2016-11-13 18:19:33 | Your life isn’t only measured by what happens in it (sometimes you can’t control that) but how you DEAL with what happens #TheResistance | measured |
neg_id_words <- CB_words_sentiments %>% filter(sentiment == "disgust") %>% distinct(id, word)
kable(CB_df %>% inner_join(neg_id_words, by = "id") %>% select(created,text, word) %>% slice(1:4))
created | text | word |
---|---|---|
2016-11-13 18:54:11 | Proper punctuation dictates a question mark if that’s a question. Idiot. Don’t you have other things to worry about https://t.co/OVT5MeSgCp | idiot |
2016-11-13 18:23:01 | I am now going back to read Jim Crow history and concentrating on how ppl sustained themselves against state hostility #TheResistance | hostility |
2016-11-12 05:18:39 | I think I’ve received as much response from this “America Elects a Bigot” column as any column I’ve ever written. Not sure how to process | bigot |
2016-11-11 23:02:32 | Oh no. The lord is still working on me. You can’t put you face and hands in my car. Not NEVER https://t.co/kbdtTfj8I1 | lord |
RT_platform$commentator <- "Bill Maher"
CB_platform$commentator <- "Charles Blow"
RT_words_sentiments$commentator <- "Bill Maher"
CB_words_sentiments$commentator <- "Charles Blow"
platform2 <- rbind(RT_platform, CB_platform)
words_sentiments2 <- rbind(RT_words_sentiments, CB_words_sentiments)
joint_df <- words_sentiments2 %>% group_by(commentator, sentiment) %>% summarise(n = n()) %>% mutate(frequency = n/sum(n))
ggplot(joint_df, aes(x = sentiment, y = frequency, fill = commentator)) + geom_bar(stat = "identity", position = "dodge") + xlab("Sentiment") + ylab("Share of sentiment-matched words") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_fill_manual(values=c("deeppink3" , "darkturquoise"))
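The `frequency` column in the chart is computed per commentator, which is what makes the two sets of bars comparable even though the two accounts have different numbers of matched words. A small sketch with made-up rows (commentators "A" and "B" are placeholders) shows why `sum(n)` is each commentator's own total:

```r
library(dplyr)

# made-up matched words for two placeholder commentators
toy <- tibble(
  commentator = c("A", "A", "A", "B", "B"),
  sentiment   = c("positive", "positive", "negative", "negative", "negative")
)

toy_freq <- toy %>%
  group_by(commentator, sentiment) %>%
  summarise(n = n()) %>%          # the result is still grouped by commentator
  mutate(frequency = n / sum(n))  # so sum(n) is that commentator's total

print(toy_freq)
```

Within each commentator the frequencies add up to 1, so a bar reads as a share of that person's sentiment-matched words.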
In the final visualization, I compare the source of each person’s tweets. It looks like Bill Maher writes from his computer, while Charles Blow composes his tweets on his phone. This surprised me, since Blow writes a column for a living. If I could explore this further, I would like to know whether Charles Blow writes more negative tweets because he is so comfortable sending them from a mobile platform, where he can compose them in the “heat of the moment.”
pf <- c("Twitter Web Client", "Twitter for iPhone", "Media Studio", "Instagram")
pf_df <- platform2 %>% filter(statusSource %in% pf)
ggplot(pf_df, aes(x = statusSource, y = percent, fill = commentator)) +
geom_bar(stat = "identity", position = "dodge") +
xlab("Platform") +
ylab("Percent of tweets") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
scale_fill_manual(values=c("deeppink3" , "darkturquoise"))
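One way to start on the “heat of the moment” question would be to compare the share of negative words by platform. The sketch below uses made-up rows; it assumes a data frame with `statusSource` and `sentiment` columns, which `CB_words_sentiments` should have because the tokenizing step keeps the other columns of the tweet data frame.

```r
library(dplyr)

# made-up rows standing in for CB_words_sentiments
toy <- tibble(
  statusSource = c("Twitter for iPhone", "Twitter for iPhone", "Twitter for iPhone",
                   "Twitter Web Client", "Twitter Web Client"),
  sentiment    = c("negative", "negative", "positive", "positive", "negative")
)

# share of matched words tagged "negative" on each platform
negative_share <- toy %>%
  group_by(statusSource) %>%
  summarise(words = n(),
            negative = mean(sentiment == "negative"))

print(negative_share)
```

With so few web tweets in the real data, any difference between platforms would need to be read cautiously.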