##Overview
The following tweet analysis is a deep dive into the content of, and public opinion surrounding, one of my favorite Tweeters, Elon Musk. This lexical analysis has two parts: first, a categorical sentiment analysis of Musk’s tweets and the tweets directed @ him; second, a comparison of the most extremely positive and extremely negative words used by Musk and by the people who reach out to him via Twitter.
###Disclaimer
Before delving into my findings, I will disclose that, to keep my analysis consistent, I saved the results of my twitteR calls to .csv files. This let me use the same sample of tweets over the course of my project instead of pulling a different sample every time I ran or knit my Markdown.
library(dplyr)
library(stringr)
library(tidytext)
library(twitteR)
library(ggplot2)
##Getting Started
First, I took a sample of 1000 tweets directed at Elon Musk. The step of writing the resulting dataframe to a .csv file is not shown here, because I ran it once at the beginning of my work and have been reading from that .csv file ever since, for the reasons mentioned above. In this context, “mentions” and “responses” both refer to this same dataset.
num_tweets <- 1000
em_responses <- searchTwitter('@elonmusk', n = num_tweets)
em_df <- twListToDF(em_responses)
musk_mentions <- read.csv("em_mentions.csv", stringsAsFactors = FALSE)
head(musk_mentions)
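A minimal sketch of how that one-time caching step might look. The helper name `load_or_fetch`, the stand-in `fake_fetch` function, and the demo file name are assumptions made for illustration; they are not part of the original project, and the real fetch would be the `searchTwitter()` and `twListToDF()` calls shown above.

```r
# Hypothetical helper: reuse a cached .csv if it exists, otherwise fetch once and save.
load_or_fetch <- function(path, fetch_fun) {
  if (!file.exists(path)) {
    df <- fetch_fun()                       # e.g. function() twListToDF(searchTwitter(...))
    write.csv(df, path, row.names = FALSE)  # freeze the sample on disk
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Stand-in for the live Twitter call so the sketch runs offline.
fake_fetch <- function() {
  data.frame(text = c("@elonmusk great launch!", "@elonmusk when is the next one?"),
             stringsAsFactors = FALSE)
}

cache_path <- file.path(tempdir(), "em_mentions_demo.csv")
first_run  <- load_or_fetch(cache_path, fake_fetch)  # fetches (if needed) and reads the file
second_run <- load_or_fetch(cache_path, fake_fetch)  # reads the same rows back from disk
```

Because every later step reads from the file rather than from Twitter, re-knitting the document leaves the sample unchanged.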
Then I pulled tweets sent out by Elon Musk’s official Twitter account. The call below asks for up to 1000 tweets, and 418 came back.
em_tweets <- userTimeline('elonmusk', n = num_tweets)
emt_df <- twListToDF(em_tweets)
musk_tweets <- read.csv("em_tweets.csv", stringsAsFactors = FALSE)
head(musk_tweets)
##Analysis I: Sentiments Associated with Word Choices
###Isolating Words
Next, to perform the lexical analysis, I needed to isolate and count the individual words in the tweets from both datasets. To do this, I used the code provided in the lecture notes for Unit 11 to grab the 25 most-used words in tweets directed at Musk. I also took out the Twitter mention “@elonmusk” and “rt” because I wanted to focus on counting the words that truly made up the content of those tweets.
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
em_mentions_words <- musk_mentions %>%
filter(!str_detect(text, '^"')) %>%
mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
unnest_tokens(word, text, token = "regex", pattern = reg) %>%
filter(!word %in% stop_words$word,
!word %in% c("@elonmusk", "rt"),
str_detect(word, "[a-z]"))
em_mentions_words %>% group_by(word) %>%
summarise(n = n()) %>% arrange(desc(n)) %>% top_n(25)
Note that there are actually 28 rows here rather than 25, because top_n() keeps ties and some words were used an equal number of times in this tweet sample (e.g. #bestthanksgiving and horrific, which were both used 151 times).
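The tie behaviour is easy to see on a small made-up table. This is only an illustrative sketch; the words and counts in `toy_counts` are invented and do not come from the tweet sample.

```r
library(dplyr)

# Invented counts: "launch" and "mars" are tied for second place.
toy_counts <- data.frame(
  word = c("rocket", "launch", "mars", "tesla"),
  n    = c(5, 3, 3, 1),
  stringsAsFactors = FALSE
)

# Ask for the top 2 by n; both tied rows are kept, so more than 2 rows come back.
top_two <- toy_counts %>% top_n(2, n)
nrow(top_two)
```

Here a request for two rows returns three, in the same way that the request for 25 rows above returned 28.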
Then I constructed the same type of list again, except with tweets directly sent from Elon Musk.
em_tweets_words <- musk_tweets %>%
filter(!str_detect(text, '^"')) %>%
mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
unnest_tokens(word, text, token = "regex", pattern = reg) %>%
filter(!word %in% stop_words$word,
str_detect(word, "[a-z]"))
em_tweets_words %>% group_by(word) %>%
summarise(n = n()) %>% arrange(desc(n)) %>% top_n(25)
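To see what the tokenizing steps do, they can be tried on a single made-up tweet. This is a sketch under stated assumptions: the tweet text and URL are invented, only the URL-stripping part of the cleanup is shown, and the explicit `%in%` filter is one possible way to drop the handle and the retweet marker.

```r
library(dplyr)
library(stringr)
library(tidytext)

# Made-up tweet, for illustration only.
toy <- data.frame(
  text = "RT @elonmusk: Falcon 9 landed! https://t.co/abc123 #SpaceX",
  stringsAsFactors = FALSE
)

# Same splitting pattern as above: break on anything that is not a letter,
# digit, #, @ or an apostrophe inside a word.
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

toy_words <- toy %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         !word %in% c("@elonmusk", "rt"),  # drop the handle and the retweet marker
         str_detect(word, "[a-z]"))         # drop tokens with no letters, such as "9"

toy_words$word
```

Since `unnest_tokens()` lowercases tokens by default, the hashtag survives as `#spacex`, while the URL, the handle, and the bare number are filtered out.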
Since this lexical analysis attempts to capture both public opinion about Musk and the sentiments in Musk’s own tweets, I assigned sentiment categories using the “nrc” lexicon, which tags individual words with categories such as anticipation, fear, and anger.
Raw NRC Dataset:
nrc <- sentiments %>%
filter(lexicon == "nrc") %>%
select(word, sentiment)
head(nrc)
I then joined the NRC dataset with both Musk’s tweets and tweets directed at Musk.
Elon Musk’s tweet sentiments:
em_tweets_sentiments <- em_tweets_words %>% inner_join(nrc, by = "word")
em_tweets_sentiments %>% group_by(sentiment) %>% summarize(n = n()) %>% arrange(desc(n))
Sentiments of tweets sent @ Elon Musk:
em_mentions_sentiments <- em_mentions_words %>% inner_join(nrc, by = "word")
em_mentions_sentiments %>% group_by(sentiment) %>% summarize(n = n()) %>% arrange(desc(n))
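The join works word by word: a word that the lexicon tags with several categories produces one row per category, and a word missing from the lexicon drops out. A small sketch with a hypothetical mini-lexicon (the pairings in `toy_lexicon` are invented for illustration and are not copied from NRC):

```r
library(dplyr)

# Hypothetical mini-lexicon; these pairings are illustrative only.
toy_lexicon <- data.frame(
  word      = c("launch", "launch", "crash"),
  sentiment = c("anticipation", "positive", "fear"),
  stringsAsFactors = FALSE
)

toy_words <- data.frame(word = c("launch", "crash", "rocket"),
                        stringsAsFactors = FALSE)

# "launch" matches twice, "crash" once, and "rocket" has no match and is dropped.
toy_sentiments <- toy_words %>% inner_join(toy_lexicon, by = "word")

toy_sentiments %>% count(sentiment, sort = TRUE)
```

This is why the sentiment counts describe tagged words rather than whole tweets, and why one word can contribute to more than one category.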
My final step in this manipulation was combining the two datasets into one joined set from which a graph can be generated. To do this, I had to create a new variable, “Musk,” which classifies each row based on whether or not it came from a tweet by Elon Musk:
em_tweets_sentiments$Musk <- "Yes"
em_mentions_sentiments$Musk <- "No"
total_sentiments <- rbind(em_tweets_sentiments, em_mentions_sentiments)
em_sent_df <- total_sentiments %>%
group_by(Musk, sentiment) %>%
summarize(n = n()) %>%
mutate(perc_sent = 100*(n/sum(n)))
ggplot(em_sent_df, aes(x = sentiment, y = perc_sent, fill = Musk)) +
geom_bar(stat = "identity", position = "dodge") +
xlab("Sentiment") +
ylab("Percent of Sentiment-Tagged Words") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
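The percentages are computed within each value of Musk, which keeps the two groups comparable even though the samples differ in size. The sketch below reproduces that calculation on invented rows; the `toy` data frame is an assumption for illustration, and `count()` followed by an explicit `group_by()` is used here as an equivalent way to spell out the grouping.

```r
library(dplyr)

# Invented word-level rows: four from Musk, two from people mentioning him.
toy <- data.frame(
  Musk      = c("Yes", "Yes", "Yes", "Yes", "No", "No"),
  sentiment = c("positive", "positive", "trust", "fear", "fear", "positive"),
  stringsAsFactors = FALSE
)

toy_pct <- toy %>%
  count(Musk, sentiment) %>%              # rows per (Musk, sentiment) pair
  group_by(Musk) %>%                      # percentages are taken within each group
  mutate(perc_sent = 100 * n / sum(n)) %>%
  ungroup()

toy_pct
```

Each group's percentages sum to 100, so a bar of 50 means half of that group's tagged words fell in that category, regardless of how many words the group has in total.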
###Conclusion, Part I
It would appear that, as many of Musk’s Twitter followers would expect, he tends to be more excited about the future, more optimistic, less afraid, and less angry than those who tweet @ him. This does not come as a huge surprise to me, as in following his account for the past few years, I have certainly noticed that his updates tend to contain excitement surrounding the breakthroughs of his various tech companies.
##Analysis II: Extremely Emotional Words – Positive & Negative
Now, let’s take a look at the sentiment scoring of the most emotional words (words that scored a 3 or higher or a -3 or lower on the AFINN scale) used either by Elon Musk or in tweets directed @ Elon Musk.
First, I pulled the AFINN dataset and joined it with the two word-level datasets used in the previous analysis.
finn <- sentiments %>%
filter(lexicon == "AFINN") %>%
select(word, score)
em_tweets_scored <- em_tweets_words %>% inner_join(finn, by = "word")
em_tweets_scored$Musk <- "Yes"
em_tweets_scored <- em_tweets_scored %>% filter(score <= -3 | score >= 3)
head(em_tweets_scored)
em_mentions_scored <- em_mentions_words %>% inner_join(finn, by = "word")
em_mentions_scored$Musk <- "No"
em_mentions_scored <- em_mentions_scored %>% filter(score <= -3 | score >= 3)
head(em_mentions_scored)
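The threshold filter keeps only words at the two ends of the scale. A small sketch on hypothetical scores (the values in `toy_scored` are invented on an AFINN-style scale from -5 to 5 and are not taken from the real lexicon); `abs(score) >= 3` is an equivalent way to write the two-sided condition used above.

```r
library(dplyr)

# Hypothetical scores; not taken from the real AFINN lexicon.
toy_scored <- data.frame(
  word  = c("superb", "good", "awful", "bad", "catastrophic"),
  score = c(5, 2, -3, -2, -4),
  stringsAsFactors = FALSE
)

# Keep only strongly positive (>= 3) or strongly negative (<= -3) words.
extreme <- toy_scored %>% filter(abs(score) >= 3)
extreme$word
```

Mildly scored words such as the invented "good" and "bad" rows fall out, which is what limits the final graph to the most emotional vocabulary.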
Then, prior to displaying the visual, I merged the two datasets and created a total word-score dataframe:
total_scores <- rbind(em_tweets_scored, em_mentions_scored)
Finally, below is the graph of the words that AFINN categorizes as either highly positive or highly negative, along with a breakdown of who used them:
total_scores_1 <- total_scores %>%
group_by(word, Musk) %>%
summarise(n = n()) %>%
arrange(desc(n))
ggplot(total_scores, aes(x = word, y = score, fill = Musk)) +
geom_bar(stat = "identity", position = "dodge") +
xlab("Word") +
ylab("Sentiment Score") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
###Conclusion, Part II
Looking at the most extremely positive and negative words used by either Musk (shown in blue) or people tweeting @ Musk (shown in red), it would appear that Musk himself accounts for a larger share of the extremely positive words, whereas folks mentioning him on Twitter are more likely to use highly negative words. I guess this goes to show that when you own and operate several of the most highly recognized technology companies in Silicon Valley, it’s hard to be too bummed out… However, the buzz surrounding these companies seems to be a mixed bag of reviews.
####FYI
In case you folks are curious about each of Musk’s companies, the following links direct you to each of these firms’ respective Twitter pages. Tesla and SpaceX seem to dominate the headlines, but The Boring Company and Neuralink are up to some neat things as well!