Assignment 3 Example

I used the following packages: tidyverse, rtweet (which provides search_tweets), tidytext, stringr, reshape2, formattable, wordcloud, and lubridate.


I chose to investigate the hashtag “#climatechange” on Twitter for this analysis.

num_tweets <- 1000
climate_tweets <- search_tweets('#climatechange', n = num_tweets, include_rts = FALSE)
#head(climate_tweets)

Analyze tweet sources

The Twitter Web App is the single most common source (about 32% of Tweets), but the mobile apps (iPhone, Android, and iPad) together account for nearly as many (about 30%).

source	n	percent_of_tweets
Twitter Web App	325	0.32
Twitter for iPhone	153	0.15
Twitter for Android	127	0.13
Hootsuite Inc.	90	0.09
Twitter Web Client	53	0.05
TweetDeck	49	0.05
Buffer	41	0.04
Twitter for iPad	23	0.02
Sprout Social	13	0.01
Twitter Media Studio	9	0.01
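
The code that produced this table is not shown, so here is a minimal sketch of how a summary like it could be built with dplyr. The toy_tweets data frame is a made-up stand-in for climate_tweets, and the name source_summary is my own; only the source column is assumed to exist.

```r
library(dplyr)

# Made-up stand-in for climate_tweets; only the `source` column matters here.
toy_tweets <- data.frame(
  source = c("Twitter Web App", "Twitter Web App", "Twitter for iPhone",
             "Twitter Web App", "Twitter for Android", "Twitter for iPhone",
             "TweetDeck", "Twitter Web App", "Twitter for Android", "Buffer"),
  stringsAsFactors = FALSE
)

source_summary <- toy_tweets %>%
  count(source, sort = TRUE) %>%                        # tweets per source, largest first
  mutate(percent_of_tweets = round(n / sum(n), 2)) %>%  # share of all tweets, as a proportion
  head(10)                                              # keep the ten most common sources

print(source_summary)
```

Computing the share before calling head() keeps the denominator equal to the total number of tweets, which is why the ten rows shown do not have to sum to 1.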

Analyze account data

Do older Twitter accounts tend to have more posts and followers?

There is a weak positive relationship between the age of an account and its number of followers, but there is a lot of variation. Some accounts have existed for a long time but don’t have very many followers.

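user_data <- climate_tweets %>% 
  dplyr::select(account_created_at, favourites_count, statuses_count, followers_count, friends_count) %>% 
  # Ask for hours explicitly; a bare date-time subtraction picks its units automatically
  mutate(accountage = difftime(lubridate::now(tzone = "EST"), account_created_at, units = "hours")) %>% 
  mutate(accountage_num = as.numeric(accountage)) %>% 
  # 8760 = 24 * 365 hours in a (non-leap) year
  mutate(accountyears = accountage_num / 8760) 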
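
Subtracting two date-times in R returns a difftime whose units are chosen automatically, so the division by 8760 in the calculation above is only correct when the age is measured in hours. The base R sketch below illustrates this with two fixed, made-up timestamps; the variable names are my own.

```r
# Two fixed timestamps so the result does not depend on the current time.
created <- as.POSIXct("2015-01-01 00:00:00", tz = "UTC")
checked <- as.POSIXct("2019-12-05 00:00:00", tz = "UTC")

auto_age  <- checked - created                            # units picked automatically
hours_age <- difftime(checked, created, units = "hours")  # units requested explicitly

print(units(auto_age))   # the automatic choice for a gap of several years
print(units(hours_age))  # the unit that was asked for

# Dividing by 8760 (hours in a 365-day year) only gives years when the age is in hours.
account_years <- as.numeric(hours_age) / 8760
print(account_years)
```

Requesting the units in difftime() makes the conversion to years independent of which accounts happen to be in the sample.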

user_data %>% 
  filter(followers_count < 50000) %>% 
  ggplot(aes(accountyears, followers_count)) + geom_point(color = "#67a9cf") + 
  geom_smooth(method = "lm", color = "black") +
  labs(x = "Age of Twitter account (years)", y="Number of followers", 
       title = "Relationship between age of account and number of followers") +
  theme(panel.grid = element_blank(), axis.text = element_text(size=12), axis.title = element_text(size=13))

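# Delimiter pattern: split on anything that is not a letter, digit, #, @, or an apostrophe inside a word
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
tweet_words <- climate_tweets %>% 
  dplyr::select(status_id, text) %>%
  # Drop Tweets that start with a quotation mark
  filter(!str_detect(text, '^"')) %>%
  # Remove shortened links and HTML-escaped ampersands
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  # One row per word, keeping hashtags and @mentions intact
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  # Remove stop words and tokens that contain no letters
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))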
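
To show what the delimiter pattern above does, here is a small base R sketch that applies the same two regular expressions to a single made-up Tweet. It uses gsub and strsplit instead of str_replace_all and unnest_tokens, so it only approximates the tidytext pipeline, and it skips the stop-word filter.

```r
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

# Hypothetical tweet text, made up for illustration.
tweet <- "Act now on #climatechange! @UN says it's urgent https://t.co/abc123 &amp; more"

# Strip shortened links and HTML-escaped ampersands, as in the pipeline above.
cleaned <- gsub("https://t.co/[A-Za-z\\d]+|&amp;", "", tweet, perl = TRUE)

# Split on the delimiter pattern; lowercasing mimics the default of unnest_tokens.
tokens <- strsplit(tolower(cleaned), reg, perl = TRUE)[[1]]
tokens <- tokens[nzchar(tokens)]  # drop empty strings left by adjacent delimiters

print(tokens)
```

The hashtag, the @mention, and the contraction each survive as a single token, which is the reason for using a custom pattern instead of the default word tokenizer.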

Analyze tweet text

What kind of words appear in Tweets that use the hashtag #climatechange?

The hashtag #cop25 is prominent because the UN Climate Change Conference (COP25) is taking place right now (Dec 2-13, 2019). The cloud also contains other interesting words, such as “debate”, “research”, “action”, and “environment”. Based on this word cloud, most Tweets seem to focus on spreading awareness of climate change or debating issues related to it.

myrdbupal<-c("#67001f","#b2182b","#d6604d","#f4a582","#92c5de","#4393c3","#2166ac","#053061")

cloudwords<-as.vector(tweet_words$word)

wordcloud::wordcloud(cloudwords, min.freq = 2, scale = c(7, 0.6), colors = myrdbupal,
                     random.color = TRUE, random.order = FALSE, max.words = 150)

Sentiment analysis of common words in climate change tweets

positive <- get_sentiments("bing") %>%
  filter(sentiment == "positive")

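pos_words <- tweet_words %>%
  semi_join(positive, by = "word") %>%
  count(word, sort = TRUE) %>% 
  # Exclude words that the lexicon tags as positive but that are unlikely to be
  # positive in this context (e.g. "warm", "hot", and the name "trump")
  filter(!word %in% c("warm", "fast", "trump", "won", "silent", "hot")) %>% 
  mutate(sentiment = "Positive")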

negative <- get_sentiments("bing") %>%
  filter(sentiment == "negative")

neg_words <- tweet_words %>%
  semi_join(negative, by = "word") %>%
  count(word, sort = TRUE) %>% 
  mutate(sentiment = "Negative")
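
The semi_join step keeps only the words that appear in the lexicon, without adding any columns from it. Here is a minimal sketch of that idea on toy data; toy_lexicon is a hypothetical four-word stand-in for get_sentiments("bing"), and toy_words stands in for tweet_words.

```r
library(dplyr)

# Hypothetical mini-lexicon standing in for get_sentiments("bing").
toy_lexicon <- data.frame(
  word = c("clean", "protect", "crisis", "threat"),
  sentiment = c("positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Hypothetical tokenized Tweets, one word per row like tweet_words.
toy_words <- data.frame(
  word = c("clean", "crisis", "clean", "threat", "protect", "crisis", "crisis", "policy"),
  stringsAsFactors = FALSE
)

# Keep only words found in the positive part of the lexicon, then count them.
toy_pos <- toy_words %>%
  semi_join(filter(toy_lexicon, sentiment == "positive"), by = "word") %>%
  count(word, sort = TRUE)

# Same for the negative part.
toy_neg <- toy_words %>%
  semi_join(filter(toy_lexicon, sentiment == "negative"), by = "word") %>%
  count(word, sort = TRUE)

print(toy_pos)
print(toy_neg)
```

Words that are in neither part of the lexicon, like "policy" here, drop out of both counts.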

These Tweets contain a similar number of positive-associated and negative-associated words.

bind_rows(pos_words, neg_words) %>%
  filter(n > 4) %>% 
  mutate(n = ifelse(sentiment == "Negative", -n, n)) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_col() +
  coord_flip() +
  labs(y = "Contribution to sentiment", x="Word" , fill = "Sentiment", 
       title = "Common words in #climatechange Tweets") +
  theme(panel.grid = element_blank(), axis.text = element_text(size=13), axis.title = element_text(size=13)) +
  scale_fill_manual(values = c("Negative" = "#b2182b", "Positive" = "#67a9cf"))