Assignment 4

Now that e-cigarettes have been on the market for a couple of years, we are starting to see more data about the negative health effects of their continued use. In light of this, I wanted to see whether the general sentiment towards these products has turned negative or whether they are still discussed in a positive light, since they are often marketed as a healthier alternative to smoking.[1]

I searched for tweets containing the words “vaping”, “e-cigarette” or “juul” (Juul being one of the more popular brands of this type of product). The lang = 'en' argument ensures that only tweets written in English are included, and include_rts = FALSE leaves out retweets.

library(rtweet)  # provides search_tweets()
tweets <- search_tweets("vaping OR e-cigarette OR juul", n = 1000, include_rts = FALSE, lang = 'en')

The words of the tweets were isolated using tidytext in preparation for the sentiment analysis.

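library(dplyr)
library(stringr)
library(tidytext)

# Split on any character that is not a letter, digit, #, @ or an apostrophe followed by one of those
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
vaping_words <- tweets %>% select(status_id, text) %>%
  filter(!str_detect(text, '^"')) %>%  # drop tweets that begin with a quotation mark
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%  # strip links and &amp;
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,  # remove stop words
         str_detect(word, "[a-z]"))   # keep tokens containing at least one letter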
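To make the splitting pattern easier to read, here is a minimal sketch of what it does to a single made-up tweet. It uses base R's strsplit() as a stand-in, so it only approximates unnest_tokens(), which may differ in details; the example text is invented for illustration.

```r
# Same splitting pattern as in the analysis above.
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

# Invented example text, not taken from the collected tweets.
example_tweet <- "Quit #vaping today, don't wait! @friend"

# Lowercase first (unnest_tokens lowercases by default), then split on the pattern.
# perl = TRUE is needed because the pattern uses a lookahead.
pieces <- strsplit(tolower(example_tweet), reg, perl = TRUE)[[1]]

# Adjacent separators (such as ", ") leave empty strings behind, so drop them.
tokens <- pieces[pieces != ""]
print(tokens)
```

The hashtag, the mention and the contraction all come through as single tokens, which is the reason for splitting on a custom pattern rather than on plain word boundaries.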

The bing lexicon was used for this sentiment analysis. Because the counts are taken over individual words matched to the lexicon, the results show that most of the sentiment-bearing words in the tweets from our search are negative. However, this negative sentiment could come from several different groups: those discussing the health risks associated with vaping, those upset about the recently proposed plan to ban flavored vaping cartridges, or those upset about the subsequent decision to backtrack on that plan. The analysis we’re trying to conduct here could definitely benefit from a way to extract more context from each tweet!

library(pander)  # pander() prints the summary table

bing <- get_sentiments("bing") %>%
  select(word, sentiment)

tweet_themes <- vaping_words %>% inner_join(bing, by = "word")

tweet_sentiments <- tweet_themes %>%
  group_by(sentiment) %>%
  summarize(num = n()) %>%
  arrange(desc(num))
pander(tweet_sentiments)

sentiment	num
negative	881
positive	398
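One small step towards more context is to score each tweet as a whole rather than pooling all the words. The following is only a sketch under stated assumptions: toy_themes is a hypothetical stand-in for tweet_themes, its ids, words and sentiment labels are invented rather than taken from the collected tweets or the bing lexicon, and base R is used so that it runs without the packages above.

```r
# Hypothetical stand-in for tweet_themes: one row per sentiment word found in a tweet.
toy_themes <- data.frame(
  status_id = c("t1", "t1", "t2", "t2", "t2", "t3"),
  word      = c("risk", "harmful", "love", "great", "bad", "upset"),
  sentiment = c("negative", "negative", "positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Cross-tabulate tweets against sentiment: one row per tweet, one column per sentiment.
per_tweet <- table(toy_themes$status_id, toy_themes$sentiment)

# Net score per tweet: positive matches minus negative matches.
net <- per_tweet[, "positive"] - per_tweet[, "negative"]

# Label each tweet by the sign of its net score.
tweet_label <- ifelse(net > 0, "positive", ifelse(net < 0, "negative", "neutral"))

# Number of tweets in each class.
print(table(tweet_label))
```

Counting labels this way gives a number of tweets rather than a number of words, though it still cannot tell which of the possible reasons lies behind a negative tweet.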

Despite this shortcoming, I plotted the results as a bar chart.

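library(ggplot2)
ggplot(tweet_sentiments, aes(x = sentiment, y = num, fill = sentiment)) +
  geom_col() +
  labs(title = "General sentiments of tweets about vaping", y = "number of sentiment words") +
  theme(plot.title = element_text(hjust = 0.5))  # center the title instead of padding it with spaces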

Next, I created a word cloud of the sentiment-bearing words to give us an idea of which of them are used most often in these tweets.

library(wordcloud)
tweet_themes %>%
  anti_join(stop_words, by = "word") %>%  # name the join column explicitly
  count(word) %>%
  with(wordcloud(word, n, max.words = 100))
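The word cloud sizes each word by how often it occurs, so the same information can be read off a sorted frequency table. Here is a minimal base R sketch of that count, using an invented vector toy_words in place of the real tweet_themes$word column.

```r
# Invented words standing in for tweet_themes$word.
toy_words <- c("addiction", "risk", "addiction", "love", "risk", "addiction")

# Count each distinct word, then order from most to least frequent.
freq <- sort(table(toy_words), decreasing = TRUE)
print(freq)

# The most frequent word is the one a word cloud would draw largest.
top_word <- names(freq)[1]
```

A table like this is less eye-catching than the cloud, but it makes the exact counts visible.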

Veneka Mahomva

11/26/2019