
#Black Friday vs. Small Business Saturday
##What Twitter is saying and feeling
###Jaclyn Janis
####MPH 676, University of Southern Maine, Fall 2018

###Purpose

Below is an analysis of Twitter data on Black Friday and Small Business Saturday. I was curious about differences in sentiment and geographic distribution. I fully admit that I approached this analysis biased - Black Friday makes me cringe, and Small Business Saturday feels refreshing, materialistic though it still may be.

It occurred to me that I know nothing about how Black Friday became a thing. I was surprised to find it has roots in Philadelphia, especially having lived there for 6 years. Then, after reading said history, I wasn’t surprised at all. I know Philadelphia is the City of Brotherly Love and all, but, well, ask anyone who’s lived there: it can be mean. Did you see what happened after the Super Bowl?

###Preparing the Data

I used the twitteR package for my initial tweet searches, converting lists to dataframes, and doing my sentiment analysis. I began by searching #BlackFriday on Black Friday and converted the list of 1000 tweets to a dataframe with twListToDF(). The results of this code are suppressed since you all know what the data look like after being converted. The search code is commented out so I wouldn’t accidentally rerun it and pull tweets after Black Friday.

I did write my results to a CSV in case my R session restarted, since I valued the fact that they were pulled on the actual day of the hashtag. I wouldn’t have thought to do this if I hadn’t made the mistake of not saving my time-sensitive Thanksgiving Day searches and then losing the results when my battery died. (I was going to do a sentiment analysis on data pulled over the course of the day to see if sentiments changed on Thanksgiving as people got fuller, possibly more inebriated, and had more exposure time to family. There’s probably a way to rerun these searches with “since” and “until”, but due to the constraints of the basic developer account and our discussion on Piazza, I decided to just start over with a new topic.)

I ended up having to use the CSV files when knitting to HTML.

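# Original pull, run once on Black Friday; left commented out so that
# knitting does not re-query Twitter after the fact.
#num_tweets <- 1000
#blackfriday <- searchTwitter('#blackfriday', n = num_tweets)
#blackfriday_df <- twListToDF(blackfriday)
#write.csv(blackfriday_df, "blackfridaysaved.csv")
# Reload the saved pull; keep tweet text as character rather than factor
blackfriday_df <- read.csv("blackfridaysaved.csv", stringsAsFactors = FALSE)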
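
The comment-out-and-reload pattern above could also be wrapped in a small caching helper, so the live search only runs when no saved file exists. This is a minimal sketch, not what I actually ran: `load_or_fetch` and its `fetch` argument are hypothetical names, and `fetch` stands for any zero-argument function that returns a dataframe.

```r
# Hypothetical helper: run the search only if no saved copy exists yet.
# `fetch` is any zero-argument function returning a data frame, for example
# function() twListToDF(searchTwitter('#blackfriday', n = 1000)).
load_or_fetch <- function(path, fetch) {
  if (!file.exists(path)) {
    # row.names = FALSE avoids an extra index column when the file is re-read
    write.csv(fetch(), path, row.names = FALSE)
  }
  # Always return the on-disk copy so column types are the same on every run
  read.csv(path, stringsAsFactors = FALSE)
}
```

With this, a second knit on a later day would read the saved Black Friday tweets instead of silently pulling newer ones.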

I pulled 1000 tweets that contained #SmallBusinessSaturday on Small Business Saturday.

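# Original pull, run once on Small Business Saturday; left commented out
#num_tweets <- 1000
#smbizsat <- searchTwitter('#smallbusinesssaturday', n = num_tweets)
#smbizsat_df <- twListToDF(smbizsat)
#write.csv(smbizsat_df, "smallbizsat.csv")
# Reload the saved pull; keep tweet text as character rather than factor
smbizsat_df <- read.csv("smallbizsat.csv", stringsAsFactors = FALSE)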

I checked that all the tweets pulled in fact had dates on Black Friday or Small Business Saturday.

#head(blackfriday_df)
#tail(blackfriday_df)
#head(smbizsat_df)
#tail(smbizsat_df)
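
Eyeballing head() and tail() works for 1000 tweets, but the same check can be made explicit by tabulating the dates. The sketch below is an illustration under assumptions: `tweets_per_day` is a hypothetical helper, and it assumes the saved `created` column holds timestamps like "2018-11-23 14:05:11" recorded in UTC.

```r
# Hypothetical helper: count tweets per calendar date.
# Assumes `created` holds timestamps such as "2018-11-23 14:05:11" in UTC.
tweets_per_day <- function(df, tz = "UTC") {
  created <- as.POSIXct(as.character(df$created), tz = "UTC")
  # as.Date() with tz reports the calendar date as seen in that timezone
  table(as.Date(created, tz = tz))
}
```

Calling `tweets_per_day(blackfriday_df)` should then show a single date if every tweet was posted on Black Friday. Passing something like `tz = "America/New_York"` would show the dates in local time, which matters for tweets posted late in the evening.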

###Exploring the Data

I used the sample code from our unit lecture (and therefore from this article) to get the words to use in the sentiment analysis. I checked out the top 10:

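library(dplyr)
library(stringr)
library(tidytext)

# Split on anything that is not a letter, digit, #, @, or an in-word apostrophe
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
bf_words <- blackfriday_df %>%
  filter(!str_detect(text, '^"')) %>%   # drop tweets that open with a quotation mark
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%   # strip links and &amp;
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,    # remove common stop words
         str_detect(word, "[a-z]"))     # keep tokens containing at least one letter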

# Top 10 words by count; naming n explicitly avoids top_n() guessing the column
bf_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(10, n)

I didn’t limit my initial tweet search to English, and the broader net I cast taught me something that I did not know and that pains me to learn. I see several French and Spanish words in here, so I asked myself, “Is Black Friday not just an American thing?” (Tweets in Spanish alone would not have made me question - it was the French that got me.) So I did a little bit of searching, and it turns out that Black Friday has spread like a virus to the rest of the unsuspecting world.

I did the same for Small Business Saturday, then decided there is a much more effective way of displaying common words (I chose the top 50).
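
The Small Business Saturday version of the word-prep code isn't shown, so here is one way "the same" could look: the Black Friday pipeline above wrapped in a function and applied to either dataframe. This is a sketch rather than the code I ran, and `tweet_words` is a hypothetical name.

```r
library(dplyr)
library(stringr)
library(tidytext)

# Same token pattern as in the Black Friday block above
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

# Hypothetical helper: the cleaning steps from above, reusable for any tweet dataframe
tweet_words <- function(df) {
  df %>%
    mutate(text = as.character(text)) %>%          # guard against factor columns
    filter(!str_detect(text, '^"')) %>%            # drop tweets that open with a quote
    mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
    unnest_tokens(word, text, token = "regex", pattern = reg) %>%
    filter(!word %in% stop_words$word,
           str_detect(word, "[a-z]"))
}

# Usage with the dataframes loaded earlier:
# bf_words    <- tweet_words(blackfriday_df)
# smbiz_words <- tweet_words(smbizsat_df)
```

Putting the steps in one function keeps the two hashtags on exactly the same cleaning rules, which matters when their sentiments are compared later.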

BLACK FRIDAY WORD CLOUD

library(wordcloud2)
# Keep the 50 most frequent words; wordcloud2() expects a word column and a freq column
bf_cloud <- bf_words %>% group_by(word) %>% summarize(freq = n()) %>% top_n(50, freq)
wordcloud2(data = bf_cloud, size = 3, color = 'random-light', backgroundColor = "black")

SMALL BUSINESS SATURDAY WORD CLOUD

# smbiz_words holds the Small Business Saturday tokens, prepared the same way as bf_words
smbiz_cloud <- smbiz_words %>% group_by(word) %>% summarize(freq = n()) %>% top_n(50, freq)
wordcloud2(data = smbiz_cloud, size = 3, color = "random-dark")

Now onto the sentiment analysis. I joined both the Black Friday and Small Business Saturday words to the sentiments then to each other. I also added the variable, day, to flag those results that were Black Friday and those that were Small Business Saturday. I suppressed the tables here but checked them out to make sure everything looked okay.
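
Since the join code is suppressed, here is a rough sketch of the kind of steps described above, producing one table with `sentiment`, `frequency`, and `day` columns. It rests on assumptions: `sentiment_share` is a hypothetical helper, the lexicon and tokens are toy values made up for illustration, and frequency is computed as the percent of lexicon-matched words, which may not be the denominator I actually used.

```r
library(dplyr)

# Hypothetical helper: percent of lexicon-matched words per sentiment for one day.
# `words` needs a `word` column; `lexicon` needs `word` and `sentiment` columns.
sentiment_share <- function(words, lexicon, day_label) {
  words %>%
    count(word, name = "n_word") %>%        # one row per distinct word
    inner_join(lexicon, by = "word") %>%    # a word can carry several sentiments
    group_by(sentiment) %>%
    summarize(n = sum(n_word)) %>%
    mutate(frequency = 100 * n / sum(n),    # assumption: share of matched words
           day = day_label)
}

# Toy stand-ins, made up for illustration only (not real lexicon entries or tweets)
toy_lexicon <- data.frame(
  word = c("deal", "crowd", "crowd", "local"),
  sentiment = c("joy", "fear", "negative", "trust"),
  stringsAsFactors = FALSE
)
bf_toy <- data.frame(word = c("deal", "deal", "crowd", "tv"), stringsAsFactors = FALSE)
sb_toy <- data.frame(word = c("local", "local", "deal", "gift"), stringsAsFactors = FALSE)

# Stack the two days into one table, flagged by `day`
sent_toy <- bind_rows(
  sentiment_share(bf_toy, toy_lexicon, "Black Friday"),
  sentiment_share(sb_toy, toy_lexicon, "Small Business Saturday")
)
```

With real data, the toy inputs would be replaced by `bf_words`, `smbiz_words`, and a full sentiment lexicon with the same two columns.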

The plot below compares Black Friday and Small Business Saturday sentiments. You can see greater anticipation, joy, positive, and trust for Small Business Saturday, whereas more anger, fear, negative, sadness, and surprise turn up for Black Friday. That feels right to me (no offense to those who love their deals).

library(ggplot2)

# Grouped bars: one bar per day within each sentiment
bg <- ggplot(sent_df, aes(x = sentiment, y = frequency, fill = day)) +
  geom_bar(stat = "identity", position = "dodge") +
  xlab("Sentiment") +
  ylab("Percent of tweets") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# Manual fill colors for the two days
bg + scale_fill_manual(values = c("black", "cyan4"))
