Thanksgiving is without a doubt the MOST gluttonous holiday of the year (though one may argue that Halloween is a close contender). And sure, the turkey takes center stage, accompanied by sides of mashed potatoes, stuffing, cranberry sauce, etc. By the time dinner is finished, the top button of people’s pants is either on the verge of popping or already undone, leaving no room for dessert. However, to some, dessert is where Thanksgiving dinner truly shines, with its quintessential pies. Our analysis today takes a look at the pies in the race to become America’s most favorite Thanksgiving Pie (title pending).
In combing through the tweets that reference #Thanksgiving (we requested up to 1,000 per pie), our research focused on four pies in particular: Apple Pie, Blueberry Pie, Pumpkin Pie, and Pecan Pie.
Working with the tweets we were able to collect from the Twitter API, we first measured response based on the sentiment of the tweets themselves. Although Blueberry Pie has a particularly high share of “positive” mentions, it is worth noting that its sample is tiny: only 4 sentiment-bearing words, drawn from just 2 tweets (see Appendix Table 1.2). And as joyous as people are to see a Blueberry Pie come out on Thanksgiving Day, they are just as likely to be overcome with sadness at the very sight of it.
Pumpkin Pie had a substantially strong representation: the highest total number of positive mentions, the lowest share of negative mentions among the pies with a meaningful sample, and the highest counts on the other positive measures as well (anticipation, joy, and trust). However, when we looked at the distribution of sentiment across all its tweets, we noticed that Pumpkin Pie fell behind Apple Pie in its share of positive mentions (33.7% versus 37.3%) and was essentially tied with it on anticipation (13.2% versus 13.3%).
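#load the packages used throughout the analysis
library(dplyr)
library(stringr)
library(tidytext)
library(ggplot2)
library(knitr)
#create nrc to allow for sentiment analysis
#(older tidytext releases exposed this as sentiments %>% filter(lexicon == "nrc"))
nrc <- get_sentiments("nrc") %>%
  select(word, sentiment)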
#compile sentiment word analysis per pie data frame
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
#drop quoted tweets, strip links and &amp; entities, tokenize, then remove stop words
tokenize_tweets <- function(df) {
  df %>%
    filter(!str_detect(text, '^"')) %>%
    mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
    unnest_tokens(word, text, token = "regex", pattern = reg) %>%
    filter(!word %in% stop_words$word, str_detect(word, "[a-z]"))
}
apple_words <- tokenize_tweets(apple_df)
blueberry_words <- tokenize_tweets(blueberry_df)
pumpkin_words <- tokenize_tweets(pumpkin_df)
pecan_words <- tokenize_tweets(pecan_df)
#keep only the words that appear in the nrc lexicon
apple_sentiments <- apple_words %>% inner_join(nrc, by = "word")
blueberry_sentiments <- blueberry_words %>% inner_join(nrc, by = "word")
pumpkin_sentiments <- pumpkin_words %>% inner_join(nrc, by = "word")
pecan_sentiments <- pecan_words %>% inner_join(nrc, by = "word")
#create an identifying pie column to bind into master data frame
apple_sentiments$pie <- "apple"
blueberry_sentiments$pie <- "blueberry"
pumpkin_sentiments$pie <- "pumpkin"
pecan_sentiments$pie <- "pecan"
master_sentiments <- rbind(apple_sentiments, blueberry_sentiments, pumpkin_sentiments, pecan_sentiments)
#plot number of twitter mentions by pie by sentiment
#(the result stays grouped by pie, so frequency is each sentiment's share within its pie)
thanksgiving_df <- master_sentiments %>%
  group_by(pie, sentiment) %>%
  summarize(n = n()) %>%
  mutate(frequency = n / sum(n))
ggplot(thanksgiving_df, aes(x = sentiment, y = n, fill = pie)) +
  geom_bar(stat = "identity", position = "dodge") +
  xlab("Sentiment") +
  ylab("n") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
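The difference between raw counts and within-pie shares discussed above can be checked with a few lines of base R. This is only an illustrative sketch, not part of the original analysis: it hard-codes a handful of counts copied from the Appendix tables, and the names `counts`, `totals`, and `share` are our own.

```r
# Sentiment word counts copied from the Appendix tables (three sentiments, three pies).
counts <- data.frame(
  pie       = rep(c("apple", "pumpkin", "pecan"), each = 3),
  sentiment = rep(c("anticipation", "positive", "negative"), times = 3),
  n         = c(37, 104, 24, 181, 462, 97, 77, 121, 55),
  stringsAsFactors = FALSE
)

# Total sentiment words per pie (the column sums of the Appendix tables).
totals <- c(apple = 279, pumpkin = 1372, pecan = 438)

# Share of each sentiment within its own pie, as a percentage.
counts$share <- round(100 * counts$n / unname(totals[counts$pie]), 1)

# Pies as rows, sentiments as columns.
print(xtabs(share ~ pie + sentiment, data = counts))
```

Pumpkin Pie leads on every raw count, yet its positive share (33.7%) trails Apple Pie's (37.3%), which is why counts alone do not settle the question.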
From our initial analysis it was clear that the top contenders were Apple and Pumpkin Pie. Therefore, to settle the score, we mapped out a positive consumption journey (from the build-up, to the pie reveal, to post-consumption) as a way of defining a winner more clearly.
We began by observing the anticipation people felt at the very idea that either Apple or Pumpkin Pie would be presented at the table this year, followed by the surprise at its arrival, the joy or elation that their wishes were met, the trust that it was going to taste as good as they imagined, and the positive experience at the conclusion. To see that positive consumption journey visually, please refer to Table 1.3 below.
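However, to more scientifically calculate the true winner, we applied the Pie-thagorean algorithm (a derivative of the Pythagorean Ranking algorithm) to determine which pie is most likely to impress and satisfy crowds at the Thanksgiving table this year. As implemented in the code below, the theorem is calculated as:

$$\text{Pie-thagorean score} = (\text{affinity mentions})^{0.23} \times \frac{(\text{affinity mentions})^{2.34}}{(\text{total mentions})^{2.34}}$$

Here “affinity mentions” equals the sum of the anticipation, joy, positive, surprise, and trust mentions, and “total mentions” is the count across all sentiments for that pie. After running the calculations, the results are as follows …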
#calculate the pie-thagorean score for Apple & Pumpkin Pies
affinity <- c("anticipation", "joy", "positive", "surprise", "trust")
pie_thagorean_df <- thanksgiving_df %>%
  filter(pie %in% c("apple", "pumpkin")) %>%
  group_by(pie) %>%
  #N = affinity mentions, total = all sentiment mentions for the pie
  summarise(N = sum(n[sentiment %in% affinity]), total = sum(n)) %>%
  mutate(loyalty_measure = round((N^0.23) * ((N^2.34) / (total^2.34)), digits = 1))
colnames(pie_thagorean_df) <- c("pie", "affinity sentiments", "total sentiments", "pie-thagorean score")
kable(pie_thagorean_df)
| pie | affinity sentiments | total sentiments | pie-thagorean score |
|---|---|---|---|
| apple | 228 | 279 | 2.2 |
| pumpkin | 1182 | 1372 | 3.6 |
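As a sanity check, the scores in the table can be reproduced from its two count columns alone. The sketch below uses only base R and the same expression as the `loyalty_measure` line in the code above; the data frame `pies` and its column names are our own, chosen for illustration.

```r
# Affinity and total sentiment counts, copied from the results table.
pies <- data.frame(
  pie      = c("apple", "pumpkin"),
  affinity = c(228, 1182),
  total    = c(279, 1372),
  stringsAsFactors = FALSE
)

# Pie-thagorean score: affinity^0.23 * affinity^2.34 / total^2.34
pies$score <- round(
  pies$affinity^0.23 * pies$affinity^2.34 / pies$total^2.34,
  digits = 1
)

# Plain affinity share, for comparison with the score.
pies$affinity_share <- round(pies$affinity / pies$total, 3)

print(pies)
```

The ratio term rewards a high share of affinity mentions, while the `affinity^0.23` factor adds a modest bonus for sheer volume. The two pies have fairly similar affinity shares (about 0.82 versus 0.86), so much of Pumpkin Pie's lead in the score comes from that volume term.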
There’s no denying that Pumpkin is Pumpking. With a significantly larger volume of mentions during this time period, and a higher Pie-thagorean score (3.6 versus 2.2) to affirm the strong affinity people have for its presence, Pumpkin Pie has been crowned America’s most favorite Thanksgiving Pie for 2017.
#apple pie: count per sentiment and share of total
apple_app <- apple_sentiments %>%
  group_by(sentiment) %>%
  summarize(n = n()) %>%
  mutate(pct_total = round(n / sum(n) * 100, digits = 1))
kable(apple_app)
| sentiment | n | pct_total |
|---|---|---|
| anger | 14 | 5.0 |
| anticipation | 37 | 13.3 |
| disgust | 2 | 0.7 |
| fear | 3 | 1.1 |
| joy | 51 | 18.3 |
| negative | 24 | 8.6 |
| positive | 104 | 37.3 |
| sadness | 8 | 2.9 |
| surprise | 11 | 3.9 |
| trust | 25 | 9.0 |
#blueberry pie: count per sentiment and share of total
blueberry_app <- blueberry_sentiments %>%
  group_by(sentiment) %>%
  summarize(n = n()) %>%
  mutate(pct_total = round(n / sum(n) * 100, digits = 1))
kable(blueberry_app)
| sentiment | n | pct_total |
|---|---|---|
| joy | 1 | 25 |
| positive | 2 | 50 |
| sadness | 1 | 25 |
#pumpkin pie: count per sentiment and share of total
pumpkin_app <- pumpkin_sentiments %>%
  group_by(sentiment) %>%
  summarize(n = n()) %>%
  mutate(pct_total = round(n / sum(n) * 100, digits = 1))
kable(pumpkin_app)
| sentiment | n | pct_total |
|---|---|---|
| anger | 25 | 1.8 |
| anticipation | 181 | 13.2 |
| disgust | 17 | 1.2 |
| fear | 15 | 1.1 |
| joy | 303 | 22.1 |
| negative | 97 | 7.1 |
| positive | 462 | 33.7 |
| sadness | 36 | 2.6 |
| surprise | 51 | 3.7 |
| trust | 185 | 13.5 |
#pecan pie: count per sentiment and share of total
pecan_app <- pecan_sentiments %>%
  group_by(sentiment) %>%
  summarize(n = n()) %>%
  mutate(pct_total = round(n / sum(n) * 100, digits = 1))
kable(pecan_app)
| sentiment | n | pct_total |
|---|---|---|
| anger | 7 | 1.6 |
| anticipation | 77 | 17.6 |
| disgust | 7 | 1.6 |
| fear | 12 | 2.7 |
| joy | 71 | 16.2 |
| negative | 55 | 12.6 |
| positive | 121 | 27.6 |
| sadness | 14 | 3.2 |
| surprise | 15 | 3.4 |
| trust | 59 | 13.5 |
#pull tweets per pie hashtag (assumes an already authenticated twitteR session)
library(twitteR)
num_tweets <- 1000
apple <- searchTwitter('#thanksgiving#applepie', n = num_tweets)
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 1000 tweets were requested but the
## API can only return 113
blueberry <- searchTwitter('#thanksgiving#blueberrypie', n = num_tweets)
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 1000 tweets were requested but the
## API can only return 2
pumpkin <- searchTwitter('#thanksgiving#pumpkinpie', n = num_tweets)
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 1000 tweets were requested but the
## API can only return 468
pecan <- searchTwitter('#thanksgiving#pecanpie', n = num_tweets)
## Warning in doRppAPICall("search/tweets", n, params = params,
## retryOnRateLimit = retryOnRateLimit, : 1000 tweets were requested but the
## API can only return 124
apple_df <- twListToDF(apple)
blueberry_df <- twListToDF(blueberry)
pumpkin_df <- twListToDF(pumpkin)
pecan_df <- twListToDF(pecan)