I ended up distributing my own pizza survey last week (since I was out of town and couldn’t contribute to the group effort as much as I should’ve). The data has a definite Californian skew, but there we are!
Packages and data I’ll be using:
library(tidyverse)
-- Attaching packages ---------------------------------------------------------------------------------- tidyverse 1.2.1 --
v ggplot2 3.0.0 v purrr 0.2.5
v tibble 1.4.2 v dplyr 0.7.6
v tidyr 0.8.1 v stringr 1.3.1
v readr 1.1.1 v forcats 0.3.0
-- Conflicts ------------------------------------------------------------------------------------- tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
library(ggrepel)
library(stringr)
setwd("C:/Users/monica/Documents/Courses/Q7_Fall2018/GEOG 208/week5_project")
column_names = c("submitted","name","age","gender","ethnicity","state","city","us_born","cntry_birth","pizza_freq","pizza_rating","pizza_1word","best_1","best_2","best_3","worst_1","worst_2","worst_3","add_ons","no_of_toppings","most_surprising","unacceptable","pineapple_q","diet","tacos","hamburgers","sushi","salad","bagels","steak","email")
untidy_pizza <- read_csv("the_pizza_survey.csv", skip = 1, col_names = column_names)
Parsed with column specification:
cols(
.default = col_character(),
age = col_integer(),
pizza_rating = col_integer(),
tacos = col_integer(),
hamburgers = col_integer(),
sushi = col_integer(),
salad = col_integer(),
bagels = col_integer(),
steak = col_integer()
)
See spec(...) for full column specifications.
head(untidy_pizza)
# A tibble: 6 x 31
submitted name age gender ethnicity state city us_born cntry_birth pizza_freq pizza_rating pizza_1word best_1 best_2
<chr> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int> <chr> <chr> <chr>
1 2018/10/~ <NA> 63 Male Asian Cali~ Sant~ No Philippines About onc~ 3 Acceptable Fresh~ Extra~
2 2018/10/~ Cass~ 20 Female White Cali~ San ~ Yes N/A About onc~ 4 Delicious Spina~ Garlic
3 2018/10/~ Moni~ 28 Female Asian Cali~ Los ~ Yes <NA> About onc~ 4 Delicious Mushr~ Peppe~
4 2018/10/~ Matt 25 Male Asian Cali~ Cost~ Yes <NA> Less than~ 4 Comfortabl~ Peppe~ Mushr~
5 2018/10/~ Soph~ 22 Female White;As~ Cali~ East~ Yes <NA> About onc~ 4 Delicious Mushr~ Pinea~
6 2018/10/~ ciro 29 Male Prefer n~ Cali~ Los ~ No Germany About onc~ 3 Delicious Sausa~ Mushr~
# ... with 17 more variables: best_3 <chr>, worst_1 <chr>, worst_2 <chr>, worst_3 <chr>, add_ons <chr>,
# no_of_toppings <chr>, most_surprising <chr>, unacceptable <chr>, pineapple_q <chr>, diet <chr>, tacos <int>,
# hamburgers <int>, sushi <int>, salad <int>, bagels <int>, steak <int>, email <chr>
Some preliminary cleaning: I know for certain that I don’t need the first or last columns, and I would like to add a unique ID (that may or may not end up being useful). Also, there was at least one prank entry, from a “Pete Za” who is 314 years old. Pete has to go.
pizza <- untidy_pizza[c(2:30)] %>%
mutate(participant_id = row_number()) %>%
filter(age <= 120)
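One caveat worth flagging: `filter()` keeps only rows where the condition is `TRUE`, so `age <= 120` also drops anyone who left the age question blank, not just Pete. A minimal sketch on made-up data (the `toy` tibble is hypothetical, not from the survey), assuming the goal is to keep respondents with a missing age:

```r
library(dplyr)

# Hypothetical toy data: one prank age, one missing age
toy <- tibble(name = c("Ann", "Pete Za", "Bo"),
              age  = c(28L, 314L, NA))

# Condition is NA for Bo, so Bo is dropped along with Pete
strict <- toy %>% filter(age <= 120)

# Explicitly keep missing ages while still removing implausible ones
lenient <- toy %>% filter(is.na(age) | age <= 120)

nrow(strict)   # 1
nrow(lenient)  # 2
```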
Most of the survey participants live in the U.S., and most of them are in California, so instead of doing a geographic grouping by country or state, I’m going to go with the Census’ very basic regions.
census_region <- read_csv("census_state_regions.csv")
Parsed with column specification:
cols(
Region = col_integer(),
Region_name = col_character(),
Division = col_integer(),
State_FIPS = col_integer(),
State = col_character()
)
pizza_region <- pizza %>%
rename(State = state) %>%
left_join(census_region, by = "State")
pizza_region$Region_name[is.na(pizza_region$Region_name)] <- "Outside U.S."
pizza_region$Region_name <- ordered(pizza_region$Region_name, levels = c("Outside U.S.", "Midwest", "Northeast", "South", "West"))
table(pizza_region$Region_name)
Outside U.S. Midwest Northeast South West
3 5 21 23 93
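Labeling every unmatched row "Outside U.S." assumes a missing region always means a non-U.S. respondent; a misspelled or blank state would land in the same bucket. A quick sketch of how the unmatched values could be inspected before relabeling, using hypothetical stand-ins (`toy_pizza`, `toy_census`) for the real tables:

```r
library(dplyr)

# Hypothetical stand-ins for `pizza` and `census_region`
toy_pizza  <- tibble(State = c("California", "Ohio", "Califronia", NA))
toy_census <- tibble(State = c("California", "Ohio"),
                     Region_name = c("West", "Midwest"))

# Rows in toy_pizza whose State has no match in the lookup table
unmatched <- toy_pizza %>%
  anti_join(toy_census, by = "State") %>%
  count(State)

unmatched
```

If anything other than genuinely foreign locations shows up in that tally, it can be fixed before the join.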
1. Tippity Top Toppings
So, what does my population of 145 claim are the best toppings? I had participants rank their top 3 favorites from a pre-defined list, which I hoped would make the clean-up easier. To identify the top 10 toppings, we’ll attach a crude score to each vote: 3 points for a first choice, 2 for a second, and 1 for a third.
best_toppings <- pizza_region %>%
gather("best_rank", "best_toppings", 12:14) %>%
mutate(best_toppings = sub("Chicken \\(BBQ, buffalo, etc.\\)", "Chicken", best_toppings)) %>%
mutate(best_toppings = sub("Olives \\(black, green, kalamata, etc.\\)", "Olives", best_toppings)) %>%
mutate(best_toppings = sub("N/A - I don't eat pizza", "N/A", best_toppings)) %>%
mutate(best_rank = gsub("best_1", 1, best_rank)) %>%
mutate(best_rank = gsub("best_2", 2, best_rank)) %>%
mutate(best_rank = gsub("best_3", 3, best_rank))
best_toppings$score <- ifelse(
best_toppings$best_rank == 1, 3, (ifelse(
best_toppings$best_rank == 2, 2, ifelse(
best_toppings$best_rank == 3, 1, NA))))
best_sum <- best_toppings %>%
group_by(best_toppings) %>%
summarise(best_votes = n(),
best_score = sum(score)) %>%
rename(topping = best_toppings) %>%
arrange(desc(best_score))
top10_toppings <- best_sum$topping[1:10]
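To make the scoring arithmetic concrete, here is a small worked example on invented votes (the `toy_votes` data is hypothetical). The nested `ifelse()` above maps ranks 1, 2, 3 to 3, 2, 1 points, which is the same as `4 - rank`:

```r
library(dplyr)

# Hypothetical toy votes: rank 1 = first choice
toy_votes <- tibble(
  topping = c("Pepperoni", "Mushrooms", "Pepperoni", "Basil", "Mushrooms"),
  rank    = c(1, 2, 3, 1, 1)
)

toy_scores <- toy_votes %>%
  mutate(score = 4 - rank) %>%   # 1st = 3 pts, 2nd = 2 pts, 3rd = 1 pt
  group_by(topping) %>%
  summarise(votes = n(), score = sum(score)) %>%
  arrange(desc(score))

toy_scores
```

Here Mushrooms (2 + 3 = 5 points) outranks Pepperoni (3 + 1 = 4 points) even though both received two votes, because the score rewards higher placement.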
Let’s break down these top 10 by how frequently participants eat pizza:
ggplot(subset(best_toppings, best_toppings %in% top10_toppings), aes(x = best_toppings, fill = pizza_freq)) +
geom_bar(width = 0.75) +
coord_flip() +
theme_minimal() +
scale_fill_manual(values = c("#fdb87d", "#ff8364", "#ffe8d5", "#ff4d4d"), name = "How often do you \neat pizza?") +
labs(title = "Top 10 Pizza Toppings", y = "No. of Votes", x = "") +
theme(plot.title = element_text(vjust = 2, face = "bold"),
axis.title.x = element_text(vjust = -1, color = "gray60"))

Though they don’t make up the majority of participants, those who most frequently eat pizza (both more than once a week and about once a week) tend to stick to some of the most popular toppings, including pepperoni, sausage, and mushrooms.
2. The Rejects
And who are the least loved among toppings?
worst_toppings <- pizza_region %>%
gather("worst_rank", "worst_toppings", 15:17) %>%
mutate(worst_toppings = sub("Chicken \\(BBQ, buffalo, etc.\\)", "Chicken", worst_toppings)) %>%
mutate(worst_toppings = sub("Olives \\(black, green, kalamata, etc.\\)", "Olives", worst_toppings)) %>%
mutate(worst_toppings = sub("N/A - I like it all", "N/A", worst_toppings)) %>%
mutate(worst_toppings = sub("Excessive cheese", "Extra cheese", worst_toppings)) %>%
mutate(worst_rank = gsub("worst_1", 1, worst_rank)) %>%
mutate(worst_rank = gsub("worst_2", 2, worst_rank)) %>%
mutate(worst_rank = gsub("worst_3", 3, worst_rank))
worst_toppings$score <- ifelse(
worst_toppings$worst_rank == 1, 3, (ifelse(
worst_toppings$worst_rank == 2, 2, ifelse(
worst_toppings$worst_rank == 3, 1, NA))))
worst_sum <- worst_toppings %>%
group_by(worst_toppings) %>%
summarise(worst_votes = n(),
worst_score = sum(score)) %>%
rename(topping = worst_toppings) %>%
arrange(desc(worst_score))
worst_sum_noNA <- worst_sum %>%
filter(topping != "N/A")
bottom10_toppings <- worst_sum_noNA$topping[1:10]
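The favorites and rejects pipelines are nearly identical, so the shared steps could be folded into one function. A rough sketch, where `score_toppings()` is a hypothetical helper (not part of the original analysis) and the label clean-up done with `sub()` is left out for brevity:

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: `rank_cols` names the ranked-choice columns, best/worst first
score_toppings <- function(df, rank_cols) {
  df %>%
    select(all_of(rank_cols)) %>%
    gather("rank_col", "topping") %>%                    # one row per vote
    mutate(score = 4 - match(rank_col, rank_cols)) %>%   # 3, 2, 1 points
    group_by(topping) %>%
    summarise(votes = n(), score = sum(score)) %>%
    arrange(desc(score))
}

# Made-up responses from two participants
toy <- tibble(worst_1 = c("Anchovies", "Olives"),
              worst_2 = c("Olives", "Anchovies"),
              worst_3 = c("Eggplant", "Pineapple"))

toy_worst <- score_toppings(toy, c("worst_1", "worst_2", "worst_3"))
```

The same call with `c("best_1", "best_2", "best_3")` would cover the favorites.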
This time we’ll break up the bottom 10 by how much the participants like pizza.
ggplot(subset(worst_toppings, worst_toppings %in% bottom10_toppings), aes(x = worst_toppings, fill = as.character(pizza_rating))) +
geom_bar(width = 0.7) +
coord_flip() +
theme_minimal() +
scale_fill_manual(values = c("#fefea4","#ffdc76", "#d3504a", "#4a2c2c"), name = "How much do \nyou like pizza?\n(1-5 scale)") +
labs(title = "10 Least Loved Pizza Toppings", y = "No. of Votes", x = "") +
theme(plot.title = element_text(vjust = 2, face = "bold"),
axis.title.x = element_text(vjust = -1, color = "gray60"))

Anchovies are a least favorite topping among even the most die-hard pizza lovers, followed by olives, pineapple, and eggplant. Participants who really like pizza but aren’t willing to call it their favorite food (rating = 4) follow a similar pattern.
3. Pizza Topping Agreement
Some toppings showed up on both lists - is everyone in agreement about these most loved and most loathed toppings? Let’s see! We’ll compare the number of “most favorite” votes each topping received to its “least favorite” votes. In this graph, points closer to the axes are more agreed-upon toppings.
best_no1 <- best_toppings %>%
filter(best_rank == 1) %>%
group_by(best_toppings) %>%
summarise(best_1 = n()) %>%
rename(topping = best_toppings)
topping_votes <- merge(best_no1, merge(best_sum, worst_sum, by = "topping", all.x = TRUE, all.y = TRUE), by = "topping", all.x = TRUE, all.y = TRUE)
topping_votes[is.na(topping_votes)] <- 0
rm(best_no1)
head(topping_votes)
topping best_1 best_votes best_score worst_votes worst_score
1 Anchovies 5 11 23 65 145
2 Artichokes 5 11 22 14 27
3 Bacon 8 25 47 10 20
4 Basil 4 17 30 2 2
5 Bell peppers 3 20 34 25 51
6 Chicken 7 16 36 22 43
ggplot(topping_votes, aes(x = best_votes, y = worst_votes)) +
geom_point(aes(size = best_1), shape = 21, stroke = 1.1, color = "#ff4d4d", fill = "#ff4d4d", alpha = 0.4) +
geom_text_repel(aes(label = topping), point.padding = 0.8) +
scale_size(name = "#1 Favorite \n Votes", range = c(2,14)) +
theme_minimal() +
labs(title = "Pizza Topping Agreement", subtitle = "Number of 'Favorite' vs. 'Least Favorite' votes received by topping", x = "Favorite Topping Votes", y = "Least Favorite Topping Votes") +
theme(plot.title = element_text(vjust = 2, face = "bold"),
plot.subtitle = element_text(color = "gray60"),
axis.title.x = element_text(vjust = -1),
axis.title.y = element_text(vjust = 2),
axis.text.y = element_text(hjust = 1.5))

Participants are mostly in agreement about the favorite (pepperoni, mushrooms, sausage) and least favorite (anchovies, eggplant) toppings, but there is more debate about toppings like pineapple (surprise, surprise), olives, and bell peppers.
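The "closer to the axes" reading can also be put into a number. As a rough sketch, using only the six rows printed by `head(topping_votes)` and a made-up `fav_share` column (not part of the original analysis), we can compute the share of each topping's votes that were favorable. Values near 0 or 1 suggest agreement; values near 0.5 suggest a split.

```r
library(dplyr)

# The six rows shown in head(topping_votes)
votes_head <- tibble(
  topping     = c("Anchovies", "Artichokes", "Bacon", "Basil", "Bell peppers", "Chicken"),
  best_votes  = c(11, 11, 25, 17, 20, 16),
  worst_votes = c(65, 14, 10, 2, 25, 22)
)

# fav_share is a hypothetical measure: favorable votes / all votes
agreement <- votes_head %>%
  mutate(fav_share = best_votes / (best_votes + worst_votes)) %>%
  arrange(fav_share)

agreement
```

Among these six, anchovies sit at the low end (11 of 76 votes favorable) and basil at the high end (17 of 19), while bell peppers land close to an even split (20 of 45).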
4. A Regional Look at Pineapple
Pineapple appears to be one of the more divisive toppings. Is there a regional difference in its reception? …and can this chart be trusted when the number of responses from some of these regions was so teeny tiny…?
ggplot(pizza_region, aes(x = Region_name, fill = pineapple_q)) +
geom_bar(position = "fill", width = 0.25) +
geom_text(stat='count', aes(label=..count..), position = position_fill(vjust = 0.5)) +
scale_fill_manual(values = c("#ff7657", "#665c84", "#fbeed7", "#ffba5a"), name = "") +
coord_flip() +
labs(title = "Does pineapple belong on pizza?", subtitle = "Responses by U.S. region", x = "", y = "Percent of votes") +
theme_minimal() +
theme(plot.title = element_text(vjust = 2, face = "bold"),
plot.subtitle = element_text(color = "gray60"),
axis.title.x = element_text(vjust = -1, color = "gray60"))

It is disappointing to find that nearly 50% of participants from most U.S. regions have such silly taste in pizza. The Midwest appears to have its values in the right place (though there were very few responses from the region [n = 5]).
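Because the bars are normalized to 100%, a five-person region looks as authoritative as a ninety-person one. Tabulating shares next to the raw counts keeps the sample size in view. A sketch with invented responses (the `toy` data and its answer labels are placeholders, not survey values):

```r
library(dplyr)

# Hypothetical responses: a small region and a larger one
toy <- tibble(
  Region_name = c(rep("Midwest", 5), rep("West", 10)),
  pineapple_q = c(rep("No", 4), "Yes", rep("Yes", 6), rep("No", 4))
)

region_shares <- toy %>%
  count(Region_name, pineapple_q) %>%   # votes per region and answer
  group_by(Region_name) %>%
  mutate(region_n = sum(n),             # total responses in the region
         share    = n / region_n) %>%
  ungroup()

region_shares
```

In this toy case the Midwest's 80% "No" rests on just four people, which is the kind of context the filled bar chart hides.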
5. Pizza, and…
I collected some very important data regarding participants’ taste for other popular foods. How does their appreciation for pizza compare to these other eats?
other_eats <- pizza_region %>%
select(participant_id, Region_name, pizza_rating, tacos:steak) %>%
group_by(Region_name) %>%
gather("eats", "rating", tacos:steak)
other_eats$eats <- str_to_title(other_eats$eats)
I had grand visions for the visualization of this data, but they have been reduced to this jitter plot:
ggplot(other_eats, aes(x = rating, y = pizza_rating, color = Region_name, fill = Region_name)) +
facet_wrap(. ~ eats) +
geom_jitter(width = 0.3, height = 0.3, size = 1.5, shape = 21, alpha = 0.75) +
scale_color_manual(values = c("#182C61","#1B9CFC","#55E6C1","#FC427B","#EAB543"), name = "Region") +
scale_fill_manual(values = c("#182C61","#1B9CFC","#55E6C1","#FC427B","#EAB543"), name = "Region") +
theme_minimal() +
labs(title = "How much do you like _____?", subtitle = "Participants' ratings of pizza vs. ratings of other eats", x = "(1 = low rating, 5 = high rating)", y = "Pizza Rating") +
theme(plot.title = element_text(vjust = 2, face = "bold"),
plot.subtitle = element_text(color = "gray60"),
axis.text.y = element_text(hjust = 1.5),
axis.title.y = element_text(vjust = 3),
axis.title.x = element_text(vjust = -2, color = "gray60"),
strip.text.x = element_text(face = "bold"),
panel.spacing = unit(1.5, "lines"))

A few observations can be drawn from this confetti here:
- Tacos are the most beloved of these foods: far more participants gave tacos a 5 than they did pizza, and no one dares say they hate tacos (rating = 1).
- Sushi is the most polarizing among the other eats, with a lot of pizza lovers (rating = 5) giving it a 1!
- People aren’t as wild about steak as I thought. That’s kind of nice!
- People aren’t as wild about bagels as I thought. I’m kind of bothered?
- There are zero regional patterns that I can make out from this mess.
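Observations like these are easier to verify from a plain tally than from jittered points. A minimal sketch, assuming long-format data shaped like `other_eats`; the `toy_eats` values below are invented for illustration:

```r
library(dplyr)

# Hypothetical long-format ratings shaped like `other_eats`
toy_eats <- tibble(
  eats   = c("Tacos", "Tacos", "Tacos", "Sushi", "Sushi", "Sushi"),
  rating = c(5, 5, 4, 1, 5, NA)
)

# Count top and bottom ratings per food, ignoring skipped questions
rating_summary <- toy_eats %>%
  group_by(eats) %>%
  summarise(n_rated = sum(!is.na(rating)),
            n_fives = sum(rating == 5, na.rm = TRUE),
            n_ones  = sum(rating == 1, na.rm = TRUE))

rating_summary
```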
