I will say from the outset that I got less out of this week than I did last week. I was able to do a few new things, which was helpful, but I got a little frustrated and ultimately stymied by a few things I wanted to do but wasn’t able to.
One thing I DID learn was the value of force-formatting responses. I think drop-down menus, instead of letting people type their own answers, are the way to go for this kind of survey.
But to start off, the usual….
library(tidyr)
library(dplyr)
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.0.0 ✔ purrr 0.2.5
## ✔ tibble 1.4.2 ✔ stringr 1.3.1
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(ggplot2)
library(ggrepel)
RawData <- read_csv("208_rawdata_pizzapreferences.csv")
## Warning: Missing column names filled in: 'X18' [18]
## Parsed with column specification:
## cols(
## Timestamp = col_character(),
## `What is your age range?` = col_character(),
## `Which gender do you identify as?` = col_character(),
## `Which State do you Live in?` = col_character(),
## `Which County do you Live in?` = col_character(),
## `Which City/Town do you live in?` = col_character(),
## `Do you have children?` = col_character(),
## `How physically active do you consider your daily life?` = col_character(),
## `How much do you enjoy pizza? Scale 1-5` = col_character(),
## `Choose your three favorite pizza toppings:` = col_character(),
## `Choose your three least favorite pizza toppings:` = col_character(),
## `Favorite Crust Type` = col_character(),
## `How many times a month do you eat pizza?` = col_character(),
## `Do you eat frozen pizza or delivery pizza more often?` = col_character(),
## `What other types of cuisine do you enjoy?` = col_character(),
## `Are you gluten intolerant?` = col_character(),
## `Which state/city has the best pizza?` = col_character(),
## X18 = col_character()
## )
Obviously we need a pie chart. I decided to make mine by preferred crust type. When I printed out the table, I learned that, apparently, “French Bread” is a kind of crust that MULTIPLE people stated as a preference. This is unacceptable, and had to be excised.
crusts <- RawData %>%
mutate(count = 1) %>%
group_by(`Favorite Crust Type`) %>%
filter(`Favorite Crust Type`!="French Bread") %>%
summarize(total = sum(count)) %>%
arrange(desc(total))
crusts
## # A tibble: 8 x 2
## `Favorite Crust Type` total
## <chr> <dbl>
## 1 Regular crust 142
## 2 Thin crust 127
## 3 Deep dish 82
## 4 Stuffed crust 82
## 5 Thick crust 48
## 6 Flatbread 22
## 7 No preference 16
## 8 Sicilian crust 14
Note the order of the table, as it becomes a point of confusion when I construct the plot, below.
library(RColorBrewer)
Issues here: I could not for the life of me figure out good label positions. A few tutorials I found involved adding an extra column and using cumsum() to compute each label’s position, but it never seemed to work out properly. Time is (always) of the essence, so I decided to cut my losses and move on from that particular project. I think the problem is that the geom is implicitly ordering the slices alphabetically, instead of by value. I tried changing stat = from “identity” to “count”, but then it throws an error, since stat = “count” can’t be used together with a y aesthetic. I likewise tried using geom_bar(), but to no avail. One more odd thing: I tried using the exact same aesthetics as in my circular toppings bar chart, below, where I could sort of fudge the labels, and somehow that wouldn’t translate to this chart. Oh well.
ggplot(crusts, aes(x = 1, y = total, fill = `Favorite Crust Type`)) +
geom_bar(width = 1, stat = "identity", color = "tan3", size = 1.5) +
coord_polar("y", start = 0) +
scale_fill_brewer(palette = "YlOrRd") +
theme_void() +
theme(axis.title = element_blank(),
axis.text.y = element_blank(),
panel.grid.major.y = element_blank(),
axis.ticks.y = element_blank(),
axis.text.x = element_blank()) +
ggtitle("Pizza by the Crust")
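On the ordering and labeling trouble: ggplot2 draws a character column in the alphabetical order of its factor levels, no matter how the rows were arranged, and `position_stack(vjust = 0.5)` can put each label in the middle of its own slice without a hand-built `cumsum()` column. Below is a minimal sketch of both ideas, not run against the original CSV: it rebuilds `crusts` from the counts printed above, uses a shorter hypothetical `crust` column in place of `Favorite Crust Type`, and swaps `geom_bar(stat = "identity")` for the equivalent `geom_col()`.

```r
library(dplyr)
library(ggplot2)

# Counts copied from the printed `crusts` table above
crusts <- tibble(
  crust = c("Regular crust", "Thin crust", "Deep dish", "Stuffed crust",
            "Thick crust", "Flatbread", "No preference", "Sicilian crust"),
  total = c(142, 127, 82, 82, 48, 22, 16, 14)
)

# Lock the factor levels to the table's descending row order, so ggplot2
# stops falling back to alphabetical order for the slices and the legend
crusts <- crusts %>%
  arrange(desc(total)) %>%
  mutate(crust = factor(crust, levels = crust))

crust_pie <- ggplot(crusts, aes(x = 1, y = total, fill = crust)) +
  geom_col(width = 1, color = "tan3") +
  # vjust = 0.5 stacks the labels like the slices and centres each one
  # in its own slice, so no cumsum() label column is needed
  geom_text(aes(label = total), position = position_stack(vjust = 0.5)) +
  coord_polar("y", start = 0) +
  scale_fill_brewer(palette = "YlOrRd") +
  theme_void() +
  labs(fill = "Favorite Crust Type", title = "Pizza by the Crust")
crust_pie
```

Because the text layer inherits the same `fill` grouping as the slices, the labels are stacked in the same order as the wedges, so they stay matched even after the levels are reordered.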
I then wanted to visualize the ‘which city has the best pizza’ question.
locales <- RawData %>%
mutate(count = 1) %>%
group_by(`Which state/city has the best pizza?`) %>%
summarize(total = sum(count))
locales
## # A tibble: 5 x 2
## `Which state/city has the best pizza?` total
## <chr> <dbl>
## 1 California 58
## 2 Chicago 129
## 3 New Jersey 22
## 4 New York 293
## 5 Texas 49
At this point I learned that this is not actually a binary choice. People out there in the world seem to think “California” and “Texas” are legitimate answers. They’re not. So we had to reduce it to the places that matter.
locales_2 <- RawData %>%
mutate(count = 1) %>%
group_by(`Which state/city has the best pizza?`) %>%
filter(`Which state/city has the best pizza?` %in% c("New York", "Chicago")) %>%
summarize(total = sum(count))
locales_2
## # A tibble: 2 x 2
## `Which state/city has the best pizza?` total
## <chr> <dbl>
## 1 Chicago 129
## 2 New York 293
Now let’s show Chicago partisans how lonely they are. Similar to what I did last week, I have to fake a plot so I can show what I really want: nobody likes Chicago-style pizza. I also wanted to make my graphics look kind of like pizza.
locales_3 <- locales_2 %>%
# hand-placed coordinates for the two fake "pizzas": Chicago left, New York right
mutate(x_position = c(2.5, 3.5),
y_position = c(2.5, 2.5),
city = `Which state/city has the best pizza?`)
ggplot(locales_3, aes(x = x_position, y = y_position, size = total)) +
geom_point(shape = 25, stroke = 4, fill = "gold2", color = "burlywood3", show.legend = FALSE) +
xlim(1.5, 4.5) +
ylim(0, 5) +
scale_size_continuous(range = c(30, 68.14)) +
theme_void() +
xlab("") +
ylab("") +
geom_label(aes(label = city), fill = "tomato2", size = 8, color = "black", label.size = 0, show.legend =FALSE) +
geom_label(aes(label = total), fill = "tomato2", size = 5, color = "black", nudge_y = -.5, label.size = 0, show.legend = FALSE) +
ggtitle("Whose Pizza Reigns Supreme?") +
labs(caption = "Sorry, Chicago. That's pie you've got, not pizza.")
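A note on the triangle sizes: `scale_size_continuous(range = c(30, 68.14))` maps the smallest count to size 30 and the largest to 68.14, so the two triangles’ widths end up in roughly the 293:129 vote ratio, which makes their areas differ by roughly five times. If the goal is areas proportional to votes, `scale_size_area()` maps zero votes to size zero instead. A minimal sketch, rebuilding `locales_3` from the printed counts (the data-frame name `votes` is hypothetical) and keeping 68.14 as the largest size:

```r
library(ggplot2)

# Vote counts from the printed locales_2 table above
votes <- data.frame(
  city = c("Chicago", "New York"),
  total = c(129, 293),
  x_position = c(2.5, 3.5),
  y_position = c(2.5, 2.5)
)

slices <- ggplot(votes, aes(x = x_position, y = y_position, size = total)) +
  geom_point(shape = 25, stroke = 4, fill = "gold2", color = "burlywood3",
             show.legend = FALSE) +
  # scale_size_area() maps 0 to size 0 and the largest count to max_size,
  # so each marker's area, not its width, scales with the vote count
  scale_size_area(max_size = 68.14) +
  xlim(1.5, 4.5) +
  ylim(0, 5) +
  theme_void()
slices
```

With this scale Chicago’s triangle comes out at about two thirds of New York’s width, which is still plenty lonely.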
I wanted to do SOMETHING geographic, so I used Melissa’s table that she kindly organized by locality.
country_data <- read.csv("pizzacountry.csv")
head(country_data)
## X Timestamp What.is.your.age.range.
## 1 1 2018/11/01 8:21:51 PM MDT 25-34
## 2 2 2018/11/01 8:35:42 PM MDT 18-24
## 3 3 2018/11/01 8:27:18 PM MDT 65+
## 4 4 2018/11/01 8:31:25 PM MDT 25-34
## 5 5 2018/11/01 8:26:17 PM MDT 25-34
## 6 6 2018/11/01 8:26:49 PM MDT 25-34
## Which.gender.do.you.identify.as. Which.State.do.you.Live.in.
## 1 Female Georgia
## 2 Male Virgina
## 3 Female Pennsylvania
## 4 Male Pennsylvania
## 5 Male USA
## 6 Male u.s
## Which.County.do.you.Live.in. Which.City.Town.do.you.live.in.
## 1 USA Acworth
## 2 Albemarle Charlottesville
## 3 Allegheny Pittsburgh
## 4 Allegheny Pittsburgh
## 5 AMERICA TX
## 6 america california
## Do.you.have.children.
## 1 Yes
## 2 No
## 3 Yes
## 4 No
## 5 No
## 6 Yes
## How.physically.active.do.you.consider.your.daily.life.
## 1 Sedentary
## 2 Active
## 3 Very Active
## 4 Active
## 5 Moderately active
## 6 Very Active
## How.much.do.you.enjoy.pizza..Scale.1.5
## 1 4 - I really enjoy eating pizza.
## 2 5 - I LOVE pizza, could eat pizza for every meal!
## 3 5 - I LOVE pizza, could eat pizza for every meal!
## 4 4 - I really enjoy eating pizza.
## 5 3 - I neither like nor dislike pizza
## 6 4 - I really enjoy eating pizza.
## Choose.your.three.favorite.pizza.toppings.
## 1 Pepperoni;Sausage
## 2 Garlic;Spinach;Bell Pepper;Onion;Chicken;Pepperoni;Alfredo Sauce;Pesto Sauce;Balsamic Glaze;BBQ Sauce
## 3 Black Olives;Mushroom;Sausage
## 4 Mushroom;Pepperoni;Sausage
## 5 Mushroom;Spinach;Pepperoni;Sausage
## 6 Chicken
## Choose.your.three.least.favorite.pizza.toppings.
## 1 Black Olives;Mushroom;Spinach;Bell Pepper;Artichoke;Pesto Sauce;Balsamic Glaze
## 2 Black Olives;Mushroom
## 3 Ham;Pepperoni;BBQ Sauce
## 4 Bell Pepper;Pesto Sauce;Balsamic Glaze
## 5 Black Olives;Chicken;Alfredo Sauce
## 6 Mushroom
## Favorite.Crust.Type How.many.times.a.month.do.you.eat.pizza.
## 1 Regular crust 4+ times month
## 2 Stuffed crust 2+ times a month
## 3 Thin crust 2+ times a month
## 4 Deep dish Less than once a month
## 5 Deep dish 4+ times month
## 6 Thick crust 4+ times month
## Do.you.eat.frozen.pizza.or.delivery.pizza.more.often.
## 1 Delivery!
## 2 Delivery!
## 3 Frozen!
## 4 Frozen!
## 5 Delivery!
## 6 Delivery!
## What.other.types.of.cuisine.do.you.enjoy. Are.you.gluten.intolerant.
## 1 Italian;Mexican No
## 2 Italian;Mexican No
## 3 Chinese;Greek;Italian;Mexican;Thai No
## 4 Chinese;Greek;Italian;Mexican;Sushi;Thai No
## 5 None of the above Yes
## 6 Thai Yes
## Which.state.city.has.the.best.pizza. sides country
## 1 New York Breadsticks us
## 2 Chicago Wings! us
## 3 New York Breadsticks us
## 4 Chicago Breadsticks us
## 5 New York Wings! us
## 6 California Wings! us
I decided to look at California’s preferences. Note: we apparently prefer black olives over pepperoni, which bucks the global trend. Bunch of hippies.
California_toppings <- country_data %>%
filter(Which.State.do.you.Live.in. == "California") %>%
mutate(toppings = strsplit(as.character(Choose.your.three.favorite.pizza.toppings.), ";")) %>%
unnest(toppings) %>%
mutate(count = 1) %>%
group_by(toppings) %>%
summarize(total = sum(count)) %>%
arrange(desc(total))
ggplot(California_toppings, aes(x = toppings, y = total)) +
geom_bar(stat = "identity", show.legend = FALSE, fill = "gold2", color = "tomato2", size = 3) +
theme_minimal() +
theme(panel.grid = element_blank(), axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
xlab("") +
ylab("") +
ggtitle("What do Californians Want?")
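The exact match `== "California"` only catches respondents who typed the name exactly that way, and the `head()` output above shows how free-form these answers are (“USA”, “u.s”, “Virgina”). One way to catch simple variations is to trim whitespace and lowercase before comparing. A minimal sketch on a hypothetical vector of entries; abbreviations like “CA” would still have to be mapped by hand.

```r
library(dplyr)

# Hypothetical State entries, in the messy style of the survey's free-text answers
responses <- tibble(state = c("California", "california ", "CALIFORNIA", "Texas", "u.s"))

californians <- responses %>%
  # normalise case and stray whitespace before the comparison
  filter(tolower(trimws(state)) == "california")

nrow(californians)
```

In the real pipeline the same condition would wrap `Which.State.do.you.Live.in.` inside the existing `filter()`.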
Below is something I WANTED to do but didn’t get to. I wanted to look for a correlation between lifestyle (sedentary, active, etc.), love of pizza (lots, not that much), and how often people eat it. But I short-circuited the project before I really got going, because converting the text responses into numeric values bogged me down. Here’s the table, but no corresponding visualization.
desire_frequency <- RawData %>%
select(`How physically active do you consider your daily life?`, `How much do you enjoy pizza? Scale 1-5`, `How many times a month do you eat pizza?`) %>%
group_by(`How physically active do you consider your daily life?`)
head(desire_frequency)
## # A tibble: 6 x 3
## # Groups: How physically active do you consider your daily life? [3]
## `How physically active d… `How much do you enjoy … `How many times a mo…
## <chr> <chr> <chr>
## 1 Sedentary 4 - I really enjoy eati… 4+ times month
## 2 Active 5 - I LOVE pizza, could… 2+ times a month
## 3 Moderately active 5 - I LOVE pizza, could… 4+ times month
## 4 Sedentary 5 - I LOVE pizza, could… 4+ times month
## 5 Active 4 - I really enjoy eati… 4+ times month
## 6 Active 5 - I LOVE pizza, could… 2+ times a month
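For turning those text answers into numbers, one approach is to treat each question as an ordered factor and take its integer codes, and to read the enjoyment score off the leading digit of answers like “4 - I really enjoy eating pizza.”. A minimal sketch on the six rows printed above: the short column names are hypothetical, and the orderings are assumptions that list only the answers visible in this post, so any other answer in the full data would come out as NA until it is added. Spearman correlation is used because the scores are ranks, not measurements.

```r
library(dplyr)

# The six rows from head(desire_frequency), with full answer text from head(country_data)
survey <- tibble(
  activity  = c("Sedentary", "Active", "Moderately active",
                "Sedentary", "Active", "Active"),
  enjoy     = c("4 - I really enjoy eating pizza.",
                "5 - I LOVE pizza, could eat pizza for every meal!",
                "5 - I LOVE pizza, could eat pizza for every meal!",
                "5 - I LOVE pizza, could eat pizza for every meal!",
                "4 - I really enjoy eating pizza.",
                "5 - I LOVE pizza, could eat pizza for every meal!"),
  frequency = c("4+ times month", "2+ times a month", "4+ times month",
                "4+ times month", "4+ times month", "2+ times a month")
)

# ASSUMED orderings, lowest to highest; only answers seen in this post are listed
activity_levels  <- c("Sedentary", "Moderately active", "Active", "Very Active")
frequency_levels <- c("Less than once a month", "2+ times a month", "4+ times month")

scored <- survey %>%
  mutate(
    activity_score  = as.integer(factor(activity, levels = activity_levels)),
    # each enjoyment answer starts with its 1-5 score
    enjoy_score     = as.integer(substr(enjoy, 1, 1)),
    frequency_score = as.integer(factor(frequency, levels = frequency_levels))
  )

# Rank-based correlation matrix of the three scores
score_cor <- cor(select(scored, ends_with("_score")), method = "spearman")
score_cor
```

Six rows are far too few to say anything about pizza and exercise; the point is only the conversion step, which would then feed a correlogram on the full data.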
Now I get into the question of toppings. The new things I learned here are strsplit() and unnest(): a great way to split apart the multiple toppings each respondent packed into a single answer.
favorite_toppings <- RawData %>%
mutate(toppings = strsplit(as.character(`Choose your three favorite pizza toppings:`), ";")) %>%
unnest(toppings)
toppings_count <- favorite_toppings %>%
mutate(count = 1) %>%
group_by(toppings) %>%
summarize(total = sum(count)) %>%
arrange(desc(total))
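tidyr also offers `separate_rows()`, which does the split-and-unnest in one step, and dplyr’s `count()` can replace the `mutate(count = 1)` / `summarize(total = sum(count))` pattern. A minimal sketch on two answers copied from the `head()` output above, with the column name shortened to a hypothetical `toppings`:

```r
library(dplyr)
library(tidyr)

# Two favorite-topping answers copied from head(country_data) above
answers <- tibble(toppings = c("Pepperoni;Sausage", "Mushroom;Pepperoni;Sausage"))

toppings_count <- answers %>%
  # one row per topping instead of one row per respondent
  separate_rows(toppings, sep = ";") %>%
  # count() groups and tallies into a column named n; sort = TRUE orders by it
  count(toppings, sort = TRUE) %>%
  rename(total = n)

toppings_count
```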
I wanted to make a circular bar chart last week, but the values I was trying to show were too all over the place. This week’s data was more amenable, though.
I found the labeling difficult. The R Graph Gallery has some examples, but they involved something a bit complicated that I didn’t quite follow. I found this to be a somewhat elegant solution. It doesn’t look bad, but I couldn’t figure out how to stop the rightmost labels from getting cut off.
ggplot(toppings_count, aes(x = as.factor(toppings), y = total)) +
geom_bar(stat = "identity", show.legend = FALSE, fill = alpha("coral2", .5)) +
ylim(-20, 270) +
coord_polar(start = 0) +
theme_minimal() +
theme(axis.title = element_blank(),
axis.text.y = element_blank(),
panel.grid.major.y = element_blank(),
axis.text.x = element_text(size = 8))
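The R Graph Gallery approach mentioned above roughly boils down to computing an angle for each bar and drawing the labels with `geom_text()` rather than relying on the axis text. A rough sketch of that idea, using the crust counts printed earlier as stand-in data; the `bar_labels` table, its `id`, `angle`, and `hjust` columns, and the `ylim()` padding are my own choices. I haven’t checked that it cures the cut-off labels in every case, but it does give direct control over where and at what angle each label sits.

```r
library(dplyr)
library(ggplot2)

# Stand-in data: the crust counts printed earlier in this post
bar_labels <- tibble(
  name  = c("Regular crust", "Thin crust", "Deep dish", "Stuffed crust",
            "Thick crust", "Flatbread", "No preference", "Sicilian crust"),
  total = c(142, 127, 82, 82, 48, 22, 16, 14)
) %>%
  mutate(
    id = row_number(),
    # angle (degrees) through the middle of each bar, so text runs along it
    angle = 90 - 360 * (id - 0.5) / n(),
    # left-half labels would read upside down: right-align them, then flip 180
    hjust = ifelse(angle < -90, 1, 0),
    angle = ifelse(angle < -90, angle + 180, angle)
  )

circular <- ggplot(bar_labels, aes(x = factor(id), y = total)) +
  geom_col(fill = alpha("coral2", .5)) +
  # start each label just past the end of its bar
  geom_text(aes(y = total + 5, label = name, angle = angle, hjust = hjust),
            size = 3) +
  # the negative lower limit leaves a hole in the middle; the upper limit
  # leaves room for the labels
  ylim(-50, max(bar_labels$total) + 80) +
  coord_polar(start = 0) +
  theme_void()
circular
```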
I needed another visualization, so I used the same data as the previous graphic and made it more pizza-y. I’m a hack, I know.
ggplot(toppings_count, aes(x = toppings, y = total)) +
geom_segment(aes(x = toppings, xend = toppings, y = 0, yend = total), color = "tomato2", show.legend = FALSE) +
geom_point(size = 10, shape = 25, stroke = 2, fill = "gold2", color = "burlywood3", show.legend = FALSE) +
geom_point(size = 2, color = "tomato2", show.legend = FALSE) +
theme(panel.grid.major.x = element_blank(),
panel.border = element_blank(),
axis.ticks.x = element_blank(),
axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
axis.ticks.y = element_blank(),
panel.background = element_blank()) +
xlab("") +
ylab("") +
ggtitle("Look! I made little pizzas!") +
labs(caption = "Apparently it's all about mushrooms and pepperoni")
My wishlist of things I’d LIKE to have done this week, but didn’t get around to or figure out: translate the lifestyle habits into a correlogram, to see if there’s a relationship between people’s activity levels and love of pizza and how often they eat it; figure out the labeling on pie charts; and, finally, customize the order things are displayed in bar charts. They seem to display in alphabetical order automatically, despite the fact that the table is arranged by value. How? HOW?
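On that last question: `arrange()` only changes the row order of the table, while ggplot2 places a character x variable in the alphabetical order of its factor levels. Reordering those levels by value, for example with base R’s `reorder()`, changes the bar order. A minimal sketch using the `locales` counts printed earlier, rebuilt by hand with a shortened hypothetical `city` column:

```r
library(ggplot2)

# Totals from the printed `locales` table above
locales <- data.frame(
  city  = c("California", "Chicago", "New Jersey", "New York", "Texas"),
  total = c(58, 129, 22, 293, 49)
)

# reorder() sorts the factor levels by the second argument; negating
# total puts the biggest bar first
bars <- ggplot(locales, aes(x = reorder(city, -total), y = total)) +
  geom_col(fill = "gold2", color = "tomato2") +
  labs(x = NULL, y = NULL)
bars
```

`forcats::fct_reorder()` does the same job inside a tidyverse pipeline.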