Hello, today we will be examining data on pizza preferences, both in the United States and around the world. The first step is to examine crust preference by crust type and by age group.
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
Pizzadata<-read.csv("C:/Users/bcole/Documents/Pizzadata.csv")
# Count responses for each combination of crust type and age group
preference <- Pizzadata %>%
  mutate(count = 1) %>%
  group_by(crust, Ages) %>%
  summarise(total = sum(count)) %>%
  arrange(desc(total))
# Grouped (dodged) bar chart: one bar per crust type within each age group
ggplot(preference, aes(x = Ages, y = total)) +
  geom_bar(aes(fill = crust), stat = "identity", position = "dodge") +
  ggtitle("Crust Preference and Age") +
  theme(plot.title = element_text(hjust = 0.5)) +
  xlab("Age Group") +
  ylab("Total")
This grouped bar chart shows crust preference for each of the age groups used in the Mechanical Turk survey. Of note is the large number of responses in the 25-34 age category. This age skew seems to be a problem with MTurk samples. Also of note are the preference for thin crust pizza among older respondents and a higher preference for deep dish and stuffed crust among younger respondents. Perhaps younger pizza consumers are seeking out different types of pizza.
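As a side note, the tallying step above can be written more compactly. The sketch below is only an illustration on a small made-up data frame (`toy` and its values are hypothetical, not taken from the survey); it suggests that `count()` from dplyr gives the same totals as adding a column of ones and summing it.

```r
library(dplyr)

# Hypothetical toy data standing in for Pizzadata (values are made up).
toy <- data.frame(
  crust = c("Thin", "Thin", "Deep dish", "Thin", "Stuffed"),
  Ages  = c("25-34", "25-34", "18-24", "55-64", "18-24")
)

# Approach used in this report: add a column of ones and sum it per group.
by_sum <- toy %>%
  mutate(count = 1) %>%
  group_by(crust, Ages) %>%
  summarise(total = sum(count), .groups = "drop") %>%
  arrange(desc(total))

# Shorthand: count() tallies the rows in each group directly.
by_count <- toy %>%
  count(crust, Ages, name = "total", sort = TRUE)

print(by_count)
```

One difference to keep in mind: `count()` returns an ungrouped result, so it is not a drop-in replacement where a later step relies on the data still being grouped.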
The next chart will be similar to the first, as this seems to be a good way to analyze the crosstabs of our data. We will be examining pizza enjoyment in the two largest countries in the sample, India and the US.
# Keep only the two largest countries, then count responses per enjoyment level
enjoyment <- Pizzadata %>%
  filter(Country %in% c("us", "india")) %>%
  mutate(count = 1) %>%
  group_by(pizzalike, Country) %>%
  summarise(total = sum(count)) %>%
  arrange(desc(total))
# Stacked bar chart of enjoyment levels per country.
# Brewer palette names are case-sensitive ("Spectral", not "spectral"), and a
# single fill scale sets both the palette and the legend title.
ggplot(enjoyment, aes(x = Country, y = total)) +
  geom_bar(aes(fill = pizzalike), stat = "identity") +
  ggtitle("Pizza Enjoyment by Country") +
  theme(plot.title = element_text(hjust = 0.5)) +
  xlab("Country") +
  ylab("Total") +
  scale_fill_brewer(name = "Pizza Enjoyment", palette = "Spectral")
This chart shows the differences in pizza enjoyment between India and the US. In general, it shows that there were many more responses from the United States, and that the vast majority of respondents in both countries seem to really enjoy eating pizza.
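Because the US sample is so much larger, raw totals make the two countries hard to compare directly. One possible way around this is to convert the totals to within-country shares. The sketch below uses made-up counts and hypothetical enjoyment labels (`toy_enjoyment` is not the real summary), so it only illustrates the idea.

```r
library(dplyr)

# Hypothetical counts and labels; the real values come from the survey summary.
toy_enjoyment <- data.frame(
  Country   = c("us", "us", "india", "india"),
  pizzalike = c("Love it", "It's okay", "Love it", "It's okay"),
  total     = c(80, 20, 15, 5)
)

# Share of each enjoyment level within its own country.
shares <- toy_enjoyment %>%
  group_by(Country) %>%
  mutate(share = total / sum(total)) %>%
  ungroup()

print(shares)
```

When plotting, passing `position = "fill"` to `geom_bar()` should give a similar normalization without precomputing the shares.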
In the next exercise, I will use pie charts to compare the amount of pizza eaten with the amount of exercise each survey respondent reported.
# Count responses per (pizza eaten, activity level) pair, then convert the
# counts to proportions. After summarise() the data is still grouped by
# eatnumber, so sum(total) is taken within each eatnumber value.
activity <- Pizzadata %>%
  mutate(count = 1) %>%
  group_by(eatnumber, activity) %>%
  summarise(total = sum(count)) %>%
  arrange(desc(total)) %>%
  mutate(var = total / sum(total))
# One pie chart per eatnumber value: stacked bars drawn in polar coordinates
ggplot(activity, aes(x = "", y = var, fill = activity)) +
  geom_bar(stat = "identity") +
  ggtitle("Pizza Eaten and Physical Activity") +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(), axis.title.y = element_blank()) +
  facet_grid(facets = ~eatnumber) +
  coord_polar(theta = "y") +
  ylab("Pizza Consumption") +
  guides(fill = guide_legend(title = "Activity"))
These pie charts examine the relationship between physical activity and pizza consumption. In general, those who eat less pizza tend to have a larger share of responses in the “active” category. Zero pizza consumption is an outlier, but there were only three responses for this value. In general, this method of faceting charts produces interesting views of the data, but it was difficult to execute.
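The proportions behind each pie depend on a detail of dplyr grouping: after `summarise()`, the last grouping variable is dropped, so a following `mutate()` works within the remaining group. The sketch below shows this on a small made-up data frame (`toy` and its values are hypothetical), with `.groups = "drop_last"` written out to make that behavior explicit.

```r
library(dplyr)

# Hypothetical toy data; values are made up for illustration.
toy <- data.frame(
  eatnumber = c(1, 1, 1, 2, 2),
  activity  = c("Active", "Active", "Sedentary", "Active", "Sedentary")
)

props <- toy %>%
  group_by(eatnumber, activity) %>%
  # drop_last keeps the grouping by eatnumber after summarising
  summarise(total = n(), .groups = "drop_last") %>%
  # sum(total) is therefore computed within each eatnumber value
  mutate(var = total / sum(total)) %>%
  ungroup()

print(props)
```

Each `eatnumber` value then has proportions that add up to 1, which is what lets every facet form a complete pie.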
The next chart will be a simple bar graph that examines topping preferences. The difficulty comes in sorting out the toppings, since each respondent could list three favorites in a single field. However, this is easily done using tools contained in the tidyr package.
# Split the semicolon-separated favorites so each topping gets its own row
newpizza <- separate_rows(Pizzadata, threefavs, sep = ";\\s*")
favorites <- newpizza %>%
  mutate(count = 1) %>%
  group_by(threefavs) %>%
  summarise(total = sum(count)) %>%
  arrange(desc(total))
ggplot(favorites, aes(x = threefavs, y = total, fill = threefavs)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90), axis.ticks.x = element_blank(), axis.title.x = element_blank(), legend.position = "none", plot.title = element_text(hjust = 0.5)) +
  ylab("Total Votes") +
  ggtitle("Favorite Pizza Toppings")
The top three pizza toppings are chicken, mushroom, and pepperoni. This makes sense because those are the two most common proteins and a popular vegetable topping. Sauces seemed to be less popular choices for favorite topping, which would make sense, because they are a niche item that some wouldn’t consider toppings.
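To make the splitting step more concrete, here is a small sketch of `separate_rows()` on made-up responses (`toy` and its contents are hypothetical). It also shows one possible way to order the toppings by vote count with `reorder()`, since ggplot otherwise places character values on the x axis alphabetically regardless of `arrange()`.

```r
library(dplyr)
library(tidyr)

# Hypothetical responses; each cell holds several toppings separated by ";".
toy <- data.frame(
  id        = 1:3,
  threefavs = c("Chicken; Mushroom; Pepperoni", "Chicken;Onion", "Mushroom; Chicken")
)

# One row per respondent-topping pair; the regex also consumes spaces after ";".
long <- separate_rows(toy, threefavs, sep = ";\\s*")

# Tally votes per topping, most popular first.
votes <- long %>%
  count(threefavs, name = "total", sort = TRUE)

# Order the factor levels by votes so bars would run from most to least popular.
votes$threefavs <- reorder(votes$threefavs, -votes$total)

print(votes)
```

Mapping the reordered factor to `x` in `aes()` should then place the most popular topping first on the axis.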
For the next graph, I will compare the least favorite pizza toppings between the two largest countries in my sample: the US and India. From this, I should be able to make a comparison between the two countries' food tastes. This exercise also required the use of tidyr, as well as ggplot faceting.
# Split the semicolon-separated least favorites so each topping gets its own row
newpizza <- separate_rows(Pizzadata, three.least.favs, sep = ";\\s*")
least <- newpizza %>%
  filter(Country %in% c("us", "india")) %>%
  mutate(count = 1) %>%
  group_by(three.least.favs, Country) %>%
  summarise(total = sum(count)) %>%
  arrange(desc(total))
# One stacked bar per country, faceted side by side
ggplot(least, aes(x = "", y = total, fill = three.least.favs)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90), axis.ticks.x = element_blank(), axis.title.x = element_blank(), plot.title = element_text(hjust = 0.5)) +
  ylab("Total Votes") +
  ggtitle("Least Favorite Pizza Toppings") +
  facet_grid(facets = ~Country)
From this chart, it is clear there were many more responses from Americans than from Indians. Also of note is the high proportion of Americans who dislike artichoke and black olives. It would have been interesting to see where anchovies fell on this list, because they are a stereotypically “bad” topping.
Overall, this dataset was interesting to work with, though using only categorical data definitely made it trickier to create graphs.