Box plot with jittered points using the “fastfood” dataset from the
openintro package
The dataset contains nutrition information for 515 fast food items,
broken down by restaurant
library(openintro)
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
library(ggplot2)
# Horizontal box plots of calories by restaurant, with each menu item jittered on top
ggplot(fastfood, aes(x = calories, y = restaurant, color = restaurant)) +
  geom_boxplot(alpha = 0) +                  # transparent fill keeps the points visible
  geom_jitter(alpha = 0.4, height = 0.3) +   # spread points vertically within each row
  guides(color = guide_legend(reverse = TRUE)) +
  labs(title = "Calories in Fast Food Restaurant Items",
       x = "Calories", y = "Restaurant", color = "Restaurant") +
  theme(plot.title = element_text(size = 22),
        axis.text.y = element_text(face = "bold", size = 18, angle = 10),
        axis.text.x = element_text(face = "bold", size = 18),
        axis.title.y = element_text(size = 20, face = "bold"),
        axis.title.x = element_text(size = 20, face = "bold"),
        legend.key.size = unit(1.1, "cm"),
        legend.text = element_text(size = 15),
        legend.title = element_text(size = 15),
        panel.border = element_rect(fill = "transparent",
                                    color = "black",
                                    linewidth = 2))

Violin plots with points on top
#install.packages("ggbeeswarm")
library(ggbeeswarm)  # loaded here, but the plot below uses ggplot2's geom_jitter()
# Violin outlines of calories by restaurant, with each menu item jittered on top
ggplot(fastfood, aes(x = calories, y = restaurant, color = restaurant)) +
  geom_violin() +
  geom_jitter(alpha = 0.3) +
  guides(color = guide_legend(reverse = TRUE)) +
  labs(title = "Calories in Fast Food Restaurant Items",
       x = "Calories", y = "Restaurant", color = "Restaurant") +
  theme(plot.title = element_text(size = 22),
        axis.text.y = element_text(face = "bold", size = 18, angle = 10),
        axis.text.x = element_text(face = "bold", size = 18),
        axis.title.y = element_text(size = 20, face = "bold"),
        axis.title.x = element_text(size = 20, face = "bold"),
        legend.key.size = unit(1.1, "cm"),
        legend.text = element_text(size = 15),
        legend.title = element_text(size = 15),
        panel.background = element_rect(fill = "seashell"),
        panel.grid = element_line(color = "gray90", linetype = 2, linewidth = 0.8),
        panel.border = element_rect(fill = "transparent",
                                    color = "black",
                                    linewidth = 2))
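
The ggbeeswarm package is loaded above, but the plotting call relies on geom_jitter(). As a minimal sketch, assuming the intent was to spread the points according to their local density, ggbeeswarm's geom_quasirandom() could stand in for the jitter layer. Here restaurant is mapped to x and the axes are flipped with coord_flip(); that is an assumption made so the sketch does not depend on how a given ggbeeswarm version handles horizontal layouts. The object name p_violin is illustrative, and the theme settings are omitted for brevity.

```r
library(openintro)   # provides the fastfood data
library(ggplot2)
library(ggbeeswarm)  # provides geom_quasirandom()

# Violin outlines with density-aware point placement instead of random jitter.
# restaurant is on x so that points are spread within each restaurant's group;
# coord_flip() then draws the groups as horizontal rows, as in the plot above.
p_violin <- ggplot(fastfood, aes(x = restaurant, y = calories, color = restaurant)) +
  geom_violin() +
  geom_quasirandom(alpha = 0.3) +
  coord_flip() +
  guides(color = guide_legend(reverse = TRUE)) +
  labs(title = "Calories in Fast Food Restaurant Items",
       x = "Restaurant", y = "Calories", color = "Restaurant")

p_violin
```

Because the quasirandom layout follows the density of the data, the points tend to stay inside the violin outline rather than scattering across the full row height.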

Analysis
The original side-by-side boxplots clearly show the median number of
calories in each restaurant’s menu items. Sonic, Arby’s and Burger King
appear to have the three highest medians. Of these three, Arby’s appears
to have the least variation in calories across its food options. One
advantage of the boxplots is that they readily convey the variation in
calories within each restaurant’s menu. Overall, McDonald’s appears to
vary noticeably more than the other restaurants: it has the most
outliers and a considerable spread in the middle 50% of its
distribution. McDonald’s also has the highest-calorie item in the
dataset, at almost 2,500 calories, yet its median is lower than those of
the three restaurants mentioned above. Subway’s middle 50% shows a
noticeably wider spread than any other restaurant’s, suggesting that the
center of Subway’s calorie distribution varies more than the central
portion of the other restaurants’ distributions.
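
The medians and middle-50% spreads read off the boxplots can be checked numerically. The following is a rough sketch in base R, assuming only the calories and restaurant columns used in the plots; the names by_restaurant and cal_summary are illustrative.

```r
library(openintro)  # provides the fastfood data

# Split the calorie values into one vector per restaurant.
by_restaurant <- split(fastfood$calories, fastfood$restaurant)

# For each restaurant, compute the median, the interquartile range
# (the width of the box in the boxplot), and the maximum.
cal_summary <- do.call(rbind, lapply(by_restaurant, function(x) {
  data.frame(
    median = median(x, na.rm = TRUE),
    iqr    = IQR(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE)
  )
}))

# Show the restaurants with the highest median first.
cal_summary[order(cal_summary$median, decreasing = TRUE), ]
```

Sorting by the iqr column instead would rank the restaurants by the spread of their middle 50%.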
Adding the points in the second plot supplies information that was
previously missing, such as the fact that Chick-Fil-A has fewer data
points than the other restaurants. Because some boxplots have more
overlaid points than others, the restaurants are clearly not evenly
represented in the data, which is a reason for caution when comparing
their distributions. While the points are not a precise measure of how
reliable each summary is, the viewer can still gain some insight. For
example, Taco Bell has a fairly dense distribution of calories and more
data points than some other restaurants, so we may have a better sense
of its distribution than of the distribution for a restaurant like
Chick-Fil-A.
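
The uneven representation seen in the overlaid points can also be counted directly. A small sketch, assuming the same restaurant column as in the plots; item_counts is an illustrative name.

```r
library(openintro)  # provides the fastfood data

# Number of menu items recorded for each restaurant, smallest first.
item_counts <- sort(table(fastfood$restaurant))
item_counts
```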
While the violin plots without the boxplots no longer show the medians,
they do show the outline of the density of the points in the dataset.
The width of each violin shows where calorie values are more or less
densely populated. For example, Dairy Queen’s distribution appears
roughly symmetric about the median, with fewer data points branching out
above and below the center value. For Burger King, we get a clearer
picture that the data are skewed slightly right, and that the mean would
therefore likely be higher than the median: the violin is wider at lower
calorie values and then thins out to the right, just like McDonald’s.
Overall, skewness is easier to judge from the violin plot than from the
box plot.
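
The claim that a right-skewed restaurant should have a mean above its median can be checked with a quick comparison. This is a rough sketch rather than a formal skewness measure: a positive difference is only suggestive of right skew. The names cal_mean, cal_median and skew_check are illustrative.

```r
library(openintro)  # provides the fastfood data

# Mean and median calories per restaurant.
cal_mean   <- tapply(fastfood$calories, fastfood$restaurant, mean, na.rm = TRUE)
cal_median <- tapply(fastfood$calories, fastfood$restaurant, median, na.rm = TRUE)

skew_check <- data.frame(
  mean      = as.vector(cal_mean),
  median    = as.vector(cal_median),
  row.names = names(cal_mean)
)

# A positive value means the mean sits above the median,
# which is what a longer right tail would tend to produce.
skew_check$mean_minus_median <- skew_check$mean - skew_check$median

skew_check[order(skew_check$mean_minus_median, decreasing = TRUE), ]
```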