Load packages:
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.0 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.1 ✔ tibble 3.2.0
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggrepel)
library(cowplot)
##
## Attaching package: 'cowplot'
##
## The following object is masked from 'package:lubridate':
##
## stamp
library(viridis)
## Loading required package: viridisLite
library(knitr)
Load the data_fastfood_sales.csv and data_fastfood_calories.csv files into R:
sales <- read.csv("data_fastfood_sales.csv")
calories <- read.csv("data_fastfood_calories.csv")
Add variable prop_franchised_stores to the sales
dataset:
sales$prop_franchised_stores <- sales$num_franchised_stores / sales$unit_count
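A quick sanity check on a derived proportion like this is that every value falls between 0 and 1. The sketch below uses a made-up `toy_sales` data frame (the restaurant names and counts are invented, not taken from the CSV) to show the same calculation and check.

```r
# Toy stand-in for the sales data; all values here are invented for illustration.
toy_sales <- data.frame(
  restaurant = c("A", "B"),
  num_franchised_stores = c(90, 0),
  unit_count = c(100, 50)
)

# Same calculation as above: franchised stores divided by total stores.
toy_sales$prop_franchised_stores <-
  toy_sales$num_franchised_stores / toy_sales$unit_count

# A proportion outside [0, 1] would point to a data problem.
in_range <- toy_sales$prop_franchised_stores >= 0 &
  toy_sales$prop_franchised_stores <= 1
stopifnot(all(in_range))
```

The same `stopifnot()` check could be run on the real `sales` data frame after the column is added.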
Create a scatter plot with column us_sales along the
x-axis and the column unit_count along the y-axis.
Each axis should be transformed to a log10 scale and
should be appropriately labeled.
Color each point by the proportion of franchised stores
(i.e. num_franchised_stores divided by
unit_count).
Label each point with the name of the fast food restaurant using the
ggrepel package.
Use the classic dark-on-light ggplot2 theme.
Rename the legend appropriately.
ggplot(data = sales, aes(x = us_sales, y = unit_count,
color = prop_franchised_stores,
label = restaurant)) +
geom_point() +
geom_text_repel(color = "black", nudge_y = 0.02) +
scale_x_continuous(trans = "log10") +
scale_y_continuous(trans = "log10") +
labs(y = "Total number of stores (log10 scale)",
x = "U.S. sales in millions (log10 scale)",
color = "Proportion of stores\nfranchised") +
theme_bw()
Create a bar plot with the average_sales on the x-axis
and restaurant on the y-axis (Hint: consider using the
coord_flip() function).
The order of restaurants on the y-axis should be in decreasing order of average sales with the restaurant with the largest average sales at the top and the restaurant with the smallest average sales at the bottom.
Add text to each bar on the plot with the average sales (in the thousands) for each restaurant.
Each axis should be appropriately labeled.
Along the x-axis, transform the text labels to include a dollar sign in front of each number.
Use the classic ggplot2 theme.
ggplot(data = sales, aes(x = reorder(restaurant, average_sales), y = average_sales)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  # label each bar with its average sales, rounded to a whole number of thousands
  geom_text(aes(label = paste0("$", round(average_sales))),
            hjust = -0.05, color = "black", size = 4) +
  labs(y = "Average sales per unit store (in thousands)",
       x = "Restaurant") +
  # set the breaks explicitly so the four hand-written labels always line up with them
  scale_y_continuous(limits = c(0, 3000),
                     breaks = c(0, 1000, 2000, 3000),
                     labels = c("$0", "$1,000", "$2,000", "$3,000")) +
  theme_classic()
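Hand-written axis labels only work when their count matches the number of axis breaks. One alternative, sketched here on an invented `toy` data frame, is to pass a labelling function such as `scales::dollar` so the dollar signs are generated from whatever breaks ggplot2 chooses. This is a possible variation, not the approach used in the plot above.

```r
library(ggplot2)

# Invented values standing in for average sales per store (in thousands).
toy <- data.frame(
  restaurant = c("A", "B", "C"),
  average_sales = c(2500, 1200, 800)
)

# scales::dollar() formats numbers with a leading "$" and a thousands separator.
dollar_labels <- scales::dollar(c(0, 1000, 2000, 3000))

# Passing the function itself lets ggplot2 label whichever breaks it picks.
p <- ggplot(toy, aes(x = reorder(restaurant, average_sales), y = average_sales)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(limits = c(0, 3000), labels = scales::dollar) +
  theme_classic()
```

The scales package is installed alongside ggplot2, so no extra installation should be needed.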
Create a scatter plot with the column calories along the
x-axis and the column sodium along the y-axis.
Each restaurant should have its own scatter plot (Hint: consider the facet functions).
Add a horizontal line at y=2300 in each scatter plot.
Each scatter plot should have appropriately labeled x- and y-axes.
For all food items with a sodium level greater than 2300 mg (the
maximum daily intake recommended by the Centers for Disease Control),
label each point with the name of the entree food item using the
ggrepel package.
Use the classic dark-on-light ggplot2 theme.
Rename the legend.
ggplot(data = calories, aes(x = calories, y = sodium)) +
geom_point() +
geom_hline(yintercept = 2300, color = "black") +
geom_text_repel(data = filter(calories, sodium > 2300),
aes(label = item),
nudge_y = 10,
max.overlaps = 100,
hjust = 0, vjust = 1,
size = 2,
direction = 'y') +
facet_wrap(~restaurant) +
labs(x = "Calories",
y = "Sodium (mg)") +
theme_bw()
Create a new column titled is_salad that contains a
TRUE or FALSE value indicating whether the name
of the entree food item contains the character string “salad”.
# grepl() already returns TRUE/FALSE; ignore.case matches both "Salad" and "salad"
calories$is_salad <- grepl("salad", calories$item, ignore.case = TRUE)
calories$restaurant <- factor(calories$restaurant)
calories$is_salad <- factor(calories$is_salad)
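`grepl()` is case-sensitive by default, so the capitalization of the pattern decides which items count as salads. A small sketch with made-up item names (not taken from the dataset) shows the difference:

```r
# Invented item names for illustration only.
items <- c("Crispy Chicken Salad", "side salad", "Cheeseburger")

# Case-sensitive match: only the capitalized "Salad" is found.
strict <- grepl("Salad", items)

# Case-insensitive match: both spellings are found.
relaxed <- grepl("salad", items, ignore.case = TRUE)

strict   # TRUE FALSE FALSE
relaxed  # TRUE  TRUE FALSE
```

Whether this matters for the real data depends on how the item names are capitalized in the CSV.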
For each restaurant, calculate the median calories across its items.
median_df <- calories%>%
group_by(restaurant)%>%
dplyr::summarize(
median = median(calories)
)
kable(median_df)
| restaurant | median |
|---|---|
| Arbys | 550 |
| Burger King | 555 |
| Chick Fil-A | 390 |
| Dairy Queen | 485 |
| Mcdonalds | 540 |
| Sonic | 570 |
| Subway | 460 |
| Taco Bell | 420 |
calories_median <- merge(x = calories, y = median_df, all.x = TRUE)
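The `merge()` call attaches each restaurant's median to every one of its item rows, so the result still has one row per item. The sketch below illustrates this on an invented `toy_items` data frame; naming the key with `by = "restaurant"` is an optional addition that makes the join column explicit.

```r
# Invented item-level data: two items for restaurant A, one for B.
toy_items <- data.frame(
  restaurant = c("A", "A", "B"),
  calories = c(100, 300, 500)
)

# Per-restaurant median, computed here with base R's aggregate().
toy_median <- aggregate(calories ~ restaurant, data = toy_items, FUN = median)
names(toy_median)[2] <- "median"

# Left join: keep every item row and add its restaurant's median.
toy_joined <- merge(x = toy_items, y = toy_median, by = "restaurant", all.x = TRUE)

nrow(toy_joined)  # 3, the same number of rows as toy_items
```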
Create boxplots with calories on the x-axis and
restaurant on the y-axis.
The order of restaurants on the y-axis should be in decreasing order of median calories, with the restaurant with the largest median calories at the top and the restaurant with the smallest median calories at the bottom.
Hide any outliers in the boxplots.
On top of the boxplots add a set of jittered points representing each food item.
Each point should be colored based on whether it is an item with the word “salad” in it or not.
Each axis should be appropriately labeled, the legend should be appropriately labeled, and the x-axis should be transformed to a log10 scale.
Use the classic dark-on-light ggplot2 theme.
ggplot(data = calories_median) +
geom_boxplot(mapping = aes(x = calories, y = reorder(restaurant, median)),
outlier.shape = NA) +
geom_point(aes(x = calories, y = reorder(restaurant, median), color = is_salad), position = "jitter") +
scale_color_discrete(labels = c("Not a salad", "Salad")) +
scale_x_continuous(trans = "log10") +
labs(x = "Calories (log10 scale)",
y = "Restaurant",
color = "Is the entree\na salad?") +
theme_bw()
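The axis order in these plots comes from `reorder()`, which sorts the factor levels by increasing value of the second argument. ggplot2 draws the first level at the bottom of a vertical axis, so the largest value ends up at the top. A minimal sketch with invented values:

```r
# Invented restaurant labels and medians for illustration.
restaurant <- c("A", "B", "C")
med <- c(550, 390, 460)

# Levels are sorted by increasing median: B (390), C (460), A (550).
ordered_levels <- levels(reorder(restaurant, med))
ordered_levels  # "B" "C" "A"
```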
Using the data_fastfood_calories.csv data, remove the rows for the Taco Bell restaurant:
calories_no_taco <- filter(calories, restaurant != "Taco Bell")
For each restaurant, calculate the median amount of sugar across its entree items.
median_df2 <- calories_no_taco%>%
group_by(restaurant)%>%
dplyr::summarize(
median = median(sugar)
)
kable(median_df2)
| restaurant | median |
|---|---|
| Arbys | 6.0 |
| Burger King | 7.5 |
| Chick Fil-A | 4.0 |
| Dairy Queen | 6.0 |
| Mcdonalds | 9.0 |
| Sonic | 7.0 |
| Subway | 8.0 |
calories_no_taco_median <- merge(x = calories_no_taco, y = median_df2, all.x = TRUE)
Combine this summarized dataset with the data_fastfood_sales.csv dataset. The combined dataset should only include restaurants that are included in both datasets.
# join the one-row-per-restaurant summary (not the item-level data), so that
# geom_bar(stat = "identity") does not stack us_sales once per menu item
calories_sales <- merge(x = median_df2, y = sales, by = "restaurant", all = FALSE)
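With `all = FALSE`, `merge()` performs an inner join: a restaurant is kept only if it appears in both inputs. The sketch below uses two invented data frames to show which rows survive.

```r
# Invented summaries: A is missing from the sales side, D from the sugar side.
toy_summary <- data.frame(
  restaurant = c("A", "B", "C"),
  median = c(6, 7.5, 4)
)
toy_sales <- data.frame(
  restaurant = c("B", "C", "D"),
  us_sales = c(10, 20, 30)
)

# Inner join on restaurant: only B and C appear in both data frames.
toy_both <- merge(x = toy_summary, y = toy_sales, by = "restaurant", all = FALSE)

toy_both$restaurant  # "B" "C"
```

Because each input has one row per restaurant, the joined result also has one row per restaurant, which is what a bar plot drawn with `stat = "identity"` expects.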
Using this new dataset, create a bar plot with
restaurant on the x-axis and us_sales on the
y-axis.
The order of restaurants on the x-axis should be in increasing order of US sales, with the restaurant with the largest US sales on the right and the restaurant with the smallest US sales on the left.
Color the bars by the median amount of sugar in the entree items from that restaurant.
Each axis should be appropriately labeled.
Use the classic ggplot2 theme.
ggplot(data = calories_sales, aes(x = reorder(restaurant, us_sales), y = us_sales, fill = median)) +
geom_bar(stat = "identity") +
scale_fill_viridis() +
theme_classic() +
labs(x = "Restaurant",
y = "U.S. sales (in millions)",
fill = "Median sugar (grams)\nin fast food entrees")