Fast Food Restaurant Information
### Set working directory
setwd("C:/Users/sassle03/Desktop/Coursera/Tidyverse/Visualizing Data in the Tidyverse")
### Import Calories Dataset
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
cal <- read_csv("data_fastfood_calories.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## restaurant = col_character(),
## item = col_character(),
## calories = col_double(),
## cal_fat = col_double(),
## total_fat = col_double(),
## sat_fat = col_double(),
## trans_fat = col_double(),
## cholesterol = col_double(),
## sodium = col_double(),
## total_carb = col_double(),
## fiber = col_double(),
## sugar = col_double(),
## protein = col_double(),
## vit_a = col_double(),
## vit_c = col_double(),
## calcium = col_double()
## )
### Import Sales Dataset
sales <- read_csv("data_fastfood_sales.csv")
##
## -- Column specification --------------------------------------------------------
## cols(
## restaurant = col_character(),
## average_sales = col_double(),
## us_sales = col_double(),
## num_company_stores = col_double(),
## num_franchised_stores = col_double(),
## unit_count = col_double()
## )
###install.packages("skimr")
library(skimr)
skim(cal)
| Name | cal |
| Number of rows | 515 |
| Number of columns | 16 |
| _______________________ | |
| Column type frequency: | |
| character | 2 |
| numeric | 14 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| restaurant | 0 | 1 | 5 | 11 | 0 | 8 | 0 |
| item | 0 | 1 | 5 | 63 | 0 | 505 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| calories | 0 | 1.00 | 530.91 | 282.44 | 20 | 330.0 | 490.0 | 690 | 2430 | ▇▆▁▁▁ |
| cal_fat | 0 | 1.00 | 238.81 | 166.41 | 0 | 120.0 | 210.0 | 310 | 1270 | ▇▃▁▁▁ |
| total_fat | 0 | 1.00 | 26.59 | 18.41 | 0 | 14.0 | 23.0 | 35 | 141 | ▇▃▁▁▁ |
| sat_fat | 0 | 1.00 | 8.15 | 6.42 | 0 | 4.0 | 7.0 | 11 | 47 | ▇▃▁▁▁ |
| trans_fat | 0 | 1.00 | 0.47 | 0.84 | 0 | 0.0 | 0.0 | 1 | 8 | ▇▁▁▁▁ |
| cholesterol | 0 | 1.00 | 72.46 | 63.16 | 0 | 35.0 | 60.0 | 95 | 805 | ▇▁▁▁▁ |
| sodium | 0 | 1.00 | 1246.74 | 689.95 | 15 | 800.0 | 1110.0 | 1550 | 6080 | ▇▆▁▁▁ |
| total_carb | 0 | 1.00 | 45.66 | 24.88 | 0 | 28.5 | 44.0 | 57 | 156 | ▅▇▂▁▁ |
| fiber | 12 | 0.98 | 4.14 | 3.04 | 0 | 2.0 | 3.0 | 5 | 17 | ▇▅▂▁▁ |
| sugar | 0 | 1.00 | 7.26 | 6.76 | 0 | 3.0 | 6.0 | 9 | 87 | ▇▁▁▁▁ |
| protein | 1 | 1.00 | 27.89 | 17.68 | 1 | 16.0 | 24.5 | 36 | 186 | ▇▂▁▁▁ |
| vit_a | 214 | 0.58 | 18.86 | 31.38 | 0 | 4.0 | 10.0 | 20 | 180 | ▇▁▁▁▁ |
| vit_c | 210 | 0.59 | 20.17 | 30.59 | 0 | 4.0 | 10.0 | 30 | 400 | ▇▁▁▁▁ |
| calcium | 210 | 0.59 | 24.85 | 25.52 | 0 | 8.0 | 20.0 | 30 | 290 | ▇▁▁▁▁ |
skim(sales)
| Name | sales |
| Number of rows | 19 |
| Number of columns | 6 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 5 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| restaurant | 0 | 1 | 3 | 15 | 0 | 19 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| average_sales | 0 | 1 | 1189.88 | 541.52 | 360.72 | 857.50 | 1130.00 | 1470.10 | 2670.32 | ▆▇▆▁▁ |
| us_sales | 0 | 1 | 7592.69 | 8007.32 | 606.00 | 3499.88 | 4476.41 | 9539.12 | 37480.67 | ▇▃▁▁▁ |
| num_company_stores | 0 | 1 | 839.00 | 1875.80 | 0.00 | 53.50 | 276.00 | 677.50 | 8222.00 | ▇▁▁▁▁ |
| num_franchised_stores | 0 | 1 | 5998.53 | 5894.51 | 0.00 | 2583.00 | 4055.00 | 6497.50 | 25908.00 | ▇▅▂▁▁ |
| unit_count | 0 | 1 | 6838.05 | 5997.13 | 2231.00 | 3034.50 | 4332.00 | 7394.00 | 25908.00 | ▇▁▂▁▁ |
The codebook for the sales data includes 6 variables:
| Field name | Description | Data type |
|---|---|---|
| restaurant | Name of the restaurant | character |
| average_sales | Average US sales per unit (store) in thousands in 2018 | numeric |
| us_sales | U.S. sales in millions in 2018 | numeric |
| num_company_stores | Number of company / corporate-owned stores in 2018 | numeric |
| num_franchised_stores | Number of franchised stores in 2018 | numeric |
| unit_count | Total number of stores (unit counts) in 2018 | numeric |
The codebook for the calories data includes 16 variables:
| Field name | Description | Data type |
|---|---|---|
| restaurant | Name of the restaurant | character |
| item | Name of entree item | character |
| calories | Calories | numeric |
| cal_fat | Calories from fat | numeric |
| total_fat | Total fat (g) | numeric |
| sat_fat | Saturated fat (g) | numeric |
| trans_fat | Trans fat (g) | numeric |
| cholesterol | Cholesterol (mg) | numeric |
| sodium | Sodium (mg) | numeric |
| total_carb | Total carbohydrate (g) | numeric |
| fiber | Dietary fiber (g) | numeric |
| sugar | Total sugar (g) | numeric |
| protein | Protein (g) | numeric |
| vit_a | Vitamin A (mcg) | numeric |
| vit_c | Vitamin C (mcg) | numeric |
| calcium | Calcium (mg) | numeric |
In this problem, re-create the provided plot. Tasks:
- Use the sales dataset above.
- Create a scatter plot with the column us_sales along the x-axis and the column unit_count along the y-axis.
- Each axis should be transformed to a log10 scale and should be appropriately labeled.
- Color each point by the proportion of franchised stores (i.e. num_franchised_stores divided by unit_count).
- Label each point with the name of the fast food restaurant using the ggrepel package.
- Use the classic dark-on-light ggplot2 theme.
- Rename the legend appropriately.
library(ggplot2)
library(ggrepel)
sales <- sales %>%
  mutate(proportion = num_franchised_stores / unit_count)
us_sales <- ggplot(sales, aes(us_sales, unit_count, color = proportion,
                              label = restaurant)) +
  geom_text_repel(color = "black", size = 3) +
  geom_point() +
  scale_y_continuous(trans = "log10") +
  scale_x_continuous(trans = "log10") +
  labs(y = "Total number of stores (log10 scale)",
       x = "U.S. sales in millions (log10 scale)",
       color = "Proportion of stores\nfranchised") +
  theme_bw() +
  theme(axis.title = element_text(size = 8, face = "bold"),
        legend.title = element_text(size = 10, face = "bold"))
us_sales
# add names with ggrepel and change the name of the legend
In this problem, re-create the provided plot. Tasks:
- Use the sales dataset above.
- Create a bar plot with average_sales on the x-axis and restaurant on the y-axis (Hint: consider using the coord_flip() function).
- The order of restaurants on the y-axis should be in decreasing order of average sales, with the restaurant with the largest average sales at the top and the restaurant with the smallest average sales at the bottom.
- Add text to each bar on the plot with the average sales (in thousands) for each restaurant.
- Each axis should be appropriately labeled.
- Along the x-axis, transform the text labels to include a dollar sign in front of each number.
- Use the classic ggplot2 theme.
###install.packages("directlabels")
library(directlabels)
us_bar <- sales %>%
  ggplot(aes(x = reorder(restaurant, average_sales), y = average_sales)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(x = "Restaurant",
       y = "Average sales per unit store (in thousands)") +
  scale_y_continuous(labels = scales::label_dollar()) +
  geom_dl(aes(label = paste("$", average_sales, sep = ""), size = 0.5),
          method = c(list("last.points"),
                     cex = 0.5)) +
  theme_classic() +
  theme(axis.title = element_text(size = 8, face = "bold"))
us_bar
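The bar labels can also be drawn with ggplot2 alone. The following is a minimal sketch of that alternative, not the approach used above: it swaps `geom_dl()` for `geom_text()`, and `toy_sales` with its values is made up purely for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the sales data; the numbers are invented
toy_sales <- data.frame(
  restaurant    = c("A", "B", "C"),
  average_sales = c(1200, 850, 2670)
)

toy_bar <- ggplot(toy_sales,
                  aes(x = reorder(restaurant, average_sales),
                      y = average_sales)) +
  geom_col() +
  # a negative hjust pushes each label just past the end of its bar
  geom_text(aes(label = paste0("$", round(average_sales))),
            hjust = -0.1, size = 3) +
  coord_flip() +
  # leave extra room at the upper end so the labels are not clipped
  scale_y_continuous(labels = scales::label_dollar(),
                     expand = expansion(mult = c(0, 0.15))) +
  labs(x = "Restaurant",
       y = "Average sales per unit store (in thousands)") +
  theme_classic()
toy_bar
```

With `geom_text()` the label size and offset are ordinary layer arguments, which may be easier to tune than a directlabels method list.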
In this problem, re-create the provided plot. Tasks:
- Use the cal dataset above.
- Create a scatter plot with the column calories along the x-axis and the column sodium along the y-axis.
- Each restaurant should have its own scatter plot (Hint: consider the facet functions).
- Add a horizontal line at y = 2300 in each scatter plot.
- Each scatter plot should have an appropriately labeled x-axis and y-axis.
- For all food items with a sodium level greater than 2300 mg (the maximum daily intake from the Centers for Disease Control), add a text label to each point with the name of the entree food item using the ggrepel package.
- Use the classic dark-on-light ggplot2 theme.
- Rename the legend.
salt <- cal %>%
  ggplot(aes(x = calories, y = sodium, label = item)) +
  geom_point() +
  facet_wrap(~restaurant) +
  # label only items strictly above the 2300 mg limit, as the task specifies
  geom_text_repel(data = subset(cal, sodium > 2300), size = 1,
                  nudge_y = 1, nudge_x = 3,
                  max.overlaps = Inf) +
  geom_hline(yintercept = 2300) +
  labs(x = "Calories", y = "Sodium (mg)") +
  theme_bw() +
  theme(axis.title = element_text(size = 8, face = "bold"))
salt
In this problem, re-create the provided plot. Tasks:
- Use the cal dataset above.
- Create a new column titled is_salad that contains a TRUE or FALSE value of whether or not the name of the entree food item contains the character string “salad”.
- Create boxplots with calories on the x-axis and restaurant on the y-axis.
- The order of restaurants on the y-axis should be in decreasing order of median calories, with the restaurant with the largest median calories at the top and the restaurant with the smallest median calories at the bottom.
- Hide any outliers in the boxplots.
- On top of the boxplots, add a set of jittered points representing each food item.
- Each point should be colored based on whether it is an item with the word “salad” in it or not.
- Each axis should be appropriately labeled, the legend should be appropriately labeled, and the x-axis should be transformed to a log10 scale.
- Use the classic dark-on-light ggplot2 theme.
### Salad or not salad column
library(stringr)
cal$is_salad <- str_detect(cal$item, regex("salad", ignore_case = TRUE))
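Note that `str_detect()` is case-sensitive by default, so the capitalisation of the pattern decides which items are flagged. A small sketch with made-up item names (these are hypothetical, not taken from the dataset):

```r
library(stringr)

# Hypothetical item names, for illustration only
items <- c("Chicken Caesar Salad", "Side salad", "Cheeseburger")

# Case-sensitive match: only the capitalised "Salad" is found
strict <- str_detect(items, "Salad")
strict
# [1]  TRUE FALSE FALSE

# Case-insensitive match via stringr's regex() modifier
loose <- str_detect(items, regex("salad", ignore_case = TRUE))
loose
# [1]  TRUE  TRUE FALSE
```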
### Determine the order of restaurants based on median calories
order <- cal %>%
  group_by(restaurant) %>%
  summarise(med = median(calories)) %>%
  arrange(med)
## `summarise()` ungrouping output (override with `.groups` argument)
order
## # A tibble: 8 x 2
## restaurant med
## <chr> <dbl>
## 1 Chick Fil-A 390
## 2 Taco Bell 420
## 3 Subway 460
## 4 Dairy Queen 485
## 5 Mcdonalds 540
## 6 Arbys 550
## 7 Burger King 555
## 8 Sonic 570
medians <- order$restaurant
medians
## [1] "Chick Fil-A" "Taco Bell" "Subway" "Dairy Queen" "Mcdonalds"
## [6] "Arbys" "Burger King" "Sonic"
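The vector of names sorted by median is what fixes the axis order: a factor is drawn in the order of its levels. A minimal sketch on toy data (the `toy` tibble and its values are invented), which also shows that `reorder()` with `FUN = median` should give the same level order in one step:

```r
library(dplyr)

# Hypothetical data: three restaurants with two items each
toy <- tibble(
  restaurant = c("X", "X", "Y", "Y", "Z", "Z"),
  calories   = c(500, 700, 200, 300, 400, 450)
)

# Restaurant names sorted by increasing median calories
lvl <- toy %>%
  group_by(restaurant) %>%
  summarise(med = median(calories)) %>%
  arrange(med) %>%
  pull(restaurant)

# Explicit levels, as in the approach above
f_manual <- factor(toy$restaurant, levels = lvl)

# One-step alternative: reorder() sorts levels by the summary statistic
f_reorder <- reorder(toy$restaurant, toy$calories, FUN = median)

levels(f_manual)
levels(f_reorder)
```

Both factors end up with the levels Y, Z, X, so either can be passed to `aes()` to control the order of the boxplots.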
### Creation of the boxplot
boxes <- cal %>%
  # order the restaurants by median calories through explicit factor levels
  ggplot(aes(x = factor(restaurant, levels = medians),
             y = calories)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point(position = "jitter", aes(color = factor(is_salad))) +
  scale_y_continuous(trans = "log10") +
  coord_flip() +
  labs(x = "Restaurant",
       y = "Calories (log10 scale)",
       color = "Is the entree\na salad?") +
  scale_color_discrete(labels = c("Not a salad", "Salad")) +
  theme_bw() +
  theme(axis.title = element_text(size = 10, face = "bold"),
        legend.title = element_text(size = 10, face = "bold"))
boxes
In this problem, re-create the provided plot. Tasks:
- Use the cal dataset above.
- Remove rows that contain the Taco Bell restaurant.
- For each restaurant, calculate the median amount of sugar in each entree item.
- Combine this summarized dataset with the data_fastfood_sales.csv dataset. The combined dataset should only include restaurants that are included in both datasets.
- Using this new dataset, create a bar plot with restaurant on the x-axis and us_sales on the y-axis.
- The order of restaurants on the x-axis should be in increasing order of US sales, with the restaurant with the largest US sales on the right and the restaurant with the smallest US sales on the left.
- Color the bars by the median amount of sugar in the entree items from that restaurant.
- Each axis should be appropriately labeled.
- Use the classic ggplot2 theme.
### Create a new data frame (called cal2) that takes the cal df and removes
### Taco Bell
cal2 <- cal %>%
  filter(restaurant != "Taco Bell")
### Determine the median sugar per restaurant.
mediansugar <- cal2 %>%
  group_by(restaurant) %>%
  summarize(medsugar = median(sugar))
## `summarise()` ungrouping output (override with `.groups` argument)
mediansugar
## # A tibble: 7 x 2
## restaurant medsugar
## <chr> <dbl>
## 1 Arbys 6
## 2 Burger King 7.5
## 3 Chick Fil-A 4
## 4 Dairy Queen 6
## 5 Mcdonalds 9
## 6 Sonic 7
## 7 Subway 8
### Join the median sugar dataset with the sales dataset, keeping only
### restaurants in both datasets.
join <- inner_join(sales, mediansugar, by = "restaurant")
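Because `inner_join()` silently drops rows without a match, it can be worth checking which restaurants fall out of the join. A minimal sketch with toy tables (`toy_sales` and `toy_sugar` are hypothetical and their values invented), using `anti_join()` to list the unmatched rows:

```r
library(dplyr)

# Hypothetical tables that share only restaurants "B" and "C"
toy_sales <- tibble(restaurant = c("A", "B", "C"),
                    us_sales   = c(100, 200, 300))
toy_sugar <- tibble(restaurant = c("B", "C", "D"),
                    medsugar   = c(6, 7.5, 4))

# Keeps only restaurants present in both tables
both <- inner_join(toy_sales, toy_sugar, by = "restaurant")

# Rows of toy_sales with no partner in toy_sugar
only_sales <- anti_join(toy_sales, toy_sugar, by = "restaurant")

both
only_sales
```

Here `both` holds the rows for B and C, while `only_sales` holds A. Running the same check on the real tables would surface any restaurant names that fail to match because of differences in spelling.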
### Color palette viridis for blue to yellow spectrum
####install.packages("viridis")
library(viridis)
## Loading required package: viridisLite
### Recreate plot
joinedplot <- join %>%
  ggplot(aes(x = reorder(restaurant, us_sales), y = us_sales,
             fill = medsugar)) +
  geom_bar(stat = "identity") +
  labs(x = "Restaurant",
       y = "U.S. sales (in millions)") +
  scale_fill_viridis(option = "D",
                     name = "Median sugar (grams)\nin fast food entrees") +
  theme_classic() +
  theme(axis.title = element_text(size = 10, face = "bold"),
        legend.title = element_text(size = 10, face = "bold"))
joinedplot