Introduction

We will create data visualizations using two data sets about fast food restaurants. The overall goal is to explore the nutrition of entree items and the sales of fast food restaurants in 2018. The task is to reconstruct a set of plots, all of which were built with packages in the Tidyverse or packages that integrate with the Tidyverse. The source data is available at https://hubworks.com/blog/statistics-about-americas-biggest-fast-food-chains.html.

Question 1: US Sales vs Total Number of Stores

Load the required packages:

setwd("D:/coursera/TIDYVERSE/Visualizing Data in the Tidyverse/proyecto final")
library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.0.4

## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.5     v dplyr   1.0.3
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.0

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(ggrepel)

## Warning: package 'ggrepel' was built under R version 4.0.4

Obtain the data:

sales<-read_csv("data_fastfood_sales.csv")

## 
## -- Column specification --------------------------------------------------------
## cols(
##   restaurant = col_character(),
##   average_sales = col_double(),
##   us_sales = col_double(),
##   num_company_stores = col_double(),
##   num_franchised_stores = col_double(),
##   unit_count = col_double()
## )

cal <- read_csv("data_fastfood_calories.csv")

## 
## -- Column specification --------------------------------------------------------
## cols(
##   restaurant = col_character(),
##   item = col_character(),
##   calories = col_double(),
##   cal_fat = col_double(),
##   total_fat = col_double(),
##   sat_fat = col_double(),
##   trans_fat = col_double(),
##   cholesterol = col_double(),
##   sodium = col_double(),
##   total_carb = col_double(),
##   fiber = col_double(),
##   sugar = col_double(),
##   protein = col_double(),
##   vit_a = col_double(),
##   vit_c = col_double(),
##   calcium = col_double()
## )
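
The column specification messages above can be avoided by declaring the column types explicitly. A minimal sketch, assuming a small made-up CSV written to a temporary file (the file contents and the name toy_sales are hypothetical and are not the course data):

```r
library(readr)

# Hypothetical two-row file standing in for data_fastfood_sales.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("restaurant,us_sales,unit_count",
             "Alpha Burger,1000,500",
             "Beta Tacos,250,100"), tmp)

# Declare types up front: one character column, every other column double
toy_sales <- read_csv(tmp,
                      col_types = cols(restaurant = col_character(),
                                       .default = col_double()))
toy_sales
```

With col_types supplied, read_csv() has no types to guess, so it does not print the guessed specification.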

Summarizing the calories dataset

##install.packages("skimr")
library(skimr)

## Warning: package 'skimr' was built under R version 4.0.4

skim(cal)

Data summary
Name	cal
Number of rows	515
Number of columns	16
_______________________
Column type frequency:
character	2
numeric	14
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
restaurant	0	1	5	11	0	8	0
item	0	1	5	63	0	505	0

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
calories	0	1.00	530.91	282.44	20	330.0	490.0	690	2430	▇▆▁▁▁
cal_fat	0	1.00	238.81	166.41	0	120.0	210.0	310	1270	▇▃▁▁▁
total_fat	0	1.00	26.59	18.41	0	14.0	23.0	35	141	▇▃▁▁▁
sat_fat	0	1.00	8.15	6.42	0	4.0	7.0	11	47	▇▃▁▁▁
trans_fat	0	1.00	0.47	0.84	0	0.0	0.0	1	8	▇▁▁▁▁
cholesterol	0	1.00	72.46	63.16	0	35.0	60.0	95	805	▇▁▁▁▁
sodium	0	1.00	1246.74	689.95	15	800.0	1110.0	1550	6080	▇▆▁▁▁
total_carb	0	1.00	45.66	24.88	0	28.5	44.0	57	156	▅▇▂▁▁
fiber	12	0.98	4.14	3.04	0	2.0	3.0	5	17	▇▅▂▁▁
sugar	0	1.00	7.26	6.76	0	3.0	6.0	9	87	▇▁▁▁▁
protein	1	1.00	27.89	17.68	1	16.0	24.5	36	186	▇▂▁▁▁
vit_a	214	0.58	18.86	31.38	0	4.0	10.0	20	180	▇▁▁▁▁
vit_c	210	0.59	20.17	30.59	0	4.0	10.0	30	400	▇▁▁▁▁
calcium	210	0.59	24.85	25.52	0	8.0	20.0	30	290	▇▁▁▁▁

Summarizing the sales dataset

skim(sales)

Data summary
Name	sales
Number of rows	19
Number of columns	6
_______________________
Column type frequency:
character	1
numeric	5
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
restaurant	0	1	3	15	0	19	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
average_sales	1	1189.88	541.52	360.72	857.50	1130.00	1470.10	2670.32	▆▇▆▁▁
us_sales	1	7592.69	8007.32	606.00	3499.88	4476.41	9539.12	37480.67	▇▃▁▁▁
num_company_stores	1	839.00	1875.80	0.00	53.50	276.00	677.50	8222.00	▇▁▁▁▁
num_franchised_stores	1	5998.53	5894.51	0.00	2583.00	4055.00	6497.50	25908.00	▇▅▂▁▁
unit_count	1	6838.05	5997.13	2231.00	3034.50	4332.00	7394.00	25908.00	▇▁▂▁▁

Generating Plot

sales_w1<-sales %>%
 mutate(prop=num_franchised_stores/unit_count)
 
p <- ggplot(sales_w1, aes(x=us_sales, y=unit_count)) + 
 geom_point(aes(color=prop)) + 
 scale_x_log10() + 
 scale_y_log10() +
 theme_bw() +
 geom_text_repel(aes(label=restaurant)) +
 labs(x="U.S. sales in millions (log10 scale)", 
 y="Total number of stores (log10 scale)",
 color="Proportion of stores franchised")
p
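
The color scale in the plot above is driven by the share of franchised stores. As a rough illustration of that calculation, here is a sketch on made-up store counts (the restaurant names and numbers are hypothetical, not taken from the sales dataset):

```r
library(dplyr)
library(tibble)

# Hypothetical store counts (not the course data)
toy_sales <- tibble(
  restaurant            = c("Alpha Burger", "Beta Tacos"),
  num_company_stores    = c(100, 0),
  num_franchised_stores = c(300, 50),
  unit_count            = c(400, 50)
)

# Share of each chain's stores that are franchised (between 0 and 1)
toy_prop <- toy_sales %>%
  mutate(prop = num_franchised_stores / unit_count)
toy_prop
```

A chain with no company-owned stores gets a proportion of 1, which maps to the end of the color scale.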

Question 2: Bar plot of average sales per restaurant

Using the sales dataset above, create a bar plot with average_sales on the x-axis and restaurant on the y-axis (Hint: consider using the coord_flip() function). Order the restaurants on the y-axis by decreasing average sales, with the largest at the top and the smallest at the bottom. Add text to each bar showing the average sales (in thousands) for that restaurant. Each axis should be appropriately labeled. Along the x-axis, transform the text labels to include a dollar sign in front of each number. Use the classic ggplot2 theme.

library(forcats)
library(scales)

## 
## Attaching package: 'scales'

## The following object is masked from 'package:purrr':
## 
##     discard

## The following object is masked from 'package:readr':
## 
##     col_factor

q2 <- ggplot(sales, aes(x=average_sales, y=fct_reorder(restaurant, average_sales))) +
 geom_bar(stat="identity") +
 theme_classic() + 
 geom_text(aes(label=paste0("$", round(average_sales,0))), hjust=-0.1) +
 scale_x_continuous(labels=dollar_format()) +
 labs(x="Average sales per unit store (in thousands)", 
 y="Restaurant")

q2
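
The bar order and the dollar labels in the plot above come from fct_reorder() and the scales package. The sketch below shows both on made-up values (the restaurant names and sales figures are hypothetical):

```r
library(forcats)
library(scales)

# Hypothetical restaurants and average sales (in thousands)
restaurant    <- c("Alpha Burger", "Beta Tacos", "Gamma Subs")
average_sales <- c(1200, 2500, 800)

# fct_reorder() sorts the factor levels by average_sales, ascending.
# ggplot2 draws the first level at the bottom of a discrete y-axis,
# so the restaurant with the largest value ends up at the top.
ordered <- fct_reorder(restaurant, average_sales)
levels(ordered)

# dollar() formats numbers as text with a leading dollar sign
dollar_labels <- dollar(average_sales)
dollar_labels
```

Because the levels are sorted in ascending order, no explicit reversal is needed when the factor is mapped to the y-axis.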

Question 3: Calories vs Sodium

This plot shows the relationship between calories and sodium for different entrees, labeling the entree items with more than 2300 mg of sodium. Create a scatter plot with the column calories along the x-axis and the column sodium along the y-axis. Each restaurant should have its own scatter plot (Hint: consider the facet functions). Add a horizontal line at y=2300 in each scatter plot. Each axis should be appropriately labeled. For all food items with a sodium level greater than 2300 mg (the maximum daily intake from the Centers for Disease Control), add a text label to each point with the name of the entree food item using the ggrepel package. Use the classic dark-on-light ggplot2 theme. Rename the legend.

q3 <- ggplot(cal, aes(x=calories, y=sodium)) + 
 facet_wrap(~restaurant) +
 geom_hline(yintercept=2300) + 
 theme_bw() + 
 labs(x="Calories", y="Sodium (mg)") +
 geom_text_repel(data=cal %>% filter(sodium>2300), 
 aes(label=item), 
 direction="y",
 nudge_y=750,
 nudge_x=750,
 size=3) +
 geom_point(size=2)
q3

## Warning: ggrepel: 2 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
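
The warning above means ggrepel dropped two labels that it could not place without too many overlaps. One possible way to keep every label is to raise max.overlaps, as the message suggests. A minimal sketch on made-up entrees (the data frame toy_cal and its values are hypothetical):

```r
library(ggplot2)
library(ggrepel)

# Hypothetical entrees (not the course data)
toy_cal <- data.frame(
  item     = c("Item A", "Item B", "Item C"),
  calories = c(400, 1200, 1500),
  sodium   = c(900, 2500, 3100)
)

# Only the points above the 2300 mg line get a label
high_sodium <- subset(toy_cal, sodium > 2300)

p_toy <- ggplot(toy_cal, aes(x = calories, y = sodium)) +
  geom_point() +
  geom_hline(yintercept = 2300) +
  # max.overlaps = Inf asks ggrepel to draw every label, even if they overlap
  geom_text_repel(data = high_sodium, aes(label = item), max.overlaps = Inf)
p_toy
```

The trade-off is clutter: with many labeled points in one facet, forcing every label can make the panel harder to read.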

Question 4: Calories vs Restaurant

Calories per entree at fast food restaurants. Use the cal dataset above. Create a new column titled is_salad that contains a TRUE or FALSE value indicating whether the name of the entree food item contains the character string "salad". Create boxplots with calories on the x-axis and restaurant on the y-axis. Order the restaurants on the y-axis by decreasing median calories, with the restaurant with the largest median calories at the top and the restaurant with the smallest median calories at the bottom. Hide any outliers in the boxplots. On top of the boxplots, add a set of jittered points representing each food item. Each point should be colored based on whether the item has the word "salad" in it. Each axis should be appropriately labeled, the legend should be appropriately labeled, and the x-axis should be transformed to a log10 scale. Use the classic dark-on-light ggplot2 theme.

calories_q4<-cal %>%
 mutate(is_salad=str_detect(item, "[Ss]alad"))

q4 <- ggplot(calories_q4, aes(x=fct_reorder(restaurant, calories), y=calories)) +
 geom_boxplot(outlier.shape = NA) + 
 geom_point(aes(color=is_salad),
 position=position_jitterdodge()) +
 scale_y_log10() +
 coord_flip() +
 theme_bw() +
 labs(x="Restaurant", y="Calories (log10 scale)", color="Is the entree\n a salad?") +
 scale_color_discrete(labels=c("Not a salad", "Salad"))
q4
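
The is_salad column above relies on the pattern "[Ss]alad", which matches "Salad" and "salad" but not other capitalizations. The sketch below compares it with a case-insensitive match on made-up item names (the names are hypothetical):

```r
library(stringr)

# Hypothetical item names
items <- c("Crispy Chicken Salad", "SALAD BOWL", "Cheeseburger", "side salad")

# Pattern used above: matches "Salad" or "salad" only
strict <- str_detect(items, "[Ss]alad")
strict

# Case-insensitive alternative: also matches "SALAD"
loose <- str_detect(items, regex("salad", ignore_case = TRUE))
loose
```

Whether the stricter or the looser pattern is preferable depends on how the item names are capitalized in the data.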

Question 5: Restaurant vs US Sales

Comparison of US sales and sugar content at fast food restaurants. Use the cal dataset above. Remove rows that contain the Taco Bell restaurant. For each restaurant, calculate the median amount of sugar in each entree item. Combine this summarized dataset with the data_fastfood_sales.csv dataset. The combined dataset should only include restaurants that are included in both datasets. Using this new dataset, create a bar plot with restaurant on the x-axis and us_sales on the y-axis. Order the restaurants on the x-axis by increasing US sales, with the restaurant with the largest US sales on the right and the restaurant with the smallest US sales on the left. Color the bars by the median amount of sugar in the entree items from that restaurant. Each axis should be appropriately labeled. Use the classic ggplot2 theme.

calories_q5<-cal %>%
 filter(restaurant!="Taco Bell") %>%
 group_by(restaurant) %>%
 summarise(median_sugar=median(sugar)) %>%
 inner_join(select(sales, restaurant, us_sales), by="restaurant")
ggplot(calories_q5, aes(x=fct_reorder(restaurant, us_sales), y=us_sales)) +
 geom_bar(aes(fill=median_sugar), stat="identity") +
 scale_fill_viridis_c() +
 labs(x="Restaurant", y="U.S. sales (in millions)", fill="Median sugar (grams)\n in fast food entrees") + 
 theme_classic()
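
The requirement that the combined dataset only include restaurants present in both datasets is what inner_join() provides. A minimal sketch on made-up tables (toy_cal, toy_sales and all values are hypothetical):

```r
library(dplyr)
library(tibble)

# Hypothetical entree table: "Delta Wings" has no sales record
toy_cal <- tibble(
  restaurant = c("Alpha Burger", "Alpha Burger", "Beta Tacos", "Delta Wings"),
  sugar      = c(4, 10, 6, 3)
)

# Hypothetical sales table: "Gamma Subs" has no entree records
toy_sales <- tibble(
  restaurant = c("Alpha Burger", "Beta Tacos", "Gamma Subs"),
  us_sales   = c(1000, 250, 700)
)

# Summarise sugar per restaurant, then keep only restaurants found in both tables
toy_q5 <- toy_cal %>%
  group_by(restaurant) %>%
  summarise(median_sugar = median(sugar)) %>%
  inner_join(toy_sales, by = "restaurant")
toy_q5
```

Restaurants that appear in only one of the two tables are dropped by the join, so no separate filtering step is required.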

Conclusions

In this project we have seen how effectively graphics can communicate the results of a data analysis. ggplot2 is a powerful and useful package for data science.

Visualizing Data in the Tidyverse - Fast Food Restaurant Plots: Final Course Project

Giovanni Barrero Ortiz

4/3/2021