Setup

You’ll need the following packages:

## Loading required package: tidyverse
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Once you upload your data set, read it into a table in your workspace as we did in class. Replace the file name below with your own file’s name, and if the line begins with a #, remove the # so the line actually runs.

fastfood <- read_csv("fastfood.csv")

For your midterm take-home assignment, you will carry out an exploratory data analysis on the data set you’ve chosen. While there is no one way to perform EDA, your analysis will be structured in a way similar to what we’ve seen in class.

Submit your responses to these questions as a PDF knitted from this file. Be sure to include the code you use. Submit the PDF on Canvas under the assignment labeled “Midterm 1 Take-Home”.

Using RStudio, you will produce plots, calculate summary statistics to go along with the plots, and interpret the results. For each plot, you will also generate some questions and answers.

Variables and Types, Missing Data
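
Before plotting, it can help to check each variable's type and how many values are missing. The sketch below is only an illustration: the table `toy` and its values are made up, so swap in your own table.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data standing in for your own table; the values are made up.
toy <- tibble(
  restaurant = c("A", "A", "B", "B"),
  calories   = c(300, NA, 520, 610),
  sodium     = c(700, 900, NA, NA)
)

# Column names and types at a glance.
glimpse(toy)

# Number of missing values in each column.
missing_counts <- toy %>%
  summarize(across(everything(), ~ sum(is.na(.x))))
missing_counts
```

Columns with many missing values may need `na.rm = TRUE` in later summaries, or a note in your write-up.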

What type of variation occurs within variables?

fastfood %>% ggplot(aes(x = calories)) + geom_histogram(bins = 10)

fastfood %>% summarize(mean(calories))
## # A tibble: 1 × 1
##   `mean(calories)`
##              <dbl>
## 1             531.
fastfood %>% summarize(median(calories))
## # A tibble: 1 × 1
##   `median(calories)`
##                <dbl>
## 1                490
fastfood %>% summarize(sd(calories))
## # A tibble: 1 × 1
##   `sd(calories)`
##            <dbl>
## 1           282.
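
The three summaries above can also be computed in a single `summarize()` call with named columns, which gives tidier output. This is a minimal sketch on made-up data; `toy` and the column names `mean_calories`, `median_calories`, and `sd_calories` are illustrative choices, not part of the assignment.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data; replace with your own table and variable.
toy <- tibble(calories = c(200, 400, 600, 800))

# One call, three named summary statistics.
# na.rm = TRUE drops missing values before each calculation.
calorie_summary <- toy %>%
  summarize(
    mean_calories   = mean(calories, na.rm = TRUE),
    median_calories = median(calories, na.rm = TRUE),
    sd_calories     = sd(calories, na.rm = TRUE)
  )
calorie_summary
```

Naming the columns avoids backticked headers such as `mean(calories)` in the printed tibble.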
fastfood %>% ggplot(aes(x = restaurant)) + geom_bar()

fastfood %>%
  filter(restaurant == "Taco Bell") %>%
  ggplot(aes(x = calories)) +
  geom_histogram(bins = 10)

What relationships do you see between variables?

fastfood %>%
  ggplot(mapping = aes(x = calories, y = sodium)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'
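
One summary statistic that pairs naturally with a scatterplot and a fitted line is the correlation coefficient. The sketch below uses made-up numbers in a hypothetical table `toy`; it is one possible choice of statistic, not a required one.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data; the values are made up for illustration.
toy <- tibble(
  calories = c(200, 400, 600, 800),
  sodium   = c(500, 1000, 1200, 1700)
)

# Pearson correlation between the two numeric variables.
# use = "complete.obs" ignores rows where either value is missing.
toy_cor <- toy %>%
  summarize(r = cor(calories, sodium, use = "complete.obs"))
toy_cor
```

A value of `r` near 1 or -1 suggests a strong linear relationship; a value near 0 suggests little linear association.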

fastfood %>% ggplot(aes(x = as.factor(restaurant), y = calories)) + geom_boxplot()

fastfood %>% group_by(restaurant) %>% summarize(median(calories))
## # A tibble: 8 × 2
##   restaurant  `median(calories)`
##   <chr>                    <dbl>
## 1 Arbys                      550
## 2 Burger King                555
## 3 Chick Fil-A                390
## 4 Dairy Queen                485
## 5 Mcdonalds                  540
## 6 Sonic                      570
## 7 Subway                     460
## 8 Taco Bell                  420
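
Group medians are easier to interpret alongside each group's size and spread, which is what a boxplot shows visually. A minimal sketch on made-up data follows; `toy`, `n_items`, `median_calories`, and `iqr_calories` are illustrative names.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data; the values are made up for illustration.
toy <- tibble(
  restaurant = c("A", "A", "A", "B", "B"),
  calories   = c(300, 500, 700, 400, 600)
)

# Per-group count, median, and interquartile range.
by_restaurant <- toy %>%
  group_by(restaurant) %>%
  summarize(
    n_items         = n(),
    median_calories = median(calories, na.rm = TRUE),
    iqr_calories    = IQR(calories, na.rm = TRUE)
  )
by_restaurant
```

A group with few items can have a median that shifts a lot with one unusual value, so the count is worth reporting next to it.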

Conclusion: On your own

After showing you can use some basic data manipulation and visualization tools, I’ll want to know what observations you can draw from these visualizations and summary statistics. Here is your chance to shine - tell me what you know about the dataset you chose! Here are some prompts to help frame this discussion.

fastfood %>%
  ggplot(mapping = aes(x = sugar, y = sodium, color = calories)) +
  geom_point() +
  facet_grid(. ~ restaurant)