## # A tibble: 6 × 17
## restaurant item calories cal_fat total_fat sat_fat trans_fat cholesterol
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Mcdonalds Artisan G… 380 60 7 2 0 95
## 2 Mcdonalds Single Ba… 840 410 45 17 1.5 130
## 3 Mcdonalds Double Ba… 1130 600 67 27 3 220
## 4 Mcdonalds Grilled B… 750 280 31 10 0.5 155
## 5 Mcdonalds Crispy Ba… 920 410 45 12 0.5 120
## 6 Mcdonalds Big Mac 540 250 28 10 1 80
## # ℹ 9 more variables: sodium <dbl>, total_carb <dbl>, fiber <dbl>, sugar <dbl>,
## # protein <dbl>, vit_a <dbl>, vit_c <dbl>, calcium <dbl>, salad <chr>
mcdonalds <- fastfood %>%
  filter(restaurant == "Mcdonalds")
dairy_queen <- fastfood %>%
  filter(restaurant == "Dairy Queen")
summary(mcdonalds$cal_fat)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    50.0   160.0   240.0   285.6   320.0  1270.0
summary(dairy_queen$cal_fat)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     0.0   160.0   220.0   260.5   310.0   670.0
Comparing the McDonald’s histogram with the Dairy Queen histogram, I can tell that McDonald’s has more items that are very high in calories from fat. Dairy Queen has only about two items near the top of its range (its maximum is 670), while McDonald’s has at least 4 items between 700 and about 1,270 calories from fat. The McDonald’s histogram is skewed to the right: most of its items fall between 0 and 400 calories from fat, with a long tail of high values. Dairy Queen is more evenly distributed and looks closer to bell shaped, although it has slightly more items below its mean of about 260 than above it.
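A quick numeric check can back up what the histograms suggest about skew. The sketch below only reuses the numbers printed by `summary()` above; the vector names `mc` and `dq` are made up for illustration, and a mean above the median is a rough hint of right skew, not a formal test.

```r
# Summary values copied from the summary() output above (calories from fat).
mc <- c(median = 240.0, mean = 285.6, q3 = 320.0, max = 1270.0)
dq <- c(median = 220.0, mean = 260.5, q3 = 310.0, max = 670.0)

# Gap between mean and median: a positive gap hints at a right (upper) tail.
mean_minus_median <- c(
  mcdonalds   = unname(mc["mean"] - mc["median"]),
  dairy_queen = unname(dq["mean"] - dq["median"])
)

# Distance from the third quartile to the maximum: length of the upper tail.
upper_tail <- c(
  mcdonalds   = unname(mc["max"] - mc["q3"]),
  dairy_queen = unname(dq["max"] - dq["q3"])
)

print(mean_minus_median)
print(upper_tail)
```

Both gaps are positive, and the McDonald’s upper tail is much longer than the Dairy Queen one, which matches a stronger right skew for McDonald’s.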
Based on this plot, the distribution looks nearly normal, but as I said in Exercise 1, the bell curve seems to be slightly skewed, with a longer tail to the right.
dqmean <- mean(dairy_queen$cal_fat)
dqsd <- sd(dairy_queen$cal_fat)
ggplot(data = dairy_queen, aes(x = cal_fat)) +
  geom_blank() +
  geom_histogram(aes(y = after_stat(density))) +
  stat_function(fun = dnorm, args = list(mean = dqmean, sd = dqsd), col = "tomato")
sim_norm. Do all of the points fall on the line? How does this plot compare to the probability plot for the real data? (Since sim_norm is not a data frame, it can be put directly into the sample argument and the data argument can be dropped.)
Not all of the points fall on the line, but they follow it more closely than the points for the real data do, so I would say that the sim_norm data is closer to being normally distributed than the real data.
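Because a single simulated sample can look better or worse by chance, it may help to repeat the simulation a few times. The sketch below is one way to do that in base R. The sample size, mean, and standard deviation are hypothetical stand-ins rather than values from the data, and `qq_cor` is a helper name invented here. It summarizes each normal probability plot by the correlation between theoretical and sample quantiles.

```r
set.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical values for illustration only; not taken from the data.
n <- 40
m <- 260
s <- 150

# Correlation between theoretical and sample quantiles: a value near 1
# means the points of a normal probability plot lie close to a straight line.
qq_cor <- function(x) {
  q <- qqnorm(x, plot.it = FALSE)  # list with x (theoretical) and y (sample)
  cor(q$x, q$y)
}

# Repeat the simulation to see how much truly normal samples vary.
sims <- replicate(9, qq_cor(rnorm(n, mean = m, sd = s)))
print(round(sims, 3))
```

The same helper could be applied to the real data, so its value can be compared against the range seen in the simulated samples.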
sim_norm <- rnorm(n = nrow(dairy_queen), mean = dqmean, sd = dqsd)
ggplot(data = NULL, aes(sample = sim_norm)) +
  geom_line(stat = "qq")
The normal probability plot for the calories from fat looks similar to the plots for the simulated data; the differences only really show toward the right end of the plot. Even so, I believe the plots provide evidence that the calories from fat are nearly normal.
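Once a variable is treated as nearly normal, probabilities can be read off the normal model and compared with the observed proportions. The sketch below shows both calculations on a small made-up vector (the values in `x` are hypothetical, not rows from the fastfood data). The point to note is which tail `pnorm()` returns.

```r
# Hypothetical calories-from-fat values, for illustration only.
x <- c(50, 120, 160, 200, 240, 280, 320, 410, 600, 1270)

m <- mean(x)
s <- sd(x)

# Theoretical: pnorm() returns the lower tail, P(X < q), by default.
p_below <- pnorm(q = 300, mean = m, sd = s)

# The upper tail, P(X > q), is the complement of the lower tail.
p_above <- 1 - pnorm(q = 300, mean = m, sd = s)

# Empirical: share of observed values below the cutoff.
p_below_emp <- mean(x < 300)

print(c(theoretical = p_below, empirical = p_below_emp))
```

For a "fewer than" question, `p_below` is the quantity to report. Subtracting from 1 answers the "more than" question instead.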
#1 What is the probability that a randomly chosen item has fewer than 300 calories from fat at McDonald’s and at Dairy Queen?
Theoretical calculation, McDonald’s
mcmean <- mean(mcdonalds$cal_fat)
mcsd <- sd(mcdonalds$cal_fat)
pnorm(q = 300, mean = mcmean, sd = mcsd)
## [1] 0.5259626
Empirical calculation using the McDonald’s data set
## # A tibble: 1 × 1
## percent
## <dbl>
## 1 0.632
Because pnorm() returns the probability of falling below q, it answers the "fewer than 300" question directly. There is a 52.6 percent chance based on the theoretical calculation and a 63.2 percent chance empirically that a randomly chosen McDonald’s item has fewer than 300 calories from fat. That is a difference of 10.6 percentage points.
Dairy Queen, theoretical
## [1] 0.5997007
Empirical
## # A tibble: 1 × 1
## percent
## <dbl>
## 1 0.667
There is a 60.0 percent chance based on the theoretical calculation and a 66.7 percent chance empirically that a randomly chosen Dairy Queen item has fewer than 300 calories from fat. That is a difference of 6.7 percentage points. Based on this, I can conclude that Dairy Queen had the closer agreement, by 3.9 percentage points.
## More Practice
[Plots, one per restaurant: Dairy Queen, Sonic, Subway, Taco Bell, Chick Fil-A, Burger King, Arbys]
The stepwise pattern may be due to the varying types of food across the menu: a few items may be high in sodium while others may be low in sodium.
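Another possible contributor to a stepwise look, offered here as an assumption and not something the output above confirms, is that nutrition values are often reported as rounded numbers. The sketch below uses simulated, hypothetical sodium-like values and an assumed rounding step of 100 mg to show how rounding creates ties, which appear as flat steps in a normal probability plot.

```r
set.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical sodium-like values (mg); not taken from the fastfood data.
raw <- rnorm(60, mean = 1200, sd = 400)

# Assumption for illustration: values reported to the nearest 100 mg.
rounded <- round(raw / 100) * 100

# Rounding collapses many observations onto the same value; tied values
# plot at the same height, which shows up as flat steps in a Q-Q plot.
n_distinct_raw <- length(unique(raw))
n_distinct_rounded <- length(unique(rounded))

print(c(raw = n_distinct_raw, rounded = n_distinct_rounded))
```

Counting distinct values in the real sodium column would show whether the same kind of ties are present there.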
Based on this normal probability plot, I find that the distribution is symmetric.
Histogram
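# Assumed definitions: bk, bkmean, and bksd are not shown in this report.
bk <- fastfood %>% filter(restaurant == "Burger King")
bkmean <- mean(bk$total_carb)
bksd <- sd(bk$total_carb)
ggplot(data = bk, aes(x = total_carb)) +
  geom_blank() +
  geom_histogram(aes(y = after_stat(density))) +
  stat_function(fun = dnorm, args = list(mean = bkmean, sd = bksd), col = "tomato")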
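The density scaling in the histogram layer is what lets the bars and the normal curve share one axis. The base R sketch below checks that idea on simulated, hypothetical values (not the Burger King data). On the density scale the bar areas add up to 1, just like the area under the `dnorm()` curve.

```r
set.seed(3)  # fixed seed so the sketch is repeatable

# Hypothetical total-carbohydrate values; not taken from the fastfood data.
x <- rnorm(70, mean = 40, sd = 15)

h <- hist(x, plot = FALSE)  # compute the bins without drawing anything

# On the density scale, bar area (height * width) sums to 1,
# matching the total area under a normal density curve.
bar_area <- sum(h$density * diff(h$breaks))

# Normal density evaluated at the bin midpoints, for side-by-side comparison.
curve_at_mids <- dnorm(h$mids, mean = mean(x), sd = sd(x))

print(bar_area)
print(round(cbind(mid = h$mids, bar = h$density, curve = curve_at_mids), 4))
```

With counts on the y-axis instead, the bars would be far taller than the curve and the overlay would not be comparable.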