install.packages("ggpubr")
Make a plot (or plots) to visualize the distributions of the amount of calories from fat of the options from these two restaurants. How do their centers, shapes, and spreads compare?
The distributions of calories from fat for both McDonald's and Dairy Queen appear right skewed. Both peak at around 200 calories from fat, which is where items appear most often. Dairy Queen's maximum is about 700 calories from fat, while McDonald's reaches well over 1000, so McDonald's has the wider spread.
## # A tibble: 6 x 17
## restaurant item calories cal_fat total_fat sat_fat trans_fat cholesterol
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Mcdonalds Arti~ 380 60 7 2 0 95
## 2 Mcdonalds Sing~ 840 410 45 17 1.5 130
## 3 Mcdonalds Doub~ 1130 600 67 27 3 220
## 4 Mcdonalds Gril~ 750 280 31 10 0.5 155
## 5 Mcdonalds Cris~ 920 410 45 12 0.5 120
## 6 Mcdonalds Big ~ 540 250 28 10 1 80
## # ... with 9 more variables: sodium <dbl>, total_carb <dbl>, fiber <dbl>,
## # sugar <dbl>, protein <dbl>, vit_a <dbl>, vit_c <dbl>, calcium <dbl>,
## # salad <chr>
## # A tibble: 515 x 17
## restaurant item calories cal_fat total_fat sat_fat trans_fat cholesterol
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Mcdonalds Arti~ 380 60 7 2 0 95
## 2 Mcdonalds Sing~ 840 410 45 17 1.5 130
## 3 Mcdonalds Doub~ 1130 600 67 27 3 220
## 4 Mcdonalds Gril~ 750 280 31 10 0.5 155
## 5 Mcdonalds Cris~ 920 410 45 12 0.5 120
## 6 Mcdonalds Big ~ 540 250 28 10 1 80
## 7 Mcdonalds Chee~ 300 100 12 5 0.5 40
## 8 Mcdonalds Clas~ 510 210 24 4 0 65
## 9 Mcdonalds Doub~ 430 190 21 11 1 85
## 10 Mcdonalds Doub~ 770 400 45 21 2.5 175
## # ... with 505 more rows, and 9 more variables: sodium <dbl>, total_carb <dbl>,
## # fiber <dbl>, sugar <dbl>, protein <dbl>, vit_a <dbl>, vit_c <dbl>,
## # calcium <dbl>, salad <chr>
mcdonalds <- fastfood %>%
dplyr::filter(restaurant == "Mcdonalds")
dairy_queen <- fastfood %>%
dplyr::filter(restaurant == "Dairy Queen")
mcdonalds %>%
ggplot() +
geom_histogram(aes(x = cal_fat), bins = 6) +
ggtitle("McDonalds fat calories") +
xlab("Calories from Fat") +
ylab("Frequency")
dairy_queen %>%
ggplot() +
geom_histogram(aes(x = cal_fat), bins = 6) +
ggtitle("Dairy queen fat calories") +
xlab("Calories from Fat") +
ylab("Frequency")
dqmean <- mean(dairy_queen$cal_fat)
dqsd <- sd(dairy_queen$cal_fat)
ggplot(data = dairy_queen, aes(sample = cal_fat)) +
geom_line(stat = "qq")
Make a normal probability plot of sim_norm. Do all of the points fall on the line? How does this plot compare to the probability plot for the real data? (Since sim_norm is not a dataframe, it can be put directly into the sample argument and the data argument can be dropped.)
Not all of the points fall on the line. Using the qqnormsim function on the cal_fat sample from the McDonald's data, we can compare the original data with 8 simulated normal plots.
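The simulation step referred to above is not shown in this write-up. Below is a minimal sketch of it, assuming a stand-in vector `cal_fat_demo` (hypothetical values, not the real menu data) in place of `dairy_queen$cal_fat`, and using base R `qqnorm()` instead of the ggplot2 call used elsewhere in this report.

```r
# Stand-in data: hypothetical values drawn from a right-skewed distribution,
# NOT the real Dairy Queen data.
set.seed(42)
cal_fat_demo <- rgamma(42, shape = 2, scale = 130)

# Simulate a normal sample with the same size, mean, and sd as the data.
sim_norm <- rnorm(n = length(cal_fat_demo),
                  mean = mean(cal_fat_demo),
                  sd = sd(cal_fat_demo))

# Normal probability plots side by side: stand-in data vs. simulated normal.
op <- par(mfrow = c(1, 2))
qqnorm(cal_fat_demo, main = "Stand-in data")
qqline(cal_fat_demo)
qqnorm(sim_norm, main = "Simulated normal")
qqline(sim_norm)
par(op)
```

Even the simulated sample rarely sits exactly on the line, which gives a sense of how much scatter is ordinary sampling variation.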
Does the normal probability plot for the calories from fat look similar to the plots created for the simulated data? That is, do the plots provide evidence that the calories from fat are nearly normal?
The normal probability plot for the calories from fat looks similar to the plots created for the simulated data. We do see some departures from the simulated plots as cal_fat increases, though.
Using the same technique, determine whether or not the calories from McDonald’s menu appear to come from a normal distribution.
As with the previous question, the calories from McDonald’s menu appear to come from a distribution that is close to normal. The plot of the original data bends upward at the far right, but overall it is similar to the plots for sim 1 through sim 8.
Write out two probability questions that you would like to answer about any of the restaurants in this dataset. Calculate those probabilities using both the theoretical normal distribution as well as the empirical distribution (four probabilities in all). Which one had a closer agreement between the two methods?
What is the probability that cholesterol is greater than 20 at Burger King?
## # A tibble: 6 x 17
## restaurant item calories cal_fat total_fat sat_fat trans_fat cholesterol
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Burger Ki~ Amer~ 1550 1134 126 47 8 805
## 2 Burger Ki~ Baco~ 1000 585 65 24 3 200
## 3 Burger Ki~ Baco~ 330 140 16 7 0 55
## 4 Burger Ki~ Baco~ 290 120 14 6 0.5 40
## 5 Burger Ki~ Baco~ 1040 630 48 28 2.5 220
## 6 Burger Ki~ Baco~ 730 351 39 9 0 90
## # ... with 9 more variables: sodium <dbl>, total_carb <dbl>, fiber <dbl>,
## # sugar <dbl>, protein <dbl>, vit_a <dbl>, vit_c <dbl>, calcium <dbl>,
## # salad <chr>
chols_mean <- mean(burgerking$cholesterol)
chols_sd <- sd(burgerking$cholesterol)
1-pnorm(20, mean = chols_mean, sd = chols_sd)
## [1] 0.7743965
#empirical
burgerking %>%
dplyr::filter(cholesterol > 20) %>%
dplyr::summarise(percent = n() / nrow(burgerking))
## # A tibble: 1 x 1
## percent
## <dbl>
## 1 0.914
What is the probability that sat_fat is greater than 5 at Burger King?
## # A tibble: 6 x 17
## restaurant item calories cal_fat total_fat sat_fat trans_fat cholesterol
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Burger Ki~ Amer~ 1550 1134 126 47 8 805
## 2 Burger Ki~ Baco~ 1000 585 65 24 3 200
## 3 Burger Ki~ Baco~ 330 140 16 7 0 55
## 4 Burger Ki~ Baco~ 290 120 14 6 0.5 40
## 5 Burger Ki~ Baco~ 1040 630 48 28 2.5 220
## 6 Burger Ki~ Baco~ 730 351 39 9 0 90
## # ... with 9 more variables: sodium <dbl>, total_carb <dbl>, fiber <dbl>,
## # sugar <dbl>, protein <dbl>, vit_a <dbl>, vit_c <dbl>, calcium <dbl>,
## # salad <chr>
sat_fat_mean <- mean(burgerking$sat_fat)
sat_fat_sd <- sd(burgerking$sat_fat)
1-pnorm(5, mean = sat_fat_mean, sd = sat_fat_sd)
## [1] 0.758486
#empirical
burgerking %>%
dplyr::filter(sat_fat > 5) %>%
dplyr::summarise(percent = n() / nrow(burgerking))
## # A tibble: 1 x 1
## percent
## <dbl>
## 1 0.7
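The two calculations above follow the same pattern, so they can be wrapped in one helper. The sketch below is illustrative only: `compare_tail_prob` is a hypothetical name, not part of the lab code, and `demo` is simulated stand-in data, not values from the fastfood dataset.

```r
# Hypothetical helper: P(X > cutoff) computed two ways.
compare_tail_prob <- function(x, cutoff) {
  # Theoretical: upper tail of a normal with the sample's mean and sd.
  theoretical <- 1 - pnorm(cutoff, mean = mean(x), sd = sd(x))
  # Empirical: share of observed values above the cutoff.
  empirical <- mean(x > cutoff)
  c(theoretical = theoretical,
    empirical = empirical,
    abs_diff = abs(theoretical - empirical))
}

# Stand-in values (assumption for illustration only).
set.seed(1)
demo <- rnorm(70, mean = 100, sd = 30)
compare_tail_prob(demo, cutoff = 80)
```

With the real data, a call such as `compare_tail_prob(burgerking$cholesterol, 20)` would return both probabilities and their gap, which makes it easier to say which question shows closer agreement between the two methods.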
Now let’s consider some of the other variables in the dataset. Out of all the different restaurants, which ones’ distribution is the closest to normal for sodium?
Based on the plots below, Burger King's sodium distribution is the closest to normal. Taco Bell could arguably be included as well, although that call is less clear.
fastfood %>%
group_by(restaurant) %>%
ggplot() +
geom_histogram(aes(x = sodium), bins = 11) +
ggtitle("Rest. Sodium") +
xlab("Sodium") +
ylab("Freq") +
facet_wrap(. ~restaurant)
fastfood %>%
group_by(restaurant) %>%
ggplot(aes(sample = sodium)) +
geom_line(stat = "qq") +
facet_wrap(.~restaurant)
Note that some of the normal probability plots for sodium distributions seem to have a stepwise pattern. Why do you think this might be the case?
My guess is that the stepwise pattern reflects the different kinds of items on each menu. Similar items tend to have similar sodium values, so the points cluster at a few levels with jumps between them. For example, comparing a BK cheeseburger to a Rodeo King, we see a huge difference in sodium.
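One way to probe the stepwise pattern is to count how many values repeat, since repeated values plot at the same height and form flat runs. The sketch below uses simulated values rounded to the nearest 100. This is an assumption for illustration; the real sodium values may be rounded differently or not at all.

```r
# Simulated stand-in values, NOT the real sodium data.
set.seed(7)
raw <- rnorm(60, mean = 1200, sd = 400)
rounded <- round(raw / 100) * 100   # hypothetical rounding to the nearest 100

# qqnorm() returns the plotted coordinates; plot.it = FALSE skips drawing.
qq_raw <- qqnorm(raw, plot.it = FALSE)
qq_rounded <- qqnorm(rounded, plot.it = FALSE)

# Fewer distinct heights means more points share a level, i.e. visible steps.
length(unique(qq_raw$y))
length(unique(qq_rounded$y))
```

On the real data, `sum(duplicated(fastfood$sodium))` would show how many sodium values are repeats.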
As you can see, normal probability plots can be used both to assess normality and visualize skewness. Make a normal probability plot for the total carbohydrates from a restaurant of your choice. Based on this normal probability plot, is this variable left skewed, symmetric, or right skewed? Use a histogram to confirm your findings.
The normal probability plot of total carbohydrates for Burger King shows a left skew, and the histogram confirms this.
burgerkingplot <- burgerking %>%
ggplot() +
geom_line(aes(sample = total_carb), stat = "qq") +
ggtitle("burgerking - Carbohydrates")
burgerkingplot
burgerking_hist <- burgerking %>%
ggplot() +
geom_histogram(aes(x = total_carb), binwidth = 11) +
xlab("total carbohydrates") +
ylab("frequency") +
ggtitle("burgerking Carbohydrates")
burgerking_hist
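A numeric check can back up what the plots suggest about skew. The sketch below is an illustration, not the lab's method: `skewness` is a hypothetical helper using a simple moment-based formula, and `carb_demo` is simulated stand-in data, not Burger King values.

```r
# Stand-in values drawn from a right-skewed distribution (assumption only).
set.seed(3)
carb_demo <- rgamma(70, shape = 3, scale = 15)

# Mean above the median hints at right skew; mean below it hints at left skew.
c(mean = mean(carb_demo), median = median(carb_demo))

# Hypothetical helper: moment-based sample skewness.
# Positive suggests right skew, negative suggests left skew, near 0 symmetric.
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}
skewness(carb_demo)
```

Running `skewness(burgerking$total_carb)` on the real data would give a sign to compare against the direction read off the normal probability plot and histogram.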
…