ggplot(data = dairy_queen, aes(x = cal_fat)) +
  geom_blank() +
  geom_histogram(aes(y = after_stat(density)), fill = "light green") +
  stat_function(fun = dnorm, args = c(mean = dqmean, sd = dqsd), color = "purple") +
  ggtitle("Histogram and Density Curve of Calories from Fat in Dairy Queen Products") +
  theme_bw()
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Creating McDonalds histogram with overlaid distribution curve for calories from fat
ggplot(data = mcdonalds, aes(x = cal_fat)) +
  geom_blank() +
  geom_histogram(aes(y = after_stat(density)), fill = "light green") +
  stat_function(fun = dnorm, args = c(mean = mean(mcdonalds$cal_fat), sd = sd(mcdonalds$cal_fat)), color = "purple") +
  theme_bw() +
  ggtitle("Histogram of Calories from Fat from McDonalds Products")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
It appears that the distribution of calories from fat for Dairy Queen products is approximately normal, while the distribution for McDonalds products is strongly right-skewed.
Exercise 2
Based on this plot, does it appear that the data follow a nearly normal distribution?
Answer: The distribution of the calories from fat of Dairy Queen’s items is close to normal (bell shaped), whereas the distribution of the calories from fat of McDonalds’ items is not normal (this isn’t as apparent in the histograms but is made much clearer by the density curve). The center of McDonalds’ curve is around 280 calories from fat, while the center of Dairy Queen’s curve is around 250 calories from fat. McDonalds’ distribution is heavily skewed to the right, with a maximum of over 1250 calories from fat. Dairy Queen’s distribution is far less skewed, with a slight skew to the right and a maximum around 675 calories from fat.
Evaluating the Normal Distribution
Constructing normal probability plot of Dairy Queen’s “cal_fat”
ggplot(data = dairy_queen, aes(sample = cal_fat)) +
  geom_line(stat = "qq", color = "purple") +
  stat_qq_line() +
  ggtitle("Quantile Plot of Calories from Fat from Dairy Queen Products") +
  theme_bw()
Simulating data from a normal distribution
sim_norm <- rnorm(n = nrow(dairy_queen), mean = dqmean, sd = dqsd)
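Because `rnorm()` draws new random values on every call, the simulated sample (and any plot built from it) changes each time the document is knit. A minimal sketch of making the draw reproducible follows. The seed and the stand-in values `n_items`, `dqmean_demo`, and `dqsd_demo` are assumptions for illustration; in the lab these quantities come from `dairy_queen`.

```r
# Stand-in values (assumptions for illustration only; the lab computes
# the real ones from dairy_queen$cal_fat).
n_items <- 42
dqmean_demo <- 260
dqsd_demo <- 156

set.seed(1234)  # hypothetical seed; any fixed integer works
sim_a <- rnorm(n = n_items, mean = dqmean_demo, sd = dqsd_demo)

set.seed(1234)  # resetting the same seed reproduces the same draws
sim_b <- rnorm(n = n_items, mean = dqmean_demo, sd = dqsd_demo)

identical(sim_a, sim_b)
```

Calling `set.seed()` once near the top of the document is usually enough to make every later simulation repeatable.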
Exercise 3
Creating normal probability plot of “sim_norm”
ggplot(data = NULL, aes(sample = sim_norm)) +
  geom_line(stat = "qq", color = "purple") +
  stat_qq_line() +
  theme_bw()
Answer: The points do not fall exactly on the diagonal reference line, but this plot follows the line more closely than the Dairy Queen plot does, so the simulated data is closer to a normal distribution than the Dairy Queen data. Notably, the values here tend to lie above the line, while most of the values in our Dairy Queen distribution fall below it.
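To make concrete what a normal probability plot compares, here is a small base R sketch (not the lab's ggplot2 code) that builds the plotted pairs by hand: the sorted sample values against standard normal quantiles. The sample `x` is simulated with a hypothetical seed and stand-in parameters, not taken from the restaurant data.

```r
set.seed(1)  # hypothetical seed
x <- rnorm(42, mean = 260, sd = 156)  # simulated stand-in sample

# Theoretical standard normal quantiles at n evenly spaced probabilities
theoretical <- qnorm(ppoints(length(x)))

# Observed values, ordered from smallest to largest
observed <- sort(x)

# Base R's qqnorm() produces the same pairs; plot.it = FALSE skips drawing
qq <- qqnorm(x, plot.it = FALSE)
all.equal(sort(qq$x), theoretical)
all.equal(sort(qq$y), observed)
```

If the sample is close to normal, plotting `observed` against `theoretical` gives points that lie near a straight line.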
Create many Q-Q plot simulations against Dairy Queen data
qqnormsim(sample = cal_fat, data = dairy_queen) + theme_bw()
Exercise 4
Does the normal probability plot for the calories from fat look similar to the plots created for the simulated data?
Answer: Yes, the Dairy Queen “cal_fat” normal probability plot is fairly closely aligned with the simulated normal probability plots, although it curves slightly below the y = x line while the simulations do not. The simulations generally follow the y = x line more closely than our Dairy Queen data, although sim 2 is arguably about as far from the line as the Dairy Queen data. Sim 2 also has a clear “s” shape that the other sims and our Dairy Queen data generally lack.
Exercise 5
Using the same technique, determine whether or not the calories from McDonald’s menu appear to come from a normal distribution.
Create many Q-Q plot simulations against McDonalds data
Answer: The McDonalds data does not appear to come from a normal distribution. Its Q-Q plot more closely resembles an exponential or cubic growth curve.
Calculate Z score
Answering “What is the probability that a randomly chosen Dairy Queen product has more than 600 calories from fat?” under the assumption of a normal distribution (theoretical probability)
Answer: There is also a 0% chance of randomly selecting a Taco Bell item above 600 calories from fat.
Both empirical probabilities very closely matched the theoretical probabilities, mainly because I chose a relatively high number of calories from fat (neither Taco Bell nor Chick Fil-A appear to have any items with over 600 calories from fat). I chose 600 so I could compare Chick Fil-A and Taco Bell with McDonalds and Dairy Queen.
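The comparison between theoretical and empirical probabilities described above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the code used for the answers: `cal_fat_demo` is simulated stand-in data with a hypothetical seed, and only the 600-calorie cutoff comes from the text.

```r
set.seed(42)  # hypothetical seed
cal_fat_demo <- rnorm(500, mean = 260, sd = 156)  # simulated stand-in, not lab data

demo_mean <- mean(cal_fat_demo)
demo_sd <- sd(cal_fat_demo)
cutoff <- 600  # cutoff used in the text

# Z score: how many standard deviations the cutoff lies above the mean
z <- (cutoff - demo_mean) / demo_sd

# Theoretical P(X > cutoff), assuming a normal distribution
p_theoretical <- 1 - pnorm(q = cutoff, mean = demo_mean, sd = demo_sd)

# The same probability computed from the Z score
p_from_z <- pnorm(z, lower.tail = FALSE)

# Empirical probability: share of observed values above the cutoff
p_empirical <- mean(cal_fat_demo > cutoff)

c(theoretical = p_theoretical, empirical = p_empirical)
```

When no observation exceeds the cutoff, the empirical probability is exactly 0 while the theoretical one is small but positive, so the two agree closely for cutoffs far out in the tail.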
Create Q-Q plot for Sonic sodium data
ggplot(data = sonic, aes(sample = sodium)) +
  geom_line(stat = "qq", color = "purple") +
  stat_qq_line() +
  ggtitle("QQPlot of Sodium Content in Sonic Products") +
  theme_bw()
Create Q-Q plot for Arbys sodium data
ggplot(data = arbys, aes(sample = sodium)) +
  geom_line(stat = "qq", color = "purple") +
  stat_qq_line() +
  ggtitle("QQPlot of Sodium Content in Arbys Products") +
  theme_bw()
Create Q-Q plot for Burger King sodium data
ggplot(data = burger_king, aes(sample = sodium)) +
  geom_line(stat = "qq", color = "purple") +
  stat_qq_line() +
  ggtitle("QQPlot of Sodium Content in Burger King Products") +
  theme_bw()
Create Q-Q plot for Subway sodium data
ggplot(data = subway, aes(sample = sodium)) +
  geom_line(stat = "qq", color = "purple") +
  stat_qq_line() +
  ggtitle("QQPlot of Sodium Content in Subway Products") +
  theme_bw()
Creating Q-Q plot for McDonalds sodium data
ggplot(data = mcdonalds, aes(sample = sodium)) +
  geom_line(stat = "qq", color = "purple") +
  stat_qq_line() +
  ggtitle("QQPlot of Sodium Content in McDonalds Products") +
  theme_bw()
Creating Q-Q plot for Dairy Queen sodium data
ggplot(data = dairy_queen, aes(sample = sodium)) +
  geom_line(stat = "qq", color = "purple") +
  stat_qq_line() +
  ggtitle("QQPlot of Sodium Content in Dairy Queen Products") +
  theme_bw()
Creating Q-Q plot for Taco Bell sodium data
ggplot(data = taco_bell, aes(sample = sodium)) +
  geom_line(stat = "qq", color = "purple") +
  stat_qq_line() +
  ggtitle("QQPlot of Sodium Content in Taco Bell Products") +
  theme_bw()
Creating Q-Q plot for Chick Fil-A sodium data
ggplot(data = chick_fil_a, aes(sample = sodium)) +
  geom_line(stat = "qq", color = "purple") +
  stat_qq_line() +
  ggtitle("QQPlot of Sodium Content in Chick Fil-A Products") +
  theme_bw()
Arbys and Burger King appear to have the most nearly normal distributions for their sodium data.
Exercise 8
Explore histograms of Taco Bell and Burger King sodium data since they both showed stepwise patterns in their Q-Q plots
ggplot(data = taco_bell, aes(x = sodium)) +
  geom_blank() +
  geom_histogram(aes(y = after_stat(density)), bins = 7, fill = "light green") +
  stat_function(fun = dnorm, args = c(mean = mean(taco_bell$sodium), sd = sd(taco_bell$sodium)), color = "purple") +
  ggtitle("Histogram and Density Curve of Sodium Content in Taco Bell Products") +
  theme_bw()
ggplot(data = burger_king, aes(x = sodium)) +
  geom_blank() +
  geom_histogram(aes(y = after_stat(density)), bins = 7, fill = "light green") +
  stat_function(fun = dnorm, args = c(mean = mean(burger_king$sodium), sd = sd(burger_king$sodium)), color = "purple") +
  ggtitle("Histogram and Density Curve of Sodium Content in Burger King Products") +
  theme_bw()
The restaurants with the most normal distributions appear to be the ones with stepwise distributions. Perhaps there is a correlation?
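Stepwise patterns in a Q-Q plot generally appear when many observations share the same value, because tied values form flat runs. Whether that explains the restaurant plots has not been checked against the data. The sketch below only illustrates the mechanism with simulated values and a hypothetical rounding to the nearest 100; the seed and parameters are assumptions.

```r
set.seed(7)  # hypothetical seed
smooth <- rnorm(60, mean = 1200, sd = 400)  # simulated, not restaurant data
rounded <- round(smooth / 100) * 100        # hypothetical rounding to nearest 100

# Continuous draws are all distinct; rounding collapses them onto a few values
n_distinct_smooth <- length(unique(smooth))
n_distinct_rounded <- length(unique(rounded))

# Each zero difference between neighboring sorted values is part of a flat step
n_ties_rounded <- sum(diff(sort(rounded)) == 0)

c(smooth = n_distinct_smooth, rounded = n_distinct_rounded, ties = n_ties_rounded)
```

Running `qqnorm(rounded)` next to `qqnorm(smooth)` shows the staircase shape in the rounded sample even though both come from the same normal draws, so steps alone do not indicate how normal a distribution is.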
Exercise 9
Create a normal probability plot for the total carbs of Taco Bell items
ggplot(data = taco_bell, aes(sample = total_carb)) +
  geom_line(stat = "qq", color = "purple") +
  stat_qq_line() +
  ggtitle("QQPlot of Total Carbs Content in Taco Bell Products") +
  theme_bw()
Create density histogram of the total carbs of Taco Bell items with normal distribution curve