---
title: "Lab 4: The Normal Distribution"
author: "Euclide N. Rodriguez"
date: "`r Sys.Date()`"
output: openintro::lab_report
---

```{r load-packages, message=FALSE}
library(tidyverse)
library(openintro)
```

```{r, message=FALSE, include=FALSE}
# Load the fastfood data shipped with openintro and preview the first rows
data("fastfood", package = "openintro")
head(fastfood)
```


```{r, message=FALSE, include=FALSE}
# Subset the menu items for the two restaurants compared in Exercises 1-5
mcdonalds <- fastfood %>%
  filter(restaurant == "Mcdonalds")
dairy_queen <- fastfood %>%
  filter(restaurant == "Dairy Queen")
```

### Exercise 1
Make a plot (or plots) to visualize the distributions of the amount of calories from fat of the options from these two restaurants. How do their centers, shapes, and spreads compare?
```{r}
# Histograms of calories from fat; the bin width of 50 is a judgment call
ggplot(data = mcdonalds, aes(x = cal_fat)) +
  geom_histogram(binwidth = 50, fill = "steelblue") +
  ggtitle("McDonald's: Calories from Fat")

ggplot(data = dairy_queen, aes(x = cal_fat)) +
  geom_histogram(binwidth = 50, fill = "steelblue") +
  ggtitle("Dairy Queen: Calories from Fat")
```
**The McDonald's data is skewed further to the right than the Dairy Queen data.**  
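
The exercise also asks about center and spread, which a numeric summary can make concrete. The chunk below is a minimal sketch of one way to do that, not part of the original analysis; the name `cal_fat_summary` is introduced here for illustration only.

```{r}
library(tidyverse)
library(openintro)

# Center (mean, median) and spread (sd, IQR) of calories from fat
# for the two restaurants. `cal_fat_summary` is a name made up for this sketch.
cal_fat_summary <- fastfood %>%
  filter(restaurant %in% c("Mcdonalds", "Dairy Queen")) %>%
  group_by(restaurant) %>%
  summarise(
    items  = n(),
    mean   = mean(cal_fat, na.rm = TRUE),
    median = median(cal_fat, na.rm = TRUE),
    sd     = sd(cal_fat, na.rm = TRUE),
    iqr    = IQR(cal_fat, na.rm = TRUE)
  )

cal_fat_summary
```

A mean noticeably above the median is consistent with right skew, and a larger `sd` or `iqr` indicates a wider spread.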

### Exercise 2

Based on this plot, does it appear that the data follow a nearly normal distribution?
```{r, message=FALSE}
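# Mean and standard deviation used for the overlaid normal curve
dqmean <- mean(dairy_queen$cal_fat)
dqsd   <- sd(dairy_queen$cal_fat)
# Density-scaled histogram with the matching normal density on top;
# after_stat(density) replaces the deprecated ..density.. notation
ggplot(data = dairy_queen, aes(x = cal_fat)) +
        geom_blank() +
        geom_histogram(aes(y = after_stat(density)), bins = 30) +
        stat_function(fun = dnorm, args = list(mean = dqmean, sd = dqsd), col = "tomato")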
```
**Comparing the density histogram with the overlaid normal curve, the data appear to follow a nearly normal distribution.**

### Exercise 3
Make a normal probability plot of sim_norm. Do all of the points fall on the line? How does this plot compare to the probability plot for the real data? (Since sim_norm is not a data frame, it can be put directly into the sample argument and the data argument can be dropped.)    
**Comparing the Dairy Queen normal probability plot with the plot for a simulated normal sample shows that the Dairy Queen data largely match a normal distribution. Across the first three quartiles the Dairy Queen points stay close to the normal line, but in the fourth quartile they deviate from it. The simulated normal plot produced a similar result.**

```{r}
ggplot(data = dairy_queen, aes(sample = cal_fat)) + 
  geom_line(stat = "qq")

sim_norm <- rnorm(n = nrow(dairy_queen), mean = dqmean, sd = dqsd)

```

```{r}
qqnorm(sim_norm)
qqline(sim_norm)
```
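
The prompt notes that a vector can be passed straight to the `sample` aesthetic. The chunk below is a hedged sketch of that variant; it also fixes a random seed so the simulated plot is the same on every knit. The seed value and the names `dq` and `sim_norm_seeded` are assumptions introduced for this illustration.

```{r}
library(tidyverse)
library(openintro)

dq <- fastfood %>%
  filter(restaurant == "Dairy Queen")

# Fix the seed so the simulated sample is reproducible; 2024 is arbitrary.
set.seed(2024)
sim_norm_seeded <- rnorm(
  n    = nrow(dq),
  mean = mean(dq$cal_fat),
  sd   = sd(dq$cal_fat)
)

# The vector is mapped to `sample` directly, so `data` is NULL.
ggplot(data = NULL, aes(sample = sim_norm_seeded)) +
  geom_qq() +
  geom_qq_line(colour = "tomato") +
  ggtitle("Simulated normal data")
```

Even for truly normal data the points scatter somewhat around the reference line, especially in the tails, which gives a baseline for judging the real data.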

### Exercise 4
Does the normal probability plot for the calories from fat look similar to the plots created for the simulated data? That is, do the plots provide evidence that the calories are nearly normal?  
```{r, message=FALSE}
qqnormsim(sample = cal_fat, data = dairy_queen)
```
**Looking across the simulated plots, the Dairy Queen data is most similar to simulation 5. This suggests that the data is nearly normal.**

```{r, message=FALSE, include=FALSE}
1 - pnorm(q = 600, mean = dqmean, sd = dqsd)

dairy_queen %>% 
  filter(cal_fat > 600) %>%
  summarise(percent = n() / nrow(dairy_queen))
```



### Exercise 5
Using the same technique, determine whether or not the calories from McDonald’s menu appear to come from a normal distribution.  
**The McDonald's calories-from-fat data do not appear to come from a normal distribution.**
```{r}
ggplot(data = mcdonalds, aes(sample = cal_fat)) + 
  geom_line(stat = "qq")
```
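
"The same technique" in the prompt refers to comparing the observed plot against simulated normal samples. A minimal sketch of that comparison for McDonald's, using `qqnormsim()` from `openintro` as in Exercise 4, could look like the chunk below; the name `mcd` is introduced only for this example.

```{r, message=FALSE}
library(tidyverse)
library(openintro)

mcd <- fastfood %>%
  filter(restaurant == "Mcdonalds")

# Observed normal probability plot next to plots of simulated normal samples
# that share the same mean and standard deviation.
qqnormsim(sample = cal_fat, data = mcd)
```

If the observed panel bends away from a straight line much more than any simulated panel does, that supports the conclusion that the data are not nearly normal.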

### Exercise 6
Write out two probability questions that you would like to answer about any of the restaurants in this dataset. Calculate those probabilities using both the theoretical normal distribution as well as the empirical distribution (four probabilities in all). Which one had a closer agreement between the two methods?  
**Do the calories from fat at Arby's follow a normal distribution? Do the calories from fat at Sonic follow a normal distribution?**

```{r}
Arbys <- fastfood %>%
  filter(restaurant == "Arbys")
Sonic <- fastfood %>%
  filter(restaurant == "Sonic")

ggplot(data = Arbys, aes(sample = cal_fat)) + 
  geom_line(stat = "qq")+ 
  ggtitle("Arby's")

ggplot(data = Sonic, aes(sample = cal_fat)) + 
  geom_line(stat = "qq")+ 
  ggtitle("Sonic")
```
**After reviewing the QQ plots for Arby's and Sonic, only the Arby's data closely follows a normal distribution.**
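
The exercise also asks for probabilities computed both from the normal model and from the observed data. The chunk below is a sketch of how that comparison might be set up, not a result from the original report. The question used (what is the probability that an item has more than 300 calories from fat?), the cutoff of 300, and the helper `compare_upper_tail()` are all assumptions chosen for illustration.

```{r}
library(tidyverse)
library(openintro)

# Hypothetical helper: probability of exceeding `cutoff` under a normal model
# fitted to `x`, next to the observed proportion of values above `cutoff`.
compare_upper_tail <- function(x, cutoff) {
  tibble(
    theoretical = 1 - pnorm(cutoff, mean = mean(x), sd = sd(x)),
    empirical   = mean(x > cutoff)
  )
}

arbys_fat <- fastfood %>% filter(restaurant == "Arbys") %>% pull(cal_fat)
sonic_fat <- fastfood %>% filter(restaurant == "Sonic") %>% pull(cal_fat)

# Four probabilities in all: two methods for each of two restaurants.
prob_check <- bind_rows(
  Arbys = compare_upper_tail(arbys_fat, cutoff = 300),
  Sonic = compare_upper_tail(sonic_fat, cutoff = 300),
  .id = "restaurant"
) %>%
  mutate(abs_diff = abs(theoretical - empirical))

prob_check
```

The restaurant with the smaller `abs_diff` has the closer agreement between the two methods, which would be expected for the one whose QQ plot is closer to a straight line.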

### Exercise 7
Now let’s consider some of the other variables in the dataset. Out of all the different restaurants, which ones’ distribution is the closest to normal for sodium?  
**In reviewing the distribution of data from all restaurants, it was found that Burger King and .**

```{r}
ChickFilA <- fastfood %>%
  filter(restaurant == "Chick Fil-A")

BurgerKing <- fastfood %>%
  filter(restaurant == "Burger King")

Subway <- fastfood %>%
  filter(restaurant == "Subway")

TacoBell <- fastfood %>%
  filter(restaurant == "Taco Bell")

ggplot(data = Arbys, aes(sample = sodium)) + 
  geom_line(stat = "qq")+ 
  ggtitle("Arby's")

ggplot(data = BurgerKing, aes(sample = sodium)) + 
  geom_line(stat = "qq")+ 
  ggtitle("Burger King")

ggplot(data = mcdonalds, aes(sample = sodium)) + 
  geom_line(stat = "qq")+ 
  ggtitle("McDonald's")

ggplot(data = Sonic, aes(sample = sodium)) + 
  geom_line(stat = "qq")+ 
  ggtitle("Sonic")

ggplot(data = ChickFilA, aes(sample = sodium)) + 
  geom_line(stat = "qq")+ 
  ggtitle("Chick Fil-A")

ggplot(data = dairy_queen, aes(sample = sodium)) + 
  geom_line(stat = "qq")+ 
  ggtitle("Dairy Queen")

```
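
Since the question covers all restaurants, one compact alternative is a single faceted plot, which also includes the restaurants that were subset above but not plotted. This is a sketch of that approach; the name `sodium_qq` is introduced for this example.

```{r}
library(tidyverse)
library(openintro)

# One normal probability plot per restaurant, each with its own reference line.
# Free y scales keep restaurants with different sodium ranges readable.
sodium_qq <- ggplot(data = fastfood, aes(sample = sodium)) +
  geom_qq() +
  geom_qq_line(colour = "tomato") +
  facet_wrap(~ restaurant, scales = "free_y") +
  ggtitle("Sodium: normal probability plots by restaurant")

sodium_qq
```

The panel whose points stay closest to its reference line across the full range, including the tails, is the best candidate for "closest to normal".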

### Exercise 8
Note that some of the normal probability plots for sodium distributions seem to have a stepwise pattern. Why do you think this might be the case?  
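**The stepwise pattern most likely arises because sodium is reported in rounded amounts, so several menu items share the same value and the tied points line up as flat steps. The plots also seem to indicate that sodium for foods at these restaurants is left skewed.**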
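
One way to check whether repeated values could be behind a stepwise pattern is to count how many distinct sodium values each restaurant has and how many of them are round numbers. The chunk below is a sketch under that assumption; `sodium_ties` and its column names are made up for this example.

```{r}
library(tidyverse)
library(openintro)

# For each restaurant: number of items, number of distinct sodium values,
# and the share of items whose sodium value is a multiple of 10.
sodium_ties <- fastfood %>%
  group_by(restaurant) %>%
  summarise(
    items            = n(),
    distinct_values  = n_distinct(sodium),
    share_mult_of_10 = mean(sodium %% 10 == 0, na.rm = TRUE)
  )

sodium_ties
```

Far fewer distinct values than items, or a share of multiples of 10 near 1, would point to rounding and ties as the source of the steps.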

### Exercise 9
As you can see, normal probability plots can be used both to assess normality and visualize skewness. Make a normal probability plot for the total carbohydrates from a restaurant of your choice. Based on this normal probability plot, is this variable left skewed, symmetric, or right skewed? Use a histogram to confirm your findings.  
```{r}
ggplot(data = Sonic, aes(sample = total_carb)) + 
  geom_line(stat = "qq")+ 
  ggtitle("Sonic")

```

**This plot seems to indicate that the total carbohydrates for foods at Sonic are left skewed.**
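
The exercise asks for a histogram to confirm the reading of the normal probability plot. The chunk below is a minimal sketch of that check, with a mean-versus-median comparison added as a second signal. The bin count and the names `sonic_items` and `carb_center` are assumptions for this example.

```{r, message=FALSE}
library(tidyverse)
library(openintro)

sonic_items <- fastfood %>%
  filter(restaurant == "Sonic")

# Histogram of total carbohydrates; 15 bins is an arbitrary choice.
ggplot(data = sonic_items, aes(x = total_carb)) +
  geom_histogram(bins = 15, fill = "steelblue") +
  ggtitle("Sonic: Total Carbohydrates")

# A mean above the median suggests right skew, a mean below it left skew.
carb_center <- sonic_items %>%
  summarise(
    mean   = mean(total_carb, na.rm = TRUE),
    median = median(total_carb, na.rm = TRUE)
  )

carb_center
```

When reading the probability plot itself, with theoretical quantiles on the x-axis and sample values on the y-axis, a curve that bends upward is typical of right skew and one that bends downward is typical of left skew, so the histogram should agree with whichever shape the plot shows.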
