First, load the packages.
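
The report does not show which packages were loaded. As a minimal sketch, assuming the later calls to `gf_histogram()` and `favstats()` come from the `ggformula` and `mosaic` packages (an inference from the function names, not something stated in the assignment), the loading step might look like this:

```r
# Assumed package list, inferred from the functions used later in this report
pkgs <- c("mosaic", "ggformula")

for (p in pkgs) {
  if (requireNamespace(p, quietly = TRUE)) {
    # Attach the package so its functions can be called without a prefix
    library(p, character.only = TRUE)
  } else {
    # Report the gap instead of stopping with an error
    message("Package not installed: ", p)
  }
}
```

If either package is missing, `install.packages()` can be used to install it before re-running the chunk.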

Next, import the data set Cereal_Data.xlsx from Canvas and display the first 6 rows of the data set.

load("~/Cereal_Data_1_.RData")
head(Cereal_Data_1_)
## # A tibble: 6 x 15
##   Shelf Name  Manufacturer Type  Calories Protein   Fat Sodium Fiber
##   <chr> <chr> <chr>        <chr>    <dbl>   <dbl> <dbl>  <dbl> <dbl>
## 1 Top   100%… N            C           70       4     1    130  10  
## 2 Top   100%… Q            C          120       3     5     15   2  
## 3 Top   All-… K            C           70       4     1    260   9  
## 4 Top   All-… K            C           50       4     0    140  14  
## 5 Top   Almo… R            C          110       2     2    200   1  
## 6 Bott… Appl… G            C          110       2     2    180   1.5
## # … with 6 more variables: Carbohydrates <dbl>, Sugars <dbl>,
## #   Potassium <dbl>, Vitamins <dbl>, `Weight (of One Serving Cup)` <dbl>,
## #   `Cups in Serving` <dbl>

2. Consider the variables in the data set. Identify the variables that are qualitative and those that are quantitative.

str(Cereal_Data_1_)
## Classes 'tbl_df', 'tbl' and 'data.frame':    77 obs. of  15 variables:
##  $ Shelf                      : chr  "Top" "Top" "Top" "Top" ...
##  $ Name                       : chr  "100%_Bran" "100%_Natural_Bran" "All-Bran" "All-Bran_with_Extra_Fiber" ...
##  $ Manufacturer               : chr  "N" "Q" "K" "K" ...
##  $ Type                       : chr  "C" "C" "C" "C" ...
##  $ Calories                   : num  70 120 70 50 110 110 110 130 90 90 ...
##  $ Protein                    : num  4 3 4 4 2 2 2 3 2 3 ...
##  $ Fat                        : num  1 5 1 0 2 2 0 2 1 0 ...
##  $ Sodium                     : num  130 15 260 140 200 180 125 210 200 210 ...
##  $ Fiber                      : num  10 2 9 14 1 1.5 1 2 4 5 ...
##  $ Carbohydrates              : num  5 8 7 8 14 10.5 11 18 15 13 ...
##  $ Sugars                     : num  6 8 5 0 8 10 14 8 6 5 ...
##  $ Potassium                  : num  280 135 320 330 NA 70 30 100 125 190 ...
##  $ Vitamins                   : num  25 0 25 25 25 25 25 25 25 25 ...
##  $ Weight (of One Serving Cup): num  1 1 1 1 1 1 1 1.33 1 1 ...
##  $ Cups in Serving            : num  0.33 1 0.33 0.5 0.75 0.75 1 0.75 0.67 0.67 ...
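
As a cross-check on the `str()` output, the columns can also be split by storage type programmatically. The sketch below uses a small made-up data frame (`cereal_demo`, a hypothetical stand-in with illustrative values, not the real cereal data). Note that numeric storage is only a hint: a numerically coded category would still be qualitative.

```r
# Hypothetical miniature stand-in for the cereal data (illustrative values only)
cereal_demo <- data.frame(
  Shelf    = c("Top", "Middle", "Bottom"),
  Name     = c("Cereal_A", "Cereal_B", "Cereal_C"),
  Calories = c(70, 120, 110),
  Sugars   = c(6, 8, NA),
  stringsAsFactors = FALSE
)

# TRUE for numeric columns, FALSE for character columns
is_quant <- sapply(cereal_demo, is.numeric)

quantitative <- names(cereal_demo)[is_quant]
qualitative  <- names(cereal_demo)[!is_quant]

print(quantitative)
print(qualitative)
```

Running the same two lines on `Cereal_Data_1_` should reproduce the split between `chr` and `num` columns that `str()` reports.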

The qualitative variables are Shelf, Name, Manufacturer, and Type. The quantitative variables are Calories, Protein, Fat, Sodium, Fiber, Carbohydrates, Sugars, Potassium, Vitamins, Weight (of One Serving Cup), and Cups in Serving.

3. Consider the variable Shelf. This variable is the shelf position of the cereal (bottom, middle, top) starting from the floor up. To see whether the shelf position is associated with one measure of nutritive value, the amount of sugar, look at the data for the variable Sugars. Compare the sugar content of cereals on each shelf by making a separate histogram for the sugar content of the cereals on each shelf: a total of three histograms. Use the sugar content values as they are - do not factor in the serving size. (The data for one of the cereals, Quaker Oatmeal, is missing. Just continue with what is available. That’s the way it is in real life - values are missing, files are incomplete, etc.)

# Split the data into one subset per shelf position
topshelf <- subset(Cereal_Data_1_, Shelf == "Top")
middleshelf <- subset(Cereal_Data_1_, Shelf == "Middle")
bottomshelf <- subset(Cereal_Data_1_, Shelf == "Bottom")

# breaks fixes the bin edges at 0, 2, ..., 16 and overrides binwidth, so binwidth is omitted
gf_histogram(~Sugars, title = "A Histogram for the Sugar Quantity on the Top Shelf", ylab = "Cereals", data = topshelf, breaks = seq(0, 16, by = 2), color = "blue", fill = "green")

gf_histogram(~Sugars, title = "A Histogram for the Sugar Quantity on the Middle Shelf", ylab = "Cereals", data = middleshelf, breaks = seq(0, 16, by = 2), color = "pink", fill = "blue")

gf_histogram(~Sugars, title = "A Histogram for the Sugar Quantity on the Bottom Shelf", ylab = "Cereals", data = bottomshelf, breaks = seq(0, 16, by = 2), color = "blue", fill = "purple")
## Warning: Removed 1 rows containing non-finite values (stat_bin).

4. Briefly describe the distribution in each histogram with respect to shape. Based on your histograms, which shelf position has cereals with the most sugar?

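The top-shelf distribution is roughly symmetric, so its cereals are spread fairly evenly between low and high sugar content. The middle-shelf distribution is skewed left, so the majority of the cereals on that shelf have high sugar content. The bottom-shelf distribution is skewed right, so the majority of cereals on this shelf are not as sugary. Based on these histograms, the middle shelf has the cereals with the most sugar.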

The shelves are not in order from most to least sugar. The top shelf has a wider mix of sugar contents than the other shelves: its distribution is more symmetric, neither mostly sugary nor mostly low in sugar. The middle shelf holds most of the sugary cereals, and the bottom shelf holds the least sugary ones. So I believe shelf position is somewhat related to sugar content, but there is probably a confounding variable as well. The middle shelf may have more sugar because it is at eye level for kids, so they would be drawn to it more.
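
A visual reading of histograms can be backed up with a numeric summary per shelf. The sketch below is one way to do that in base R. The data frame `sugar_demo` and its values are hypothetical, chosen only to illustrate the technique, and are not taken from the cereal data.

```r
# Hypothetical sugar values; not the actual cereal data
sugar_demo <- data.frame(
  Shelf  = c("Top", "Top", "Top",
             "Middle", "Middle", "Middle",
             "Bottom", "Bottom", "Bottom"),
  Sugars = c(3, 6, 9, 10, 12, 13, 0, 3, NA)
)

# Median and mean sugar per shelf; na.rm = TRUE skips the missing value
shelf_median <- tapply(sugar_demo$Sugars, sugar_demo$Shelf, median, na.rm = TRUE)
shelf_mean   <- tapply(sugar_demo$Sugars, sugar_demo$Shelf, mean, na.rm = TRUE)

print(shelf_median)
print(shelf_mean)

# Shelf with the highest median sugar in this toy data
top_sugar_shelf <- names(which.max(shelf_median))
print(top_sugar_shelf)
```

Applying the same `tapply()` calls to `Cereal_Data_1_$Sugars` and `Cereal_Data_1_$Shelf` would give the per-shelf medians and means for the real data. The median is the safer comparison when a distribution is skewed.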

6. Find the five-number-summary, mean, and standard deviation of the variable “Fiber”.

favstats(Cereal_Data_1_$Fiber)
##  min Q1 median Q3 max     mean       sd  n missing
##    0  1      2  3  14 2.151948 2.383364 77       0
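
The same statistics can be computed with base R alone, which shows what each column of the summary means. This is a minimal sketch on a hypothetical vector (`fiber_demo`, made-up values rather than the real Fiber column).

```r
# Hypothetical fiber values; not the actual cereal data
fiber_demo <- c(0, 1, 2, 3, 14)

# min, Q1, median, Q3, max using R's default quantile method
five_num <- quantile(fiber_demo, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)

fiber_mean <- mean(fiber_demo, na.rm = TRUE)
fiber_sd   <- sd(fiber_demo, na.rm = TRUE)  # sample SD (n - 1 denominator)

print(five_num)
print(fiber_mean)
print(fiber_sd)
```

In the actual output above, the mean (about 2.15) sits slightly above the median (2) and the maximum (14) is far from Q3 (3), which is consistent with a right-skewed Fiber distribution.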