This homework is due by 11:55 pm on Sunday, October 24th, 2021. To complete this assignment, follow these steps:
  1. Download the coding_assignment_2.Rmd file from LMS.

  2. Open coding_assignment_2.Rmd in RStudio.

  3. Replace the “Your Name Here” text in the author: field with your own name.

  4. Supply your solutions to the homework by editing coding_assignment_2.Rmd.

  5. When you have completed the homework and have checked that your code both runs in the Console and knits correctly when you click Knit HTML, rename the R Markdown file to coding_assignment_2_YourNameHere.Rmd, and submit BOTH your rmarkdown AND html file on LMS (YourNameHere should be changed to your own name.)

Homework tips:
  1. Recall the following useful RStudio hotkeys.
Keystroke     Description
<tab>         Autocompletes commands and filenames, and lists arguments for functions.
<up>          Cycles through previous commands in the console prompt.
<ctrl-up>     Lists history of previous commands matching an unfinished one.
<ctrl-enter>  Runs the current line from the source window in the Console. Good for trying out ideas from a source file.
<ESC>         Aborts an unfinished command and gets out of the + prompt.

Note: Shown above are the Windows/Linux keys. For Mac OS X, the <ctrl> key should be substituted with the <command> (⌘) key.

  2. Instead of sending code line-by-line with <ctrl-enter>, you can send entire code chunks, and even run all of the code chunks in your .Rmd file. Look under the menu of the Source panel.

  3. Run your code in the Console and Knit HTML frequently to check for errors.

  4. You may find it easier to solve a problem by interacting only with the Console at first.

Problem 1: Exploring Dataframes

We’ll start by loading the cereal dataset that has been provided to you on LMS. Call your dataset cereal. Once you have loaded your dataset, rename the variables using the following mapping:

original_name   new_name
<name>          name
<mfr>           manufacturer
<type>          hot_cold
<calories>      calories
<protein>       protein
<fat>           fat
<sodium>        sodium
<fiber>         fiber
<carbo>         carbohydrates
<sugars>        sugars
<potass>        potassium
<vitamins>      vitamins
<shelf>         display_shelf
<weight>        weight_ounces
<cups>          cups_in_serving
<rating>        rating
cereal <-read.csv("E:/work/semester 3/Telling stories with data/assignments/coding assignment 2/cereal.csv")
cols <- c("name",
          "manufacturer",
          "hot_cold", 
          "calories", 
          "protein", 
          "fat", 
          "sodium", 
          "fiber", 
          "carbohydrates", 
          "sugars", 
          "potassium", 
          "vitamins",
          "display_shelf",
          "weight_ounces",
          "cups_in_serving",
          "rating")
colnames(cereal) <- cols
head(cereal)
##                        name manufacturer hot_cold calories protein fat sodium
## 1                 100% Bran            N     Cold       70       4   1    130
## 2         100% Natural Bran            Q     Cold      120       3   5     15
## 3                  All-Bran            K     Cold       70       4   1    260
## 4 All-Bran with Extra Fiber            K     Cold       50       4   0    140
## 5            Almond Delight            R     COLD      110       2   2    200
## 6   Apple Cinnamon Cheerios            G     Cold      110       2   2    180
##   fiber carbohydrates sugars potassium vitamins display_shelf weight_ounces
## 1  10.0           5.0      6       280       25             3             1
## 2   2.0           8.0      8       135        0             3             1
## 3   9.0           7.0      5       320       25             3             1
## 4  14.0           8.0      0       330       25             3             1
## 5   1.0          14.0      8        -1       25             3             1
## 6   1.5          10.5     10        70       25             1             1
##   cups_in_serving   rating
## 1            0.33 68.40297
## 2            1.00 33.98368
## 3            0.33 59.42551
## 4            0.50 93.70491
## 5            0.75 34.38484
## 6            0.75 29.50954
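
Assigning a full vector to colnames() depends on the CSV columns arriving in exactly the listed order. As a hedged alternative, the sketch below renames by name instead of by position. The data frame `toy` and the lookup vector `name_map` are made up for illustration and cover only three of the sixteen columns.

```r
# Toy data frame standing in for the real cereal file (values are illustrative only)
toy <- data.frame(mfr = c("N", "Q"), carbo = c(5, 8), potass = c(280, 135))

# Hypothetical lookup: element names are the original column names,
# element values are the new column names
name_map <- c(mfr = "manufacturer", carbo = "carbohydrates", potass = "potassium")

# Rename only the columns found in the lookup, whatever their position
hits <- names(toy) %in% names(name_map)
names(toy)[hits] <- unname(name_map[names(toy)[hits]])

names(toy)
```

Columns that are missing from the lookup keep their original names, so a partial mapping does no harm.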
(a) Exploring the dataset

Run a few commands to get a feel of the dataset.

head(cereal) 
##                        name manufacturer hot_cold calories protein fat sodium
## 1                 100% Bran            N     Cold       70       4   1    130
## 2         100% Natural Bran            Q     Cold      120       3   5     15
## 3                  All-Bran            K     Cold       70       4   1    260
## 4 All-Bran with Extra Fiber            K     Cold       50       4   0    140
## 5            Almond Delight            R     COLD      110       2   2    200
## 6   Apple Cinnamon Cheerios            G     Cold      110       2   2    180
##   fiber carbohydrates sugars potassium vitamins display_shelf weight_ounces
## 1  10.0           5.0      6       280       25             3             1
## 2   2.0           8.0      8       135        0             3             1
## 3   9.0           7.0      5       320       25             3             1
## 4  14.0           8.0      0       330       25             3             1
## 5   1.0          14.0      8        -1       25             3             1
## 6   1.5          10.5     10        70       25             1             1
##   cups_in_serving   rating
## 1            0.33 68.40297
## 2            1.00 33.98368
## 3            0.33 59.42551
## 4            0.50 93.70491
## 5            0.75 34.38484
## 6            0.75 29.50954
tail(cereal)
##                   name manufacturer hot_cold calories protein fat sodium fiber
## 72   Total Whole Grain            G     Cold      100       3   1    200     3
## 73             Triples            G     Cold      110       2   1    250     0
## 74                Trix            G     Cold      110       1   1    140     0
## 75          Wheat Chex            R     Cold      100       3   1    230     3
## 76            Wheaties            G     Cold      100       3   1    200     3
## 77 Wheaties Honey Gold            G     cold      110       2   1    200     1
##    carbohydrates sugars potassium vitamins display_shelf weight_ounces
## 72            16      3       110      100             3             1
## 73            21      3        60       25             3             1
## 74            13     12        25       25             2             1
## 75            17      3       115       25             1             1
## 76            17      3       110       25             1             1
## 77            16      8        60       25             1             1
##    cups_in_serving   rating
## 72            1.00 46.65884
## 73            0.75 39.10617
## 74            1.00 27.75330
## 75            0.67 49.78744
## 76            1.00 51.59219
## 77            0.75 36.18756
cereal[1]
##                                      name
## 1                               100% Bran
## 2                       100% Natural Bran
## 3                                All-Bran
## 4               All-Bran with Extra Fiber
## 5                          Almond Delight
## 6                 Apple Cinnamon Cheerios
## 7                             Apple Jacks
## 8                                 Basic 4
## 9                               Bran Chex
## 10                            Bran Flakes
## 11                           Cap'n'Crunch
## 12                               Cheerios
## 13                  Cinnamon Toast Crunch
## 14                               Clusters
## 15                            Cocoa Puffs
## 16                              Corn Chex
## 17                            Corn Flakes
## 18                              Corn Pops
## 19                          Count Chocula
## 20                     Cracklin' Oat Bran
## 21                 Cream of Wheat (Quick)
## 22                                Crispix
## 23                 Crispy Wheat & Raisins
## 24                            Double Chex
## 25                            Froot Loops
## 26                         Frosted Flakes
## 27                    Frosted Mini-Wheats
## 28 Fruit & Fibre Dates; Walnuts; and Oats
## 29                          Fruitful Bran
## 30                         Fruity Pebbles
## 31                           Golden Crisp
## 32                         Golden Grahams
## 33                      Grape Nuts Flakes
## 34                             Grape-Nuts
## 35                     Great Grains Pecan
## 36                       Honey Graham Ohs
## 37                     Honey Nut Cheerios
## 38                             Honey-comb
## 39            Just Right Crunchy  Nuggets
## 40                 Just Right Fruit & Nut
## 41                                    Kix
## 42                                   Life
## 43                           Lucky Charms
## 44                                  Maypo
## 45       Muesli Raisins; Dates; & Almonds
## 46      Muesli Raisins; Peaches; & Pecans
## 47                   Mueslix Crispy Blend
## 48                   Multi-Grain Cheerios
## 49                       Nut&Honey Crunch
## 50              Nutri-Grain Almond-Raisin
## 51                      Nutri-grain Wheat
## 52                   Oatmeal Raisin Crisp
## 53                  Post Nat. Raisin Bran
## 54                             Product 19
## 55                            Puffed Rice
## 56                           Puffed Wheat
## 57                     Quaker Oat Squares
## 58                         Quaker Oatmeal
## 59                            Raisin Bran
## 60                        Raisin Nut Bran
## 61                         Raisin Squares
## 62                              Rice Chex
## 63                          Rice Krispies
## 64                         Shredded Wheat
## 65                 Shredded Wheat 'n'Bran
## 66              Shredded Wheat spoon size
## 67                                 Smacks
## 68                              Special K
## 69                Strawberry Fruit Wheats
## 70                      Total Corn Flakes
## 71                      Total Raisin Bran
## 72                      Total Whole Grain
## 73                                Triples
## 74                                   Trix
## 75                             Wheat Chex
## 76                               Wheaties
## 77                    Wheaties Honey Gold
cereal["manufacturer"]
##    manufacturer
## 1             N
## 2             Q
## 3             K
## 4             K
## 5             R
## 6             G
## 7             K
## 8             G
## 9             R
## 10            P
## 11            Q
## 12            G
## 13            G
## 14            G
## 15            G
## 16            R
## 17            K
## 18            K
## 19            G
## 20            K
## 21            N
## 22            K
## 23            G
## 24            R
## 25            K
## 26            K
## 27            K
## 28            P
## 29            K
## 30            P
## 31            P
## 32            G
## 33            P
## 34            P
## 35            P
## 36            Q
## 37            G
## 38            P
## 39            K
## 40            K
## 41            G
## 42            Q
## 43            G
## 44            A
## 45            R
## 46            R
## 47            K
## 48            G
## 49            K
## 50            K
## 51            K
## 52            G
## 53            P
## 54            K
## 55            Q
## 56            Q
## 57            Q
## 58            Q
## 59            K
## 60            G
## 61            K
## 62            R
## 63            K
## 64            N
## 65            N
## 66            N
## 67            K
## 68            K
## 69            N
## 70            G
## 71            G
## 72            G
## 73            G
## 74            G
## 75            R
## 76            G
## 77            G
(b) Using the commands you ran in problem (a) report the following metrics on this dataset:
  1. How many rows and how many columns?
  2. What is the minimum and maximum rating of cereals in this dataset? What is your guess about what the rating is out of?
  3. What is the mean and median of the sodium variable? Do you think the distribution of the sodium variable is skewed towards the right or the left?
  4. What type of variable is hot_cold? Is it ordered?

  1. 77 rows and 16 columns
max(cereal$rating)
## [1] 93.70491
min(cereal$rating)
## [1] 18.04285

The maximum is 93.70491 and minimum is 18.04285. I think the rating is out of 100.

sodium_mean <- mean(cereal$sodium)
paste0("the mean is: ", sodium_mean)
## [1] "the mean is: 159.675324675325"
sodium_median <- median(cereal$sodium)
paste0("the median is: ", sodium_median)
## [1] "the median is: 180"
sodium_mean <= sodium_median
## [1] TRUE
sprintf("Since the mean is less than the median, the sodium variable is skewed to the left")
## [1] "Since the mean is less than the median, the sodium variable is skewed to the left"
is.ordered(cereal$hot_cold)
## [1] FALSE
paste0("The column hot_cold is not ordered. The variable is a : ", typeof(cereal$hot_cold))
## [1] "The column hot_cold is not ordered. The variable is a : character"
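
The four questions in part (b) can each be answered with a single base R call. The sketch below shows those calls on a made-up data frame `toy`, so the numbers are illustrative and are not the cereal results. Comparing the mean with the median is only a rule of thumb for the direction of skew; a histogram would be the usual way to confirm it.

```r
# Toy data (illustrative values only, not taken from the cereal file)
toy <- data.frame(
  sodium = c(0, 140, 180, 200, 210),
  hot_cold = c("Cold", "Cold", "Hot", "Cold", "Cold"),
  stringsAsFactors = FALSE
)

dims <- dim(toy)                      # number of rows, then number of columns
sodium_range <- range(toy$sodium)     # minimum and maximum in one call
sodium_mean <- mean(toy$sodium)
sodium_median <- median(toy$sodium)

# Rule of thumb: a mean below the median hints at a longer left tail
left_skew_hint <- sodium_mean < sodium_median

class(toy$hot_cold)                   # storage class of the column
is.ordered(toy$hot_cold)              # TRUE only for ordered factors
```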

Problem 2: Working with Factors

For this problem you will need the <plyr> and <dplyr> packages. Run the code chunk below to load them into your working space.
Note: You will need to install them if you haven’t already.

library(plyr)
library(dplyr)
(a) Checking Levels

Write code that checks the levels of the manufacturer column.

levels(cereal$manufacturer)
## NULL
paste0("There are ", length(unique(cereal$manufacturer)), " levels in manufacturer column")
## [1] "There are 7 levels in manufacturer column"
unique(cereal$manufacturer)
## [1] "N" "Q" "K" "R" "G" "P" "A"
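
levels() returns NULL here because the column is stored as character, and only factors carry levels. A minimal sketch of the difference, using a short made-up vector of manufacturer codes rather than the full column:

```r
# Illustrative codes only; the real column has 77 entries
mfr <- c("N", "Q", "K", "K", "R", "G")

levels(mfr)                  # NULL: a character vector has no levels

mfr_factor <- factor(mfr)    # convert to a factor
levels(mfr_factor)           # the distinct values, sorted
nlevels(mfr_factor)          # how many distinct levels there are
```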
(b) Mapvalues

Let’s give the manufacturer variable more meaningful names. Write code that alters the manufacturer column to reflect the following mapping:

Manufacturer of cereal

A = American Home Food Products
G = General Mills  
K = Kelloggs  
N = Nabisco  
P = Post  
Q = Quaker Oats  
R = Ralston Purina  
# cereal_1 <- cereal[cereal$manufacturer == "A" , "American Home Food Products"] 
# cereal_1
# cereal_1 <-replace(cereal$manufacturer, cereal$manufacture == "A","American Home Food Products")
# cereal_1
cereal$manufacturer[cereal$manufacturer== "A"] <- "American Food Products"
cereal$manufacturer[cereal$manufacturer== "G"] <- "General Mills"
cereal$manufacturer[cereal$manufacturer== "K"] <- "Kelloggs"
cereal$manufacturer[cereal$manufacturer== "N"] <- "Nabisco"
cereal$manufacturer[cereal$manufacturer== "P"] <- "Post"
cereal$manufacturer[cereal$manufacturer== "Q"] <- "Quaker Oats"
cereal$manufacturer[cereal$manufacturer== "R"] <- "Ralston Purina"
head(cereal)
##                        name   manufacturer hot_cold calories protein fat sodium
## 1                 100% Bran        Nabisco     Cold       70       4   1    130
## 2         100% Natural Bran    Quaker Oats     Cold      120       3   5     15
## 3                  All-Bran       Kelloggs     Cold       70       4   1    260
## 4 All-Bran with Extra Fiber       Kelloggs     Cold       50       4   0    140
## 5            Almond Delight Ralston Purina     COLD      110       2   2    200
## 6   Apple Cinnamon Cheerios  General Mills     Cold      110       2   2    180
##   fiber carbohydrates sugars potassium vitamins display_shelf weight_ounces
## 1  10.0           5.0      6       280       25             3             1
## 2   2.0           8.0      8       135        0             3             1
## 3   9.0           7.0      5       320       25             3             1
## 4  14.0           8.0      0       330       25             3             1
## 5   1.0          14.0      8        -1       25             3             1
## 6   1.5          10.5     10        70       25             1             1
##   cups_in_serving   rating
## 1            0.33 68.40297
## 2            1.00 33.98368
## 3            0.33 59.42551
## 4            0.50 93.70491
## 5            0.75 34.38484
## 6            0.75 29.50954
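
The seven separate assignments can also be written as one lookup. The sketch below uses a base R named vector rather than plyr's mapvalues, which is the function this part is named after. The vector `lookup` is copied from the mapping table above, and `codes` is a short made-up sample of manufacturer codes.

```r
# Lookup taken from the mapping table: names are codes, values are full names
lookup <- c(
  A = "American Home Food Products",
  G = "General Mills",
  K = "Kelloggs",
  N = "Nabisco",
  P = "Post",
  Q = "Quaker Oats",
  R = "Ralston Purina"
)

# Illustrative sample of codes (not the full column)
codes <- c("N", "Q", "K", "A")

# Indexing the lookup by code returns the matching full name for each element
full_names <- unname(lookup[codes])
full_names
```

Keeping the mapping in one place makes it easier to compare it against the table and catch a mistyped name.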
(c) Cleaning up

Now write code that checks the levels of the hot_cold column. Use the space below to describe a problem with this column.

cereal$hot_cold <- as.factor(cereal$hot_cold)
class(cereal$hot_cold)
## [1] "factor"
levels(cereal$hot_cold)
## [1] "cold" "Cold" "CoLD" "COLD" "Hot"

Description of the problem:

The levels of this column could not be checked at first because the observations were stored as character strings, not as a factor. After converting the column to a factor and checking the levels, we see that R is case sensitive: it reads cold, Cold, CoLD and COLD as different factor levels even though they mean the same thing. The capitalization therefore has to be made consistent for these observations to be counted in the same factor level.

(d) Cleaning up

Use the mapvalues and mutate functions to fix the hot_cold column by mapping all of the lowercase and mixed case instances to a consistent case. Create a new dataset called cereal2, and leave the dataset you created in part (b) untouched. Make sure you check your remapping was successful.

# library(magrittr)
# library(dplyr)
# cereal12 <- cereal
# cereal12 %>% mutate(hot_cold=recode(hot_cold,
#                                     `CoLD`="Cold",
#                                     `COLD`="Cold", 
#                                     `cold`= "Cold")) ##using recode
# cereal12
# levels(cereal12$hot_cold)


## using mutate and mapvalues
library(plyr)
library(magrittr)
cereal12 <- cereal
cereal12 %<>% mutate (hot_cold=plyr::mapvalues(hot_cold, c("CoLD","cold","COLD"), c("Cold","Cold","Cold")))
head(cereal12)
##                        name   manufacturer hot_cold calories protein fat sodium
## 1                 100% Bran        Nabisco     Cold       70       4   1    130
## 2         100% Natural Bran    Quaker Oats     Cold      120       3   5     15
## 3                  All-Bran       Kelloggs     Cold       70       4   1    260
## 4 All-Bran with Extra Fiber       Kelloggs     Cold       50       4   0    140
## 5            Almond Delight Ralston Purina     Cold      110       2   2    200
## 6   Apple Cinnamon Cheerios  General Mills     Cold      110       2   2    180
##   fiber carbohydrates sugars potassium vitamins display_shelf weight_ounces
## 1  10.0           5.0      6       280       25             3             1
## 2   2.0           8.0      8       135        0             3             1
## 3   9.0           7.0      5       320       25             3             1
## 4  14.0           8.0      0       330       25             3             1
## 5   1.0          14.0      8        -1       25             3             1
## 6   1.5          10.5     10        70       25             1             1
##   cups_in_serving   rating
## 1            0.33 68.40297
## 2            1.00 33.98368
## 3            0.33 59.42551
## 4            0.50 93.70491
## 5            0.75 34.38484
## 6            0.75 29.50954
levels(cereal12$hot_cold)
## [1] "Cold" "Hot"
(e) Isn’t there an easier way to do this?

The toupper() function takes an array of character strings and converts all letters to uppercase. Alternatively, tolower() can take an array of character strings and convert all the letters to lowercase.

Use toupper() OR tolower() and mutate to perform the same data cleaning task as in part (d) on the dataset from part (b). Save the results in a new column called <hot_cold_new>. Check if the remapping was successful.
Note: Make sure you turn the new column into a factor

cereal %<>% mutate(hot_cold_new = tolower(hot_cold))
cereal$hot_cold_new <- as.factor(cereal$hot_cold_new)
levels(cereal$hot_cold_new)
## [1] "cold" "hot"
head(cereal)
##                        name   manufacturer hot_cold calories protein fat sodium
## 1                 100% Bran        Nabisco     Cold       70       4   1    130
## 2         100% Natural Bran    Quaker Oats     Cold      120       3   5     15
## 3                  All-Bran       Kelloggs     Cold       70       4   1    260
## 4 All-Bran with Extra Fiber       Kelloggs     Cold       50       4   0    140
## 5            Almond Delight Ralston Purina     COLD      110       2   2    200
## 6   Apple Cinnamon Cheerios  General Mills     Cold      110       2   2    180
##   fiber carbohydrates sugars potassium vitamins display_shelf weight_ounces
## 1  10.0           5.0      6       280       25             3             1
## 2   2.0           8.0      8       135        0             3             1
## 3   9.0           7.0      5       320       25             3             1
## 4  14.0           8.0      0       330       25             3             1
## 5   1.0          14.0      8        -1       25             3             1
## 6   1.5          10.5     10        70       25             1             1
##   cups_in_serving   rating hot_cold_new
## 1            0.33 68.40297         cold
## 2            1.00 33.98368         cold
## 3            0.33 59.42551         cold
## 4            0.50 93.70491         cold
## 5            0.75 34.38484         cold
## 6            0.75 29.50954         cold

Problem 3: Exploring Dataframes

Work on the dataset returned by Problem 2 part (e).

(a) Subsetting

Write code that creates a dataframe for cereals that are manufactured ONLY by Quaker Oats or Kelloggs and have fewer than 100 calories.
Check the first 3 rows of this new dataframe.

cereal_restricted <- subset(cereal, manufacturer == "Quaker Oats"| manufacturer=="Kelloggs")
cereal_restricted <- subset(cereal_restricted, calories <100)
cereal_restricted[1:3,]
##                         name manufacturer hot_cold calories protein fat sodium
## 3                   All-Bran     Kelloggs     Cold       70       4   1    260
## 4  All-Bran with Extra Fiber     Kelloggs     Cold       50       4   0    140
## 51         Nutri-grain Wheat     Kelloggs     Cold       90       3   0    170
##    fiber carbohydrates sugars potassium vitamins display_shelf weight_ounces
## 3      9             7      5       320       25             3             1
## 4     14             8      0       330       25             3             1
## 51     3            18      2        90       25             3             1
##    cups_in_serving   rating hot_cold_new
## 3             0.33 59.42551         cold
## 4             0.50 93.70491         cold
## 51            1.00 59.64284         cold
(b) More practice with factors

Run the code below to create a variable called calorie_bins that ranks cereals based on the number of calories they have. Once the new variable <calorie_bins> has been created, write code to order it from low, medium, high.

Note: This code chunk has been set to eval = FALSE, set this to TRUE before you knit your html.

## Creating Bins 
b <- c(-Inf, 100, 120, Inf )

#Create a vector of names for break points:
names <- c("Low", "Medium", "High")
 
cereal$calorie_bins <- cut(cereal$calories, breaks = b, labels = names)

levels(cereal$calorie_bins)
## [1] "Low"    "Medium" "High"
cereal[order(cereal$calorie_bins),]
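
Sorting the rows with order() rearranges the data frame but leaves calorie_bins as an unordered factor. One possible reading of "order it from low, medium, high" is to make the factor itself ordered. A minimal sketch of that reading follows, using the same breaks and labels on made-up calorie values. The names `calories`, `bin_labels` and `bins` are illustrative.

```r
# Illustrative calorie values only
calories <- c(70, 110, 130, 100, 150)

b <- c(-Inf, 100, 120, Inf)
bin_labels <- c("Low", "Medium", "High")

# ordered_result = TRUE makes cut() return an ordered factor (Low < Medium < High)
bins <- cut(calories, breaks = b, labels = bin_labels, ordered_result = TRUE)

is.ordered(bins)     # the factor now carries an ordering
bins < "High"        # comparisons between levels become meaningful
```

With the default right = TRUE, a cereal with exactly 100 calories falls in the Low bin and one with exactly 120 falls in Medium.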
(c) Produce a summary for cereals by calorie bins.

How many cereals in our dataset fall into the high calorie bin we defined?

cereal_high_calorie <- cereal[cereal$calorie_bins == "High",]
head(cereal_high_calorie)
##                                 name   manufacturer hot_cold calories protein
## 8                            Basic 4  General Mills     Cold      130       3
## 40            Just Right Fruit & Nut       Kelloggs     Cold      140       3
## 45  Muesli Raisins; Dates; & Almonds Ralston Purina     Cold      150       4
## 46 Muesli Raisins; Peaches; & Pecans Ralston Purina     Cold      150       4
## 47              Mueslix Crispy Blend       Kelloggs     Cold      160       3
## 50         Nutri-Grain Almond-Raisin       Kelloggs     Cold      140       3
##    fat sodium fiber carbohydrates sugars potassium vitamins display_shelf
## 8    2    210     2            18      8       100       25             3
## 40   1    170     2            20      9        95      100             3
## 45   3     95     3            16     11       170       25             3
## 46   3    150     3            16     11       170       25             3
## 47   2    150     3            17     13       160       25             3
## 50   2    220     3            21      7       130       25             3
##    weight_ounces cups_in_serving   rating hot_cold_new calorie_bins
## 8           1.33            0.75 37.03856         cold         High
## 40          1.30            0.75 36.47151         cold         High
## 45          1.00            1.00 37.13686         cold         High
## 46          1.00            1.00 34.13976         cold         High
## 47          1.50            0.67 30.31335         cold         High
## 50          1.33            0.67 40.69232         cold         High
length(cereal_high_calorie)
## [1] 18

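The value 18 is the number of columns in cereal_high_calorie, not the number of cereals, because length() applied to a data frame counts its columns. The number of cereals in the high calorie bin is the number of rows, which nrow(cereal_high_calorie) returns.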
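
On a data frame, length() counts columns while nrow() counts rows, which matters when counting the cereals in a bin. A small sketch of the difference on a made-up data frame `toy` (the values are illustrative, not the cereal data):

```r
# Toy data frame with 3 columns and 4 rows (illustrative values only)
toy <- data.frame(
  name = c("a", "b", "c", "d"),
  calories = c(130, 90, 150, 110),
  calorie_bins = factor(c("High", "Low", "High", "Medium"),
                        levels = c("Low", "Medium", "High"))
)

high <- toy[toy$calorie_bins == "High", ]

n_cols <- length(high)   # number of columns in the subset
n_rows <- nrow(high)     # number of rows, i.e. items in the High bin

# table() gives the count for every bin at once
bin_counts <- table(toy$calorie_bins)
bin_counts
```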

Problem 4: Data Exploration

(a) Computing mean by group

Write code to find the mean calories for cereals produced by Quaker Oats and the mean calories for cereals produced by Nabisco. Between these two options, which manufacturer do you think makes healthier cereals?

quaker <- cereal[cereal$manufacturer=="Quaker Oats",]
mean(quaker$calories)
## [1] 95
nabisco <- cereal[cereal$manufacturer=="Nabisco",]
mean(nabisco$calories)
## [1] 86.66667

Assuming that a cereal providing more calories is healthier, Quaker Oats comes out ahead: its cereals average 95 calories, against 86.67 for Nabisco. If fewer calories are taken as the healthier choice instead, the comparison reverses in favor of Nabisco.

(b) New Variables

Create a new variable high_sugar which takes the value 1 if sugars are greater than 10 and zero otherwise.

# cereal <- cereal %<>% mutate(high_sugar= 1 if sugars> 10 else 0)
cereal <- cereal %>% 
  mutate(high_sugar = if_else(sugars > 10, 1, 0))
cereal[6:9,]
##                      name   manufacturer hot_cold calories protein fat sodium
## 6 Apple Cinnamon Cheerios  General Mills     Cold      110       2   2    180
## 7             Apple Jacks       Kelloggs     Cold      110       2   0    125
## 8                 Basic 4  General Mills     Cold      130       3   2    210
## 9               Bran Chex Ralston Purina     Cold       90       2   1    200
##   fiber carbohydrates sugars potassium vitamins display_shelf weight_ounces
## 6   1.5          10.5     10        70       25             1          1.00
## 7   1.0          11.0     14        30       25             2          1.00
## 8   2.0          18.0      8       100       25             3          1.33
## 9   4.0          15.0      6       125       25             1          1.00
##   cups_in_serving   rating hot_cold_new calorie_bins high_sugar
## 6            0.75 29.50954         cold       Medium          0
## 7            1.00 33.17409         cold       Medium          1
## 8            0.75 37.03856         cold         High          0
## 9            0.67 49.12025         cold          Low          0
(c) More evolved group analysis

Write code to find the mean calories for cereals produced by Quaker Oats and the mean calories for cereals produced by Kelloggs, but this time also add an additional constraint for whether or not the cereal is high sugar.

Does the number of sugars in a cereal have any impact on its calories?

quaker <- cereal[cereal$manufacturer=="Quaker Oats",]
quaker$ high_sugar == 1
## [1] FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
mean(quaker$calories)
## [1] 95
nabisco <- cereal[cereal$manufacturer=="Nabisco",]
nabisco$ high_sugar ==1
## [1] FALSE FALSE FALSE FALSE FALSE FALSE
mean(nabisco$calories)
## [1] 86.66667

Quaker Oats makes two cereals that are high in sugar, whereas Nabisco makes none. The higher mean calories for Quaker Oats might be due to this higher sugar content, but a higher sugar content also means a less healthy cereal.
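
To get mean calories split by both manufacturer and the high sugar flag, one option is base R's aggregate() with a two-variable formula. The sketch below runs on a made-up data frame `toy`, so its means are illustrative and say nothing about the real cereals.

```r
# Toy data (illustrative values only)
toy <- data.frame(
  manufacturer = c("Quaker Oats", "Quaker Oats", "Quaker Oats",
                   "Kelloggs", "Kelloggs", "Kelloggs"),
  calories = c(120, 100, 50, 110, 110, 70),
  sugars = c(12, 6, 0, 14, 11, 5)
)

# Same rule as in part (b): 1 if sugars are greater than 10, 0 otherwise
toy$high_sugar <- ifelse(toy$sugars > 10, 1, 0)

# One mean per combination of manufacturer and high_sugar
group_means <- aggregate(calories ~ manufacturer + high_sugar, data = toy, FUN = mean)
group_means
```

Comparing the high_sugar = 1 row with the high_sugar = 0 row within each manufacturer shows whether the sugary cereals carry more calories on average.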

Problem 5: Basic Functions and Loops

(a) Write a basic conversion function

Notice that the dataset contains each cereal’s weight in ounces. Write a function that converts weight in ounces to weight in grams, rounded to 2 decimal places. Make sure you test the function you create.

HINT: You may find it useful to google the ounce to gram conversion before you implement in your code.

# Here's a function skeleton to get you started

# This function converts ounces to grams
install.packages("measurements",repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/HP/Documents/R/win-library/4.1'
## (as 'lib' is unspecified)
## package 'measurements' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\HP\AppData\Local\Temp\RtmpacNGMh\downloaded_packages
library(measurements)
conversion <- conv_unit(cereal$weight_ounces, from= "oz", to= "g")
ounces_to_grams <- conversion
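
The chunk above stores a converted vector under the name ounces_to_grams, whereas the task asks for a function. A minimal sketch of a function version follows. It assumes the avoirdupois ounce (1 oz = 28.349523125 g) and uses plain multiplication instead of the measurements package. The test weights are picked for illustration.

```r
# Assumption: avoirdupois ounce, 1 oz = 28.349523125 g
# Converts a numeric vector of ounces to grams, rounded to 2 decimal places
ounces_to_grams <- function(ounces) {
  round(ounces * 28.349523125, 2)
}

# Quick test on a few illustrative weights
test_grams <- ounces_to_grams(c(1, 1.33, 1.5))
test_grams
```

Because the function is vectorised, it can be applied to a whole column in one call, for example inside mutate().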
(b) Applying your function

Use the function you created above to add a new column called weight_grams to your cereal dataset, holding each cereal’s weight in ounces converted to grams.

conversion <- conv_unit(cereal$weight_ounces, from= "oz", to= "g")
ounces_to_grams <- conversion
cereal <- cereal %>% mutate(weight_grams = ounces_to_grams)
cereal
##                                      name           manufacturer hot_cold
## 1                               100% Bran                Nabisco     Cold
## 2                       100% Natural Bran            Quaker Oats     Cold
## 3                                All-Bran               Kelloggs     Cold
## 4               All-Bran with Extra Fiber               Kelloggs     Cold
## 5                          Almond Delight         Ralston Purina     COLD
## 6                 Apple Cinnamon Cheerios          General Mills     Cold
## 7                             Apple Jacks               Kelloggs     Cold
## 8                                 Basic 4          General Mills     Cold
## 9                               Bran Chex         Ralston Purina     Cold
## 10                            Bran Flakes                   Post     Cold
## 11                           Cap'n'Crunch            Quaker Oats     Cold
## 12                               Cheerios          General Mills     Cold
## 13                  Cinnamon Toast Crunch          General Mills     Cold
## 14                               Clusters          General Mills     Cold
## 15                            Cocoa Puffs          General Mills     Cold
## 16                              Corn Chex         Ralston Purina     Cold
## 17                            Corn Flakes               Kelloggs     Cold
## 18                              Corn Pops               Kelloggs     Cold
## 19                          Count Chocula          General Mills     Cold
## 20                     Cracklin' Oat Bran               Kelloggs     Cold
## 21                 Cream of Wheat (Quick)                Nabisco      Hot
## 22                                Crispix               Kelloggs     Cold
## 23                 Crispy Wheat & Raisins          General Mills     Cold
## 24                            Double Chex         Ralston Purina     Cold
## 25                            Froot Loops               Kelloggs     Cold
## 26                         Frosted Flakes               Kelloggs     Cold
## 27                    Frosted Mini-Wheats               Kelloggs     Cold
## 28 Fruit & Fibre Dates; Walnuts; and Oats                   Post     Cold
## 29                          Fruitful Bran               Kelloggs     Cold
## 30                         Fruity Pebbles                   Post     Cold
## 31                           Golden Crisp                   Post     Cold
## 32                         Golden Grahams          General Mills     Cold
## 33                      Grape Nuts Flakes                   Post     Cold
## 34                             Grape-Nuts                   Post     Cold
## 35                     Great Grains Pecan                   Post     Cold
## 36                       Honey Graham Ohs            Quaker Oats     Cold
## 37                     Honey Nut Cheerios          General Mills     Cold
## 38                             Honey-comb                   Post     Cold
## 39            Just Right Crunchy  Nuggets               Kelloggs     Cold
## 40                 Just Right Fruit & Nut               Kelloggs     Cold
## 41                                    Kix          General Mills     Cold
## 42                                   Life            Quaker Oats     Cold
## 43                           Lucky Charms          General Mills     Cold
## 44                                  Maypo American Food Products      Hot
## 45       Muesli Raisins; Dates; & Almonds         Ralston Purina     Cold
## 46      Muesli Raisins; Peaches; & Pecans         Ralston Purina     Cold
## 47                   Mueslix Crispy Blend               Kelloggs     Cold
## 48                   Multi-Grain Cheerios          General Mills     Cold
## 49                       Nut&Honey Crunch               Kelloggs     Cold
## 50              Nutri-Grain Almond-Raisin               Kelloggs     Cold
## 51                      Nutri-grain Wheat               Kelloggs     Cold
## 52                   Oatmeal Raisin Crisp          General Mills     Cold
## 53                  Post Nat. Raisin Bran                   Post     Cold
## 54                             Product 19               Kelloggs     Cold
## 55                            Puffed Rice            Quaker Oats     Cold
## 56                           Puffed Wheat            Quaker Oats     Cold
## 57                     Quaker Oat Squares            Quaker Oats     Cold
## 58                         Quaker Oatmeal            Quaker Oats      Hot
## 59                            Raisin Bran               Kelloggs     Cold
## 60                        Raisin Nut Bran          General Mills     Cold
## 61                         Raisin Squares               Kelloggs     Cold
## 62                              Rice Chex         Ralston Purina     Cold
## 63                          Rice Krispies               Kelloggs     Cold
## 64                         Shredded Wheat                Nabisco     CoLD
## 65                 Shredded Wheat 'n'Bran                Nabisco     Cold
## 66              Shredded Wheat spoon size                Nabisco     Cold
## 67                                 Smacks               Kelloggs     Cold
## 68                              Special K               Kelloggs     Cold
## 69                Strawberry Fruit Wheats                Nabisco     Cold
## 70                      Total Corn Flakes          General Mills     Cold
## 71                      Total Raisin Bran          General Mills     Cold
## 72                      Total Whole Grain          General Mills     Cold
## 73                                Triples          General Mills     Cold
## 74                                   Trix          General Mills     Cold
## 75                             Wheat Chex         Ralston Purina     Cold
## 76                               Wheaties          General Mills     Cold
## 77                    Wheaties Honey Gold          General Mills     cold
##    calories protein fat sodium fiber carbohydrates sugars potassium vitamins
## 1        70       4   1    130  10.0           5.0      6       280       25
## 2       120       3   5     15   2.0           8.0      8       135        0
## 3        70       4   1    260   9.0           7.0      5       320       25
## 4        50       4   0    140  14.0           8.0      0       330       25
## 5       110       2   2    200   1.0          14.0      8        -1       25
## 6       110       2   2    180   1.5          10.5     10        70       25
## 7       110       2   0    125   1.0          11.0     14        30       25
## 8       130       3   2    210   2.0          18.0      8       100       25
## 9        90       2   1    200   4.0          15.0      6       125       25
## 10       90       3   0    210   5.0          13.0      5       190       25
## 11      120       1   2    220   0.0          12.0     12        35       25
## 12      110       6   2    290   2.0          17.0      1       105       25
## 13      120       1   3    210   0.0          13.0      9        45       25
## 14      110       3   2    140   2.0          13.0      7       105       25
## 15      110       1   1    180   0.0          12.0     13        55       25
## 16      110       2   0    280   0.0          22.0      3        25       25
## 17      100       2   0    290   1.0          21.0      2        35       25
## 18      110       1   0     90   1.0          13.0     12        20       25
## 19      110       1   1    180   0.0          12.0     13        65       25
## 20      110       3   3    140   4.0          10.0      7       160       25
## 21      100       3   0     80   1.0          21.0      0        -1        0
## 22      110       2   0    220   1.0          21.0      3        30       25
## 23      100       2   1    140   2.0          11.0     10       120       25
## 24      100       2   0    190   1.0          18.0      5        80       25
## 25      110       2   1    125   1.0          11.0     13        30       25
## 26      110       1   0    200   1.0          14.0     11        25       25
## 27      100       3   0      0   3.0          14.0      7       100       25
## 28      120       3   2    160   5.0          12.0     10       200       25
## 29      120       3   0    240   5.0          14.0     12       190       25
## 30      110       1   1    135   0.0          13.0     12        25       25
## 31      100       2   0     45   0.0          11.0     15        40       25
## 32      110       1   1    280   0.0          15.0      9        45       25
## 33      100       3   1    140   3.0          15.0      5        85       25
## 34      110       3   0    170   3.0          17.0      3        90       25
## 35      120       3   3     75   3.0          13.0      4       100       25
## 36      120       1   2    220   1.0          12.0     11        45       25
## 37      110       3   1    250   1.5          11.5     10        90       25
## 38      110       1   0    180   0.0          14.0     11        35       25
## 39      110       2   1    170   1.0          17.0      6        60      100
## 40      140       3   1    170   2.0          20.0      9        95      100
## 41      110       2   1    260   0.0          21.0      3        40       25
## 42      100       4   2    150   2.0          12.0      6        95       25
## 43      110       2   1    180   0.0          12.0     12        55       25
## 44      100       4   1      0   0.0          16.0      3        95       25
## 45      150       4   3     95   3.0          16.0     11       170       25
## 46      150       4   3    150   3.0          16.0     11       170       25
## 47      160       3   2    150   3.0          17.0     13       160       25
## 48      100       2   1    220   2.0          15.0      6        90       25
## 49      120       2   1    190   0.0          15.0      9        40       25
## 50      140       3   2    220   3.0          21.0      7       130       25
## 51       90       3   0    170   3.0          18.0      2        90       25
## 52      130       3   2    170   1.5          13.5     10       120       25
## 53      120       3   1    200   6.0          11.0     14       260       25
## 54      100       3   0    320   1.0          20.0      3        45      100
## 55       50       1   0      0   0.0          13.0      0        15        0
## 56       50       2   0      0   1.0          10.0      0        50        0
## 57      100       4   1    135   2.0          14.0      6       110       25
## 58      100       5   2      0   2.7          -1.0     -1       110        0
## 59      120       3   1    210   5.0          14.0     12       240       25
## 60      100       3   2    140   2.5          10.5      8       140       25
## 61       90       2   0      0   2.0          15.0      6       110       25
## 62      110       1   0    240   0.0          23.0      2        30       25
## 63      110       2   0    290   0.0          22.0      3        35       25
## 64       80       2   0      0   3.0          16.0      0        95        0
## 65       90       3   0      0   4.0          19.0      0       140        0
## 66       90       3   0      0   3.0          20.0      0       120        0
## 67      110       2   1     70   1.0           9.0     15        40       25
## 68      110       6   0    230   1.0          16.0      3        55       25
## 69       90       2   0     15   3.0          15.0      5        90       25
## 70      110       2   1    200   0.0          21.0      3        35      100
## 71      140       3   1    190   4.0          15.0     14       230      100
## 72      100       3   1    200   3.0          16.0      3       110      100
## 73      110       2   1    250   0.0          21.0      3        60       25
## 74      110       1   1    140   0.0          13.0     12        25       25
## 75      100       3   1    230   3.0          17.0      3       115       25
## 76      100       3   1    200   3.0          17.0      3       110       25
## 77      110       2   1    200   1.0          16.0      8        60       25
##    display_shelf weight_ounces cups_in_serving   rating hot_cold_new
## 1              3          1.00            0.33 68.40297         cold
## 2              3          1.00            1.00 33.98368         cold
## 3              3          1.00            0.33 59.42551         cold
## 4              3          1.00            0.50 93.70491         cold
## 5              3          1.00            0.75 34.38484         cold
## 6              1          1.00            0.75 29.50954         cold
## 7              2          1.00            1.00 33.17409         cold
## 8              3          1.33            0.75 37.03856         cold
## 9              1          1.00            0.67 49.12025         cold
## 10             3          1.00            0.67 53.31381         cold
## 11             2          1.00            0.75 18.04285         cold
## 12             1          1.00            1.25 50.76500         cold
## 13             2          1.00            0.75 19.82357         cold
## 14             3          1.00            0.50 40.40021         cold
## 15             2          1.00            1.00 22.73645         cold
## 16             1          1.00            1.00 41.44502         cold
## 17             1          1.00            1.00 45.86332         cold
## 18             2          1.00            1.00 35.78279         cold
## 19             2          1.00            1.00 22.39651         cold
## 20             3          1.00            0.50 40.44877         cold
## 21             2          1.00            1.00 64.53382          hot
## 22             3          1.00            1.00 46.89564         cold
## 23             3          1.00            0.75 36.17620         cold
## 24             3          1.00            0.75 44.33086         cold
## 25             2          1.00            1.00 32.20758         cold
## 26             1          1.00            0.75 31.43597         cold
## 27             2          1.00            0.80 58.34514         cold
## 28             3          1.25            0.67 40.91705         cold
## 29             3          1.33            0.67 41.01549         cold
## 30             2          1.00            0.75 28.02576         cold
## 31             1          1.00            0.88 35.25244         cold
## 32             2          1.00            0.75 23.80404         cold
## 33             3          1.00            0.88 52.07690         cold
## 34             3          1.00            0.25 53.37101         cold
## 35             3          1.00            0.33 45.81172         cold
## 36             2          1.00            1.00 21.87129         cold
## 37             1          1.00            0.75 31.07222         cold
## 38             1          1.00            1.33 28.74241         cold
## 39             3          1.00            1.00 36.52368         cold
## 40             3          1.30            0.75 36.47151         cold
## 41             2          1.00            1.50 39.24111         cold
## 42             2          1.00            0.67 45.32807         cold
## 43             2          1.00            1.00 26.73451         cold
## 44             2          1.00            1.00 54.85092          hot
## 45             3          1.00            1.00 37.13686         cold
## 46             3          1.00            1.00 34.13976         cold
## 47             3          1.50            0.67 30.31335         cold
## 48             1          1.00            1.00 40.10596         cold
## 49             2          1.00            0.67 29.92429         cold
## 50             3          1.33            0.67 40.69232         cold
## 51             3          1.00            1.00 59.64284         cold
## 52             3          1.25            0.50 30.45084         cold
## 53             3          1.33            0.67 37.84059         cold
## 54             3          1.00            1.00 41.50354         cold
## 55             3          0.50            1.00 60.75611         cold
## 56             3          0.50            1.00 63.00565         cold
## 57             3          1.00            0.50 49.51187         cold
## 58             1          1.00            0.67 50.82839          hot
## 59             2          1.33            0.75 39.25920         cold
## 60             3          1.00            0.50 39.70340         cold
## 61             3          1.00            0.50 55.33314         cold
## 62             1          1.00            1.13 41.99893         cold
## 63             1          1.00            1.00 40.56016         cold
## 64             1          0.83            1.00 68.23588         cold
## 65             1          1.00            0.67 74.47295         cold
## 66             1          1.00            0.67 72.80179         cold
## 67             2          1.00            0.75 31.23005         cold
## 68             1          1.00            1.00 53.13132         cold
## 69             2          1.00            1.00 59.36399         cold
## 70             3          1.00            1.00 38.83975         cold
## 71             3          1.50            1.00 28.59278         cold
## 72             3          1.00            1.00 46.65884         cold
## 73             3          1.00            0.75 39.10617         cold
## 74             2          1.00            1.00 27.75330         cold
## 75             1          1.00            0.67 49.78744         cold
## 76             1          1.00            1.00 51.59219         cold
## 77             1          1.00            0.75 36.18756         cold
##    calorie_bins high_sugar weight_grams
## 1           Low          0     28.34952
## 2        Medium          0     28.34952
## 3           Low          0     28.34952
## 4           Low          0     28.34952
## 5        Medium          0     28.34952
## 6        Medium          0     28.34952
## 7        Medium          1     28.34952
## 8          High          0     37.70487
## 9           Low          0     28.34952
## 10          Low          0     28.34952
## 11       Medium          1     28.34952
## 12       Medium          0     28.34952
## 13       Medium          0     28.34952
## 14       Medium          0     28.34952
## 15       Medium          1     28.34952
## 16       Medium          0     28.34952
## 17          Low          0     28.34952
## 18       Medium          1     28.34952
## 19       Medium          1     28.34952
## 20       Medium          0     28.34952
## 21          Low          0     28.34952
## 22       Medium          0     28.34952
## 23          Low          0     28.34952
## 24          Low          0     28.34952
## 25       Medium          1     28.34952
## 26       Medium          1     28.34952
## 27          Low          0     28.34952
## 28       Medium          0     35.43690
## 29       Medium          1     37.70487
## 30       Medium          1     28.34952
## 31          Low          1     28.34952
## 32       Medium          0     28.34952
## 33          Low          0     28.34952
## 34       Medium          0     28.34952
## 35       Medium          0     28.34952
## 36       Medium          1     28.34952
## 37       Medium          0     28.34952
## 38       Medium          1     28.34952
## 39       Medium          0     28.34952
## 40         High          0     36.85438
## 41       Medium          0     28.34952
## 42          Low          0     28.34952
## 43       Medium          1     28.34952
## 44          Low          0     28.34952
## 45         High          1     28.34952
## 46         High          1     28.34952
## 47         High          1     42.52428
## 48          Low          0     28.34952
## 49       Medium          0     28.34952
## 50         High          0     37.70487
## 51          Low          0     28.34952
## 52         High          0     35.43690
## 53       Medium          1     37.70487
## 54          Low          0     28.34952
## 55          Low          0     14.17476
## 56          Low          0     14.17476
## 57          Low          0     28.34952
## 58          Low          0     28.34952
## 59       Medium          1     37.70487
## 60          Low          0     28.34952
## 61          Low          0     28.34952
## 62       Medium          0     28.34952
## 63       Medium          0     28.34952
## 64          Low          0     23.53010
## 65          Low          0     28.34952
## 66          Low          0     28.34952
## 67       Medium          1     28.34952
## 68       Medium          0     28.34952
## 69          Low          0     28.34952
## 70       Medium          0     28.34952
## 71         High          1     42.52428
## 72          Low          0     28.34952
## 73       Medium          0     28.34952
## 74       Medium          1     28.34952
## 75          Low          0     28.34952
## 76          Low          0     28.34952
## 77       Medium          0     28.34952
(c) A basic for loop

Write a basic for loop that prints the column name and class type of the first 16 columns in the cereal data set. You can check the class of a vector by using the class() function. The output format should be the following:

The variable ___ has class type ____.

names_columns <- c(colnames(cereal))
names_columns
##  [1] "name"            "manufacturer"    "hot_cold"        "calories"       
##  [5] "protein"         "fat"             "sodium"          "fiber"          
##  [9] "carbohydrates"   "sugars"          "potassium"       "vitamins"       
## [13] "display_shelf"   "weight_ounces"   "cups_in_serving" "rating"         
## [17] "hot_cold_new"    "calorie_bins"    "high_sugar"      "weight_grams"
for (i in colnames(cereal)[1:16]) {
  # class() is applied to the column itself, not to the column name
  print(paste0("The variable ", i, " has class type ", class(cereal[[i]]), "."))
}
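
When looping over column names, class() has to be applied to the column rather than to its name, because the name is always a character string. Here is a minimal sketch on a small made-up data frame; demo_df and its columns are hypothetical stand-ins, not the cereal data.

```r
# Hypothetical stand-in for the cereal data: three columns of different types
demo_df <- data.frame(
  name     = c("A", "B"),
  calories = c(70L, 120L),
  rating   = c(68.4, 33.9),
  stringsAsFactors = FALSE
)

col_classes <- character(0)
for (col in colnames(demo_df)) {
  # demo_df[[col]] extracts the column vector; class(col) would just be "character"
  col_classes[col] <- class(demo_df[[col]])
  print(paste0("The variable ", col, " has class type ", col_classes[col], "."))
}
```

Note that print() is needed inside the loop: unlike at the Console, a bare paste0() call inside a for loop produces no visible output.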

Problem 6: Slightly More Advanced Functions and Loops

(a) Writing a trimmed mean function

Write a function that calculates the mean of a numeric vector x, ignoring the s smallest and l largest values (this is a trimmed mean).

E.g., if x = c(1, 7, 3, 2, 5, 0.5, 9, 10), s = 1, and l = 2, your function would return the mean of c(1, 7, 3, 2, 5) (this is x with the 1 smallest value (0.5) and the 2 largest values (9, 10) removed).

Your function should use the length() function to check if x has at least s + l + 1 values. If x is shorter than s + l + 1, your function should use the message() function to tell the user that the vector can’t be trimmed as requested. If x is at least length s + l + 1, your function should return the trimmed mean.
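
The length check described above can be sketched on its own, before the mean is involved. The helper name can_trim is hypothetical; it returns TRUE when x has at least s + l + 1 values, and otherwise emits a note with message() and returns FALSE.

```r
# Hypothetical helper: TRUE if x is long enough to drop s small and l large values
can_trim <- function(x, s = 0, l = 0) {
  if (length(x) < s + l + 1) {
    # message() prints a note without stopping execution
    message("The vector can't be trimmed as requested: it has fewer than s + l + 1 values.")
    return(FALSE)
  }
  TRUE
}

long_enough <- can_trim(c(1, 7, 3, 2, 5, 0.5, 9, 10), s = 1, l = 2)  # 8 values, needs 4
too_short   <- can_trim(c(1, 2, 3), s = 1, l = 2)                     # 3 values, needs 4
```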

It is useful to break down this problem into its various parts before you start writing the code. E.g.:
Step 1: Get the smallest and largest values of the vector (it may be useful to recall the sort() function we learned about)
Step 2: Check the length of the vector
Step 3: IF the length of the vector is less than s + l + 1 THEN give a message ELSE
Step 4: Get the mean of all values of the vector excluding the s smallest and l largest elements

HINT: Remember, there are many ways to write this function. ONE way you might consider is to calculate the mean of only the relevant indexes of the vector after sorting.
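
As a rough illustration of that hint, the steps can be traced on the example vector from above. This is only a sketch of one possible approach; the intermediate names x_sorted and kept are made up for illustration.

```r
x <- c(1, 7, 3, 2, 5, 0.5, 9, 10)
s <- 1  # number of smallest values to drop
l <- 2  # number of largest values to drop

x_sorted <- sort(x)                        # 0.5 1 2 3 5 7 9 10
kept <- x_sorted[(s + 1):(length(x) - l)]  # positions 2 to 6: 1 2 3 5 7
mean(kept)                                 # same as mean(c(1, 7, 3, 2, 5))
```

Indexing by position after sorting also behaves sensibly when the vector contains repeated values, since exactly s + l elements are dropped.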

# Here's a function skeleton to get you started

# x <- c(sort(x))
# s <- min(x)
# l <- max(x)
# min_length <- length(s)+length(l)+1
# 
# if(length(x) >= min_length){
#   paste0("The trimmed mean after removing the smallest and largest value of the vector is : ", mean(x, trim=0.2))
# }else {
#   sprintf("The vector cannot be trimmed as requested because the length of the vector is too small")
#   }

x <- c(1, 7, 3, 2, 5, 0.5, 9, 10)

trimmedMean <- function(x, s = 0, l = 0) {
  if (length(x) < s + l + 1) {
    message("Sorry! The vector can't be trimmed as requested because it is too short; please supply a vector with more entries.")
  } else {
    x_sorted <- sort(x)  # sort ascending so the extremes sit at the two ends
    # keep everything between the s smallest and the l largest values
    kept <- x_sorted[(s + 1):(length(x_sorted) - l)]
    mean(kept)
  }
}

trimmedMean(x, s = 1, l = 2)
## [1] 3.6
# This function calculates the mean of a numeric vector ignoring the specified number of smallest and largest values

Note: The s = 0 and l = 0 specified in the function definition are the default settings, i.e., this syntax ensures that if s and l are not provided by the user, they are both set to 0. Thus the default behaviour is that the trimmedMean function doesn’t trim anything, and hence is the same as the mean function.

(b) Apply your function with a for loop

The code below creates a random list to which you can apply your new function.

set.seed(201802) # Sets seed to make sure everyone's random vectors are generated the same
list.random <- list(x = rnorm(50), 
                    y = rexp(65),
                    z = rt(100, df = 1.5))

# Here's a Figure showing histograms of the data
par(mfrow = c(1,3))
hist(list.random$x, breaks = 15, col = 'grey')
hist(list.random$y, breaks = 10, col = 'forestgreen')
hist(list.random$z, breaks = 20, col = 'steelblue')

Using a for loop and your function from part (a), create a vector whose elements are the trimmed means of the vectors in list.random, taking s = 5 and l = 5.

Note: you will need to create an empty vector first to store the results of your for loop in.

trimmed_means5 <- c()

for (v in list.random) {
  # append each trimmed mean, growing the vector by one element per iteration
  trimmed_means5[length(trimmed_means5) + 1] <- trimmedMean(v, s = 5, l = 5)
}
trimmed_means5
## [1] -0.1961599  1.0634269  0.3059701
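
An alternative to growing the result vector inside the loop is to create it at its final length up front and fill it by position. The sketch below uses a small made-up list (demo_list) and plain mean() so that it runs on its own; the same pattern would apply with a trimmed-mean function.

```r
# Hypothetical list of numeric vectors
demo_list <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5, 5, 5, 5))

# Preallocate one slot per list element, then fill by index
demo_means <- numeric(length(demo_list))
for (i in seq_along(demo_list)) {
  demo_means[i] <- mean(demo_list[[i]])
}
names(demo_means) <- names(demo_list)
demo_means
```

Looping over seq_along() rather than over the list elements keeps the position i available, which is what makes filling by index possible.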
(c) Calculate the un-trimmed means for each of the vectors in the list. How do these compare to the trimmed means you calculated in part (b)? Explain your findings.
means <- c()

for (v in list.random) {
  # append each untrimmed mean, growing the vector by one element per iteration
  means[length(means) + 1] <- mean(v)
}
means
## [1] -0.22171186  1.03795962  0.06563583

Explanation:

Even after removing the five largest and five smallest values, the means are not much different: the untrimmed means differ from the trimmed means by only about 0.01 to 0.1.
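
How much trimming changes a mean depends on how extreme the tails are. The toy vector below is made up for illustration (it is not drawn from list.random): a single large value pulls the ordinary mean far from the bulk of the data, while dropping one value from each end removes that pull.

```r
# Made-up data: four typical values and one extreme value
v <- c(1, 2, 3, 4, 100)

untrimmed <- mean(v)                               # (1 + 2 + 3 + 4 + 100) / 5
v_sorted  <- sort(v)
trimmed   <- mean(v_sorted[2:(length(v) - 1)])     # mean of 2, 3, 4

c(untrimmed = untrimmed, trimmed = trimmed)
```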

(d) lapply(), sapply()

Repeat part (b), using the lapply and sapply functions instead of a for loop. Your lapply command should return a list of trimmed means, and your sapply command should return a vector of trimmed means.

## Your answer here

lapply(list.random, trimmedMean)
## $x
## [1] -0.2217119
## 
## $y
## [1] 1.03796
## 
## $z
## [1] 0.06563583
sapply(list.random, trimmedMean)
##           x           y           z 
## -0.22171186  1.03795962  0.06563583

Hint: lapply and sapply can take arguments that you wish to pass to the trimmedMean function. E.g., if you were applying the function sort, which has an argument decreasing, you could use the syntax lapply(..., FUN = sort, decreasing = TRUE).
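
A minimal sketch of that syntax with a trimmed mean: extra named arguments placed after FUN are forwarded to every call. The helper trimmed_mean_demo and the list demo_list are hypothetical stand-ins so that the block runs on its own; with your own function you would pass s and l the same way.

```r
# Hypothetical stand-in for a trimmed-mean function (sort, then average the middle)
trimmed_mean_demo <- function(x, s = 0, l = 0) {
  x_sorted <- sort(x)
  mean(x_sorted[(s + 1):(length(x_sorted) - l)])
}

demo_list <- list(a = c(4, 1, 3, 2, 100), b = c(10, 30, 20, 0, 40))

# s and l are passed through lapply()/sapply() to trimmed_mean_demo
as_list   <- lapply(demo_list, FUN = trimmed_mean_demo, s = 1, l = 1)  # returns a list
as_vector <- sapply(demo_list, FUN = trimmed_mean_demo, s = 1, l = 1)  # returns a named vector
```

If s and l are left out of the call, the function's defaults apply and no trimming takes place.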