Download the coding_assignment_2.Rmd file from LMS.
Open coding_assignment_2.Rmd in RStudio.
Replace the “Your Name Here” text in the author: field with your own name.
Supply your solutions to the homework by editing coding_assignment_2.Rmd.
When you have completed the homework and have checked that your code both runs in the Console and knits correctly when you click Knit HTML, rename the R Markdown file to coding_assignment_2_YourNameHere.Rmd, and submit BOTH your rmarkdown AND html file on LMS (YourNameHere should be changed to your own name.)
| Keystroke | Description |
|---|---|
| `<tab>` | Autocompletes commands and filenames, and lists arguments for functions. |
| `<up>` | Cycles through previous commands in the console prompt. |
| `<ctrl-up>` | Lists the history of previous commands matching an unfinished one. |
| `<ctrl-enter>` | Runs the current line from the source window in the Console. Good for trying out ideas from a source file. |
| `<ESC>` | Aborts an unfinished command and gets you out of the `+` prompt. |
Note: Shown above are the Windows/Linux keys. For Mac OS X, the <ctrl> key should be substituted with the <command> (⌘) key.
Instead of sending code line-by-line with <ctrl-enter>, you can send entire code chunks, and even run all of the code chunks in your .Rmd file. Look under the
Run your code in the Console and Knit HTML frequently to check for errors.
You may find it easier to solve a problem by interacting only with the Console at first.
We’ll start by loading the cereal dataset that has been provided to you on LMS. Call your dataset cereal. Once you have loaded your dataset, rename the variables in your dataset using the following mapping:
| original_name | new_name |
|---|---|
| `name` | `name` |
| `mfr` | `manufacturer` |
| `type` | `hot_cold` |
| `calories` | `calories` |
| `protein` | `protein` |
| `fat` | `fat` |
| `sodium` | `sodium` |
| `fiber` | `fiber` |
| `carbo` | `carbohydrates` |
| `sugars` | `sugars` |
| `potass` | `potassium` |
| `vitamins` | `vitamins` |
| `shelf` | `display_shelf` |
| `weight` | `weight_ounces` |
| `cups` | `cups_in_serving` |
| `rating` | `rating` |
cereal <- read.csv("cereal.csv") # adjust this path to wherever you saved cereal.csv
cols <- c("name",
"manufacturer",
"hot_cold",
"calories",
"protein",
"fat",
"sodium",
"fiber",
"carbohydrates",
"sugars",
"potassium",
"vitamins",
"display_shelf",
"weight_ounces",
"cups_in_serving",
"rating")
colnames(cereal) <- cols
head(cereal)
## name manufacturer hot_cold calories protein fat sodium
## 1 100% Bran N Cold 70 4 1 130
## 2 100% Natural Bran Q Cold 120 3 5 15
## 3 All-Bran K Cold 70 4 1 260
## 4 All-Bran with Extra Fiber K Cold 50 4 0 140
## 5 Almond Delight R COLD 110 2 2 200
## 6 Apple Cinnamon Cheerios G Cold 110 2 2 180
## fiber carbohydrates sugars potassium vitamins display_shelf weight_ounces
## 1 10.0 5.0 6 280 25 3 1
## 2 2.0 8.0 8 135 0 3 1
## 3 9.0 7.0 5 320 25 3 1
## 4 14.0 8.0 0 330 25 3 1
## 5 1.0 14.0 8 -1 25 3 1
## 6 1.5 10.5 10 70 25 1 1
## cups_in_serving rating
## 1 0.33 68.40297
## 2 1.00 33.98368
## 3 0.33 59.42551
## 4 0.50 93.70491
## 5 0.75 34.38484
## 6 0.75 29.50954
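Because `colnames(cereal) <- cols` assigns names by position, it silently mislabels columns if the CSV ever arrives in a different order. A minimal base R sketch of renaming by name instead, using a made-up two-row data frame and a hypothetical `rename_map` vector (neither is part of the assignment):

```r
# Hypothetical toy data frame with a few of the original column names
raw_df <- data.frame(name = c("100% Bran", "All-Bran"),
                     mfr = c("N", "K"),
                     type = c("Cold", "Cold"),
                     carbo = c(5, 7),
                     stringsAsFactors = FALSE)

# Hypothetical named mapping in the form new_name = original_name
rename_map <- c(manufacturer = "mfr", hot_cold = "type", carbohydrates = "carbo")

renamed <- raw_df
# Find where each original name sits, then overwrite only those names
idx <- match(rename_map, names(renamed))
names(renamed)[idx] <- names(rename_map)
names(renamed)
```

If one of the original names is misspelled, `match()` returns `NA` for it and the assignment stops with an error, which is usually preferable to a silent mislabel. `dplyr::rename()` offers the same by-name behaviour.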
Run a few commands to get a feel for the dataset.
head(cereal)
## name manufacturer hot_cold calories protein fat sodium
## 1 100% Bran N Cold 70 4 1 130
## 2 100% Natural Bran Q Cold 120 3 5 15
## 3 All-Bran K Cold 70 4 1 260
## 4 All-Bran with Extra Fiber K Cold 50 4 0 140
## 5 Almond Delight R COLD 110 2 2 200
## 6 Apple Cinnamon Cheerios G Cold 110 2 2 180
## fiber carbohydrates sugars potassium vitamins display_shelf weight_ounces
## 1 10.0 5.0 6 280 25 3 1
## 2 2.0 8.0 8 135 0 3 1
## 3 9.0 7.0 5 320 25 3 1
## 4 14.0 8.0 0 330 25 3 1
## 5 1.0 14.0 8 -1 25 3 1
## 6 1.5 10.5 10 70 25 1 1
## cups_in_serving rating
## 1 0.33 68.40297
## 2 1.00 33.98368
## 3 0.33 59.42551
## 4 0.50 93.70491
## 5 0.75 34.38484
## 6 0.75 29.50954
tail(cereal)
## name manufacturer hot_cold calories protein fat sodium fiber
## 72 Total Whole Grain G Cold 100 3 1 200 3
## 73 Triples G Cold 110 2 1 250 0
## 74 Trix G Cold 110 1 1 140 0
## 75 Wheat Chex R Cold 100 3 1 230 3
## 76 Wheaties G Cold 100 3 1 200 3
## 77 Wheaties Honey Gold G cold 110 2 1 200 1
## carbohydrates sugars potassium vitamins display_shelf weight_ounces
## 72 16 3 110 100 3 1
## 73 21 3 60 25 3 1
## 74 13 12 25 25 2 1
## 75 17 3 115 25 1 1
## 76 17 3 110 25 1 1
## 77 16 8 60 25 1 1
## cups_in_serving rating
## 72 1.00 46.65884
## 73 0.75 39.10617
## 74 1.00 27.75330
## 75 0.67 49.78744
## 76 1.00 51.59219
## 77 0.75 36.18756
cereal[1]
## name
## 1 100% Bran
## 2 100% Natural Bran
## 3 All-Bran
## 4 All-Bran with Extra Fiber
## 5 Almond Delight
## 6 Apple Cinnamon Cheerios
## 7 Apple Jacks
## 8 Basic 4
## 9 Bran Chex
## 10 Bran Flakes
## 11 Cap'n'Crunch
## 12 Cheerios
## 13 Cinnamon Toast Crunch
## 14 Clusters
## 15 Cocoa Puffs
## 16 Corn Chex
## 17 Corn Flakes
## 18 Corn Pops
## 19 Count Chocula
## 20 Cracklin' Oat Bran
## 21 Cream of Wheat (Quick)
## 22 Crispix
## 23 Crispy Wheat & Raisins
## 24 Double Chex
## 25 Froot Loops
## 26 Frosted Flakes
## 27 Frosted Mini-Wheats
## 28 Fruit & Fibre Dates; Walnuts; and Oats
## 29 Fruitful Bran
## 30 Fruity Pebbles
## 31 Golden Crisp
## 32 Golden Grahams
## 33 Grape Nuts Flakes
## 34 Grape-Nuts
## 35 Great Grains Pecan
## 36 Honey Graham Ohs
## 37 Honey Nut Cheerios
## 38 Honey-comb
## 39 Just Right Crunchy Nuggets
## 40 Just Right Fruit & Nut
## 41 Kix
## 42 Life
## 43 Lucky Charms
## 44 Maypo
## 45 Muesli Raisins; Dates; & Almonds
## 46 Muesli Raisins; Peaches; & Pecans
## 47 Mueslix Crispy Blend
## 48 Multi-Grain Cheerios
## 49 Nut&Honey Crunch
## 50 Nutri-Grain Almond-Raisin
## 51 Nutri-grain Wheat
## 52 Oatmeal Raisin Crisp
## 53 Post Nat. Raisin Bran
## 54 Product 19
## 55 Puffed Rice
## 56 Puffed Wheat
## 57 Quaker Oat Squares
## 58 Quaker Oatmeal
## 59 Raisin Bran
## 60 Raisin Nut Bran
## 61 Raisin Squares
## 62 Rice Chex
## 63 Rice Krispies
## 64 Shredded Wheat
## 65 Shredded Wheat 'n'Bran
## 66 Shredded Wheat spoon size
## 67 Smacks
## 68 Special K
## 69 Strawberry Fruit Wheats
## 70 Total Corn Flakes
## 71 Total Raisin Bran
## 72 Total Whole Grain
## 73 Triples
## 74 Trix
## 75 Wheat Chex
## 76 Wheaties
## 77 Wheaties Honey Gold
cereal["manufacturer"]
## manufacturer
## 1 N
## 2 Q
## 3 K
## 4 K
## 5 R
## 6 G
## 7 K
## 8 G
## 9 R
## 10 P
## 11 Q
## 12 G
## 13 G
## 14 G
## 15 G
## 16 R
## 17 K
## 18 K
## 19 G
## 20 K
## 21 N
## 22 K
## 23 G
## 24 R
## 25 K
## 26 K
## 27 K
## 28 P
## 29 K
## 30 P
## 31 P
## 32 G
## 33 P
## 34 P
## 35 P
## 36 Q
## 37 G
## 38 P
## 39 K
## 40 K
## 41 G
## 42 Q
## 43 G
## 44 A
## 45 R
## 46 R
## 47 K
## 48 G
## 49 K
## 50 K
## 51 K
## 52 G
## 53 P
## 54 K
## 55 Q
## 56 Q
## 57 Q
## 58 Q
## 59 K
## 60 G
## 61 K
## 62 R
## 63 K
## 64 N
## 65 N
## 66 N
## 67 K
## 68 K
## 69 N
## 70 G
## 71 G
## 72 G
## 73 G
## 74 G
## 75 R
## 76 G
## 77 G
max(cereal$rating)
## [1] 93.70491
min(cereal$rating)
## [1] 18.04285
The maximum rating is 93.70491 and the minimum is 18.04285, so the ratings appear to be on a scale out of 100.
sodium_mean <- mean(cereal$sodium)
paste0("the mean is: ", sodium_mean)
## [1] "the mean is: 159.675324675325"
sodium_median <- median(cereal$sodium)
paste0("the median is: ", sodium_median)
## [1] "the median is: 180"
sodium_mean <= sodium_median
## [1] TRUE
Since the mean (about 159.68) is less than the median (180), the distribution of sodium appears to be skewed to the left.
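Comparing the mean and the median is a rule of thumb for skewness: a few unusually small values pull the mean down while leaving the median almost unchanged. A quick sketch with a made-up vector (not the cereal data):

```r
# Made-up sample: one low outlier drags the mean below the median
x <- c(1, 8, 9, 10, 10)
sample_mean <- mean(x)       # (1 + 8 + 9 + 10 + 10) / 5 = 7.6
sample_median <- median(x)   # middle value of the sorted vector = 9
sample_mean < sample_median  # TRUE, suggesting a left (negative) skew
```

The rule can mislead for multimodal or discrete data, so a histogram such as `hist(cereal$sodium)` is a useful cross-check.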
is.ordered(cereal$hot_cold)
## [1] FALSE
paste0("The column hot_cold is not ordered. Its type is: ", typeof(cereal$hot_cold))
## [1] "The column hot_cold is not ordered. Its type is: character"
For this problem you will need the <plyr> and <dplyr> packages. Run the code chunk below to load them into your working space.
Note: You will need to install them if you haven’t already.
library(plyr)
library(dplyr)
Write code that checks the levels of the manufacturer column.
levels(cereal$manufacturer) # NULL, because the column is a character vector rather than a factor
## NULL
paste0("There are ", length(unique(cereal$manufacturer)), " distinct values in the manufacturer column")
## [1] "There are 7 distinct values in the manufacturer column"
unique(cereal$manufacturer)
## [1] "N" "Q" "K" "R" "G" "P" "A"
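`levels()` only has something to report for factors; on a character vector it returns `NULL`, which is why the call above printed `NULL`. A small sketch with illustrative codes:

```r
# Illustrative manufacturer codes stored as a plain character vector
mfr <- c("N", "Q", "K", "K", "R")
levels(mfr)             # NULL: character vectors have no levels

mfr_factor <- factor(mfr)
levels(mfr_factor)      # the distinct codes, now stored as factor levels
nlevels(mfr_factor)     # 4
length(unique(mfr))     # also 4, without converting to a factor
```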
Let’s give the manufacturer variable more meaningful names. Write code that alters the manufacturer column to reflect the following mapping:
Manufacturer of cereal
A = American Home Food Products
G = General Mills
K = Kelloggs
N = Nabisco
P = Post
Q = Quaker Oats
R = Ralston Purina
# cereal_1 <- cereal[cereal$manufacturer == "A" , "American Home Food Products"]
# cereal_1
# cereal_1 <-replace(cereal$manufacturer, cereal$manufacture == "A","American Home Food Products")
# cereal_1
cereal$manufacturer[cereal$manufacturer== "A"] <- "American Home Food Products"
cereal$manufacturer[cereal$manufacturer== "G"] <- "General Mills"
cereal$manufacturer[cereal$manufacturer== "K"] <- "Kelloggs"
cereal$manufacturer[cereal$manufacturer== "N"] <- "Nabisco"
cereal$manufacturer[cereal$manufacturer== "P"] <- "Post"
cereal$manufacturer[cereal$manufacturer== "Q"] <- "Quaker Oats"
cereal$manufacturer[cereal$manufacturer== "R"] <- "Ralston Purina"
head(cereal)
## name manufacturer hot_cold calories protein fat sodium
## 1 100% Bran Nabisco Cold 70 4 1 130
## 2 100% Natural Bran Quaker Oats Cold 120 3 5 15
## 3 All-Bran Kelloggs Cold 70 4 1 260
## 4 All-Bran with Extra Fiber Kelloggs Cold 50 4 0 140
## 5 Almond Delight Ralston Purina COLD 110 2 2 200
## 6 Apple Cinnamon Cheerios General Mills Cold 110 2 2 180
## fiber carbohydrates sugars potassium vitamins display_shelf weight_ounces
## 1 10.0 5.0 6 280 25 3 1
## 2 2.0 8.0 8 135 0 3 1
## 3 9.0 7.0 5 320 25 3 1
## 4 14.0 8.0 0 330 25 3 1
## 5 1.0 14.0 8 -1 25 3 1
## 6 1.5 10.5 10 70 25 1 1
## cups_in_serving rating
## 1 0.33 68.40297
## 2 1.00 33.98368
## 3 0.33 59.42551
## 4 0.50 93.70491
## 5 0.75 34.38484
## 6 0.75 29.50954
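The seven assignments above can also be expressed as one lookup: a named character vector indexed by the one-letter codes returns the matching full names. A sketch with a few hypothetical codes (`mfr_names` is an illustrative name):

```r
# Named lookup vector: one-letter code -> full manufacturer name
mfr_names <- c(A = "American Home Food Products",
               G = "General Mills",
               K = "Kelloggs",
               N = "Nabisco",
               P = "Post",
               Q = "Quaker Oats",
               R = "Ralston Purina")

codes <- c("N", "Q", "K", "A")
# Indexing by name returns the full names; unname() drops the code labels
full <- unname(mfr_names[codes])
full
```

Applied to the original one-letter codes, this would be `cereal$manufacturer <- unname(mfr_names[cereal$manufacturer])`; any code missing from the lookup becomes `NA`, which makes typos easy to spot.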
Now write code that checks the levels of the hot_cold column. Use the space below to describe a problem with this column.
cereal$hot_cold <- as.factor(cereal$hot_cold)
class(cereal$hot_cold)
## [1] "factor"
levels(cereal$hot_cold)
## [1] "cold" "Cold" "CoLD" "COLD" "Hot"
Description of the problem:
We could not check the levels of this column at first because its values were stored as character strings rather than as a factor. After converting the column to a factor, `levels()` shows five levels: R is case-sensitive, so "cold", "Cold", "CoLD" and "COLD" are treated as different levels even though they all mean the same thing. The capitalization has to be made consistent before these values are counted as a single level.
Use the mapvalues and mutate functions to fix the hot_cold column by mapping all of the lowercase and mixed case instances to a consistent case. Create a new dataset called cereal2, and leave the dataset you created in part (b) untouched. Make sure you check your remapping was successful.
# library(magrittr)
# library(dplyr)
# cereal12 <- cereal
# cereal12 %>% mutate(hot_cold=recode(hot_cold,
# `CoLD`="Cold",
# `COLD`="Cold",
# `cold`= "Cold")) ##using recode
# cereal12
# levels(cereal12$hot_cold)
## using mutate and mapvalues
library(plyr)
library(magrittr) # provides the %<>% assignment pipe
cereal2 <- cereal # work on a copy so that the dataset from part (b) stays untouched
cereal2 %<>% mutate(hot_cold = plyr::mapvalues(hot_cold, c("CoLD", "cold", "COLD"), c("Cold", "Cold", "Cold")))
head(cereal2)
## name manufacturer hot_cold calories protein fat sodium
## 1 100% Bran Nabisco Cold 70 4 1 130
## 2 100% Natural Bran Quaker Oats Cold 120 3 5 15
## 3 All-Bran Kelloggs Cold 70 4 1 260
## 4 All-Bran with Extra Fiber Kelloggs Cold 50 4 0 140
## 5 Almond Delight Ralston Purina Cold 110 2 2 200
## 6 Apple Cinnamon Cheerios General Mills Cold 110 2 2 180
## fiber carbohydrates sugars potassium vitamins display_shelf weight_ounces
## 1 10.0 5.0 6 280 25 3 1
## 2 2.0 8.0 8 135 0 3 1
## 3 9.0 7.0 5 320 25 3 1
## 4 14.0 8.0 0 330 25 3 1
## 5 1.0 14.0 8 -1 25 3 1
## 6 1.5 10.5 10 70 25 1 1
## cups_in_serving rating
## 1 0.33 68.40297
## 2 1.00 33.98368
## 3 0.33 59.42551
## 4 0.50 93.70491
## 5 0.75 34.38484
## 6 0.75 29.50954
levels(cereal2$hot_cold)
## [1] "Cold" "Hot"
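`plyr::mapvalues()` is one option; base R can do the same clean-up by relabelling factor levels with `levels<-`, since levels that are given the same label are merged. A sketch on a toy factor containing the spellings found above:

```r
# Toy factor with the inconsistent spellings seen in hot_cold
hc <- factor(c("cold", "Cold", "CoLD", "COLD", "Hot"))

# Relabel every lowercase/mixed-case variant of "cold" as "Cold";
# levels that end up with the same label are merged into one level
lv <- levels(hc)
lv[lv %in% c("cold", "CoLD", "COLD")] <- "Cold"
levels(hc) <- lv

levels(hc)
table(hc)
```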
The toupper() function takes an array of character strings and converts all letters to uppercase. Alternatively, tolower() can take an array of character strings and convert all the letters to lowercase.
Use toupper() OR tolower() and mutate to perform the same data cleaning task as in part (d) on the dataset from part (b). Save the results in a new column called hot_cold_new, and check whether the remapping was successful.
Note: Make sure you turn the new column into a factor
cereal %<>% mutate(hot_cold_new = tolower(hot_cold)) # inside mutate(), refer to the column by name
cereal$hot_cold_new <- as.factor(cereal$hot_cold_new)
levels(cereal$hot_cold_new)
## [1] "cold" "hot"
head(cereal)
## name manufacturer hot_cold calories protein fat sodium
## 1 100% Bran Nabisco Cold 70 4 1 130
## 2 100% Natural Bran Quaker Oats Cold 120 3 5 15
## 3 All-Bran Kelloggs Cold 70 4 1 260
## 4 All-Bran with Extra Fiber Kelloggs Cold 50 4 0 140
## 5 Almond Delight Ralston Purina COLD 110 2 2 200
## 6 Apple Cinnamon Cheerios General Mills Cold 110 2 2 180
## fiber carbohydrates sugars potassium vitamins display_shelf weight_ounces
## 1 10.0 5.0 6 280 25 3 1
## 2 2.0 8.0 8 135 0 3 1
## 3 9.0 7.0 5 320 25 3 1
## 4 14.0 8.0 0 330 25 3 1
## 5 1.0 14.0 8 -1 25 3 1
## 6 1.5 10.5 10 70 25 1 1
## cups_in_serving rating hot_cold_new
## 1 0.33 68.40297 cold
## 2 1.00 33.98368 cold
## 3 0.33 59.42551 cold
## 4 0.50 93.70491 cold
## 5 0.75 34.38484 cold
## 6 0.75 29.50954 cold
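Each capitalization is a separate factor level until the strings are normalized, which is what `tolower()` does before `factor()` rebuilds the levels. A short sketch with made-up values:

```r
# Made-up values with inconsistent capitalization
hc <- c("cold", "Cold", "CoLD", "COLD", "Hot")
nlevels(factor(hc))             # 5: every spelling is its own level

hc_clean <- factor(tolower(hc)) # normalize case first, then build the factor
levels(hc_clean)                # "cold" "hot"
table(hc_clean)
```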
Work on the dataset returned by Problem 2 part (e).
Write code that creates a dataframe for cereals manufactured ONLY by Quaker Oats or Kelloggs that have fewer than 100 calories.
Check the first 3 rows of this new dataframe.
cereal_restricted <- subset(cereal, manufacturer == "Quaker Oats"| manufacturer=="Kelloggs")
cereal_restricted <- subset(cereal_restricted, calories <100)
cereal_restricted[1:3,]
## name manufacturer hot_cold calories protein fat sodium
## 3 All-Bran Kelloggs Cold 70 4 1 260
## 4 All-Bran with Extra Fiber Kelloggs Cold 50 4 0 140
## 51 Nutri-grain Wheat Kelloggs Cold 90 3 0 170
## fiber carbohydrates sugars potassium vitamins display_shelf weight_ounces
## 3 9 7 5 320 25 3 1
## 4 14 8 0 330 25 3 1
## 51 3 18 2 90 25 3 1
## cups_in_serving rating hot_cold_new
## 3 0.33 59.42551 cold
## 4 0.50 93.70491 cold
## 51 1.00 59.64284 cold
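The two `subset()` calls can also be written as a single condition, with `%in%` for the manufacturer test and `&` for the calorie limit. A sketch on a hypothetical four-row data frame:

```r
# Hypothetical mini version of the cereal data
toy <- data.frame(name = c("Puffed Rice", "All-Bran", "Cheerios", "Life"),
                  manufacturer = c("Quaker Oats", "Kelloggs", "General Mills", "Quaker Oats"),
                  calories = c(50, 70, 110, 100),
                  stringsAsFactors = FALSE)

# Rows made by either manufacturer AND with fewer than 100 calories
low_cal <- subset(toy, manufacturer %in% c("Quaker Oats", "Kelloggs") & calories < 100)
low_cal
```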
Run the code below to create a variable called calorie_bins that bins cereals by the number of calories they contain. Once the new variable calorie_bins has been created, write code to order the cereals by it, from low to medium to high.
Note: This code chunk has been set to eval = FALSE, set this to TRUE before you knit your html.
## Creating Bins
b <- c(-Inf, 100, 120, Inf)
# Create a vector of labels for the bins (not called `names`, which would mask base::names)
bin_labels <- c("Low", "Medium", "High")
cereal$calorie_bins <- cut(cereal$calories, breaks = b, labels = bin_labels)
levels(cereal$calorie_bins)
## [1] "Low" "Medium" "High"
cereal[order(cereal$calorie_bins),]
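By default `cut()` builds right-closed intervals (`right = TRUE`), so these breaks give Low = (-Inf, 100], Medium = (100, 120] and High = (120, Inf). The result is a factor whose levels are stored as Low, Medium, High, and `order()` sorts by that level order rather than alphabetically. A quick check on some boundary values (the `cal` vector is made up):

```r
b <- c(-Inf, 100, 120, Inf)
bin_labels <- c("Low", "Medium", "High")

# Made-up calorie values; 100 and 120 fall in the lower bin (right-closed intervals)
cal <- c(121, 100, 101, 120, 50)
bins <- cut(cal, breaks = b, labels = bin_labels)
as.character(bins)   # "High" "Low" "Medium" "Medium" "Low"

# order() on a factor follows the level order Low < Medium < High
cal[order(bins)]     # 100 50 101 120 121
```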
How many cereals in our dataset fall into the high calorie bin we defined?
cereal_high_calorie <- cereal[cereal$calorie_bins == "High",]
head(cereal_high_calorie)
## name manufacturer hot_cold calories protein
## 8 Basic 4 General Mills Cold 130 3
## 40 Just Right Fruit & Nut Kelloggs Cold 140 3
## 45 Muesli Raisins; Dates; & Almonds Ralston Purina Cold 150 4
## 46 Muesli Raisins; Peaches; & Pecans Ralston Purina Cold 150 4
## 47 Mueslix Crispy Blend Kelloggs Cold 160 3
## 50 Nutri-Grain Almond-Raisin Kelloggs Cold 140 3
## fat sodium fiber carbohydrates sugars potassium vitamins display_shelf
## 8 2 210 2 18 8 100 25 3
## 40 1 170 2 20 9 95 100 3
## 45 3 95 3 16 11 170 25 3
## 46 3 150 3 16 11 170 25 3
## 47 2 150 3 17 13 160 25 3
## 50 2 220 3 21 7 130 25 3
## weight_ounces cups_in_serving rating hot_cold_new calorie_bins
## 8 1.33 0.75 37.03856 cold High
## 40 1.30 0.75 36.47151 cold High
## 45 1.00 1.00 37.13686 cold High
## 46 1.00 1.00 34.13976 cold High
## 47 1.50 0.67 30.31335 cold High
## 50 1.33 0.67 40.69232 cold High
nrow(cereal_high_calorie)
## [1] 8
Eight cereals fall into our high calorie bin. (Note that `length()` of a data frame returns its number of columns, 18 here, not its number of rows.)
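A data frame is a list of columns, so `length()` counts columns while `nrow()` counts rows. A sketch on a made-up data frame:

```r
# Made-up data frame with 3 rows and 2 columns
toy <- data.frame(calories = c(70, 130, 150), fat = c(1, 2, 3))
length(toy)               # 2: the number of columns
nrow(toy)                 # 3: the number of rows
sum(toy$calories > 120)   # 2: count the rows meeting a condition directly
```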
Write code to find the mean calories for cereals produced by Quaker Oats and the mean calories for cereals produced by Nabisco. Between these two options, which manufacturer do you think makes healthier cereals?
quaker <- cereal[cereal$manufacturer=="Quaker Oats",]
mean(quaker$calories)
## [1] 95
nabisco <- cereal[cereal$manufacturer=="Nabisco",]
mean(nabisco$calories)
## [1] 86.66667
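The mean calories of Quaker Oats cereals is 95, while that of Nabisco cereals is about 86.67. If more calories are taken to make a cereal healthier, as I assume here, then Quaker Oats makes the healthier cereals. If instead fewer calories are taken to mean a healthier cereal, Nabisco would come out ahead.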
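Rather than subsetting once per manufacturer, `tapply()` computes a summary for every group in a single call. A sketch with made-up rows:

```r
# Made-up rows for two manufacturers
toy <- data.frame(manufacturer = c("Quaker Oats", "Quaker Oats", "Nabisco", "Nabisco"),
                  calories = c(120, 100, 90, 80),
                  stringsAsFactors = FALSE)

# Mean calories per manufacturer (one entry per group)
group_means <- tapply(toy$calories, toy$manufacturer, mean)
group_means
```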
Create a new variable high_sugar which takes the value 1 if sugars are greater than 10 and zero otherwise.
# cereal <- cereal %<>% mutate(high_sugar= 1 if sugars> 10 else 0)
cereal <- cereal %>%
mutate(high_sugar = if_else(sugars > 10, 1, 0))
cereal[6:9,]
## name manufacturer hot_cold calories protein fat sodium
## 6 Apple Cinnamon Cheerios General Mills Cold 110 2 2 180
## 7 Apple Jacks Kelloggs Cold 110 2 0 125
## 8 Basic 4 General Mills Cold 130 3 2 210
## 9 Bran Chex Ralston Purina Cold 90 2 1 200
## fiber carbohydrates sugars potassium vitamins display_shelf weight_ounces
## 6 1.5 10.5 10 70 25 1 1.00
## 7 1.0 11.0 14 30 25 2 1.00
## 8 2.0 18.0 8 100 25 3 1.33
## 9 4.0 15.0 6 125 25 1 1.00
## cups_in_serving rating hot_cold_new calorie_bins high_sugar
## 6 0.75 29.50954 cold Medium 0
## 7 1.00 33.17409 cold Medium 1
## 8 0.75 37.03856 cold High 0
## 9 0.67 49.12025 cold Low 0
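`if_else(sugars > 10, 1, 0)` returns doubles; converting the logical comparison with `as.integer()` gives the same 0/1 indicator as integers. A sketch with made-up sugar values; a value of exactly 10 is not flagged because the condition is strictly greater than 10:

```r
sugars <- c(10, 14, 8, 11)          # made-up sugar values
# The comparison gives TRUE/FALSE; as.integer() maps TRUE -> 1 and FALSE -> 0
high_sugar <- as.integer(sugars > 10)
high_sugar                          # 0 1 0 1
```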
Write code to find the mean calories for cereals produced by Quaker Oats and the mean calories for cereals produced by Kelloggs, but this time also add an additional constraint for whether or not the cereal is high sugar.
Does the number of sugars in a cereal have any impact on its calories?
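quaker <- cereal[cereal$manufacturer == "Quaker Oats", ]
mean(quaker$calories[quaker$high_sugar == 1]) # high-sugar Quaker Oats cereals
## [1] 120
mean(quaker$calories[quaker$high_sugar == 0]) # other Quaker Oats cereals
## [1] 86.66667
kelloggs <- cereal[cereal$manufacturer == "Kelloggs", ]
mean(kelloggs$calories[kelloggs$high_sugar == 1]) # high-sugar Kelloggs cereals
## [1] 118.75
mean(kelloggs$calories[kelloggs$high_sugar == 0]) # other Kelloggs cereals
## [1] 103.3333
For both manufacturers, the high-sugar cereals have a higher mean calorie count than the other cereals (Quaker Oats: 120 vs. 86.67; Kelloggs: 118.75 vs. 103.33). This suggests that sugar content is associated with more calories, even though a higher sugar content means a less healthy cereal. With only 2 high-sugar Quaker Oats cereals and 8 high-sugar Kelloggs cereals, the comparison is rough.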
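`aggregate()` with a formula returns the mean calories for every combination of manufacturer and `high_sugar` at once, instead of one subset per group. A sketch on made-up rows:

```r
# Made-up rows covering both manufacturers and both sugar groups
toy <- data.frame(manufacturer = c("Quaker Oats", "Quaker Oats", "Kelloggs", "Kelloggs", "Kelloggs"),
                  high_sugar = c(1, 0, 1, 0, 0),
                  calories = c(120, 100, 110, 70, 90),
                  stringsAsFactors = FALSE)

# One row per (manufacturer, high_sugar) combination with its mean calories
group_means <- aggregate(calories ~ manufacturer + high_sugar, data = toy, FUN = mean)
group_means
```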
Notice that the dataset contains each cereal’s weight in ounces. Write a function that converts weight in ounces to weight in grams, rounded to 2 decimal places. Make sure you test the function you create.
HINT: You may find it useful to google the ounce to gram conversion before you implement in your code.
# This function converts ounces to grams, rounded to 2 decimal places.
# It uses the measurements package; install it once from the Console with
# install.packages("measurements") rather than inside this .Rmd file.
library(measurements)
ounces_to_grams <- function(ounces) {
  round(conv_unit(ounces, from = "oz", to = "g"), 2)
}
# Test the function
ounces_to_grams(c(0.5, 1, 1.5))
## [1] 14.17 28.35 42.52
Use the function you created above to add a new column called weight_grams to your cereal dataset, containing each cereal’s weight converted from ounces to grams.
cereal <- cereal %>% mutate(weight_grams = ounces_to_grams(weight_ounces))
cereal
## name manufacturer hot_cold
## 1 100% Bran Nabisco Cold
## 2 100% Natural Bran Quaker Oats Cold
## 3 All-Bran Kelloggs Cold
## 4 All-Bran with Extra Fiber Kelloggs Cold
## 5 Almond Delight Ralston Purina COLD
## 6 Apple Cinnamon Cheerios General Mills Cold
## 7 Apple Jacks Kelloggs Cold
## 8 Basic 4 General Mills Cold
## 9 Bran Chex Ralston Purina Cold
## 10 Bran Flakes Post Cold
## 11 Cap'n'Crunch Quaker Oats Cold
## 12 Cheerios General Mills Cold
## 13 Cinnamon Toast Crunch General Mills Cold
## 14 Clusters General Mills Cold
## 15 Cocoa Puffs General Mills Cold
## 16 Corn Chex Ralston Purina Cold
## 17 Corn Flakes Kelloggs Cold
## 18 Corn Pops Kelloggs Cold
## 19 Count Chocula General Mills Cold
## 20 Cracklin' Oat Bran Kelloggs Cold
## 21 Cream of Wheat (Quick) Nabisco Hot
## 22 Crispix Kelloggs Cold
## 23 Crispy Wheat & Raisins General Mills Cold
## 24 Double Chex Ralston Purina Cold
## 25 Froot Loops Kelloggs Cold
## 26 Frosted Flakes Kelloggs Cold
## 27 Frosted Mini-Wheats Kelloggs Cold
## 28 Fruit & Fibre Dates; Walnuts; and Oats Post Cold
## 29 Fruitful Bran Kelloggs Cold
## 30 Fruity Pebbles Post Cold
## 31 Golden Crisp Post Cold
## 32 Golden Grahams General Mills Cold
## 33 Grape Nuts Flakes Post Cold
## 34 Grape-Nuts Post Cold
## 35 Great Grains Pecan Post Cold
## 36 Honey Graham Ohs Quaker Oats Cold
## 37 Honey Nut Cheerios General Mills Cold
## 38 Honey-comb Post Cold
## 39 Just Right Crunchy Nuggets Kelloggs Cold
## 40 Just Right Fruit & Nut Kelloggs Cold
## 41 Kix General Mills Cold
## 42 Life Quaker Oats Cold
## 43 Lucky Charms General Mills Cold
## 44 Maypo American Food Products Hot
## 45 Muesli Raisins; Dates; & Almonds Ralston Purina Cold
## 46 Muesli Raisins; Peaches; & Pecans Ralston Purina Cold
## 47 Mueslix Crispy Blend Kelloggs Cold
## 48 Multi-Grain Cheerios General Mills Cold
## 49 Nut&Honey Crunch Kelloggs Cold
## 50 Nutri-Grain Almond-Raisin Kelloggs Cold
## 51 Nutri-grain Wheat Kelloggs Cold
## 52 Oatmeal Raisin Crisp General Mills Cold
## 53 Post Nat. Raisin Bran Post Cold
## 54 Product 19 Kelloggs Cold
## 55 Puffed Rice Quaker Oats Cold
## 56 Puffed Wheat Quaker Oats Cold
## 57 Quaker Oat Squares Quaker Oats Cold
## 58 Quaker Oatmeal Quaker Oats Hot
## 59 Raisin Bran Kelloggs Cold
## 60 Raisin Nut Bran General Mills Cold
## 61 Raisin Squares Kelloggs Cold
## 62 Rice Chex Ralston Purina Cold
## 63 Rice Krispies Kelloggs Cold
## 64 Shredded Wheat Nabisco CoLD
## 65 Shredded Wheat 'n'Bran Nabisco Cold
## 66 Shredded Wheat spoon size Nabisco Cold
## 67 Smacks Kelloggs Cold
## 68 Special K Kelloggs Cold
## 69 Strawberry Fruit Wheats Nabisco Cold
## 70 Total Corn Flakes General Mills Cold
## 71 Total Raisin Bran General Mills Cold
## 72 Total Whole Grain General Mills Cold
## 73 Triples General Mills Cold
## 74 Trix General Mills Cold
## 75 Wheat Chex Ralston Purina Cold
## 76 Wheaties General Mills Cold
## 77 Wheaties Honey Gold General Mills cold
## calories protein fat sodium fiber carbohydrates sugars potassium vitamins
## 1 70 4 1 130 10.0 5.0 6 280 25
## 2 120 3 5 15 2.0 8.0 8 135 0
## 3 70 4 1 260 9.0 7.0 5 320 25
## 4 50 4 0 140 14.0 8.0 0 330 25
## 5 110 2 2 200 1.0 14.0 8 -1 25
## 6 110 2 2 180 1.5 10.5 10 70 25
## 7 110 2 0 125 1.0 11.0 14 30 25
## 8 130 3 2 210 2.0 18.0 8 100 25
## 9 90 2 1 200 4.0 15.0 6 125 25
## 10 90 3 0 210 5.0 13.0 5 190 25
## 11 120 1 2 220 0.0 12.0 12 35 25
## 12 110 6 2 290 2.0 17.0 1 105 25
## 13 120 1 3 210 0.0 13.0 9 45 25
## 14 110 3 2 140 2.0 13.0 7 105 25
## 15 110 1 1 180 0.0 12.0 13 55 25
## 16 110 2 0 280 0.0 22.0 3 25 25
## 17 100 2 0 290 1.0 21.0 2 35 25
## 18 110 1 0 90 1.0 13.0 12 20 25
## 19 110 1 1 180 0.0 12.0 13 65 25
## 20 110 3 3 140 4.0 10.0 7 160 25
## 21 100 3 0 80 1.0 21.0 0 -1 0
## 22 110 2 0 220 1.0 21.0 3 30 25
## 23 100 2 1 140 2.0 11.0 10 120 25
## 24 100 2 0 190 1.0 18.0 5 80 25
## 25 110 2 1 125 1.0 11.0 13 30 25
## 26 110 1 0 200 1.0 14.0 11 25 25
## 27 100 3 0 0 3.0 14.0 7 100 25
## 28 120 3 2 160 5.0 12.0 10 200 25
## 29 120 3 0 240 5.0 14.0 12 190 25
## 30 110 1 1 135 0.0 13.0 12 25 25
## 31 100 2 0 45 0.0 11.0 15 40 25
## 32 110 1 1 280 0.0 15.0 9 45 25
## 33 100 3 1 140 3.0 15.0 5 85 25
## 34 110 3 0 170 3.0 17.0 3 90 25
## 35 120 3 3 75 3.0 13.0 4 100 25
## 36 120 1 2 220 1.0 12.0 11 45 25
## 37 110 3 1 250 1.5 11.5 10 90 25
## 38 110 1 0 180 0.0 14.0 11 35 25
## 39 110 2 1 170 1.0 17.0 6 60 100
## 40 140 3 1 170 2.0 20.0 9 95 100
## 41 110 2 1 260 0.0 21.0 3 40 25
## 42 100 4 2 150 2.0 12.0 6 95 25
## 43 110 2 1 180 0.0 12.0 12 55 25
## 44 100 4 1 0 0.0 16.0 3 95 25
## 45 150 4 3 95 3.0 16.0 11 170 25
## 46 150 4 3 150 3.0 16.0 11 170 25
## 47 160 3 2 150 3.0 17.0 13 160 25
## 48 100 2 1 220 2.0 15.0 6 90 25
## 49 120 2 1 190 0.0 15.0 9 40 25
## 50 140 3 2 220 3.0 21.0 7 130 25
## 51 90 3 0 170 3.0 18.0 2 90 25
## 52 130 3 2 170 1.5 13.5 10 120 25
## 53 120 3 1 200 6.0 11.0 14 260 25
## 54 100 3 0 320 1.0 20.0 3 45 100
## 55 50 1 0 0 0.0 13.0 0 15 0
## 56 50 2 0 0 1.0 10.0 0 50 0
## 57 100 4 1 135 2.0 14.0 6 110 25
## 58 100 5 2 0 2.7 -1.0 -1 110 0
## 59 120 3 1 210 5.0 14.0 12 240 25
## 60 100 3 2 140 2.5 10.5 8 140 25
## 61 90 2 0 0 2.0 15.0 6 110 25
## 62 110 1 0 240 0.0 23.0 2 30 25
## 63 110 2 0 290 0.0 22.0 3 35 25
## 64 80 2 0 0 3.0 16.0 0 95 0
## 65 90 3 0 0 4.0 19.0 0 140 0
## 66 90 3 0 0 3.0 20.0 0 120 0
## 67 110 2 1 70 1.0 9.0 15 40 25
## 68 110 6 0 230 1.0 16.0 3 55 25
## 69 90 2 0 15 3.0 15.0 5 90 25
## 70 110 2 1 200 0.0 21.0 3 35 100
## 71 140 3 1 190 4.0 15.0 14 230 100
## 72 100 3 1 200 3.0 16.0 3 110 100
## 73 110 2 1 250 0.0 21.0 3 60 25
## 74 110 1 1 140 0.0 13.0 12 25 25
## 75 100 3 1 230 3.0 17.0 3 115 25
## 76 100 3 1 200 3.0 17.0 3 110 25
## 77 110 2 1 200 1.0 16.0 8 60 25
## display_shelf weight_ounces cups_in_serving rating hot_cold_new
## 1 3 1.00 0.33 68.40297 cold
## 2 3 1.00 1.00 33.98368 cold
## 3 3 1.00 0.33 59.42551 cold
## 4 3 1.00 0.50 93.70491 cold
## 5 3 1.00 0.75 34.38484 cold
## 6 1 1.00 0.75 29.50954 cold
## 7 2 1.00 1.00 33.17409 cold
## 8 3 1.33 0.75 37.03856 cold
## 9 1 1.00 0.67 49.12025 cold
## 10 3 1.00 0.67 53.31381 cold
## 11 2 1.00 0.75 18.04285 cold
## 12 1 1.00 1.25 50.76500 cold
## 13 2 1.00 0.75 19.82357 cold
## 14 3 1.00 0.50 40.40021 cold
## 15 2 1.00 1.00 22.73645 cold
## 16 1 1.00 1.00 41.44502 cold
## 17 1 1.00 1.00 45.86332 cold
## 18 2 1.00 1.00 35.78279 cold
## 19 2 1.00 1.00 22.39651 cold
## 20 3 1.00 0.50 40.44877 cold
## 21 2 1.00 1.00 64.53382 hot
## 22 3 1.00 1.00 46.89564 cold
## 23 3 1.00 0.75 36.17620 cold
## 24 3 1.00 0.75 44.33086 cold
## 25 2 1.00 1.00 32.20758 cold
## 26 1 1.00 0.75 31.43597 cold
## 27 2 1.00 0.80 58.34514 cold
## 28 3 1.25 0.67 40.91705 cold
## 29 3 1.33 0.67 41.01549 cold
## 30 2 1.00 0.75 28.02576 cold
## 31 1 1.00 0.88 35.25244 cold
## 32 2 1.00 0.75 23.80404 cold
## 33 3 1.00 0.88 52.07690 cold
## 34 3 1.00 0.25 53.37101 cold
## 35 3 1.00 0.33 45.81172 cold
## 36 2 1.00 1.00 21.87129 cold
## 37 1 1.00 0.75 31.07222 cold
## 38 1 1.00 1.33 28.74241 cold
## 39 3 1.00 1.00 36.52368 cold
## 40 3 1.30 0.75 36.47151 cold
## 41 2 1.00 1.50 39.24111 cold
## 42 2 1.00 0.67 45.32807 cold
## 43 2 1.00 1.00 26.73451 cold
## 44 2 1.00 1.00 54.85092 hot
## 45 3 1.00 1.00 37.13686 cold
## 46 3 1.00 1.00 34.13976 cold
## 47 3 1.50 0.67 30.31335 cold
## 48 1 1.00 1.00 40.10596 cold
## 49 2 1.00 0.67 29.92429 cold
## 50 3 1.33 0.67 40.69232 cold
## 51 3 1.00 1.00 59.64284 cold
## 52 3 1.25 0.50 30.45084 cold
## 53 3 1.33 0.67 37.84059 cold
## 54 3 1.00 1.00 41.50354 cold
## 55 3 0.50 1.00 60.75611 cold
## 56 3 0.50 1.00 63.00565 cold
## 57 3 1.00 0.50 49.51187 cold
## 58 1 1.00 0.67 50.82839 hot
## 59 2 1.33 0.75 39.25920 cold
## 60 3 1.00 0.50 39.70340 cold
## 61 3 1.00 0.50 55.33314 cold
## 62 1 1.00 1.13 41.99893 cold
## 63 1 1.00 1.00 40.56016 cold
## 64 1 0.83 1.00 68.23588 cold
## 65 1 1.00 0.67 74.47295 cold
## 66 1 1.00 0.67 72.80179 cold
## 67 2 1.00 0.75 31.23005 cold
## 68 1 1.00 1.00 53.13132 cold
## 69 2 1.00 1.00 59.36399 cold
## 70 3 1.00 1.00 38.83975 cold
## 71 3 1.50 1.00 28.59278 cold
## 72 3 1.00 1.00 46.65884 cold
## 73 3 1.00 0.75 39.10617 cold
## 74 2 1.00 1.00 27.75330 cold
## 75 1 1.00 0.67 49.78744 cold
## 76 1 1.00 1.00 51.59219 cold
## 77 1 1.00 0.75 36.18756 cold
## calorie_bins high_sugar weight_grams
## 1 Low 0 28.34952
## 2 Medium 0 28.34952
## 3 Low 0 28.34952
## 4 Low 0 28.34952
## 5 Medium 0 28.34952
## 6 Medium 0 28.34952
## 7 Medium 1 28.34952
## 8 High 0 37.70487
## 9 Low 0 28.34952
## 10 Low 0 28.34952
## 11 Medium 1 28.34952
## 12 Medium 0 28.34952
## 13 Medium 0 28.34952
## 14 Medium 0 28.34952
## 15 Medium 1 28.34952
## 16 Medium 0 28.34952
## 17 Low 0 28.34952
## 18 Medium 1 28.34952
## 19 Medium 1 28.34952
## 20 Medium 0 28.34952
## 21 Low 0 28.34952
## 22 Medium 0 28.34952
## 23 Low 0 28.34952
## 24 Low 0 28.34952
## 25 Medium 1 28.34952
## 26 Medium 1 28.34952
## 27 Low 0 28.34952
## 28 Medium 0 35.43690
## 29 Medium 1 37.70487
## 30 Medium 1 28.34952
## 31 Low 1 28.34952
## 32 Medium 0 28.34952
## 33 Low 0 28.34952
## 34 Medium 0 28.34952
## 35 Medium 0 28.34952
## 36 Medium 1 28.34952
## 37 Medium 0 28.34952
## 38 Medium 1 28.34952
## 39 Medium 0 28.34952
## 40 High 0 36.85438
## 41 Medium 0 28.34952
## 42 Low 0 28.34952
## 43 Medium 1 28.34952
## 44 Low 0 28.34952
## 45 High 1 28.34952
## 46 High 1 28.34952
## 47 High 1 42.52428
## 48 Low 0 28.34952
## 49 Medium 0 28.34952
## 50 High 0 37.70487
## 51 Low 0 28.34952
## 52 High 0 35.43690
## 53 Medium 1 37.70487
## 54 Low 0 28.34952
## 55 Low 0 14.17476
## 56 Low 0 14.17476
## 57 Low 0 28.34952
## 58 Low 0 28.34952
## 59 Medium 1 37.70487
## 60 Low 0 28.34952
## 61 Low 0 28.34952
## 62 Medium 0 28.34952
## 63 Medium 0 28.34952
## 64 Low 0 23.53010
## 65 Low 0 28.34952
## 66 Low 0 28.34952
## 67 Medium 1 28.34952
## 68 Medium 0 28.34952
## 69 Low 0 28.34952
## 70 Medium 0 28.34952
## 71 High 1 42.52428
## 72 Low 0 28.34952
## 73 Medium 0 28.34952
## 74 Medium 1 28.34952
## 75 Low 0 28.34952
## 76 Low 0 28.34952
## 77 Medium 0 28.34952
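The conversion itself does not need the `measurements` package: one avoirdupois ounce is defined as exactly 28.349523125 grams. A minimal base R sketch; `oz_to_g` and `grams_per_ounce` are hypothetical names chosen so they do not clash with anything above:

```r
# 1 international avoirdupois ounce = 28.349523125 g (exact by definition)
grams_per_ounce <- 28.349523125

# Hypothetical helper: convert ounces to grams, rounded to `digits` decimal places
oz_to_g <- function(ounces, digits = 2) {
  round(ounces * grams_per_ounce, digits)
}

oz_to_g(c(0.5, 1, 1.33))   # 14.17 28.35 37.70
```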
Write a basic for loop that prints the column name and class type of the first 16 columns in the cereal data set. You can check the class of a vector by using the class() function. The output format should be the following:
The variable ___ has class type ____.
names_columns <- c(colnames(cereal))
names_columns
## [1] "name" "manufacturer" "hot_cold" "calories"
## [5] "protein" "fat" "sodium" "fiber"
## [9] "carbohydrates" "sugars" "potassium" "vitamins"
## [13] "display_shelf" "weight_ounces" "cups_in_serving" "rating"
## [17] "hot_cold_new" "calorie_bins" "high_sugar" "weight_grams"
for (col in colnames(cereal)[1:16]) {
  # class() of the column itself, not of its name; print() is needed inside a loop
  print(paste0("The variable ", col, " has class type ", class(cereal[[col]]), "."))
}
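The question asks for a loop, but `vapply()` collects every column's class in one call and is a convenient way to check the loop's output. A sketch on a made-up data frame; because `class()` can return more than one string (for example for ordered factors), this sketch keeps only the first:

```r
# Made-up data frame with four different column types
toy <- data.frame(name = c("Trix", "Kix"),
                  calories = c(110L, 110L),
                  rating = c(27.8, 39.2),
                  hot_cold = factor(c("Cold", "Cold")),
                  stringsAsFactors = FALSE)

# First class of each column; vapply() checks that each result is one string
col_classes <- vapply(toy, function(col) class(col)[1], character(1))

# Same sentence format as the loop above
for (col in names(col_classes)) {
  print(paste0("The variable ", col, " has class type ", col_classes[[col]], "."))
}
```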
Write a function that calculates the mean of a numeric vector x, ignoring the s smallest and l largest values (this is a trimmed mean).
E.g., if x = c(1, 7, 3, 2, 5, 0.5, 9, 10), s = 1, and l = 2, your function would return the mean of c(1, 7, 3, 2, 5) (this is x with the 1 smallest value (0.5) and the 2 largest values (9, 10) removed).
Your function should use the length() function to check if x has at least s + l + 1 values. If x is shorter than s + l + 1, your function should use the message() function to tell the user that the vector can’t be trimmed as requested. If x is at least length s + l + 1, your function should return the trimmed mean.
It is useful to break down this problem into its various parts before you start writing the code. E.g.:
Step 1: Get the smallest and largest values of the vector (It may be useful to recall the sort() function we learned about)
Step 2: Check the length of the vector
Step 3: IF the length of the vector is less than s+l+1 THEN give message ELSE
Step 4: Get the mean of all values of the vector excluding the s smallest and l largest elements
HINT: Remember, there are many ways to write this function. ONE way you might consider is to calculate the mean of only the relevant indexes of the vector after sorting.
# Here's a function skeleton to get you started
# x <- c(sort(x))
# s <- min(x)
# l <- max(x)
# min_length <- length(s)+length(l)+1
#
# if(length(x) >= min_length){
# paste0("The trimmed mean after removing the smallest and largest value of the vector is : ", mean(x, trim=0.2))
# }else {
# sprintf("The vector can not be trimmed as requested because the length of vector is very small")
# }
# This function calculates the mean of a numeric vector, ignoring the
# s smallest and l largest values (a trimmed mean)
trimmedMean <- function(x, s = 0, l = 0) {
  # Steps 2-3: x must have at least s + l + 1 values
  if (length(x) < s + l + 1) {
    message("Sorry! The vector can't be trimmed as requested because it has fewer than s + l + 1 values.")
    return(invisible(NA_real_))
  }
  # Steps 1 and 4: sort, then keep positions (s + 1) through (length(x) - l)
  x_sorted <- sort(x)
  mean(x_sorted[(s + 1):(length(x) - l)])
}
x <- c(1, 7, 3, 2, 5, 0.5, 9, 10)
trimmedMean(x, s = 1, l = 2)
## [1] 3.6
Note: The s = 0 and l = 0 specified in the function definition are the default settings. i.e., this syntax ensures that if s and l are not provided by the user, they are both set to 0. Thus the default behaviour is that the trimmedMean function doesn’t trim anything, and hence is the same as the mean function.
The below code creates a random list for you to apply your new function on.
set.seed(201802) # Sets seed to make sure everyone's random vectors are generated the same
list.random <- list(x = rnorm(50),
y = rexp(65),
z = rt(100, df = 1.5))
# Here's a Figure showing histograms of the data
par(mfrow = c(1,3))
hist(list.random$x, breaks = 15, col = 'grey')
hist(list.random$y, breaks = 10, col = 'forestgreen')
hist(list.random$z, breaks = 20, col = 'steelblue')
Using a for loop and your function from part (a), create a vector whose elements are the trimmed means of the vectors in list.random, taking s = 5 and l = 5.
Note: you will need to create an empty vector first to store the results of your for loop in.
trimmed_means5 <- c()
for (vec in list.random) {
  trimmed_means5[length(trimmed_means5) + 1] <- trimmedMean(vec, s = 5, l = 5) ## append to the vector, growing its length by one each time
}
trimmed_means5
## [1] -0.1961599 1.0634269 0.3059701
means <- c()
for (vec in list.random) {
  means[length(means) + 1] <- mean(vec) ## append to the vector, growing its length by one each time
}
means
## [1] -0.22171186 1.03795962 0.06563583
Explanation:
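Based on the output above, removing the five smallest and five largest values changes the means of x and y by only about 0.03, but changes the mean of z by about 0.24. This is consistent with z being drawn from a heavy-tailed t distribution with 1.5 degrees of freedom, whose extreme values have a large influence on the untrimmed mean.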
Repeat part (b), using the lapply and sapply functions instead of a for loop. Your lapply command should return a list of trimmed means, and your sapply command should return a vector of trimmed means.
lapply(list.random, trimmedMean, s = 5, l = 5)
## $x
## [1] -0.1961599
##
## $y
## [1] 1.063427
##
## $z
## [1] 0.3059701
sapply(list.random, trimmedMean, s = 5, l = 5)
## x y z
## -0.1961599 1.0634269 0.3059701
Hint: lapply and sapply can take arguments that you wish to pass to the trimmedMean function. E.g., if you were applying the function sort, which has an argument decreasing, you could use the syntax lapply(..., FUN = sort, decreasing = TRUE).
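A small sketch of the hint, using a made-up list: named arguments placed after the function are forwarded to it by both `lapply()` and `sapply()`.

```r
vals <- list(a = c(3, 1, 2), b = c(9, 7, 8))   # made-up list of vectors

# decreasing = TRUE is passed through to sort() for every element
sorted_vals <- lapply(vals, sort, decreasing = TRUE)
sorted_vals                  # list(a = c(3, 2, 1), b = c(9, 8, 7))

# sapply() simplifies one number per element into a named vector
max_vals <- sapply(vals, max)
max_vals                     # a = 3, b = 9
```

With the function from part (a), the same pattern is `sapply(list.random, trimmedMean, s = 5, l = 5)`.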