1 Introduction

First, save the Rmd file in a folder with the rest of your course work, and put the file cereal.csv in the same folder. In RStudio, go to Session > Set Working Directory > To Source File Location. You can then run the code below to load the data.

cereal <- read.csv("cereal.csv")
attach(cereal)

The object cereal is a data frame with 77 rows and 16 columns. Each of the 77 cereals has nutrient information recorded on a per serving basis, along with the number of cups in one serving. Full details on the variable names are given below.

name: name of cereal
mfr: manufacturer of cereal
- A = American Home Food Products
- G = General Mills
- K = Kelloggs
- N = Nabisco
- P = Post
- Q = Quaker Oats
- R = Ralston Purina
type: (C) cold, (H) hot
calories: calories per serving
protein: grams of protein
fat: grams of fat
sodium: milligrams of sodium
fiber: grams of dietary fiber
carbo: grams of complex carbohydrates
sugars: grams of sugars
potass: milligrams of potassium
vitamins: vitamins and minerals - 0, 25, or 100, indicating the typical percentage of FDA recommended
shelf: display shelf (1, 2, or 3, counting from the floor)
weight: weight in ounces of one serving
cups: number of cups in one serving
rating: a rating of the cereals

The R command attach(cereal) allows you to access each vector in the data frame by simply typing the variable’s name in R. For example,

potass

 [1] 280 135 320 330  -1  70  30 100 125 190  35 105  45 105  55  25  35
[18]  20  65 160  -1  30 120  80  30  25 100 200 190  25  40  45  85  90
[35] 100  45  90  35  60  95  40  95  55  95 170 170 160  90  40 130  90
[52] 120 260  45  15  50 110 110 240 140 110  30  35  95 140 120  40  55
[69]  90  35 230 110  60  25 115 110  60

displays the vector of potassium values from the cereal data frame.
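As a minimal sketch of what attach() does, and of two ways to reach a column without it, consider the toy data frame below. The object toy and its values are made up for illustration and are not taken from cereal.csv.

```r
# Toy stand-in for the cereal data frame; the values are made up.
toy <- data.frame(
  name   = c("Cereal A", "Cereal B", "Cereal C"),
  potass = c(280, 135, 320),
  stringsAsFactors = FALSE
)

# Access a column explicitly with $ ...
toy$potass

# ... or evaluate an expression inside the data frame with with().
mean_potass <- with(toy, mean(potass))
mean_potass

# attach() places the columns on the search path so that typing
# potass finds toy$potass; detach() removes them again.
attach(toy)
potass
detach(toy)
```

Calling detach() once you are finished keeps old columns from lingering on the search path when you move on to another data set.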

2 Operators

2.1 Assignment operator

Objects and variables are created with the assignment operator, <- or =. Remember that in RStudio you can use the shortcut Alt + - to insert the <- assignment operator; the shortcut also adds a space before and after the operator. While the equal sign also works, it is best practice to stick to <-.
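One reason to prefer <- is that = has a second job inside function calls, where it matches an argument by name. The short sketch below shows the difference; the object names a, b, m1, m2, and total are hypothetical.

```r
# At the top level, both operators create an object.
a <- 5
b = 5
identical(a, b)

# Inside a function call the two behave differently:
# `=` names an argument, while `<-` assigns and then passes the value along.
m1 <- mean(x = c(2, 4, 6))        # x is only the argument name of mean()
m2 <- mean(total <- c(2, 4, 6))   # also creates the object `total`
total
```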

Let’s look at the assignment operator in action. We can compute the protein to fat ratio per serving by

protein.to.fat <- protein / fat
head(protein.to.fat)

[1] 4.0 0.6 4.0 Inf 1.0 1.0

The protein to carbohydrate ratio can be computed as

protein.to.carb <- protein / carbo
head(protein.to.carb)

[1] 0.8000000 0.3750000 0.5714286 0.5000000 0.1428571 0.1904762
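The Inf in the protein to fat output above arises because R returns Inf, rather than an error, when a positive number is divided by zero. The sketch below uses made-up vectors (protein_demo and fat_demo, chosen to reproduce the first four ratios shown) to illustrate this and one possible way to set such values aside.

```r
# Illustrative values only; not read from cereal.csv.
protein_demo <- c(4, 3, 4, 4)
fat_demo     <- c(1, 5, 1, 0)

# Division is element-wise; 4 / 0 gives Inf.
ratio_demo <- protein_demo / fat_demo
ratio_demo

# is.finite() is FALSE for Inf, -Inf, NaN, and NA.
is.finite(ratio_demo)

# Summarize only the finite ratios.
mean(ratio_demo[is.finite(ratio_demo)])
```

Whether to drop or keep infinite ratios depends on the question being asked; the point here is only that they need to be noticed before computing summaries.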

Try to be descriptive and concise with your variable names. For example, protein.to.fat is better than ptf.

2.2 Comparison operators

Comparison operators are binary: they take two objects and return a logical value (TRUE or FALSE). The last two rows of the table, & and |, are logical operators that combine such values.

Command	Description
`>`	greater than
`<`	less than
`>=`	greater than or equal to
`<=`	less than or equal to
`==`	equal to (recall that `=` is for assignment and not checking equality)
`!=`	not equal to
`&`	and (ex: `(5 > 7) & (6*7 == 42)` will return the value FALSE)
`|`	or (ex: `(5 > 7) | (6*7 == 42)` will return the value TRUE)
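The operators in the table also work element by element on vectors. The following sketch uses a made-up vector, cal_demo, rather than the cereal data.

```r
# Made-up calorie values for illustration.
cal_demo <- c(70, 120, 100, 110, 50)

cal_demo > 100                           # FALSE  TRUE FALSE  TRUE FALSE
cal_demo == 100                          # FALSE FALSE  TRUE FALSE FALSE
(cal_demo >= 100) & (cal_demo != 110)    # FALSE  TRUE  TRUE FALSE FALSE
(cal_demo < 60) | (cal_demo > 115)       # FALSE  TRUE FALSE FALSE  TRUE

# In arithmetic, TRUE counts as 1 and FALSE as 0,
# so sum() counts how many comparisons are TRUE.
sum(cal_demo > 100)                      # 2
```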

2.3 Exercises

Give an example using each of the comparison operators.
How many cereals have more than 100 calories per serving?
How many cereals have 0 grams of sugar per serving?

3 Vectors

3.1 Vector creation

The function c combines its arguments to form a vector. All arguments are coerced to a common type which is the type of the returned value, and all attributes except names are removed.

To create a character vector of names and a numeric vector of numbers, consider

school <- c("MSU", "UM", "OSU", "PSU")
number <- c(50085, 43625, 58322, 45518)

school

[1] "MSU" "UM"  "OSU" "PSU"

number

[1] 50085 43625 58322 45518
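The coercion rule described at the start of this section can be seen by mixing types in a single call to c. This is a small sketch; the object names mixed, num_lgl, and named are hypothetical.

```r
# Numbers and logicals mixed with a string are all coerced to character.
mixed <- c(1, "a", TRUE)
mixed            # "1" "a" "TRUE"
class(mixed)     # "character"

# Logicals mixed with numbers are coerced to numeric (TRUE = 1, FALSE = 0).
num_lgl <- c(1.5, TRUE, FALSE)
num_lgl          # 1.5 1.0 0.0
class(num_lgl)   # "numeric"

# Names supplied in the call are kept.
named <- c(first = 1, second = 2)
names(named)     # "first" "second"
```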

3.2 Vector subsetting

Vectors can be subset by position or name (when applicable). For example, consider the vector x, where

x <- letters[1:10]
x

 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

x[5]

[1] "e"

x[c(1, 5, 9)]

[1] "a" "e" "i"

x[-c(4, 6, 10)]

[1] "a" "b" "c" "e" "g" "h" "i"

x[11]

[1] NA

As another example,

grades <- c(99, 85, 89, 92)
names(grades) <- c("STT", "CMSE", "MTH", "STT")
grades

 STT CMSE  MTH  STT 
  99   85   89   92

grades["STT"]

STT 
 99

grades["CMSE"]

CMSE 
  85

grades[4] <- 82
grades[4]

STT 
 82
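Note that the name "STT" appears twice in the example above, and subsetting by name returns only the first match. The sketch below, which uses a separate vector grades_demo holding the original grade values, shows one way to retrieve every element with a given name.

```r
# A named vector with a duplicated name, built in one call to c().
grades_demo <- c(STT = 99, CMSE = 85, MTH = 89, STT = 92)

# Subsetting by name returns the first matching element only.
grades_demo["STT"]

# Comparing against names() returns every match.
grades_demo[names(grades_demo) == "STT"]

# Several names can be requested at once.
grades_demo[c("MTH", "CMSE")]
```

Unique names avoid this ambiguity altogether.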

3.3 Some built-in functions that work on vectors

Assume that vec is a vector of appropriate variable type.

Command	Description
`sum(vec)`	sums up all the elements of `vec`
`mean(vec)`	mean of `vec`
`median(vec)`	median of `vec`
`min(vec), max(vec)`	the smallest or largest element of `vec`
`sd(vec), var(vec)`	the standard deviation and variance of `vec`
`length(vec)`	the number of components in `vec`
`sort(vec)`	returns `vec` sorted in ascending order (use `decreasing = TRUE` for descending order)
`order(vec)`	returns the indices that sort `vec`
`unique(vec)`	lists the unique elements of `vec`
`summary(vec)`	computes the five-number summary and the mean (for numeric `vec`)
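The difference between sort and order is easy to miss: sort returns the values, while order returns positions that can be used to rearrange another vector. A small sketch with made-up vectors sugar_demo and label_demo:

```r
# Made-up values for illustration.
sugar_demo <- c(6, 12, 3, 9)
label_demo <- c("w", "x", "y", "z")

sort(sugar_demo)                      # 3 6 9 12
sort(sugar_demo, decreasing = TRUE)   # 12 9 6 3

# order() gives the positions of the smallest, second smallest, ... values.
order(sugar_demo)                     # 3 1 4 2

# Use those positions to rearrange a companion vector.
label_demo[order(sugar_demo)]         # "y" "w" "z" "x"

# summary() reports six values for a numeric vector.
summary(sugar_demo)
```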

3.4 Exercises

Use cereal to answer the questions that follow. You may find some of the above functions helpful. Use a single code chunk to answer each question. Below is a helpful example on subsetting a vector with a logical vector.

x <- c(1, 4, -5, sqrt(3), exp(2))
y <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
x[y]

[1]  1.000000 -5.000000  1.732051
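The logical vector does not have to be typed by hand; a comparison produces one directly. The following sketch uses the hypothetical names x_demo and keep with an arbitrary cutoff of 1.5.

```r
x_demo <- c(1, 4, -5, sqrt(3), exp(2))

# A comparison returns a logical vector of the same length.
keep <- x_demo > 1.5
keep                    # FALSE  TRUE FALSE  TRUE  TRUE

# Use it to subset, either via the stored vector or inline.
x_demo[keep]
x_demo[x_demo > 1.5]

# which() returns the positions where the condition is TRUE.
which(x_demo > 1.5)     # 2 4 5
```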

How many cereals are in the data set cereal?
What is the average number of grams of sugar per serving among all cereals?
What are the three largest grams of sugar per serving values?
What are the five smallest grams of sugar per serving values?
Which four cereals have the most grams of sugar per serving?
How many cereals have more than 4 grams of protein per serving and less than 3 grams of sugar per serving?

Where can I find the sugary cereals?

What is the mean amount of grams of sugar per serving for cereals on shelf 1, 2, and 3, respectively?
Plot the relationship between shelf and grams of sugar per serving.

Comparison box plots would be better.

boxplot(sugars ~ shelf, xlab = "Shelf", ylab = "Sugar per serving (grams)")

Do you see any issue with analyzing and comparing the data on a per serving basis? Think of and propose a better way given the available data.
Redo the above boxplot using the new normalized metric you proposed in the previous exercise.

Do these results align with your expectations as to where you would find the sugary cereals?

4 References

https://perso.telecom-paristech.fr/eagan/class/igr204/datasets

Sugary Cereal Vectors

Shawn Santo

January 17, 2019