1 Introduction

First, save the Rmd file in a folder with the rest of your coursework. Put the file cereal.csv in the same folder. In RStudio, go to Session > Set Working Directory > To Source File Location. You can now run the code below to load the data.

cereal <- read.csv("cereal.csv")
attach(cereal)
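
If read.csv() fails because it cannot find the file, the working directory is usually not the folder that contains cereal.csv. A quick check, sketched here with base R functions only (csv_found is a name made up for this example):

```r
# Print the folder that R currently treats as the working directory
getwd()

# TRUE if cereal.csv is visible from that folder, FALSE otherwise
csv_found <- file.exists("cereal.csv")
csv_found
```

If csv_found is FALSE, set the working directory again as described above and rerun the check.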

The object cereal is a data frame with 77 rows and 16 columns. Each of the 77 cereals has nutrient information recorded on a per serving basis. The number of cups in one serving is also given. Full details on the variable names are given below.

  • name: name of cereal
  • mfr: manufacturer of cereal
    • A = American Home Food Products
    • G = General Mills
    • K = Kelloggs
    • N = Nabisco
    • P = Post
    • Q = Quaker Oats
    • R = Ralston Purina
  • type: (C) cold, (H) hot
  • calories: calories per serving
  • protein: grams of protein
  • fat: grams of fat
  • sodium: milligrams of sodium
  • fiber: grams of dietary fiber
  • carbo: grams of complex carbohydrates
  • sugars: grams of sugars
  • potass: milligrams of potassium
  • vitamins: vitamins and minerals - 0, 25, or 100, indicating the typical percentage of FDA recommended
  • shelf: display shelf (1, 2, or 3, counting from the floor)
  • weight: weight in ounces of one serving
  • cups: number of cups in one serving
  • rating: a rating of the cereals
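
A few base R functions make it easy to confirm the shape and column types of a data frame after loading it. The sketch below uses a small made-up data frame, toy, as a stand-in for cereal so that it runs without the CSV file. All values in it are invented for illustration.

```r
# Hypothetical stand-in for cereal: three rows, values invented for illustration
toy <- data.frame(
  name   = c("Cereal A", "Cereal B", "Cereal C"),
  mfr    = c("G", "K", "Q"),
  sugars = c(6, 12, 0)
)

dim(toy)      # number of rows, then number of columns
names(toy)    # column names
str(toy)      # type and first few values of each column
head(toy, 2)  # first two rows
```

Running dim(cereal) on the real data should report 77 rows and 16 columns.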

The R command attach(cereal) allows you to access each vector in the data frame by simply typing the variable’s name in R. For example,

potass
 [1] 280 135 320 330  -1  70  30 100 125 190  35 105  45 105  55  25  35
[18]  20  65 160  -1  30 120  80  30  25 100 200 190  25  40  45  85  90
[35] 100  45  90  35  60  95  40  95  55  95 170 170 160  90  40 130  90
[52] 120 260  45  15  50 110 110 240 140 110  30  35  95 140 120  40  55
[69]  90  35 230 110  60  25 115 110  60

displays the vector of potassium values from the cereal data frame.
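
Two common alternatives reach the same column without attaching the data frame: the $ operator and with(). The sketch below uses a made-up data frame, toy, rather than cereal. Only its three potassium values are copied from the start of the output above; the names are invented.

```r
# Hypothetical stand-in for cereal; potass holds the first three values shown above
toy <- data.frame(name   = c("Cereal A", "Cereal B", "Cereal C"),
                  potass = c(280, 135, 320))

toy$potass                # extract one column as a vector
toy[["potass"]]           # same result, with the column name given as a string
with(toy, mean(potass))   # evaluate an expression using the columns of toy
```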

2 Operators

2.1 Assignment operator

Objects and variables are created with the assignment operator, <- or =. Remember that in RStudio you can use the shortcut Alt + - (Option + - on a Mac) to insert the <- assignment operator. The shortcut includes convenient spacing before and after the operator. While the equal sign will work, it is best practice to stick to using <-.
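
One practical difference between the two: inside a function call, = names an argument, whereas <- still performs an assignment. A small sketch, where total and values are names made up for this example:

```r
total <- 5 + 7          # assignment with <-
total = 5 + 7           # also works at the top level, but <- is preferred

# Inside a call, = matches the argument named x; no new object is created
median(x = c(3, 1, 2))

# Inside a call, <- creates values in the workspace and then passes it on
median(values <- c(3, 1, 2))
values
```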

Let’s look at the assignment operator in action. We can compute the protein to fat ratio per serving by

protein.to.fat <- protein / fat
head(protein.to.fat)
[1] 4.0 0.6 4.0 Inf 1.0 1.0
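
The Inf in the fourth position appears because that cereal has 0 grams of fat, and R returns Inf when a positive number is divided by zero. A short sketch of the cases and one way to screen them out, where ratio is a made-up vector for illustration:

```r
4 / 0    # Inf: positive number divided by zero
0 / 0    # NaN: zero divided by zero is undefined

# Made-up protein and fat values, including zero fat
ratio <- c(4, 3, 4, 0) / c(1, 5, 0, 0)
ratio

# is.finite() is FALSE for Inf, -Inf, NaN, and NA
is.finite(ratio)
mean(ratio[is.finite(ratio)])  # average of the finite ratios only
```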

The protein to carbohydrate ratio can be computed as

protein.to.carb <- protein / carbo
head(protein.to.carb)
[1] 0.8000000 0.3750000 0.5714286 0.5000000 0.1428571 0.1904762

Try to be descriptive and concise with your variable names. For example, protein.to.fat is better than ptf.

2.2 Comparison operators

Comparison operators are binary: they take two objects and return a logical value (TRUE or FALSE). The logical operators & and | combine the results of comparisons.

Command   Description
>         greater than
<         less than
>=        greater than or equal to
<=        less than or equal to
==        equal to (recall that = is for assignment, not for checking equality)
!=        not equal to
&         and (ex: (5 > 7) & (6*7 == 42) returns FALSE)
|         or (ex: (5 > 7) | (6*7 == 42) returns TRUE)
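
These operators work element by element on vectors. Because TRUE counts as 1 and FALSE as 0, applying sum() to a comparison counts how many elements satisfy it. A sketch with a made-up vector, temps:

```r
temps <- c(68, 75, 80, 59, 75)   # made-up values for illustration

temps > 70                  # one TRUE/FALSE per element
temps == 75                 # test for equality, not assignment
temps > 60 & temps < 80     # both conditions must hold
temps < 60 | temps > 78     # at least one condition must hold

sum(temps > 70)             # number of elements greater than 70
```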

2.3 Exercises

  1. Give an example using each of the comparison operators.
  2. How many cereals have more than 100 calories per serving?
  3. How many cereals have 0 grams of sugar per serving?

3 Vectors

3.1 Vector creation

The function c combines its arguments to form a vector. All arguments are coerced to a common type which is the type of the returned value, and all attributes except names are removed.

To create a vector of names and a vector of numbers, consider

school <- c("MSU", "UM", "OSU", "PSU")
number <- c(50085, 43625, 58322, 45518)

school
[1] "MSU" "UM"  "OSU" "PSU"
number
[1] 50085 43625 58322 45518
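
The coercion rule is easiest to see by mixing types in a single call to c(). The sketch below uses made-up objects (mixed_chr, mixed_num, enrollment), and class() reports the common type that results.

```r
mixed_chr <- c(1, "a", TRUE)   # any character element makes everything character
mixed_chr
class(mixed_chr)

mixed_num <- c(1.5, TRUE, FALSE)   # logical values become 1 and 0
mixed_num
class(mixed_num)

# Names survive the call to c(); here they label each element
enrollment <- c(MSU = 50085, UM = 43625)
names(enrollment)
```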

3.2 Vector subsetting

Vectors can be subset by position or name (when applicable). For example, consider the vector x, where

x <- letters[1:10]
x
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
x[5]
[1] "e"
x[c(1, 5, 9)]
[1] "a" "e" "i"
x[-c(4, 6, 10)]
[1] "a" "b" "c" "e" "g" "h" "i"
x[11]
[1] NA

As another example,

grades <- c(99, 85, 89, 92)
names(grades) <- c("STT", "CMSE", "MTH", "STT")
grades
 STT CMSE  MTH  STT 
  99   85   89   92 
grades["STT"]
STT 
 99 
grades["CMSE"]
CMSE 
  85 
grades[4] <- 82
grades[4]
STT 
 82 
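
Note that grades["STT"] returned only the first element named STT, even though two elements carry that name. To pull every match, one option is to compare names() against the target and subset with the resulting logical vector, sketched here with the original grades values:

```r
grades <- c(99, 85, 89, 92)
names(grades) <- c("STT", "CMSE", "MTH", "STT")

grades["STT"]                    # first match only
grades[names(grades) == "STT"]   # every element named STT
grades[c("MTH", "CMSE")]         # several names at once, in the order requested
```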

3.3 Some built-in functions that work on vectors

Assume that vec is a vector of appropriate variable type.

Command              Description
sum(vec)             sums up all the elements of vec
mean(vec)            mean of vec
median(vec)          median of vec
min(vec), max(vec)   the smallest or largest element of vec
sd(vec), var(vec)    the standard deviation and variance of vec
length(vec)          the number of elements in vec
sort(vec)            returns vec in ascending order (descending with decreasing = TRUE)
order(vec)           returns the indices that sort vec
unique(vec)          lists the unique elements of vec
summary(vec)         computes the five-number summary and the mean
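
The difference between sort() and order() is easiest to see on paired vectors. The sketch below reuses school and number from Section 3.1: order() returns positions, which can then index a second vector.

```r
school <- c("MSU", "UM", "OSU", "PSU")
number <- c(50085, 43625, 58322, 45518)

sort(number)                       # values in ascending order
sort(number, decreasing = TRUE)    # values in descending order
order(number)                      # positions of the smallest, next smallest, ...

# Schools arranged from the largest number to the smallest
school[order(number, decreasing = TRUE)]

which.max(number)                  # position of the largest value
summary(number)                    # min, quartiles, median, mean, max
```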

3.4 Exercises

Use cereal to answer the questions that follow. You may find some of the above functions helpful. Use a single code chunk to answer each question. Below is a helpful example on subsetting a vector with a logical vector.

x <- c(1, 4, -5, sqrt(3), exp(2))
y <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
x[y]
[1]  1.000000 -5.000000  1.732051

  1. How many cereals are in the data set cereal?
  2. What is the average number of grams of sugar per serving among all cereals?
  3. What are the three largest grams of sugar per serving values?
  4. What are the five smallest grams of sugar per serving values?
  5. Which four cereals have the most grams of sugar per serving?
  6. How many cereals have more than 4 grams of protein per serving and less than 3 grams of sugar per serving?

Where can I find the sugary cereals?

  7. What is the mean number of grams of sugar per serving for cereals on shelves 1, 2, and 3, respectively?
  8. Plot the relationship between shelf and grams of sugar per serving.

Comparison box plots would be better.

boxplot(sugars ~ shelf, xlab = "Shelf", ylab = "Sugar per serving (grams)")

  9. Do you see any issue with analyzing and comparing the data on a per serving basis? Propose a better basis for comparison given the available data.
  10. Redo the above boxplot using your new normalized metric from 9.

Do these results align with your expectations as to where you would find the sugary cereals?