First, save the Rmd file in a folder with the rest of your course work, and put the file cereal.csv in the same folder. In RStudio, go to Session > Set Working Directory > To Source File Location. You can then run the code below to load the data.
cereal <- read.csv("cereal.csv")
attach(cereal)
The object cereal is a data frame with 77 rows and 16 columns. Nutrient information is recorded for each of the 77 cereals on a per-serving basis, along with the number of cups in one serving. Full details on the variable names are given below.
The R command attach(cereal) allows you to access each vector in the data frame by simply typing the variable’s name in R. For example,
potass
[1] 280 135 320 330 -1 70 30 100 125 190 35 105 45 105 55 25 35
[18] 20 65 160 -1 30 120 80 30 25 100 200 190 25 40 45 85 90
[35] 100 45 90 35 60 95 40 95 55 95 170 170 160 90 40 130 90
[52] 120 260 45 15 50 110 110 240 140 110 30 35 95 140 120 40 55
[69] 90 35 230 110 60 25 115 110 60
displays the vector of potassium values from the cereal data frame.
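The same column can also be reached without attach(), through the $ operator or [[ ]], which avoids name clashes with other objects in the workspace. A minimal sketch, using a small made-up data frame (toy is hypothetical and is not part of cereal.csv):

```r
# Hypothetical stand-in for a data frame such as cereal
toy <- data.frame(name = c("A", "B", "C"),
                  potass = c(280, 135, 320))

# Column access without attach(): both forms return the same vector
by_dollar  <- toy$potass
by_bracket <- toy[["potass"]]
identical(by_dollar, by_bracket)
```

With the real data, the equivalent call would be cereal$potass.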
Objects and variables are created with the assignment operator, <- or =. Remember that in RStudio the shortcut Alt + - inserts the <- assignment operator, with convenient spacing before and after it. While the equal sign will work, it is best practice to stick with <-.
Let’s look at the assignment operator in action. We can compute the protein to fat ratio per serving by
protein.to.fat <- protein / fat
head(protein.to.fat)
[1] 4.0 0.6 4.0 Inf 1.0 1.0
The protein to carbohydrate ratio can be computed as
protein.to.carb <- protein / carbo
head(protein.to.carb)
[1] 0.8000000 0.3750000 0.5714286 0.5000000 0.1428571 0.1904762
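The Inf in the protein-to-fat output appears because R returns Inf when a positive number is divided by zero, which is what happens for a cereal with 0 grams of fat. A small sketch with made-up vectors (p and f are illustrative, not columns of cereal) shows this behavior and one possible way to flag such values with is.finite():

```r
# Made-up protein and fat values; the second item has zero fat
p <- c(4, 3, 2)
f <- c(1, 0, 2)

ratio <- p / f                  # division by zero yields Inf, not an error
ratio
is.finite(ratio)                # FALSE marks the Inf entry
mean(ratio[is.finite(ratio)])   # average over the finite ratios only
```

Whether to drop or keep infinite ratios depends on the question being asked, so treat the last line as one option rather than a rule.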
Try to be descriptive and concise with your variable names. For example, protein.to.fat is better than ptf.
Comparison operators are binary: they take two objects and return a logical (TRUE or FALSE) value. The operators & and | combine logical values.
Command | Description |
---|---|
> | greater than |
< | less than |
>= | greater than or equal to |
<= | less than or equal to |
== | equal to (recall that = is for assignment and not checking equality) |
!= | not equal to |
& | and (ex: (5 > 7) & (6*7 == 42) will return the value FALSE) |
\| | or (ex: (5 > 7) \| (6*7 == 42) will return the value TRUE) |
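These operators work element by element on vectors, so a single comparison returns one TRUE or FALSE per element. A brief sketch, using a made-up vector sugar_g (not taken from cereal.csv):

```r
# Made-up sugar values in grams
sugar_g <- c(6, 8, 5, 0, 12)

sugar_g > 6                       # one logical value per element
(sugar_g >= 5) & (sugar_g <= 8)   # TRUE where both conditions hold
sum(sugar_g > 6)                  # TRUE counts as 1, so this counts matches
```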
The function c combines its arguments to form a vector. All arguments are coerced to a common type, which is the type of the returned value, and all attributes except names are removed.
To create a vector of names and a vector of numbers, consider
school <- c("MSU", "UM", "OSU", "PSU")
number <- c(50085, 43625, 58322, 45518)
school
[1] "MSU" "UM" "OSU" "PSU"
number
[1] 50085 43625 58322 45518
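The coercion rule matters when types are mixed in a single call to c: if any argument is a character string, every element becomes a character string. A short sketch (the object names mixed and flags are illustrative):

```r
# Mixing a number and a string coerces everything to character
mixed <- c(50085, "MSU")
class(mixed)
mixed

# Mixing logicals and numbers coerces the logicals to numeric (TRUE becomes 1)
flags <- c(TRUE, FALSE, 3)
class(flags)
flags
```

This is why the school names and the numbers above are stored in two separate vectors.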
Vectors can be subset by position or name (when applicable). For example, consider the vector x, where
x <- letters[1:10]
x
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
x[5]
[1] "e"
x[c(1, 5, 9)]
[1] "a" "e" "i"
x[-c(4, 6, 10)]
[1] "a" "b" "c" "e" "g" "h" "i"
x[11]
[1] NA
As another example,
grades <- c(99, 85, 89, 92)
names(grades) <- c("STT", "CMSE", "MTH", "STT")
grades
STT CMSE MTH STT
99 85 89 92
grades["STT"]
STT
99
grades["CMSE"]
CMSE
85
grades[4] <- 82
grades[4]
STT
82
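Note that grades has two elements named STT, and subsetting by name returns only the first match. A sketch of one way to retrieve every element with a given name, by comparing against names() instead (the vector is rebuilt here so the block runs on its own):

```r
# grades as it stands after the update above, rebuilt for a standalone example
grades <- c(STT = 99, CMSE = 85, MTH = 89, STT = 82)

grades["STT"]                     # first element named "STT" only
grades[names(grades) == "STT"]    # every element named "STT"
```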
Assume that vec is a vector of appropriate variable type.
Command | Description |
---|---|
sum(vec) | sums up all the elements of vec |
mean(vec) | mean of vec |
median(vec) | median of vec |
min(vec), max(vec) | the smallest or largest element of vec |
sd(vec), var(vec) | the standard deviation and variance of vec |
length(vec) | the number of elements in vec |
sort(vec) | returns vec sorted in ascending order (descending with decreasing = TRUE) |
order(vec) | returns the indices that sort vec |
unique(vec) | lists the unique elements of vec |
summary(vec) | computes the five-number summary and the mean |
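The difference between sort() and order() is easiest to see side by side. A small sketch with made-up values (cal and brand are hypothetical and not read from cereal.csv):

```r
# Made-up values; brand labels line up with cal by position
cal   <- c(110, 70, 120, 50)
brand <- c("W", "X", "Y", "Z")

sort(cal)                      # values in ascending order
sort(cal, decreasing = TRUE)   # values in descending order
idx <- order(cal)              # positions that put cal in ascending order
idx
brand[idx]                     # reorder a second vector to match
```

order() is the one to reach for when a second vector, such as a name column, has to follow the same ordering.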
Use cereal to answer the questions that follow. You may find some of the above functions helpful. Use a single code chunk to answer each question. Below is a helpful example on subsetting a vector with a logical vector.
x <- c(1, 4, -5, sqrt(3), exp(2))
y <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
x[y]
[1] 1.000000 -5.000000 1.732051
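In practice the logical vector usually comes from a comparison rather than being typed by hand. A sketch with made-up vectors (brand and sugar_g are illustrative, not read from cereal.csv):

```r
brand   <- c("W", "X", "Y", "Z")
sugar_g <- c(6, 14, 3, 12)

high <- sugar_g > 10     # logical vector, one entry per item
high
brand[high]              # names of the items above the cutoff
mean(sugar_g[high])      # mean sugar among those items
which.max(sugar_g)       # position of the largest value
```

The cutoff of 10 is arbitrary here; the same pattern applies to any condition built from the comparison operators.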
Where can I find the sugary cereals?
Comparison box plots would be better.
boxplot(sugars ~ shelf, xlab = "Shelf", ylab = "Sugar per serving (grams)")
Do these results align with your expectations as to where you would find the sugary cereals?
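A numerical companion to the box plots is the median of one variable within each group, which tapply() computes. A sketch on an invented data set (the values below are made up purely to show the call and say nothing about the real cereals):

```r
# Invented values: six items spread over three groups
sugar_g  <- c(3, 6, 12, 14, 5, 8)
shelf_no <- c(1, 1, 2, 2, 3, 3)

# Median of sugar_g within each value of shelf_no
med <- tapply(sugar_g, shelf_no, median)
med
```

With the attached cereal data, the analogous call would be tapply(sugars, shelf, median).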