M. Drew LaMar
February 1 & 2, 2017
if (number < 10) {
if (number < 5) {result <- "extra small"}
else {result <- "small"}
} else if (number < 100) {
result <- "medium"
} else {result <- "large"}
print(result)
if (number < 10) {
  if (number < 5) {
    result <- "extra small"
  } else {
    result <- "small"
  }
} else if (number < 100) {
  result <- "medium"
} else {
  result <- "large"
}
print(result)
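To try the branching logic above on several inputs, it can be wrapped in a small function. This is only a sketch: the name sizeLabel and the test values are made up for illustration and are not from the slides.

```r
# Hypothetical helper (not from the slides): same thresholds as the code above.
sizeLabel <- function(number) {
  if (number < 10) {
    if (number < 5) {
      result <- "extra small"
    } else {
      result <- "small"
    }
  } else if (number < 100) {
    result <- "medium"
  } else {
    result <- "large"
  }
  result
}

# Try one value from each branch.
labels <- sapply(c(3, 7, 50, 500), sizeLabel)
print(labels)
```

Each value lands in a different branch, so all four labels appear once.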
# NO
fred <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter03/chap03q21YeastMutantGrowth.csv")
# YES
yeastData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter03/chap03q21YeastMutantGrowth.csv")
TL;DR: If a variable is in the Global Environment, it DOES NOT mean that code chunks in R-Markdown can access it.
The Global Environment only tells you what is available to code run in the console.
If you want to use a variable in R-Markdown, you must create or load the variable before attempting to access it in R-Markdown.
Finally, a variable created or loaded in a previous R-chunk is accessible in any subsequent R-chunk.
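A rough analogy for the points above, assuming the document is knit in its own clean workspace: the environment knitWorkspace below is a hypothetical stand-in for that workspace, not how R-Markdown is actually implemented. The sketch shows that a variable defined at the console is not found there until it is created there too.

```r
# Created at the console, so it lives in the Global Environment.
consoleOnly <- 42

# Hypothetical stand-in for the clean workspace used when knitting.
knitWorkspace <- new.env()

# inherits = FALSE: look only inside knitWorkspace, not in its parent environments.
foundBefore <- exists("consoleOnly", envir = knitWorkspace, inherits = FALSE)

# "Create or load" the variable inside the document's own workspace.
assign("consoleOnly", 42, envir = knitWorkspace)
foundAfter <- exists("consoleOnly", envir = knitWorkspace, inherits = FALSE)

print(c(before = foundBefore, after = foundAfter))
```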
For Homework #3, it helps to know how to manipulate data in the following way:
Let's look at Example 3.2 (Spider running speed) from the book.
spiderData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter03/chap03e2SpiderAmputation.csv")
head(spiderData)
  spider speed treatment
1      1  1.25    before
2      2  2.94    before
3      3  2.38    before
4      4  3.09    before
5      5  3.41    before
6      6  3.00    before
str(spiderData)
'data.frame': 32 obs. of 3 variables:
$ spider : int 1 2 3 4 5 6 7 8 9 10 ...
$ speed : num 1.25 2.94 2.38 3.09 3.41 3 2.31 2.93 2.98 3.55 ...
$ treatment: Factor w/ 2 levels "after","before": 2 2 2 2 2 2 2 2 2 2 ...
Question: How do we compute and compare the average speed for spiders before and after pedipalp removal?
Let's first use selection vectors that we learned about from the Intro to R course. Start with “before” level…
spiderDataBefore <- spiderData[spiderData$treatment == "before",]
head(spiderDataBefore)
  spider speed treatment
1      1  1.25    before
2      2  2.94    before
3      3  2.38    before
4      4  3.09    before
5      5  3.41    before
6      6  3.00    before
table(spiderDataBefore$treatment)
 after before
     0     16
Now “after” level…
spiderDataAfter <- spiderData[spiderData$treatment == "after",]
head(spiderDataAfter)
   spider speed treatment
17      1  2.40     after
18      2  3.50     after
19      3  4.49     after
20      4  3.17     after
21      5  5.26     after
22      6  3.22     after
table(spiderDataAfter$treatment)
 after before
    16      0
Now let's calculate the mean of the speed variable of the resulting subsetted data frames.
mean(spiderDataBefore$speed)
[1] 2.668125
mean(spiderDataAfter$speed)
[1] 3.85375
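The selection-vector approach above can be tried without downloading anything. The sketch below uses a small made-up data frame (toyData and its values are invented for illustration, not the spider measurements) and stores the selection vector in its own variable so it can be inspected.

```r
# Made-up data for illustration only (not the spider measurements).
toyData <- data.frame(
  speed = c(1, 2, 3, 4, 6, 8),
  treatment = factor(c("before", "before", "before", "after", "after", "after"))
)

# The selection vector: one TRUE/FALSE per row (three TRUE, then three FALSE).
isBefore <- toyData$treatment == "before"
print(isBefore)

# Keep the rows where isBefore is TRUE and all columns (note the trailing comma).
toyBefore <- toyData[isBefore, ]
print(toyBefore)

# Mean speed of the "before" rows.
print(mean(toyBefore$speed))
```

Leaving out the trailing comma in toyData[isBefore, ] would ask for columns instead of rows, which is a common source of errors.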
More than one way to shear a sheep!! Subset the speed variable directly…
spiderSpeedBefore <- spiderData$speed[spiderData$treatment == "before"]
mean(spiderSpeedBefore)
[1] 2.668125
spiderSpeedAfter <- spiderData$speed[spiderData$treatment == "after"]
mean(spiderSpeedAfter)
[1] 3.85375
See the difference?
# spiderDataBefore is a subsetted data frame
spiderDataBefore <- spiderData[spiderData$treatment == "before",]
mean(spiderDataBefore$speed)
[1] 2.668125
# spiderSpeedBefore is a subsetted speed variable
spiderSpeedBefore <- spiderData$speed[spiderData$treatment == "before"]
mean(spiderSpeedBefore)
[1] 2.668125
Always know what data type your R objects are!
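One way to check is with class() and str(). The sketch below again uses made-up data (toyData is invented for illustration) and shows that the two subsetting styles give objects of different types, even though the means agree.

```r
# Made-up data for illustration only (not the spider measurements).
toyData <- data.frame(
  speed = c(1, 2, 3, 4, 6, 8),
  treatment = factor(c("before", "before", "before", "after", "after", "after"))
)

# Subsetting rows of the data frame gives back a data frame.
toyBefore <- toyData[toyData$treatment == "before", ]

# Subsetting the speed variable gives back a numeric vector.
toySpeedBefore <- toyData$speed[toyData$treatment == "before"]

print(class(toyBefore))
print(class(toySpeedBefore))

# str() gives a compact summary of an object's structure.
str(toyBefore)
str(toySpeedBefore)
```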
Here's another way, using the subset function.
# Again, spiderDataBefore is a subsetted data frame
spiderDataBefore <- subset(spiderData, treatment == "before")
mean(spiderDataBefore$speed)
[1] 2.668125
# Same for spiderDataAfter
spiderDataAfter <- subset(spiderData, treatment == "after")
mean(spiderDataAfter$speed)
[1] 3.85375
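The same subset() calls can be tried on a small made-up data frame (toyData is invented for illustration). The sketch also uses subset()'s optional select argument, which is not covered in the slides, to keep only one column; the result is still a data frame.

```r
# Made-up data for illustration only (not the spider measurements).
toyData <- data.frame(
  speed = c(1, 2, 3, 4, 6, 8),
  treatment = factor(c("before", "before", "before", "after", "after", "after"))
)

# Column names can be used directly in the condition (no toyData$ needed).
toyAfter <- subset(toyData, treatment == "after")
print(toyAfter)

# select keeps only the named columns; the result is still a data frame.
toyAfterSpeed <- subset(toyData, treatment == "after", select = speed)
print(class(toyAfterSpeed))
print(mean(toyAfterSpeed$speed))
```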
General use:
"subsetted data frame" <- subset("data frame", "subsetting condition")
tapply(spiderData$speed, spiderData$treatment, FUN = mean)
   after   before
3.853750 2.668125
General use:
tapply("numerical variable", "categorical variable", FUN = "statistical function")
In English: “Apply the statistical function to the numerical variable subsetted by each level of the categorical variable.”
We'll learn a “better” way later in the course.
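Until then, the tapply pattern can be tried on a small made-up data frame (toyData is invented for illustration). The sketch swaps in different statistical functions and uses the group names to compare the two means.

```r
# Made-up data for illustration only (not the spider measurements).
toyData <- data.frame(
  speed = c(1, 2, 3, 4, 6, 8),
  treatment = factor(c("before", "before", "before", "after", "after", "after"))
)

# Mean speed for each level of treatment.
groupMeans <- tapply(toyData$speed, toyData$treatment, FUN = mean)
print(groupMeans)

# Any function that summarizes a numeric vector works, e.g. the group size.
groupSizes <- tapply(toyData$speed, toyData$treatment, FUN = length)
print(groupSizes)

# The result is named by level, so groups can be compared by name.
meanDifference <- groupMeans[["after"]] - groupMeans[["before"]]
print(meanDifference)
```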