September 5, 2025
TL;DR: A variable being in the Global Environment DOES NOT mean that code chunks in a Quarto document can access it when the document is rendered.
The Global Environment only shows what is available to code run in the console.
If you want to use a variable in Quarto, you must create or load it inside the document, in a chunk that runs before the chunk that uses it.
Finally, a variable created or loaded in a previous R-chunk is accessible in any subsequent R-chunk.
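A minimal sketch of that rule, assuming a Quarto document with two R chunks. The variable name x and its values are made up for illustration; in a real .qmd file each part would sit in its own chunk.

```r
# --- Earlier R chunk: create (or load) the variable inside the document ---
x <- c(2, 4, 6)   # hypothetical values, for illustration only

# --- Any later R chunk: x is available because an earlier chunk created it ---
exists("x")   # TRUE
mean(x)       # 4
```

If the first part were run only in the console, the later chunk would fail at render time, because rendering does not see the console's variables.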
For Homework #2, it helps to know how to manipulate data in the following way:
Let’s look at Example 3.2 (Spider running speed) from the book.
  spider speed treatment
1      1  1.25    before
2      2  2.94    before
3      3  2.38    before
4      4  3.09    before
5      5  3.41    before
6      6  3.00    before
'data.frame': 32 obs. of  3 variables:
 $ spider   : int  1 2 3 4 5 6 7 8 9 10 ...
 $ speed    : num  1.25 2.94 2.38 3.09 3.41 3 2.31 2.93 2.98 3.55 ...
 $ treatment: chr  "before" "before" "before" "before" ...
Question: How do we compute and compare the average speed for spiders before and after pedipalp removal?
How to get a subset of observations in a data frame
Let’s first use selection vectors. Start with “before” level…
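Here is a rough sketch of what a selection vector looks like. It uses a tiny made-up data frame called toyData (an assumption for illustration, NOT the real spider data) so that it runs on its own.

```r
# Toy data: made-up values, not the real spider data set
toyData <- data.frame(
  spider    = c(1, 2, 3, 1, 2, 3),
  speed     = c(1.0, 2.0, 3.0, 2.0, 3.0, 4.0),
  treatment = c("before", "before", "before", "after", "after", "after")
)

# A selection vector is a logical vector: TRUE where the condition holds
isBefore <- toyData$treatment == "before"
isBefore
# [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

# Keep the rows where the selection vector is TRUE.
# Leaving the slot after the comma empty keeps all columns.
toyBefore <- toyData[isBefore, ]
nrow(toyBefore)   # 3
```

The same pattern with "after" in place of "before" picks out the other group.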
Now “after” level…
Now let’s calculate the mean of the speed variable of the resulting subsetted data frames.
More than one way to shear a sheep!! Subset the speed variable directly…
See the difference?
# spiderDataBefore is a subsetted data frame
spiderDataBefore <- spiderData[spiderData$treatment == "before",]
mean(spiderDataBefore$speed)
[1] 2.668125
# spiderSpeedBefore is a subsetted speed variable
spiderSpeedBefore <- spiderData$speed[spiderData$treatment == "before"]
mean(spiderSpeedBefore)
[1] 2.668125
Always know what data type objects in R are!!!!
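One way to check is with class(). The sketch here uses a small made-up data frame, toyData (hypothetical values, not the real spider data), and the names toyDataBefore and toySpeedBefore are likewise just for illustration.

```r
# Toy data: made-up values, not the real spider data set
toyData <- data.frame(
  spider    = c(1, 2, 3, 1, 2, 3),
  speed     = c(1.0, 2.0, 3.0, 2.0, 3.0, 4.0),
  treatment = c("before", "before", "before", "after", "after", "after")
)

# Subsetting rows of the data frame gives back a data frame
toyDataBefore <- toyData[toyData$treatment == "before", ]
class(toyDataBefore)    # "data.frame"

# Subsetting the speed variable directly gives back a numeric vector
toySpeedBefore <- toyData$speed[toyData$treatment == "before"]
class(toySpeedBefore)   # "numeric"
```

That is why the data frame version needs $speed inside mean(), while the vector version does not.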
Here’s another way, using the subset() function.
# Again, spiderDataBefore is a subsetted data frame
spiderDataBefore <- subset(spiderData, treatment == "before")
mean(spiderDataBefore$speed)
[1] 2.668125
# Same for spiderDataAfter
spiderDataAfter <- subset(spiderData, treatment == "after")
mean(spiderDataAfter$speed)
[1] 3.85375
General use:
"subsetted data frame" <- subset("data frame", "subsetting condition")
How to compute a statistic over levels of a categorical variable
General use:
tapply("numerical variable", "categorical variable", FUN = "statistical function")
In English: “Apply the statistical function to the numerical variable subsetted by each level of the categorical variable.”
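As a quick sketch of the general form, here is tapply() on a small made-up data frame. The name toyData and its values are assumptions for illustration, NOT the real spider data.

```r
# Toy data: made-up values, not the real spider data set
toyData <- data.frame(
  spider    = c(1, 2, 3, 1, 2, 3),
  speed     = c(1.0, 2.0, 3.0, 2.0, 3.0, 4.0),
  treatment = c("before", "before", "before", "after", "after", "after")
)

# Mean speed for each level of treatment, in one call
groupMeans <- tapply(toyData$speed, toyData$treatment, FUN = mean)
groupMeans
#  after before 
#      3      2 

# Pull out one group's mean by its level name
groupMeans["before"]
```

The result has one entry per level, named by the level, so there is no need to build a separate subset for each group.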
We’ll learn a “better” way later in the course.
Introduction to Biostatistics, Fall 2025