Lab #3: R-Markdown and Programming

M. Drew LaMar
February 1 & 2, 2017

Good Programming Practices

When creating a script (or R-Markdown document), work incrementally: add a new command, knit the document to check it, add another new command, knit again, and so on.
Use indentation and line breaks; they are your friends. For example,

NO

if (number < 10) {
if (number < 5) {result <- "extra small"} 
else {result <- "small"}
} else if (number < 100) {
result <- "medium"
} else {result <- "large"}
print(result)

YES

if (number < 10) {
  if (number < 5) {
    result <- "extra small"
  } else {
    result <- "small"
  }
} else if (number < 100) {
  result <- "medium"
} else {
  result <- "large"
}
print(result)

Use smart variable names!!!!

# NO
fred <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter03/chap03q21YeastMutantGrowth.csv")

# YES
yeastData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter03/chap03q21YeastMutantGrowth.csv")

Understanding Memory and Environments

TL;DR: If a variable is in the Global Environment, it DOES NOT mean that code chunks in R-Markdown can access it.

The Global Environment only tells you what is available to code run in the console.

If you want to use a variable in R-Markdown, you must create or load the variable before attempting to access it in R-Markdown.

Finally, a variable created or loaded in a previous R-chunk is accessible in any subsequent R-chunk.
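
A minimal sketch of this rule, written as the contents of two consecutive R chunks. The name `speedValues` is hypothetical, and its three values are copied from the first spider speeds shown later in this lab; `exists()` is a base R function.

```r
# Chunk 1: create (or load) the variable inside the R-Markdown document itself.
# speedValues is a hypothetical name used only for this illustration.
speedValues <- c(1.25, 2.94, 2.38)

# Chunk 2: any later chunk can use what an earlier chunk created.
speedMean <- mean(speedValues)
print(speedMean)

# exists() reports whether a name is defined at this point in the document.
print(exists("speedValues"))
print(exists("variableNeverCreated"))  # a name that was never defined anywhere
```

If a chunk fails with an "object not found" error while knitting, the usual cause is that the variable was only ever created in the console.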

Data Manipulation 101

For Homework #3, it helps to know how to manipulate data in the following ways:

How to get a subset of observations in a data frame
How to compute a statistic over levels of a categorical variable

Let's look at Example 3.2 (Spider running speed) from the book.

spiderData <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter03/chap03e2SpiderAmputation.csv")
head(spiderData)

  spider speed treatment
1      1  1.25    before
2      2  2.94    before
3      3  2.38    before
4      4  3.09    before
5      5  3.41    before
6      6  3.00    before

str(spiderData)

'data.frame':   32 obs. of  3 variables:
 $ spider   : int  1 2 3 4 5 6 7 8 9 10 ...
 $ speed    : num  1.25 2.94 2.38 3.09 3.41 3 2.31 2.93 2.98 3.55 ...
 $ treatment: Factor w/ 2 levels "after","before": 2 2 2 2 2 2 2 2 2 2 ...

Question: How do we compute and compare the average speed for spiders before and after pedipalp removal?

How to get a subset of observations in a data frame

Let's first use the selection vectors that we learned about in the Intro to R course. Start with the “before” level…

spiderDataBefore <- spiderData[spiderData$treatment == "before",]
head(spiderDataBefore)

  spider speed treatment
1      1  1.25    before
2      2  2.94    before
3      3  2.38    before
4      4  3.09    before
5      5  3.41    before
6      6  3.00    before

table(spiderDataBefore$treatment)


 after before 
     0     16
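
To see what the selection vector is doing, here is a small sketch that can be run without downloading anything. `toySpiders` is a hypothetical six-row stand-in for `spiderData` (rows 1-3 and 17-19 of the data shown in this lab), and it assumes `treatment` is stored as a factor, as in the `str()` output above.

```r
# toySpiders is a hypothetical stand-in for spiderData, used only for illustration.
toySpiders <- data.frame(
  spider = c(1, 2, 3, 1, 2, 3),
  speed = c(1.25, 2.94, 2.38, 2.40, 3.50, 4.49),
  treatment = factor(c("before", "before", "before", "after", "after", "after"))
)

# The comparison produces one TRUE/FALSE per row: this is the selection vector.
isBefore <- toySpiders$treatment == "before"
print(isBefore)

# Rows go before the comma; leaving the column slot empty keeps every column.
toyBefore <- toySpiders[isBefore, ]
print(toyBefore)

# The subset still remembers both factor levels, so table() reports a zero count.
print(table(toyBefore$treatment))

# droplevels() removes unused levels if that zero count is unwanted.
print(table(droplevels(toyBefore)$treatment))
```

The zero count for "after" in the `table()` output is therefore expected: subsetting rows does not change the set of levels a factor knows about.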

Now the “after” level…

spiderDataAfter <- spiderData[spiderData$treatment == "after",]
head(spiderDataAfter)

   spider speed treatment
17      1  2.40     after
18      2  3.50     after
19      3  4.49     after
20      4  3.17     after
21      5  5.26     after
22      6  3.22     after

table(spiderDataAfter$treatment)


 after before 
    16      0

Now let's calculate the mean of the speed variable of the resulting subsetted data frames.

mean(spiderDataBefore$speed)

[1] 2.668125

mean(spiderDataAfter$speed)

[1] 3.85375

More than one way to shear a sheep!! Subset the speed variable directly…

spiderSpeedBefore <- spiderData$speed[spiderData$treatment == "before"]
mean(spiderSpeedBefore)

[1] 2.668125

spiderSpeedAfter <- spiderData$speed[spiderData$treatment == "after"]
mean(spiderSpeedAfter)

[1] 3.85375

See the difference?

# spiderDataBefore is a subsetted data frame
spiderDataBefore <- spiderData[spiderData$treatment == "before",] 
mean(spiderDataBefore$speed)

[1] 2.668125

# spiderSpeedBefore is a subsetted speed variable
spiderSpeedBefore <- spiderData$speed[spiderData$treatment == "before"] 
mean(spiderSpeedBefore)

[1] 2.668125

Always know what data type your objects in R are!!!!
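
One way to check is `class()`. The sketch below uses hypothetical names (`toySpiders`, `toyDataBefore`, `toySpeedBefore`) and a six-row stand-in for `spiderData` so that it runs on its own.

```r
# toySpiders is a hypothetical stand-in for spiderData, used only for illustration.
toySpiders <- data.frame(
  spider = c(1, 2, 3, 1, 2, 3),
  speed = c(1.25, 2.94, 2.38, 2.40, 3.50, 4.49),
  treatment = factor(c("before", "before", "before", "after", "after", "after"))
)

# Subsetting rows of a data frame gives back a data frame.
toyDataBefore <- toySpiders[toySpiders$treatment == "before", ]
print(class(toyDataBefore))

# Subsetting a single column gives back a plain numeric vector.
toySpeedBefore <- toySpiders$speed[toySpiders$treatment == "before"]
print(class(toySpeedBefore))

# mean() wants a numeric vector, so the data frame needs $speed first ...
print(mean(toyDataBefore$speed))
# ... while the vector can be passed as is.
print(mean(toySpeedBefore))

# Passing the whole data frame to mean() gives NA with a warning.
print(suppressWarnings(mean(toyDataBefore)))
```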

Here's another way using the subset function.

# Again, spiderDataBefore is a subsetted data frame
spiderDataBefore <- subset(spiderData, treatment == "before")
mean(spiderDataBefore$speed)

[1] 2.668125

# Same for spiderDataAfter
spiderDataAfter <- subset(spiderData, treatment == "after")
mean(spiderDataAfter$speed)

[1] 3.85375

General use:

"subsetted data frame" <- subset("data frame", "subsetting condition")
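
As a sketch of the general form, the subsetting condition can combine several comparisons, and `subset()` also accepts an optional `select` argument for choosing columns. The data frame `toySpiders` and the other `toy...` names are hypothetical and stand in for `spiderData`.

```r
# toySpiders is a hypothetical stand-in for spiderData, used only for illustration.
toySpiders <- data.frame(
  spider = c(1, 2, 3, 1, 2, 3),
  speed = c(1.25, 2.94, 2.38, 2.40, 3.50, 4.49),
  treatment = factor(c("before", "before", "before", "after", "after", "after"))
)

# Column names can be used directly inside the condition (no toySpiders$ needed).
toyAfter <- subset(toySpiders, treatment == "after")
print(toyAfter)

# Conditions can be combined with & (and) or | (or).
toyFastAfter <- subset(toySpiders, treatment == "after" & speed > 3)
print(toyFastAfter)

# The optional select argument keeps only the named columns.
toyAfterSpeeds <- subset(toySpiders, treatment == "after", select = c(spider, speed))
print(toyAfterSpeeds)
```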

How to compute a statistic over levels of a categorical variable

tapply(spiderData$speed, spiderData$treatment, FUN = mean)

   after   before 
3.853750 2.668125

General use:

tapply("numerical variable", "categorical variable", FUN = "statistical function")

In English: “Apply the statistical function to the numerical variable subsetted by each level of the categorical variable.”
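
The sketch below checks that `tapply` gives the same numbers as subsetting by hand, and shows that other statistical functions can be swapped in. `toySpiders` is a hypothetical six-row stand-in for `spiderData`, so the values differ from the full-data results above.

```r
# toySpiders is a hypothetical stand-in for spiderData, used only for illustration.
toySpiders <- data.frame(
  spider = c(1, 2, 3, 1, 2, 3),
  speed = c(1.25, 2.94, 2.38, 2.40, 3.50, 4.49),
  treatment = factor(c("before", "before", "before", "after", "after", "after"))
)

# One call computes the mean speed for every level of treatment.
groupMeans <- tapply(toySpiders$speed, toySpiders$treatment, FUN = mean)
print(groupMeans)

# The same numbers, computed by subsetting the speed variable by hand.
manualBefore <- mean(toySpiders$speed[toySpiders$treatment == "before"])
manualAfter <- mean(toySpiders$speed[toySpiders$treatment == "after"])
print(all.equal(groupMeans[["before"]], manualBefore))
print(all.equal(groupMeans[["after"]], manualAfter))

# The result is named by level, so comparing groups is one subtraction.
speedChange <- groupMeans[["after"]] - groupMeans[["before"]]
print(speedChange)

# Any function that takes a numeric vector can replace mean.
groupMedians <- tapply(toySpiders$speed, toySpiders$treatment, FUN = median)
print(groupMedians)
```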

We'll learn a “better” way later in the course.