DATA151: Basics of Programming

Learning Objective:

In this session students will learn how to create an R Markdown document and the basics of programming through a case study of creating a randomized experiment.

How to create an R Markdown document: .Rmd
- Creating headers at different levels to organize and navigate the document
- Creating and working with code chunks
Basics of programming:
- Creating a sequence of consecutive integers
- Creating and working with matrices: matrix()
  - Matrix indexing
- Creating random values from R: sample()
- Working with for loops
- Using the %in% operator to define group membership
- Using conditional statements: if()
- How to concatenate values together to form a vector: c()

The Motivation

Consider the setup:

In a greenhouse experiment we want to study a single factor (fertilizer) with 4 levels
We have enough space for 24 experimental units (potted plants)
To maintain balance in the experiment, we will have 6 replications of each treatment

1. Creating IDs

We want to give each experimental unit an ID. Since we are ultimately going to draw randomly from this list of IDs, we can simply assign numeric IDs from 1 to 24. This can be done with the colon operator (:). The syntax is starting integer : ending integer.

# STEP 1: Giving IDs to Experimental Units
# recall in our example from class there were 24 plants (experimental units)
# the colon is a way to create a vector of consecutive integers

ids<-1:24
# note that we are storing the output as a vector
ids

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
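
As a minimal sketch of how the result can be inspected, the chunk below checks the length and type of the ID vector. The name `same` is introduced only for illustration; `seq_len()` is a base R function that builds the same 1-to-n sequence.

```r
# Sketch: inspect the ID vector created with the colon operator
ids <- 1:24

length(ids)   # number of experimental units
class(ids)    # the colon operator returns an integer vector here
head(ids, 3)  # the first three IDs

# seq_len(n) builds the same sequence 1, 2, ..., n
same <- identical(ids, seq_len(24))
same
```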

Learning by doing:

Q1: What happens when you go from a bigger number to a smaller number?

# INSERT CODE HERE #

Q2: What happens when you go between a positive and a negative number?

# INSERT CODE HERE #

2. Organizing Your Lab Experiment

In the example we saw, the lab bench had 4 rows and 6 columns. We can use the matrix() function to organize all our experimental units. Matrices have two dimensions, rows and columns (in that order). A matrix is a very useful way to store numbers, and there are also special mathematical operations that can be performed on matrices.

Start by reading the documentation about matrices:

# let's read the documentation about the matrix function to learn about its arguments
?matrix

# the main inputs of this function are data, nrow, and ncol

Now let’s organize our experimental units:

## STEP 2: Organizing the experimental units into rows and columns 
# in the example we saw, the lab bench had 4 rows and 6 columns
labBench<-matrix(ids, nrow=4, ncol=6)
labBench

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    5    9   13   17   21
## [2,]    2    6   10   14   18   22
## [3,]    3    7   11   15   19   23
## [4,]    4    8   12   16   20   24

# this is a matrix!
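
The output above shows that `matrix()` fills column by column by default. As a rough sketch of the alternative, the `byrow` argument (documented in `?matrix`) fills across rows instead. The names `by_col` and `by_row` are hypothetical and used only for this comparison.

```r
# Sketch: compare the default column-wise fill with a row-wise fill
ids <- 1:24

by_col <- matrix(ids, nrow = 4, ncol = 6)                # default: fill down each column
by_row <- matrix(ids, nrow = 4, ncol = 6, byrow = TRUE)  # fill across each row

dim(by_col)   # rows first, then columns: 4 6
by_col[1, ]   # 1 5 9 13 17 21
by_row[1, ]   # 1 2 3 4 5 6
```

Which fill order is appropriate depends on how the IDs are physically laid out on the bench.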

Learning by doing:

What would happen if the number of columns was omitted? (i.e., don’t include ncol)

# INSERT CODE HERE #

What if the product of the dimensions nrow \(\times\) ncol is not equal to the length of the data?

# INSERT CODE HERE #

3. Randomly Assigning Treatments

We want to randomly assign treatments to our experimental units to avoid confounding. We can use R to help us with this task.

The simplest form of an experimental design is a Completely Randomized Design. In this design we choose IDs and randomly assign them to treatments. In order to choose which IDs will go in which treatments, we can use the sample() function.

First, let’s learn about the sample function:

?sample

Learning by doing: `sample`

Let’s try it!

crd_samp<-sample(ids, replace=FALSE)
crd_samp

##  [1]  3 18 13 16  1 19  5 11  9 10  7 21 24 23  4 15 14 17 22  2  6 20 12  8

What happened?

ANSWER HERE:

Is the order of numbers the same as your neighbor’s?

ANSWER HERE:
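
If a random draw needs to be reproduced later (for example, to recreate the same treatment assignment), one option is to fix the random number generator's seed with `set.seed()` before sampling. The sketch below assumes a seed of 151, chosen purely for illustration; `draw1` and `draw2` are hypothetical names.

```r
# Sketch: the same seed gives the same shuffle
ids <- 1:24

set.seed(151)                          # illustrative seed, not from the lab
draw1 <- sample(ids, replace = FALSE)

set.seed(151)                          # reset to the same seed
draw2 <- sample(ids, replace = FALSE)

identical(draw1, draw2)      # TRUE: both draws are in the same order
identical(sort(draw1), ids)  # TRUE: every ID appears exactly once
```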

Now let’s try assigning our IDs to treatments using a matrix. Here nrow will correspond to the number of treatments.

Let’s say that

Row 1 = Treatment A
Row 2 = Treatment B
Row 3 = Treatment C
Row 4 = Treatment D (Control)

A. Completely Randomized Design

## Completely randomized design
## choose ID's and randomly assign to treatments

# nrow will be the number of treatments
crd_mat<-matrix(crd_samp, nrow=4)
crd_mat

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    3    1    9   24   14    6
## [2,]   18   19   10   23   17   20
## [3,]   13    5    7    4   22   12
## [4,]   16   11   21   15    2    8

# we can also rename the rows
rownames(crd_mat)<-c("Treat A", "Treat B", "Treat C", "Treat D")

crd_mat

##         [,1] [,2] [,3] [,4] [,5] [,6]
## Treat A    3    1    9   24   14    6
## Treat B   18   19   10   23   17   20
## Treat C   13    5    7    4   22   12
## Treat D   16   11   21   15    2    8
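
Once the rows are named, a row can be pulled out by its name, and a few quick checks can confirm that the assignment is balanced. This is only a sketch: it re-enters the draw printed above so that it runs on its own, and the names `treat_b`, `n_reps`, and `n_dups` are hypothetical.

```r
# Sketch: look up a treatment by row name and check the assignment
crd_samp <- c(3, 18, 13, 16, 1, 19, 5, 11, 9, 10, 7, 21,
              24, 23, 4, 15, 14, 17, 22, 2, 6, 20, 12, 8)  # the draw printed above
crd_mat <- matrix(crd_samp, nrow = 4)
rownames(crd_mat) <- c("Treat A", "Treat B", "Treat C", "Treat D")

treat_b <- crd_mat["Treat B", ]  # IDs assigned to Treatment B
treat_b                          # 18 19 10 23 17 20

n_reps <- ncol(crd_mat)                    # replications per treatment
n_dups <- anyDuplicated(as.vector(crd_mat))  # 0 means no ID was used twice
n_reps
n_dups
```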

Making an Experiment Map!

We can use the matrix from the previous step to know which experimental units are in their respective treatments; however, it might be easier if we made a map that showed where their treatments were located. To accomplish this task we’ll need to understand for loops, the %in% operator, and conditions.

A. `for` loops

For loops can be used to repeat the same basic task over and over again. Check this one out! What is it doing?

## FOR LOOPS
for(i in 1:5){
  print(i)
}

## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
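
A loop can also update a value on each pass instead of just printing. As a small sketch (the name `total` is hypothetical), the loop below keeps a running total of the same five numbers.

```r
# Sketch: accumulate a running total inside a for loop
total <- 0
for (i in 1:5) {
  total <- total + i  # add the current value of i to the total
}
total  # 15
```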

B. `%in%` operator

This operator checks whether the element specified on the left-hand side is contained in the set specified on the right-hand side, \(a \in B\).

## %in% operator
1 %in% c(1, 2, 3)

## [1] TRUE

5 %in% c(1, 2, 3)

## [1] FALSE
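
The left-hand side can also be a whole vector, in which case `%in%` returns one TRUE or FALSE per element. The sketch below re-enters the IDs from the first row of crd_mat shown earlier; `treat_a_ids` and `in_a` are hypothetical names.

```r
# Sketch: %in% checks each element of the left-hand vector
c(1, 5) %in% c(1, 2, 3)  # TRUE FALSE

treat_a_ids <- c(3, 1, 9, 24, 14, 6)  # row 1 of crd_mat from the earlier output
in_a <- (1:24) %in% treat_a_ids       # one TRUE/FALSE per ID

which(in_a)  # 1 3 6 9 14 24
sum(in_a)    # 6
```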

Bringing it all together!

This is a little complicated, so I will provide the code. Let’s pay careful attention to how each piece is working together.

## Making a map of this design
## here we will learn to create a loop and how to write conditionals
treats<-matrix(nrow=24) # create an empty one-column matrix with 24 rows (all NA) so we can hold the treatments later
for(i in 1:24){
  if(i %in% crd_mat[1,]){
    treats[i]<-"A"
  }
  if(i %in% crd_mat[2,]){
    treats[i]<-"B"
  }
  if(i %in% crd_mat[3,]){
    treats[i]<-"C"
  }
  if(i %in% crd_mat[4,]){
    treats[i]<-"D"
  }
}

treats

##       [,1]
##  [1,] "A" 
##  [2,] "D" 
##  [3,] "A" 
##  [4,] "C" 
##  [5,] "C" 
##  [6,] "A" 
##  [7,] "C" 
##  [8,] "D" 
##  [9,] "A" 
## [10,] "B" 
## [11,] "D" 
## [12,] "C" 
## [13,] "C" 
## [14,] "A" 
## [15,] "D" 
## [16,] "D" 
## [17,] "B" 
## [18,] "B" 
## [19,] "B" 
## [20,] "B" 
## [21,] "D" 
## [22,] "C" 
## [23,] "B" 
## [24,] "A"

## make the map!
expDes<-matrix(treats, nrow=4)
expDes

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] "A"  "C"  "A"  "C"  "B"  "D" 
## [2,] "D"  "A"  "B"  "A"  "B"  "C" 
## [3,] "A"  "C"  "D"  "D"  "B"  "B" 
## [4,] "C"  "D"  "C"  "D"  "B"  "A"
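
The same map can be built without a loop. The sketch below is an alternative to the code above, not the lab's method: it uses `row()` to look up the row number of every cell of crd_mat and uses those row numbers to pick a treatment label. The names `labels`, `treats_vec`, and `expDes_vec` are hypothetical, and the draw is re-entered so the chunk runs on its own.

```r
# Sketch: a loop-free version of the treatment map
crd_samp <- c(3, 18, 13, 16, 1, 19, 5, 11, 9, 10, 7, 21,
              24, 23, 4, 15, 14, 17, 22, 2, 6, 20, 12, 8)  # the draw printed earlier
crd_mat <- matrix(crd_samp, nrow = 4)

labels <- c("A", "B", "C", "D")  # label for rows 1 to 4
treats_vec <- character(24)      # one empty slot per experimental unit

# position = the ID stored in each cell, value = the label of that cell's row
treats_vec[as.vector(crd_mat)] <- labels[as.vector(row(crd_mat))]

expDes_vec <- matrix(treats_vec, nrow = 4)
expDes_vec
table(treats_vec)  # each treatment should appear 6 times
```

With the draw shown in this document, `expDes_vec` should match the map printed above.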

Food for Thought:

How does this look?
Does it appear “random”?

Note: We might be tempted to think that just because there are clusters of the same treatment together that the design is not random; however, we used a random mechanism to assign IDs to treatments. Using a random mechanism does not guarantee that there won’t be these clusters.

Matrix Indexing

We have been using matrices a lot! When working with matrices it’s a good idea to know how we can call specific subsets within the matrix. Every cell of a matrix has an address, which is defined by the row and column that it’s in.
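
As a minimal sketch of the address idea, here is a toy matrix (the name `m` is hypothetical and not part of the experiment). The row goes before the comma and the column after it; leaving one side blank selects everything along that dimension.

```r
# Sketch: addresses in a small 2 x 3 matrix
m <- matrix(1:6, nrow = 2)  # filled column by column

m[2, 3]  # single cell in row 2, column 3: 6
m[1, ]   # all of row 1: 1 3 5
m[, 2]   # all of column 2: 3 4
```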

In the following examples we will see how this can be used:

Learning by doing!

What treatment is to be assigned in row 3, column 2?

# EXAMPLE: 
expDes[3,2]

## [1] "C"

What treatments are to be assigned in row 3?

# INSERT CODE HERE #

What treatments are to be assigned in column 4?

# INSERT CODE HERE #

If a cat knocked over the plants in the 4th row what would the rest of my experiment look like?

# INSERT CODE HERE #

B. Blocked Design

If we know that there is a natural gradient or more homogeneous subgroups within our experiment, we might consider blocking to improve our design. The hallmark of a randomized complete block design is that every treatment must be present in every block. This allows us to avoid confounding treatment with block.

In this example columns are used as blocks. We will randomly assign where the treatments are placed within each block.

To do this we will see another way to store the output from our loop. This time we will do it by concatenating our loop output.

output<-c() # start with an empty vector
for(i in 1:6){
  thisOut<-i
  output<-c(output, thisOut) # the new output vector is the old output vector plus the new observation
  print(output)
}

## [1] 1
## [1] 1 2
## [1] 1 2 3
## [1] 1 2 3 4
## [1] 1 2 3 4 5
## [1] 1 2 3 4 5 6
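
When the final length is known ahead of time, another common pattern is to create the full-length vector first and fill it in by position. This is a sketch of that alternative, not the approach used in this lab; `n` and `output_pre` are hypothetical names.

```r
# Sketch: fill a pre-sized vector instead of growing it with c()
n <- 6
output_pre <- integer(n)  # six slots, all starting at 0

for (i in seq_len(n)) {
  output_pre[i] <- i      # store this pass's result in slot i
}

output_pre  # 1 2 3 4 5 6
```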

Let’s do this with our experiment!

## STEP: Blocked Design
## example: if there is a gradient across columns there are 6 blocks

# we will learn how to concatenate here
# start with an empty vector
blockTreats<-c()
for(i in 1:6){
  thisSample<-sample(c("A", "B", "C", "D"), replace=FALSE)
  blockTreats<-c(blockTreats, thisSample)
}
  
blockDes<-matrix(blockTreats, nrow=4)
blockDes

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] "B"  "C"  "A"  "C"  "D"  "B" 
## [2,] "D"  "D"  "B"  "D"  "A"  "A" 
## [3,] "A"  "B"  "D"  "A"  "C"  "C" 
## [4,] "C"  "A"  "C"  "B"  "B"  "D"

Kitada Smalley
