load(file="beer.RData")
beer$RandomNumber <- runif(nrow(beer), 0, 1) # draw one random number per observation, uniformly distributed between 0 and 1
“nrow(beer)” in the line above returns the number of rows (i.e. the number of observations) in beer. It is passed to runif as the number of draws, so that there is a separate random number for each observation.
Now, create a rank variable, which tells us the ordering of the observations based on RandomNumber. Since RandomNumber is random, the order given by rank is also random.
beer$rank <- rank(beer$RandomNumber)
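To see why ranking random numbers gives a random ordering, here is a small sketch on five draws instead of the full data. The names u and r and the seed value are assumptions chosen for illustration only.

```r
set.seed(1)    # arbitrary seed (an assumption) so the draws can be reproduced

u <- runif(5)  # five uniform random numbers between 0 and 1
r <- rank(u)   # position of each draw when the draws are sorted ascending

r              # a random arrangement of the numbers 1 to 5
sort(r)        # every rank from 1 to 5 appears exactly once (no ties)
```

Each rank is used once, so the ranks are simply the numbers 1 to 5 in a random order.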
Let’s tag 10 observations out of the sample, either for a treatment or for an interview.
beer$tag <- as.numeric(beer$rank <= 10)
The tag takes the value 1 if the rank is less than or equal to 10, and 0 otherwise. Since rank is random, picking the first 10 observations (rank <= 10) according to rank just picks 10 random observations from the sample. The number 10 can of course be replaced by any other number.
Suppose you have n treatments and you want an equal number of observations in each treatment. You can use the quantile procedure that you learned in the fourth week. The line below uses three groups, so the quantile probabilities go from 0 to 1 in steps of 1/3.
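The tagging steps above can be sketched end to end on a stand-in data set. The data frame toy, its id column, and the seed value are hypothetical and not part of beer.RData. Setting a seed is optional, but it makes the same random selection reproducible.

```r
# Hypothetical stand-in for beer: 212 rows identified by an id column
toy <- data.frame(id = 1:212)

set.seed(123)  # arbitrary seed (an assumption) so the same draw can be repeated

toy$RandomNumber <- runif(nrow(toy), 0, 1)  # one uniform draw per row
toy$rank <- rank(toy$RandomNumber)          # random ordering of the rows
toy$tag <- as.numeric(toy$rank <= 10)       # 1 for the 10 lowest ranks, 0 otherwise

sum(toy$tag)          # number of tagged observations, which should be 10
toy$id[toy$tag == 1]  # ids of the randomly selected rows
```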
beer$CatRank <- cut(beer$rank, breaks = quantile(beer$rank, probs = seq(0, 1, by = 1/3)), include.lowest = TRUE) # call the new variable CatRank (for categories of rank)
summary(beer$CatRank)
##   [1,71.3] (71.3,142]  (142,212]
##         71         70         71
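The same idea extends to any number of treatments. The following is a minimal sketch under stated assumptions: toy, n_groups and treatment are hypothetical names, and the seed is arbitrary. It uses seq with length.out rather than by, which gives exactly n_groups + 1 break points, and labels = FALSE so that cut returns group numbers instead of interval labels.

```r
# Hypothetical stand-in for beer: 212 rows
toy <- data.frame(id = 1:212)
n_groups <- 4  # number of treatments (an assumption for this example)

set.seed(42)   # arbitrary seed so the assignment can be reproduced

toy$rank <- rank(runif(nrow(toy), 0, 1))  # random ordering of the rows

# Cut the ranks at n_groups equally spaced quantiles
breaks <- quantile(toy$rank, probs = seq(0, 1, length.out = n_groups + 1))
toy$treatment <- cut(toy$rank, breaks = breaks,
                     include.lowest = TRUE, labels = FALSE)

group_sizes <- table(toy$treatment)  # observations per treatment
group_sizes
```

When the sample size is not a multiple of the number of groups, the group sizes differ by at most one observation, as in the 71, 70, 71 split above.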