Picking observations for treatment or for inclusion in an interview sample

beer$RandomNumber <- runif(nrow(beer), 0, 1) # generate one random number per observation, uniformly distributed between 0 and 1

nrow(beer) in the line above returns the number of rows (i.e., the number of observations) in beer. Passing it as the first argument of runif generates a distinct random number for each observation.

Now create a rank variable, which gives the ordering of the observations by RandomNumber. Since RandomNumber is random, the ordering given by rank is also random.

beer$rank <- rank(beer$RandomNumber)
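
As a quick check of the claim above, the following minimal sketch uses a small stand-in data frame (toy, a hypothetical name, since the beer data is not reproduced here) and confirms that the ranks of the uniform draws are just the numbers 1 to nrow in a shuffled order. The set.seed call is an addition that is not part of the steps above; it only makes the random draw reproducible.

```r
# Stand-in for the beer data (hypothetical; any data frame works the same way)
toy <- data.frame(id = 1:8)

set.seed(123)  # assumption: fix the seed so the same draw can be reproduced later
toy$RandomNumber <- runif(nrow(toy), 0, 1)
toy$rank <- rank(toy$RandomNumber)

# Every rank from 1 to nrow(toy) should appear exactly once
# (ties between continuous uniform draws are extremely unlikely)
is_permutation <- all(sort(toy$rank) == seq_len(nrow(toy)))
print(is_permutation)
```

Without a fixed seed, every run produces a different ordering, which matters if the selection has to be documented or repeated.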

Let’s tag 10 observations out of the sample, either for a treatment or for an interview.

beer$tag <- as.numeric(beer$rank <= 10)

The tag takes the value 1 if the rank is less than or equal to 10, and 0 otherwise. Since rank is random, picking the first 10 observations according to rank (rank <= 10) simply picks 10 random observations from the sample. The number 10 can of course be replaced by any other number up to the sample size.
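
The tagging step can be sketched with the sample size as a parameter. The data frame toy (212 rows, matching the group counts printed later in this document) and the variable k are hypothetical names introduced for illustration, and set.seed is again an added assumption for reproducibility. The last lines show an alternative that is not used in this document: base R's sample() draws k row indices directly, which should give the same kind of random selection without the intermediate rank variable.

```r
# Stand-in for the beer data (hypothetical)
toy <- data.frame(id = 1:212)

set.seed(42)  # assumption: fixed seed for a reproducible selection
k <- 10       # number of observations to tag

# Approach from the text: random number -> rank -> tag
toy$RandomNumber <- runif(nrow(toy), 0, 1)
toy$rank <- rank(toy$RandomNumber)
toy$tag <- as.numeric(toy$rank <= k)

# Alternative sketch: draw k distinct row indices with sample()
picked <- sample(nrow(toy), k)
toy$tag2 <- as.numeric(seq_len(nrow(toy)) %in% picked)

# Both tags mark exactly k observations
print(sum(toy$tag))
print(sum(toy$tag2))
```

The two tags mark different rows because they come from separate random draws; either one is a valid random sample of size k.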

Picking observations for multiple treatments

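Suppose you have n treatments and you want an equal number of observations in each treatment. You can use the quantile procedure that you learned in the fourth week: cut rank at its quantiles. The example below uses three treatments, so the probabilities step by 1/3; for n treatments, step by 1/n.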

beer$CatRank <- cut(beer$rank, breaks = quantile(beer$rank, probs = seq(0, 1, by = 1/3)), include.lowest = TRUE) # call the new variable CatRank (for categories of rank)
summary(beer$CatRank)

##   [1,71.3] (71.3,142]  (142,212] 
##         71         70         71
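
The groups above are not exactly equal because 212 observations cannot be split evenly three ways; the sizes differ by at most one. The same idea can be sketched for an arbitrary number of treatments. Here toy, n_treat, and treatment are hypothetical names, set.seed is an added assumption, and seq(..., length.out = n_treat + 1) is used in place of by = 1/n as a cautious way to make sure the last break lands exactly on 1. Setting labels = FALSE makes cut return the group number (1 to n_treat) instead of the interval label.

```r
# Stand-in for the beer data (hypothetical)
toy <- data.frame(id = 1:212)

set.seed(2017)  # assumption: fixed seed for a reproducible assignment
n_treat <- 4    # number of treatments

toy$RandomNumber <- runif(nrow(toy), 0, 1)
toy$rank <- rank(toy$RandomNumber)

# Break points at the 0, 1/n, 2/n, ..., 1 quantiles of rank
breaks <- quantile(toy$rank, probs = seq(0, 1, length.out = n_treat + 1))

# Integer treatment codes 1..n_treat
toy$treatment <- cut(toy$rank, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Number of observations assigned to each treatment
group_sizes <- table(toy$treatment)
print(group_sizes)
```

Because the ranks are a random ordering of the observations, each observation is equally likely to land in any of the treatment groups.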

Picking observations for treatments

Seongho An

9/26/2017
