Topic 5B: Study design with R

1 Simple random sampling: Generating random numbers

1.1

In Topic 3B, we learnt about simple random sampling. This is where we randomly select a sample from a larger population (the sampling frame). To do this, we can use a random number generator from a website such as https://www.random.org.

As we saw in the readings, this random number generator generates one random number from within a range. Clicking Generate chooses a random number between 1 and 100 (these are the default minimum and maximum values if we don’t specify them).

Suppose our population has 100 members with IDs 1 to 100, and we wish to select a random sample of \(n = 10\). Use the random number generator to randomly select 10 members. Write down their ID numbers as they are selected.

1.2

It is also possible to use computer programs such as R to generate random numbers. For example, consider the following code and the output it produces:

sample(1:100, size = 10, replace = FALSE)
##  [1] 68 39  1 34 87 43 14 82 59 51

This code can be understood as follows:

  • The sample function takes a random sample. Using this function, we can specify the size of the sample, and whether we want to sample with or without replacement.

  • The first argument, 1:100, tells R the numbers from which we want to take a sample. The code 1:100 tells R we want to include numbers “1 to 100.” To see the vector this produces, we can run that part of the code by itself. Run the below code in R to see the vector it produces (you should see that it is simply a list of numbers from 1 to 100):

    1:100
  • The size = 10 argument tells R that we want to randomly select 10 numbers

  • The replace = FALSE argument tells R that we want to select this sample of 10 without replacement. This means that any given number can be selected only once. For a simple random sample, this is very important. Using the random number generator from Question 1.1, we were not able to specify this option.

  • Putting it all together, the code has produced the following random sample of 10 numbers: 68, 39, 1, 34, 87, 43, 14, 82, 59, 51.
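To see what the replace argument changes in practice, the following is a minimal sketch using only base R. The object names (without_repl, with_repl, small) and the seed value are arbitrary choices for illustration. The anyDuplicated function returns 0 when no value appears more than once.

```r
# Sketch: compare sampling without and with replacement (base R only).
set.seed(1)  # arbitrary seed, used only so the sketch is repeatable

without_repl <- sample(1:100, size = 10, replace = FALSE)
with_repl    <- sample(1:100, size = 10, replace = TRUE)

# anyDuplicated() returns 0 when no value is repeated
anyDuplicated(without_repl)  # 0: each ID can be selected only once
anyDuplicated(with_repl)     # may be non-zero: repeats are allowed

# Drawing 10 values from only 5 IDs with replacement must produce repeats
small <- sample(1:5, size = 10, replace = TRUE)
anyDuplicated(small) > 0     # TRUE
```

Note that asking for 10 values from 1:5 with replace = FALSE would stop with an error, because there are not enough distinct IDs to choose from.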

1.3

Your turn: Supposing we have a population of 100 members with IDs 1 to 100, run the below code in R to produce a random sample of \(n = 10\). Write down the numbers produced.

sample(1:100, size = 10, replace = FALSE)

1.4

Run the code a second time: it should produce a different list of 10 numbers now. Write down the new list of numbers produced.

1.5

Adapt your code to now select a random sample of \(n = 20\) from the population.

1.6

Adapt your code to now select a random sample of \(n = 50\) from a population of 400 members with IDs from 101 to 500.
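When the IDs do not start at 1, it can help to check that the vector of IDs has the expected number of members before sampling from it. A minimal sketch, using a hypothetical range of IDs (201 to 250, not the population in this question) and an arbitrary object name ids:

```r
# Sketch: check the size and end points of a vector of IDs (hypothetical range).
ids <- 201:250

length(ids)  # 50: a range first:last contains last - first + 1 values
range(ids)   # 201 250: the smallest and largest ID
```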

1.7

You may have noticed that every time we re-run the sample function, we get a different sample. It is possible to force the computer to select the same random sample in certain situations: this can be useful for reproducibility purposes. We can do this by using the set.seed function. For example, by setting the seed at 1 before taking the sample each time, we should get exactly the same results:

set.seed(1)
sample(1:100, size = 10, replace = FALSE)
##  [1] 68 39  1 34 87 43 14 82 59 51
set.seed(1)
sample(1:100, size = 10, replace = FALSE)
##  [1] 68 39  1 34 87 43 14 82 59 51

Try it for yourself.

Note: we do not have to choose the number ‘1’ for the seed; we can in fact choose any number. This number simply tells the computer to start the randomisation process at a particular (arbitrary) point. It does not matter what that point is. What matters is that everything that follows is random.
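One way to confirm that two samples are exactly the same, rather than comparing them by eye, is the identical function from base R. The following is a minimal sketch; the object names and the second seed value (2024) are arbitrary choices for illustration.

```r
# Sketch: the same seed gives the same sample, whatever seed value is chosen.
set.seed(1)
first <- sample(1:100, size = 10, replace = FALSE)
set.seed(1)
second <- sample(1:100, size = 10, replace = FALSE)
identical(first, second)  # TRUE

# Any other seed value works the same way (2024 is an arbitrary choice)
set.seed(2024)
third <- sample(1:100, size = 10, replace = FALSE)
set.seed(2024)
fourth <- sample(1:100, size = 10, replace = FALSE)
identical(third, fourth)  # TRUE
```

The seed must be set again immediately before each call to sample. Without the second set.seed call, the second sample would continue from wherever the random number generator left off.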

2 Experimental Design with R

As well as using R to help us randomly select a sample from a population, we can also use R to help with the experimental design. For example, suppose that as well as randomly selecting individuals from a population, we also wish to allocate them to groups. In this question, we will use an R package called agricolae to help us do that.

2.1

If you have not installed the agricolae package on your computer before, run the following code to install it:

install.packages("agricolae")

2.2

Run the following code to load the agricolae package into your current R session:

library(agricolae)

2.3

Consider the Himalaya study (Bird et al. 2008) that we looked at in the Topic 4B readings. Suppose you are involved in the design of the study and that there will be a sample of \(n = 18\) subjects enrolled in the study. For a simplified design, we will randomly allocate each of the \(n = 18\) participants to one of the following treatment groups:

  • Himalaya
  • Refined cereal

Supposing there will be an equal number of 9 subjects in each group, we can picture the design like this:

As we can see, 9 subjects have been randomly allocated to the Himalaya group, and 9 to the refined cereal group.

In the Topic 4B readings, we considered four ways of managing confounding (restricting, blocking, analysing, and randomly allocating). Considering the above design and what we know so far, which one do you think we have used?

2.4

In R, we can use a function called design.crd from the agricolae package to allocate subjects randomly to each group as follows:

set.seed(1)
Groups <- c("Himalaya", "Refined cereal")
design <- design.crd(trt = Groups, r = 9)
head(design$book)
##   plots r         Groups
## 1   101 1 Refined cereal
## 2   102 1       Himalaya
## 3   103 2 Refined cereal
## 4   104 3 Refined cereal
## 5   105 2       Himalaya
## 6   106 3       Himalaya

This code can be understood as follows:

  • We have used the set.seed function to ensure our design is reproducible later
  • We have created a vector called Groups which contains the names of the two groups
  • We have used two arguments in the design.crd function:
    • trt = Groups : this means that the function will know that the two groups are "Himalaya" and "Refined cereal" as specified in Groups
    • r = 9 : this means we want to have 9 “replicates” in each group
  • We have stored the results of the design.crd function in an object called design
  • The part of the object we want to see is called book : this is the part that shows us the allocation of subjects to groups. We have displayed the first six rows of the design using head(design$book).

Run the above code in your R session and make sure you understand it. If you have any questions, ask your computer lab demonstrator.

[Note that the plots column simply contains the IDs of the subjects. The first subject’s ID is 101 and so on.]
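To see the idea behind this kind of random allocation, the following is a minimal base R sketch. It is not how design.crd works internally, and it will not necessarily reproduce the allocation shown above. It simply shuffles a vector containing nine copies of each group label and assigns the shuffled labels to subject IDs 101 to 118. The object names ids, labels and allocation are arbitrary choices for illustration.

```r
# Sketch: random allocation of 18 subjects to two equal groups (base R only).
set.seed(1)  # arbitrary seed, for reproducibility of this sketch

Groups <- c("Himalaya", "Refined cereal")
ids    <- 101:118                  # subject IDs, mirroring the plots column
labels <- rep(Groups, each = 9)    # nine copies of each group label

# sample() with no size argument returns the labels in a random order
allocation <- data.frame(id = ids, group = sample(labels))

head(allocation)          # first six subjects and their groups
table(allocation$group)   # 9 subjects in each group
```

The table function is also a quick way to check a design produced by design.crd, for example by tabulating the Groups column of design$book.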

2.5

Now suppose that instead of a sample of \(n = 18\) subjects, we will instead have \(n = 50\) subjects. Adapt your code to randomly allocate 25 subjects to each group. After you have done so, you can view the allocation of all 50 subjects by running the following code:

design$book

2.6

Now suppose that we will have a sample of \(n = 60\) subjects, and three treatment groups:

  • Himalaya
  • Refined cereal
  • Wholemeal wheat

Adapt your code to randomly allocate 20 subjects to each group.

3 Catching up

That’s it for this week! If you still have time, you can catch up on any of the questions from Computer Lab 3B or Computer Lab 4B if you need to.

References

Bird, Anthony R., Michelle S. Vuaran, Roger A. King, Manny Noakes, Jennifer Keogh, Matthew K. Morell, and David L. Topping. 2008. “Wholegrain Foods Made from a Novel High-Amylose Barley Variety (Himalaya 292) Improve Indices of Bowel Health in Human Subjects.” British Journal of Nutrition 99: 1032–40.


These notes have been prepared by Amanda Shaker. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematics and Statistics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.