- The Basic Components of R and RStudio
- Vectors and Basic Math
Ben Weinstein
What is the sum of (1 through 10 multiplied by two) plus (1 through 20 divided by three)?
Bonus: What is the mean of the numbers 1 through 10 and 1 through 20 when taken together?
Generating random samples from a normal distribution
It is often useful to generate a sample from a specific distribution. To generate a sample of size 100 from a standard normal distribution (mean 0 and standard deviation 1), we use the rnorm function. We only have to supply the n (sample size) argument, since 0 and 1 are the default values for the mean and sd arguments.
norm <- rnorm(100)
head(norm)
## [1] -0.99115 0.16671 -0.08847 1.16380 -0.53200 0.93675
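To draw from a normal distribution other than the standard one, the mean and sd arguments can be supplied explicitly. A minimal sketch, where the values 10 and 2, the seed, and the name norm2 are arbitrary choices for illustration:

```r
set.seed(1)  # arbitrary seed, only so the draw is repeatable
norm2 <- rnorm(100, mean = 10, sd = 2)  # illustrative mean and sd
head(norm2)
mean(norm2)  # should be close to 10
sd(norm2)    # should be close to 2
```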
R provides many distributions out of the box (in the stats package, which is loaded by default), including those most commonly used in biological analysis. Each random number generator takes the sample size n as its first argument, followed by the distribution's own parameter arguments. For example, the rpois function is the random number generator for the Poisson distribution, and its only parameter argument is lambda. The rbinom function is the random number generator for the binomial distribution, and it takes two parameter arguments: size and prob. The size argument specifies the number of Bernoulli trials, and the prob argument specifies the probability of success on each trial.
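As a sketch of how the binomial parameter arguments are passed, the call below draws ten values. The particular values (size = 20, prob = 0.3), the seed, and the name binom_draws are assumptions chosen for illustration:

```r
set.seed(1)  # arbitrary seed for repeatability
# Ten draws; each is the number of successes in 20 trials
# with success probability 0.3 per trial
binom_draws <- rbinom(10, size = 20, prob = 0.3)
binom_draws
```

Each draw counts successes out of 20 trials, so every value lies between 0 and 20.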
For now, it's sufficient to know that a Poisson distribution is commonly used for count data and has a single parameter, lambda, which is both the mean and the variance.
Generating a random sample from a Poisson distribution with lambda=3
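A minimal sketch of this step, assuming a sample size of 100 (the text does not state one), an arbitrary seed, and the illustrative name pois. Because lambda is both the mean and the variance, both summaries should land near 3:

```r
set.seed(1)  # arbitrary seed for repeatability
pois <- rpois(100, lambda = 3)  # assumed sample size of 100
head(pois)
mean(pois)  # should be near 3
var(pois)   # should also be near 3
```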
For each distribution there are four functions that return its fundamental quantities. Let's consider the normal distribution as an example. We have already seen the rnorm() function, which generates a random sample from a specified normal distribution.
The dnorm() function returns the probability density at a specific value for a normal distribution. (For a discrete distribution such as the Poisson, the corresponding function, dpois(), returns the probability of that exact value.) This function is very useful for plotting the density function of a distribution. Just as all the random number generator functions start with an "r", the density functions for all the distributions start with a "d".
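To make the density concrete, the sketch below evaluates dnorm() at a few points and compares the value at 0 with the normal density formula written out by hand. The variable names and the choice of mean 2 and sd 0.2 are illustrative:

```r
d0 <- dnorm(0)                # density of the standard normal at 0
by_hand <- 1 / sqrt(2 * pi)   # the same value from the density formula
all.equal(d0, by_hand)        # the two values agree
sym <- dnorm(c(-1, 0, 1))     # the density is symmetric around the mean
peak <- dnorm(2, mean = 2, sd = 0.2)  # density at the mean of a narrow normal
peak                          # larger than 1
```

A density value can exceed 1, as the last line shows, so it should not be read as a probability for a continuous distribution.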
The other two functions, pnorm() and qnorm(), will be covered during Biometry.
Histograms are the most common univariate plot. A histogram places data into "bins" and counts the number of data points falling into each bin. Bins are usually plotted as bars, with the bin range on the x axis and the count on the y axis.
# Draw a thousand random normal points
pts <- rnorm(1000)
hist(pts)
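The binning can also be inspected directly, since hist() returns an object holding the bin edges and the counts. A small sketch, with an arbitrary seed so the result is repeatable and h as an illustrative name:

```r
set.seed(1)  # arbitrary seed for repeatability
pts <- rnorm(1000)
h <- hist(pts, plot = FALSE)  # compute the bins without drawing
h$breaks       # bin edges
h$counts       # number of points in each bin
sum(h$counts)  # every point falls in exactly one bin
```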
Draw 10 random normal points and plot a histogram, then 100, then 1000. What do you notice about the plot?
Explore at least one other distribution; look up ?Distributions. Hint: remember to use the r-nameofdistribution function to pull random samples.
Plot your new distribution and compare with your neighbor.
Draw 1000 random normals with a mean of 0 and an sd of 1. Look at the hist help screen. How do you specify the size of the bin range? Try making bins from -4 to 4, with intervals of .01, .1, and 1. Hint: consider using seq() in the "breaks" argument within hist().
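As a starting point for the hint, here is a sketch that uses a bin width of 0.5, which is not one of the widths asked for above. The seed and the names lim, bins, and h are illustrative. One thing to watch: hist() stops with an error if the breaks do not span the whole range of the data, so the sketch widens the limits if any draw happens to fall outside -4 to 4.

```r
set.seed(42)  # arbitrary seed for repeatability
pts <- rnorm(1000, mean = 0, sd = 1)
# Use -4..4 unless a draw falls outside it, then widen to the next integer
lim <- max(4, ceiling(max(abs(pts))))
bins <- seq(-lim, lim, by = 0.5)
h <- hist(pts, breaks = bins)
```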
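# Evaluate the normal density (mean 2, sd 0.5) on a grid from 0 to 4
x <- seq(0, 4, 0.01)
dens <- dnorm(x, mean = 2, sd = 0.5)
# Plot the density values themselves; do not pass them through dnorm() again
plot(x, dens, type = "l")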
Base R has an immense number of plotting tools. Let's look at the plot help screen (?plot).
In R, it's very easy to take a random sample of numbers with the sample() function. Suppose we want to take a random sample of 20 numbers from a vector of 100.
x <- 1:100
sample(x, 20)
## [1] 37 81 27 78 71 56 18 21 14 92 5 6 42 100 24 96 44
## [18] 19 38 97
# Sample with replacement: a value can be drawn more than once (36 appears twice below)
x <- 1:100
sample(x, 20, replace = TRUE)
## [1] 27 82 2 89 28 68 99 22 76 1 61 11 70 59 9 53 36 66 52 36
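The difference between the two calls can be checked in code. In this sketch (the seed and variable names are illustrative), sampling without replacement never repeats a value, while drawing 200 values from only 100 candidates with replacement must repeat some:

```r
set.seed(1)  # arbitrary seed for repeatability
x <- 1:100
no_repl <- sample(x, 20)                     # without replacement (the default)
with_repl <- sample(x, 200, replace = TRUE)  # more draws than candidates
anyDuplicated(no_repl)        # 0 means no value was repeated
anyDuplicated(with_repl) > 0  # TRUE: repeats are unavoidable here
```

Asking for more values than the vector holds without replace = TRUE, for example sample(x, 200), is an error.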