Mike McCann
22-23 January 2015
When do we use parentheses? Brackets? Commas? Semi-colons?
What is the [sum of (1 through 10)] multiplied by two?
What are the 12th and 45th elements of seq(1,43,0.25)?
It is often useful to generate a sample from a specific statistical distribution.
Generating random samples from a normal distribution
To generate a sample of size 100 from a standard normal distribution (with mean 0 and standard deviation 1) we use the rnorm() function.
norm <- rnorm(100, mean=0, sd=1)
head(norm)
[1] 2.0413082 -0.5788982 0.9451796 0.5947909 -0.3119882 2.9333134
Look up the rnorm() function in the help screen. Locate the three arguments we used.
The rpois() function is the random number generator for the Poisson distribution, which has a single parameter, lambda.
The rbinom() function is the random number generator for the binomial distribution, which has two parameters: size and prob. The size argument specifies the number of Bernoulli trials and the prob argument specifies the probability of success for each trial.
BEE552 Students: Heather will pick up from here.
For each distribution there are four functions which will generate fundamental quantities of a distribution.
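As a rough sketch of the rpois() and rbinom() generators described above, the following draws a small sample from each. The sample size of 10, lambda = 3, size = 20, prob = 0.5, the variable names, and the set.seed() call are illustrative assumptions, not values from the workshop.

```r
# Fix the seed so the draws are repeatable (illustrative choice)
set.seed(42)

# 10 draws from a Poisson distribution with mean (lambda) 3
pois_sample <- rpois(n = 10, lambda = 3)

# 10 draws from a binomial distribution: each draw is the number of
# successes in 20 Bernoulli trials with success probability 0.5
binom_sample <- rbinom(n = 10, size = 20, prob = 0.5)

print(pois_sample)
print(binom_sample)
```

Note that in both functions the first argument, n, is the number of values to draw, which is separate from the distribution's own parameters.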
Let's consider the normal distribution as an example.
rnorm(): a random sample from a specific normal distribution.
dnorm(): the probability density at a specific value for a normal distribution.
pnorm(): the distribution function.
qnorm(): the quantile function.
pnorm() and qnorm() will be covered during Biometry. They are less commonly used.
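To make the d/p/q functions above a little more concrete, here is a minimal sketch for the standard normal distribution. The input values (0, 1.96, and 0.975) and the variable names are illustrative choices only.

```r
# Density of the standard normal at its mean
d0 <- dnorm(0, mean = 0, sd = 1)

# Probability that a standard normal value is less than or equal to 1.96
p <- pnorm(1.96, mean = 0, sd = 1)

# Value below which 97.5% of the standard normal distribution lies
q <- qnorm(0.975, mean = 0, sd = 1)

print(c(density = d0, probability = p, quantile = q))

# pnorm() and qnorm() are inverses of each other
print(all.equal(pnorm(q), 0.975))
```

The same prefix pattern applies to other distributions, for example dpois(), ppois(), qpois(), and rpois() for the Poisson.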
Histograms are a common univariate plot.
Histograms place data into “bins”, and count the number of data points falling into each bin.
Bins are usually plotted as bars, with the bin range on the x axis and the count on the y axis.
# Draw a thousand random normal points
pts <- rnorm(1000)
hist(pts)
Histograms are an effective way of visualizing distributions.
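To see the bins and counts behind a histogram, one option is to ask hist() for the binning without drawing it. This is a small sketch; the seed and the variable name h are illustrative assumptions.

```r
# Draw a thousand random normal points (seed fixed for repeatability)
set.seed(1)
pts <- rnorm(1000)

# plot = FALSE returns the binning without drawing anything
h <- hist(pts, plot = FALSE)

print(h$breaks)  # bin edges
print(h$counts)  # number of points falling into each bin

# Every point lands in exactly one bin, so the counts add up to 1000
print(sum(h$counts) == length(pts))
```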
Draw 10 random normal points and plot a histogram, then 100, then 1000. What do you notice about the plot?
Explore at least one other distribution; look up ?distributions for a list. Hint: remember to use the r-nameofdistribution function to take random samples.
Plot your new distribution and share with your neighbor.
Draw 1000 random normals with a mean of 0 and an sd of 1. Look at the hist help screen. How do you specify the size of the bin range? Try making bins from -4 to 4, with intervals of 0.01, 0.1, and 1. Hint: Consider using seq() in the “breaks” argument within hist().
dnorm() can also be used to plot the theoretical density curve of a normal distribution, here with a mean of 2 and a standard deviation of 0.5.
x <- seq(0,4,0.01)
dens <- dnorm(x, 2, 0.5)
plot(x, dens, type = "l")
Another option is to plot the distribution not in terms of raw counts, but in terms of density, so that the total area of the histogram bars is 1.
x <- rnorm(100, mean=0, sd=2)
hist(x, freq=FALSE)
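As a quick check of the density scaling described above, the bar heights multiplied by the bin widths should add up to 1. This sketch uses plot = FALSE to get the numbers without drawing; the seed and the names h, bin_widths, and total_area are illustrative assumptions.

```r
set.seed(1)
x <- rnorm(100, mean = 0, sd = 2)

# Compute the histogram without drawing it
h <- hist(x, plot = FALSE)

# Width of each bin, from the differences between consecutive bin edges
bin_widths <- diff(h$breaks)

# Area of each bar is density * width; the areas sum to 1
total_area <- sum(h$density * bin_widths)
print(total_area)
```

Because the area, not the bar heights, is normalized, individual density values can exceed 1 when the bins are narrow.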
In R, it's very easy to take a random sample of numbers with the sample() command.
Take a random sample of 20 numbers from a vector of 1 to 50.
x <- 1:50
sample(x, 20)
[1] 30 42 3 2 45 16 8 37 24 10 36 21 33 11 26 39 49 15 13 35
By default, sample() draws without replacement, so no number appears twice. Set replace=TRUE to allow the same number to be drawn more than once.
x <- 1:50
sample(x, 20, replace=TRUE)
[1] 37 46 3 46 7 15 40 43 3 27 35 19 42 21 49 30 22 12 11 29
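The difference between the two sample() calls above can be checked directly. The following is a minimal sketch; the seed, the variable names, and the sample of 60 are illustrative assumptions.

```r
# Fix the seed so the same sample is drawn each time (illustrative choice)
set.seed(123)
x <- 1:50

# Without replacement: each number can be drawn at most once
no_repl <- sample(x, 20)

# With replacement: the same number may be drawn more than once
with_repl <- sample(x, 20, replace = TRUE)

# anyDuplicated() returns 0 when no value is repeated
print(anyDuplicated(no_repl))

# Drawing more values than the vector holds only works with replacement
big <- sample(x, 60, replace = TRUE)
print(length(big))
```

Calling set.seed() before sample() is one way to make a random draw reproducible when sharing results.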
Scatterplots are useful for showing the relationship between two variables.
x <- rnorm(n=100, mean=5, sd=0.05)
y <- x * rnorm(n=100, mean=1, sd=0.01)
plot(y~x)
Instead of writing the relationship as a formula, i.e. y~x, you can write plot(x,y), where x and y are separated by a comma.
plot(x,y)
You can add straight lines with abline()
plot(x,y)
abline(a=0, b=1, col="red")
a specifies the intercept and b the slope.
plot(x,y)
abline(lm(y~x), col="red")
lm() fits a linear relationship between x and y.
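To see what abline() receives from lm(), the fitted intercept and slope can be pulled out with coef(). This is a sketch on simulated data; the true intercept of 2, the true slope of 3, the noise level, the seed, and the names fit and coefs are all illustrative assumptions.

```r
# Simulate data with a known linear relationship (illustrative values)
set.seed(7)
x <- rnorm(n = 100, mean = 5, sd = 1)
y <- 2 + 3 * x + rnorm(n = 100, mean = 0, sd = 0.1)

# Fit the linear model y = intercept + slope * x
fit <- lm(y ~ x)

# coef() returns a named vector: "(Intercept)" and "x"
coefs <- coef(fit)
print(coefs)

# Passing the two coefficients to abline() draws the fitted line
plot(x, y)
abline(a = coefs[["(Intercept)"]], b = coefs[["x"]], col = "red")
```

The estimates should land close to the simulated intercept and slope, and summary(fit) gives standard errors for both.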