Worksheet 2 - Probability & Distributions

Understanding your data is a critical step in analysis. Describe in words each of the following data types: Is it numeric? Continuous or discrete? Bounded below at 0?

Don’t worry about the distributions yet; the point is to recognize the fundamental characteristics of the data.

Data Structure

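  1. Draw 1000 random points from a normal distribution with mean 0 and sd 1.
  2. Draw 1000 random points from a Poisson distribution with lambda = 1.
  3. Draw 1000 random points from a binomial distribution with prob = 0.5 and a single trial, n = 1 (a coin flip; this is the size argument in R's rbinom()).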
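A minimal sketch of the three draws in R. The object names `norm_pts`, `pois_pts`, and `binom_pts` are illustrative, and `set.seed(42)` is an arbitrary seed added only for reproducibility. Note that in `rbinom()` the first argument `n` is the number of draws, while `size` is the number of trials per draw, so a single coin flip is `size = 1`. The checks at the end hint at the characteristics asked about above.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

norm_pts  <- rnorm(1000, mean = 0, sd = 1)       # continuous, unbounded
pois_pts  <- rpois(1000, lambda = 1)             # non-negative integer counts
binom_pts <- rbinom(1000, size = 1, prob = 0.5)  # one coin flip per draw: 0 or 1

# Quick checks of the fundamental characteristics
str(norm_pts)                      # numeric (double) values
str(pois_pts)                      # integer values
all(pois_pts == round(pois_pts))   # TRUE: whole numbers only
min(pois_pts) >= 0                 # TRUE: bounded below at 0
sort(unique(binom_pts))            # the only values are 0 and 1
```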

Univariate Plotting

  1. Create a histogram of each distribution above.
  2. Re-bin each histogram into fewer bins (e.g., 5). See ?hist, in particular the breaks argument.
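One way to do both steps is sketched below, assuming the same simulated samples (regenerated with an arbitrary `set.seed(42)` so the block runs on its own; the names `h_norm`, `h_pois`, and `h_binom` are illustrative). When `breaks` is a single number, `hist()` treats it only as a suggestion and may choose nearby "pretty" break points, so you may not get exactly 5 bins.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
norm_pts  <- rnorm(1000, mean = 0, sd = 1)
pois_pts  <- rpois(1000, lambda = 1)
binom_pts <- rbinom(1000, size = 1, prob = 0.5)

old_par <- par(mfrow = c(2, 3))  # 2 rows x 3 columns of panels

# Row 1: default binning (Sturges' rule)
hist(norm_pts,  main = "Normal")
hist(pois_pts,  main = "Poisson")
hist(binom_pts, main = "Binomial")

# Row 2: ask for about 5 bins; hist() still returns the histogram object
h_norm  <- hist(norm_pts,  breaks = 5, main = "Normal, ~5 bins")
h_pois  <- hist(pois_pts,  breaks = 5, main = "Poisson, ~5 bins")
h_binom <- hist(binom_pts, breaks = 5, main = "Binomial, ~5 bins")

par(old_par)  # restore the previous panel layout
```

Comparing `h_norm$breaks` with the requested 5 shows how `hist()` adjusted the bins.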

Fitting Distributions

Using curve(), overlay a fitted distribution curve on a histogram of your data, following this example:

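x <- rnorm(100)
head(x)
## [1]  1.74637874 -0.01461965  1.40216756 -0.35058284  0.65140411 -1.50337140
hist(x, prob = TRUE)
# Inside curve(), `x` is the grid of plotting points, not the data,
# so store the data's mean and sd before calling it
m <- mean(x)
s <- sd(x)
curve(dnorm(x, mean = m, sd = s), add = TRUE, col = "red")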

What does the prob=TRUE argument do in hist()? Why is it needed when overlaying a density curve?
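One way to investigate this question is to inspect the object that `hist()` returns without drawing it. The sketch below regenerates a sample with an arbitrary `set.seed(42)` and uses only the `counts`, `density`, and `breaks` components of the returned object; `widths` is an illustrative name.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
x <- rnorm(100)

# hist() always computes both counts and densities; prob = TRUE only
# changes which of the two is drawn as the bar heights
h <- hist(x, plot = FALSE)
widths <- diff(h$breaks)

# Density = count / (n * bin width) ...
all.equal(h$density, h$counts / (length(x) * widths))  # TRUE

# ... so the bar areas sum to 1, the same total area as under any density curve
sum(h$density * widths)  # 1, up to floating-point rounding
```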

  1. Fit the data from items 2 and 3 of Data Structure (Poisson and binomial) to their respective distributions.
  2. Plot all the fitted distributions together using the add=TRUE argument. Color each one separately and note which distribution is which color.
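A sketch of both tasks, assuming the same simulated samples (regenerated with an arbitrary `set.seed(42)` so the block runs on its own; names such as `lambda_hat`, `prob_hat`, and `x_max` are illustrative). Because the Poisson and binomial distributions are discrete, `dpois()` and `dbinom()` return 0 with a warning at non-integer `x`, so a `curve()` drawn over a continuous grid is not meaningful for them. This sketch instead evaluates their probabilities at integer values and draws them with `points()` and `lines(type = "h")`, while the normal fit still uses `curve(add = TRUE)`. The colors (red, blue, dark green) are an arbitrary choice.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
norm_pts  <- rnorm(1000, mean = 0, sd = 1)
pois_pts  <- rpois(1000, lambda = 1)
binom_pts <- rbinom(1000, size = 1, prob = 0.5)

# Estimate parameters from the data: the sample mean is the maximum-likelihood
# estimate of lambda (Poisson) and of prob (binomial with size = 1)
m          <- mean(norm_pts)
s          <- sd(norm_pts)
lambda_hat <- mean(pois_pts)
prob_hat   <- mean(binom_pts)

k_pois  <- 0:max(pois_pts)  # integer support on which to evaluate the pmf
k_binom <- 0:1

# Part 1: histograms with the fitted pmf. Breaks at half-integers give every
# integer its own bin of width 1, so bar densities are comparable to the pmf.
old_par <- par(mfrow = c(1, 2))
hist(pois_pts, breaks = seq(-0.5, max(pois_pts) + 0.5, by = 1),
     prob = TRUE, main = "Poisson", xlab = "count")
points(k_pois, dpois(k_pois, lambda = lambda_hat), col = "blue", pch = 19)
hist(binom_pts, breaks = seq(-0.5, 1.5, by = 1),
     prob = TRUE, main = "Binomial (size = 1)", xlab = "outcome")
points(k_binom, dbinom(k_binom, size = 1, prob = prob_hat),
       col = "darkgreen", pch = 19)
par(old_par)

# Part 2: all three fitted distributions on one set of axes.
# Start from an empty plot so every layer is added on top of it.
x_max <- max(4, max(k_pois))
plot(NULL, xlim = c(-4, x_max), ylim = c(0, 0.6), xlab = "x",
     ylab = "density / probability", main = "Fitted distributions")
curve(dnorm(x, mean = m, sd = s), add = TRUE, col = "red", lwd = 2)
# Draw the taller binomial bars first so the shorter Poisson bars at 0 and 1
# stay visible on top of them
lines(k_binom, dbinom(k_binom, size = 1, prob = prob_hat), type = "h",
      col = "darkgreen", lwd = 3)
lines(k_pois, dpois(k_pois, lambda = lambda_hat), type = "h",
      col = "blue", lwd = 3)
legend("topright", legend = c("Normal", "Poisson", "Binomial"),
       col = c("red", "blue", "darkgreen"), lwd = c(2, 3, 3))
```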